SceneSeeker: Distributed Video Retrieval Engine

SceneSeeker is a distributed system that indexes video content by dialogue and enables precise scene retrieval. Users can input a text query (e.g., "Seven! Seven! Seven!"), and the engine returns a frame-perfect video clip of that specific scene.

🚀 Key Features

  • Distributed Architecture: Decouples the API (Go) from the heavy processing (Python) using a message broker.
  • SRT-Based Indexing: Utilizes existing embedded subtitle streams for ultra-fast indexing (0.01x realtime).
  • Full-Text Search: Uses PostgreSQL tsvector for high-performance dialogue matching.
  • Smart Clipping: Performs frame-accurate video cutting and transcoding using FFmpeg.
  • S3-Compatible Storage: Uses MinIO for scalable object storage.

🏗 System Architecture

The system follows a Controller-Worker pattern. The Go backend handles user interactions and state, while stateless Python workers handle media processing.

The Stack

| Component    | Technology  | Responsibility                                         |
| ------------ | ----------- | ------------------------------------------------------ |
| Orchestrator | Go (Golang) | HTTP API, job scheduling, state management             |
| Worker       | Python      | FFmpeg operations, subtitle parsing (pysrt)            |
| Broker       | Redis       | Asynchronous job queues (queue:ingest, queue:clip)     |
| Database     | PostgreSQL  | Relational data & full-text search (tsvector)          |
| Storage      | MinIO       | S3-compatible object storage for raw videos and clips  |
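
The exact job contract between the Go producer and the Python consumer is not spelled out in this README. The sketch below shows a minimal consumer loop for queue:ingest, assuming JSON payloads; the field names video_id and s3_path are illustrative, not the actual wire format.

python
# Minimal sketch of the worker's consumer loop (worker/src/main.py), assuming JSON payloads
import json
import redis

r = redis.Redis(host="redis", port=6379)

while True:
    _, raw = r.brpop("queue:ingest")   # blocks until the Go side pushes a job
    job = json.loads(raw)              # e.g. {"video_id": 42, "s3_path": "raw/episode.mkv"}
    print("ingesting", job)            # the real worker would call the extractor here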

Visual Workflow

mermaid
sequenceDiagram
    participant User
    participant Go as Go Orchestrator
    participant Redis
    participant Py as Python Worker
    participant S3 as MinIO
    participant DB as Postgres

    Note over Go, Py: Flow 1: Ingestion
    User->>Go: Upload Video (Stream)
    Go->>S3: Stream to Bucket
    Go->>Redis: Push Job (Ingest)
    Redis->>Py: Pop Job
    Py->>S3: Read Header/Stream
    Py->>Py: Extract SRT (FFmpeg)
    Py->>DB: Index Subtitles
    
    Note over Go, Py: Flow 2: Retrieval
    User->>Go: Search "Seven!"
    Go->>DB: FTS Query
    DB-->>Go: Hit (VideoID, StartTime)
    Go->>Redis: Push Job (Clip)
    Redis->>Py: Pop Job
    Py->>S3: Seek & Transcode (FFmpeg)
    Py->>S3: Upload Clip
    Py-->>Go: Job Complete
    Go->>User: Return Clip URL

💾 Database Schema

We use PostgreSQL for its robust text search capabilities.

videos

Metadata for the raw files.

sql
CREATE TABLE videos (
    id SERIAL PRIMARY KEY,
    filename TEXT NOT NULL,
    s3_path TEXT NOT NULL,
    duration FLOAT,
    status VARCHAR(20) DEFAULT 'PENDING' -- PENDING, INDEXED, FAILED
);

segments

The searchable dialogue chunks.

sql
CREATE TABLE segments (
    id SERIAL PRIMARY KEY,
    video_id INT REFERENCES videos(id),
    start_time FLOAT NOT NULL,
    end_time FLOAT NOT NULL,
    content TEXT NOT NULL,
    -- The Magic: Pre-computed search vector
    content_vector TSVECTOR GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
);

-- Index for lightning-fast text search
CREATE INDEX idx_segments_content ON segments USING GIN(content_vector);
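
With the GIN index in place, the orchestrator's "FTS Query" step reduces to a tsquery match plus ranking. The real query lives in the Go api/ code; the sketch below is an illustrative Python equivalent, reusing the connection string from the Getting Started section.

python
# Illustrative full-text search against segments (the real query lives in the Go api/ code)
import psycopg2

conn = psycopg2.connect("postgres://user:pass@localhost:5432/sceneseeker")
with conn, conn.cursor() as cur:
    query = "Seven! Seven! Seven!"
    cur.execute(
        """
        SELECT video_id, start_time, end_time, content
        FROM segments
        WHERE content_vector @@ plainto_tsquery('english', %s)
        ORDER BY ts_rank(content_vector, plainto_tsquery('english', %s)) DESC
        LIMIT 5
        """,
        (query, query),
    )
    print(cur.fetchall())   # -> [(video_id, start_time, end_time, content), ...]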

🛠 Project Structure

bash
.
├── docker-compose.yml      # Orchestrates Go, Python, Redis, Postgres, MinIO
├── Makefile                # Build scripts and Proto generation
├── api/                    # The Go Orchestrator
│   ├── cmd/main.go         # Entry point
│   └── internal/
│       ├── handlers/       # HTTP Controllers
│       ├── queue/          # Redis Producer logic
│       └── db/             # Postgres repositories
├── worker/                 # The Python Processor
│   ├── src/
│   │   ├── main.py         # Worker Loop (Redis Consumer)
│   │   ├── extractor.py    # Subtitle extraction logic
│   │   └── clipper.py      # FFmpeg wrapping logic
│   ├── Dockerfile
│   └── requirements.txt
└── protobuf/               # Shared Protocol Buffers (if using gRPC for status)

⚡️ Ingestion Logic (Python)

Instead of running heavy speech-to-text models such as Whisper, we extract the subtitle tracks already embedded in the file. This makes ingestion extremely lightweight. The steps, sketched below:

  1. Probe: Check video for codec_type='subtitle'.
  2. Extract: Run ffmpeg -i video.mkv -map 0:s:0 subs.srt.
  3. Parse: Use pysrt to read each cue's start/end timestamps and dialogue text.
  4. Index: Bulk-insert the resulting rows into the segments table in Postgres.
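
A minimal sketch of steps 1–3, assuming the worker has already pulled the file from MinIO to a local path (the function name and .srt filename are illustrative):

python
# Sketch of subtitle extraction: probe -> extract -> parse (step 4 bulk-inserts the rows)
import json
import subprocess
import pysrt

def extract_dialogue(local_path: str, srt_path: str = "subs.srt"):
    # 1. Probe: bail out early if there is no embedded subtitle stream
    probe = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_streams", local_path],
        capture_output=True, text=True, check=True,
    )
    if not any(s.get("codec_type") == "subtitle" for s in json.loads(probe.stdout)["streams"]):
        raise RuntimeError("no embedded subtitle track")

    # 2. Extract: dump the first subtitle stream to an .srt file
    subprocess.run(["ffmpeg", "-y", "-i", local_path, "-map", "0:s:0", srt_path], check=True)

    # 3. Parse: pysrt returns SubRipTime objects; .ordinal is total milliseconds
    return [
        (cue.start.ordinal / 1000.0, cue.end.ordinal / 1000.0, cue.text)
        for cue in pysrt.open(srt_path)
    ]

Step 4 then bulk-inserts the returned (start_time, end_time, content) tuples into segments, for example with psycopg2.extras.execute_values.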

✂️ Clipping Logic (FFmpeg)

To ensure clips are playable and frame-accurate, we re-encode the specific segment rather than stream-copying, which can only cut on I-frames and therefore snaps to the nearest keyframe.

python
# The logic inside worker/src/clipper.py
import ffmpeg  # ffmpeg-python bindings

(
    ffmpeg.input(s3_url, ss=start_time)                              # seek to the segment start
    .output("clip.mp4", t=duration, vcodec="libx264", acodec="aac")  # re-encode for frame accuracy
    .run()
)
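
After transcoding, the worker pushes the result back to object storage (the "Upload Clip" step in the workflow). A sketch using the official minio Python client; the endpoint, credentials, bucket name and object key below are assumptions, not values defined in this repo.

python
# Sketch of the clip upload; endpoint, credentials, bucket and key naming are illustrative
from minio import Minio

client = Minio("localhost:9000", access_key="minioadmin", secret_key="minioadmin", secure=False)
client.fput_object("clips", "42/scene-0001.mp4", "clip.mp4", content_type="video/mp4")
url = client.presigned_get_object("clips", "42/scene-0001.mp4")  # URL the Go API can hand back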

🚀 Getting Started

  1. Start Infrastructure:

    bash
    docker-compose up -d postgres redis minio
  2. Run Migrations:

    bash
    # (Assuming migrate tool is installed)
    migrate -path ./migrations -database "postgres://user:pass@localhost:5432/sceneseeker" up
  3. Start Services:

    bash
    docker-compose up --build
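
Once the stack is up, the end-to-end flow can be exercised against the Go API. The port and route names below are hypothetical placeholders (check api/internal/handlers for the real ones):

python
# Hypothetical end-to-end usage; route names and port are not defined in this README
import requests

BASE = "http://localhost:8080"   # assumed Go orchestrator port

with open("episode.mkv", "rb") as f:
    requests.post(f"{BASE}/videos", files={"file": f}).raise_for_status()    # Flow 1: ingest

resp = requests.get(f"{BASE}/search", params={"q": "Seven! Seven! Seven!"})  # Flow 2: retrieve
print(resp.json())   # expected to include the clip URL once the clip job completes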