Video Understanding AI Hackathon · Northeastern · April 2026

What's Missing From Your Dataset?

The best model in the world can't learn from data that isn't there. We built a tool that finds the gaps.

Built by GAP SEEKERS

Data curation beats model complexity.

The winning submission in the 2025 Elderly Action Recognition Challenge used a 2019-era model. It won by 5+ points by choosing better training data.

— CV4Smalls Workshop findings

In high-stakes computer vision domains — elderly care, wildlife monitoring, workplace safety — labeled video data is scarce and expensive to collect.

Researchers collect data blindly. They know what they have, but they have no way to see what they're missing.

A dataset with 500 videos of "normal walking" and zero videos of "person falling" is dangerously imbalanced — but nothing tells you that today.

See what's missing.
Collect what matters.

Load: Import any video dataset into FiftyOne.
Embed: Turn every video into a Twelve Labs Marengo embedding.
Cluster: Group videos by semantic similarity.
Describe: Auto-label each cluster with Twelve Labs Pegasus.
Detect Gaps: Find what's sparse, what's missing, and what to collect next.

All from a single operator click in the FiftyOne App.

Under the hood

Marengo Embeddings

Every video becomes a 512-dimensional vector in a multimodal embedding space shared with text. Semantically similar videos sit close together; dissimilar ones sit far apart.
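To make "close together" concrete, here is a minimal sketch of nearest-neighbor lookup, assuming cosine similarity as the distance measure and embeddings already fetched as plain lists of floats. The 4-dimensional vectors and file names are made-up stand-ins for real 512-dimensional Marengo output, and `cosine_similarity` and `nearest_videos` are hypothetical helpers, not part of the Twelve Labs or FiftyOne APIs.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_videos(
    query: list[float], embeddings: dict[str, list[float]], top_k: int = 3
) -> list[tuple[str, float]]:
    """Rank videos by cosine similarity to a query embedding, most similar first."""
    scored = [(vid, cosine_similarity(query, vec)) for vid, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings standing in for 512-d Marengo vectors.
embeddings = {
    "walk_01.mp4": [0.9, 0.1, 0.0, 0.0],
    "walk_02.mp4": [0.8, 0.2, 0.1, 0.0],
    "forklift_01.mp4": [0.0, 0.1, 0.9, 0.3],
}

result = nearest_videos([1.0, 0.1, 0.0, 0.0], embeddings, top_k=2)
for name, score in result:
    print(f"{name}: {score:.3f}")
```

Cosine similarity ignores vector length and compares only direction, which is the usual choice for comparing learned embeddings.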

Smart Clustering + UMAP

KMeans groups your videos. UMAP collapses 512 dimensions into 2 so you can actually see the structure. Outliers light up. Sparse clusters get flagged.
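Sparse-cluster flagging can be sketched as a size check over cluster labels. In a real run the labels would come from a clusterer such as scikit-learn's `KMeans(n_clusters=6).fit_predict(embeddings)`, and the 2-D view from something like `umap.UMAP(n_components=2).fit_transform(embeddings)`; neither step is reproduced here. The `flag_sparse_clusters` helper and the `min_size=30` cutoff are hypothetical, chosen only so the flags match the sample report below; the tool's actual rule may differ.

```python
from collections import Counter


def flag_sparse_clusters(labels: list[int], min_size: int) -> dict[int, tuple[int, bool]]:
    """Map each cluster id to (sample count, is_sparse); sparse means fewer than min_size members."""
    counts = Counter(labels)
    return {cid: (n, n < min_size) for cid, n in sorted(counts.items())}


# Cluster sizes copied from the sample report; real labels would come from a clusterer.
sizes = {0: 187, 1: 23, 2: 145, 3: 41, 4: 12, 5: 38}
labels = [cid for cid, n in sizes.items() for _ in range(n)]

# min_size=30 is a hypothetical threshold, not the tool's documented setting.
report = flag_sparse_clusters(labels, min_size=30)
for cid, (n, sparse) in report.items():
    print(f"Cluster {cid}: {n:4d} samples" + (" (sparse)" if sparse else ""))
```

A fixed count is the simplest rule; a relative cutoff, such as a fraction of the median cluster size, would scale better across datasets of very different sizes.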

Gap Detection

Tell us which categories you expect. We embed each category description, compare it against your video embeddings, and show exactly what's missing, along with a coverage score.
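The gap check can be sketched under the same cosine-similarity assumption: for each expected category, find its best-matching video, and treat categories whose best match falls below a threshold as missing. Coverage is taken here as the percentage of expected categories that clear the threshold, which would reproduce the sample report's 0% if the three listed categories were the only ones requested; the tool's exact formula is an assumption. The `detect_gaps` helper, the 0.5 threshold, and the toy vectors are hypothetical; in practice the category vectors would come from embedding the category text with Marengo.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def detect_gaps(
    category_embeddings: dict[str, list[float]],
    video_embeddings: list[list[float]],
    threshold: float,
) -> tuple[dict[str, float], list[str], float]:
    """Return (best similarity per category, missing categories, coverage percentage)."""
    if not category_embeddings or not video_embeddings:
        raise ValueError("need at least one category and one video embedding")
    # Best-matching video for each expected category.
    max_sim = {
        name: max(cosine_similarity(cat_vec, vid) for vid in video_embeddings)
        for name, cat_vec in category_embeddings.items()
    }
    missing = [name for name, sim in max_sim.items() if sim < threshold]
    coverage = 100.0 * (len(max_sim) - len(missing)) / len(max_sim)
    return max_sim, missing, coverage


# Hypothetical 3-d toy vectors for videos and expected categories.
videos = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
categories = {
    "walking": [1.0, 0.0, 0.0],
    "forklift": [0.0, 1.0, 0.0],
    "falling": [0.0, 0.0, 1.0],
}

max_sim, missing, coverage = detect_gaps(categories, videos, threshold=0.5)
for name, sim in max_sim.items():
    status = "NO MATCH" if name in missing else "match"
    print(f'"{name}" max similarity: {sim:.2f} ({status})')
print(f"Coverage Score: {coverage:.0f}%")
```

Using the maximum similarity per category means a single close example is enough to count a category as covered; pairing this with the sparse-cluster check catches categories that exist but are thinly represented.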

What you get

═══════════════════════════════════════════════
 VIDEO CONTENT GAP ANALYZER — Coverage Report
═══════════════════════════════════════════════

Dataset: Safe_and_Unsafe_Behaviours (691 videos)
Clusters: 6

Cluster 0: "Workers walking in designated areas"     — 187 samples 
Cluster 1: "Forklift loading operations"              —  23 samples ⚠ sparse
Cluster 2: "Worker at conveyor belt"                  — 145 samples 
Cluster 3: "Unauthorized zone entry"                  —  41 samples 
Cluster 4: "Supervisor inspection walkthrough"        —  12 samples ⚠ sparse
Cluster 5: "Vehicle movement in loading dock"         —  38 samples 

MISSING CATEGORIES:
 "Person falling or tripping"       — max similarity: 0.18 (NO MATCH)
 "Emergency evacuation"             — max similarity: 0.12 (NO MATCH)
 "Fire or smoke detection"          — max similarity: 0.09 (NO MATCH)

Coverage Score: 0%
Recommendation: Collect data for 3 missing categories.
═══════════════════════════════════════════════

Built for domains where data is scarce and stakes are high

Elderly Care

Workplace Safety

Wildlife Conservation

Clinical Monitoring

In these domains, missing a category in your training data means missing real events in production.

Built with

FiftyOne by Voxel51 · Twelve Labs Marengo 3.0 · Twelve Labs Pegasus 1.2 · scikit-learn · UMAP · Python

Get running in 60 seconds

1. Install: Clone the repo and install it as a FiftyOne plugin.

2. Configure: Set your Twelve Labs API key (free tier: 600 minutes of indexing).

3. Run: Launch the demo to see the full pipeline on real data.

# Clone & install
git clone https://github.com/rishimule/video-content-gap-analyzer.git
cd video-content-gap-analyzer
pip install -r requirements.txt
fiftyone plugins create video-content-gap-analyzer --from-dir .

# Set API key
export TWELVELABS_API_KEY="your_key_here"

# Run demo
python demo.py

Meet the Gap Seekers

Rishi Mule

Surabhi Gade