When Production Computer Vision Meets the Real World
- Jul 11, 2022
- 3 min read
Updated: Feb 8
The Situation:
An AI research company acquired a product built around a fleet of edge devices deployed in physical retail environments. Each device — a webcam, a small Intel Core i3 computer, and a display — showed advertisements while running lightweight computer vision models to detect nearby customers and estimate age and gender.
The system worked, but barely. The models were fast and cheap, but detection and classification quality was so low that the data they produced was unreliable. The company wanted to improve model performance and expand product capabilities without exceeding the strict compute and latency constraints of the hardware.
This was a production system already deployed at scale — not a greenfield research project.
The Approach: The first step was to understand the performance tradeoffs of the existing system. Because the original implementation used simple blob detection and a pre-trained classification model, we could evaluate alternatives directly.
We assembled large evaluation datasets for person detection and demographic classification, measuring not just accuracy, but also model size and inference latency — all critical constraints for edge deployment.
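An evaluation harness for this kind of comparison can be very simple: time each candidate model over a fixed batch of frames and report mean per-frame latency alongside accuracy and on-disk size. The sketch below is illustrative, not the project's actual tooling; `model_fn` stands in for any candidate's inference call.

```python
import time

def benchmark(model_fn, inputs, warmup=5, runs=50):
    """Return mean per-call latency in milliseconds for `model_fn`.

    `model_fn` is a placeholder for a candidate model's inference call;
    `inputs` is a fixed batch of sample frames so runs are comparable.
    """
    for x in inputs[:warmup]:        # warm-up to amortize lazy initialization
        model_fn(x)
    start = time.perf_counter()
    for _ in range(runs):
        for x in inputs:
            model_fn(x)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return elapsed_ms / (runs * len(inputs))
```

On edge hardware it is worth running a harness like this on the device itself, since latency measured on a development machine rarely transfers.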
As expected, the existing models were small and fast, but accuracy was poor.
The product also needed to evolve. Beyond detecting people and estimating demographics, the system needed to determine whether customers were actually looking at the display. That required face detection and orientation estimation.
To support these capabilities within device constraints, we moved to a multi-stage, multi-task pipeline:
Person detection using a modern detection model
Face detection within the detected person region
Age and gender classification from the detected face
Orientation estimation to infer ad engagement
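The staged structure can be sketched as a single function that threads each frame through the four models. The callables and their interfaces below are assumptions for illustration, not the deployed code; the point is that later, more expensive stages only run on the regions earlier stages produce.

```python
def run_pipeline(frame, person_det, face_det, demog_clf, orient_est):
    """Multi-stage, multi-task pipeline sketch.

    person_det(frame)          -> list of person boxes
    face_det(frame, box)       -> list of face boxes within a person box
    demog_clf(frame, face_box) -> (age, gender)
    orient_est(frame, face_box)-> True if facing the display
    """
    results = []
    for person_box in person_det(frame):                  # stage 1: people
        record = {"box": person_box, "age": None,
                  "gender": None, "engaged": None}
        face_boxes = face_det(frame, person_box)          # stage 2: faces, searched only inside the person region
        if face_boxes:
            face = face_boxes[0]
            record["age"], record["gender"] = demog_clf(frame, face)  # stage 3
            record["engaged"] = orient_est(frame, face)               # stage 4
        results.append(record)
    return results
```

Cascading this way keeps the per-frame budget predictable: the heaviest models only see small crops, and frames with no people cost almost nothing.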
This pipeline dramatically improved detection and classification quality while still meeting latency and memory constraints on the devices.
Because we now detected people rather than generic motion, we could assign temporary IDs to individuals across frames, allowing the system to cache demographic estimates instead of recomputing them continuously. This reduced compute load and improved data consistency.
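One simple way to realize this (a sketch under assumptions — the matching rule and threshold here are illustrative, not necessarily what shipped) is to match each detection to the previous frame's boxes by intersection-over-union, and run the demographic classifier only when a new ID appears:

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

class Tracker:
    """Assign temporary IDs across frames and cache demographics per ID,
    so classification runs once per person instead of every frame."""
    def __init__(self, iou_thresh=0.3):        # threshold is an illustrative value
        self.iou_thresh = iou_thresh
        self.prev = {}                          # id -> last seen box
        self.cache = {}                         # id -> (age, gender)
        self.next_id = 0

    def update(self, box, classify):
        best_id, best = None, self.iou_thresh
        for tid, pbox in self.prev.items():     # greedy match to previous frame
            score = iou(box, pbox)
            if score > best:
                best_id, best = tid, score
        if best_id is None:                     # no overlap: new person
            best_id = self.next_id
            self.next_id += 1
            self.cache[best_id] = classify(box) # classify once per new ID
        self.prev[best_id] = box
        return best_id, self.cache[best_id]
```

The cache also improves data consistency: a person's reported age and gender no longer flicker between frames as the classifier's per-frame output varies.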
The models were deployed across the fleet, and we began receiving significantly higher-quality telemetry from production devices.
When Reality Changed: Shortly after deployment, COVID introduced an unexpected constraint: face masks.
Face detection performance dropped immediately. Most available models had not been trained on occluded faces, which meant downstream age, gender, and orientation estimation also degraded.
This became a data problem.
We collected and curated hundreds of thousands of images of masked faces, labeled them for detection and classification tasks, and retrained the models with this augmented dataset. Performance recovered substantially — not to pre-mask levels, but far beyond the original system.
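Operationally, retraining against augmented data comes down to mixing the new masked-face examples into the original training set at a controlled ratio, so the model neither forgets unmasked faces nor underweights masked ones. A minimal sketch — the ratio, seed, and function name are illustrative assumptions, not the project's actual recipe:

```python
import random

def build_training_set(original, masked, masked_fraction=0.4, seed=0):
    """Mix masked-face examples into the original dataset so that
    roughly `masked_fraction` of the result is masked faces."""
    rng = random.Random(seed)                   # fixed seed for reproducible splits
    n_masked = round(len(original) * masked_fraction / (1 - masked_fraction))
    sample = rng.sample(masked, min(n_masked, len(masked)))
    mixed = original + sample
    rng.shuffle(mixed)                          # avoid ordering bias during training
    return mixed
```

Whatever the exact ratio, the key discipline is evaluating the retrained model on both masked and unmasked holdout sets, so a fix for one population does not silently regress the other.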
The result was likely one of the earliest production deployments of face-aware models adapted to widespread mask usage.
Results:
After these changes, the device fleet could:
Detect customers more reliably
Estimate age and gender with significantly improved accuracy
Determine whether customers were looking at advertisements
Assign temporary IDs to track individuals across frames
Continue operating effectively even when faces were partially occluded
The system produced higher-quality demographic and engagement data while remaining within the strict hardware constraints of the edge devices.
Why this matters: Edge AI systems fail differently than cloud systems. You don’t get to scale hardware freely, latency matters immediately, and model quality must improve without increasing operational complexity.
This project succeeded because we treated model performance, hardware constraints, and data quality as a single system — and adapted quickly when the environment itself changed.
Production AI is rarely about finding the perfect model. It’s about building systems that keep working when assumptions break.
________________________________________________
If your AI models work in theory but struggle in production, I’m always open to conversations about stabilizing and scaling them.