When Production Computer Vision Meets the Real World
- Jul 11, 2022
- 3 min read
Updated: Feb 8
The Situation:
An AI research company acquired a product built around a fleet of edge devices deployed in physical retail environments. Each device — a webcam, a small Intel Core i3 computer, and a display — showed advertisements while running lightweight computer vision models to detect nearby customers and estimate age and gender.
The system worked, but barely. The models were fast and cheap, but detection and classification quality was so low that the data they produced was unreliable. The company wanted to improve model performance and expand product capabilities without exceeding the strict compute and latency constraints of the hardware.
This was a production system already deployed at scale — not a greenfield research project.
The Approach: The first step was to understand the performance tradeoffs of the existing system. Because the original implementation used simple blob detection and a pre-trained classification model, we could evaluate alternatives directly.
We assembled large evaluation datasets for person detection and demographic classification, measuring not just accuracy, but also model size and inference latency — all critical constraints for edge deployment.
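An evaluation harness for this kind of comparison can be very simple: time each candidate model over a fixed batch of frames and report mean per-frame latency alongside accuracy and on-disk size. The sketch below is illustrative, not the project's actual tooling; `model_fn` stands in for any candidate's inference call.

```python
import time

def benchmark(model_fn, inputs, warmup=5, runs=50):
    """Return mean per-call latency in milliseconds for `model_fn`.

    `model_fn` is a placeholder for a candidate model's inference call;
    `inputs` is a fixed batch of sample frames so runs are comparable.
    """
    for x in inputs[:warmup]:        # warm-up to amortize lazy initialization
        model_fn(x)
    start = time.perf_counter()
    for _ in range(runs):
        for x in inputs:
            model_fn(x)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return elapsed_ms / (runs * len(inputs))
```

On edge hardware it is worth running a harness like this on the device itself, since latency measured on a development machine rarely transfers.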
As expected, the existing models were small and fast, but accuracy was poor.
The product also needed to evolve. Beyond detecting people and estimating demographics, the system needed to determine whether customers were actually looking at the display. That required face detection and orientation estimation.
To support these capabilities within device constraints, we moved to a multi-stage, multi-task pipeline:
Person detection using a modern detection model
Face detection within the detected person region
Age and gender classification from the detected face
Orientation estimation to infer ad engagement
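The staged structure can be sketched as a single function that threads each frame through the four models. The callables and their interfaces below are assumptions for illustration, not the deployed code; the point is that later, more expensive stages only run on the regions earlier stages produce.

```python
def run_pipeline(frame, person_det, face_det, demog_clf, orient_est):
    """Multi-stage, multi-task pipeline sketch.

    person_det(frame)          -> list of person boxes
    face_det(frame, box)       -> list of face boxes within a person box
    demog_clf(frame, face_box) -> (age, gender)
    orient_est(frame, face_box)-> True if facing the display
    """
    results = []
    for person_box in person_det(frame):                  # stage 1: people
        record = {"box": person_box, "age": None,
                  "gender": None, "engaged": None}
        face_boxes = face_det(frame, person_box)          # stage 2: faces, searched only inside the person region
        if face_boxes:
            face = face_boxes[0]
            record["age"], record["gender"] = demog_clf(frame, face)  # stage 3
            record["engaged"] = orient_est(frame, face)               # stage 4
        results.append(record)
    return results
```

Cascading this way keeps the per-frame budget predictable: the heaviest models only see small crops, and frames with no people cost almost nothing.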
This pipeline dramatically improved detection and classification quality while still meeting latency and memory constraints on the devices.
Because we now detected people rather than generic motion, we could assign temporary IDs to individuals across frames, allowing the system to cache demographic estimates instead of recomputing them continuously. This reduced compute load and improved data consistency.
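One simple way to realize this (a sketch under assumptions — the matching rule and threshold here are illustrative, not necessarily what shipped) is to match each detection to the previous frame's boxes by intersection-over-union, and run the demographic classifier only when a new ID appears:

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

class Tracker:
    """Assign temporary IDs across frames and cache demographics per ID,
    so classification runs once per person instead of every frame."""
    def __init__(self, iou_thresh=0.3):        # threshold is an illustrative value
        self.iou_thresh = iou_thresh
        self.prev = {}                          # id -> last seen box
        self.cache = {}                         # id -> (age, gender)
        self.next_id = 0

    def update(self, box, classify):
        best_id, best = None, self.iou_thresh
        for tid, pbox in self.prev.items():     # greedy match to previous frame
            score = iou(box, pbox)
            if score > best:
                best_id, best = tid, score
        if best_id is None:                     # no overlap: new person
            best_id = self.next_id
            self.next_id += 1
            self.cache[best_id] = classify(box) # classify once per new ID
        self.prev[best_id] = box
        return best_id, self.cache[best_id]
```

The cache also improves data consistency: a person's reported age and gender no longer flicker between frames as the classifier's per-frame output varies.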
The models were deployed across the fleet, and we began receiving significantly higher-quality telemetry from production devices.
When Reality Changed: Shortly after deployment, COVID introduced an unexpected constraint: face masks.
Face detection performance dropped immediately. Most available models had not been trained on occluded faces, which meant downstream age, gender, and orientation estimation also degraded.
This became a data problem.
We collected and curated hundreds of thousands of images of masked faces, labeled them for detection and classification tasks, and retrained the models with this augmented dataset. Performance recovered substantially — not to pre-mask levels, but far beyond the original system.
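Operationally, retraining against augmented data comes down to mixing the new masked-face examples into the original training set at a controlled ratio, so the model neither forgets unmasked faces nor underweights masked ones. A minimal sketch — the ratio, seed, and function name are illustrative assumptions, not the project's actual recipe:

```python
import random

def build_training_set(original, masked, masked_fraction=0.4, seed=0):
    """Mix masked-face examples into the original dataset so that
    roughly `masked_fraction` of the result is masked faces."""
    rng = random.Random(seed)                   # fixed seed for reproducible splits
    n_masked = round(len(original) * masked_fraction / (1 - masked_fraction))
    sample = rng.sample(masked, min(n_masked, len(masked)))
    mixed = original + sample
    rng.shuffle(mixed)                          # avoid ordering bias during training
    return mixed
```

Whatever the exact ratio, the key discipline is evaluating the retrained model on both masked and unmasked holdout sets, so a fix for one population does not silently regress the other.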
The result was likely one of the earliest production deployments of face-aware models adapted to widespread mask usage.
Results:
After these changes, the device fleet could:
Detect customers more reliably
Estimate age and gender with significantly improved accuracy
Determine whether customers were looking at advertisements
Assign temporary IDs to track individuals across frames
Continue operating effectively even when faces were partially occluded
The system produced higher-quality demographic and engagement data while remaining within the strict hardware constraints of the edge devices.
Why this matters: Edge AI systems fail differently than cloud systems. You don’t get to scale hardware freely, latency matters immediately, and model quality must improve without increasing operational complexity.
This project succeeded because we treated model performance, hardware constraints, and data quality as a single system — and adapted quickly when the environment itself changed.
Production AI is rarely about finding the perfect model. It’s about building systems that keep working when assumptions break.
________________________________________________
If your AI models work in theory but struggle in production, I’m always open to conversations about stabilizing and scaling them.