Case Study: Upgrading a Data Pipeline for a Fleet of 1,500+ Edge Devices

The Problem:

An AI software startup owned a fleet of 1,500+ edge devices generating real-time sensor data, but the existing pipeline was a SQL spiderweb with little-to-no documentation, and its scaling issues bottlenecked the flow of data more and more as the fleet grew. As an added bonus, the person who built and maintained the pipeline had left the company.

So here we go...

The Solution:

I worked alongside a software engineer to understand the existing architecture component by component, clearly document the flow of data through the pipeline, fix bugs, and keep it running.

In parallel, I designed and built a new architecture to ingest, store, transform, and visualize data flexibly, at scale, and in real time - transitioning from the legacy ETL pipeline to the Elastic Stack.
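To give a feel for the ingest side of a setup like this: Elasticsearch accepts batches of documents through its `_bulk` API as newline-delimited JSON, alternating an action line with a document line. Here's a minimal sketch of formatting sensor readings into that payload - the index name, device IDs, and field names are hypothetical, not taken from the actual pipeline.

```python
import json
from datetime import datetime, timezone

def to_bulk_ndjson(readings, index="sensor-readings"):
    """Format sensor readings as an Elasticsearch _bulk request body:
    for each reading, one action line ({"index": ...}) followed by
    the document itself, newline-delimited, with a trailing newline."""
    lines = []
    for reading in readings:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(reading))
    return "\n".join(lines) + "\n"

# Hypothetical reading from one edge device
readings = [
    {
        "device_id": "edge-0042",
        "temp_c": 21.7,
        "@timestamp": datetime(2024, 1, 1, tzinfo=timezone.utc).isoformat(),
    },
]
body = to_bulk_ndjson(readings)
```

In practice you'd POST `body` to the cluster's `/_bulk` endpoint (the official client libraries wrap this), and Kibana handles the visualization layer on top of the indexed data.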

Our Results:

Migrating from the legacy system to the new pipeline freed up our engineering team's resources and capacity thanks to fewer bugs and lower maintenance requirements, and clear, accessible documentation meant the whole team understood how the new pipeline worked. The rewrite also cut the pipeline's codebase by ~90%, delivered real-time data processing and visualizations, and scaled horizontally with the growing fleet of devices.


If you're thinking through your data pipeline, analytics, or KPIs - feel free to reach out and I'd be happy to help if I can.
