Experience Flow Software Tech Pvt Ltd · Apr 2021 — Nov 2023

Legacy-to-cloud data modernization

Modernized fragmented legacy pipelines into a cloud-oriented data platform without breaking reporting during the transition.

Legacy modernization · Airflow orchestration · Batch + streaming interoperability

Business problem

Legacy services and fragmented pipelines slowed analytics delivery, made reliability difficult to scale, and increased migration risk whenever the team tried to modernize a critical workflow.

Platform scope

Legacy services, scheduled pipelines, and event-driven workloads

Cadence

Backfills, SLA-driven jobs, and near-real-time processing

Consumers

Internal analytics teams and business reporting workflows

Thinking model

  • Modernize orchestration and platform in parallel to avoid migration deadlock.
  • Improve reliability first, then optimize throughput.
  • Keep real-time and batch flows interoperable so the platform does not fork into separate systems.

Constraints

  • Modernization could not break business reporting, so the cutover strategy had to support hybrid legacy and cloud paths.
  • Operational complexity had to be reduced even while the platform itself was in transition.

Architecture

Ingest

Source systems + event streams

Kafka + Python ingest services

Storage

ADLS Gen2 + Snowflake staging

Process

Airflow orchestration + Spark/dbt

Serve

Analytics serving

Ops

SLA + dependency monitoring

Operational guardrails

Backfill control

Replay and catch-up workflows were treated as first-class production operations.
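Treating replay as a first-class operation usually means idempotent, partitioned runs. As an illustrative sketch (the production workflows are generalized here), a catch-up window can be enumerated into daily partitions, each of which is safe to re-run:

```python
from datetime import date, timedelta

def backfill_partitions(start: date, end: date) -> list[str]:
    """Enumerate daily partition keys for a replay window.

    Each key maps to one idempotent run: re-executing a partition
    overwrites that day's output instead of appending, so a catch-up
    pass can be repeated safely after a failure.
    """
    days = (end - start).days + 1
    return [(start + timedelta(days=i)).isoformat() for i in range(days)]
```

In Airflow terms this corresponds to a date-partitioned DAG run with catchup enabled, where the partition key plays the role of the logical date.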

Dependency safety

Scheduling logic prevented downstream jobs from running on incomplete inputs.
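Independent of any particular scheduler, the guard reduces to a completeness gate: a downstream job starts only when every required upstream partition has published a completion marker. A minimal sketch, with hypothetical partition names:

```python
def missing_inputs(required: list[str], completed: set[str]) -> list[str]:
    """Return the upstream partitions a downstream job is still waiting on.

    An empty result means the job may start; otherwise the scheduler
    holds it, preventing runs on incomplete inputs.
    """
    return [p for p in required if p not in completed]
```

In Airflow this role is typically played by sensors or dataset-aware scheduling; the sketch only shows the go/no-go decision itself.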

Migration validation

Legacy and modern outputs were compared during cutover to protect reporting continuity.
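A minimal version of that comparison, using hypothetical row shapes, fingerprints each output order-insensitively so legacy and modern runs can be diffed without assuming identical row ordering:

```python
import hashlib

def fingerprint(rows: list[dict]) -> str:
    """Order-insensitive fingerprint of a dataset.

    Each row is serialized with sorted keys and hashed; the row hashes
    are sorted before the final digest, so two pipelines that emit the
    same rows in a different order still match.
    """
    row_hashes = sorted(
        hashlib.sha256(repr(sorted(r.items())).encode()).hexdigest()
        for r in rows
    )
    return hashlib.sha256("".join(row_hashes).encode()).hexdigest()

def outputs_match(legacy: list[dict], modern: list[dict]) -> bool:
    """Cutover check: same row count and same content fingerprint."""
    return len(legacy) == len(modern) and fingerprint(legacy) == fingerprint(modern)
```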

SLA monitoring

Operational alerting centered on freshness breaches and dependency failures.
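The freshness side of that alerting reduces to a simple rule, sketched here with a hypothetical per-pipeline SLA table rather than the real monitoring stack:

```python
from datetime import datetime, timedelta

def freshness_breaches(
    last_success: dict[str, datetime],
    slas: dict[str, timedelta],
    now: datetime,
) -> list[str]:
    """Return the pipelines whose most recent successful run is older
    than their freshness SLA; these are the ones that alert."""
    return sorted(
        name for name, sla in slas.items()
        if now - last_success[name] > sla
    )
```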

Flow checkpoints

  • ingestion: Source systems + event streams → Kafka + Python ingest services
  • landing + staging: Kafka + Python ingest services → ADLS Gen2 + Snowflake staging
  • scheduled + event jobs: ADLS Gen2 + Snowflake staging → Airflow orchestration + Spark/dbt
  • modeled outputs: Airflow orchestration + Spark/dbt → Analytics serving
  • runtime signals: Airflow orchestration + Spark/dbt → SLA + dependency monitoring

Design notes

  • The migration path kept business reporting available while internal services were modernized in parallel.
  • Orchestration controls stayed explicit for backfills, SLAs, and dependency safety so the new platform was operable from day one.

Delivery

Platform work

  • Built ingestion and processing pipelines with Python, Kafka, MySQL, and Elasticsearch.
  • Orchestrated production workflows in Airflow with backfills, SLAs, and dependency management.
  • Modernized legacy services from Flask to FastAPI and expanded real-time processing with Kafka, Redis, and Spark.
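One way to keep the batch and real-time paths interoperable, sketched here with hypothetical names rather than the internal services, is to share a single transform between the scheduled job and the stream consumer so the platform does not fork into two codebases:

```python
def enrich(event: dict) -> dict:
    """Single transform shared by both execution paths (illustrative:
    normalizes a float amount into integer cents)."""
    return {"id": event["id"], "amount_cents": round(event["amount"] * 100)}

def run_batch(events: list[dict]) -> list[dict]:
    """Scheduled path: apply the shared transform over a partition."""
    return [enrich(e) for e in events]

class StreamHandler:
    """Event-driven path: apply the same transform per message, the
    way a Kafka consumer callback would."""

    def __init__(self) -> None:
        self.out: list[dict] = []

    def on_message(self, event: dict) -> None:
        self.out.append(enrich(event))
```

Because both paths call the same function, a fix or schema change lands once and applies to backfills and live traffic alike.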

Quality controls

  • Dependency-aware scheduling to prevent incomplete downstream runs.
  • Migration-era validation checks between legacy and modernized outputs.

Observability

  • Operational alerts centered on SLA breaches and pipeline dependency failures.
  • Run-level visibility for backfill and replay operations.

Impact

Migration safety

Legacy and modern flows ran in parallel during cutover so reporting continuity was protected.

Operational resilience

Airflow-based backfill, SLA, and dependency controls formalized production operations.

Modern platform path

Event-driven services and modern APIs made batch and real-time processing interoperable.

Tradeoffs

  • Ran hybrid legacy and modern paths during migration to reduce cutover risk.
  • Accepted temporary operational complexity to keep business reporting stable throughout the transition.

Confidentiality note

  • Internal system names and exact dataset shapes are generalized for confidentiality.

Work with me

Planning a legacy-to-cloud migration?

I help teams modernize orchestration, cutover safely, and reduce the operational drag that keeps migrations half-finished.
