Saratchandra Patnaik

Backend & Distributed Systems Engineer

MS CS · Arizona State University · Ex-Amagi Media Labs · AWS · Kubernetes · Python · Go · Backend · Reliability · Distributed Systems

I specialize in backend engineering, cloud infrastructure, and systems reliability — with production depth in distributed streaming systems, concurrent programming in C++, edge ML inference, and AI-driven observability tooling.

At Amagi Media Labs I owned reliability for 15+ microservices serving live broadcast clients on AWS EKS. I hold an MS in Computer Science from Arizona State University and currently research fault analysis and software verification in distributed systems.


Experience

Graduate Research Assistant

Arizona State University · Tempe, AZ

Jan 2026 – Present

Software Verification, Validation & Testing

  • Research fault analysis and correctness properties for distributed system components, contributing to active work in software verification, validation, and automated testing.
  • Prepare publication-ready manuscripts through technical synthesis, literature analysis, and structured academic writing in collaboration with faculty.
  • Develop course materials on formal verification techniques and automated testing frameworks for graduate-level instruction.

Software Implementation Engineer

Amagi Media Labs · Bengaluru, India

Aug 2022 – Nov 2023

AWS EKS · Python · FastAPI · Kubernetes · Docker · ArgoCD · Terraform · Linux

15+ Microservices
99.9% Release Stability
60% Faster RCA
90% Setup Reduction
95% Fewer CDN Failures

Owned reliability and operations for large-scale broadcast streaming infrastructure on AWS EKS — debugging live failures, building failover logic, and running the automation layer that kept 15+ production microservices stable for major media clients.

  • Reduced media workflow latency by 93.75%, from 8 minutes to 30 seconds, by building asynchronous Python/FastAPI frame and metadata pipelines backed by computer-vision models.
  • Reduced manual debugging time by 60% by designing an AI-powered observability pipeline that ingested server logs into LLMs to automate root cause analysis.
  • Led the first successful deployment of Amagi-native ML speech-to-text services, Capsequo and Akashvani, and drove org-wide adoption by training teams on the deployment workflows.
  • Improved deployment frequency by 30% for a Kubernetes platform hosting 15+ microservices by owning AWS EKS releases, implementing ArgoCD GitOps, and maintaining 99.9% rollout stability.

Stream Reliability & Incident Response

Incident: A critical client's live stream fell back to rescue content due to an audio silence condition — a silent failure that required tracing logs across EC2 instances, Kubernetes pods, and multiple microservices to diagnose.

Finding: The provider had primary and secondary input streams but no automated health check or switching mechanism. Audio silence propagated undetected until the playout system gave up and triggered rescue content.

Fix: Implemented threshold-based failover logic that monitors audio presence and switches to the healthy secondary stream when silence persists beyond a defined window — eliminating rescue fallback and hardening the client's stream against input failures.
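The threshold-based failover described above can be sketched as a small state machine. This is an illustrative reconstruction, not the production code: the class name, the dB silence floor, and the window length are all hypothetical stand-ins for whatever the real playout system used.

```python
class SilenceFailover:
    """Illustrative threshold-based failover: switch to the secondary
    input when audio silence on the primary persists past a window."""

    def __init__(self, silence_window_s: float = 5.0, silence_floor_db: float = -60.0):
        self.silence_window_s = silence_window_s
        self.silence_floor_db = silence_floor_db
        self.active = "primary"
        self._silence_started = None  # timestamp when silence began, or None

    def observe(self, audio_level_db: float, now: float) -> str:
        """Feed one audio-level sample; return which input should be active."""
        if audio_level_db > self.silence_floor_db:
            self._silence_started = None  # audio present: reset the timer
        elif self._silence_started is None:
            self._silence_started = now   # silence just began
        elif (now - self._silence_started >= self.silence_window_s
              and self.active == "primary"):
            self.active = "secondary"     # silence persisted past the window
        return self.active
```

The key design point is hysteresis: a single silent sample never triggers a switch, only silence sustained beyond the configured window, which avoids flapping on momentary dropouts.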

Software Engineer Intern

Blueplanet Solutions Inc. · India

Apr 2021 – Jun 2021

MySQL · PHP · JavaScript · Linux

  • Database Optimization: Analyzed MySQL execution plans, refactored queries into optimized stored procedures with proper indexing — cut query execution time by 50%, enabling sub-second retrieval for 1,000+ user profiles.
  • Root Cause Analysis: Traced intermittent memory leaks to unclosed DB connections in legacy PHP by parsing web server logs. Patched connection handling — improved stability by 30% and eliminated recurring crashes.
  • Frontend: Built an async search UI with JavaScript (AJAX) and PHP to replace full-page reloads — improved time-to-result by 60%.

Skills

Cloud & Infrastructure

AWS EKS · EC2 · Lambda · S3 · Greengrass · MediaLive · Load Balancer · Kubernetes · Docker · Linux · Terraform · Azure

Backend Engineering

Python · FastAPI · Go · REST APIs · Microservices · Async / Non-blocking I/O · Node.js · Flask · Express.js

Reliability & DevOps

GitOps · ArgoCD · GitHub Actions · Jenkins · Grafana · Shell Scripting · Observability · On-call / Incident Ops

Systems Programming

C++ · CUDA / PyCUDA · Multi-threading · POSIX IPC · UDP / TCP · Network Protocols · Shared Memory · Concurrency

AI & ML

LLMs (GPT-4, Claude) · RAG · LangChain · Computer Vision · PyTorch · TensorFlow · Scikit-learn · OpenAI API · Anthropic API

Data & Storage

PostgreSQL · Redis · ChromaDB · MySQL · Firebase · MariaDB

Languages

Python · C++ · Go · TypeScript · Java · SQL · Kotlin · C

Projects

SYSTEMS / C++

Multithreaded UDP Packet Processing Server

C++ · UDP · POSIX IPC · Multi-threading · Shared Memory

Problem: Build a telecom-grade server that receives encrypted binary UDP packets at high throughput and processes them without dropping or reordering under concurrent load.

Approach: Producer-consumer architecture with a thread pool and mutex-locked queues. Producer threads read raw UDP datagrams off the socket; consumers decrypt and process binary payloads in parallel. Statistics are published via POSIX shared memory IPC for external monitoring.

Result: Sustained high-throughput packet ingestion with ordered processing guarantees and zero data loss under concurrent load.
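The producer-consumer shape of this server can be sketched in a few lines. The project itself is C++; this is a deliberately simplified Python analogue using one producer thread draining the socket into a thread-safe queue and a worker pool processing payloads, with a trivial XOR standing in for the real decryption. Port, packet count, and worker count are arbitrary.

```python
import queue
import socket
import threading

def run_server(host, port, n_packets, n_workers=4):
    """Toy producer-consumer UDP server: one producer reads datagrams
    into a queue; workers 'decrypt' payloads in parallel."""
    q = queue.Queue()
    results, lock = [], threading.Lock()

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))

    def producer():
        for _ in range(n_packets):
            data, _addr = sock.recvfrom(2048)       # blocking read of one datagram
            q.put(data)
        for _ in range(n_workers):
            q.put(None)                             # poison pills stop the workers

    def consumer():
        while (data := q.get()) is not None:
            payload = bytes(b ^ 0x5A for b in data)  # stand-in "decryption" (XOR)
            with lock:
                results.append(payload)

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    sock.close()
    return results
```

In the real C++ design, the same split applies, but stats are additionally published through POSIX shared memory so an external monitor can read them without touching the hot path.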

View on GitHub
EDGE / CLOUD

AWS IoT Greengrass Edge Face Recognition

AWS IoT Greengrass · Lambda · EC2 · MQTT · SQS · PyTorch · FaceNet · MTCNN

Problem: Run real-time face recognition on an edge device with no pip access — cloud-based inference introduced too much latency and the Greengrass Lambda environment had no package manager.

Approach: Packaged raw PyTorch MTCNN and FaceNet models into a custom deployment bundle that runs in a pip-free Lambda environment on Greengrass. Edge inference events stream asynchronously to the cloud via MQTT and SQS, fully decoupling edge processing from cloud consumption.

Result: Sub-second face recognition at the edge with a cloud-synchronized event stream and no internet dependency at inference time.
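The decoupling pattern in this pipeline (edge inference enqueues events and returns immediately; a background sender delivers them to the cloud) can be sketched with stdlib primitives. A callable stands in for the MQTT/SQS transport here; the class name and event fields are hypothetical.

```python
import json
import queue
import threading

class AsyncEventPublisher:
    """Decouples edge inference from cloud delivery: callers enqueue events
    and return immediately; a background thread drains the queue and hands
    each event to a transport (MQTT/SQS in the real system; any callable here)."""

    def __init__(self, transport):
        self._q = queue.Queue()
        self._transport = transport
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def publish(self, event: dict) -> None:
        self._q.put(json.dumps(event))   # non-blocking from the caller's view

    def close(self) -> None:
        self._q.put(None)                # sentinel: flush and stop
        self._worker.join()

    def _drain(self) -> None:
        while (msg := self._q.get()) is not None:
            self._transport(msg)         # e.g. mqtt_client.publish(topic, msg)
```

Because the inference loop never waits on network I/O, a slow or flaky uplink degrades delivery latency but not recognition latency, which is what makes sub-second edge inference hold up.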

Private repository
PERFORMANCE / GPU

CUDA GPU Accelerated Image Processing

CUDA · PyCUDA · C++ · Python

Problem: CPU-based 2D Gaussian filtering was the bottleneck for image processing workloads — single-threaded and unable to scale with image resolution.

Approach: Parallel CUDA kernel with shared memory tiling to eliminate redundant global memory reads, loop unrolling to maximize instruction throughput, and grid-stride loops to handle arbitrary image sizes. Validated output fidelity against CPU reference using PSNR and SSIM.

Result: 20× speedup over CPU — full-resolution frames processed in milliseconds instead of seconds.
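The PSNR check used to validate GPU output against the CPU reference is worth making concrete. This is a generic PSNR implementation in plain Python (the project's actual validation code is not shown here); images are assumed flattened to equal-length pixel sequences with 8-bit peak value 255.

```python
import math

def psnr(reference, candidate, max_val=255.0):
    """Peak signal-to-noise ratio between a reference image and a candidate,
    both flattened to equal-length pixel sequences. Higher means closer;
    identical images give infinity."""
    assert len(reference) == len(candidate), "images must match in size"
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf                      # bit-exact match
    return 10.0 * math.log10(max_val ** 2 / mse)
```

A kernel that merely looks right can still diverge from the CPU path by a pixel or two at tile boundaries; a PSNR floor (alongside SSIM) catches exactly that class of error.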

View on GitHub
AI / ML

Personal AI Agent — RAG System

Python · FastAPI · React · ChromaDB · OpenAI API · Anthropic API · GPT-4 Vision

Problem: Build a composable AI assistant for vehicle diagnostics, ATS resume optimization, and document Q&A — without locking the system to a single LLM provider.

Approach: Clean Architecture to decouple domain logic from model providers (OpenAI, Anthropic). ChromaDB handles vector retrieval for document grounding; GPT-4 Vision processes image inputs for diagnostics. Each capability is an isolated use case wired through a shared retrieval layer. React frontend, FastAPI backend.

Result: Fully swappable model backends with consistent retrieval quality across task types and a usable dashboard for diagnostics and resume analysis.
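The provider-swapping claim rests on a simple inversion: domain use cases depend on a provider interface, never on a vendor SDK. A minimal sketch of that seam, with hypothetical names (`ChatProvider`, `AnswerWithContext`) and a stub provider standing in for the OpenAI/Anthropic adapters:

```python
from typing import Protocol

class ChatProvider(Protocol):
    """Provider contract; concrete adapters wrap the OpenAI or Anthropic SDK."""
    def complete(self, prompt: str) -> str: ...

class AnswerWithContext:
    """Use case: ground a question in retrieved documents, then ask whichever
    provider was injected. Swapping providers changes no domain code."""

    def __init__(self, provider: ChatProvider, retrieve):
        self._provider = provider
        self._retrieve = retrieve  # e.g. a ChromaDB similarity search

    def run(self, question: str) -> str:
        context = "\n".join(self._retrieve(question))
        prompt = f"Context:\n{context}\n\nQuestion: {question}"
        return self._provider.complete(prompt)

class EchoProvider:
    """Stub provider for illustration: echoes the question back uppercased."""
    def complete(self, prompt: str) -> str:
        return prompt.rsplit("Question: ", 1)[-1].upper()
```

Because retrieval is injected the same way as the model, retrieval quality stays constant across providers, which is the property the Result line above depends on.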

View on GitHub