Saratchandra Patnaik

Backend & Distributed Systems Engineer

MS CS · Arizona State University · Ex-Amagi Media Labs · AWS · Kubernetes · Python · Go · Backend · Reliability · Distributed Systems

I specialize in backend engineering, cloud infrastructure, and systems reliability — with production depth in distributed streaming systems, concurrent programming in C++, edge ML inference, and AI-driven observability tooling.

At Amagi Media Labs I owned reliability for 15+ microservices serving live broadcast clients on AWS EKS. I hold an MS in Computer Science from Arizona State University and currently research fault analysis and software verification in distributed systems.


Experience

Graduate Research Assistant

Arizona State University · Tempe, AZ

Jan 2026 – Present

Software Verification, Validation & Testing

  • Research fault analysis and correctness properties for distributed system components, contributing to active work in software verification, validation, and automated testing.
  • Prepare publication-ready manuscripts through technical synthesis, literature analysis, and structured academic writing in collaboration with faculty.
  • Develop course materials on formal verification techniques and automated testing frameworks for graduate-level instruction.

Software Implementation Engineer

Amagi Media Labs · Bengaluru, India

Aug 2022 – Nov 2023

AWS EKS · Python · FastAPI · Kubernetes · Docker · ArgoCD · Terraform · Linux

15+ Microservices
99.9% Release Stability
60% Faster RCA
90% Setup Reduction
95% Fewer CDN Failures

Owned reliability and operations for large-scale broadcast streaming infrastructure on AWS EKS — debugging live failures, building failover logic, and running the automation layer that kept 15+ production microservices stable for major media clients.

  • Reduced media workflow latency by 93.75%, from 8 minutes to 30 seconds, by building asynchronous Python/FastAPI frame and metadata pipelines backed by computer-vision models.
  • Reduced manual debugging time by 60% by designing an AI-powered observability pipeline that ingested server logs into LLMs to automate root cause analysis.
  • Led the first successful deployment of Amagi-native ML speech-to-text services, Capsequo and Akashvani, and drove org-wide adoption by training teams on the deployment workflows.
  • Improved deployment frequency by 30% for a Kubernetes platform hosting 15+ microservices by owning AWS EKS releases, implementing ArgoCD GitOps, and maintaining 99.9% rollout stability.

Stream Reliability & Incident Response

Incident: A critical client's live stream fell back to rescue content due to an audio silence condition — a silent failure that required tracing logs across EC2 instances, Kubernetes pods, and multiple microservices to diagnose.

Finding: The provider had primary and secondary input streams but no automated health check or switching mechanism. Audio silence propagated undetected until the playout system gave up and triggered rescue content.

Fix: Implemented threshold-based failover logic that monitors audio presence and switches to the healthy secondary stream when silence persists beyond a defined window — eliminating rescue fallback and hardening the client's stream against input failures.
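The threshold-based failover described above can be sketched as a small state machine. This is an illustrative reconstruction, not the production code: the class name, the dB silence floor, and the window length are all hypothetical stand-ins for whatever the real playout system used.

```python
class SilenceFailover:
    """Illustrative threshold-based failover: switch to the secondary
    input when audio silence on the primary persists past a window."""

    def __init__(self, silence_window_s: float = 5.0, silence_floor_db: float = -60.0):
        self.silence_window_s = silence_window_s
        self.silence_floor_db = silence_floor_db
        self.active = "primary"
        self._silence_started = None  # timestamp when silence began, or None

    def observe(self, audio_level_db: float, now: float) -> str:
        """Feed one audio-level sample; return which input should be active."""
        if audio_level_db > self.silence_floor_db:
            self._silence_started = None  # audio present: reset the timer
        elif self._silence_started is None:
            self._silence_started = now   # silence just began
        elif (now - self._silence_started >= self.silence_window_s
              and self.active == "primary"):
            self.active = "secondary"     # silence persisted past the window
        return self.active
```

The key design point is hysteresis: a single silent sample never triggers a switch, only silence sustained beyond the configured window, which avoids flapping on momentary dropouts.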

Software Engineer Intern

Blueplanet Solutions Inc. · India

Apr 2021 – Jun 2021

MySQL · PHP · JavaScript · Linux

  • Database Optimization: Analyzed MySQL execution plans, refactored queries into optimized stored procedures with proper indexing — cut query execution time by 50%, enabling sub-second retrieval for 1,000+ user profiles.
  • Root Cause Analysis: Traced intermittent memory leaks to unclosed DB connections in legacy PHP by parsing web server logs. Patched connection handling — improved stability by 30% and eliminated recurring crashes.
  • Frontend: Built an async search UI with JavaScript (AJAX) and PHP to replace full-page reloads — improved time-to-result by 60%.

Skills

Cloud & Infrastructure

AWS EKS · EC2 · Lambda · S3 · Greengrass · MediaLive · Load Balancer · Kubernetes · Docker · Linux · Terraform · Azure

Backend Engineering

Python · FastAPI · Go · REST APIs · Microservices · Async / Non-blocking I/O · Node.js · Flask · Express.js

Reliability & DevOps

GitOps · ArgoCD · GitHub Actions · Jenkins · Grafana · Shell Scripting · Observability · On-call / Incident Ops

Systems Programming

C++ · CUDA / PyCUDA · Multi-threading · POSIX IPC · UDP / TCP · Network Protocols · Shared Memory · Concurrency

AI & ML

LLMs (GPT-4, Claude) · RAG · LangChain · Computer Vision · PyTorch · TensorFlow · Scikit-learn · OpenAI API · Anthropic API

Data & Storage

PostgreSQL · Redis · ChromaDB · MySQL · Firebase · MariaDB

Languages

Python · C++ · Go · TypeScript · Java · SQL · Kotlin · C

Projects

SYSTEMS / C++

Multithreaded UDP Packet Processing Server

C++ · UDP · POSIX IPC · Multi-threading · Shared Memory

Problem: Build a telecom-grade server that receives encrypted binary UDP packets at high throughput and processes them without dropping or reordering under concurrent load.

Approach: Producer-consumer architecture with a thread pool and mutex-locked queues. Producer threads read raw UDP datagrams off the socket; consumers decrypt and process binary payloads in parallel. Statistics are published via POSIX shared memory IPC for external monitoring.

Result: Sustained high-throughput packet ingestion with ordered processing guarantees and zero data loss under concurrent load.
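The producer-consumer shape of this server can be sketched in a few lines. The project itself is C++; this is a deliberately simplified Python analogue using one producer thread draining the socket into a thread-safe queue and a worker pool processing payloads, with a trivial XOR standing in for the real decryption. Port, packet count, and worker count are arbitrary.

```python
import queue
import socket
import threading

def run_server(host, port, n_packets, n_workers=4):
    """Toy producer-consumer UDP server: one producer reads datagrams
    into a queue; workers 'decrypt' payloads in parallel."""
    q = queue.Queue()
    results, lock = [], threading.Lock()

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))

    def producer():
        for _ in range(n_packets):
            data, _addr = sock.recvfrom(2048)       # blocking read of one datagram
            q.put(data)
        for _ in range(n_workers):
            q.put(None)                             # poison pills stop the workers

    def consumer():
        while (data := q.get()) is not None:
            payload = bytes(b ^ 0x5A for b in data)  # stand-in "decryption" (XOR)
            with lock:
                results.append(payload)

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    sock.close()
    return results
```

In the real C++ design, the same split applies, but stats are additionally published through POSIX shared memory so an external monitor can read them without touching the hot path.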

View on GitHub
EDGE / CLOUD

AWS IoT Greengrass Edge Face Recognition

AWS IoT Greengrass · Lambda · EC2 · MQTT · SQS · PyTorch · FaceNet · MTCNN

Problem: Run real-time face recognition on an edge device with no pip access — cloud-based inference introduced too much latency and the Greengrass Lambda environment had no package manager.

Approach: Packaged raw PyTorch MTCNN and FaceNet models into a custom deployment bundle that runs in a pip-free Lambda environment on Greengrass. Edge inference events stream asynchronously to the cloud via MQTT and SQS, fully decoupling edge processing from cloud consumption.

Result: Sub-second face recognition at the edge with a cloud-synchronized event stream and no internet dependency at inference time.
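The decoupling pattern in this pipeline (edge inference enqueues events and returns immediately; a background sender delivers them to the cloud) can be sketched with stdlib primitives. A callable stands in for the MQTT/SQS transport here; the class name and event fields are hypothetical.

```python
import json
import queue
import threading

class AsyncEventPublisher:
    """Decouples edge inference from cloud delivery: callers enqueue events
    and return immediately; a background thread drains the queue and hands
    each event to a transport (MQTT/SQS in the real system; any callable here)."""

    def __init__(self, transport):
        self._q = queue.Queue()
        self._transport = transport
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def publish(self, event: dict) -> None:
        self._q.put(json.dumps(event))   # non-blocking from the caller's view

    def close(self) -> None:
        self._q.put(None)                # sentinel: flush and stop
        self._worker.join()

    def _drain(self) -> None:
        while (msg := self._q.get()) is not None:
            self._transport(msg)         # e.g. mqtt_client.publish(topic, msg)
```

Because the inference loop never waits on network I/O, a slow or flaky uplink degrades delivery latency but not recognition latency, which is what makes sub-second edge inference hold up.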

Private repository
PERFORMANCE / GPU

CUDA GPU Accelerated Image Processing

CUDA · PyCUDA · C++ · Python

Problem: CPU-based 2D Gaussian filtering was the bottleneck for image processing workloads — single-threaded and unable to scale with image resolution.

Approach: Parallel CUDA kernel with shared memory tiling to eliminate redundant global memory reads, loop unrolling to maximize instruction throughput, and grid-stride loops to handle arbitrary image sizes. Validated output fidelity against CPU reference using PSNR and SSIM.

Result: 20× speedup over CPU — full-resolution frames processed in milliseconds instead of seconds.
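The PSNR check used to validate GPU output against the CPU reference is worth making concrete. This is a generic PSNR implementation in plain Python (the project's actual validation code is not shown here); images are assumed flattened to equal-length pixel sequences with 8-bit peak value 255.

```python
import math

def psnr(reference, candidate, max_val=255.0):
    """Peak signal-to-noise ratio between a reference image and a candidate,
    both flattened to equal-length pixel sequences. Higher means closer;
    identical images give infinity."""
    assert len(reference) == len(candidate), "images must match in size"
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf                      # bit-exact match
    return 10.0 * math.log10(max_val ** 2 / mse)
```

A kernel that merely looks right can still diverge from the CPU path by a pixel or two at tile boundaries; a PSNR floor (alongside SSIM) catches exactly that class of error.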

View on GitHub
AI / ML

Personal AI Agent — RAG System

Python · FastAPI · React · ChromaDB · OpenAI API · Anthropic API · GPT-4 Vision

Problem: Build a composable AI assistant for vehicle diagnostics, ATS resume optimization, and document Q&A — without locking the system to a single LLM provider.

Approach: Clean Architecture to decouple domain logic from model providers (OpenAI, Anthropic). ChromaDB handles vector retrieval for document grounding; GPT-4 Vision processes image inputs for diagnostics. Each capability is an isolated use case wired through a shared retrieval layer. React frontend, FastAPI backend.

Result: Fully swappable model backends with consistent retrieval quality across task types and a usable dashboard for diagnostics and resume analysis.
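The provider-swapping claim rests on a simple inversion: domain use cases depend on a provider interface, never on a vendor SDK. A minimal sketch of that seam, with hypothetical names (`ChatProvider`, `AnswerWithContext`) and a stub provider standing in for the OpenAI/Anthropic adapters:

```python
from typing import Protocol

class ChatProvider(Protocol):
    """Provider contract; concrete adapters wrap the OpenAI or Anthropic SDK."""
    def complete(self, prompt: str) -> str: ...

class AnswerWithContext:
    """Use case: ground a question in retrieved documents, then ask whichever
    provider was injected. Swapping providers changes no domain code."""

    def __init__(self, provider: ChatProvider, retrieve):
        self._provider = provider
        self._retrieve = retrieve  # e.g. a ChromaDB similarity search

    def run(self, question: str) -> str:
        context = "\n".join(self._retrieve(question))
        prompt = f"Context:\n{context}\n\nQuestion: {question}"
        return self._provider.complete(prompt)

class EchoProvider:
    """Stub provider for illustration: echoes the question back uppercased."""
    def complete(self, prompt: str) -> str:
        return prompt.rsplit("Question: ", 1)[-1].upper()
```

Because retrieval is injected the same way as the model, retrieval quality stays constant across providers, which is the property the Result line above depends on.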

View on GitHub