From Load Balancers to LLMs
A technical blog by Udbhav Somani exploring the intersection of Distributed Systems, Backend Engineering, and AI Infrastructure.
About
This publication documents a "0 to 1" journey of a Backend Developer into the world of AI at Scale. The content focuses on the architectural "plumbing" of AI: how models are served, scaled, and orchestrated across distributed clusters.
Core Topics
- Agentic AI: Research and implementation of autonomous agent systems.
- Agentic AI at Scale: Orchestrating and scaling multi-agent collaboration in production.
- Distributed Inference: High-throughput model serving and low-latency architectures.
- Big Tech Paper Deep Dives: Simplified architectural breakdowns of engineering papers from Uber (Michelangelo), Netflix (Metaflow), Google, and others.
Target Audience
Backend Engineers, System Architects, and AI Infrastructure (MLOps) professionals.
Writing Style
- Systems-first approach.
- Focus on latency, throughput, and reliability over theoretical math.
- Heavy use of architectural diagrams and infrastructure-as-code patterns.
Social & Contact
- Website: https://udbhavsomani.com
- GitHub: https://github.com/udbhavsomani
- LinkedIn: https://linkedin.com/in/udbhavsomani
- X (Twitter): https://x.com/udbhavsomani
Key Links
- /about: The mission statement and roadmap.
- /rss: XML feed for automated content discovery.

