Stream the Latest Episode
Available now on YouTube, Apple and Spotify. See the episode transcript at the top of this page, and a summary at the bottom.
Brought to You By
• WorkOS — The modern identity platform for B2B SaaS
• CodeRabbit — Cut code review time and bugs in half. Use the code PRAGMATIC to get one month free.
• Augment Code — AI coding assistant that pro engineering teams love
—
In This Episode
How do you architect a live streaming system to deal with more load than any similar system has dealt with before? Today, we hear from an architect of such a system: Ashutosh Agrawal, formerly Chief Architect of JioCinema (and currently Staff Software Engineer at Google DeepMind). In May 2023, JioCinema set the live-streaming world record, serving 32 million concurrent viewers tuning in to the finale of the Indian Premier League (a cricket tournament).
We take a deep dive into video streaming architecture, tackling the complexities of live streaming at scale (at tens of millions of parallel streams) and the challenges engineers face in delivering seamless experiences. We talk about the following topics:
• How large-scale live streaming architectures are designed
• Tradeoffs in optimizing performance
• Early warning signs of streaming failures and how to detect them
• Why capacity planning for streaming is SO difficult
• The technical hurdles of streaming in APAC regions
• Why Ashutosh hates APMs (Application Performance Management systems)
• Ashutosh’s advice for those looking to improve their systems design expertise
• And much more!
Takeaways
My biggest takeaways from this episode:
1. The architecture behind live streaming systems is surprisingly logical. In the episode, Ashutosh explains how the live streaming system works, starting from the physical cameras on-site, through the production control room (PCR) and the slicing and dicing of streams, to the HLS protocol (HTTP Live Streaming) used to deliver them.
2. There are a LOT of tradeoffs you can play with when live streaming! Balancing server load, latency, and server resources vs client caching involves hard decisions. Want to reduce the server load? Serve longer chunks to clients, resulting in fewer requests per minute, per client… at the expense of clients potentially lagging further behind the live point. This is just one of many possible decisions to make.
3. At massive video streaming scale, capacity planning can start a year ahead! It was surprising to hear how Ashutosh had to convince telecoms and data centers to invest more in their server infrastructure, so they could handle the load come peak viewership, months later. This kind of challenge will be nonexistent for most of us engineers. Still, it’s interesting to consider that when you are serving at a scale that’s never been reached before, you need to worry about the underlying infra!
4. “Game day” is such a neat load testing concept. The team at Jio would simulate “game day” load months before the event. They did tell teams when the load test would start, but did not share anything else! Preparing for a “game day” test is a lot of work, but it can pay off by finding the parts of the system that buckle under extreme load.
The Pragmatic Engineer deep dives relevant to this episode
• Software architect archetypes
• Engineering leadership skill set overlaps
• Software architecture with Grady Booch
Timestamps
(00:00) Intro
(01:28) The world record-breaking live stream and how support works with live events
(05:57) An overview of streaming architecture
(21:48) The differences between internet streaming and traditional television
(22:26) How adaptive bitrate streaming works
(25:30) How throttling works on the mobile tower side
(27:46) Leading indicators of streaming problems and the data visualization needed
(31:03) How metrics are set
(33:38) Best practices for capacity planning
(35:50) Which resources are planned for in capacity planning
(37:10) How streaming services plan for future live events with vendors
(41:01) APAC specific challenges
(44:48) Horizontal scaling vs. vertical scaling
(46:10) Why auto-scaling doesn’t work
(47:30) Concurrency: the golden metric to scale against
(48:17) User journeys that cause problems
(49:59) Recommendations for learning more about video streaming
(51:11) How Ashutosh learned on the job
(55:21) Advice for engineers who would like to get better at systems
(1:00:10) Rapid fire round
A summary of the conversation
The Live Streaming Pipeline
The journey of a live stream starts with the cameras at the event’s venue. These cameras are connected by fiber to a Production Control Room (PCR).
In the PCR, a director selects which camera feeds to use, much like in a movie production.
The source feed (or production feed) is then sent to a contribution encoder. This encoder compresses the high-bandwidth source feed down to a more manageable size.
The compressed feed is then transmitted to the cloud using a private peer-to-peer link.
Distribution encoder: prepares the stream in various formats for end-user consumption, such as HLS and DASH.
Over 100 stream variants can be generated for various devices – and up to 500 (!) when different languages are included.
Orchestrator: manages the pipeline, from contribution encoding to the cloud infrastructure. It decides which endpoints to push to, the configuration of the distribution encoder, and the CDN endpoints.
Playback URLs: generated by the orchestrator. URLs are specific to the device and format being used.
When a user clicks play, a separate playback system takes over. This system verifies user authorization, deals with encryption, and handles Digital Rights Management (DRM). The playback system then provides the client app with an encrypted URL to stream the content.
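The episode doesn’t go into implementation details here, but a common pattern for such playback systems is handing out short-lived signed URLs. Below is a minimal sketch in Python, assuming an HMAC-based scheme with invented parameter names (not JioCinema’s actual design):

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"shared-secret-with-cdn"  # hypothetical key shared with the CDN edge

def signed_playback_url(base_url: str, user_id: str, ttl_seconds: int = 300) -> str:
    """Return a short-lived, tamper-evident playback URL (illustrative only)."""
    expires = int(time.time()) + ttl_seconds
    payload = f"{base_url}|{user_id}|{expires}".encode()
    token = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    query = urlencode({"uid": user_id, "exp": expires, "token": token})
    return f"{base_url}?{query}"

# The edge recomputes the HMAC and rejects expired or altered URLs, so an
# unauthorized client cannot share or replay the link indefinitely.
print(signed_playback_url("https://cdn.example.com/live/master.m3u8", "user-42"))
```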
Live streaming systems are more complex than Video on Demand (VOD) systems because they must manage multiple real-time streams, plus user authentication and authorization for those streams, all while keeping latency low.
Content Delivery
Content delivery relies on Content Delivery Networks (CDNs).
The core technology used is HLS or DASH, where the video is broken down into segments.
HLS uses a master manifest file (e.g., master.m3u8) that lists different video quality levels. Each quality level refers to a child manifest.
Child manifests list the individual video segments. These segments are typically four to six seconds long.
The client player re-requests the child manifest once every segment duration, then fetches the new segments it lists.
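As a concrete illustration of that request pattern, here is a naive polling loop in Python. The manifest contents and URL are made up, and real players (hls.js, ExoPlayer, AVPlayer) are far more sophisticated:

```python
import time
import urllib.request

# A minimal HLS child manifest for a live stream might look like this:
#
#   #EXTM3U
#   #EXT-X-VERSION:3
#   #EXT-X-TARGETDURATION:4
#   #EXT-X-MEDIA-SEQUENCE:1432
#   #EXTINF:4.0,
#   segment_1432.ts
#   #EXTINF:4.0,
#   segment_1433.ts

SEGMENT_DURATION = 4.0  # seconds, matching EXT-X-TARGETDURATION above

def poll_live_stream(child_manifest_url: str) -> None:
    """Naive live player: re-fetch the child manifest once per segment duration."""
    seen: set[str] = set()
    while True:
        manifest = urllib.request.urlopen(child_manifest_url).read().decode()
        for line in manifest.splitlines():
            if line and not line.startswith("#") and line not in seen:
                seen.add(line)           # a new segment appeared at the live edge
                print("fetch segment:", line)
        time.sleep(SEGMENT_DURATION)     # one manifest request per segment, per client
```

This per-client polling rate is why segment duration is such a powerful lever on CDN load: with 32 million viewers and 4-second segments, manifest requests alone arrive at roughly 8 million per second.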
CDN: works at the segment level rather than at a millisecond level.
Correctly setting up CDN configurations, such as the Time To Live (TTL) values for the cached segments, is crucial to ensure a smooth stream without stale data.
Latency is introduced at various stages of the live-streaming process. This includes encoding, network transmission, and client-side buffering.
Encoding techniques: a look-back period, or Group of Pictures (GOP), is used to achieve more efficient compression. The GOP might be 1, 2, or 4 seconds long.
Client-side buffering is used to give a smoother streaming experience, even if there are small network issues. This means the user might be watching the stream a few seconds behind the real-time live point.
There are trade-offs between latency, smooth playback, and infrastructure demands. Reducing the segment duration increases calls to the CDN, impacting infrastructure needs.
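A back-of-the-envelope sum shows how these stages add up. The numbers below are assumptions for illustration, not figures from the episode:

```python
# Rough glass-to-glass latency budget for segmented live streaming:
gop_seconds = 2          # encoder buffers a full GOP before emitting it
segment_seconds = 4      # each segment is packaged and uploaded whole
buffered_segments = 3    # the player starts this many segments behind live

encode_delay = gop_seconds
packaging_delay = segment_seconds
client_buffer = buffered_segments * segment_seconds

print(f"~{encode_delay + packaging_delay + client_buffer}s behind live")  # ~18s

# Halving segment_seconds cuts latency but doubles manifest and segment
# requests per client: the latency-vs-CDN-load trade-off in action.
```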
Adaptive bitrate streaming is used to adjust the video quality in response to the user's network conditions.
The client-side player measures the download speed and chooses an appropriate video quality level, matching the user's network capacity.
If the network speed slows down, the client can switch to a lower-quality video (e.g., from 720p to 240p).
The server can also degrade the user's stream by limiting the number of available video quality options, for example during very high load. The server can also adjust the segment length in response to system load.
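A minimal sketch of that client-side logic, assuming an invented bitrate ladder (production players use smoothed bandwidth estimates and buffer-aware heuristics):

```python
# (bitrate in kbps, rendition name), ordered best-first; an illustrative ladder
LADDER = [(4500, "1080p"), (2500, "720p"), (1000, "480p"), (300, "240p")]

def pick_rendition(measured_kbps: float, headroom: float = 0.8) -> str:
    """Choose the highest rendition that fits within the measured bandwidth."""
    budget = measured_kbps * headroom     # leave headroom for throughput variance
    for bitrate, name in LADDER:
        if bitrate <= budget:
            return name
    return LADDER[-1][1]                  # fall back to the lowest quality

print(pick_rendition(3200))   # -> 720p
print(pick_rendition(400))    # -> 240p, e.g. on a congested mobile network
```

The server-side degradation described above can use the same ladder: under extreme load, the server simply stops advertising the top renditions in the master manifest.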
The client player always starts playback a few seconds behind the live point, to avoid any interruption if a segment is missed.
If a segment is missed on traditional TV, the TV simply continues playing at the live point. On the internet, the client uses a buffer and tries to avoid missing a segment altogether.
Monitoring, Metrics, and Scaling
Monitoring is based on leading and trailing indicators.
Leading indicators help identify potential problems in real time. Examples include buffer time and playback failure rates. These leading-indicator metrics are given priority in the system.
Trailing indicators are used to perform a detailed analysis of issues after they occur.
Client-side metrics are collected and processed by the server in under a minute, sometimes within 30 seconds.
Server-side metrics, such as bandwidth, the number of requests, and latency, are also tracked.
The frequency of data collection is adjusted based on system load. When traffic is higher, the data is sampled to keep the volume collected and processed manageable.
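A sketch of what such load-adaptive sampling could look like; the thresholds and rates are invented for illustration:

```python
import random

def sample_rate(concurrent_viewers: int) -> float:
    """Fraction of client metric events to keep, shrinking as load grows."""
    if concurrent_viewers < 100_000:
        return 1.0    # low traffic: keep every event
    if concurrent_viewers < 5_000_000:
        return 0.1    # keep 1 in 10
    return 0.01       # record-breaking peak: keep 1 in 100

def should_emit(concurrent_viewers: int) -> bool:
    return random.random() < sample_rate(concurrent_viewers)

# Leading indicators (buffer time, playback failures) would plausibly be
# sampled less aggressively, since they drive real-time decisions.
```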
Capacity planning is a very complex process involving infrastructure, network, and power. It starts at the end of the prior year, for the following year.
Capacity planning involves coordination with several infra providers to make sure they can scale their infrastructure for the events.
The planning focuses on metrics such as compute, RAM, disk, and network usage. The main metric that becomes the limiting factor is vCPUs.
Cloud resources are not infinite at the scale required for major live events. There is a finite amount of resources in a given location – at this scale of streaming, that is!
Providers need to purchase real estate, install links, and deploy servers.
Horizontal scaling is preferred for compute resources as it is easy to add boxes to the pool.
Databases and caches are scaled preemptively to avoid the need to scale them on the fly during events.
Auto-scaling is not effective for live events because it is too slow to respond to the rapid changes in traffic. Custom scaling systems are preferred.
The custom scaling system uses a concurrency metric (the number of users watching the stream) to scale services. All systems are scaled against this common concurrency metric.
The custom scaler also looks at user journeys, such as when users press the back button and return to the home page. This can cause a spike in traffic to the home page API.
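A sketch of what scaling an entire fleet off one shared concurrency metric could look like; the services and per-replica capacities are invented for illustration:

```python
import math

# How many concurrent viewers one replica of each service can absorb
CAPACITY_PER_REPLICA = {
    "playback-api": 50_000,
    "home-api": 200_000,      # spikes when viewers press "back" to the home page
    "metrics-ingest": 500_000,
}

def replicas_needed(concurrency: int, headroom: float = 1.3) -> dict[str, int]:
    """Size every service from the single concurrency metric, with headroom."""
    return {
        service: math.ceil(concurrency * headroom / capacity)
        for service, capacity in CAPACITY_PER_REPLICA.items()
    }

print(replicas_needed(32_000_000))
# {'playback-api': 832, 'home-api': 208, 'metrics-ingest': 84}
```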
APAC-specific live streaming challenges
Mobility is a significant challenge because most users in India watch live streams on mobile devices and are often on the move. This means that users are constantly switching between cell towers.
Battery consumption is also a key factor. Video streaming can quickly drain mobile phone batteries.
The video profile, polling frequency, and encoding algorithms are often chosen to reduce battery consumption.
“Game day simulation”: something Jio did to simulate peak load conditions.
It involved generating synthetic traffic; teams then needed to scale systems and follow operational protocols in response.
The teams did not have access to the traffic dashboard, so the traffic patterns were unknown to them.
Advice for engineers to become better architects
Understand this: anything that can fail, will fail. Overconfidence in systems can lead to problems! Most people underestimate or overlook the potential failure modes.
Look at every aspect of your system, including configurations and code, as even the smallest things can cause problems.
Detailed metrics and measurements are vital, both to spot potential problems and to debug effectively.
Ensure you are measuring metrics correctly. For example, response time should be measured from when the request is queued, not when it enters the processing function.
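A small sketch of the difference, using Python’s standard library; the key point is that the timestamp is taken when the request is queued, so queueing delay under load shows up in the measured latency:

```python
import time
from queue import Queue
from threading import Thread

request_queue: Queue = Queue()

def enqueue(payload: str) -> None:
    request_queue.put((time.monotonic(), payload))  # stamp at enqueue time

def worker() -> None:
    while True:
        enqueued_at, payload = request_queue.get()
        queue_wait = time.monotonic() - enqueued_at  # time spent waiting in line
        start = time.monotonic()
        # ... handle the request here ...
        processing = time.monotonic() - start
        # User-visible latency is wait + processing; measuring only
        # `processing` hides queueing delay exactly when the system is loaded.
        print(f"wait={queue_wait*1000:.2f}ms processing={processing*1000:.2f}ms")

Thread(target=worker, daemon=True).start()
enqueue("GET /playback")
time.sleep(0.1)  # let the demo worker drain the queue
```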
Do not rely too heavily on APMs. It is better to understand the low-level details and measure and fine-tune every aspect of your code.
To learn more about video encoding: look up documentation on GitHub and elsewhere online. Look for resources that explain how image compression is done, and how images are turned into video.
Most of the learning happens on the job. There isn't a lot of public information about problems at this kind of scale! Hopefully, this podcast was helpful in sharing more details!
Resources & Mentions
Where to find Ashutosh Agrawal:
• X: https://x.com/theprogrammerin
• LinkedIn: https://www.linkedin.com/in/theprogrammerin/
• Medium: https://medium.com/@theprogrammerin
Mentions during the episode:
• Disney+ Hotstar: https://www.hotstar.com/in
• What is a CDN: https://aws.amazon.com/what-is/cdn/
• Adaptive bitrate streaming: https://en.wikipedia.org/wiki/Adaptive_bitrate_streaming
• Skype: https://www.skype.com/en/
• Millions Scale Simulations: https://blog.hotstar.com/millons-scale-simulations-1602befe1ce5
• Black Friday: https://en.wikipedia.org/wiki/Black_Friday_(shopping)
• Asia-Pacific (APAC): https://en.wikipedia.org/wiki/Asia%E2%80%93Pacific
• Distributed architecture concepts I learned while building a large payments system: https://blog.pragmaticengineer.com/distributed-architecture-concepts-i-have-learned-while-building-payments-systems/
• Concurrency: https://web.mit.edu/6.005/www/fa14/classes/17-concurrency/
• Video streaming resources on Github: https://github.com/leandromoreira/digital_video_introduction
• Murphy’s Law: https://en.wikipedia.org/wiki/Murphy%27s_Law_(disambiguation)
• Java: https://www.java.com/
• Ruby: https://www.ruby-lang.org/en/
• Ruby on Rails: https://rubyonrails.org/
• Hacker News: https://news.ycombinator.com/
—
Production and marketing by Pen Name. For inquiries about sponsoring the podcast, email podcast@pragmaticengineer.com.