Blog 2 — Uber: Scaling Kafka to 1.1 Trillion Messages/Day

Qubits of DPK

March 21, 2026

Core Case Studies
Core Concept: Event-driven architecture, Message queues at scale, Partitioning strategy
Why SDE-2 Critical: Messaging/event systems appear in every backend architecture — Zomato, Swiggy, Amazon all use Kafka
Status: Draft notes ready

Quick Revision

  • Problem: How to move trillions of events without tightly coupling services.
  • Core pattern: Partitioned Kafka topics plus consumer groups.
  • Interview one-liner: Kafka turns synchronous service chains into durable async pipelines.

Architecture Overview

Producer (App Server)
        ↓
Kafka Broker Cluster
  ├── Topic: order-events     (partitioned)
  ├── Topic: payment-events   (partitioned)
  └── Topic: driver-events    (partitioned)
        ↓
Consumer Groups
  ├── Notification Service
  ├── Analytics Service
  └── Billing Service

Core Concepts

What is Kafka?

  • Distributed message queue / event streaming platform
  • Producers write events → Kafka stores them → Consumers read at their own pace
  • Events are persisted on disk and remain available after consumption, unlike traditional queues that delete messages once they are read

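The bullets above boil down to one data structure: an append-only log with non-destructive reads. A minimal sketch in Python (illustrative model only, not the real Kafka client API; the `Topic` class and its methods are invented for this example):

```python
class Topic:
    """Illustrative model of a Kafka topic as an append-only log."""

    def __init__(self):
        self.log = []  # events persist in order; reads never remove them

    def produce(self, event):
        self.log.append(event)
        return len(self.log) - 1  # offset assigned to the new event

    def consume(self, offset):
        # Non-destructive read: any consumer can re-read any offset.
        return self.log[offset]


topic = Topic()
topic.produce({"type": "trip_started", "trip_id": 1})
topic.produce({"type": "trip_ended", "trip_id": 1})

first = topic.consume(0)
again = topic.consume(0)  # same event is still there after being read
assert first == again
```

This is the key contrast with a traditional queue, where a read would have popped the message off the queue for good.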
Why Uber Needed Kafka

  • 1.1 trillion messages per day = ~12.7 million messages per second
  • Services are decoupled — payment service doesn't directly call notification service
  • Handles traffic spikes gracefully — consumers process at their own pace

Partitioning Strategy

Topic: trip-events
  ├── Partition 0 → all events for city_id: Mumbai
  ├── Partition 1 → all events for city_id: Delhi
  └── Partition 2 → all events for city_id: Bangalore
  • Same city's events always go to same partition → ordering guaranteed per city
  • Multiple partitions → parallelism across consumer instances
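The routing rule can be sketched as hashing the partition key (here `city_id`) modulo the partition count. Kafka's default partitioner actually uses murmur2; the stable `crc32` hash below is a stand-in purely for illustration:

```python
import zlib

NUM_PARTITIONS = 3


def partition_for(key: str) -> int:
    # crc32 is deterministic across runs, unlike Python's built-in hash()
    return zlib.crc32(key.encode()) % NUM_PARTITIONS


# Every event keyed by the same city lands on the same partition,
# which is exactly what preserves per-city ordering.
assert partition_for("Mumbai") == partition_for("Mumbai")
```

Because the mapping is deterministic, no coordination is needed: any producer instance computes the same partition for the same key.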

Consumer Groups

Notification Consumer Group:
  Instance 1 → reads Partition 0
  Instance 2 → reads Partition 1
  Instance 3 → reads Partition 2

Analytics Consumer Group:
  Instance 1 → reads ALL partitions independently
  • Multiple services can consume the SAME event independently
  • Each group maintains its own offset (position)
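The "own offset per group" idea can be sketched as a dictionary of committed positions over one shared log (a toy model, not the Kafka consumer API; group names and the `poll` helper are invented for this example):

```python
partition = ["e0", "e1", "e2"]  # one partition's log, shared by all groups
offsets = {"notification": 0, "analytics": 0}  # committed offset per group


def poll(group):
    """Return the next event for this group, or None if caught up."""
    pos = offsets[group]
    if pos >= len(partition):
        return None
    event = partition[pos]
    offsets[group] = pos + 1  # commit the offset after processing
    return event


poll("notification")  # notification group reads "e0"
poll("notification")  # then "e1"

# analytics is unaffected by notification's progress: it starts at its own offset
assert poll("analytics") == "e0"
```

A slow analytics pipeline therefore never blocks notifications; its offset simply lags further behind.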

Scale Achieved

5 Interview Questions This Blog Unlocks

Q1. Design a real-time notification system for Uber

Answer: Producer (trip service) → Kafka topic (trip-events) → Consumer (notification service) → Push notification. Kafka decouples them — if notification service is slow, trip service is unaffected.

Q2. Why use Kafka instead of direct API calls between services?

Answer: Direct calls = tight coupling. If notification service is down, trip service fails too. Kafka = async buffer. Trip service writes event and moves on. Notification service processes when ready. Resilience + decoupling.

Q3. How does Kafka guarantee message ordering?

Answer: Ordering is guaranteed within a partition only. If you need all events for a user in order → use userId as partition key → all their events go to same partition → ordered.

Q4. What is consumer lag and why does it matter?

Answer: Consumer lag = difference between latest offset in partition and consumer's current offset. High lag = consumer is falling behind. At Uber scale, monitoring lag is critical — it signals a slow consumer before it becomes an outage.
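Lag is plain arithmetic on offsets, which is why it is cheap to monitor continuously. A sketch (the numbers are made up for illustration):

```python
def consumer_lag(latest_offset: int, committed_offset: int) -> int:
    """How many events the consumer still has to process in this partition."""
    return latest_offset - committed_offset


lag = consumer_lag(latest_offset=1_000_000, committed_offset=999_200)
# A lag of 800 is fine if it is stable; a lag that grows every minute
# means the consumer is permanently slower than the producer.
assert lag == 800
```

In practice the signal to alert on is the trend, not the absolute number: a steadily growing lag is the early warning the answer above describes.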

Q5. How does Kafka handle a broker failure?

Answer: Each partition has a leader and replicas on other brokers. If the leader fails, one replica is automatically elected as the new leader, and producers and consumers reconnect to it. A replication factor of 3 tolerates 2 broker failures.
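The failover logic can be sketched as promoting a surviving replica when the leader's broker disappears. This is a deliberately simplified model (real Kafka elects the new leader from the in-sync replica set via the controller; broker names and the `fail_broker` helper are invented here):

```python
# One partition with replication factor 3: one leader, two followers.
replicas = {"broker-1": "leader", "broker-2": "follower", "broker-3": "follower"}


def fail_broker(broker):
    """Remove a broker; if it led the partition, promote a surviving replica."""
    was_leader = replicas.pop(broker) == "leader"
    if was_leader and replicas:
        new_leader = next(iter(replicas))
        replicas[new_leader] = "leader"


fail_broker("broker-1")
# A surviving replica took over, so producers and consumers can continue.
assert "leader" in replicas.values()
```

The important property is that the data survives the failure: the promoted replica already holds a copy of the partition's log.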

Key Engineering Lessons