13 December 2025

How Trading Apps Handle 1 Million+ Concurrent WebSocket Connections

(Inspired by Zerodha, Upstox, Groww, Robinhood, TD Ameritrade)

Introduction

Modern trading platforms like Zerodha, Upstox, Groww, and Robinhood all face one major technical challenge:

How do you stream real-time prices, order book depth, and trade ticks to millions of users simultaneously — with sub-100ms latency and zero downtime?

This is one of the toughest engineering problems in FinTech because market data arrives fast, unpredictably, and at massive scale.

This blog breaks down how real trading apps handle 1M+ concurrent WebSocket connections while staying real-time, consistent, and cost-efficient.


Understanding the Scale

A typical day in Indian markets means millions of ticks per second flowing in from the exchanges, fanned out to more than a million concurrently connected clients.

This is real high-throughput + low-latency engineering.


High-Level Architecture Overview

A trading app’s real-time architecture looks like this:

[Figure: Basic trading architecture]

Let’s break it down.


1. Market Data Ingestion Layer

Trading apps receive raw market feeds from the exchanges themselves; in India, that primarily means NSE and BSE.

These feeds arrive in proprietary binary formats over leased lines.

To process them, firms run dedicated feed handlers, typically written in performance-focused languages such as C++ or Go. The latency budget here is around 5 ms.
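To make the decoding step concrete, here is a minimal Go sketch of a feed handler parsing one fixed-width, big-endian tick packet. The field layout is invented for illustration; real exchange feed formats are proprietary and differ from this.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Tick is a normalized view of one market update. The wire layout
// below is hypothetical -- real exchange feeds define their own.
type Tick struct {
	Token    uint32 // instrument identifier
	LastPx   int32  // price in paise, so no floats on the wire
	Qty      uint32 // last traded quantity
	ExchTime uint32 // exchange timestamp (seconds)
}

// decodeTick parses one fixed-width, big-endian packet into a Tick.
func decodeTick(pkt []byte) (Tick, error) {
	var t Tick
	err := binary.Read(bytes.NewReader(pkt), binary.BigEndian, &t)
	return t, err
}

func main() {
	// A fabricated 16-byte packet, just to exercise the decoder.
	pkt := []byte{
		0x00, 0x00, 0x01, 0x00, // token = 256
		0x00, 0x02, 0x5B, 0x64, // price = 154468 paise
		0x00, 0x00, 0x00, 0x32, // qty = 50
		0x65, 0x00, 0x00, 0x00, // timestamp
	}
	t, err := decodeTick(pkt)
	if err != nil {
		panic(err)
	}
	fmt.Printf("token=%d ltp=%.2f qty=%d\n", t.Token, float64(t.LastPx)/100, t.Qty)
}
```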


2. Tick Normalizer + Broadcaster

Since exchanges send raw binary messages, brokers must decode them, normalize them into a common internal tick format, and broadcast them to downstream consumers.

The broadcast side typically rides on the same pub/sub tools listed in the stack table below: Kafka, Redis, NATS, or Pulsar.

Throughput: millions of ticks per second
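Here is a minimal in-process sketch of that broadcast step in Go. Real systems publish to Kafka/Redis/NATS rather than Go channels, but the fan-out shape is the same; all names are illustrative.

```go
// Package feed sketches a per-symbol tick broadcaster.
package feed

import "sync"

// Tick is a normalized market update.
type Tick struct {
	Symbol string
	Price  float64
}

// Broadcaster fans normalized ticks out to per-symbol subscribers.
type Broadcaster struct {
	mu   sync.RWMutex
	subs map[string][]chan Tick // symbol -> subscriber channels
}

func NewBroadcaster() *Broadcaster {
	return &Broadcaster{subs: make(map[string][]chan Tick)}
}

// Subscribe registers interest in one symbol; the small buffer
// absorbs short bursts without blocking the publisher.
func (b *Broadcaster) Subscribe(symbol string) <-chan Tick {
	ch := make(chan Tick, 16)
	b.mu.Lock()
	b.subs[symbol] = append(b.subs[symbol], ch)
	b.mu.Unlock()
	return ch
}

// Publish delivers a tick to every subscriber of its symbol. A full
// (slow) subscriber is skipped rather than allowed to stall the feed.
func (b *Broadcaster) Publish(t Tick) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	for _, ch := range b.subs[t.Symbol] {
		select {
		case ch <- t:
		default: // slow consumer: drop, never block the hot path
		}
	}
}
```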


3. WebSocket Delivery Architecture

This is the core part.

Why WebSockets?

One persistent, full-duplex connection per client: far cheaper than repeated REST polling, and lighter to operate at this scale than running an MQTT broker fleet.
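A minimal Go endpoint makes the model concrete. This sketch uses the widely adopted gorilla/websocket library; the library choice and the hard-coded payload are assumptions for illustration, not any specific broker's stack.

```go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/gorilla/websocket"
)

var upgrader = websocket.Upgrader{
	// Allow all origins in this sketch; tighten in production.
	CheckOrigin: func(r *http.Request) bool { return true },
}

func ticksHandler(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil) // HTTP -> WebSocket
	if err != nil {
		return
	}
	defer conn.Close()

	// Push a fake price every 100ms over the persistent connection.
	// A real server would pull these from the pub/sub layer instead.
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()
	for range ticker.C {
		msg := []byte(`{"symbol":"NIFTY","ltp":24810.5}`)
		if err := conn.WriteMessage(websocket.TextMessage, msg); err != nil {
			return // client went away
		}
	}
}

func main() {
	http.HandleFunc("/ticks", ticksHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```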


4. WebSocket Cluster (Horizontal Scaling)

To handle 1M+ concurrent connections, no single machine is enough, so brokers deploy large, horizontally scaled clusters.

Typical technology stack:

Layer             Tools
Load Balancer     NGINX, Envoy, HAProxy, AWS ALB/NLB
WebSocket Server  Go, Node.js, Elixir/Phoenix, Java Netty, C++
Pub/Sub           Redis, Kafka, NATS, Pulsar
Routing           Consistent Hashing, Service Mesh

Companies like Zerodha use Go and Redis, while Upstox and Robinhood heavily use Kafka.
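How those pieces are wired internally is not public, but a common pattern is each WebSocket node subscribing to per-symbol channels on Redis pub/sub and fanning messages out to its local connections. A sketch using the go-redis client (channel names are invented):

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// One subscription per symbol (or per shard) on this node;
	// local WebSocket clients hang off these streams.
	sub := rdb.Subscribe(ctx, "ticks.NIFTY", "ticks.RELIANCE")
	defer sub.Close()

	for msg := range sub.Channel() {
		// msg.Channel identifies the symbol stream; msg.Payload is the tick.
		fmt.Println(msg.Channel, msg.Payload)
	}
}
```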


5. Connection Distribution Strategy

To avoid overloading a single node:

Technique 1: Consistent Hashing

Users subscribe to symbols (e.g., NIFTY, RELIANCE), and each symbol's stream is always served from the same few nodes, so per-symbol fan-out state stays local. A sketch of the idea follows.
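A minimal consistent-hash ring in Go, using FNV hashing and virtual nodes. The structure is illustrative, not any broker's implementation; the payoff is that adding or removing a node remaps only a small fraction of symbols.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// Ring assigns symbols to nodes via consistent hashing.
type Ring struct {
	points []uint32          // sorted virtual-node positions on the ring
	owner  map[uint32]string // position -> node name
}

func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// NewRing places each node at vnodes positions for smoother balance.
func NewRing(nodes []string, vnodes int) *Ring {
	r := &Ring{owner: make(map[uint32]string)}
	for _, n := range nodes {
		for i := 0; i < vnodes; i++ {
			p := hash32(fmt.Sprintf("%s#%d", n, i))
			r.points = append(r.points, p)
			r.owner[p] = n
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// NodeFor returns the first node clockwise from the symbol's hash.
func (r *Ring) NodeFor(symbol string) string {
	h := hash32(symbol)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.owner[r.points[i]]
}

func main() {
	ring := NewRing([]string{"node-a", "node-b", "node-c"}, 64)
	for _, sym := range []string{"NIFTY", "RELIANCE", "TCS"} {
		fmt.Println(sym, "->", ring.NodeFor(sym))
	}
}
```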

Technique 2: Sharded Streams

Symbols are partitioned into shards (the same hashing idea, applied at shard granularity):

Shard 1 → Node A
Shard 2 → Node B
Shard 3 → Node C

Technique 3: Sticky Sessions

The load balancer keeps a given client pinned to the same WebSocket server, typically via IP hash or a session cookie.


6. Backpressure & Throttling

If a user is on a slow network, you cannot keep pushing 50 ticks/second at them.

Trading systems use:

Drop old messages (send only the latest)

For a price display, only the latest tick matters; a stale intermediate tick has no value.

Throttle updates

Send only 5–10 messages/second.

Avoid queue buildup per-client

If a client's queue keeps growing, disconnect that client (standard industry practice).
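A Go sketch combining all three rules: a one-slot buffer that conflates to the newest tick, a ticker that caps delivery at ~10 messages/second, and an error path that drops clients who cannot keep up. Names and wiring are illustrative.

```go
// Package stream sketches per-client backpressure handling.
package stream

import "time"

// Tick is the latest market update for one symbol.
type Tick struct {
	Symbol string
	Price  float64
}

// Writer owns the outbound side of one client connection. One
// producer (the fan-out loop) and one consumer (Run) are assumed.
type Writer struct {
	latest chan Tick        // capacity 1: only the newest undelivered tick
	send   func(Tick) error // wraps e.g. a WebSocket write with a deadline
}

func NewWriter(send func(Tick) error) *Writer {
	return &Writer{latest: make(chan Tick, 1), send: send}
}

// Offer conflates: an undelivered tick is discarded so the client
// always gets the most recent price, never a backlog.
func (w *Writer) Offer(t Tick) {
	for {
		select {
		case w.latest <- t:
			return
		default:
			select {
			case <-w.latest: // drop the stale tick
			default:
			}
		}
	}
}

// Run throttles delivery to at most ~10 messages/second. The send
// function should enforce a write deadline; any error (including a
// deadline hit on a too-slow client) ends the loop, and the caller
// closes the connection -- the "disconnect, don't buffer" rule.
func (w *Writer) Run() error {
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()
	for range ticker.C {
		select {
		case t := <-w.latest:
			if err := w.send(t); err != nil {
				return err
			}
		default: // nothing new in this 100ms slot
		}
	}
	return nil
}
```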


7. In-Memory Caching for Ticks

To achieve sub-10ms fan-out latency, brokers keep hot market state in in-memory stores (in-process maps or Redis).

Per symbol, they maintain the latest traded price, best bid/ask, and recent depth, all updated in real time from the message streams.
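A minimal in-process version in Go; the snapshot lets a newly connected client see current prices immediately, before its first live tick arrives. Real systems shard this map or back it with Redis; field names are illustrative.

```go
// Package cache sketches the hot per-symbol market-state store.
package cache

import "sync"

// Quote is the latest known state for one symbol.
type Quote struct {
	LastPrice float64
	BestBid   float64
	BestAsk   float64
}

// TickCache is a lock-protected snapshot of current market state.
type TickCache struct {
	mu     sync.RWMutex
	quotes map[string]Quote
}

func New() *TickCache {
	return &TickCache{quotes: make(map[string]Quote)}
}

// Update is called by the tick-stream consumer on every message.
func (c *TickCache) Update(symbol string, q Quote) {
	c.mu.Lock()
	c.quotes[symbol] = q
	c.mu.Unlock()
}

// Snapshot is called when a client first subscribes to a symbol.
func (c *TickCache) Snapshot(symbol string) (Quote, bool) {
	c.mu.RLock()
	q, ok := c.quotes[symbol]
	c.mu.RUnlock()
	return q, ok
}
```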


8. High Availability & Fault Tolerance

Trading platforms use:

Multi-zone WebSocket clusters

One AZ failing should not drop 500,000 users.

Hot standby nodes

Standby nodes join the cluster automatically when load spikes or an active node fails.

Message replay using Kafka

If a WebSocket node crashes, its replacement reconnects to Kafka and resumes streaming from the last committed offset (sketched after this list).

Health-based routing

The load balancer health-checks every node and pulls faulty ones out of rotation.
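A sketch of the resume-from-Kafka pattern using the segmentio/kafka-go client (the library choice, broker addresses, and topic names are assumptions). Rejoining with the same consumer GroupID continues from the last committed offset, so nothing in the retention window is lost.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	// A stable GroupID means a restarted (or replacement) node
	// resumes exactly where the crashed one left off.
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"kafka-1:9092", "kafka-2:9092"},
		GroupID: "ws-node-7",
		Topic:   "ticks",
	})
	defer r.Close()

	for {
		m, err := r.ReadMessage(context.Background())
		if err != nil {
			log.Fatal(err)
		}
		// Fan this message out to local WebSocket clients here.
		fmt.Printf("partition=%d offset=%d %s\n", m.Partition, m.Offset, m.Value)
	}
}
```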


9. Handling Market Open Explosion (The 9:15 AM Spike)

At 9:15 AM, when Indian markets open, traffic jumps 20x within about 5 seconds.

Common techniques: pre-scaling the cluster ahead of the open, and admission control on new connections so servers shed load gracefully rather than collapsing. A sketch of admission control follows.
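A minimal admission-control sketch in Go using golang.org/x/time/rate; the limits are made-up numbers. Surplus connection attempts get a 503 with Retry-After, so clients back off instead of hammering the cluster while it scales.

```go
// Package edge sketches admission control for the market-open spike.
package edge

import (
	"net/http"

	"golang.org/x/time/rate"
)

// Admit ~1,000 new connections/second with a burst of 2,000; beyond
// that, protect the connections we already hold.
var newConns = rate.NewLimiter(1000, 2000)

// WithAdmission wraps the WebSocket upgrade handler,
// e.g. http.HandleFunc("/ticks", edge.WithAdmission(ticksHandler)).
func WithAdmission(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if !newConns.Allow() {
			w.Header().Set("Retry-After", "1") // hint: back off briefly
			http.Error(w, "server busy", http.StatusServiceUnavailable)
			return
		}
		next(w, r) // proceed with the upgrade
	}
}
```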


10. Client-Side Architecture (Mobile/Web)

On mobile and web alike, clients typically hold a single multiplexed WebSocket, subscribe only to the symbols currently on screen, resubscribe automatically after reconnecting (with exponential backoff), and throttle UI redraws so a burst of ticks does not freeze rendering.


11. Cost Optimization at Scale

Running 200 WebSocket servers costs a lot.

Brokers reduce cost by squeezing more connections out of each node (efficient runtimes like Go) and keeping payloads compact. Zerodha and Upstox famously use co-located servers near the exchanges for both speed and cost.


Conclusion

Handling 1 million+ concurrent WebSocket connections is one of the hardest FinTech engineering problems.

It requires getting every layer right: fast binary ingestion, lean pub/sub fan-out, aggressive per-client backpressure, horizontal scaling with smart routing, and disciplined cost control.

This is why only a handful of trading apps in India (Zerodha, Upstox, Groww) have mastered this at scale.

tags: Engineering - Backend - Concurrency - Trading

Made with 💻 in India