My Journey into Scalable WebRTC with Mediasoup

A few days ago, a friend showed me a project — a simple Zoom clone built with WebRTC.
It was impressive at first glance: two users connecting face-to-face in real time.

But when I looked deeper, I realized one thing: it wouldn’t scale.

That sparked an idea. What if I could take the same concept — but rebuild it the way real, production-grade video apps like Zoom or Google Meet actually work?
That’s how Project Hyperion was born: a learning experiment in building scalable real-time systems.

💡 Why Scaling WebRTC Is Hard

WebRTC (the technology behind most browser-based video calls) can connect users directly to each other, an approach known as peer-to-peer (P2P).
This works fine for two or three people, but with more participants it quickly collapses.

Imagine 10 people in a call.
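Each must upload their video to the 9 others, which adds up to 45 peer connections and 90 one-way video streams in total.
Every client sends 9 copies of its own video while downloading 9 streams. Bandwidth gets crushed, CPUs melt, and everyone lags.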

The number of streams grows as N × (N − 1), roughly N², which is why this is often called the N² problem, and why “simple” video apps don’t scale.
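To sanity-check those numbers, here is a small TypeScript helper (the names are my own, not from the Hyperion repo) that counts the cost of a full mesh for N participants, assuming every participant sends one video stream to every other participant.

```typescript
// Cost of a full-mesh (P2P) call with n participants.
// Names are illustrative and not taken from the Hyperion codebase.
interface MeshStats {
  peerConnections: number; // one connection per pair of users
  uploadsPerClient: number; // copies of your own video you must send
  totalStreams: number; // one-way media streams across the whole call
}

function meshStats(n: number): MeshStats {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  return {
    peerConnections: (n * (n - 1)) / 2,
    uploadsPerClient: n - 1,
    totalStreams: n * (n - 1),
  };
}

for (const n of [2, 3, 10, 25]) {
  const s = meshStats(n);
  console.log(
    `${n} users: ${s.peerConnections} connections, ` +
      `${s.uploadsPerClient} uploads each, ${s.totalStreams} streams`,
  );
}
```

For 10 users it prints 45 connections, 9 uploads each, and 90 streams; at 25 users the call already needs 600 streams.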

📊 Visualizing the Problem

Here’s how the three major real-time architectures compare:

Figure: P2P vs MCU vs SFU comparison. P2P (mesh) scales poorly, an MCU (which decodes and mixes every stream on the server) is expensive to run, and an SFU offers the best balance between scalability and cost.

That’s why I adopted an SFU (Selective Forwarding Unit) architecture.
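Instead of everyone sending video to everyone else, each user uploads one stream to a central server, and the SFU forwards it, without decoding or re-encoding, to everyone else in the room.
Each client now uploads once instead of N − 1 times, and the server only relays packets, which keeps it cheap to run.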
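To make “selective forwarding” concrete, here is a toy, in-memory model of an SFU room. It is not mediasoup's API: `ToySfuRoom`, `join`, and `publish` are hypothetical names, and a “packet” is just a byte array standing in for an RTP packet. The point it illustrates is that the sender uploads one copy and the server relays it unchanged to everyone else.

```typescript
// Toy model of a Selective Forwarding Unit (SFU) room.
// Hypothetical names; this is NOT the mediasoup API. A real SFU forwards
// RTP packets over network transports; here delivery is just a callback.
type Packet = { from: string; seq: number; payload: Uint8Array };

class ToySfuRoom {
  // peerId -> function that "delivers" a packet to that peer
  private peers = new Map<string, (p: Packet) => void>();

  join(peerId: string, deliver: (p: Packet) => void): void {
    this.peers.set(peerId, deliver);
  }

  leave(peerId: string): void {
    this.peers.delete(peerId);
  }

  // The sender uploads ONE copy; the SFU relays it unchanged to every other
  // peer (no decoding, mixing, or re-encoding). Returns the fan-out count.
  publish(packet: Packet): number {
    if (!this.peers.has(packet.from)) {
      throw new Error(`unknown sender ${packet.from}`);
    }
    let forwarded = 0;
    this.peers.forEach((deliver, peerId) => {
      if (peerId === packet.from) return; // never echo back to the sender
      deliver(packet);
      forwarded++;
    });
    return forwarded;
  }
}

// Usage: 10 participants; user0 sends a single packet.
const room = new ToySfuRoom();
const received = new Map<string, number>();
for (let i = 0; i < 10; i++) {
  const id = `user${i}`;
  received.set(id, 0);
  room.join(id, () => received.set(id, (received.get(id) ?? 0) + 1));
}
const fanOut = room.publish({ from: "user0", seq: 1, payload: new Uint8Array([1, 2, 3]) });
console.log(`user0 uploaded 1 packet; the SFU forwarded it ${fanOut} times`);
```

With 10 participants, user0 uploads once and the SFU forwards 9 copies. The trade-off is visible too: each client still downloads N − 1 streams, so the SFU removes the upload bottleneck rather than all of the cost.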

⚙️ The Architecture

To make this system both scalable and fault-tolerant, I used a mix of distributed tools:

Mediasoup for handling all audio/video streams
Kafka to coordinate multiple signaling servers
Redis for fast in-memory state management
Docker Compose for managing the entire environment
Nginx as a load balancer
Electron for the client app

Here’s the architecture visualization:

Figure: Hyperion system architecture. A horizontally scalable system where multiple Node.js servers handle signaling, backed by Redis and Kafka.
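The reason for Kafka in this picture is that, behind Nginx, two people in the same room can end up connected to different Node.js servers. The sketch below shows that fan-out pattern with a small in-memory class (`FakeTopic`) standing in for a Kafka topic so it runs without a broker; `SignalingServer`, the event shape, and the room and peer names are all hypothetical. In a real deployment, `publish` and `subscribe` would go through a Kafka client instead.

```typescript
// In-memory stand-in for a Kafka topic. With Kafka, giving each signaling
// server its own consumer group would let every server see every event.
// All names here are hypothetical.
type RoomEvent = {
  roomId: string;
  type: "peer-joined" | "peer-left";
  peerId: string;
  origin: string; // id of the server that published the event
};

class FakeTopic {
  private subscribers: Array<(e: RoomEvent) => void> = [];

  subscribe(handler: (e: RoomEvent) => void): void {
    this.subscribers.push(handler);
  }

  publish(event: RoomEvent): void {
    for (const handler of this.subscribers) handler(event);
  }
}

class SignalingServer {
  readonly id: string;
  readonly delivered: string[] = []; // notifications sent to local sockets
  private topic: FakeTopic;
  // roomId -> peers whose sockets are connected to THIS server
  private localPeers = new Map<string, Set<string>>();

  constructor(id: string, topic: FakeTopic) {
    this.id = id;
    this.topic = topic;
    topic.subscribe((e) => this.onRoomEvent(e));
  }

  join(roomId: string, peerId: string): void {
    const peers = this.localPeers.get(roomId) ?? new Set<string>();
    peers.add(peerId);
    this.localPeers.set(roomId, peers);
    this.topic.publish({ roomId, type: "peer-joined", peerId, origin: this.id });
  }

  // Every server hears every event but only notifies its own sockets.
  private onRoomEvent(e: RoomEvent): void {
    const peers = this.localPeers.get(e.roomId);
    if (!peers) return;
    peers.forEach((p) => {
      if (p !== e.peerId) {
        this.delivered.push(`${this.id} -> ${p}: ${e.peerId} ${e.type} (via ${e.origin})`);
      }
    });
  }
}

const topic = new FakeTopic();
const a = new SignalingServer("server-a", topic);
const b = new SignalingServer("server-b", topic);
a.join("room-1", "alice"); // Alice's socket is on server A
b.join("room-1", "bob"); // Bob's socket is on server B
console.log(a.delivered); // [ 'server-a -> alice: bob peer-joined (via server-b)' ]
console.log(b.delivered); // []
```

Server A notifies Alice that Bob joined, even though Bob's socket lives on server B. Bob, in turn, would need to learn who is already in the room from shared state (presumably the job Redis does here), since the topic only carries new events.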

🧠 What I Learned

Scaling isn’t just about adding more servers — it’s about decoupling systems.
Kafka lets multiple backend servers “talk” to each other with low latency.
Redis keeps real-time state synced between them.
Mediasoup is powerful, but understanding transports, producers, and consumers is key.
Docker made running everything in isolated containers effortless.
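A quick way to see why transports, producers, and consumers deserve attention is to count them. The helper below assumes a common mediasoup layout (one send transport and one receive transport per peer, one producer per audio or video track, and one consumer per track of every other peer); the function and its defaults are illustrative, not taken from the project.

```typescript
// Back-of-the-envelope count of mediasoup-style server-side objects for one
// room, ASSUMING: one send + one receive transport per peer, `tracksPerPeer`
// producers per peer (default: audio + video), and every producer consumed
// by every peer except its owner.
interface RoomObjectCounts {
  transports: number;
  producers: number;
  consumers: number;
}

function countRoomObjects(peers: number, tracksPerPeer = 2): RoomObjectCounts {
  const producers = peers * tracksPerPeer;
  return {
    transports: peers * 2, // one send + one receive per peer
    producers,
    consumers: producers * (peers - 1), // everyone except the owner consumes it
  };
}

console.log(countRoomObjects(10)); // { transports: 20, producers: 20, consumers: 180 }
```

Ten people with audio and video already need 180 consumers on the server, a number that still grows as N × (N − 1).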

🧩 The Stack

Purpose	Technology
Client	Electron (Desktop)
Real-Time Media	WebRTC + Mediasoup
Signaling	Node.js + Socket.IO
Message Broker	Apache Kafka
Cache / State	Redis
Load Balancer	Nginx
Environment	Docker Compose

🔮 What’s Next

Next up, I plan to test multi-instance scaling — running multiple SFUs and distributing users across them automatically.
After that: adding chat, screen sharing, and perhaps even real-time analytics.
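For distributing users across several SFUs, one simple policy (my assumption here, not necessarily what Hyperion will use) is to keep every participant of a room on the same SFU instance, so media never has to cross servers, and to place each new room on the instance with the most spare capacity. `SfuBalancer`, the instance ids, and the participant-count load metric are hypothetical.

```typescript
// Hypothetical room-to-SFU placement: rooms are sticky to one instance,
// and a new room goes to the instance with the most spare capacity.
interface SfuInstance {
  id: string;
  participants: number; // current load (assumed metric)
  capacity: number;
}

class SfuBalancer {
  private roomToSfu = new Map<string, SfuInstance>();
  private instances: SfuInstance[];

  constructor(instances: SfuInstance[]) {
    this.instances = instances;
  }

  // Returns the id of the SFU a joining participant should connect to.
  assign(roomId: string): string {
    let sfu = this.roomToSfu.get(roomId);
    if (sfu === undefined) {
      sfu = this.pickLeastLoaded();
      this.roomToSfu.set(roomId, sfu);
    }
    if (sfu.participants >= sfu.capacity) {
      throw new Error(`${sfu.id} is full; room ${roomId} cannot grow`);
    }
    sfu.participants++;
    return sfu.id;
  }

  private pickLeastLoaded(): SfuInstance {
    let best: SfuInstance | null = null;
    for (const s of this.instances) {
      const spare = s.capacity - s.participants;
      // Ties keep the earlier instance in the list.
      if (spare > 0 && (best === null || spare > best.capacity - best.participants)) {
        best = s;
      }
    }
    if (best === null) throw new Error("all SFU instances are full");
    return best;
  }
}

const balancer = new SfuBalancer([
  { id: "sfu-1", participants: 0, capacity: 3 },
  { id: "sfu-2", participants: 0, capacity: 3 },
]);
console.log(balancer.assign("room-a")); // sfu-1 (tie goes to the first instance)
console.log(balancer.assign("room-a")); // sfu-1 (rooms are sticky)
console.log(balancer.assign("room-b")); // sfu-2 (more spare capacity)
```

Sticky rooms keep the sketch simple; a room too large for one instance would need its streams piped between SFUs (mediasoup offers `router.pipeToRouter()` for this), which this sketch does not attempt.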

This isn’t about cloning Zoom.
It’s about understanding how Zoom could be built — from the ground up.

🛰️ GitHub Repository:
https://github.com/NeurologiaLogic/scalable-zoom-clone