My Journey into Scalable WebRTC with Mediasoup

A few days ago, a friend showed me a project — a simple Zoom clone built with WebRTC.
It was impressive at first glance: two users connecting face-to-face in real time.

But when I looked deeper, I realized one thing: it wouldn’t scale.

That sparked an idea. What if I could take the same concept — but rebuild it the way real, production-grade video apps like Zoom or Google Meet actually work?
That’s how Project Hyperion was born: a learning experiment in building scalable real-time systems.


💡 Why Scaling WebRTC Is Hard

WebRTC (the technology behind most video calls) connects users directly — known as peer-to-peer (P2P).
This works fine for two or three people. But with more participants, it quickly collapses.

Imagine 10 people in a call.
Each must upload their video to the other 9, producing 90 one-way streams across 45 peer connections — and the count grows quadratically from there.
Bandwidth gets crushed. CPUs melt. Everyone lags.

This is known as the N² problem — and it’s why “simple” video apps don’t scale.
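To make that growth concrete, here's a tiny sketch in plain Node.js. The helper names (`meshStreams`, `meshConnections`) are my own for illustration, not part of any library:

```javascript
// One-way media streams in a full-mesh (P2P) call:
// each of the n participants uploads to the other n - 1.
function meshStreams(n) {
  return n * (n - 1); // grows as O(n^2)
}

// Two-way peer connections (each pair of users shares one).
function meshConnections(n) {
  return (n * (n - 1)) / 2;
}

for (const n of [2, 5, 10, 25]) {
  console.log(`${n} users -> ${meshStreams(n)} streams, ${meshConnections(n)} connections`);
}
// 10 users -> 90 streams, 45 connections
// 25 users -> 600 streams, 300 connections
```

At 25 participants a mesh is already pushing 600 simultaneous streams — far beyond what a typical uplink can handle.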


📊 Visualizing the Problem

Here’s how the three major real-time architectures compare:

Figure: P2P vs MCU vs SFU comparison — P2P (mesh) scales poorly, MCU is expensive to run, and SFU offers the best balance between scalability and cost.

That’s why I adopted an SFU (Selective Forwarding Unit) architecture.
Instead of everyone sending video to everyone else, each user uploads one stream to a central server — and the SFU redistributes it to others.
Efficient, simple, and scalable.
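The win shows up immediately in what each client has to upload. A rough comparison — again with hypothetical helper names, just to show the arithmetic:

```javascript
// Upload streams per client: in a mesh, everyone sends to everyone else;
// with an SFU, everyone sends exactly one stream to the server.
function meshUploadsPerClient(n) {
  return n - 1;
}

function sfuUploadsPerClient(_n) {
  return 1; // constant, regardless of room size
}

// The SFU pays for this with server egress: it forwards each of the
// n incoming streams to the other n - 1 participants.
function sfuServerEgress(n) {
  return n * (n - 1);
}

console.log(meshUploadsPerClient(10)); // 9
console.log(sfuUploadsPerClient(10));  // 1
```

The quadratic cost doesn't disappear — it moves to the server's egress, where bandwidth is cheap, provisioned, and horizontally scalable.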


⚙️ The Architecture

To make this system both scalable and fault-tolerant, I used a mix of distributed tools:

  • Mediasoup for handling all audio/video streams
  • Kafka to coordinate multiple signaling servers
  • Redis for fast in-memory state management
  • Docker Compose for managing the entire environment
  • Nginx as a load balancer
  • Electron for the client app

Here’s the architecture visualization:

Figure: Hyperion system architecture — a horizontally scalable system where multiple Node.js servers handle signaling, backed by Redis and Kafka.


🧠 What I Learned

  • Scaling isn’t just about adding more servers — it’s about decoupling systems.
  • Kafka lets multiple backend servers “talk” to each other in near real time.
  • Redis keeps real-time state synced between them.
  • Mediasoup is powerful, but understanding transports, producers, and consumers is key.
  • Docker made running everything in isolated containers effortless.

🧩 The Stack

| Purpose | Technology |
| --- | --- |
| Client | Electron (Desktop) |
| Real-Time Media | WebRTC + Mediasoup |
| Signaling | Node.js + Socket.IO |
| Message Broker | Apache Kafka |
| Cache / State | Redis |
| Load Balancer | Nginx |
| Environment | Docker Compose |

🔮 What’s Next

Next up, I plan to test multi-instance scaling — running multiple SFUs and distributing users across them automatically.
After that: adding chat, screen sharing, and perhaps even real-time analytics.

This isn’t about cloning Zoom.
It’s about understanding how Zoom could be built — from the ground up.


🛰️ GitHub Repository:
https://github.com/NeurologiaLogic/scalable-zoom-clone

© 2025 Patrick Kwon. All rights reserved.