WebRTC signaling on the IC

Link to Caffeine App

WebRTC gives browsers a way for two peers to form a direct connection. It does not say how those peers find each other in the first place, how they exchange the setup material that connection requires, how rooms are represented, how access is controlled, or how failures are recorded. That layer — signaling — is left to the application, and most of the time it ends up as a WebSocket service with some in-memory state and a few database tables nearby.

A signaling service is rarely just a pipe. It knows who is waiting, who is allowed into a room, which peer is the host, whether a room is public, which messages have been delivered, and what should happen when someone leaves. In larger systems it accumulates matchmaking, presence, moderation, rate limits, capability checks, and operational logs. The media path may be peer-to-peer, but the coordination layer is a real piece of application infrastructure.

A canister is a useful place for rendezvous state

The basic flow is unsurprising. Peers register, create or join rooms, exchange setup messages, and then communicate directly. The canister participates in rendezvous and connection formation, then steps away.

What it leaves behind is more interesting than the introduction itself. Rooms are not transient objects inside a process; they are canister state, with the same lifecycle, persistence, and inspection properties as any other application data. Access checks can be based on principals or on state owned by other canisters. The frontend can be served from the same network. Calls are authenticated in the same environment as the rest of the application. Inspection, administration, metering, and auditability become ordinary canister interfaces rather than operational side channels bolted onto a separate service.

The hub is a control-plane role

It is tempting to describe peer-to-peer systems as if the ideal architecture has no hub at all. In practice, most useful peer-to-peer applications need some shared reference point. The interesting question is not whether a hub exists, but what kind of hub it is and what it is responsible for.

A canister fits the control plane well. It can maintain room membership, enforce admission rules, sequence signaling messages, record connection attempts, and decide how messages should be fanned out. More importantly, it can represent different coordination shapes as first-class types rather than as emergent behaviour. In this implementation, rooms declare their session type at creation — pairwise, broadcast, or one-way — and the canister enforces the addressing and fan-out rules that go with each.
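As a rough illustration of session types driving fan-out, here is a minimal TypeScript sketch. It is not the canister's code (which is Motoko and not shown in this post); `SessionType`, `Room`, and `recipients` are hypothetical names, and the one-way rule in particular (host messages fan out, everyone else can only reach the host) is an assumption about what "one-way" means.

```typescript
// Hypothetical model of per-session-type addressing; all names are illustrative.
type SessionType = "pairwise" | "broadcast" | "oneWay";

interface Room {
  sessionType: SessionType;
  host: string;      // peer id of the host
  members: string[]; // every peer id in the room, host included
}

// Returns the peers that should receive a signaling message sent by `sender`.
// `target` is only consulted for pairwise rooms.
function recipients(room: Room, sender: string, target?: string): string[] {
  // Peers outside the room cannot signal into it.
  if (!room.members.includes(sender)) return [];

  switch (room.sessionType) {
    case "pairwise":
      // Addressed delivery: exactly one other member, named by the sender.
      return target !== undefined && target !== sender && room.members.includes(target)
        ? [target]
        : [];
    case "broadcast":
      // Any member's message fans out to every other member.
      return room.members.filter((p) => p !== sender);
    case "oneWay":
      // Assumed rule: the host fans out to all, other members reach only the host.
      return sender === room.host
        ? room.members.filter((p) => p !== sender)
        : [room.host];
  }
}
```

With members `a`, `b`, `c` and host `a`, a broadcast room delivers a message from `b` to `a` and `c`, while a pairwise room delivers nothing unless the sender names a valid target. The point of the sketch is that the rule lives in one function keyed on a declared type, rather than emerging from whatever each client happens to send.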

This matters most for the cases that look least like a video call. Collaborative tools, multiplayer environments, watch parties, agent networks, local-first sync systems, and live coordination surfaces all need a way to form and reshape peer relationships over time. For those workloads, the canister can do something analogous to what an SFU (selective forwarding unit) does for media: receive one peer’s signaling message and selectively distribute it to the peers that should see it, without ever touching the media path. The job is not forwarding frames. It is managing the authenticated, stateful coordination required for a peer network to organise itself.

Signaling history becomes part of the application

A gap in many conventional WebRTC deployments is that the encrypted peer connection is well-specified, but the path that gets two peers to that connection is not. The signaling service decides which messages are delivered, who they are attributed to, and what room state they correspond to. If something goes wrong, the evidence is whatever that service happened to log. Those logs are usually structured for operations rather than for the application, and they live outside the trust boundary of the rest of the system.

When signaling is canister-native, the record is part of the application’s own state. Messages carry monotonic IDs issued by a single authority. Connection attempts, successes, and failures are counted in the same place that decided to allow them. Membership changes, host handoffs, and room lifecycle transitions are explicit calls with explicit outcomes. None of this requires a separate observability stack to inspect.
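To make the idea concrete, here is a small TypeScript model of a single authority issuing monotonic message IDs and counting connection outcomes in the same place. It is a sketch under assumptions: `SignalingLog`, `SignalRecord`, and the three outcome names are hypothetical, and the real canister's record layout is not described in this post.

```typescript
// Hypothetical in-memory model of the hub's record keeping.
type Outcome = "attempted" | "succeeded" | "failed";

interface SignalRecord {
  id: number;     // monotonic, issued only by the hub
  roomId: string;
  from: string;   // sender's peer id
  kind: string;   // e.g. "offer", "answer", "candidate"
}

class SignalingLog {
  private nextId = 1;
  private records: SignalRecord[] = [];
  private counters: Record<Outcome, number> = { attempted: 0, succeeded: 0, failed: 0 };

  // Every accepted message gets the next ID; IDs are never reused or reordered.
  append(roomId: string, from: string, kind: string): number {
    const id = this.nextId++;
    this.records.push({ id, roomId, from, kind });
    return id;
  }

  // Peers report connection outcomes to the same authority that admitted them.
  report(outcome: Outcome): void {
    this.counters[outcome] += 1;
  }

  // A peer (or an admin view) can ask for everything in a room after a known ID.
  since(roomId: string, afterId: number): SignalRecord[] {
    return this.records.filter((r) => r.roomId === roomId && r.id > afterId);
  }

  // Returns a copy so callers cannot mutate the hub's counters.
  stats(): Record<Outcome, number> {
    return {
      attempted: this.counters.attempted,
      succeeded: this.counters.succeeded,
      failed: this.counters.failed,
    };
  }
}
```

Because IDs come from one counter, a client that remembers the last ID it saw can resume with `since(roomId, lastId)` and detect gaps, and the same structure doubles as the inspection interface.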

Cost follows coordination, not duration

A conventional signaling service is paid for as standing infrastructure. It may be cheap, but it is always present: process, host, deployment, logs, monitoring, upgrades, and operational responsibility.

Signaling itself is bursty. It is active when peers arrive, form rooms, reconnect, or reconfigure, and quiet at all other times. A canister fits that profile because the cost is tied to calls and state changes; the hub does work when there is coordination to perform.

A public signaling hub still needs bounded mailboxes, message size limits, and lifecycle rules for stale peers and orphaned rooms. The difference is that those constraints can live in the canister interface and state model, rather than as a mixture of WebSocket behaviour, server memory, and external cleanup jobs.
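These constraints can be modelled in a few lines. The TypeScript sketch below is illustrative only: `BoundedHub`, `Limits`, and the result names are hypothetical, payload size is approximated by string length rather than bytes, and the limits used by the actual canister are not stated in this post.

```typescript
// Hypothetical model of bounded mailboxes with explicit stale-peer eviction.
interface Limits {
  maxMailbox: number;      // max queued messages per peer
  maxPayloadChars: number; // size cap (string length as a stand-in for bytes)
  staleAfterMs: number;    // a peer unseen for longer than this may be evicted
}

type SendResult = "ok" | "unknownPeer" | "tooLarge" | "mailboxFull";

interface Mailbox {
  queue: string[];
  lastSeen: number;
}

class BoundedHub {
  private boxes = new Map<string, Mailbox>();
  private limits: Limits;

  constructor(limits: Limits) {
    this.limits = limits;
  }

  // Registering or polling counts as a heartbeat.
  touch(peer: string, now: number): Mailbox {
    let box = this.boxes.get(peer);
    if (box === undefined) {
      box = { queue: [], lastSeen: now };
      this.boxes.set(peer, box);
    } else {
      box.lastSeen = now;
    }
    return box;
  }

  // Rejections are explicit results, not silent drops.
  send(to: string, payload: string): SendResult {
    const box = this.boxes.get(to);
    if (box === undefined) return "unknownPeer";
    if (payload.length > this.limits.maxPayloadChars) return "tooLarge";
    if (box.queue.length >= this.limits.maxMailbox) return "mailboxFull";
    box.queue.push(payload);
    return "ok";
  }

  // Draining the mailbox frees capacity for the next burst.
  poll(peer: string, now: number): string[] {
    const box = this.touch(peer, now);
    const out = box.queue;
    box.queue = [];
    return out;
  }

  // Called explicitly (no hidden scheduler); returns the evicted peer ids.
  evictStale(now: number): string[] {
    const evicted: string[] = [];
    for (const [peer, box] of this.boxes) {
      if (now - box.lastSeen > this.limits.staleAfterMs) evicted.push(peer);
    }
    for (const peer of evicted) this.boxes.delete(peer);
    return evicted;
  }
}
```

Timestamps are passed in rather than read from a clock, which keeps the lifecycle rules deterministic and easy to test; in a canister the equivalent would be the time supplied by the system on each call.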

This implementation

The current implementation is a Motoko canister with a React/Vite frontend. The frontend is a test harness rather than a polished product: multiple browser tabs can act as separate peers, rooms can be created and joined, and the admin view exposes the canister’s view of peers, rooms, and message flow.

The canister handles room membership, peer registration, host handoff, and bounded message delivery, with rooms declaring their session type — pairwise, broadcast, or one-way — at creation. Peer identity is scoped per browser tab, so a single Internet Identity principal can drive several peers concurrently without collision. Room state, monotonic message IDs, and connection counters are held as canister state. The implementation is intentionally conservative: explicit peer actions, bounded mailboxes, no hidden scheduler, and no assumption that a browser tab is a permanent actor.
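One way to picture tab-scoped identity is a registry keyed by the pair (principal, tab id) rather than by principal alone. The TypeScript below is a sketch under that assumption; `PeerId`, `PeerRegistry`, and the choice of key encoding are hypothetical, and how the frontend actually generates or stores its tab id is not described here.

```typescript
// Hypothetical model: a peer is a (principal, tabId) pair, not a principal.
interface PeerId {
  principal: string; // textual principal of the authenticated caller
  tabId: string;     // assumed to be a random id generated once per browser tab
}

class PeerRegistry {
  private peers = new Map<string, PeerId>();

  // JSON-encoding the pair avoids ambiguity from delimiter characters.
  private static key(p: PeerId): string {
    return JSON.stringify([p.principal, p.tabId]);
  }

  // Returns false if this exact (principal, tabId) pair is already registered.
  register(p: PeerId): boolean {
    const k = PeerRegistry.key(p);
    if (this.peers.has(k)) return false;
    this.peers.set(k, { principal: p.principal, tabId: p.tabId });
    return true;
  }

  unregister(p: PeerId): boolean {
    return this.peers.delete(PeerRegistry.key(p));
  }

  // All tab ids currently registered under one principal.
  tabsOf(principal: string): string[] {
    const tabs: string[] = [];
    for (const peer of this.peers.values()) {
      if (peer.principal === principal) tabs.push(peer.tabId);
    }
    return tabs;
  }
}
```

Two tabs signed in with the same principal register as two distinct peers, and closing one tab removes only that entry, which is what lets a single identity drive several peers in the test harness without collision.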

i tried to add some market research here pointing to why WebRTC is a foundational technology and, by inference, could drive a lot of money to the IC, but the post got held in moderation.

that’s wild to me, when i look at the level of anger and accusation that gets thrown around on this forum. but the moderators are keeping this one under wraps?

should i start a conspiracy in order to get people excited about why fully p2p apps on the IC could be a big thing?

for the record, this project allows fully peer-to-peer communication across the IC, meaning substantially reduced costs, plus tech like Zoom-style calls…

I built a Bomberman game over WebRTC.
I do wonder whether a Mac Studio M4 Max running agents could be connected to do tasks for other apps over a WebRTC connection.

i am using webrtc for vr avatar sharing as well as audio and screen sharing. the client is not really an issue, but i was always annoyed that i needed to use websockets and a web2 signaling server to do the peer handshake. this fixes that and it gets me from 99% on-chain to 99.9% - only STUN servers remain…

OpenAI and Gemini already use WebRTC for their media streaming. maybe others too. it’s really a web staple at this point. building new channels is relatively easy, and this signaling server would save you from spinning up a Heroku-type server to handle matchmaking, while also avoiding boundary-node dynamics and latency