With WebRTC, live streaming can be played like this!

How do you implement a real-time video call between two people? You may first think of the live-streaming approach: capture the stream -> push the stream -> pull the stream. But if you try to put that whole pipeline on the front end alone, it is practically hopeless. The emergence of WebRTC, however, has changed this situation.

With the help of WebRTC, the front end no longer needs to care about the "capture -> push -> pull" pipeline, and can easily implement live streaming and even real-time audio and video calls.

What is WebRTC? How does it work under the hood? How do you use it? Let the BBQ brother tell you!

What is WebRTC?

The full name of WebRTC is Web Real-Time Communication, a web-based instant communication technology. It is a real-time communication solution initiated by Google. It is called a solution rather than a protocol because it covers a complete set of capabilities: audio and video capture, connection establishment, information transmission, and audio and video rendering. Its launch makes it possible to build an audio and video communication application quickly.

If you are a web developer, the WebRTC APIs provided by the browser let you easily capture and play audio and video, establish end-to-end communication channels, and share audio and video data over those channels.

Although WebRTC may look like just a browser gadget, because Google open-sourced it, full cross-platform interoperability can be achieved by compiling its C++ code. So if you want to remotely control a Windows computer from the web, you can have your C++ colleagues integrate WebRTC as well; WebRTC also supports real-time desktop capture!

How does WebRTC realize end-to-end audio and video sharing?

Traditional resource sharing mostly goes through a relay server: the resources the other party wants are uploaded in advance to a fixed public-network server and then accessed through an address. The big advantage of this approach is reliability, because the resource server is fixed and is not affected by the network conditions of either the sender or the receiver.

However, its real-time performance is poor: you have to wait until the other party finishes uploading the file before you can download it. Of course, this process can also be made real-time by building a file stream on the server and forwarding whatever reaches the server to the puller immediately. But if it is that much trouble, why not cut out the server altogether?

P2P connection

The full name of P2P is peer-to-peer, formally a peer-to-peer network. It is both a network technology and a network topology. Devices that have established a P2P connection can exchange information one-to-one without being forwarded by a third-party service.

So how is a P2P connection created? Let's first look at how communication works in the real network world.

The real network world

The network in the real world looks like this:

Most of the Internet Protocol (IP) deployments currently on the Internet are still IPv4, which uses a 32-bit binary address and can therefore produce only about 4.3 billion IP addresses. If every user terminal were to access the Internet with its own independent IP address, those 4.3 billion addresses would not be nearly enough.

Therefore, the current network structure is basically that multiple terminal devices access the Internet through one or more layers of NAT, that is, from inside a local area network.

What is NAT?

NAT (Network Address Translation) is a technology that solves the problem of connecting devices in a private network to the public network.

So what is the working principle of NAT?

  • When device A sends a request to the server, the request first reaches the NAT. The NAT rewrites the packet's source address, source port, and the corresponding checksum, then forwards it to the server. At this point a mapping is recorded: (device A's intranet IP:port) => (NAT's public IP:port).
  • When the server has processed the request and the response data returns to the NAT, the NAT rewrites the destination address, destination port, and corresponding checksum according to the recorded mapping, and then forwards the packet to device A.

This working principle of NAT is what enables intranet devices to access public-network servers normally.
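The two steps above can be sketched as a toy NAT translation table. This is a hedged illustration only: the `Nat` class, addresses, and port numbers below are all invented for the example, not part of any real NAT implementation.

```javascript
// A toy model of a NAT's translation table (all addresses/ports are made up).
class Nat {
  constructor(publicIp) {
    this.publicIp = publicIp;
    this.nextPort = 40000;
    this.outbound = new Map(); // "privateIp:privatePort" -> publicPort
    this.inbound = new Map();  // publicPort -> "privateIp:privatePort"
  }

  // Outgoing packet: rewrite the source address/port and record the mapping.
  translateOut(privateIp, privatePort) {
    const key = `${privateIp}:${privatePort}`;
    if (!this.outbound.has(key)) {
      const publicPort = this.nextPort++;
      this.outbound.set(key, publicPort);
      this.inbound.set(publicPort, key);
    }
    return { ip: this.publicIp, port: this.outbound.get(key) };
  }

  // Incoming response: look up the mapping and restore the private address.
  translateIn(publicPort) {
    return this.inbound.get(publicPort) || null; // unknown port -> dropped
  }
}

const nat = new Nat("203.0.113.7");
// Device A (192.168.0.2:5000) sends a request; the NAT rewrites the source.
const mapped = nat.translateOut("192.168.0.2", 5000);
console.log(mapped);                       // mapped to 203.0.113.7:40000
// The server replies to that public address; the NAT forwards it back to A.
console.log(nat.translateIn(mapped.port)); // "192.168.0.2:5000"
```

Note how the same private endpoint always maps to the same public port while the mapping lives, which is exactly the property hole punching relies on later.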

Devices on an intranet can reach public-network servers through NAT, but in a P2P connection the two devices may sit on two different intranets. How do these two devices establish a P2P connection with each other?

NAT traversal technology

The techniques that enable two devices on two different intranets to establish a P2P connection are collectively called NAT traversal. Generally speaking, P2P can run over either UDP or TCP, so the mechanism is also called UDP hole punching or TCP hole punching. Because the transport-layer protocol used by WebRTC is UDP, I will mainly explain the principle of UDP hole punching here.

  • Step 1: introduce a messenger server. Its job is to discover and record the port that the intranet device is mapped to on the NAT, together with the NAT's public IP. Such servers are called STUN servers. Following the NAT behavior above, when device A sends a request to the STUN server, a mapping (device A's intranet IP:port) => (NAT-A's public IP:port) is formed, and the STUN server responds to device A with that public address. Likewise, device B queries the STUN server to obtain its own mapping.

  • Step 2: exchange the mappings. Device A and device B need to exchange their NAT mappings to prepare for establishing the connection. Another channel, agreed on in advance, is needed for this exchange.
  • Step 3: punch the holes. After the exchange, device A sends a request packet to NAT-B's public address. Because this request was not initiated by device B, NAT-B will not forward it to device B for security reasons but discards it instead. However, NAT-A records this outbound mapping, so subsequent packets arriving from NAT-B's address will be forwarded to device A. Likewise, device B sends the same kind of request to NAT-A's public address, so that NAT-B knows that all subsequent packets from NAT-A's address should be forwarded to device B.
  • Step 4: after completing the steps above, device A and device B can establish the P2P connection and exchange messages happily. Of course, the connection needs to be kept alive with heartbeat packets to prevent the mappings from expiring.

That is a complete UDP hole-punching process. Once the holes are punched, the devices can maintain a P2P connection across NATs.
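The hole-punching steps can be simulated with two toy NATs. This is a hedged sketch under simplified assumptions (a "full-cone-like" NAT that admits any packet from an endpoint the inside host has already contacted); the `makeNat` helper and all addresses are invented for illustration.

```javascript
// A toy simulation of UDP hole punching (all IPs/ports are invented).
// Each NAT remembers which remote endpoints its inside host has already
// sent to; incoming packets from unknown endpoints are dropped.
function makeNat(publicAddr) {
  const allowed = new Set(); // remote endpoints the inside host contacted
  return {
    publicAddr,
    send(remoteAddr) { allowed.add(remoteAddr); },       // outbound opens a "hole"
    receive(fromAddr) { return allowed.has(fromAddr); }, // inbound must match a hole
  };
}

const natA = makeNat("203.0.113.1:40001"); // public address learned via STUN
const natB = makeNat("198.51.100.2:40002");

// Before punching: B's packet toward A is dropped by NAT-A.
console.log(natA.receive(natB.publicAddr)); // false

// Step 3a: A sends a packet toward NAT-B's public address.
// NAT-B drops it, but NAT-A now allows traffic from NAT-B.
natA.send(natB.publicAddr);

// Step 3b: B does the same toward NAT-A's public address.
natB.send(natA.publicAddr);

// Now packets pass in both directions: the hole is punched.
console.log(natA.receive(natB.publicAddr)); // true
console.log(natB.receive(natA.publicAddr)); // true
```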

WebRTC establishes end-to-end connections over UDP

Building on the knowledge above: WebRTC also has to go through a UDP hole-punching process to achieve end-to-end audio and video transmission. When we use WebRTC to create an end-to-end connection, however, we do not implement this hole punching ourselves (if we had to, it would defeat WebRTC's original purpose: making audio and video instant communication convenient and fast to build). Instead, we call the APIs in the prescribed order to complete the creation of a WebRTC connection.

Learning the WebRTC APIs by creating a WebRTC connection

Let's take a look at the entire process of creating a connection:

Signaling server: a service used for information exchange. Its role in the WebRTC flow is to exchange, on behalf of the endpoints, the information needed to establish a connection. The signaling server's implementation is not part of the WebRTC solution, because it sits closer to the business itself and there are many ways to implement it for different business scenarios.

Although WebRTC's browser compatibility is quite good (except, inevitably, IE), the top-level APIs still differ between browsers, so we need a shim to smooth over the differences: webrtcHacks/adapter.

Break it down:

  1. Create an RTC instance. WebRTC provides the RTCPeerConnection API for instantiating a connection. The code is shown below; configuration is optional and is used to configure STUN/TURN server information. The STUN/TURN servers need to be deployed by yourself; if needed, see guides on deploying STUN and TURN servers.

```javascript
// configuration is optional
let connection = new RTCPeerConnection(configuration);
```

If configuration is not provided, the connection can only be established on the intranet.
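For reference, a configuration object generally looks like the sketch below. The server URLs and credentials are placeholders, not real deployments; substitute your own STUN/TURN addresses.

```javascript
// A hedged example of the optional configuration object. The URLs below
// are placeholders -- point them at your own STUN/TURN deployment.
const configuration = {
  iceServers: [
    { urls: "stun:stun.example.org:3478" },
    {
      urls: "turn:turn.example.org:3478",
      username: "user",      // TURN requires credentials
      credential: "password",
    },
  ],
};

// In the browser this is passed straight to the constructor:
if (typeof RTCPeerConnection !== "undefined") {
  const connection = new RTCPeerConnection(configuration);
}
```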

  2. Access the camera and microphone. With the code below you can turn on the camera and microphone without any plug-in (the browser will ask for authorization) and obtain the media stream. Here, constraints describes the requested media types and their parameters, and mediaStream is the resulting media stream. You can assign mediaStream to the srcObject attribute of a video element to play the media stream in real time. More about mediaDevices can be found in the documentation.

```javascript
navigator.mediaDevices
  .getUserMedia(constraints)
  .then(function (mediaStream) { /* ... */ })
  .catch(function (error) { /* ... */ });
```

getUserMedia can obtain the media stream only on localhost or a trusted origin (https); otherwise it reports an error.

  3. Add the media stream to the RTCPeerConnection instance.

```javascript
connection.addTrack(mediaStream.getVideoTracks()[0], mediaStream);
```
  4. Exchange SDPs. WebRTC uses an Offer-Answer model: the initiator creates an Offer SDP through createOffer and transmits it to the receiver through the signaling service; the receiver creates an Answer SDP through createAnswer and transmits it back to the initiator through the signaling server. Both parties need to attach the SDP they generated and the SDP sent by the peer to the connection via setLocalDescription and setRemoteDescription respectively.

SDP (Session Description Protocol) is a text-based session description protocol. It is not a transport protocol itself; it relies on other protocols (such as SIP or HTTP) to exchange the necessary media information, and it is used for media negotiation between two session endpoints. An SDP contains the media information, network information, security features, and transmission strategies required to establish the session.
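Since SDP is plain text, it can be inspected directly. The fragment below is a minimal, made-up example (real SDPs generated by createOffer are much longer); the snippet pulls out the "m=" lines, each of which announces one media section.

```javascript
// A minimal, invented SDP fragment to show what the text format looks like.
const sdp = [
  "v=0",
  "o=- 4611731400430051336 2 IN IP4 127.0.0.1",
  "s=-",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=rtpmap:111 opus/48000/2",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=rtpmap:96 VP8/90000",
].join("\r\n");

// Each "m=" line announces one media section (audio, video, ...).
const mediaKinds = sdp
  .split("\r\n")
  .filter((line) => line.startsWith("m="))
  .map((line) => line.slice(2).split(" ")[0]);

console.log(mediaKinds); // [ "audio", "video" ]
```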

Note: if you are transmitting an audio and video stream, you need to add the media stream to the connection via addTrack before generating the Offer SDP, because the stream information has to be collected when the SDP is generated. If it is not added, the SDP will not contain the media information.

```javascript
// Add the stream information first
connection.addTrack(stream.getVideoTracks()[0], stream);

// The initiator creates an Offer SDP
connection
  .createOffer()
  .then((sessionDescription) => {
    console.log("Send offer");
    if (connection) {
      console.log("Set local description");
      connection.setLocalDescription(sessionDescription);
    }
    sendMessage(sessionDescription, targetId);
  })
  .catch(() => {
    console.log("offer create error");
  });

// The receiver sets the remote description
connection.setRemoteDescription(new RTCSessionDescription(sessionDescription));

// The receiver generates an Answer SDP
connection
  .createAnswer()
  .then((sessionDescription) => {
    console.log("send answer");
    if (connection) {
      console.log("Set local description");
      connection.setLocalDescription(sessionDescription);
    }
    sendMessage(sessionDescription, targetId);
  })
  .catch(() => {
    console.log("Failed to create answer");
  });

// The initiator sets the remote description
connection.setRemoteDescription(new RTCSessionDescription(sessionDescription));
```

  5. Exchange candidate information. Both parties obtain their own candidates by listening for the icecandidate event on RTCPeerConnection and then transmit them to the peer through the signaling service. Exchanging candidates is an important step in establishing a WebRTC connection: after receiving the peer's candidate, you instantiate an RTCIceCandidate object and add it to the RTC instance through RTCPeerConnection's addIceCandidate method.

The candidate information contains the NAT mapping of the current device, including the IP, port, and protocol.

The icecandidate event starts firing after setLocalDescription has been executed.
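A candidate is itself just a text line, so its fields (foundation, component, protocol, priority, IP, port, type) can be read directly. The sample candidate string below is invented for illustration; the field layout follows the ICE candidate grammar.

```javascript
// A sample candidate string (all values are invented for the example).
// Fields: foundation, component, protocol, priority, ip, port, "typ", type.
const candidateStr =
  "candidate:842163049 1 udp 1677729535 203.0.113.7 40001 typ srflx";

const [, , protocol, , ip, port, , type] = candidateStr.split(" ");
console.log(protocol, ip, port, type);
// -> protocol "udp", ip "203.0.113.7", port "40001", type "srflx"
```

Here "srflx" (server reflexive) marks an address learned through a STUN server, i.e. exactly the NAT mapping discussed earlier.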

```javascript
// Listen for the icecandidate event
connection.addEventListener("icecandidate", (event) => {
  if (event.candidate) {
    console.log("Send candidate", event.candidate.candidate);
    sendMessage(
      {
        type: "candidate",
        label: event.candidate.sdpMLineIndex,
        id: event.candidate.sdpMid,
        candidate: event.candidate.candidate,
      },
      targetId
    );
  } else {
    console.log("End of candidates.");
  }
});

// Add the peer's candidate information
const candidate = new RTCIceCandidate({
  sdpMLineIndex: message.label,
  candidate: message.candidate,
});
connection.addIceCandidate(candidate).catch((error) => {
  console.log(error);
});
```

Those are roughly the APIs needed to establish a complete WebRTC connection. There are many others besides, for monitoring the connection, negotiating audio and video codecs during connection creation, and creating data channels over a WebRTC connection. If you are interested, refer to the WebRTC API documentation.

Hands-on practice

Now that we know the main WebRTC APIs, let's complete an example together: the real-time communication between two people mentioned at the beginning. In fact, we will implement simultaneous video communication among multiple people. The example runs on an intranet, with the service accessed through an intranet IP.

The implementation is divided into two parts, following the idea of separating front end and back end: one is the signaling service, the other is the front-end interactive page.

Step 1: create the signaling service with express + socket.io

As mentioned earlier, the main function of the signaling service is to relay information while the connection is being established. To keep the implementation simple, the node express framework and socket.io (which supports two-way communication) are used.

Create an HTTPS service

If our webpage is to support multi-person online video calls, the site must be shared with others, and as mentioned earlier only localhost or a trusted (https) origin can access the browser's media devices. Therefore, to avoid problems caused by mixing protocols, an https service needs to be created.

```javascript
const app = require("express")();
const fs = require("fs");

// Read the certificate (the file names are omitted here)
const key = fs.readFileSync("/Users/XXX/Documents/study/https/", "utf8");
const cert = fs.readFileSync("/Users/XXX/Documents/study/https/", "utf8");

const http = require("https").Server({ key, cert }, app);

// Listen on port 3005
http.listen(3005, function () {
  console.log("listening on *:3005");
});
```

You could also configure nginx to provide https; I didn't want to set up nginx here, so I implemented it in code.

Cross-origin settings

Because the page and the signaling service are two different services, there will be cross-origin problems. The entry file needs the following code:

```javascript
const allowCors = function (req, res, next) {
  res.header("Access-Control-Allow-Origin", req.headers.origin);
  res.header("Access-Control-Allow-Methods", "GET,PUT,POST,DELETE,OPTIONS");
  res.header("Access-Control-Allow-Headers", "Content-Type");
  res.header("Access-Control-Allow-Credentials", "true");
  next();
};
app.use(allowCors);
```

Implementing the socket.io message relay mechanism

The implementation is very simple; socket.io's room and broadcast mechanisms do almost all the work. You have to admire the greatness of the open-source world.

```javascript
const socketIo = require("socket.io");

function createSocketIo(httpInstance) {
  // Initialize the instance with cross-origin support
  const io = socketIo(httpInstance, {
    cors: {
      origin: "*",
      allowedHeaders: ["Content-Type"],
      methods: ["GET", "PUT", "POST", "DELETE", "OPTIONS"],
    },
  });

  // Fires once for every terminal that connects
  io.on("connection", function (socket) {
    console.log("Connected");
    // Put every terminal in the same room and tell the others
    // that a newcomer has joined
    socket.join("demo");
    socket.broadcast.to("demo").emit("new", socket.id);

    // Message relay: forward to one target, or broadcast to the room
    socket.on("message", (message) => {
      if (message.target) {
        socket.to(message.target).emit("message", {
          originId: socket.id,
          data: message.data,
        });
      } else {
        socket.broadcast.to("demo").emit("message", {
          originId: socket.id,
          data: message.data,
        });
      }
    });
  });
}

module.exports = createSocketIo;
```

Step 2: implement the front-end interactive interface

The main job of the front-end interface is to establish a WebRTC connection, via the signaling service, with every peer that joins the room.

The user interface is bootstrapped quickly with create-react-app.

The user interface consists of two parts: the logic for creating WebRTC connections, and the socket.io interaction logic.

Encapsulating the WebRTC connection to support multiple peers

```typescript
// The indispensable shim
import "webrtc-adapter";

class ConnectWebrtc {
  protected connection: RTCPeerConnection | null;

  constructor() {
    this.connection = null;
  }

  // Create the RTCPeerConnection instance and listen for the
  // icecandidate and track events
  create(
    onAddStream: EventListenerOrEventListenerObject,
    onRemoveStream: EventListenerOrEventListenerObject,
    onCandidate: (candidate: RTCIceCandidate) => void
  ) {
    this.connection = new RTCPeerConnection(undefined);
    this.connection.addEventListener("icecandidate", (event) => {
      if (event.candidate) {
        onCandidate(event.candidate);
      } else {
        console.log("End of candidates.");
      }
    });
    this.connection.addEventListener("track", onAddStream);
    this.connection.addEventListener("removeTrack", onRemoveStream);
  }

  // Create the Offer SDP
  createOffer(
    onSessionDescription: (sessionDescription: RTCSessionDescriptionInit) => void
  ) {
    if (this.connection) {
      this.connection
        .createOffer()
        .then((sessionDescription) => {
          if (this.connection) {
            this.connection.setLocalDescription(sessionDescription);
            onSessionDescription(sessionDescription);
          }
        })
        .catch(() => {
          console.log("offer create error");
        });
    }
  }

  // Create the Answer SDP
  createAnswer(
    onSessionDescription: (sessionDescription: RTCSessionDescriptionInit) => void
  ) {
    if (this.connection) {
      this.connection
        .createAnswer()
        .then((sessionDescription) => {
          if (this.connection) {
            this.connection.setLocalDescription(sessionDescription);
          }
          onSessionDescription(sessionDescription);
        })
        .catch(() => {
          console.log("Failed to create answer");
        });
    }
  }

  // Set the remote description
  setRemoteDescription(sessionDescription: RTCSessionDescriptionInit) {
    this.connection?.setRemoteDescription(
      new RTCSessionDescription(sessionDescription)
    );
  }

  // Set a candidate
  setCandidate(message: any) {
    if (this.connection) {
      const candidate = new RTCIceCandidate({
        sdpMLineIndex: message.label,
        candidate: message.candidate,
      });
      this.connection.addIceCandidate(candidate).catch((error) => {
        console.log(error);
      });
    }
  }

  // Add the media stream to the connection
  addTrack(stream: MediaStream) {
    if (this.connection) {
      this.connection.addTrack(stream.getVideoTracks()[0], stream);
      this.connection.addTrack(stream.getAudioTracks()[0], stream);
    }
  }

  // Remove the media stream from the connection
  removeTrack() {
    if (this.connection) {
      this.connection.removeTrack(this.connection.getSenders()[0]);
    }
  }
}

export default ConnectWebrtc;
```

By encapsulating the key steps of creating a connection, we can create multiple WebRTC connections.

The page component, wired up with socket.io

```typescript
import { useEffect, useRef, useState } from "react";
import { io, Socket } from "socket.io-client";
import { server } from "./config";
import ConnectWebrtc from "./webrtc";

// Media device capture configuration
const mediaStreamConstraints = {
  video: { width: 400, height: 400 },
  audio: true,
};

const Room = () => {
  // Local stream
  const localStream = useRef<MediaStream>();
  // The video tag that plays the local stream
  const localVideoRef = useRef<HTMLVideoElement>(null);
  // Holds the connection instance for each peer
  const connectList = useRef<{ [target: string]: any }>({});
  // List of connected users
  const [userlist, setUserList] = useState<string[]>([]);
  // socket.io instance
  const socket = useRef<Socket>();

  // Send a message to the specified peer
  const sendMessage = (data: any, targetId?: string | null) => {
    socket.current?.emit("message", { target: targetId, data });
  };

  // Attach the peer's media stream to its video tag
  const handleStreamAdd = (originId: string) => (event: any) => {
    const video = document.getElementById(originId) as HTMLVideoElement;
    if (video) {
      video.srcObject = event.streams[0];
    }
  };

  // Get the WebRTC connection for a peer, creating it if needed
  const getConnection = (originId: string) => {
    let connection = connectList.current?.[originId];
    if (!connection) {
      connection = new ConnectWebrtc();
      connection.create(
        handleStreamAdd(originId),
        () => {},
        (candidate: RTCIceCandidate) => {
          sendMessage(
            {
              type: "candidate",
              label: candidate.sdpMLineIndex,
              id: candidate.sdpMid,
              candidate: candidate.candidate,
            },
            originId
          );
        }
      );
      // Add the media stream before any SDP is generated
      connection.addTrack(localStream.current);
      connectList.current[originId] = connection;
    }
    return connection;
  };

  // Create the socket.io connection to the signaling service
  const handleConnectIo = () => {
    socket.current = io(server);
    socket.current.on("connect", () => {
      console.log("Connected");
    });

    // Listen for relayed messages
    socket.current.on("message", function (message) {
      // Record the peer
      if (!userlist.includes(message.originId)) {
        userlist.push(message.originId);
        setUserList([...userlist]);
      }
      const connection = getConnection(message.originId);
      if (message.data.type === "offer") {
        // As the receiver: set the remote description and answer back
        connection.setRemoteDescription(message.data);
        connection.createAnswer((sdp) => {
          sendMessage(sdp, message.originId);
        });
      } else if (message.data.type === "answer") {
        // As the initiator: set the Answer SDP as the remote description
        connection.setRemoteDescription(message.data);
      } else if (message.data.type === "candidate") {
        // Add the received candidate to the connection
        connection.setCandidate(message.data);
      }
    });

    // When a new user joins the room, initiate a WebRTC connection
    socket.current.on("new", (newId) => {
      const connection = getConnection(newId);
      connection.createOffer((sdp) => {
        sendMessage(sdp, newId);
      });
      if (!userlist.includes(newId)) {
        userlist.push(newId);
        setUserList([...userlist]);
      }
    });
  };

  // Open the local media device and play it in the local video tag
  const handleGetLocalStream = (callback: () => void) => {
    navigator.mediaDevices
      .getUserMedia(mediaStreamConstraints)
      .then((mediaStream) => {
        localStream.current = mediaStream;
        if (localVideoRef.current) {
          localVideoRef.current.srcObject = mediaStream;
        }
        callback();
      })
      .catch((error) => {
        console.log(error);
      });
  };

  // On mount: open the media device first, then connect to socket.io
  useEffect(() => {
    handleGetLocalStream(() => {
      handleConnectIo();
    });
  }, []);

  return (
    <div>
      <div style={{ marginTop: 20 }}>
        <p>My screen</p>
        <video ref={localVideoRef} autoPlay playsInline></video>
      </div>
      <div style={{ marginTop: 20 }}>
        <p>Other people's screens</p>
        {userlist.map((user) => (
          <video id={user} key={user} autoPlay playsInline></video>
        ))}
      </div>
    </div>
  );
};

export default Room;
```

The component code above clearly shows the important role of the signaling service throughout the WebRTC connection process; it can be said to drive the entire flow. Many details are left unimplemented, but that is not the goal here. What matters is showing the general flow of establishing multi-party WebRTC real-time communication.

The resulting effect is similar to the image at the top of the article, so I won't show it again here.


Here is a picture showing the relationships between the users in the example above:

The relationships are already intricate; if the number of participants grew into the thousands or millions, the number of connections would simply explode.
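The explosion is easy to quantify: in a full mesh every participant connects to every other one, so the connection count grows quadratically.

```javascript
// In a full-mesh P2P room every participant connects to every other one,
// so the number of connections is n * (n - 1) / 2.
const meshConnections = (n) => (n * (n - 1)) / 2;

console.log(meshConnections(2));   // 1  -- a one-to-one call
console.log(meshConnections(5));   // 10
console.log(meshConnections(100)); // 4950 -- already unmanageable
```

And each participant must also encode and upload its stream n - 1 times, which is usually the first resource to run out.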

Therefore, P2P is a decentralized way of connecting, suitable for a small number of connections. For a large number of connections, we ultimately have to return to the idea of centralization, with a central service providing the distribution capability (such as a CDN).

At first glance this brings us back to traditional audio and video live streaming. So why choose WebRTC at all?

From the perspective of technology selection, WebRTC has better compatibility and scalability than other audio and video protocols (because it is open source). In addition, WebRTC connections have lower latency, which makes it a very good solution for a live-streaming industry that increasingly demands real-time interaction.

| Protocol | Latency | Data segmentation | HTML5 playback | Application scenario | Front-end plug-in |
| --- | --- | --- | --- | --- | --- |
| HLS | 10s~30s | Slices | Supported | H5 live streaming, game streaming | hls.js |
| RTMP | 2s~5s | Continuous stream | Not supported | Interactive live streaming | (none) |
| HTTP-FLV | 2s~5s, better than RTMP | Continuous stream | Supported | Interactive live streaming | flv.js |
| RTSP | Generally below 500ms | Continuous stream | Not supported | Interactive live streaming | (none) |
| WebRTC | Within 1s | Continuous stream | Supported (with compatibility caveats) | Interactive entertainment | Native |

At present, some industries with high requirements for real-time interaction are already using WebRTC solutions; its prospects look very bright.


This article has briefly walked through WebRTC from principles to practice. WebRTC is a very complex solution, and one article certainly cannot cover it all. If anything is wrong, please correct me.

Finally

If you think the barbecue guy wrote well, please like and follow; if there are shortcomings, feel free to correct me.