The real-time CMS: How WebRTC, the browser-based RTC protocol, can reinvent CMS collaboration

March 5, 2020

In many respects, one of the last obstacles remaining for the success of truly dynamic web applications is the absence of bona fide real-time communication. Whether it entails better browser-native videoconferencing or more ambitious decentralized applications in the emerging Web 3.0, real-time communication (RTC) is an essential facet of the coming landscape for web applications at large, and of the user expectations that accompany it. But when it comes to content management and real-time collaboration, there are still many gaps despite the incredible innovation that the real-time communication space has seen in web standards.

Real-time communication has always been a difficult proposition. Nonetheless, its realization in the context of web browsers and the decentralized web opens the door to a bevy of enticing use cases, including content collaboration, peer-to-peer collaboration, and even decentralized approaches like Onion-routed communication and projects immersed even further in the future such as the Dat protocol and the InterPlanetary File System (IPFS). In order for it to operate at scale, real-time communication must occur in a peer-to-peer fashion, as it is unrealistic to expect any centralized server to respond with the near-zero latency required for our expectations of what real time entails. The key difference between real-time applications and their higher-latency counterparts is the fact that connections are established between clients peer-to-peer rather than mediated by central servers that necessitate a round trip.

In this article, I offer a high-level overview of WebRTC, the standard set of APIs that facilitate web real-time communication. In the process, in addition to enumerating some of the use cases that WebRTC enables and that have seen much spilled ink, I want to spotlight one of the less-considered opportunities that is highly relevant not only to web content management (WCM) but also to the presumptive digital experience platform (DXP) market, especially as content management system (CMS) use cases like preview evolve, the architectural separation of concerns in CMS shifts, and the distributed CMS begins to gain steam across a variety of verticals.

WebRTC, the browser-based real-time communication protocol

WebRTC is an open-source project that is freely available and facilitates direct communication in a peer-to-peer manner without any need for special browser plugins or installed native applications on devices. A shared initiative currently supported by Apple, Google, Microsoft, Mozilla, and Opera, WebRTC is also under the purview of the World Wide Web Consortium (W3C) and Internet Engineering Task Force (IETF) when it comes to the authoring of the specifications in question. The stated mission of WebRTC in early 2018 was to “enable rich, high-quality RTC applications to be developed for the browser, mobile platforms, and IoT [Internet of Things] devices, and allow them all to communicate via a common set of protocols.”

The history of WebRTC is checkered, but it is also a compelling case study of the ways in which open-source innovation (like the sort we practiced at Acquia Labs) can help further innovation effloresce in other ecosystems. In May 2010, Google acquired Global IP Solutions (GIPS), a maker of proprietary Voice over Internet Protocol (VoIP) and videoconferencing software, and soon afterward made the unprecedented decision to open-source the GIPS technology it had acquired. Within a year of the acquisition, both Google and Ericsson Labs had built initial WebRTC implementations and open-sourced them to the wider market.

Achievements in the space continued apace, and after the W3C’s publication of a first-draft specification for WebRTC in late 2011, the first cross-browser video call and data transfers were realized in 2013 and 2014. During the summer of 2014, Google Hangouts began to leverage WebRTC in its own tooling, and since that time, browser support has only continued to grow with the increasing demands for better real-time communication support native to browsers. The first implementation of WebRTC leveraging data transfers was particularly seminal, as it was crucial for the later development of real-time collaboration tools.

Applications for WebRTC

WebRTC is gaining attention partially because of its unique ability to penetrate multiple network layers, including proxies, by “punching holes” through the network and establishing channels of communication between browsers on discrete computers. Apart from the most obvious initial use case of real-time conferencing solutions that are now ubiquitous in the corporate world, Carl Blume identified other prospective applications for WebRTC in 2018 including cross-device communication in the Internet of Things (IoT), peer-to-peer messaging and content sharing in the form of data transfers, and Onion-routed communication providing better privacy and security.

On a recent episode of Tag1 Team Talks, the webinar series I host on emerging web technologies, Kevin Jahns (creator of Yjs, an open-source framework for collaborative editing) also enumerated two intriguing examples of how decentralized web technologies are now adopting WebRTC: the Dat protocol and IPFS both leverage WebRTC to a certain extent. The WebRTC initiative also offers a range of readymade samples with source code that detail some of the other ways developers can employ WebRTC for their own ambitious projects. Given the emerging emphasis on data transfer use cases using WebRTC, it is logical to consider WebRTC’s potential to enable CMS responsibilities involving content collaboration as well, across a variety of dimensions.

Motivating WebRTC

In 2018, Shan Sinha of Computerworld argued that the era of widespread WebRTC adoption was here, thanks to three key factors that were influencing the market toward real-time communication as a baseline required feature. First, the extensive browser support presently enjoyed by WebRTC, despite nontrivial distinctions between individual browser implementations, now portends a much less rocky path for developers committed to cross-browser compatibility. WebRTC is now, according to Wikipedia, supported in the following browsers and operating systems:

Desktop: Microsoft Edge 12+, Google Chrome 28+, Mozilla Firefox 22+, Safari 11+, Opera 18+, Vivaldi 1.9+
Android: Google Chrome 28+ (enabled by default since 29), Mozilla Firefox 24+, Opera Mobile 12+
iOS: MobileSafari/WebKit (iOS 11+)
Operating systems: Chrome OS, Firefox OS, BlackBerry 10, Tizen 3.0

There are other reasons Sinha identifies for accelerated WebRTC adoption. Sinha writes that the open-source nature of WebRTC necessarily flattens the playing field for real-time collaboration providers by washing away the proprietary silos that formerly characterized the landscape. Finally, perhaps the most important factor is that users now hold more demanding expectations for real-time user experiences and for the reliability of real-time communication tools.

Tsahi Levent-Levi, who has written extensively about WebRTC, also contends that the newfound simplicity of the developer experience is a key motivation for wider adoption of the WebRTC standard. For example, prior to the promulgation of WebRTC by Google, implementing a voice- or video-calling feature natively in the browser required juggling lower-level languages like C or C++, which inevitably drove up development costs and lengthened development cycles. Though Levent-Levi writes that WebRTC remains mostly implemented in C and C++, the implementation details are irrelevant to the immediate developer experience contained in WebRTC’s JavaScript application programming interfaces (APIs).

WebRTC internals

This brings us to some of the JavaScript APIs now available natively in the browser, which developers can leverage to build real-time communication into their web applications (the Mozilla Developer Network (MDN) documentation covers the same topic in far greater detail):

`getUserMedia()` is a JavaScript method that acquires audio and video media, most commonly by accessing a device’s camera and microphone.
`MediaRecorder` records audio and video made available by a device.
`getDisplayMedia()` acquires a stream of the contents of a screen, window, or tab, whereas `getUserMedia()` only acquires camera and microphone media.
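
As a minimal sketch of how the two capture methods differ in practice, the function below requests camera and microphone media and, optionally, a screen capture. The helper name `captureStreams` and its `shareScreen` option are illustrative assumptions rather than part of WebRTC; the `mediaDevices` argument is expected to be the browser's `navigator.mediaDevices` and is passed in only so that the sketch can be exercised outside a browser.

```javascript
// Hypothetical helper: acquires camera/microphone media and, optionally, a screen capture.
// `mediaDevices` is expected to behave like the browser's navigator.mediaDevices.
async function captureStreams(mediaDevices, { shareScreen = false } = {}) {
  // getUserMedia() prompts the user for camera and microphone access.
  const camera = await mediaDevices.getUserMedia({ audio: true, video: true });

  // getDisplayMedia() prompts the user to choose a screen, window, or tab.
  const screen = shareScreen
    ? await mediaDevices.getDisplayMedia({ video: true })
    : null;

  return { camera, screen };
}
```

In a browser, this might be called as `captureStreams(navigator.mediaDevices, { shareScreen: true })`, and either resulting stream could then be handed to `new MediaRecorder(stream)` for recording.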

The `RTCPeerConnection` interface is an API that enables audio and video communication in a peer-to-peer fashion by performing signal processing, codec handling, secure peer-to-peer communication, and bandwidth management. Meanwhile, the `RTCDataChannel` interface facilitates two-way communication consisting of arbitrary data payloads between peers. Its API is modeled on that of WebSockets, while the peer-to-peer transport underneath keeps latency as low as possible.
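
To make the relationship between the two interfaces concrete, here is a hedged sketch of one peer opening a data channel and preparing an offer. The helper name `startDataSession` and the channel label `"edits"` are assumptions made for illustration. The `PeerConnection` parameter defaults to the browser's `RTCPeerConnection` and exists only so that a stand-in can be supplied outside a browser.

```javascript
// Hypothetical helper: opens a data channel and prepares an offer for a remote peer.
async function startDataSession(onMessage, PeerConnection = globalThis.RTCPeerConnection) {
  const pc = new PeerConnection();

  // The label "edits" is an arbitrary, illustrative channel name.
  const channel = pc.createDataChannel("edits");
  channel.onmessage = (event) => onMessage(event.data);

  // The offer describes this peer's side of the session. It still has to reach
  // the other peer by some out-of-band means before a connection can form.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  return { pc, channel, offer };
}
```

Once the remote peer has answered and the channel has opened, `channel.send(...)` transmits a payload in much the same way a WebSocket would.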

`getStats()` is a method that lets the web application retrieve statistics and metrics about WebRTC sessions. It produces an `RTCStatsReport`.
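
Because an `RTCStatsReport` behaves like a map of individual statistics entries, reading it usually means filtering by entry type. The sketch below condenses a report into a few figures; the helper name `summarizeStats` and the choice of fields are assumptions for illustration, and browsers may omit some of these fields.

```javascript
// Hypothetical helper: reduces an RTCStatsReport (a Map-like object) to a few numbers.
function summarizeStats(report) {
  const summary = { roundTripTimeSeconds: null, bytesSent: 0, bytesReceived: 0 };

  report.forEach((stat) => {
    // "candidate-pair" entries describe a network path between the two peers;
    // only the nominated pair is considered here.
    if (stat.type === "candidate-pair" && stat.nominated) {
      summary.roundTripTimeSeconds = stat.currentRoundTripTime ?? null;
      summary.bytesSent += stat.bytesSent ?? 0;
      summary.bytesReceived += stat.bytesReceived ?? 0;
    }
  });

  return summary;
}
```

In a browser, this might be used as `summarizeStats(await peerConnection.getStats())`, where `peerConnection` is an existing `RTCPeerConnection`.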

The Adapter.js library, available separately, was built by Google in order to provide shims and polyfills that successfully smooth over differences between individual browsers’ WebRTC implementations. It also gracefully handles prefixes and other differences in nomenclature to make the development process much easier from the developer standpoint. The majority of the aforementioned WebRTC samples leverage Adapter.js precisely to provide the complete cross-browser compatibility required.

Rather than outline the entirety of functionality available within WebRTC, which could stretch the length of a book in its own right, I turn now to how WebRTC, paired with the right technologies, can enable real-time content collaboration, a newly pertinent yet largely unseen use case in modern CMS implementations. The Mozilla Developer Network has an in-progress but largely comprehensive account of WebRTC capabilities and APIs, while WebRTC.org contains a more traditional cumulative tutorial that surveys the APIs in question.

Signaling servers and encryption with WebRTC

When you enable your client to communicate via WebRTC, your external internet protocol (IP) address and other metadata are made available for others to establish peer-to-peer connections. Another peer can then connect to the first peer via their own IP address. There is some debate about whether this presents security ramifications, particularly for users of virtual private networks (VPNs), who run the risk of inadvertently revealing the IP addresses shrouded by their VPNs of choice by leveraging WebRTC.

To forge a connection between two clients and initiate data transfer, an additional networking layer is necessary as an intermediary between both peers: a signaling server. According to Jahns, though communication through WebRTC is feasible without signaling servers (such as through e-mail or Bluetooth), they remain the most commonly seen approach to bridge multiple peers. As Alex Castrounis notes in his deep dive into WebRTC, “Signaling is not specified by the WebRTC standard, nor implemented by its APIs in order to allow flexibility in the technologies and protocols used. Signaling and the server that handles it is left to the WebRTC application creator to sort out.” According to Castrounis, signaling encompasses processes of network discovery and network address translation (NAT) traversal, as well as the creation of the session between peers.

Signaling servers typically organize peers into “channels” that are akin to rooms, and in this way they are similar to publish–subscribe (Pub/Sub) servers: once a client submits a message to such a “room,” everyone subscribed to that room receives it. As Jahns states during our recent Tag1 Team Talks episode, this data is encrypted using a key that only the peers have, such that only those entitled to the data can read it. In this fashion, encryption is possible without the need to place trust in a signaling server that may be responsible for negotiating the connections of many other peers.
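
The room-based behavior described above can be sketched with an in-memory stand-in. The `SignalingHub` class and its method names are hypothetical and do not correspond to any particular signaling server; a real one would relay messages over a network transport rather than call handlers directly.

```javascript
// Illustrative in-memory stand-in for a signaling server's "rooms".
class SignalingHub {
  constructor() {
    this.rooms = new Map(); // room name -> Map of peerId -> handler
  }

  // Subscribes a peer to a room; returns a function that unsubscribes it.
  subscribe(room, peerId, handler) {
    if (!this.rooms.has(room)) this.rooms.set(room, new Map());
    this.rooms.get(room).set(peerId, handler);
    return () => this.rooms.get(room).delete(peerId);
  }

  // Publishes a message to everyone in the room except the sender.
  // The hub never inspects `message`, so it can be an opaque encrypted payload.
  publish(room, senderId, message) {
    const peers = this.rooms.get(room) ?? new Map();
    for (const [peerId, handler] of peers) {
      if (peerId !== senderId) handler({ from: senderId, message });
    }
  }
}
```

Because the hub only forwards payloads, peers that share a key can encrypt their messages before publishing them, which is the property that makes an untrusted signaling server workable.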

How WebRTC enables real-time content collaboration

Though a comprehensive accounting of not only how WebRTC works but also how to leverage its full potential is well beyond the scope of this overview, there are several underleveraged ways in which WebRTC and real-time communication at large can revolutionize the CMS market. In the past, many CMS vendors have focused on the notion of real-time content collaboration through collaborative text editing within limited contexts in content editing interfaces.

But real-time collaboration use cases extend well beyond pure text editing. Consider the possibilities represented by collaborative layout management, collaborative site building, or collaborative design that could occur within a real-time setting, not to mention other exciting potential edge cases such as collaborative content modeling. Whereas the content management system has long been the domain of disparate users performing tasks independently that occasionally and inevitably conflict with one another, the idea of collaborating in real time on certain key CMS tasks can supercharge the notion of the CMS as a collaborative framework in the first place for users with distinct prerogatives.

Integrating y-webrtc with Yjs, a shared editing framework

One of the shared editing frameworks attracting substantial attention in the CMS landscape, particularly in the WordPress community, is Yjs, an open-source collaborative editing framework created by Kevin Jahns and intended for use in browser-based approaches such as those undergirded by WebRTC. Yjs performs robust operations on collaboratively edited documents through conflict-free replicated data types (CRDTs) that facilitate highly performant conflict resolution, an important feature when editors are working simultaneously on a content item or have regained connectivity after going offline.
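
To illustrate what conflict-free means here, the sketch below implements a last-writer-wins map, one of the simplest CRDTs. It is not how Yjs is implemented, and the class name `LwwMap` is hypothetical. It only demonstrates the defining property: replicas that receive the same operations, in any order, converge on the same state.

```javascript
// Illustrative last-writer-wins (LWW) map. NOT Yjs's algorithm; a convergence demo only.
class LwwMap {
  constructor(clientId) {
    this.clientId = clientId;
    this.clock = 0;
    this.entries = new Map(); // key -> { key, value, clock, clientId }
  }

  // Records a local write and returns the operation to send to other peers.
  set(key, value) {
    this.clock += 1;
    const op = { key, value, clock: this.clock, clientId: this.clientId };
    this.apply(op);
    return op;
  }

  // Applies a local or remote operation; safe in any order and when repeated.
  apply(op) {
    this.clock = Math.max(this.clock, op.clock);
    const current = this.entries.get(op.key);
    const wins =
      !current ||
      op.clock > current.clock ||
      (op.clock === current.clock && op.clientId > current.clientId);
    if (wins) this.entries.set(op.key, op);
  }

  get(key) {
    return this.entries.get(key)?.value;
  }
}
```

If two editors with client IDs `"a"` and `"b"` each set the same key while offline and later exchange their operations, both replicas settle on the write from `"b"`, because ties on the logical clock are broken by comparing client IDs.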

Yjs offers officially supported integrations with a variety of technologies, including WebRTC in the form of y-webrtc. The y-webrtc integration furnishes access to many different signaling servers simultaneously from Yjs-enabled editorial interfaces. This means that when many individuals are working at the same time in a document, not only will a single peer establish a multitude of connections with others (with connections between those peers as well), but all operations performed in a document will also be synchronized to all other peers present in that environment.

Collaboration use cases beyond textual content

There are a host of questions involved in facilitating real-time collaboration in the CMS context, especially when it comes to issues such as document fragmentation, synchronization across many peers, partitioned collaboration, and avoiding “sync storms” that bottleneck performance. With the support of a framework like Yjs, however, many of these problems are easily resolved, with the added advantage that Yjs is capable of supporting real-time use cases well beyond the traditional content editing scenarios typically touted by CMS vendors.

Yjs can enable real-time collaboration not only in text but also in graphical formats and other arbitrary data. The possibilities for such collaboration in other contexts within the CMS realm are endless, including common “administration-by-committee” requirements like layout management, in-context editing, and content modeling. I can imagine an exciting array of new workflows in only a few short years, as real-time collaboration continues to mature in the CMS world thanks to initiatives like Yjs in Gutenberg. Not only will editors be able to work with others located twelve time zones away simultaneously on a shared document; they will also be able to craft layouts collaboratively, crop and resize images collaboratively, manipulate CMS themes collaboratively, edit previewed content collaboratively, and even create their schemas collaboratively.

Conclusion: The real-time CMS era is in its infancy

Is there room in the rapidly emerging digital experience platform (DXP) market for a real-time CMS? Such a real-time CMS would enable editors in disparate locations to work together not only on the content that CMS platforms traffic in but also the many “site building” features that are critical to the success of any marketing function in the enterprise.

And there is another potential added benefit to such a real-time CMS future as well. Many in the CMS market have long lamented the fact that content editors no longer write or edit content directly in the CMS itself, eschewing what they consider antiquated word processors in favor of highly collaborative editorial environments like Google Docs and Office 365.

Could this be the canary in the coal mine for a pendulum swing back in the opposite direction? Could added collaborative capabilities related to layout manipulation and other common CMS functions encourage editors to return en masse to the CMS as a tool for collaboration rather than copy-pasting? Could tasks long relegated to the deepest recesses of the CMS like content modeling become as interactive and dynamic as the Google Docs-like interfaces that have commandeered content editor mindshare?

The technical underpinnings for building the foundations of such a CMS product are already broadly supported, thanks to WebRTC and frameworks like Yjs. For instance, WordPress contributors are already prototyping a real-time editorial experience for Gutenberg that leverages Yjs, and Tag1 Consulting (where I am Editor in Chief) has just published the first installment in a blog series about y-webrtc and how it enables decentralized collaboration. In addition, join Peta Hoyes, Fabian Franz, and yours truly for a session about real-time collaboration and the future of web applications at DrupalCon Minneapolis in a few short months.

From an industry standpoint, I believe that we are in the beginning stages of what I would term the real-time CMS. But the foundation is in place for the real-time revolution around us to continue apace in content management. Now, the mission before us is to implement these real-time collaboration tools—and by doing so to recapture the imaginations of editors who continue to seek better means to work with one another in more robust and more immediate ways.

preston.so