Understanding The Complexities Of Audio Over IP
In their move to IP-based operations, broadcasters have largely focused on getting video transport right, as it requires a lot of bandwidth and brings various technical issues that must be solved (e.g., latency). However, audio also presents its own set of challenges. Compared to video, not only does audio involve a substantially greater number of sources, but it also uses a myriad of standards—sometimes in the same workflow.
Over the years, several competing proprietary approaches and standards for audio over IP have emerged, including Dante, RAVENNA, MADI (AES10) and AES67. However, incompatibility between these approaches, and even between implementations of a single format, has made it difficult to get audio transport and processing right.
Along with all of its benefits, the move to Audio-over-IP (AoIP) has introduced several important issues that broadcasters must carefully consider.
The Streaming Plane
The streaming plane refers to the basic transport of the audio over the network. In that context, AES67 has become key. First issued in 2013, the AES67 standard has been adopted and integrated by most manufacturers, including providers of products based on proprietary approaches. The standard also forms the basis of the SMPTE ST 2110-30 standard, which helps ensure compatibility on the streaming plane between devices and software.
Within the SMPTE ST 2110-30 standard, three levels of conformance are defined. The mandatory Level A provides support for 48 kHz streams with one to eight audio channels, at packet times of 1 ms. Level B adds support for packet times of 125 μs. Level C increases the maximum number of audio channels allowed per stream to 64, which means that MADI may be carried as-is over the audio network.
What broadcasters must be aware of is that many AoIP systems are currently only able to handle the basic Level A. They may also have limitations when it comes to the total number of audio network streams supported, and what combinations of channel count and stream count can be used. So, while manufacturers can genuinely claim support for SMPTE ST 2110-30, the limited scope of their compliance should be carefully considered when selecting audio equipment, as it could limit the flexibility of the overall workflow.
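For a sense of scale, the per-stream bit rates implied by these conformance levels can be estimated with a short sketch. This is an approximation, not a vendor formula: it assumes 24-bit (L24) audio at 48 kHz and roughly 40 bytes of IP/UDP/RTP header overhead per packet, ignores link-layer framing, and the function name is ours.

```python
def st2110_30_stream_bitrate(channels, packet_time_s,
                             sample_rate=48000, bytes_per_sample=3,
                             header_bytes=40):
    """Approximate bit rate (bits/s) of one AES67 / ST 2110-30 stream.

    header_bytes: assumed IP + UDP + RTP overhead per packet (40 bytes);
    Ethernet framing is ignored for simplicity.
    """
    samples_per_packet = round(sample_rate * packet_time_s)
    payload = channels * samples_per_packet * bytes_per_sample
    packets_per_second = 1.0 / packet_time_s
    return (payload + header_bytes) * 8 * packets_per_second

# Level A: 8 channels at 1 ms packet time -> roughly 9.5 Mbit/s
level_a = st2110_30_stream_bitrate(8, 0.001)

# Level C: 64 channels at 125 us packet time -> roughly 76 Mbit/s
level_c = st2110_30_stream_bitrate(64, 0.000125)
```

Even the densest Level C stream is modest next to uncompressed video, which underlines the point that the challenge with audio is scale and orchestration rather than raw bandwidth.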
As part of implementing AES67 compatibility, manufacturers use Precision Time Protocol (PTP) version 2, or IEEE 1588-2008, for network timing. This also fits with the SMPTE ST 2110-10 standard, which mandates the use of PTP v2. SMPTE has also published the ST 2059 standard, which generalizes the media clock concept of AES67 to any kind of periodic media clock, including video and timecode.
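In practice, an AES67 sender advertises its PTP reference clock in its SDP stream description. The fragment below is a hypothetical example for illustration only (the addresses, payload type and grandmaster ID are invented): the ts-refclk attribute names the IEEE 1588-2008 clock the stream is locked to, and ptime:1 reflects the 1 ms packet time of ST 2110-30 Level A.

```
v=0
o=- 1423986 1423994 IN IP4 192.168.1.10
s=Example AES67 stream
c=IN IP4 239.69.1.10/32
t=0 0
m=audio 5004 RTP/AVP 96
a=rtpmap:96 L24/48000/8
a=ptime:1
a=ts-refclk:ptp=IEEE1588-2008:00-11-22-FF-FE-33-44-55:0
a=mediaclk:direct=0
```

A receiver that is locked to the same PTP grandmaster can use this information to align the stream's media clock to house time without any separate sync cabling.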
The common production environment has many more audio sources than video and an even greater number of destinations. A major sports production could have thousands of audio channels travelling across the network, for example. So, while audio may not necessarily place high demands on bandwidth in an IP network compared to video, it certainly creates a challenge in terms of control and orchestration.
Audio engineers expect to be able to connect sources and destinations without worrying about protocols and standards. In a broadcast facility, however, inter-studio routing must be centrally controlled, both for the integrity of signals and for security and access control.
The advantage of some of the proprietary approaches is that they include a comprehensive control plane, whereas standards like AES67, and indeed SMPTE ST 2110, do not define how streams should be controlled. While these proprietary control planes are effective on their own, they are not compatible with each other. More crucially, they are designed for a local studio environment (LAN) and are therefore not suited to a seamless distributed production environment, such as a large campus, inter-campus links or remote production (over WAN).
In addition, these control planes rely on audio being made seamlessly available to any equipment in the network by default, meaning no explicit routing of streams is required. This approach assumes that no controlled bandwidth management is needed, an assumption that may prove flawed as the network grows in size and complexity.
One approach to overcoming the issues with control plane interoperability, and addressing security and stability concerns, is to bridge different IP audio “islands” by using MADI baseband tielines. However, this adds complexity to the management of audio routing in the facility and reduces flexibility and agility.
The Networked Media Open Specifications (NMOS), developed by the Advanced Media Workflow Association (AMWA), offer a way to address endpoint control for audio that may deliver the true promise of distributed IP production. The specifications are now gaining traction in the industry—although uptake among audio equipment manufacturers is lagging behind that of video equipment vendors.
Meanwhile, an increasingly popular way to control audio flows in an IP network is to use software-defined networking (SDN). This not only provides an easy way to connect diverse sources and destinations, but also adds a layer of predictability, performance guarantees and security by managing bandwidth and allowing only authorized destinations access to specific audio network flows.
Production Made Easy
It’s clear that the move to audio over IP has made it easier for broadcasters to serve the various destinations required while keeping their infrastructures as flexible as possible. For example, immersive audio authoring typically requires up to 127 dedicated audio object channels, in addition to a base surround signal containing up to 22.2 audio channels. Traditionally, MADI has been used to carry this high channel count in the production stage, but IP networked audio has higher capacity and can send all of these channels down a single cable.
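The capacity argument can be made concrete with some simple arithmetic. The sketch below assumes the 22.2 bed counts as 24 discrete channels alongside the 127 object channels, and compares the number of 64-channel MADI links that would be needed against the raw data rate of carrying everything as uncompressed 24-bit/48 kHz audio on one IP link; the figures are illustrative, not a sizing guide.

```python
import math

MADI_CHANNELS = 64  # channels per MADI (AES10) link at 48 kHz

def madi_links_needed(total_channels, per_link=MADI_CHANNELS):
    """How many separate MADI tielines a given channel count requires."""
    return math.ceil(total_channels / per_link)

def raw_audio_rate_mbps(channels, sample_rate=48000, bits=24):
    """Raw (payload-only) audio data rate in Mbit/s, ignoring packet overhead."""
    return channels * sample_rate * bits / 1e6

# 127 object channels plus a 22.2 bed (assumed 24 channels) = 151 channels
total_channels = 127 + 24

links = madi_links_needed(total_channels)       # 3 MADI links
rate = raw_audio_rate_mbps(total_channels)      # ~174 Mbit/s of raw audio
```

Three separate MADI tielines versus well under a fifth of a single gigabit Ethernet port illustrates why a single network cable can replace a bundle of point-to-point audio links.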
Audio over IP is also more flexible with regard to routing and does not require expensive, dedicated MADI routers when topologies more complex than point-to-point links are required. The use of AoIP means there is less need for dedicated or custom hardware, allowing for virtualized and flexible workflows.
Armed with new audio networking capabilities, broadcasters and production facilities are better able to experiment with new technologies like immersive audio, offering audiences a better viewing experience, which in the end is what it’s all about.