Disaster recovery in the cloud – the right time to invest
Every major broadcaster acknowledges that they have to consider disaster recovery.
And every major broadcaster also acknowledges that it is the subject of a difficult debate.
On one hand, business continuity is vital. Apart from meeting audience expectations, if a channel is off air, it cannot transmit commercials. Without commercials, it has no income. Getting the station back on air – and broadcasting commercials – is clearly vital.
On the other hand, today’s technology is very reliable. Most people would expect never to have to use the disaster recovery site. So a large investment in replicating the primary playout center at some geographical distance can be seen as wasted money: a lot of hardware (and real estate) that would never go to air.
As a side note, the last year has taught us that planning for business continuity can never be too detailed. The need to keep staff socially distanced due to a global pandemic was one disaster few of us expected to have to recover from.
The question, then, is how to ensure business continuity through a disaster recovery site that gets the channel on air in the shortest possible time, that can be operated from anywhere, and that needs the least engineering support to launch. The answer that broadcasters are increasingly turning to is the cloud.
I must make it clear that cloud-based disaster recovery is not free from capital investment. There are costs involved in establishing the communications between your primary site, your various delivery systems (transmitters, satellite uplinks, streaming servers) and the cloud. Building the software stack in the cloud will also take time and money.
But once all the elements are there, it can be extremely cost-effective to keep a standby system in the cloud: ready to start when you need it; dormant when you do not. And, as I will explain, cloud-based disaster recovery can serve as a practical first step on the path towards a complete broadcast architecture in the cloud.
Let’s define cloud
For the purposes of this article, I am going to refer to AWS (Amazon Web Services) as the cloud provider. Imagine works with other cloud providers, but we have more live production installations in the AWS cloud, and we have excellent relationships with the technical team there. In some parts of the world, other cloud providers have a more dominant market position: broadcasters in Asia, for example, are perhaps more likely to look at Alibaba or Huawei cloud platforms.
Whichever provider you choose, what you buy from them is access to effectively infinite amounts of processing power and storage space. AWS also offers some media-specific services (through their acquisition of Elemental) like media processing, transcoding and live streaming, but in general you are renting time on generic IT processors. You need to add the services that meet your needs.
There is another really important point to make here. Moving to the cloud is not an all-or-nothing, irreversible decision. The very nature of the cloud means it is simple to flex the amount of processing you put there, so if you decide to back away from the cloud, you can do so easily.
The cloud is an option within the IP transition. You decide when and how to make that transition, and when and how much to use the cloud. For many broadcasters, disaster recovery is an excellent way to dip toes into the cloud – to develop your own knowledge and understanding of how best to use it.
With today’s software-defined architectures, users are justified in demanding that devices should perform identically whether they are in dedicated computers in the machine room, virtualized in the corporate data center, or in the public cloud. That, indeed, is the true definition of cloud-native: the ability to perform anywhere, with the same user interfaces and responses.
Consistent operation is especially important in disaster recovery deployments. If disaster strikes, the last thing you want is for operators to scrabble around trying to make sense of an unfamiliar system. Performance and user interaction must be exactly the same wherever the processes are actually being performed.
That does not mean that the primary system and the disaster recovery site must be identical. You may choose not to replicate all of your channels, for example. But with a well-designed cloud solution, you should be able to emulate the same user interfaces. This makes it easy and familiar for the operators to switch back and forth between the two different environments.
It also means you can set resilience and availability by channel. You might want your premium channels to switch over to disaster recovery in seconds, for example, while some of your secondary channels can be left for a while. That is a business decision for you: we can help you find the right cost-benefit balance.
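As a rough illustration of setting resilience by channel, the sketch below records a per-channel policy and works out the order in which channels would need to come back on air. The channel names, the field names and the failover targets are all hypothetical placeholders, not figures from any real deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelPolicy:
    channel: str               # hypothetical channel name
    replicated: bool           # is this channel replicated in the cloud at all?
    max_failover_seconds: int  # business target, not a measured figure


# Illustrative values only: each broadcaster sets its own targets.
POLICIES = [
    ChannelPolicy("secondary-movies", True, 600),
    ChannelPolicy("premium-sports", True, 5),
    ChannelPolicy("shopping", False, 0),
    ChannelPolicy("general-entertainment", True, 60),
]


def failover_order(policies):
    """Return replicated channel names, most urgent failover target first."""
    replicated = [p for p in policies if p.replicated]
    ordered = sorted(replicated, key=lambda p: p.max_failover_seconds)
    return [p.channel for p in ordered]


print(failover_order(POLICIES))
```

Keeping the policy as data rather than burying it in procedures makes the cost-benefit discussion explicit: tightening a target for one channel is a visible, reviewable change.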
A cloud-based solution naturally includes tools that will maintain synchronization between on-premises asset management and the cloud. This can also involve third parties: a studio might deliver new programs direct to your quarantine storage in the cloud for automated checking and QC before being transferred to the playout stack.
Some would say that you need a lot of bandwidth just to keep the scheduled content in the cloud. Certainly if you are continually refreshing the cloud storage this could be true. But it need not be.
Faced with the imminent obsolescence of video tape libraries, and wary of the eternal cost of maintaining an LTO data tape library, many broadcasters are looking to the cloud to host content archives. This is an ideal application for cloud storage. You can load it once, knowing that all the technology migration and maintenance will be carried out, flawlessly, by someone else. And this security of content comes at a much lower cost than managing it in-house.
Other organizations may be empowering collaborative working in post-production by hosting content and decision lists in the cloud.
Playout, archiving and post may be managed as separate departments with separate budgets. But if you combine them, content is only delivered to the cloud once (or content created in the cloud stays there). It is then available for playout without the high egress costs and is securely stored at significant cost savings.
Broadcasters have traditionally sought very high availability from the technology delivering premium channels. “Five nines” used to be regarded as the gold standard – 99.999% uptime. Even that, though, is equivalent to about 5 ¼ minutes of dead air a year.
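The arithmetic behind that figure is easy to check. The short sketch below converts an availability expressed as a number of nines into permitted downtime per year, assuming a 365-day year; the function name is ours, chosen for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(nines: int) -> float:
    """Permitted downtime per year for an availability of `nines` nines.

    Five nines means 99.999% uptime, so the unavailable fraction is 10**-5.
    """
    unavailability = 10.0 ** -nines
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes per year")
```

Five nines works out to roughly 5.26 minutes a year, and each additional nine divides the permitted downtime by ten.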
AWS offers its broadcast clients availability that was previously unimaginable, perhaps as high as nine nines, which is effectively zero downtime. And it achieves that without any maintenance effort on your part. You need no engineers to keep track of the SMART status of large numbers of disk drives, no routine cleaning of air conditioning, no continual updates of operating systems and virus protection.
One of the deciding factors in setting up disaster recovery centers is the need for geographic diversity. The business continuity site must be sufficiently far away that any problems affecting the primary site, like power failures or earthquakes, will not affect the backup location.
The cloud is inherently geographically diverse, and a good provider will ensure that your applications and data are stored across multiple locations. With a global player like AWS, you can have disaster recovery out of anywhere, for anywhere.
The cloud also gives you control over your processes from anywhere with a reasonable internet connection. So if the disaster is that your building has to be evacuated because of detected cases of a communicable disease, playout operators can work from home with exactly the same user interface and functionality as if they were sitting in the MCR or the Network Operations Center (NOC).
When it comes to broadcast disaster recovery, you can make your own SLA. If you want hot standby (complete parallel running in the cloud for almost instantaneous failover), then the technology allows it – although of course you are paying for the processing time.
Or you can choose your own level of cold or warm standby. Even from cold, when the channel playout instances have to be loaded and booted, the delay is still only going to be of the order of the 5 ¼ minutes of downtime a year that an on-premises five nines system would have allowed you.
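The trade-off between hot, warm and cold standby can be sketched as a simple selection: pick the cheapest option that still meets the failover target for a channel. Everything numeric here is an assumption made for illustration, including the compute fractions, the failover times and the hourly rate; none of it is taken from AWS pricing or from a real system.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24


@dataclass(frozen=True)
class StandbyOption:
    name: str
    compute_fraction: float  # share of full hot-standby compute paid for over a year
    failover_seconds: float  # assumed time to get the channel on air


# Placeholder figures only: replace with your own measurements and quotes.
OPTIONS = [
    StandbyOption("hot", 1.00, 5),
    StandbyOption("warm", 0.10, 60),
    StandbyOption("cold", 0.01, 300),
]


def annual_compute_cost(option, hourly_rate):
    """Yearly compute cost for one channel under the given standby option."""
    return option.compute_fraction * HOURS_PER_YEAR * hourly_rate


def cheapest_meeting_target(options, hourly_rate, max_failover_seconds):
    """Return the cheapest option whose failover time meets the target, or None."""
    candidates = [o for o in options if o.failover_seconds <= max_failover_seconds]
    if not candidates:
        return None
    return min(candidates, key=lambda o: annual_compute_cost(o, hourly_rate))


choice = cheapest_meeting_target(OPTIONS, hourly_rate=2.0, max_failover_seconds=120)
print(choice.name if choice else "no option meets the target")
```

With these placeholder numbers, a channel that can tolerate two minutes off air would be served by warm standby, while a channel with a ten-second target would need hot standby and the processing cost that goes with it.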
Cyberattacks are becoming an all-too-familiar headline. Other industries have seen crippling incursions and software systems held to ransom. Naturally, media industry CIOs and CFOs have security at the forefront of their minds. Developing a business continuity strategy that protects the business from such attacks is paramount.
When considering a disaster recovery investment, the key point to bear in mind is where the dangers lie. The traditional thinking behind disaster recovery was about fire or flood taking out the primary center. But what if it is a cyberattack on the delivery network? Our customers tell us that cybersecurity is the number one concern they have today.
Again, the cloud is the right solution. A good cloud provider will deliver better data security than you can do yourself. AWS has thousands of staff with the word “security” on their business cards. While no organization can hope to be perfect, a good cloud provider will give you your best shot at complete protection, because that is their business. The alternative is to build your own data security team: an unnecessary overhead and a challenge to develop, recruit and manage.
AWS is even used by the US Intelligence Community, which suggests that it is probably working.
One comment that is often heard is that you cannot run live channels or live content from the cloud.
This simply is not true. At Imagine, we have implemented primary playout systems that feature live content. In the U.S., we recently equipped a SMPTE ST 2110 IP media operations center and cloud-hosted disaster recovery channels for Sinclair’s Bally Sports Regional Networks. For Sinclair’s Tennis Channel, we provided core infrastructure for a large-scale ST 2110 live production center featuring a cloud-based environment for pop-up live events.
The biggest requirement for sports television is that live should be absolutely live: no one wants to hear their neighbors cheer and wait to find out why. Minimum latency is also critical for the big money business of sports books.
Sinclair spun up live channels around the 2021 Miami Open tennis tournament in March. All the playout, including the unpredictable live interventions associated with fitting commercial breaks into tennis matches, was hosted in the cloud, with operators sitting wherever was convenient and safe for them. And it was all completely transparent to the viewers.
We have delivered systems that allow operators to decide whether to broadcast a channel under completely automated control, or from a switcher panel in master control with the actions performed in the cloud. We know that live cloud playout and delivery to broadcast and streaming platforms adds only limited latency compared with on-premises systems, and that this latency can be easily managed using the techniques already in place to synchronize signals from disparate sources. You can definitely go live.
As consumer preferences move from broadcast to streaming, what happens after the master control switcher, namely preparing the output for all the different platforms, becomes ever more complicated. That level of signal processing is better done in the cloud, especially with transcoding-as-a-service providing high-performance, affordable delivery.
Putting disaster recovery playout in the cloud is a natural first step. It allows broadcasters to develop the skills needed to move content and schedules and work with cloud suppliers to fine tune their systems for broadcast. It also means that everyone in the organization gains confidence in the cloud as a suitable platform for broadcasters. Routine rehearsals of business continuity will mean that operators will learn how much similarity there is in performance of the cloud and on-premises systems, and how the user interface seamlessly switches from one to the other.
This experience gives confidence to move on towards a completely cloud future. Pop-up channels can be created in minutes, not months, so it is easy to service sports events or music festivals, while only paying for processor time when you need it.
As the legacy playout network reaches life-expiration, broadcasters will know what the cloud can do operationally and technically, and will have built up a solid base of information on the costs of operating in the cloud. That knowledge will be invaluable in evaluating proposals for the next generation of playout.
Ultimately, disaster recovery is fundamentally a business issue – a strategic decision for any company. So I’ll close with a summary of the many strategic and business benefits of deploying disaster recovery in the cloud.
We can now use the cloud as an effective playout system that performs almost exactly as a traditional, on-premises legacy playout network would, with the same user interface and responsiveness. Cloud access to as much processing power as you need also future-proofs the system, allowing you, for example, to implement machine learning algorithms to automate caption and metadata generation.
Cloud playout is inherently suited to remote working. Operators can work from home if needed. If you are a global broadcaster, you could even eliminate night shifts by moving operations around the world every eight hours.
The cloud is effectively infinitely scalable, so you can add channels or services, support new delivery platforms, and test-market 4K and HDR. The direct linkage between the cost of delivery and the revenue won makes for easier business management.
Having a master control and playout operation in your premises is an overhead that does not drive your core business. You need real estate for the racks, power to drive them, air conditioning to take away the heat, and specialist staff on shift to maintain it all. You need a continuing maintenance budget to upgrade COTS hardware more often than the traditional seven-year broadcast cycle, and you need to plan to make operating system and software upgrades without risk to the output.
For all these reasons, cloud hosting offers a real reduction in total cost of ownership. Couple the lower TCO with the boost in resilience and the convenience of remote access, and it is clear why the cloud will become the norm for content delivery in future.
Building a disaster recovery playout solution in the cloud is a natural first step. It provides extremely responsive business continuity, including resilience to the disaster we never expected (a global pandemic keeping people away from work). And it gives the broadcaster invaluable insight and experience to guide future designs and decisions.