In May, we started a blog series on how to improve quality of experience (QoE) in an OTT environment in order to match the quality of broadcast. Part 1 covers what you can do on the video compression side, and how you can reduce latency, to improve the user experience.
You’ll also see we’ve run a couple of blog posts on live OTT delivery, with a focus on the scalability issues and on how Netflix is improving QoE for OTT. Even so, something is still missing to really scale for big events, and we have some ideas for how to solve that.
What is missing?
Lately, the industry has been working hard on improving video quality for live OTT events and reducing bit rates. When you have millions of viewers, generally reducing the bit rate for a given video quality is a good thing. Reducing latency is also very important for the user experience.
Even so, there are issues with delivering large events for live OTT. When many simultaneous viewers want to watch the same channel, it’s not uncommon for OTT service providers to encounter scalability problems in certain geographical regions or with some categories of users. For example, let’s say there’s a soccer game between Real Madrid and Barcelona. In most viewing areas, QoE won’t be an issue. But in Madrid and Barcelona, you’ll see scalability issues.
These scalability issues usually cause the user experience to decline. In the best-case scenario, subscribers of the OTT service can still access their content, but at a lower quality (lower bit rate/lower resolution). In the worst case, the video cuts out completely.
To make matters worse, OTT service providers often find out why QoE issues happened only after the live event is over, when it’s too late to do anything about it.
Why does adaptive bitrate technology fail to solve that?
Adaptive bitrate (ABR) technology was designed to let the client adapt its requests to the network conditions. This technique works in theory, but in practice we see different behaviors. In critical situations, some clients will request lower bit rates, while other co-located clients will see more bandwidth available, or be more aggressive, and request higher profiles. The streaming world is not balanced, and we cannot rely on the client alone. We need a holistic view of the network to guide the client.
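To make that divergence concrete, here is a minimal sketch of a throughput-based rendition choice. The bitrate ladder, the safety factor and the function name are all assumptions made up for illustration; this is not the algorithm of any particular player.

```python
from typing import Sequence

# Hypothetical bitrate ladder in kbit/s (illustrative values only).
LADDER_KBPS = (800, 1800, 3500, 6000)


def pick_rendition(ladder_kbps: Sequence[int], measured_kbps: float, safety: float) -> int:
    """Return the highest rendition that fits within safety * measured throughput.

    Falls back to the lowest rendition when nothing fits.
    """
    budget = measured_kbps * safety
    candidates = [b for b in sorted(ladder_kbps) if b <= budget]
    return candidates[-1] if candidates else min(ladder_kbps)


# Two co-located clients measure the same 4000 kbit/s but apply different margins.
conservative = pick_rendition(LADDER_KBPS, 4000, safety=0.7)  # budget: 2800 kbit/s
aggressive = pick_rendition(LADDER_KBPS, 4000, safety=1.0)    # budget: 4000 kbit/s
print(conservative, aggressive)  # prints "1800 3500"
```

Even with identical measurements, the two clients settle on different profiles. Neither one knows what the other is doing, which is why a network-wide view is needed.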
What is the solution?
We need to close the loop much faster between the end-user and content delivery by collecting information in the network elements and making use of big data, relying on machine learning (ML) technology to build a prediction model of the delivery network.
Here’s what I mean by that:
By gathering information from inside the network, including from edge caches and from the OTT players, you can capture telemetry such as the amount of rebuffering, actual access bandwidth, CRC errors, and more. Imagine if you were able to collect all of that information from all of those devices, whether in the CDN or, for mobile networks, in the RAN, and use it properly. The challenge is to do that in real time. This approach is much more dynamic than the way it’s done today, where adjustments are often made post-mortem, after the problem has already occurred.
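As a rough sketch of what that collection step could look like, the snippet below groups per-segment reports by region and computes a rebuffering ratio and a mean throughput. The record fields, class names and sample values are hypothetical; a real deployment would define its own schema and would stream the reports rather than hold them in a list.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class SegmentReport:
    """One hypothetical per-segment report from a player or edge cache."""
    region: str
    segment_seconds: float   # media duration of the segment
    rebuffer_seconds: float  # stall time observed while playing it
    throughput_kbps: float   # measured access bandwidth


@dataclass(frozen=True)
class RegionStats:
    reports: int
    rebuffer_ratio: float        # total stall time / total media time
    mean_throughput_kbps: float


def aggregate_by_region(reports: Iterable[SegmentReport]) -> Dict[str, RegionStats]:
    """Summarize raw reports into one small record per region."""
    grouped = defaultdict(list)
    for report in reports:
        grouped[report.region].append(report)

    stats = {}
    for region, items in grouped.items():
        media = sum(r.segment_seconds for r in items)
        stalled = sum(r.rebuffer_seconds for r in items)
        stats[region] = RegionStats(
            reports=len(items),
            rebuffer_ratio=stalled / media if media > 0 else 0.0,
            mean_throughput_kbps=sum(r.throughput_kbps for r in items) / len(items),
        )
    return stats


# Made-up sample data for illustration.
sample = [
    SegmentReport("madrid", 4.0, 1.0, 2000.0),
    SegmentReport("madrid", 4.0, 0.0, 3000.0),
    SegmentReport("paris", 4.0, 0.0, 8000.0),
]
stats = aggregate_by_region(sample)
print(stats["madrid"].rebuffer_ratio, stats["paris"].rebuffer_ratio)  # prints "0.125 0.0"
```

Reducing the raw reports to a few numbers per region and per time window is one way to keep the data volume manageable before it reaches any learning engine.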
Collecting data in real time per video segment is crucial. And there is a lot of data, so you need to extract the relevant information. Once the raw information is collected, you can leverage an AI engine running in the cloud using ML technology. Over time, this engine can improve what’s happening in the network if it uses state-of-the-art techniques, such as reinforcement learning, that can scale to millions of users.
When you have a prediction model of the network behavior that is constantly improving, you can anticipate QoE issues before they happen. Then you can feed this information back, either to the origin server or edge cache to propose different renditions for different locations, or to the player to guide its ABR algorithm. For example, using your prediction model, you can foresee that you’ll likely have a QoE issue in a certain area one minute from now, and mitigate it. You can react because you have a holistic view of what is happening in the network. So when a QoE issue is about to occur, you as a service provider can respond immediately.
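Here is a deliberately simplified sketch of that feedback step. It swaps the ML prediction model for a plain linear trend extrapolation, purely to keep the example short, and it assumes one measurement per 10-second window. The function names, the bitrate ladder and the 5% threshold are hypothetical.

```python
from typing import Optional, Sequence

# Hypothetical bitrate ladder in kbit/s (illustrative values only).
LADDER_KBPS = (800, 1800, 3500, 6000)


def forecast_rebuffer_ratio(history: Sequence[float], steps_ahead: int) -> float:
    """Extrapolate the latest trend; a simple stand-in for a real ML model."""
    if not history:
        return 0.0
    if len(history) < 2:
        return history[-1]
    slope = history[-1] - history[-2]
    predicted = history[-1] + slope * steps_ahead
    return min(max(predicted, 0.0), 1.0)  # a ratio stays within [0, 1]


def recommend_bitrate_cap(
    ladder_kbps: Sequence[int],
    current_cap_kbps: int,
    predicted_ratio: float,
    threshold: float = 0.05,  # assumed tolerance, not an industry figure
) -> Optional[int]:
    """Return a cap one rung lower if trouble is predicted, else None."""
    if predicted_ratio <= threshold:
        return None
    lower = [b for b in sorted(ladder_kbps) if b < current_cap_kbps]
    return lower[-1] if lower else min(ladder_kbps)


# Rebuffering ratio for one region over the last three 10-second windows.
history = [0.01, 0.02, 0.03]
predicted = forecast_rebuffer_ratio(history, steps_ahead=6)  # about one minute ahead
cap = recommend_bitrate_cap(LADDER_KBPS, 6000, predicted)
print(round(predicted, 2), cap)  # prints "0.09 3500"
```

The recommended cap could then be applied at the origin server or edge cache for that region, or sent to the player as a hint for its ABR algorithm.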
Closing the loop
There are different ways to close the loop and improve QoE during peak live OTT events, but one thing is certain: collecting data and properly using it at scale, in real time, is critical.
The final question is how you close the loop. Should you do it at the origin server level? In edge caches? Through playlist manipulation? At the player level? How do you communicate when different companies are involved in the player, analytics, network, CDN, origin, and encoder? You definitely need some form of API that operates in real time.
There are still many questions to address, but having easy access to the raw data from players, the network, and edge cache servers is key for gathering relevant information.