July 19, 2024



MLow: Meta’s low-bitrate audio codec

  • At Meta, we power real-time communication (RTC) for billions of people through our apps, including WhatsApp, Instagram, and Messenger.
  • We’re working to make RTC accessible by providing a high-quality experience for everyone – even those who may not have the fastest connections or the newest phones.
  • As more and more people have relied on our products to make calls over the years, we’ve worked to find new ways to ensure all calls have solid audio quality.
  • We’ve built the Meta Low Bitrate (MLow) codec: a new tool that improves audio quality especially for those on slow-speed connections.
Figure 1: Increasing complexity or bitrate usually improves quality, but good codecs achieve higher quality while balancing the other two.

RTC products rely on several key components to deliver the full experience, and one of the most important is the audio/video codec. Codecs compress captured audio/video data so that it can be sent over the Internet efficiently to the recipient while maintaining a real-time experience. For example, the raw audio captured for a typical call is 768 kbps (mono, 48 kHz sampling, 16-bit depth), which modern codecs can compress to 25-30 kbps. This compression often comes at the cost of some quality (information loss), but good codecs balance the trifecta of quality, bitrate, and complexity by exploiting deep knowledge about the nature of the audio signal and by using psychoacoustics.
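As a quick check of the figures above, the raw bitrate of uncompressed PCM audio is the sample rate times the bit depth times the channel count. The sketch below is only an illustration; the helper name `pcm_bitrate_kbps` is made up for this example.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> float:
    """Bitrate of uncompressed PCM audio, in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


# Mono, 48 kHz sampling, 16 bits per sample, as quoted in the text.
raw_kbps = pcm_bitrate_kbps(48_000, 16, channels=1)
print(f"raw PCM: {raw_kbps:.0f} kbps")

# How much smaller the stream becomes at the 25-30 kbps range mentioned above.
for target_kbps in (25, 30):
    ratio = raw_kbps / target_kbps
    print(f"{target_kbps} kbps is about {ratio:.1f}x smaller than raw PCM")
```

At 6 kbps, the low operating point discussed later in the article, the same arithmetic gives a reduction of 128x.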

Creating a good codec is very difficult, which is why new codecs don’t come out very often. One good, widely known open source codec is Opus, released in 2012, which has become the codec of choice for a wide range of applications on the Internet. Meta has used Opus for all of its RTC needs, and it has served us well so far, helping deliver high-quality calls to billions of users around the world.

Our motivation to build a new codec

Given the massive scale of RTC use in Meta products, we can see how the codec performs across a range of network scenarios and how it affects the end user experience. In particular, we observed that a significant portion of calls had a poor network connection for all or part of the call. Typically, the Bandwidth Estimator (BWE) detects network quality, and as network quality deteriorates, we need to lower the codec bitrate to avoid network congestion and keep the audio flowing, which affects the trifecta noted above. To complicate matters further, a video call on a poor network leaves little room for audio, so the audio bitrate falls even further. The lowest operating point of Opus is 6 kbps, where it operates in NarrowBand mode (0 – 4 kHz) and does not adequately capture all the audio frequencies that human voices produce, and therefore does not sound clear or natural. Below is an example of how 6 kbps Opus audio sounds and the corresponding reference file for comparison.

Reference:

Opus @ 6 kbps NarrowBand (NB):

Over the past couple of years, we have seen the development of some new machine learning (ML)-based audio codecs that deliver high-quality audio at very low bitrates. In October 2022, Meta released EnCodec, which achieves remarkably clear audio at very low bitrates. Although AI/ML-based codecs can achieve great quality at low bitrates, this often comes with a prohibitive computational cost. As a result, only high-end (expensive) mobile phones can run these codecs reliably, while users on low-end devices continue to experience audio quality issues in low-bitrate conditions. The net benefit of these newer, computationally expensive codecs is therefore limited to a small portion of users.

A large number of our users are still using low-end devices. For example, more than 20% of our calls are made on ARMv7 devices, and tens of millions of daily calls on WhatsApp are made on devices that are more than 10 years old. Given the codec options readily available and our commitment to ensuring that all users, regardless of the device they use, have a high-quality calling experience, it was clear that we needed a codec with very low computing requirements that still delivers high-quality audio at these lowest bitrates.

MLow codec

We began development of a new codec in late 2021. After nearly two years of active development and testing, we are proud to announce the Meta Low Bitrate audio codec, also known as MLow, which achieves two times better quality than Opus (POLQA MOS of 1.89 for Opus vs. 3.9 for MLow @ 6 kbps WB). Most importantly, we achieve this quality while keeping MLow’s computational complexity 10 percent lower than that of Opus.

Figure 2 below shows the MOS (Mean Opinion Score) plot on a scale of 1 to 5 and compares the POLQA scores between Opus and MLow at different bit rates. As the graph shows, MLow has a significant advantage over Opus at the lowest bitrates, as it saturates quality faster than Opus.

Figure 2: POLQA result comparing Opus (WB) versus MLow at different bit rates across a large data set of files.

We have already fully launched MLow for all Instagram and Messenger calls and are actively rolling it out on WhatsApp, and have already seen an incredible improvement in user engagement driven by better audio quality.

Here are some audio samples to listen to. We suggest you use your favorite pair of headphones to appreciate the amazing differences in sound quality.

Opus 6 kbps NB | MLow 6 kbps WB | Reference

The ability to encode high-quality audio at lower bit rates also opens up more effective forward error correction (FEC) strategies. Compared to Opus, with MLow we can afford FEC at much lower bit rates, which greatly helps improve audio quality in packet loss scenarios.

Below are two 14 kbps audio samples with heavy receiver-side packet loss of 30 percent.

Opus:

Note that at these bitrates, Opus is unable to encode any in-band forward error correction (FEC). It needs at least 19 kbps to encode any in-band FEC at 10 percent packet loss, which hurts audio recovery at lower bitrates.
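To give a rough sense of why redundancy matters at 30 percent packet loss, the sketch below estimates how often an audio frame is unrecoverable when a copy of it also rides in later packets. It is a generic illustration of redundant transmission, not MLow's or Opus's actual FEC scheme. It assumes packets are dropped independently, which real networks often violate with bursty loss. The function names and the `redundancy` parameter are made up for this example.

```python
import random


def residual_loss_rate(loss_rate: float, redundancy: int) -> float:
    """Analytic frame loss when a frame travels in 1 + redundancy packets.

    Assumes every packet is dropped independently with probability loss_rate.
    """
    return loss_rate ** (redundancy + 1)


def simulate_frame_loss(loss_rate: float, redundancy: int,
                        packets: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate: frame i is carried by packets i .. i + redundancy."""
    rng = random.Random(seed)
    dropped = [rng.random() < loss_rate for _ in range(packets)]
    frames = packets - redundancy
    lost = sum(
        all(dropped[i + k] for k in range(redundancy + 1))
        for i in range(frames)
    )
    return lost / frames


for redundancy in (0, 1, 2):
    analytic = residual_loss_rate(0.30, redundancy)
    simulated = simulate_frame_loss(0.30, redundancy)
    print(f"redundancy={redundancy}: analytic {analytic:.3f}, simulated {simulated:.3f}")
```

Under these assumptions, one redundant copy cuts the frame loss from 30 percent to about 9 percent. Every copy costs bits, though, so this only pays off when the codec still sounds good at the reduced per-copy bitrate.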

MLow internals

MLow builds on the concepts of the classic CELP (Code Excited Linear Prediction) codec, with advances in excitation generation, parameter quantization, and coding schemes. Figure 3 is a high-level view of how the codec works internally. On the left, an input signal (raw PCM audio) is fed into the encoder, which splits the signal into low and high frequency bands. Each band is then encoded separately, taking advantage of shared information to achieve better compression. All outputs are passed through a range encoder for further compression, producing the encoded payload. The decoder does exactly the opposite: given the payload, it reconstructs the output audio signal.

Figure 3: High-level MLow encoder and decoder architecture.

With these split-band improvements, we are able to encode high-band using very few bits, allowing MLow to deliver SuperWideBand (32 kHz sampling) using a much lower bit rate.
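The idea of splitting a signal into a low and a high band and coding each separately can be illustrated with a toy two-band filter bank. The sketch below uses the simplest possible choice, a Haar analysis/synthesis pair, purely as an assumption for illustration. MLow's actual band-splitting filters are not described in this article, and production codecs typically use much longer filters. All function names here are hypothetical.

```python
import math


def split_bands(samples):
    """Toy two-band analysis: returns (low, high), each at half the input rate."""
    if len(samples) % 2:
        raise ValueError("expects an even number of samples")
    pairs = range(0, len(samples), 2)
    low = [(samples[i] + samples[i + 1]) / 2 for i in pairs]   # pairwise average
    high = [(samples[i] - samples[i + 1]) / 2 for i in pairs]  # pairwise difference
    return low, high


def merge_bands(low, high):
    """Inverse of split_bands: interleaves the reconstructed sample pairs."""
    out = []
    for l, h in zip(low, high):
        out.extend((l + h, l - h))
    return out


def energy(band):
    """Sum of squared samples, a crude proxy for how much there is to encode."""
    return sum(x * x for x in band)


# A 200 Hz tone sampled at 16 kHz stands in for low-pitched voice content.
tone = [math.sin(2 * math.pi * 200 * n / 16_000) for n in range(320)]
low, high = split_bands(tone)
restored = merge_bands(low, high)

print(f"low-band energy:  {energy(low):.4f}")
print(f"high-band energy: {energy(high):.4f}")
print(f"max reconstruction error: {max(abs(a - b) for a, b in zip(tone, restored)):.2e}")
```

For a signal like this, almost all of the energy lands in the low band. That gives some intuition for why a high band might be representable with relatively few bits.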

What’s next?

MLow has dramatically improved audio quality on low-end devices while still ensuring that calls are end-to-end encrypted. We’re really excited about what we’ve accomplished in just the past two years, from developing a new codec to successfully shipping it to billions of users around the world. We continue to work on improving audio recovery in networks with heavy packet loss by sending more redundant audio, which MLow allows us to do efficiently. We’re excited to share more as we continue working to make it easier for all of our users to make high-quality voice calls.