Codecs

Introduction

A codec (COder/DECoder) is a device and/or software program typically used to convert analogue information (such as speech) into a digital stream for transmission, and then back to analogue at the receiving end. Codecs are also generally designed to compress the information in order to minimize the bandwidth or storage space used. This inevitably leads to some loss of quality in the transmitted information, so the design seeks to use minimal bandwidth and/or storage whilst preserving an acceptable quality in the information conveyed.

Specification of speech and audio codecs has been one of ETSI's many success stories.



Our Role & Activities

GSM™ Full Rate codec

When GSM was first being specified, the challenge was to prove that the limited available spectrum could be exploited more efficiently than with the existing analogue systems. That would mean the capacity of systems (i.e. number of customers the mobile network can support for a given amount of licensed frequency allocation) could be maximized whilst preserving, or even improving, the speech quality as perceived by the user. The work resulted in a digital 'full-rate speech' coding algorithm.

GSM Half Rate codec

On completion of the GSM Full Rate exercise, the Half Rate (HR) speech coding exercise was started, with the objective of matching the basic quality of the GSM Full Rate codec at half the bit rate. (The GSM Full Rate speech codec requires 13 kbit/s, plus 9.8 kbit/s for the channel coder, making a total rate for the GSM speech channel of 22.8 kbit/s.)

The resulting algorithm became the standardized GSM Half Rate codec, which uses only 5600 bit/s for speech and leaves 5800 bit/s for the associated channel coder, making a total rate for the GSM half-rate speech channel of 11.4 kbit/s. Unfortunately, the perceived quality of the HR codec proved to suffer in extreme conditions (e.g. with certain background noises, mobile-to-mobile communications, or certain languages).

Enhanced Full Rate (EFR) codec

In the mid-1990s, the quality drawbacks shown by the HR codec, together with the advent of more advanced and powerful digital signal processing technologies, prompted the GSM Association to ask the Speech Experts Group (SEG) of ETSI for a new and better-sounding speech coding algorithm. The result, called Enhanced Full Rate (EFR), works at 12.2 kbit/s and leaves 10.6 kbit/s for the channel coding, which assures better error protection.
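The bit-rate budgets quoted for the three codecs above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the dictionary layout and the function names (`gross_rate`, `speech_share`) are assumptions made for this example, while the figures are the ones given in the text.

```python
# Speech and channel-coding rates in bit/s, as quoted in the text above.
# The dictionary layout and function names are illustrative assumptions.
CODECS = {
    "FR": {"speech": 13000, "channel": 9800},
    "HR": {"speech": 5600, "channel": 5800},
    "EFR": {"speech": 12200, "channel": 10600},
}


def gross_rate(name):
    """Return the total channel rate (speech + channel coding) in bit/s."""
    codec = CODECS[name]
    return codec["speech"] + codec["channel"]


def speech_share(name):
    """Return the fraction of the gross rate that carries speech bits."""
    return CODECS[name]["speech"] / gross_rate(name)


if __name__ == "__main__":
    for name in CODECS:
        print(f"{name}: {gross_rate(name)} bit/s gross, "
              f"{speech_share(name):.0%} of it speech")
```

FR and EFR both fill the same 22.8 kbit/s channel; EFR simply spends fewer bits on speech and more on error protection, while HR fits into half that gross rate.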

Adaptive Multi-Rate (AMR) codec

Between the encoding and the decoding processes which take place in the transmitting and receiving ends of a communication over a digital network, another important function takes place. This is the 'channel coding' process, described in GSM Technical Specification 05.03: this process is indispensable to protect the encoded speech signal against interference in the radio link. The need to balance speech coding and channel coding in order to optimize network capacity led to the Adaptive Multi-Rate (AMR) speech coder, which appeared in GSM Release 98.

The AMR coder divides the available GSM radio channel bit rate (22 800 bit/s for the full-rate channel or 11 400 bit/s for the half-rate channel) between speech coding and channel coding, enabling the most effective use of the radio resources.
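As a rough illustration of this trade-off, the sketch below computes how many bits per second remain for channel coding once a speech mode has been chosen. The list of AMR mode rates is taken from the AMR specifications rather than from this text, the function name `protection_budget` is an assumption, and the calculation deliberately ignores in-band signalling and any other overhead.

```python
# Gross channel rate in bit/s, from the text.
FULL_RATE_GROSS = 22800

# AMR narrowband speech modes in bit/s. This list comes from the AMR
# specifications and is not stated in the surrounding text.
AMR_MODES = [4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200]


def protection_budget(gross, speech_rate):
    """Return the bit/s left for channel coding after the speech bits.

    Simplification: in-band signalling and other overhead are ignored.
    """
    if speech_rate > gross:
        raise ValueError("speech rate exceeds the gross channel rate")
    return gross - speech_rate


if __name__ == "__main__":
    for mode in AMR_MODES:
        left = protection_budget(FULL_RATE_GROSS, mode)
        print(f"{mode / 1000:5.2f} kbit/s speech -> {left / 1000:5.2f} kbit/s protection")
```

Under these simplifications, the lowest mode leaves most of the full-rate channel for error protection, which suits poor radio conditions, whereas the 12.2 kbit/s mode leaves the same 10.6 kbit/s as EFR.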

For the adaptation of the uplink codec mode, the network must estimate the channel quality, identify the best codec mode for the existing propagation conditions and send this information to the Mobile Station (handset etc.) over the air interface. For the downlink codec adaptation, the Mobile Station must estimate the downlink channel quality and send quality information to the network. This information is used to define a 'suggested' codec mode.
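The mapping from an estimated channel quality to a 'suggested' codec mode can be pictured as a simple threshold lookup. This is a minimal sketch under stated assumptions, not the standardized procedure: the active set of modes, the threshold values in dB and the function name `suggest_mode` are all hypothetical, and a real implementation would likely add hysteresis to avoid rapid switching.

```python
from bisect import bisect_right

# Hypothetical active codec set in bit/s, ordered from most robust
# (lowest speech rate) to least robust (highest speech rate).
ACTIVE_SET = [4750, 5900, 7400, 12200]

# Hypothetical quality thresholds in dB (for example an estimated
# carrier-to-interference ratio). At or above THRESHOLDS_DB[i], the
# mode ACTIVE_SET[i + 1] is considered usable.
THRESHOLDS_DB = [4.0, 7.0, 11.0]


def suggest_mode(quality_db):
    """Return the highest-rate mode whose threshold the quality reaches."""
    index = bisect_right(THRESHOLDS_DB, quality_db)
    return ACTIVE_SET[index]


if __name__ == "__main__":
    for quality in (2.0, 5.5, 8.0, 15.0):
        print(f"{quality:4.1f} dB -> {suggest_mode(quality)} bit/s")
```

In this picture the network would run such a lookup on its own uplink measurements, while for the downlink it would work from the quality information reported by the Mobile Station.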

Each link may use a different codec mode, but both links must use the same channel mode (either full rate or half rate). The channel mode is selected by the Radio Resource management function in the network, either at call set-up or after a handover between cells. The channel mode can also be changed during a call as a function of the channel conditions.

Adaptive Multi-Rate Wideband (AMR-WB) codec

In March 2001, 3GPP™ approved the technical specifications for the Adaptive Multi-Rate Wideband (AMR-WB) coding algorithm, as part of 3GPP™ Release 5. The International Telecommunication Union (ITU-T) Study Group 16 approved the same wideband coding algorithm as Recommendation G.722.2 and its Annexes in January 2002.

The AMR-WB codec provides an audio bandwidth from 50 Hz up to 7 kHz, compared with the conventional 3.1 kHz of traditional telephony (300-3400 Hz). It operates in nine modes (bit rates) between 6.60 and 23.85 kbit/s and includes Voice Activity Detection (VAD), Discontinuous Transmission (DTX) and Comfort Noise Generation (CNG) operations. The coding scheme is called 'Multi-Rate Algebraic Code Excited Linear Prediction'.
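To put these figures in perspective, the sketch below compares the two audio bandwidths and converts each AMR-WB mode into a payload size per speech frame. The text gives only the lowest and highest bit rates; the intermediate rates and the 20 ms frame length used here are taken from the AMR-WB specification (ITU-T G.722.2), and the function names are assumptions for this example.

```python
# AMR-WB speech modes in bit/s. Only the first and last values appear in
# the text; the intermediate ones come from the AMR-WB specification.
AMR_WB_MODES = [6600, 8850, 12650, 14250, 15850, 18250, 19850, 23050, 23850]

# Frame length in milliseconds (from the specification, not from the text).
FRAME_MS = 20


def bits_per_frame(rate_bps):
    """Return the number of speech bits carried in one frame."""
    return rate_bps * FRAME_MS // 1000


def audio_bandwidth_hz(low_hz, high_hz):
    """Return the width of an audio band in Hz."""
    return high_hz - low_hz


if __name__ == "__main__":
    print("Narrowband width:", audio_bandwidth_hz(300, 3400), "Hz")
    print("Wideband width:  ", audio_bandwidth_hz(50, 7000), "Hz")
    for rate in AMR_WB_MODES:
        print(f"{rate / 1000:5.2f} kbit/s -> {bits_per_frame(rate)} bits per frame")
```

The wideband signal covers more than twice the audio band of traditional telephony, yet the lowest AMR-WB mode needs only about half the bit rate of the original GSM Full Rate codec.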

The range of bit rates allows the AMR-WB codec to be used on GSM Full Rate channels, GERAN (GSM/EDGE Radio Access Network; EDGE stands for Enhanced Data rates for GSM Evolution) 8-Phase Shift Keying (8-PSK) channels, and 3G UMTS™ Terrestrial Radio Access Network Wideband Code Division Multiple Access (UTRAN WCDMA) channels. In GSM, link adaptation is used to optimize the perceived transmission quality based on measurement reports of the radio channel quality. AMR-WB is required in 3GPP™ for Multimedia Messaging Service (MMS), Packet-switched Streaming Service (PSS), Multimedia Broadcast and Multicast Service (MBMS) and Packet-Switched Conversational Services, when 16 kHz sampled speech is used.

In addition to 3GPP™ wireless applications, further applications were targeted by ITU-T standardization, including Voice over IP (VoIP), Internet applications, Public Switched Telephone Network (PSTN) and Integrated Services Digital Network (ISDN) wideband telephony, and audio/video teleconferencing.

Audio codecs

Mobile streaming audio and messaging services may contain speech only, music only, or speech mixed with music in the background. With such mixed streaming content, the codecs described so far have difficulty performing consistently well for both speech and music at low bit rates (i.e. well below 32 kbit/s).

Radio resources and channel capacity set further limitations on data rates available for streaming. Streamed audio content should be made available at a low bit-rate well below 32 kbit/s, corresponding to the bit-rate range already used in the AMR-WB codec. If video is included in the content, the data rate should be as low as possible.

For these reasons, in March 2005 3GPP™ introduced (in Release 6) two new 'audio' codecs for Packet Switched Streaming Service (PSS), Multimedia Messaging Service (MMS), Multimedia Broadcast and Multicast Service (MBMS), IMS Messaging Service and Presence Service. These are the Extended AMR Wideband (AMR-WB+) codec and the Enhanced aacPlus codec.

Extended Adaptive Multi-Rate - Wideband (AMR-WB+) codec

The Extended AMR-WB codec (AMR-WB+) was initially targeted at wideband applications; it extends the AMR-WB codec (with new modes) for use in packet-switched streaming and messaging services, as well as for MBMS, IMS Messaging and Presence. As this codec only adds modes to the existing AMR-WB codec, there are no service or architectural impacts.

The work therefore consisted of enhancing the existing AMR-WB codec for audio applications by developing an audio extension based on the 3GPP™ AMR-WB speech codec. The audio extension is primarily intended for non-conversational services. Among the main objectives of the audio extension were:

  • High perceptual quality with speech, music and mixed content
  • Music performance comparable to the quality of state-of-the-art audio codecs
  • Speech performance at least as good as that of AMR-WB
  • Bit rates similar to those of the AMR-WB codec, in order to ensure efficient use of radio resources
  • Mono and stereo coding

Enhanced aacPlus codec

The Enhanced aacPlus is an extended and improved version based on the recommended Release 5 Audio codec AAC-LC. It is optimized for high audio quality at low bitrates and is therefore well suited for services such as PSS, MMS, MBMS, and Presence. In particular, the Enhanced aacPlus codec offers the following capabilities:

  • Excellent (CD-like) audio quality at bitrates well below 64 kbit/s
  • Efficient stereo modes, enabling high quality stereo starting at bitrates below 24 kbit/s
  • Music quality across the full bit-rate range exceeding that of any other audio codec known today
  • Flexible configuration allowing use of any particular bit-rate starting from 8 kbit/s
  • Low computational complexity for decoder and encoder
  • Fully specified in 3GPP™, including optimized floating-point and fixed-point source code



Standards

The following is a list of recently published and frequently downloaded standards. Please use the ETSI Work Programme to find further related standards.

Standard No.    Standard title
TR 146 085      Digital cellular telecommunications system (Phase 2+); Subjective tests on the interoperability of the Half Rate / Full Rate / Enhanced Full Rate (HR/FR/EFR) speech codecs, single, tandem and tandem free operation (3GPP™ TR 46.085 version 7.0.0 Release 7)
EN 301 245      Digital cellular telecommunications system (Phase 2) (GSM); Enhanced Full Rate (EFR) speech transcoding (GSM 06.60 version 4.1.1)
EN 301 713      Digital cellular telecommunications system (Phase 2+) (GSM); Test sequences for the Adaptive Multi-Rate (AMR) speech codec (GSM 06.74 version 7.0.3 Release 1998)
TR 126 936      Digital cellular telecommunications system (Phase 2+); Universal Mobile Telecommunications System (UMTS™); Performance characterization of 3GPP™ audio codecs (3GPP™ TR 26.936 version 7.0.0 Release 7)