/*====================================================================================
    3GPP TS26.258 Nov 20, 2025. IVAS Codec Version IVAS-FL-3.0
  ====================================================================================*/


These files represent the 3GPP EVS Codec Extension for Immersive Voice and 
Audio Services (IVAS) floating-point C simulation. All code is written
in ISO/IEC C99. The system is implemented as six separate programs:

        IVAS_cod        IVAS Encoder
        IVAS_dec        IVAS Decoder
        IVAS_rend       IVAS External Renderer
        ISAR_post_rend  ISAR Post Renderer
        IVAS_cod_fmtsw  IVAS Encoder with support for format switching
        ambi_converter  example program for Ambisonics format conversion

For encoding using the coder program, the input is a binary
audio file (*.8k, *.16k, *.32k, *.48k) and the output is a binary
encoded parameter file (*.192).  For decoding using the decoder program,
the input is a binary parameter file (*.192) and the output is a binary
synthesized audio file (*.8k, *.16k, *.32k, *.48k). For certain audio
formats (ISM, MASA), there are additional metadata files required. Audio 
channels are interleaved in the input and output audio file. 
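
As an illustration of the interleaved layout described above, the following minimal
sketch splits an interleaved 16-bit PCM buffer into one contiguous run per channel.
The function name deinterleave_pcm16 and the output layout are assumptions made for
this example; they are not part of the IVAS libraries.

```c
#include <stddef.h>
#include <stdint.h>

/* Split an interleaved buffer (ch0, ch1, ..., ch0, ch1, ...) into one
 * contiguous run per channel: out[c * n_frames + i] holds sample i of
 * channel c. Hypothetical helper, not part of the IVAS API. */
static void deinterleave_pcm16(const int16_t *in, int16_t *out,
                               size_t n_frames, size_t n_channels)
{
    for (size_t i = 0; i < n_frames; i++) {
        for (size_t c = 0; c < n_channels; c++) {
            out[c * n_frames + i] = in[i * n_channels + c];
        }
    }
}
```

For a stereo file, n_channels is 2 and the input holds left and right samples
alternately.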


                            FILE FORMATS:
                            =============

The file format of the supplied binary data (*.8k, *.16k, *.32k, *.48k,
*.192) is 16-bit binary data which is read and written in 16-bit words.  
The data is therefore platform DEPENDENT.  
The files contain only data, i.e., there is no header.
The test files included in this package are "PC" format, meaning that the
least significant byte of the 16-bit word comes first in the files.

If the software is to be run on a platform with a different byte order than
the PC, such as an HP (HP-UX) or a Sun, the binary files need to be modified
by swapping the byte order in the files.
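
The byte swap mentioned above can be sketched as follows. This is a minimal
example and not code from the package; the names swap16 and swap16_buffer are
chosen here for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Exchange the low and high byte of one 16-bit word. */
static uint16_t swap16(uint16_t w)
{
    return (uint16_t)((w << 8) | (w >> 8));
}

/* Swap every 16-bit word of a buffer in place, e.g. after reading a
 * "PC" format file on a big-endian machine. */
static void swap16_buffer(uint16_t *buf, size_t n_words)
{
    for (size_t i = 0; i < n_words; i++) {
        buf[i] = swap16(buf[i]);
    }
}
```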

The input and output files (*.8k, *.16k, *.32k, *.48k) are 16-bit integer 
PCM files with 8/16/32/48 kHz sampling rate with no headers. Alternatively, 
the input and output files are WAV files.

The Encoder produces bitstream files in either ITU G.192 or MIME file
storage format.

Using ITU G.192 format:

For every 20 ms input audio frame, the encoded bitstream contains the
following data: 

    Word16 SyncWord
    Word16 DataLen
    Word16 1st DataBit
    Word16 2nd DataBit
    .
    .
    .
    Word16 Nth DataBit


The SyncWord from the encoder is always 0x6b21. If the decoder receives
a SyncWord of 0x6b20, it indicates that the current frame was received in
error (bad frame). 

The DataLen parameter gives the number of audio data bits in the
frame. For example, when DTX is used, DataLen for NO_DATA frames is zero.

Each bit is presented as follows: Bit 0 = 0x007f, Bit 1 = 0x0081.
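
The frame layout above can be read with a few lines of C. The following is a
minimal sketch rather than the reader used by the decoder: the function name
g192_read_frame and its error handling are assumptions, it expects the host byte
order to match the file, and it treats any word other than 0x007f or 0x0081 as a
malformed frame.

```c
#include <stdint.h>
#include <stdio.h>

#define G192_SYNC_GOOD 0x6b21 /* good frame */
#define G192_SYNC_BAD  0x6b20 /* frame received in error */
#define G192_BIT_0     0x007f
#define G192_BIT_1     0x0081

/* Read one G.192 frame from f (hypothetical helper).
 * Returns 1 on success, 0 at end of file, -1 on a malformed frame.
 * bits[] receives one 0/1 value per data bit, *n_bits receives DataLen,
 * *bad_frame is set to 1 when the SyncWord is 0x6b20. */
static int g192_read_frame(FILE *f, uint8_t *bits, size_t max_bits,
                           size_t *n_bits, int *bad_frame)
{
    uint16_t hdr[2]; /* SyncWord, DataLen */
    size_t got = fread(hdr, sizeof(uint16_t), 2, f);

    if (got == 0) {
        return 0;
    }
    if (got != 2 || (hdr[0] != G192_SYNC_GOOD && hdr[0] != G192_SYNC_BAD) ||
        hdr[1] > max_bits) {
        return -1;
    }
    *bad_frame = (hdr[0] == G192_SYNC_BAD);
    *n_bits = hdr[1];

    for (size_t i = 0; i < *n_bits; i++) {
        uint16_t w;
        if (fread(&w, sizeof(uint16_t), 1, f) != 1) {
            return -1;
        }
        if (w == G192_BIT_1) {
            bits[i] = 1;
        } else if (w == G192_BIT_0) {
            bits[i] = 0;
        } else {
            return -1;
        }
    }
    return 1;
}
```

The bitrate of a frame follows from DataLen: with 20 ms frames, DataLen bits per
frame correspond to DataLen * 50 bits per second.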

Using MIME file storage format: 

The MIME file storage format is a byte-based format which is
appropriate for media file storage or as a format for email/MMS
attachments. 

Encoder: With the "-mime" option, the encoder always produces the EVS-mime
storage format specified in TS 26.445, Annex A.2.6. The AMRWB-mime (RFC 4867)
storage format is not supported by the encoder. 

Decoder: With the "-mime" option, the decoder can parse both EVS-mime
format storage files and AMRWB-mime (RFC4867) storage format files. 
The decoder automatically distinguishes between the two
mime storage formats by reading the initial Magic Word in the bitstream
file. The EVS-mime storage format is described in TS 26.445, Annex
A.2.6. The AMRWB-mime storage format is described in RFC-4867. 
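
The magic-word check described above can be sketched as below. This is not the
decoder's own code. The two magic strings are stated here as assumptions based on
the cited specifications ("#!EVS_MC1.0\n" for EVS-mime and "#!AMR-WB\n" for
single-channel AMRWB-mime) and should be verified against TS 26.445 and RFC 4867
before use; the names mime_kind and mime_detect are hypothetical.

```c
#include <string.h>

enum mime_kind { MIME_UNKNOWN = 0, MIME_EVS, MIME_AMRWB };

/* Classify a storage file by the first bytes of its header.
 * Magic strings are assumptions, see the text above. */
static enum mime_kind mime_detect(const unsigned char *buf, size_t len)
{
    static const char evs_magic[] = "#!EVS_MC1.0\n";
    static const char amrwb_magic[] = "#!AMR-WB\n";
    const size_t evs_len = sizeof(evs_magic) - 1;     /* without '\0' */
    const size_t amrwb_len = sizeof(amrwb_magic) - 1; /* without '\0' */

    if (len >= evs_len && memcmp(buf, evs_magic, evs_len) == 0) {
        return MIME_EVS;
    }
    if (len >= amrwb_len && memcmp(buf, amrwb_magic, amrwb_len) == 0) {
        return MIME_AMRWB;
    }
    return MIME_UNKNOWN;
}
```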


                      INSTALLING THE SOFTWARE
                      =======================

Installing the software on the PC:

First unpack the compressed folder into your directory. After that you 
should have the following structure:

.
`-- c-code
    |-- readme.txt
    |-- Makefile
    |-- Workspace_msvc
    |-- apps
    |-- lib_com
    |-- lib_debug
    |-- lib_dec
    |-- lib_enc
    |-- lib_isar
    |-- lib_lc3plus
    |-- lib_rend
    |-- lib_util
    |-- scripts

The package includes a Makefile for gcc, which has been verified on
32-bit Linux systems. The code can be compiled by entering the directory
"c-code" and typing the command: make. The resulting encoder/decoder/renderer/
ISAR_post_renderer executables are named "IVAS_cod", "IVAS_dec", "IVAS_rend",
and "ISAR_post_rend". All reside in the c-code directory. In addition, this 
directory will contain a version of the encoder with support for format switching 
(named "IVAS_cod_fmtsw") and an example program for Ambisonics format conversion 
(named "ambi_converter").

The package also includes a solution file for Microsoft Visual Studio 2017 (x86). 
To compile the code, please open "Workspace_msvc\Workspace_msvc.sln" and build 
"encoder" for the encoder, "decoder" for the decoder, and "renderer" for the 
renderer executable. The resulting encoder/decoder/renderer/ISAR_post_renderer 
executables are "IVAS_cod.exe", "IVAS_dec.exe", "IVAS_rend.exe", and
"ISAR_post_rend.exe". All reside in the c-code main directory. In addition, this 
directory will contain a version of the encoder with support for format switching 
(named "IVAS_cod_fmtsw.exe") and an example program for Ambisonics format conversion 
(named "ambi_converter.exe").



                       INTEGRATION AS LIBRARIES
                       ========================

While this package contains the necessary applications to execute the IVAS encoder,
decoder, renderer and ISAR post renderer, it is envisioned that the libraries used
would be integrated into custom applications.

It should be noted that this library is not thread-safe by default. Thus, when using
the IVAS libraries in a multi-threaded environment, proper synchronization of API
calls is required to prevent race conditions caused by concurrent access to IVAS
internal state memory, FIFO queue buffers or any other data structures. Potential
mechanisms include, e.g., mutexes, spinlocks and semaphores. The API calls are at
present not optimized for fine-granular locking of just the critical sections. Some
sensitive sections have thus been marked with comments of the form
/* LOCK XYZ BEGIN */ and /* LOCK XYZ END */ to provide guidance on where code could
be modified to prevent some potential race conditions.
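
One coarse-grained way to provide the synchronization described above is to guard
every API call on a codec instance with a single mutex. The sketch below uses POSIX
threads. The types codec_t and locked_codec_t and the function codec_process_frame
are hypothetical stand-ins for an IVAS handle and one of its API calls; they do not
exist in the IVAS libraries.

```c
#include <pthread.h>

/* Hypothetical stand-in for a codec instance with internal state. */
typedef struct {
    int frames_done;
} codec_t;

/* Hypothetical stand-in for a non-thread-safe API call. */
static int codec_process_frame(codec_t *c)
{
    c->frames_done++;
    return 0;
}

/* Codec instance bundled with the mutex that guards it. */
typedef struct {
    codec_t codec;
    pthread_mutex_t lock;
} locked_codec_t;

static int locked_codec_init(locked_codec_t *lc)
{
    lc->codec.frames_done = 0;
    return pthread_mutex_init(&lc->lock, NULL);
}

/* Serialize all access to the instance: only one thread at a time can be
 * inside the wrapped API call. */
static int locked_codec_process_frame(locked_codec_t *lc)
{
    pthread_mutex_lock(&lc->lock);
    int rc = codec_process_frame(&lc->codec);
    pthread_mutex_unlock(&lc->lock);
    return rc;
}
```

Locking around whole API calls is simple but serializes the callers. Finer-grained
locking would need changes inside the library at the marked sections.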


                       RUNNING THE SOFTWARE
                       ====================

The usage of the "IVAS_cod" program is as follows:
--------------------------------------------------

Usage: IVAS_cod [Options] R Fs input_file bitstream_file

Mandatory parameters:
---------------------
R                   : Bitrate in bps,
                      for EVS native modes R = (5900*, 7200, 8000, 9600, 13200, 16400,
                                                24400, 32000, 48000, 64000, 96000, 128000)
                                                *VBR mode (average bitrate),
                      for AMR-WB IO modes R =  (6600, 8850, 12650, 14250, 15850, 18250,
                                                19850, 23050, 23850)
                      for IVAS stereo R =      (13200, 16400, 24400, 32000, 48000, 64000, 80000,
                                                96000, 128000, 160000, 192000, 256000)
                      for IVAS ISM R =          13200 for 1 ISM, 16400 for 1 ISM and 2 ISM,
                                               (24400, 32000, 48000, 64000, 80000, 96000,128000)
                                                for 2 ISM, 3 ISM and 4 ISM also 160000, 192000, 256000
                                                for 3 ISM and 4 ISM also 384000
                                                for 4 ISM also 512000
                      for IVAS SBA, MASA, MC, ISM-SBA, and ISM-MASA R=(13200, 16400, 24400, 32000, 
                      48000, 64000, 80000, 96000, 128000, 160000, 192000, 256000, 384000, 512000)
                      Alternatively, R can be a bitrate switching file which consists of R values
                      indicating the bitrate for each frame in bps. These values are stored in
                      binary format using 4 bytes per value
Fs                  : Input sampling rate in kHz, Fs = (8, 16, 32 or 48)
input_file          : Input audio filename
bitstream_file      : Output bitstream filename

Options:
--------
EVS mono is default, for IVAS choose one of the following: -stereo, -ism, -sba, -masa, -mc, -ism_sba, -ism_masa
-stereo             : Stereo format
-ism [+]Ch Files    : ISM format
                      where Ch specifies the number of ISMs (1-4)
                      where positive (+) indicates extended metadata (only 64 kbps and up)
                      and Files specify input files containing metadata, one file per object
                      (use NULL for no input metadata)
-sba +/-Order       : Scene Based Audio input format (Ambisonics ACN/SN3D),
                      where Order specifies the Ambisonics order (1-3),
                      where positive (+) means full 3D and negative (-) only 2D/planar components to be coded
-masa Ch File       : MASA format
                      where Ch specifies the number of MASA input/transport channels (1 or 2),
                      and File specifies input file containing parametric MASA metadata
-ism_sba IsmCh +/-Order IsmFiles : SBA and ISM combined format
                      where IsmCh specifies the number of ISMs (1-4),
                      Order specifies the Ambisonics order (1-3)
                      and IsmFiles specify input files containing ISM metadata, one file per object
-ism_masa IsmCh MasaCh IsmFiles MasaFile : MASA and ISM combined format
                      where IsmCh specifies the number of ISMs (1-4),
                      MasaCh specifies the number of MASA input/transport channels (1-2),
                      IsmFiles specify input files containing ISM metadata, one file per object,
                      and MasaFile specifies input file containing parametric MASA metadata
-mc InputConf       : Multi-channel format
                      where InputConf specifies the channel configuration: 5_1, 7_1, 5_1_2, 5_1_4, 7_1_4
                      Loudspeaker positions are assumed to have azimuth and elevation as per
                      ISO/IEC 23091-3:2018 Table 3. Channel order is as per ISO/IEC 23008-3:2015 Table 95.
                      See below for details.
-dtx D              : Activate DTX mode, D = (0, 3-100) is the SID update rate
                      where 0 = adaptive, 3-100 = fixed in number of frames, default is deactivated
-dtx                : Activate DTX mode with a SID update rate of 8 frames
                      Note: DTX is supported in EVS, stereo, ISM, MASA, and SBA up to 80kbps
-rf p o             : Activate channel-aware mode in EVS for WB and SWB signal at 13.2kbps,
                      where FEC indicator, p: LO or HI, and FEC offset, o: 2, 3, 5, or 7 in number of frames.
                      Alternatively p and o can be replaced by a rf configuration file in which each line
                      contains the values of p and o separated by a space, default is deactivated
-max_band B         : Activate bandwidth limitation, B = (NB, WB, SWB or FB)
                      alternatively, B can be a text file where each line contains "nb_frames B"
-no_delay_cmp       : Turn off delay compensation
-stereo_dmx_evs     : Stereo downmix function for EVS
-binaural           : Optional indication that input is binaural audio (to be used with -stereo or -stereo_dmx_evs)
-mime               : Mime output bitstream file format
                      The encoder produces TS26.445 Annex A.2.6 Mime Storage Format (not RFC4867 Mime Format).
                      default output bitstream file format is G.192
-pca                : activate PCA in SBA format FOA at 256 kbps
-level level        : Complexity level, level = (1, 2, 3), will be defined after characterisation.
                      Currently, all values default to level 3 (full functionality).
-q                  : Quiet mode, limit printouts to terminal, default is deactivated
-rtpdump <N>        : RTPDump output, hf_only=1 by default. The encoder will packetize the
                      bitstream frames into TS26.253 Annex A IVAS RTP Payload Format packets and
                      write them to the output file. In EVS mono operating mode, TS26.445 Annex A.2.2
                      EVS RTP Payload Format is used. Optional N represents number of frames per RTP packet
-scene_orientation  : Scene orientation trajectory file. Only used with rtpdump output.
-device_orientation : Device orientation trajectory file. Only used with rtpdump output.
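
As noted for the R parameter above, a bitrate switching file stores one bitrate
value per frame in binary format using 4 bytes per value. The sketch below writes
such a file. It assumes that each value is a 32-bit integer in the byte order of
the machine running the encoder, which the text above does not state explicitly;
the function name write_bitrate_profile is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Write n per-frame bitrates (in bps) as raw 4-byte values.
 * Assumption: native byte order, one int32_t per frame.
 * Returns 0 on success, -1 on any I/O error. */
static int write_bitrate_profile(const char *path, const int32_t *rates_bps,
                                 size_t n)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL) {
        return -1;
    }
    size_t written = fwrite(rates_bps, sizeof(int32_t), n, f);
    /* Always close the file, then report a short write as an error. */
    if (fclose(f) != 0 || written != n) {
        return -1;
    }
    return 0;
}
```

The resulting file name would then be passed in place of the numeric R value on
the IVAS_cod command line.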


The usage of the "IVAS_dec" program is as follows:
--------------------------------------------------

Usage for EVS:   IVAS_dec [Options] Fs bitstream_file output_file
                 OR usage for IVAS (below) with -evs option and OutputConf
Usage for IVAS:  IVAS_dec [Options] OutputConf Fs bitstream_file output_file

Mandatory parameters:
---------------------
OutputConf           : Output configuration: MONO, STEREO, 5_1, 7_1, 5_1_2, 5_1_4, 7_1_4, FOA,
                       HOA2, HOA3, BINAURAL, BINAURAL_ROOM_IR, BINAURAL_ROOM_REVERB, 
                       BINAURAL_SPLIT_CODED, BINAURAL_SPLIT_PCM, EXT
                       By default, channel order and loudspeaker positions are equal to the
                       encoder. For loudspeaker outputs, OutputConf can be a custom loudspeaker
                       layout file. See below for details.
                       Parameter is only used when decoding IVAS bitstream.
Fs                   : Output sampling rate in kHz (8, 16, 32 or 48)
bitstream_file       : Input bitstream filename or RTP packet filename (in VOIP mode)
output_file          : Output audio filename

Options:
--------
-evs                : Specify that the supplied bitstream is an EVS bitstream
-VOIP               : VoIP mode: RTP in G192
-VOIP_hf_only=0     : VoIP mode: EVS RTP Payload Format hf_only=0 in rtpdump
-VOIP_hf_only=1     : VoIP mode: EVS or IVAS RTP Payload Format hf_only=1 in rtpdump
                      The decoder may read rtpdump files containing TS26.445 Annex A.2.2
                      EVS RTP Payload Format or rtpdump files containing TS26.253 Annex A
                      IVAS RTP Payload Format. The SDP parameter hf_only is required.
                      Reading RFC4867 AMR/AMR-WB RTP payload format is not supported.
-Tracefile TF       : VoIP mode: Generate trace file named TF. Requires -no_delay_cmp to
                      be enabled so that trace contents remain in sync with audio output.
-fec_cfg_file       : Optimal channel aware configuration computed by the JBM
                      as described in Section 6.3.1 of TS26.448. The output is
                      written into a .txt file. Each line contains the FER indicator
                      (HI|LO) and optimal FEC offset.
-no_delay_cmp       : Turn off delay compensation
-mime               : Mime bitstream file format
                      The decoder may read both TS26.445 Annex A.2.6 and RFC4867 Mime Storage
                      Format files, the magic word in the mime file is used to determine
                      which of the two supported formats is in use.
                      default bitstream file format is G.192
-fr L               : render frame size in ms L=(5,10,20), default is 20
-hrtf File          : HRTF filter File used in BINAURAL rendering
-T File             : Head rotation specified by external trajectory File
-otr tracking_type  : Head orientation tracking type: 'none', 'ref', 'avg', 'ref_vec'
                      or 'ref_vec_lev' (only for binaural rendering)
-rf File            : Reference rotation specified by external trajectory File
                      works only in combination with '-otr ref' mode
-rvf File           : Reference vector specified by external trajectory File
                      works only in combination with '-otr ref_vec' and 'ref_vec_lev' modes
-render_config File : Binaural renderer configuration parameters in File (only for binaural outputs)
-room_size (S|M|L)  : Selects default reverb based on a room size (S - small | M - medium | L - large)
                      for BINAURAL_ROOM_REVERB output configuration
-non_diegetic_pan P : panning mono non-diegetic sound to stereo -90<= P <=90,
                      left or l or 90->left, right or r or -90->right, center or c or  0->middle
-exof File          : External orientation trajectory File for simulation of external orientations
-dpid ID            : Directivity pattern ID(s) (space-separated list of up to 4 numbers can be 
                      specified) for binaural output configurations
-aeid ID | File     : Acoustic environment ID (number > 0) or a text file where each line 
                      contains "ID duration" for BINAURAL_ROOM_REVERB output configuration
-obj_edit File      : Object editing instructions file or NULL for built-in example
-level level        : Complexity level, level = (1, 2, 3), will be defined after characterisation
                      Currently, all values default to level 3 (full functionality)
-om File            : Coded metadata File for BINAURAL_SPLIT_PCM output configuration
-q                  : Quiet mode, limit printouts to terminal, default is deactivated


The usage of the "IVAS_rend" program is as follows:
---------------------------------------------------

Usage: IVAS_rend [Options]

Options:
--------
-i File             : Input audio File (WAV, raw PCM or scene description file)
-if Format          : Audio Format of input file (e.g. 5_1 or HOA3 or META, use -l for a list)
                      META is related to the Scene description file, see scripts/testv/renderer_config_format_readme.txt
-im Files           : Metadata files for ISM/MASA/OMASA/OSBA/BINAURAL_SPLIT_PCM (one file per object). 
                      For OMASA input, ISM files must be specified first.
-o File             : Output audio File
-of Format          : Audio Format of output file
                      Alternatively, it can be a custom loudspeaker layout File
-fs                 : Input sampling rate in kHz (16, 32, 48) - required only with raw PCM inputs
-fr L               : render frame size in ms L=(5,10,20), default is 20
-hrtf File          : Custom HRTF File for binaural rendering (only for binaural outputs)
-T File             : Head rotation trajectory File for simulation of head tracking (only for binaural outputs)
-otr tracking_type  : Head orientation tracking type: 'none', 'ref', 'avg', 'ref_vec' or 'ref_vec_lev' (only for binaural outputs)
-rf File            : Reference rotation trajectory File for simulation of head tracking (only for binaural outputs)
-rvf File           : Reference vector trajectory File for simulation of head tracking (only for binaural outputs)
-render_config File : Binaural renderer configuration parameters in File (only for binaural outputs)
-room_size (S|M|L)  : Selects default reverb based on a room size (S - small | M - medium | L - large)
-non_diegetic_pan P : Panning mono non-diegetic sound to stereo -90<= P <= 90
                      left or l or 90->left, right or r or -90->right, center or c or 0 ->middle
-exof File          : External orientation trajectory File for simulation of external orientations
-dpid ID            : Directivity pattern ID(s) (space-separated list of up to 4 numbers can be 
                      specified) for binaural output configurations
-aeid ID | File     : Acoustic environment ID (number > 0) or a text file where each line 
                      contains "ID duration" for BINAURAL_ROOM_REVERB output configuration
-lp Position        : Output LFE position. Comma-delimited triplet of [gain, azimuth, elevation] where gain is linear 
                      (like --gain, -g) and azimuth, elevation are in degrees
                      If specified, overrides the default behavior which attempts to map input to output LFE channel(s)
-lm File            : LFE panning matrix File (CSV table) containing a matrix of dimensions 
                      [ num_input_lfe x num_output_channels ] with elements specifying linear routing gain (like --gain, -g). 
                      If specified, overrides the output LFE position option and the default behavior which attempts to map input to output LFE channel(s)
-no_delay_cmp       : Turn off delay compensation
-g                  : Input gain (linear, not in dB) to be applied to input audio file
-l                  : List supported audio formats
-smd                : Metadata Synchronization Delay in ms, Default is 0. Quantized by 5ms subframes.
-om File            : Coded metadata File for BINAURAL_SPLIT_PCM output configuration
-level level        : Complexity level, level = (1, 2, 3), will be defined after characterisation
                      Currently, all values default to level 3 (full functionality).
-q                  : Quiet mode, limit printouts to terminal, default is deactivated


The usage of the "ISAR_post_rend" program is as follows:
--------------------------------------------------------

Usage: ISAR_post_rend [options]

Options:
--------
-i File             : Input File (input file is bitstream if format is BINAURAL_SPLIT_CODED, or PCM/WAV file if format is BINAURAL_SPLIT_PCM)
-if Format          : Input Format of input (BINAURAL_SPLIT_CODED, BINAURAL_SPLIT_PCM)
-im File            : Coded metadata File for BINAURAL_SPLIT_PCM input format
-o File             : Output Audio File in BINAURAL format
-fs                 : Input sampling rate in kHz (48)
-prbfi File         : BFI File


The usage of the "ambi_converter" program is as follows:
--------------------------------------------------------

Usage: ambi_converter input_file output_file input_convention output_convention

input_convention and output_convention must each be an integer in [0,5].
The following conventions are supported:
0 : ACN-SN3D
1 : ACN-N3D
2 : FuMa-MaxN
3 : FuMa-FuMa
4 : SID-SN3D
5 : SID-N3D

Either the input or the output convention must always be ACN-SN3D.
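
To illustrate what such a conversion involves, the sketch below converts one frame
from ACN-N3D (convention 1) to ACN-SN3D (convention 0). It is not the
implementation used by ambi_converter. It relies on the standard relation between
the two normalizations, SN3D = N3D / sqrt(2l+1) for Ambisonics order l, with the
gains tabulated for orders 0 to 3; the function names are chosen for this example.

```c
#include <stddef.h>

/* 1/sqrt(2l+1) for Ambisonics orders l = 0, 1, 2, 3. */
static const float n3d_to_sn3d_gain[4] = {
    1.0f, 0.57735027f, 0.44721360f, 0.37796447f
};

/* Ambisonics order of ACN channel index acn: floor(sqrt(acn)). */
static int acn_order(int acn)
{
    int l = 0;
    while ((l + 1) * (l + 1) <= acn) {
        l++;
    }
    return l;
}

/* Rescale one frame (one sample per channel, ACN ordering) from N3D to
 * SN3D in place. Supports up to 16 channels (3rd order).
 * Returns 0 on success, -1 for an unsupported channel count. */
static int n3d_to_sn3d(float *frame, int n_channels)
{
    if (n_channels < 1 || n_channels > 16) {
        return -1;
    }
    for (int acn = 0; acn < n_channels; acn++) {
        frame[acn] *= n3d_to_sn3d_gain[acn_order(acn)];
    }
    return 0;
}
```

Conversions involving the FuMa or SID conventions additionally require a channel
reordering, which this sketch does not cover.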


The usage of the "IVAS_cod_fmtsw" program is as follows:
--------------------------------------------------------

Usage: IVAS_cod_fmtsw format_switching_file

Mandatory parameters:
---------------------
format_switching_file:   Text file containing a valid encoder command line in each line



                       MULTICHANNEL LOUDSPEAKER INPUT / OUTPUT CONFIGURATIONS
                       ======================================================
The loudspeaker positions for each MC layout are assumed to have the following azimuth and elevation
(as per ISO/IEC 23091-3:2018 Table 3); the 4th channel is LFE:
              5_1   -> CICP6:  azi |  30| -30|   0|   0| 110|-110|
                               ele |   0|   0|   0|   0|   0|   0|
              7_1   -> CICP12: azi |  30| -30|   0|   0| 110|-110| 135|-135|
                               ele |   0|   0|   0|   0|   0|   0|   0|   0|
              5_1_2 -> CICP14: azi |  30| -30|   0|   0| 110|-110|  30| -30|
                               ele |   0|   0|   0|   0|   0|   0|  35|  35|
              5_1_4 -> CICP16: azi |  30| -30|   0|   0| 110|-110|  30| -30| 110|-110|
                               ele |   0|   0|   0|   0|   0|   0|  35|  35|  35|  35|
              7_1_4 -> CICP19: azi |  30| -30|   0|   0| 135|-135|  90| -90|  30| -30| 135|-135|
                               ele |   0|   0|   0|   0|   0|   0|   0|   0|  35|  35|  35|  35|
Position is not considered for the LFE channel. Channel order is as per ISO/IEC 23008-3:2015 Table 95.

Additionally, at the decoder, OutputConf can be a custom loudspeaker layout file with the format:
               azi0, azi1, ... aziN-1
               ele0, ele1, ... eleN-1
               LFE0                  [optional]
Where the first two rows are comma-separated azimuth and elevation positions of the N loudspeakers. 
The output channel ordering is 0, 1, ... N-1. The third row contains an index "LFE0" (zero-based) 
specifying the output channel to which the LFE input will be routed if present. If the third row is 
omitted, the LFE input is downmixed to all channels with a factor of 1/N. Position is not considered for
the LFE channel. The maximum number of supported loudspeakers N is 16.
An example custom loudspeaker layout file is available: ls_setup_16ch_8+4+4.txt
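
The LFE handling for custom layouts described above can be sketched as follows.
This is an illustration of the two routing rules only, not the renderer's
implementation; the function name route_lfe and the convention of passing a
negative index when the third row is omitted are assumptions of this example.

```c
#include <stddef.h>

/* Add one LFE input sample to an output frame with n_out loudspeakers.
 * lfe_index >= 0: third row present, route the LFE to that channel.
 * lfe_index <  0: third row omitted, downmix to all channels with 1/N.
 * Returns 0 on success, -1 on invalid arguments. */
static int route_lfe(float *out, size_t n_out, int lfe_index, float lfe)
{
    if (n_out == 0 || n_out > 16) { /* at most 16 loudspeakers supported */
        return -1;
    }
    if (lfe_index >= 0) {
        if ((size_t)lfe_index >= n_out) {
            return -1;
        }
        out[lfe_index] += lfe;
        return 0;
    }
    for (size_t i = 0; i < n_out; i++) {
        out[i] += lfe / (float)n_out;
    }
    return 0;
}
```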


                       RUNNING THE SELF TEST
                       =====================

A codec verification script is available at https://forge.3gpp.org/rep/ivas-codec-pc/ivas-codec/ 
in scripts/self_test.py. The script demonstrates how to use the software at several operating points 
and compares the output to a reference version/implementation. 
Please note: In order to keep the run-time short, the script does not cover all
operating points.

Documentation on the self_test.py can be found as a part of scripts/README.md.

Note: Running the self_test.py requires the input vectors in the folder scripts/testv. 

stv1ISM48s.wav     - 1 channel (1 audio object), 48000 Hz, 1440000 samples
stv2ISM48s.wav     - 2 channels (discrete audio objects), 48000 Hz, 1440000 samples per channel
stv2OA32c.wav      - 9 channels (2nd order Ambisonics ACN/SN3D), 32000 Hz 
stv2OA48c.wav      - 9 channels (2nd order Ambisonics ACN/SN3D), 48000 Hz
stv3ISM48s.wav     - 3 channels (discrete audio objects), 48000 Hz, 1440000 samples per channel
stv3OA32c.wav      - 16 channels (3rd order Ambisonics ACN/SN3D), 32000 Hz, 288939 samples per channel
stv3OA48c.wav      - 16 channels (3rd order Ambisonics ACN/SN3D), 48000 Hz, 433408 samples per channel
stv4ISM48s.wav     - 4 channels (discrete audio objects), 48000 Hz, 1440000 samples per channel
stv4ISM48n.wav     - 4 channels (discrete audio objects), 48000 Hz, noisy speech
stv8c.wav          - 1 channel, 8000 Hz, clean speech/audio
stv8n.wav          - 1 channel, 8000 Hz, noisy speech
stv16c.wav         - 1 channel, 16000 Hz, 610307 samples, clean speech 
stv16n.wav         - 1 channel, 16000 Hz, 257024 samples, noisy speech
stv32c.wav         - 1 channel, 32000 Hz, 1220613 samples, clean speech/audio
stv32n.wav         - 1 channel, 32000 Hz, 514048 samples, noisy speech
stv48c.wav         - 1 channel, 48000 Hz, 960000 samples, clean speech/audio
stv48n.wav         - 1 channel, 48000 Hz, 931200 samples, noisy speech
stv51MC48c.wav     - 6 channels (5.1 1..6 where 4th channel is LFE), 960000 samples per channel, 48000 Hz
stv512MC48c.wav    - 8 channels (5.1+2 1..8 where 4th channel is LFE), 144000 samples per channel, 48000 Hz
stv514MC48c.wav    - 10 channels (5.1+4 1..10 where 4th channel is LFE), 144000 samples per channel, 48000 Hz
stv71MC48c.wav     - 8 channels (7.1 1..8 where 4th channel is LFE), 144000 samples per channel, 48000 Hz
stv714MC48c.wav    - 12 channels (7.1+4 1..12 where 4th channel is LFE), 144000 samples per channel, 48000 Hz
stvFOA16c.wav      - 4 channels (1st order Ambisonics ACN/SN3D), 16000 Hz
stvFOA32c.wav      - 4 channels (1st order Ambisonics ACN/SN3D), 32000 Hz, 288939 samples per channel
stvFOA48c.wav      - 4 channels (1st order Ambisonics ACN/SN3D), 48000 Hz, 433408 samples per channel
stvST16c.wav       - 2 channels, 16000 Hz, 329601 samples per channel, clean speech/audio
stvST16n.wav       - 2 channels, 16000 Hz, 310401 samples per channel, noisy speech
stvST32c.wav       - 2 channels, 32000 Hz, 659200 samples per channel, clean speech/audio
stvST32n.wav       - 2 channels, 32000 Hz, 620800 samples per channel, noisy speech
stvST48c.wav       - 2 channels, 48000 Hz, 988800 samples per channel, clean speech/audio
stvST48n.wav       - 2 channels, 48000 Hz, 931200 samples per channel, noisy speech
stv1MASA1TC48c.wav - 1 channel (1 MASA 1 transport channel), 48000 Hz, 144000 samples
stv1MASA1TC48n.wav - 1 channel (1 MASA 1 transport channel), 48000 Hz, 963840 samples
stv1MASA2TC48c.wav - 2 channels (1 MASA 2 transport channels), 48000 Hz, 288000 samples per channel
stv1MASA2TC48n.wav - 2 channels (1 MASA 2 transport channels), 48000 Hz, 963840 samples per channel
stv2MASA1TC48c.wav - 1 channel (2 MASA 1 transport channel), 48000 Hz, 288000 samples
stv2MASA2TC48c.wav - 2 channels (2 MASA 2 transport channels), 48000 Hz, 144000 samples per channel
stvOMASA_1ISM_1MASA2TC48c.wav - 3 channels (1 discrete audio object and 1 MASA 2 transport channels), 48000 Hz
stvOMASA_1ISM_2MASA1TC32c.wav - 2 channels (1 discrete audio object and 2 MASA 1 transport channel), 32000 Hz 
stvOMASA_1ISM_2MASA2TC48c.wav - 3 channels (1 discrete audio object and 2 MASA 2 transport channels), 48000 Hz
stvOMASA_2ISM_1MASA1TC16c.wav - 3 channels (2 discrete audio objects and 1 MASA 1 transport channel), 16000 Hz
stvOMASA_2ISM_1MASA2TC48c.wav - 4 channels (2 discrete audio objects and 1 MASA 2 transport channels), 48000 Hz
stvOMASA_2ISM_2MASA2TC48c.wav - 4 channels (2 discrete audio objects and 2 MASA 2 transport channels), 48000 Hz
stvOMASA_3ISM_1MASA1TC32c.wav - 4 channels (3 discrete audio objects and 1 MASA 1 transport channel), 32000 Hz
stvOMASA_3ISM_1MASA2TC16c.wav - 5 channels (3 discrete audio objects and 1 MASA 2 transport channels), 16000 Hz
stvOMASA_3ISM_1MASA2TC32c.wav - 5 channels (3 discrete audio objects and 1 MASA 2 transport channels), 32000 Hz
stvOMASA_3ISM_1MASA2TC48c.wav - 5 channels (3 discrete audio objects and 1 MASA 2 transport channels), 48000 Hz
stvOMASA_3ISM_2MASA1TC48c.wav - 4 channels (3 discrete audio objects and 2 MASA 1 transport channel), 48000 Hz
stvOMASA_3ISM_2MASA2TC32c.wav - 5 channels (3 discrete audio objects and 2 MASA 2 transport channels), 32000 Hz
stvOMASA_3ISM_2MASA2TC48c.wav - 5 channels (3 discrete audio objects and 2 MASA 2 transport channels), 48000 Hz
stvOMASA_4ISM_1MASA1TC48c.wav - 5 channels (4 discrete audio objects and 1 MASA 1 transport channel), 48000 Hz
stvOMASA_4ISM_1MASA2TC48c.wav - 6 channels (4 discrete audio objects and 1 MASA 2 transport channels), 48000 Hz
stvOMASA_4ISM_2MASA1TC48c.wav - 5 channels (4 discrete audio objects and 2 MASA 1 transport channel), 48000 Hz
stvOMASA_4ISM_2MASA2TC48c.wav - 6 channels (4 discrete audio objects and 2 MASA 2 transport channels), 48000 Hz

MASA metadata file
------------------
For the MASA operation modes, the following metadata files located in
the /scripts/testv/ folder are additionally required:

stv1MASA1TC48c.met
stv1MASA1TC48n.met
stv1MASA2TC48c.met
stv1MASA2TC48n.met
stv2MASA1TC48c.met
stv2MASA2TC48c.met

The detailed syntax of MASA metadata files can be found in 3GPP TS 26.258.

It is strongly recommended to align these files to the corresponding
PCM audio files. The MASA metadata files can be generated with the
latest version of the IVAS MASA C Reference Software, which was made
available at
https://www.3gpp.org/ftp/TSG_SA/WG4_CODEC/TSGS4_118-e/Docs/S4-220443.zip


Object based audio metadata file
--------------------------------
For the ISM operation modes, the following metadata files located in
the /scripts/testv/ folder are additionally required:

stvISM1.csv
stvISM2.csv
stvISM3.csv
stvISM4.csv

These are comma-separated (CSV) files which give the per-object position
in the format:
azimuth, elevation, radius, spread, gain, yaw, pitch, non-diegetic

Example metadata line with default values:
0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0

with the following meaning:
| Parameter    | format, value range | meaning
---------------------------------------------------------------------------------------------------
| azimuth      | float, [-180,180]   | azimuth or panning; positive indicates left; default: 0
---------------------------------------------------------------------------------------------------
| elevation    | float, [-90,90]     | elevation; positive indicates up; default: 0
---------------------------------------------------------------------------------------------------
| radius       | float, [0, 15.75]   | radius (extended metadata); default: 1
---------------------------------------------------------------------------------------------------
| spread       | float, [0,360]      | spread in angles from 0...360 deg; default: 0
---------------------------------------------------------------------------------------------------
| gain         | float, [0,1]        | gain; default: 1
---------------------------------------------------------------------------------------------------
| yaw          | float, [-180,180]   | yaw (extended metadata); positive indicates left; default: 0
---------------------------------------------------------------------------------------------------
| pitch        | float, [-90,90]     | pitch (extended metadata); positive indicates up; default: 0
---------------------------------------------------------------------------------------------------
| non-diegetic | float*, [0,1]       | flag for activation of non-diegetic rendering; default: 0
|                                    | if the flag is set to 1, the panning gain is specified by azimuth,
|                                    | with a value in [-90,90]: 90 left, -90 right, 0 center
---------------------------------------------------------------------------------------------------
*Read as float value for convenience, but used as an integer flag internally.

The metadata reader accepts 1-8 values specified per line. If a value is not specified, the default
value is assumed.
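
The default handling described above can be sketched as follows. This is only a minimal
illustration, not the metadata reader of the codec: the type IsmMetaLine and the function
parse_ism_line are hypothetical names, and no range checking is performed.

```c
#include <stdio.h>

/* Hypothetical container for one ISM metadata line (not a type from the IVAS sources). */
typedef struct {
    float azimuth, elevation, radius, spread, gain, yaw, pitch;
    int   non_diegetic; /* read as float, used as an integer flag */
} IsmMetaLine;

/* Parse one CSV line with 1-8 values. Values that are not present keep the
   defaults listed in the table. Returns the number of values read. */
static int parse_ism_line(const char *line, IsmMetaLine *m)
{
    /* defaults: azimuth, elevation, radius, spread, gain, yaw, pitch, non-diegetic */
    float v[8] = { 0.0f, 0.0f, 1.0f, 0.0f, 1.0f, 0.0f, 0.0f, 0.0f };
    int n = sscanf(line, "%f , %f , %f , %f , %f , %f , %f , %f",
                   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7]);

    if (n < 0) {
        n = 0; /* sscanf returns EOF for an empty line */
    }
    m->azimuth      = v[0];
    m->elevation    = v[1];
    m->radius       = v[2];
    m->spread       = v[3];
    m->gain         = v[4];
    m->yaw          = v[5];
    m->pitch        = v[6];
    m->non_diegetic = (v[7] != 0.0f);
    return n;
}
```

For example, the line "30.0, -15.0" sets azimuth and elevation only and leaves radius and gain
at their default value of 1.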


HRTF filter file
----------------
For the HRTF filter file option, external HRTF filter files are available in the folder
/scripts/binauralRenderer_interface/binaural_renderers_hrtf_data:

ivas_binaural_16kHz.bin
ivas_binaural_32kHz.bin
ivas_binaural_48kHz.bin

The HRTF filter file has a specific container format with a header and a sequence of entries. The 
detailed syntax can be found in 3GPP TS 26.258.


Head rotation trajectory file
-----------------------------

Input data representing the current rotation of the listener's head can be provided to the decoder 
in an ASCII formatted file comprising four columns separated by commas. These columns contain 
floating-point numbers representing either a quaternion or a set of Euler angles. The distinction 
between these two input formats is made by a magic number in the first column. If this value is set 
to -3.0, the remaining three columns are interpreted as Euler angles. Otherwise, all four columns 
are interpreted as a quaternion. The input is expected to have one line for each subframe of 5 ms. 

In the case of Quaternion-based input, the columns are the w, x, y, z components of a unit quaternion. 
Proper normalization to 1 shall be maintained in the input. The coordinate system is defined such that 
the x axis points from the left to the right ear, the y axis points into the direction of view, and the 
z axis points from bottom to top. The origin is in the center of the head. 

In the case of Euler angle input, the first column contains the magic number -3.0, and the next three 
columns are the Euler angles yaw, pitch, and roll. The rotations are applied in the order yaw-pitch-roll. 
The yaw angle rotates around the z axis, the pitch angle rotates around the new y axis, and the roll angle 
rotates around the new x axis.

In case of 6 DoF support for rendering, the head rotation trajectory file may also include a listener 
position in absolute Cartesian coordinates on the x-, y- and z-axis. Note that the listener position is 
expressed in absolute coordinates, while the listener orientation is expressed as scene displacement. 
An example line from a headtracking file of a listener facing forward, positioned at x=3.0, y=4.0 and z=0, 
could be:

-3.0,0.0,0.0,0.0,3.0,4.0,0.0

Note that the listener position applies for listener orientation expressed both in Quaternions and Euler angles.
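
The line format described above can be read along the following lines. This is a minimal sketch
and not the file reader of the codec: HeadRotLine and parse_headrot_line are hypothetical names,
the magic number is compared exactly against -3.0, and lines with neither 4 nor 7 values are
rejected, which is an assumption of this sketch.

```c
#include <stdio.h>

/* Hypothetical container for one line of a head rotation trajectory file. */
typedef struct {
    int   is_euler; /* 1: ypr[] is valid, 0: q[] is valid      */
    float q[4];     /* quaternion components w, x, y, z        */
    float ypr[3];   /* Euler angles yaw, pitch, roll           */
    int   has_pos;  /* 1: pos[] holds a listener position      */
    float pos[3];   /* listener position x, y, z               */
} HeadRotLine;

/* Returns 0 on success and -1 if the line has neither 4 nor 7 values. */
static int parse_headrot_line(const char *line, HeadRotLine *h)
{
    float v[7] = { 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f };
    int i;
    int n = sscanf(line, "%f , %f , %f , %f , %f , %f , %f",
                   &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6]);

    if (n != 4 && n != 7) {
        return -1;
    }
    h->is_euler = (v[0] == -3.0f); /* magic number in the first column */
    for (i = 0; i < 4; i++) {
        h->q[i] = h->is_euler ? 0.0f : v[i];
    }
    for (i = 0; i < 3; i++) {
        h->ypr[i] = h->is_euler ? v[i + 1] : 0.0f;
        h->pos[i] = (n == 7) ? v[i + 4] : 0.0f;
    }
    h->has_pos = (n == 7);
    return 0;
}
```

Applied to the example line "-3.0,0.0,0.0,0.0,3.0,4.0,0.0", the sketch reports Euler angle input
with all angles zero and a listener position of x=3.0, y=4.0, z=0.0.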


For the Head rotation operation modes, external trajectory files are available:

headrot.csv 
headrot_case00_3000_q.csv 
headrot_case01_3000_q.csv 
headrot_case02_3000_q.csv 
headrot_case03_3000_q.csv


Reference rotation/vector file
------------------------------
The external reference orientation of the orientation tracking feature can either be provided as a 
rotation (Quaternion or Euler angles) or as a pair of 3-dimensional positions (listener position 
and acoustic reference position). 

The Reference Rotation file format is identical to that of the head rotation trajectory file.

The Reference Vector file format describes a pair of x/y/z positions, one for the listener and one 
for the acoustic reference. The acoustic reference direction is defined by the vector from the 
listener towards the acoustic reference position. The reference vector file is a CSV file with 
comma as separator. Each line must contain a listener and an acoustic reference position in the 
following order:
    x axis position of the listener. 
    y axis position of the listener. 
    z axis position of the listener. 
    x axis position of the acoustic reference. 
    y axis position of the acoustic reference. 
    z axis position of the acoustic reference. 
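
The relation between one line of the file and the acoustic reference direction can be sketched
as shown below. The function name parse_ref_vector_line is hypothetical, and the returned vector
is left unnormalized for brevity; how the codec post-processes the vector is not covered here.

```c
#include <stdio.h>

/* Reads one reference vector line (listener x,y,z followed by acoustic
   reference x,y,z) and writes the vector from the listener towards the
   acoustic reference position into dir[]. Returns 0 on success, -1 otherwise. */
static int parse_ref_vector_line(const char *line, float dir[3])
{
    float v[6];
    int i;

    if (sscanf(line, "%f , %f , %f , %f , %f , %f",
               &v[0], &v[1], &v[2], &v[3], &v[4], &v[5]) != 6) {
        return -1;
    }
    for (i = 0; i < 3; i++) {
        dir[i] = v[i + 3] - v[i]; /* reference position minus listener position */
    }
    return 0;
}
```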

For reference vectors specified by an external trajectory file, example files are available in the 
folder /scripts/trajectories.


External orientation file
-------------------------
The external orientation file provides orientation information for any non-listener dependent orientations. 
The orientations shall be given as floating point quaternions to the decoder/renderer in (w, x, y, z) order. 
Additional information may be given as HeadRotIndicator, ExtOriIndicator, ExtIntrpFlag and ExtIntrpNFrames. 
Each entry line represents a sub-frame entry, where the sub-frame resolution is 5ms. In the processing, the 
quaternions are inverted to act as a rotation instead of orientation. 
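
As an illustration of the inversion mentioned above, the mathematical inverse of a quaternion
given in (w, x, y, z) order can be computed as follows. The names Quat and quat_inverse are
hypothetical, and the sketch only shows the general formula; the processing in the codec may
differ in detail.

```c
/* Hypothetical quaternion type in (w, x, y, z) order. */
typedef struct {
    float w, x, y, z;
} Quat;

/* Inverse of a quaternion: the conjugate divided by the squared norm.
   For a unit quaternion this reduces to the conjugate (w, -x, -y, -z).
   A zero quaternion has no inverse and is returned unchanged. */
static Quat quat_inverse(Quat q)
{
    float n2 = q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z;
    Quat r = q;

    if (n2 > 0.0f) {
        r.w =  q.w / n2;
        r.x = -q.x / n2;
        r.y = -q.y / n2;
        r.z = -q.z / n2;
    }
    return r;
}
```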

The detailed syntax can be found in 3GPP TS 26.258.


Renderer config file
--------------------
The renderer configuration file provides metadata for controlling the rendering process. This metadata 
includes acoustics environment parameters and source directivity. The data can be provided using 
binary bitstream or a text file. 

The detailed syntax can be found in 3GPP TS 26.258.

Example renderer configuration files are available, e.g.:

rend_config_hospital_patientroom.cfg
rend_config_recreation.cfg
rend_config_renderer.cfg


Object editing file
-------------------
The parameters for object editing in the decoder can be provided for the supported formats via a text
parameter file. Each row of the file corresponds to one 20 ms IVAS frame. The row contains one or more
of the following parameters separated by a comma:

bg_gain=<float>           linear gain to be applied on the SBA/MASA component in OSBA/OMASA, no effect for ISM
obj_<int>_gain=<float>    linear gain to be applied on object <int>, 0-based indexing
obj_<int>_relgain=0|1     if 1, obj_<int>_gain is interpreted as a relative modification. default is absolute modification
obj_<int>_azi=<float>     azimuth angle in degrees to be applied on object <int>, 0-based indexing
obj_<int>_relazi=0|1      if 1, obj_<int>_azi is interpreted as a relative modification. default is absolute modification
obj_<int>_ele=<float>     elevation angle in degrees to be applied on object <int>, 0-based indexing
obj_<int>_relele=0|1      if 1, obj_<int>_ele is interpreted as a relative modification. default is absolute modification
obj_<int>_radius=<float>  linear radius to be applied on object <int>, 0-based indexing
obj_<int>_relradius=0|1   if 1, obj_<int>_radius is interpreted as a relative modification. default is absolute modification
obj_<int>_yaw=<float>     yaw angle in degrees to be applied on object <int>, 0-based indexing
obj_<int>_relyaw=0|1      if 1, obj_<int>_yaw is interpreted as a relative modification. default is absolute modification
obj_<int>_pitch=<float>   pitch angle in degrees to be applied on object <int>, 0-based indexing
obj_<int>_relpitch=0|1    if 1, obj_<int>_pitch is interpreted as a relative modification. default is absolute modification

If a parameter is not specified, that parameter is not edited. An empty line in the file means that
no parameter is edited in the corresponding frame.
Example files are available in folder /scripts/object_edit.


RTP streaming file
------------------
IVAS supports a simple file format for packing and unpacking RTP streams. In this format, a single RTP_streaming_packet
contains the length of an RTP packet followed by the actual RTP packet, which is recorded as-is. This format is produced
by the encoder when using the -rtpdump switch, and the decoder assumes this format in the input when -VOIP_hf_only=1 is set.

typedef struct {
  u_int32 length;                   /* size of the RTP packet in bytes  */
  (u_int8 * length) RTP_packet;     /* RTP packet (sized length * byte) */
} RTP_streaming_packet;
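
Walking through such a file can be sketched as follows. The function count_rtp_packets is a
hypothetical helper that operates on a file already loaded into memory. The byte order of the
length field is not stated above; the sketch ASSUMES least significant byte first and would
need to be adapted if the actual order differs.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the RTP_streaming_packet entries in a memory buffer.
   Returns the number of packets, or -1 if the buffer ends in the middle
   of a length field or in the middle of a packet. */
static long count_rtp_packets(const uint8_t *buf, size_t size)
{
    size_t pos = 0;
    long count = 0;

    while (pos < size) {
        uint32_t length;

        if (size - pos < 4) {
            return -1; /* truncated length field */
        }
        /* ASSUMPTION: 32-bit length stored least significant byte first */
        length = (uint32_t)buf[pos] |
                 ((uint32_t)buf[pos + 1] << 8) |
                 ((uint32_t)buf[pos + 2] << 16) |
                 ((uint32_t)buf[pos + 3] << 24);
        pos += 4;

        if (length > size - pos) {
            return -1; /* truncated RTP packet */
        }
        pos += length; /* skip the RTP packet, which is stored as-is */
        count++;
    }
    return count;
}
```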
