ArthurZ (HF staff) committed ec73fe3 (parent: be02c2a): Update README.md
Files changed (1): README.md (+11, -4)
### Results

The evaluation results demonstrate the superiority of EnCodec over the baselines across different bandwidths (1.5, 3, 6, and 12 kbps). Figure 3 provides an overview of the streamable setup results, while Table 1 offers a category-wise breakdown. Alternative quantizers such as Gumbel-Softmax and DiffQ were also explored, but their preliminary results did not match the performance of EnCodec, so they are not included in the report.

When comparing EnCodec with the baselines at the same bandwidth, EnCodec consistently achieves a higher MUSHRA score. Notably, EnCodec at 3 kbps outperforms, on average, Lyra-v2 at 6 kbps and Opus at 12 kbps. Additionally, applying a language model over the codes yields a further bandwidth reduction of approximately 25-40%; for example, the 3 kbps model can be reduced to 1.9 kbps.
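The gain from the language model comes from entropy coding the code indices: with arithmetic coding, each index costs roughly the model's cross-entropy in bits rather than the full fixed-length code. A rough sketch of the arithmetic, assuming the 24 kHz configuration (75 frames per second, 1024-entry codebooks); the 6.3 bits-per-index cross-entropy is a hypothetical figure chosen to reproduce the 3 kbps to 1.9 kbps example, not a number from the paper:

```python
import math

def code_bandwidth_kbps(frame_rate_hz, n_codebooks, bits_per_index):
    """Bandwidth spent on quantizer indices: every frame, each codebook
    emits one index costing `bits_per_index` bits on average."""
    return frame_rate_hz * n_codebooks * bits_per_index / 1000

# Without entropy coding, a 1024-entry codebook index costs log2(1024) = 10 bits.
baseline = code_bandwidth_kbps(75, 4, math.log2(1024))  # 3.0 kbps
# With arithmetic coding driven by a language model, each index costs
# roughly the model's cross-entropy instead (hypothetical 6.3 bits here).
compressed = code_bandwidth_kbps(75, 4, 6.3)            # ~1.89 kbps
print(f"{baseline:.1f} kbps -> {compressed:.2f} kbps, "
      f"{100 * (1 - compressed / baseline):.0f}% reduction")
```

A cross-entropy of about 6.3 bits per index would give a reduction of roughly 37%, consistent with the reported 25-40% range.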
 
 
 
Furthermore, it is observed that the relative compression gain decreases as the bandwidth increases. This can be attributed to the small size of the Transformer model used, which makes it challenging to model all codebooks together effectively.
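Each target bandwidth corresponds to a number of residual codebooks, so higher bandwidths mean more codebooks for the small Transformer to model jointly. A sketch of the correspondence, assuming the 24 kHz configuration described in the EnCodec paper (75 frames per second, 1024-entry codebooks):

```python
import math

def n_codebooks_for_bandwidth(kbps, frame_rate_hz=75, codebook_size=1024):
    """Number of residual codebooks needed to hit a target bitrate, given
    that each codebook index costs log2(codebook_size) bits per frame."""
    bits_per_sec_per_codebook = frame_rate_hz * math.log2(codebook_size)  # 750 bits/s
    return round(kbps * 1000 / bits_per_sec_per_codebook)

for kbps in (1.5, 3, 6, 12):
    print(kbps, n_codebooks_for_bandwidth(kbps))  # 2, 4, 8, 16 codebooks
```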

#### Summary

EnCodec is a state-of-the-art real-time neural audio compression model that produces high-fidelity audio at various sample rates and bandwidths. Its performance was evaluated across settings ranging from 24 kHz monophonic at 1.5 kbps to 48 kHz stereophonic, with both subjective and objective results (Figure 3 and Table 4). Notably, EnCodec incorporates a novel spectrogram-only adversarial loss, which effectively reduces artifacts and enhances sample quality. Training stability and interpretability are further improved by a gradient balancer for the loss weights. Additionally, the study demonstrates that a compact Transformer model can achieve an additional bandwidth reduction of up to 40% without compromising quality, particularly in applications where low latency is not critical (e.g., music streaming).
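The idea behind the gradient balancer can be sketched in a few lines: instead of weighting raw losses, each loss's gradient is normalized before weighting, so the weights directly set each term's relative influence on the update. A simplified NumPy illustration (the paper's balancer additionally tracks a moving average of gradient norms; `balance_gradients` is a hypothetical helper name, not an API from the EnCodec codebase):

```python
import numpy as np

def balance_gradients(grads, weights, eps=1e-12):
    """Combine per-loss gradients after normalizing each to unit norm,
    so a weight of w_i means loss i contributes a fraction w_i of the
    update regardless of its natural gradient scale."""
    total = np.zeros_like(grads[0], dtype=float)
    for g, w in zip(grads, weights):
        total += w * np.asarray(g, dtype=float) / (np.linalg.norm(g) + eps)
    return total

# A loss with a huge raw gradient no longer dominates the update:
g_time = np.array([1000.0, 0.0])   # e.g. a waveform-loss gradient
g_spec = np.array([0.0, 1.0])      # e.g. a spectral-loss gradient
update = balance_gradients([g_time, g_spec], weights=[0.5, 0.5])
print(update)  # ~[0.5, 0.5]: both losses contribute equally
```

This also makes the weights interpretable: changing a weight changes that loss's share of the update directly, rather than interacting with the loss's arbitrary scale.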
 
 
 
 
 
## Citation
 