amiriparian committed
Commit 6142f96
Parent: 49891e2

Update README.md

Files changed (1)
README.md: +47 -103
README.md CHANGED
@@ -207,7 +207,18 @@ for epoch in range(num_epochs):


### Citation Info
+ ExHuBERT has been accepted for presentation at INTERSPEECH 2024.

+ ```
+ @article{amiriparian2024exhubert,
+   title   = {ExHuBERT: Enhancing HuBERT Through Block Extension and Fine-Tuning on 37 Emotion Datasets},
+   author  = {Amiriparian, Shahin and Packa{\'n}, Filip and Gerczuk, Maurice and Schuller, Bj{\"o}rn W},
+   journal = {arXiv preprint arXiv:2406.10275},
+   year    = {2024}
+ }
+ ```
+
+ <!--

```
@inproceedings{Amiriparian24-EEH,
@@ -222,205 +233,138 @@ for epoch in range(num_epochs):
  month = {September},
  publisher = {ISCA},
}
-
-
```
+ -->

### References

<small>
<a id="1">[1]</a>
- B. Schuller, D. Arsic, G. Rigoll, M. Wimmer, and B. Radig. Audiovisual Behavior
- Modeling by Combined Feature Spaces. In 2007 IEEE International Conference on
- Acoustics, Speech and Signal Processing - ICASSP ’07, volume 2, pages II–733–II–
- 736, Apr. 2007.
+ B. Schuller, D. Arsic, G. Rigoll, M. Wimmer, and B. Radig. Audiovisual Behavior Modeling by Combined Feature Spaces. Proc. ICASSP 2007, Apr. 2007.


<a id="2">[2]</a>
- M. Gerczuk, S. Amiriparian, S. Ottl, and B. W. Schuller. EmoNet: A Transfer
- Learning Framework for Multi-Corpus Speech Emotion Recognition. IEEE Trans-
- actions on Affective Computing, 14(2):1472–1487, Apr. 2023.
+ M. Gerczuk, S. Amiriparian, S. Ottl, and B. W. Schuller. EmoNet: A Transfer Learning Framework for Multi-Corpus Speech Emotion Recognition. IEEE Transactions on Affective Computing, 14(2):1472–1487, Apr. 2023.


<a id="3">[3]</a>
- T. L. Nwe, S. W. Foo, and L. C. De Silva. Speech emotion recognition using hidden
- Markov models. Speech Communication, 41(4):603–623, Nov. 2003.
+ T. L. Nwe, S. W. Foo, and L. C. De Silva. Speech emotion recognition using hidden Markov models. Speech Communication, 41(4):603–623, Nov. 2003.


<a id="4">[4]</a>
- The selected speech emotion database of institute of automation chineseacademy of
- sciences (casia). http://www.chineseldc.org/resource_info.php?rid=76. accessed March 2024.
+ The selected speech emotion database of the Institute of Automation, Chinese Academy of Sciences (CASIA). http://www.chineseldc.org/resource_info.php?rid=76. Accessed March 2024.


<a id="5">[5]</a>
- P. Liu and M. D. Pell. Recognizing vocal emotions in Mandarin Chinese: A val-
- idated database of Chinese vocal emotional stimuli. Behavior Research Methods,
- 44(4):1042–1051, Dec. 2012.
+ P. Liu and M. D. Pell. Recognizing vocal emotions in Mandarin Chinese: A validated database of Chinese vocal emotional stimuli. Behavior Research Methods, 44(4):1042–1051, Dec. 2012.


<a id="6">[6]</a>
- H. Cao, D. G. Cooper, M. K. Keutmann, R. C. Gur, A. Nenkova, and R. Verma.
- CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset. IEEE transactions on affective computing, 5(4):377–390, 2014.
+ H. Cao, D. G. Cooper, M. K. Keutmann, R. C. Gur, A. Nenkova, and R. Verma. CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset. IEEE Transactions on Affective Computing, 5(4):377–390, 2014.



<a id="7">[7]</a>
- I. S. Engberg, A. V. Hansen, O. K. Andersen, and P. Dalsgaard. Design Record-
- ing and Verification of a Danish Emotional Speech Database: Design Recording
- and Verification of a Danish Emotional Speech Database. EUROSPEECH’97 : 5th
- European Conference on Speech Communication and Technology, Patras, Rhodes,
- Greece, 22-25 September 1997, pages Vol. 4, pp. 1695–1698, 1997.
+ I. S. Engberg, A. V. Hansen, O. K. Andersen, and P. Dalsgaard. Design, Recording and Verification of a Danish Emotional Speech Database. In EUROSPEECH’97: 5th European Conference on Speech Communication and Technology, Rhodes, Greece, vol. 4, pages 1695–1698, Sept. 1997.



<a id="8">[8]</a>
- E. Parada-Cabaleiro, G. Costantini, A. Batliner, M. Schmitt, and B. W. Schuller.
- DEMoS: An Italian emotional speech corpus. Language Resources and Evaluation,
- 54(2):341–383, June 2020.
-
+ E. Parada-Cabaleiro, G. Costantini, A. Batliner, M. Schmitt, and B. W. Schuller. DEMoS: An Italian emotional speech corpus. Language Resources and Evaluation, 54(2):341–383, June 2020.

<a id="9">[9]</a>
- B. Schuller. Automatische Emotionserkennung Aus Sprachlicher Und Manueller
- Interaktion. PhD thesis, Technische Universit¨at M¨unchen, 2006.
+ B. Schuller. Automatische Emotionserkennung aus sprachlicher und manueller Interaktion. PhD thesis, Technische Universität München, 2006.


<a id="10">[10]</a>
- F. Burkhardt, A. Paeschke, M. Rolfes, W. F. Sendlmeier, and B. Weiss. A database
- of German emotional speech. In Interspeech 2005, pages 1517–1520. ISCA, Sept.
- 2005.
+ F. Burkhardt, A. Paeschke, M. Rolfes, W. F. Sendlmeier, and B. Weiss. A database of German emotional speech. In Interspeech 2005, pages 1517–1520. ISCA, Sept. 2005.


<a id="11">[11]</a>
- E. Parada-Cabaleiro, G. Costantini, A. Batliner, A. Baird, and B. Schuller.
- Categorical vs Dimensional Perception of Italian Emotional Speech. In Interspeech 2018,
- pages 3638–3642. ISCA, Sept. 2018.
+ E. Parada-Cabaleiro, G. Costantini, A. Batliner, A. Baird, and B. Schuller. Categorical vs Dimensional Perception of Italian Emotional Speech. In Interspeech 2018, pages 3638–3642. ISCA, Sept. 2018.


<a id="12">[12]</a>
- A. Dhall, R. Goecke, J. Joshi, K. Sikka, and T. Gedeon. Emotion Recognition In
- The Wild Challenge 2014: Baseline, Data and Protocol. In Proceedings of the 16th
- International Conference on Multimodal Interaction, ICMI ’14, pages 461–466, New
- York, NY, USA, Nov. 2014. Association for Computing Machinery.
+ A. Dhall, R. Goecke, J. Joshi, K. Sikka, and T. Gedeon. Emotion Recognition In The Wild Challenge 2014: Baseline, Data and Protocol. In Proceedings of the 16th International Conference on Multimodal Interaction, ICMI ’14, pages 461–466, New York, NY, USA, Nov. 2014. Association for Computing Machinery.


<a id="13">[13]</a>
- G. Costantini, I. Iaderola, A. Paoloni, and M. Todisco. EMOVO Corpus: An Italian
- Emotional Speech Database. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson,
- B. Maegaard, J. Mariani, A. Moreno, J. Odijk, and S. Piperidis, editors, Proceed-
- ings of the Ninth International Conference on Language Resources and Evaluation
- (LREC’14), pages 3501–3504, Reykjavik, Iceland, May 2014. European Language
- Resources Association (ELRA).
-
+ G. Costantini, I. Iaderola, A. Paoloni, and M. Todisco. EMOVO Corpus: An Italian Emotional Speech Database. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, and S. Piperidis, editors, Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), pages 3501–3504, Reykjavik, Iceland, May 2014. European Language Resources Association (ELRA).


<a id="14">[14]</a>
- O. Martin, I. Kotsia, B. Macq, and I. Pitas. The eNTERFACE’ 05 Audio-Visual
- Emotion Database. In 22nd International Conference on Data Engineering Work-
- shops (ICDEW’06), pages 8–8, Apr. 2006.
-
+ O. Martin, I. Kotsia, B. Macq, and I. Pitas. The eNTERFACE’05 Audio-Visual Emotion Database. In 22nd International Conference on Data Engineering Workshops (ICDEW’06), pages 8–8, Apr. 2006.



<a id="15">[15]</a>
- K. Zhou, B. Sisman, R. Liu, and H. Li. Seen and Unseen emotional style transfer
- for voice conversion with a new emotional speech dataset, Feb. 2021.
-
+ K. Zhou, B. Sisman, R. Liu, and H. Li. Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset, Feb. 2021.


<a id="16">[16]</a>
- H. O’Reilly, D. Pigat, S. Fridenson, S. Berggren, S. Tal, O. Golan, S. B¨olte, S. Baron-
- Cohen, and D. Lundqvist. The EU-Emotion Stimulus Set: A validation study.
- Behavior Research Methods, 48(2):567–576, June 2016.
-
+ H. O’Reilly, D. Pigat, S. Fridenson, S. Berggren, S. Tal, O. Golan, S. Bölte, S. Baron-Cohen, and D. Lundqvist. The EU-Emotion Stimulus Set: A validation study. Behavior Research Methods, 48(2):567–576, June 2016.


<a id="17">[17]</a>
- A. Lassalle, D. Pigat, H. O’Reilly, S. Berggen, S. Fridenson-Hayo, S. Tal, S. Elfstr¨om,
- A. R˚ade, O. Golan, S. B¨olte, S. Baron-Cohen, and D. Lundqvist. The EU-Emotion
- Voice Database. Behavior Research Methods, 51(2):493–506, Apr. 2019.
+ A. Lassalle, D. Pigat, H. O’Reilly, S. Berggen, S. Fridenson-Hayo, S. Tal, S. Elfström, A. Råde, O. Golan, S. Bölte, S. Baron-Cohen, and D. Lundqvist. The EU-Emotion Voice Database. Behavior Research Methods, 51(2):493–506, Apr. 2019.


<a id="18">[18]</a>
- A. Batliner, S. Steidl, and E. Noth. Releasing a thoroughly annotated and processed
- spontaneous emotional database: The FAU Aibo Emotion Corpus. 2008.
+ A. Batliner, S. Steidl, and E. Nöth. Releasing a thoroughly annotated and processed spontaneous emotional database: The FAU Aibo Emotion Corpus. 2008.


<a id="19">[19]</a>
- K. R. Scherer, T. B¨anziger, and E. Roesch. A Blueprint for Affective Computing:
- A Sourcebook and Manual. OUP Oxford, Sept. 2010.
+ K. R. Scherer, T. Bänziger, and E. Roesch. A Blueprint for Affective Computing: A Sourcebook and Manual. OUP Oxford, Sept. 2010.


<a id="20">[20]</a>
- R. Banse and K. R. Scherer. Acoustic profiles in vocal emotion expression. Journal
- of Personality and Social Psychology, 70(3):614–636, 1996.
+ R. Banse and K. R. Scherer. Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70(3):614–636, 1996.


<a id="21">[21]</a>
- C. Busso, M. Bulut, C.-C. Lee, A. Kazemzadeh, E. Mower, S. Kim, J. N. Chang,
- S. Lee, and S. S. Narayanan. IEMOCAP: Interactive emotional dyadic motion
- capture database. Language Resources and Evaluation, 42(4):335–359, Dec. 2008.
+ C. Busso, M. Bulut, C.-C. Lee, A. Kazemzadeh, E. Mower, S. Kim, J. N. Chang, S. Lee, and S. S. Narayanan. IEMOCAP: Interactive emotional dyadic motion capture database. Language Resources and Evaluation, 42(4):335–359, Dec. 2008.
+

<a id="22">[22]</a>
- M. M. Duville, L. M. Alonso-Valerdi, and D. I. Ibarra-Zarate. The Mexican Emo-
- tional Speech Database (MESD): Elaboration and assessment based on machine
- learning. Annual International Conference of the IEEE Engineering in Medicine
- and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual
- International Conference, 2021:1644–1647, Nov. 2021.
+ M. M. Duville, L. M. Alonso-Valerdi, and D. I. Ibarra-Zarate. The Mexican Emotional Speech Database (MESD): Elaboration and assessment based on machine learning. Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2021:1644–1647, Nov. 2021.
+

<a id="23">[23]</a>
- S. Poria, D. Hazarika, N. Majumder, G. Naik, E. Cambria, and R. Mihalcea. MELD:
- A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations, June
- 2019.
+ S. Poria, D. Hazarika, N. Majumder, G. Naik, E. Cambria, and R. Mihalcea. MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations, June 2019.
+

<a id="24">[24]</a>
- S. R. Livingstone and F. A. Russo. The Ryerson Audio-Visual Database of Emo-
- tional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal
- expressions in North American English. PLOS ONE, 13(5):e0196391, May 2018.
+ S. R. Livingstone and F. A. Russo. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English. PLOS ONE, 13(5):e0196391, May 2018.


<a id="25">[25]</a>
- S. Haq and P. J. B. Jackson. Speaker-dependent audio-visual emotion recognition.
- In Proc. AVSP 2009, pages 53–58, 2009.
+ S. Haq and P. J. B. Jackson. Speaker-dependent audio-visual emotion recognition. In Proc. AVSP 2009, pages 53–58, 2009.


<a id="26">[26]</a>
- O. Mohamad Nezami, P. Jamshid Lou, and M. Karami. ShEMO: A large-scale
- validated database for Persian speech emotion detection. Language Resources and
- Evaluation, 53(1):1–16, Mar. 2019.
+ O. Mohamad Nezami, P. Jamshid Lou, and M. Karami. ShEMO: A large-scale validated database for Persian speech emotion detection. Language Resources and Evaluation, 53(1):1–16, Mar. 2019.


<a id="27">[27]</a>
- F. Schiel, S. Steininger, and U. T¨urk. The SmartKom Multimodal Corpus at BAS. In
- M. Gonz´alez Rodr´ıguez and C. P. Suarez Araujo, editors, Proceedings of the Third
- International Conference on Language Resources and Evaluation (LREC’02), Las
- Palmas, Canary Islands - Spain, May 2002. European Language Resources Association (ELRA).
+ F. Schiel, S. Steininger, and U. Türk. The SmartKom Multimodal Corpus at BAS. In M. González Rodríguez and C. P. Suarez Araujo, editors, Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02), Las Palmas, Canary Islands - Spain, May 2002. European Language Resources Association (ELRA).


<a id="28">[28]</a>
- B. Schuller, F. Eyben, S. Can, and H. Feußner. Speech in Minimal Invasive Surgery - Towards an Affective Language Resource of Real-life Medical Operations. 2010.
+ B. Schuller, F. Eyben, S. Can, and H. Feussner. Speech in Minimal Invasive Surgery - Towards an Affective Language Resource of Real-life Medical Operations. 2010.


<a id="29">[29]</a>
- J. H. L. Hansen and S. E. Bou-Ghazale. Getting started with SUSAS: A speech under
- simulated and actual stress database. In Proc. Eurospeech 1997, pages 1743–1746,
- 1997.
-
+ J. H. L. Hansen and S. E. Bou-Ghazale. Getting started with SUSAS: A speech under simulated and actual stress database. In Proc. Eurospeech 1997, pages 1743–1746, 1997.


<a id="30">[30]</a>
- S. Sultana, M. S. Rahman, M. R. Selim, and M. Z. Iqbal. SUST Bangla Emotional
- Speech Corpus (SUBESCO): An audio-only emotional speech corpus for Bangla.
- PLOS ONE, 16(4):e0250173, Apr. 2021.
+ S. Sultana, M. S. Rahman, M. R. Selim, and M. Z. Iqbal. SUST Bangla Emotional Speech Corpus (SUBESCO): An audio-only emotional speech corpus for Bangla. PLOS ONE, 16(4):e0250173, Apr. 2021.


<a id="31">[31]</a>
- M. K. Pichora-Fuller and K. Dupuis. Toronto emotional speech set (TESS), Feb.
- 2020.
-
+ M. K. Pichora-Fuller and K. Dupuis. Toronto emotional speech set (TESS), Feb. 2020.


<a id="32">[32]</a>
- S. Latif, A. Qayyum, M. Usman, and J. Qadir. Cross Lingual Speech Emotion
- Recognition: Urdu vs. Western Languages. In 2018 International Conference on
- Frontiers of Information Technology (FIT), pages 88–93, Dec. 2018.
+ S. Latif, A. Qayyum, M. Usman, and J. Qadir. Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages. In 2018 International Conference on Frontiers of Information Technology (FIT), pages 88–93, Dec. 2018.
<small>