형규 송 committed on
Commit e8ae36b
1 Parent(s): dc84017

update main page description (4b7d4ed in https://bitbucket.org/maum-system/cvpr22-demo-gradio)

Files changed (2):
  1. docs/article.md +9 -8
  2. docs/description.txt +4 -3
docs/article.md CHANGED
@@ -1,17 +1,18 @@
- ## Why you learn a new language, when your model can learn it for you?
+
+ ## Why learn a new language, when your model can learn it for you?

  ### Abstract

  Recent studies in talking face generation have focused on building a train-once-use-everywhere model i.e. a model that will generalize from any source speech to any target identity. A number of works have already claimed this functionality and have added that their models will also generalize to any language. However, we show, using languages from different language families, that these models do not translate well when the training language and the testing language are sufficiently different. We reduce the scope of the problem to building a language-robust talking face generation system on seen identities i.e. the target identity is the same as the training identity. In this work, we introduce a talking face generation system that will generalize to different languages. We evaluate the efficacy of our system using a multilingual text-to-speech system. We also discuss the usage of joint text-to-speech system and the talking face generation system as a neural dubber system.

- [arXiv](https://arxiv.org/abs/2205.06421) (To be updated with CVPR Proceedings link)
+ [CVPR Open Access](https://openaccess.thecvf.com/content/CVPR2022/html/Song_Talking_Face_Generation_With_Multilingual_TTS_CVPR_2022_paper.html) [arXiv](https://arxiv.org/abs/2205.06421)

- ### CVPR 2022 schedule
+ ### CVPR 2022 Schedule

- This demonstration is mainly for CVPR 2022 virtual participants.
+ This demonstration is mainly for the CVPR 2022 virtual participants.

- If you participate CVPR 2022 with passport(in-person) registration, you can meet our special demonstration in *Demo Area* from Tuesday to Thursday.
+ If you are attending in person, we will set up a special demonstration in the *Demo Area* during the following times:

- - 21st, June (Tue) 10:00 ~ 13:30
- - 22nd, June (Wed) 10:00 ~ 12:30
- - 23rd, June (Thu) 10:00 ~ 12:00
+ - 21st, June (Tue) 10:00 - 13:30
+ - 22nd, June (Wed) 10:00 - 12:30
+ - 23rd, June (Thu) 10:00 - 12:00
docs/description.txt CHANGED
@@ -1,5 +1,6 @@
- This system generates talking face video with corresponding text.
- You can input text with one of four languages, which are Chinese, English, Japanese, and Korean.
- If your text language and target language is different, it translates the sentence into target language with Google Translation API.
+ This system generates a talking face video based on the input text.
+ You can provide the input text in one of the four languages: Chinese (Mandarin), English, Japanese, and Korean.
+ You may also select the target language, the language of the output speech.
+ If the input text language and the target language are different, the input text will be translated to the target language using Google Translate API.

  (2022.06.05.) Due to the latency from HuggingFace Spaces and video rendering, it takes 15 ~ 30 seconds to get a video result.
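
For reference, a minimal sketch of the translate-if-needed step that the new description.txt describes. The description only names "Google Translate API", so the choice of the official `google-cloud-translate` v2 client is an assumption, and `prepare_text` and `SUPPORTED_LANGS` are hypothetical names; the downstream TTS and talking face stages are not sketched here.

```python
from google.cloud import translate_v2 as translate

# Language codes for the four supported languages.
# "zh-CN" is Google's code for Mandarin Chinese (Simplified script).
SUPPORTED_LANGS = {"zh-CN", "en", "ja", "ko"}

def prepare_text(text: str, src_lang: str, tgt_lang: str) -> str:
    """Return `text` in the target language, translating only when needed."""
    if src_lang not in SUPPORTED_LANGS or tgt_lang not in SUPPORTED_LANGS:
        raise ValueError(f"languages must be one of {sorted(SUPPORTED_LANGS)}")
    if src_lang == tgt_lang:
        # Same language: the TTS stage can consume the input text directly.
        return text
    # Requires GOOGLE_APPLICATION_CREDENTIALS to point at a service-account key.
    client = translate.Client()
    result = client.translate(text, source_language=src_lang, target_language=tgt_lang)
    return result["translatedText"]
```

The returned text (translated or passed through) would then be fed to the multilingual TTS system, whose audio drives the talking face generation stage.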