Some questions about training

#2
by grider-withourai - opened

This whisper fine-tune looks impressive! By far the best ASR model I've seen for Japanese ASMR while maintaining strong anime domain performance. I'm curious - did you apply any text or audio preprocessing to handle the uncleaned Reazon data or the potentially messy NSFW datasets? Planning my own full param fine-tune on a larger model and would appreciate any insights on your approach if you used any. Great work!

No audio preprocessing for this, only placing clips in the window with a random delay.
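A rough sketch of what placing a clip in the window with a random delay could look like. The 16 kHz sample rate and 30-second window are Whisper's standard input format; the function name, the zero (silence) padding, and the uniform choice of offset are assumptions, not details confirmed in this thread.

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper expects 16 kHz mono audio
WINDOW_SECONDS = 30    # Whisper's fixed input window length


def place_with_random_delay(clip, rng, window_seconds=WINDOW_SECONDS,
                            sample_rate=SAMPLE_RATE):
    """Copy `clip` into a silent window at a random offset (hypothetical helper).

    Returns the padded window and the offset in samples, so that any
    timestamps in the label can be shifted by the same amount.
    """
    window_len = window_seconds * sample_rate
    # Assumption: clips longer than the window are simply truncated.
    clip = np.asarray(clip, dtype=np.float32)[:window_len]
    max_offset = window_len - len(clip)
    offset = int(rng.integers(0, max_offset + 1))  # uniform over valid offsets
    window = np.zeros(window_len, dtype=np.float32)
    window[offset:offset + len(clip)] = clip
    return window, offset
```

Usage would be along the lines of `window, offset = place_with_random_delay(audio, np.random.default_rng())` before computing the log-mel features.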

Text:

  • normalise halfwidth forms
  • remove text if only punctuation or single kana
  • limit punctuation repetition to 2
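The three text steps above could be sketched roughly as follows. This is only a guess at the details: it assumes "halfwidth forms" means the halfwidth katakana and CJK punctuation range U+FF61 to U+FF9F (folded with NFKC), treats any Unicode punctuation, symbol, or separator character as "punctuation", and returns an empty string for labels that should be dropped. The function name is made up for illustration.

```python
import re
import unicodedata

# Assumption: only halfwidth katakana / CJK punctuation is folded, so that
# fullwidth punctuation and characters such as "…" are left alone.
HALFWIDTH_RUN = re.compile(r"[\uFF61-\uFF9F]+")
# Any non-word, non-space character repeated three or more times.
PUNCT_RUN = re.compile(r"([^\w\s])\1{2,}")
# Hiragana and katakana letters (the prolonged sound mark is not included).
KANA = re.compile(r"[\u3041-\u3096\u30A1-\u30FA]")


def normalise_label(text):
    """Return the cleaned label, or "" if nothing useful is left."""
    # 1. Normalise halfwidth forms (NFKC also recombines dakuten marks).
    text = HALFWIDTH_RUN.sub(
        lambda m: unicodedata.normalize("NFKC", m.group()), text)
    # 2. Limit punctuation repetition to 2.
    text = PUNCT_RUN.sub(r"\1\1", text)
    # 3. Drop labels that are only punctuation or a single kana.
    content = "".join(
        ch for ch in text if unicodedata.category(ch)[0] not in "PSZ")
    if not content or (len(content) == 1 and KANA.fullmatch(content)):
        return ""
    return text
```

Applying NFKC to the whole string would be simpler, but it also rewrites fullwidth punctuation and expands "…" into three dots, which is why the sketch restricts it to the halfwidth range.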

Galgame only:

  • clean scripting text
r'…*[▼。]+'
r'[\u2460-\u24FF]'
r'<[^>]+>|(?:PP KOE|@CG0|@?timewait(?:key)?|TIMEWAIT|@PCM_PLAY|@bs_move_x|@se|@FADE_背景)?\([^\)]+\)|\s*@simochi_seen0281_font_size(?:_re)?\s*|v@Hitret id=20291|@殴(?:られ)?る3|@絵文字ハート|5\]¶\[f|_t!\d+,\d,\d,・+/|\$\wd?(?::[\d/,]+)?;|\$str\d+|[@/]?ruby|%bd|\[n\]|^%C|(?:%0)+$|^(?:0:)?\d+;|rial;.+\\z;!?|^;|\\f\+\d+;|\$f\.name;|\$size:\d+;|\$[A-Z_]+,[A-Za-z0-9,\\_.\-]+|\^[a-z0-9]+,(?:file:05笑顔01|\$[a-z0-9_]+)|;;\$VOICE,.+|^\\r|\\f(?:[\-+]\d+)?|^\\|\\$|&emoji\d\d;'
r'.+ @simochi_ame_kakko_size ((.+)) @simochi_ame_kakko_size_re.*'
r'\[rb,[^,]+,([^,]+)\]'
r'\$r:[^,]+,([^;]+);'
r'_t!\d+,\d+,(\d+),([^/]+)/'
r'x[0-9a-f]{8};?([^\\]+)\\c'
r"\\(['%])"
  • uncensor [◯〇○●]
  • reduce repetition of not digit 1-4-grams
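For anyone wanting to reuse the script-cleaning patterns, here is a small sketch of how a few of them might be applied. The patterns are copied from the list above; the replacements are assumptions (delete the match when there is no capture group, otherwise keep the captured field), since the list only gives the patterns. `RULES` and `clean_script_text` are hypothetical names.

```python
import re

# (pattern, replacement) pairs. Patterns come from the list above; the
# replacements are assumed, not confirmed by the author.
RULES = [
    (re.compile(r'[\u2460-\u24FF]'), ''),           # enclosed alphanumerics
    (re.compile(r'\[rb,[^,]+,([^,]+)\]'), r'\1'),   # [rb,x,y] -> y
    (re.compile(r'\$r:[^,]+,([^;]+);'), r'\1'),     # $r:x,y;  -> y
    (re.compile(r"\\(['%])"), r'\1'),               # \' or \% -> ' or %
]


def clean_script_text(text):
    """Apply each rule in order and return the cleaned line."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

The order of the rules can matter once the longer alternation pattern is added, so it is probably worth testing on real script lines from each engine.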

CE loss with reduced weight on repetition and a penalty for wrong repetition.
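One possible reading of that loss, written as a NumPy sketch rather than training code. Everything specific here is an assumption: "repetition" is simplified to a target token equal to the previous target token, a "wrong repetition" is a position where the model's top prediction repeats the previous token but the target does not, and the two weight values are placeholders.

```python
import numpy as np


def weighted_cross_entropy(logits, targets, repeat_weight=0.5,
                           wrong_repeat_penalty=2.0):
    """Token-weighted CE for one sequence (hypothetical helper).

    logits: (T, V) array of unnormalised scores, targets: (T,) token ids.
    """
    logits = np.asarray(logits, dtype=np.float64)
    targets = np.asarray(targets)
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]

    # Previous target token for each position (-1 never matches a real id).
    prev = np.concatenate(([-1], targets[:-1]))
    target_repeats = targets == prev
    predicted = logits.argmax(axis=-1)
    wrong_repeat = (predicted == prev) & ~target_repeats

    weights = np.ones_like(nll)
    weights[target_repeats] = repeat_weight        # down-weight true repeats
    weights[wrong_repeat] = wrong_repeat_penalty   # up-weight spurious repeats
    return float((weights * nll).sum() / weights.sum())
```

In an actual fine-tune the same per-token weights would be applied to an unreduced cross-entropy inside the training loop, with padding positions masked out.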

Impressive! Appreciate your insights. By "remove text if only punctuation or single kana", do you mean emptying the text, or removing the whole text-audio pair? Also, could you elaborate a bit more on "reduce repetition of not digit 1-4-grams"? I guess you mean things like multiple "よしよし"? But most of the time they seem to have the same number of occurrences as in the audio.

The audio is kept, just with no label. I only reduce 3+ repeats a little; seq2seq repeats too much, so I figure a few fewer is better than infinite looping. I was doing more targeted removal, but the model kept using rarer sequences, so I gave up.
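A small sketch of that kind of repeat reduction, under several assumptions: n-grams are counted in characters, a run has to contain at least three copies of the unit before it is touched, and "a little" is taken to mean dropping one copy. The names and the `drop` parameter are illustrative only.

```python
import re

# A non-digit unit of 1-4 characters followed by at least two more copies,
# i.e. three or more consecutive repeats. The lazy quantifier prefers the
# shortest repeating unit.
REPEAT = re.compile(r"(\D{1,4}?)\1{2,}")


def reduce_repeats(text, drop=1, min_keep=2):
    """Shorten each run of 3+ repeats by `drop` copies (hypothetical helper)."""
    def shorten(match):
        unit = match.group(1)
        count = len(match.group(0)) // len(unit)
        return unit * max(min_keep, count - drop)
    return REPEAT.sub(shorten, text)
```

With these settings a label like "よしよしよしよし" loses one "よし", while "よしよし" and digit runs are left unchanged.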
