RobPruzan committed on
Commit
eeaf49a
1 Parent(s): c8babd6

Updating diversity calculation

  1. app.py +2 -2
app.py CHANGED
@@ -480,12 +480,12 @@ with gr.Blocks(title="Automatic Literacy and Speech Assesmen") as demo:
  to understand.
  """)
  gr.Markdown("""**Lexical Diversity**- The lexical diversity score is computed by taking the ratio of unique similar words to total similar words
- squared. The similarity is computed as if the cosine similarity of the word2vec embeddings is greater than .75. It is bad writing/speech
+ . The similarity is computed as if the cosine similarity of the word2vec embeddings is greater than .75. It is bad writing/speech
  practice to repeat the same words when it's possible not to. Vocabulary diversity is generally computed by taking the ratio of unique
  strings/ total strings. This does not give an indication if the person has a large vocabulary or if the topic does not require a diverse
  vocabulary to express it. This algorithm only scores the text based on how many times a unique word was chosen for a semantic idea, e.g.,
  "Forest" and "Woods" are 2 words to represent one semantic idea, so this would receive a 100% lexical diversity score, vs using the word
- "Forest" twice would yield you a 25% diversity score, (1 unique word/ 2 total words)^2
+ "Forest" twice would yield you a 25% diversity score, (1 unique word/ 2 total words)
  """)
  gr.Markdown("""**Speech Pronunciation Scoring-**- The Wave2Vec 2.0 model is utilized to convert audio into text in real-time. The model predicts words or phonemes
  (smallest unit of speech distinguishing one word (or word element) from another) from the input audio from the user. Due to the nature of the model,
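
The arithmetic behind the description changed in this hunk can be checked with a small sketch. It is not the code in app.py: the function names, the toy 2-D vectors standing in for word2vec embeddings, and the case-insensitive comparison are all assumptions made for illustration. Only the .75 cosine threshold and the unique/total ratio come from the text. Under the plain ratio, using "Forest" twice works out to 1/2 = 50%; the 25% figure corresponds to the squared form, (1/2)^2.

```python
import math

# Threshold quoted in the description; everything else here is hypothetical.
SIMILARITY_THRESHOLD = 0.75


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def are_similar(u, v, threshold=SIMILARITY_THRESHOLD):
    """Treat two words as one semantic idea when their vectors exceed the threshold."""
    return cosine_similarity(u, v) > threshold


def diversity_ratio(group):
    """Unique words / total words within one group of similar words."""
    if not group:
        raise ValueError("group must contain at least one word")
    unique = {word.lower() for word in group}  # assumption: case-insensitive
    return len(unique) / len(group)


def diversity_ratio_squared(group):
    """The earlier, squared form of the same ratio."""
    return diversity_ratio(group) ** 2


# Toy vectors standing in for real embeddings (assumed values).
forest, woods, river = [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]

print(are_similar(forest, woods))                      # same semantic idea
print(are_similar(forest, river))                      # different idea
print(diversity_ratio(["Forest", "Woods"]))            # two distinct words
print(diversity_ratio(["Forest", "Forest"]))           # plain ratio
print(diversity_ratio_squared(["Forest", "Forest"]))   # squared ratio
```

Keeping the ratio and its squared variant as separate functions makes it easy to compare the two scoring choices on the same group of words.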