RobPruzan committed on
Commit
41b5f59
1 Parent(s): a704965

Update app.py

Files changed (1)
  1. app.py +19 -1
app.py CHANGED
@@ -319,6 +319,24 @@ interface = gr.Interface(
  theme="huggingface",
  description="Enter text or speak into your microphone to have your text analyzed!",
  rounded=True,
- container=True
+ container=True,
+ article="""
+ Fine-Tuned DistilBERT: Automatically determining how difficult a text is to read is hard because the underlying semantics matter.
+ To compute text difficulty efficiently, a pre-trained DistilBERT model is fine-tuned for regression on the CommonLit Ease of Readability (CLEAR)
+ Corpus (https://educationaldatamining.org/EDM2021/virtual/static/pdf/EDM21_paper_35.pdf). The dataset is built from over 110,000 pairwise comparisons in which
+ ~1100 teachers answered the question, "Which text is easier for students to understand?". The model is trained end-to-end (from the regression layer down to
+ the first attention layer) to ensure the best performance (Merchant et al. 2020).
+
+ Speech Pronunciation Scoring: The Wav2Vec 2.0 model is used to convert audio into text in real time. The model predicts words or phonemes (the smallest
+ unit of speech distinguishing one word (or word element) from another) from the user's input audio. Because of how the model works, users with poor
+ pronunciation get inaccurate transcriptions. This project attempts to score pronunciation by asking a user to read a target excerpt into the microphone. We then
+ pass this audio through Wav2Vec 2.0 to get the inferred intended words. We measure the loss as the Levenshtein distance between the target and actual transcripts;
+ the Levenshtein distance between two strings is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one into the other.
+
+ Lexical Diversity Score: The lexical diversity score is the ratio of unique similar words to total similar words squared. Two words count as similar
+ if the cosine similarity of their word2vec embeddings is greater than 0.75. It is bad writing/speech practice to repeat the same words when it is possible not to.
+ Vocabulary diversity is generally computed as the ratio of unique strings to total strings. That ratio does not indicate whether the person has a large vocabulary
+ or whether the topic simply does not require a diverse vocabulary to express it.
+ """

  ).launch()
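
Note on the article text added in this commit: it describes scoring pronunciation by the Levenshtein distance between a target excerpt and the speech-to-text transcript, and it mentions the plain unique-strings to total-strings ratio as the usual vocabulary diversity measure. The sketch below illustrates both ideas in isolation. It is not taken from app.py: the function names, the lowercasing and whitespace handling, and the normalization of the distance into a 0 to 1 score are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # Only the previous row of the dynamic-programming table is kept.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def pronunciation_score(target: str, transcript: str) -> float:
    """Hypothetical normalization (an assumption, not the app's formula):
    1.0 when the transcript matches the target, falling toward 0.0
    as more edits are needed."""
    target = target.lower().strip()
    transcript = transcript.lower().strip()
    longest = max(len(target), len(transcript))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(target, transcript) / longest


def type_token_ratio(text: str) -> float:
    """Plain unique-strings / total-strings ratio on whitespace tokens."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(pronunciation_score("the cat", "the bat"))
    print(type_token_ratio("the cat the dog"))
```

The embedding-based lexical diversity score described in the article would additionally need word2vec vectors and the 0.75 cosine-similarity threshold to group similar words; that part is left out here because the commit does not show how it is implemented.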