Pablo committed on
Commit e951a81
1 Parent(s): 605d4d7

Format changes in project description

Files changed (1)
  1. app.py +6 -3
app.py CHANGED
@@ -57,10 +57,13 @@ st.markdown(
 [Flax/Jax Community Week](https://discuss.huggingface.co/t/open-to-the-community-community-week-using-jax-flax-for-nlp-cv/7104)
 organised by HuggingFace.
 
-All models are variations of RoBERTa-base trained from scratch in Spanish.
-We used the mc4 dataset. We reduced the dataset size to 50 million documents to keep training times shorter, and also to be able to bias training examples based on their perplexity.
+All models are variations of RoBERTa-base trained from scratch in Spanish using the mc4 dataset.
+We reduced the dataset size to 50 million documents to keep training times shorter, and also to be able to bias training examples based on their perplexity.
+
 The idea is to favour examples with perplexities that are neither too small (short, repetitive texts) or too long (potentially poor quality).
-"Random" sampling simply takes documents at random to reduce the dataset size. "Gaussian" rejects documents with a higher probability for lower and larger perplexities, based on a Gaussian function.
+* **Random** sampling simply takes documents at random to reduce the dataset size.
+* **Gaussian** rejects documents with a higher probability for lower and larger perplexities, based on a Gaussian function.
+
 The first models have been trained (250.000 steps) on sequence length 128, and training for Gaussian changed to sequence length 512 for the last 25.000 training steps.
 """
 )
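
For context, a minimal sketch of the "Gaussian" rejection rule the new description refers to, assuming the keep probability is a Gaussian of each document's perplexity centred on a target value; the function names and the `mean`/`std` parameters are illustrative and not taken from this repository's code.

```python
import numpy as np

def gaussian_keep_probability(perplexity, mean, std):
    # Keep probability peaks at `mean` and decays symmetrically,
    # so both very low and very high perplexities are rejected
    # more often, as the description above outlines.
    return np.exp(-((perplexity - mean) ** 2) / (2.0 * std ** 2))

def gaussian_subsample(docs, perplexities, mean, std, seed=0):
    # Rejection sampling: keep each document with the Gaussian
    # probability of its perplexity.
    rng = np.random.default_rng(seed)
    p_keep = gaussian_keep_probability(
        np.asarray(perplexities, dtype=float), mean, std
    )
    keep = rng.random(len(docs)) < p_keep
    return [doc for doc, kept in zip(docs, keep) if kept]

# Hypothetical usage: documents with perplexity near `mean` are
# kept most often; outliers on either side are mostly dropped.
docs = ["texto corto", "un documento de calidad media", "otro documento"]
ppls = [950.0, 320.0, 40.0]
kept = gaussian_subsample(docs, ppls, mean=300.0, std=100.0)
```

"Random" sampling, by contrast, would keep each document with the same fixed probability regardless of its perplexity.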