Update README.md
README.md
CHANGED
@@ -59,42 +59,16 @@ We named the model "Herberta" by combining "Herb" and "Roberta" to signify its p
 ![Loss](https://cdn-uploads.huggingface.co/production/uploads/6564baaa393bae9c194fc32e/BJ7enbRg13IYAZuxwraPP.png)
 ![Perplexity](https://cdn-uploads.huggingface.co/production/uploads/6564baaa393bae9c194fc32e/lOohRMIctPJZKM5yEEcQ2.png)

-<!-- <table>
-  <tr>
-    <td align="center"><strong>Accuracy</strong></td>
-    <td align="center"><strong>Loss</strong></td>
-    <td align="center"><strong>Perplexity</strong></td>
-  </tr>
-  <tr>
-    <td><img src="https://cdn-uploads.huggingface.co/production/uploads/6564baaa393bae9c194fc32e/RDgI-0Ro2kMiwV853Wkgx.png" alt="Accuracy" width="800"></td>
-    <td><img src="https://cdn-uploads.huggingface.co/production/uploads/6564baaa393bae9c194fc32e/BJ7enbRg13IYAZuxwraPP.png" alt="Loss" width="800"></td>
-    <td><img src="https://cdn-uploads.huggingface.co/production/uploads/6564baaa393bae9c194fc32e/lOohRMIctPJZKM5yEEcQ2.png" alt="Perplexity" width="800"></td>
-  </tr>
-</table> -->

 ### Pretraining Configuration

-####
-- Pretraining Strategy: BERT-style MASK (15% tokens masked)
-- Sequence Length: 512
-- Batch Size: 32
-- Learning Rate: `1e-5` with an epoch-based decay (`epoch * 0.1`)
-- Tokenization: Sentence-based tokenization with padding for sequences <512 tokens.
-
-#### Modern Textbooks
+#### Modern Textbooks Version
 - Pretraining Strategy: Dynamic MASK + Warmup + Linear Decay
 - Sequence Length: 512
 - Batch Size: 16
 - Learning Rate: Warmup (10% steps) + Linear Decay (1e-5 initial rate)
 - Tokenization: Continuous tokenization (512 tokens) without sentence segmentation.

-#### V4 Mixed Dataset (Ancient + Modern)
-- Dataset: Combined 48 modern textbooks + 700 ancient books
-- Pretraining Strategy: Dynamic MASK, warmup, and linear decay (1e-5 learning rate).
-- Epochs: 20
-- Sequence Length: 512
-- Batch Size: 16
-- Tokenization: Continuous tokenization.

 ---
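The "Continuous tokenization (512 tokens) without sentence segmentation" step retained in this revision can be sketched roughly as below with the `datasets` and `transformers` libraries. This is a minimal illustration, not the project's actual preprocessing script: the corpus file `modern_textbooks.txt` and the `hfl/chinese-roberta-wwm-ext` checkpoint are placeholder names, since the commit does not specify them.

```python
from itertools import chain

from datasets import load_dataset
from transformers import AutoTokenizer

BLOCK_SIZE = 512  # "Sequence Length: 512"

# Placeholder corpus file and tokenizer -- the commit does not name either.
tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
raw = load_dataset("text", data_files={"train": "modern_textbooks.txt"})

def tokenize(batch):
    # No sentence segmentation: each line is tokenized as-is and joined later.
    return tokenizer(batch["text"], add_special_tokens=False)

def group_into_blocks(batch):
    # Concatenate every column into one long stream, then cut it into
    # fixed 512-token blocks, so no padding is ever needed.
    concatenated = {k: list(chain.from_iterable(batch[k])) for k in batch.keys()}
    total = (len(concatenated["input_ids"]) // BLOCK_SIZE) * BLOCK_SIZE
    return {
        k: [v[i : i + BLOCK_SIZE] for i in range(0, total, BLOCK_SIZE)]
        for k, v in concatenated.items()
    }

tokenized = raw["train"].map(tokenize, batched=True, remove_columns=["text"])
lm_dataset = tokenized.map(group_into_blocks, batched=True)
```

By contrast, the removed ancient-books configuration split the corpus into sentences and padded anything shorter than 512 tokens; block packing as above avoids padding entirely.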
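The training-side settings kept for this version (dynamic MASK, warmup over 10% of steps followed by linear decay from an initial 1e-5 learning rate, batch size 16) map onto the Hugging Face `Trainer` roughly as follows. This is again a sketch under stated assumptions rather than the script used for the released model: the base checkpoint, output directory, and epoch count are illustrative placeholders, and `lm_dataset` is the block dataset built in the previous snippet.

```python
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Placeholder base checkpoint; the commit does not state which model was used here.
checkpoint = "hfl/chinese-roberta-wwm-ext"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Dynamic MASK: masked positions are re-sampled every time a batch is collated,
# instead of being fixed once during preprocessing (15% is the collator default).
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

args = TrainingArguments(
    output_dir="herberta-modern-textbooks",  # placeholder output directory
    per_device_train_batch_size=16,          # "Batch Size: 16"
    learning_rate=1e-5,                      # initial rate before decay
    warmup_ratio=0.1,                        # warmup over the first 10% of steps
    lr_scheduler_type="linear",              # linear decay after warmup
    num_train_epochs=20,                     # illustrative; not stated for this version
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=lm_dataset,  # 512-token blocks from the previous snippet
    data_collator=collator,
)
trainer.train()
```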