Remove emoji checkmarks and warning signs
README.md
CHANGED
@@ -190,9 +190,9 @@ for label, names in sorted(grouped.items()):
 | Parameter | Value |
 |---|---|
 | Learning rate | 2e-05 |
-| Batch size | 16 (
 | Epochs | 3 |
-| Optimizer | AdamW
 | LR scheduler | Cosine with 10% warmup |
 | Seed | 42 |
@@ -206,19 +206,19 @@ for label, names in sorted(grouped.items()):

 **Note:** The best checkpoint (epoch ~2, lowest validation loss 0.0606) was selected as the final model, achieving **90.6% F1**.

-## Strengths

 ### Strengths
--
--
--
--
--

 ### Limitations
--
--
--

 ## Recommended Post-Processing
 | Parameter | Value |
 |---|---|
 | Learning rate | 2e-05 |
+| Batch size | 16 (x2 gradient accumulation = 32 effective) |
 | Epochs | 3 |
+| Optimizer | AdamW |
 | LR scheduler | Cosine with 10% warmup |
 | Seed | 42 |
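The hyperparameter rows in this hunk can be read as follows. This is a minimal sketch of the usual meaning of "16 (x2 gradient accumulation = 32 effective)" and "Cosine with 10% warmup". The function names, the single-device default, the example step count, and the choice to decay to zero over one half-cosine are assumptions for illustration; the README does not say which scheduler implementation was used.

```python
import math

PEAK_LR = 2e-05          # "Learning rate" row
PER_DEVICE_BATCH = 16    # "Batch size" row
GRAD_ACCUM_STEPS = 2     # "x2 gradient accumulation"
WARMUP_FRACTION = 0.10   # "10% warmup"


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Samples contributing to one optimizer update (single device assumed by default)."""
    return per_device * accum_steps * num_devices


def lr_at_step(step: int, total_steps: int,
               peak_lr: float = PEAK_LR,
               warmup_fraction: float = WARMUP_FRACTION) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero (assumed shape)."""
    warmup_steps = int(total_steps * warmup_fraction)
    if warmup_steps > 0 and step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Decay phase: progress runs from 0.0 (end of warmup) to 1.0 (last step).
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS))
    # 1000 total steps is a made-up figure, used only to show the curve.
    for s in (0, 50, 100, 550, 1000):
        print(s, lr_at_step(s, total_steps=1000))
```

With these assumptions the learning rate peaks at the end of warmup (step 100 of 1000 in the example) and reaches zero at the final step.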

 **Note:** The best checkpoint (epoch ~2, lowest validation loss 0.0606) was selected as the final model, achieving **90.6% F1**.

+## Strengths and Limitations

 ### Strengths
+- **Cross-domain**: Works on patents, papers, news, and political documents with a single model
+- **Multilingual**: Handles both English and German text
+- **Rich entity types**: 15 entity types covering people, organizations, locations, biological entities, diseases, instruments, and more
+- **Fast**: ~5ms per document on CPU — suitable for processing millions of documents
+- **Long context**: Inherits ModernBERT's 8,192 token context window

 ### Limitations
+- **Conference/product names**: May fragment uncommon compound names (e.g., "NeurIPS" split into tokens) — use confidence thresholding (>0.5) to filter
+- **Languages**: Optimized for English and German; other languages may work but are untested
+- **Domain drift**: Performance is best on patent, scientific, political, and news text — may degrade on informal text (social media, chat)

 ## Recommended Post-Processing
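The Limitations list in this diff suggests confidence thresholding (>0.5) to filter fragmented names. A minimal sketch of that filter follows. It assumes predictions arrive as a list of dicts with a `score` key, which is the shape returned by the Hugging Face `transformers` token-classification pipeline when aggregation is enabled. The function name, the sample predictions, and the `ORG`/`EVENT` labels are hypothetical and not taken from the README.

```python
from typing import Any


def filter_by_confidence(entities: list[dict[str, Any]],
                         threshold: float = 0.5) -> list[dict[str, Any]]:
    """Keep only entities whose score is strictly above the threshold."""
    return [e for e in entities if float(e["score"]) > threshold]


# Hypothetical predictions, made up to show a fragmented name being dropped.
example_predictions = [
    {"entity_group": "EVENT", "word": "Neur", "score": 0.31, "start": 0, "end": 4},
    {"entity_group": "EVENT", "word": "IPS", "score": 0.28, "start": 4, "end": 7},
    {"entity_group": "ORG", "word": "Fraunhofer", "score": 0.97, "start": 20, "end": 30},
]

kept = filter_by_confidence(example_predictions)

if __name__ == "__main__":
    for entity in kept:
        print(entity["entity_group"], entity["word"], entity["score"])
```

A strict `>` matches the ">0.5" wording in the bullet. The threshold is worth tuning on held-out data, because a higher value trades recall for precision.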