---
license: apache-2.0
---

This is a `distilbert-base-multilingual-cased` model fine-tuned with a NER-style token-classification objective: each token is tagged according to whether it belongs to a code block or to natural-language text.
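
A minimal usage sketch with the `transformers` token-classification pipeline is given here for orientation. This card does not state the model's Hub ID or its label names, so `model_id` is a placeholder you must supply, and the labels that come back are whatever the checkpoint's config defines. The helper `split_by_label` is a hypothetical convenience function, not part of any library.

```python
from typing import Dict, List


def split_by_label(text: str, groups: List[Dict]) -> List[Dict]:
    """Convert aggregated pipeline output into ordered {label, text} spans."""
    spans = []
    for group in sorted(groups, key=lambda g: g["start"]):
        spans.append(
            {
                "label": group["entity_group"],
                # Slice the original string so spacing inside the span is kept.
                "text": text[group["start"]:group["end"]],
            }
        )
    return spans


def tag_code_and_text(text: str, model_id: str) -> List[Dict]:
    """Run the tagger on `text`. `model_id` is a placeholder for this model's Hub ID."""
    # Imported here so split_by_label can be used without transformers installed.
    from transformers import pipeline

    tagger = pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge adjacent tokens that share a label
    )
    return split_by_label(text, tagger(text))
```

With `aggregation_strategy="simple"`, consecutive tokens with the same predicted label are merged into one group with `start` and `end` character offsets, which is usually more convenient than per-token output when the goal is to separate code from prose.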

The dataset of 78,210 examples was generated by randomly combining code blocks and text blocks from other permissively licensed datasets. Some examples contain only code, and some contain only natural-language text.
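
The card does not describe the generation procedure in more detail than the paragraph above. The following is only an illustrative sketch of how such examples could be assembled, not the script that was actually used. The function name `build_example`, the label strings `"code"` and `"text"`, the `max_blocks` limit, and the whitespace tokenization are all assumptions made for the example.

```python
import random
from typing import List, Tuple


def build_example(
    code_blocks: List[str],
    text_blocks: List[str],
    rng: random.Random,
    max_blocks: int = 4,
) -> Tuple[List[str], List[str]]:
    """Randomly combine blocks and return (tokens, labels) of equal length."""
    # Some examples contain only code, some only text, the rest are mixed.
    mode = rng.choice(["mixed", "code_only", "text_only"])
    tokens: List[str] = []
    labels: List[str] = []
    for _ in range(rng.randint(1, max_blocks)):
        if mode == "code_only":
            kind = "code"
        elif mode == "text_only":
            kind = "text"
        else:
            kind = rng.choice(["code", "text"])
        block = rng.choice(code_blocks if kind == "code" else text_blocks)
        # Whitespace splitting is a simplification; a real pipeline would
        # align labels with the model's subword tokenizer instead.
        for word in block.split():
            tokens.append(word)
            labels.append(kind)
    return tokens, labels


if __name__ == "__main__":
    rng = random.Random(0)
    tokens, labels = build_example(["x = 1", "print(x)"], ["Hello world."], rng)
    print(list(zip(tokens, labels)))
```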

The model achieves the following results on the validation set:

| Metric    | Value  |
|-----------|--------|
| Loss      | 0.0788 |
| F1 Score  | 0.8619 |
| Precision | 0.8362 |
| Recall    | 0.8893 |
| Accuracy  | 0.9792 |
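
As a quick sanity check on the table, F1 is the harmonic mean of precision and recall. The short computation below assumes the reported F1 was derived from the same aggregate precision and recall shown above (the card does not say which averaging scheme was used), and under that assumption the numbers agree to four decimal places.

```python
precision = 0.8362
recall = 0.8893

# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 4))  # 0.8619
```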