---
license: apache-2.0
---

This is a `distilbert-base-multilingual-cased` model fine-tuned with a NER-style token-classification objective to tag each token as belonging to either a code block or natural-language text.
The dataset of 78,210 examples was generated by randomly combining code and text blocks from other permissively licensed datasets; some examples contain only code and some contain only regular text.
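
As a rough illustration of how such mixed examples could be assembled, here is a minimal sketch. It is not the script used to build this dataset: the label names, the `make_example` helper, the whitespace tokenization, and the `max_blocks` parameter are all assumptions made for the example.

```python
import random

# Hypothetical label ids; the model's actual label set is not stated in this card.
LABELS = {"text": 0, "code": 1}


def make_example(text_blocks, code_blocks, rng, max_blocks=4):
    """Build one synthetic example by randomly concatenating text and code blocks.

    Returns (tokens, labels): each whitespace-delimited token carries the
    label of the block it came from.
    """
    tokens, labels = [], []
    for _ in range(rng.randint(1, max_blocks)):
        # Pick the block type at random, so some examples end up
        # code-only or text-only purely by chance.
        kind = rng.choice(["text", "code"])
        pool = text_blocks if kind == "text" else code_blocks
        words = rng.choice(pool).split()
        tokens.extend(words)
        labels.extend([LABELS[kind]] * len(words))
    return tokens, labels


rng = random.Random(0)  # fixed seed for reproducibility
tokens, labels = make_example(
    ["Here is how to print a greeting."],
    ["print('hello')"],
    rng,
)
```

A real pipeline would also need to align these word-level labels with DistilBERT's subword tokens before training.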

The model achieves the following metrics on the validation set:

| Metric       | Value     |
|--------------|-----------|
| Loss         | 0.0788    |
| F1 Score     | 0.8619    |
| Precision    | 0.8362    |
| Recall       | 0.8893    |
| Accuracy     | 0.9792    |
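
As a quick sanity check, the reported F1 score is consistent with the harmonic mean of the reported precision and recall. The snippet below only reuses the numbers from the table; it does not reproduce the evaluation itself.

```python
# Values copied from the table above.
precision = 0.8362
recall = 0.8893

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.8619
```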