LemiSt/code-segmentor-distilbert

Token Classification

	---
	license: apache-2.0
	---

	This is a `distilbert-base-multilingual-cased` model fine-tuned with a NER-style token classification objective to tag each token according to whether it belongs to a code block or to natural-language text.
	The dataset of 78,210 examples was generated by randomly combining code and text blocks from other permissively licensed datasets. Some examples contain only code and some contain only regular text.
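
A minimal sketch of how the tagger might be used to split a string into code and text segments, assuming the standard `transformers` token-classification pipeline. The label names are not documented in this card, so `TEXT` and `CODE` in the comments are hypothetical placeholders, and `load_tagger` and `merge_spans` are illustrative helper names rather than part of the model.

```python
from typing import Dict, List, Tuple


def load_tagger(model_id: str = "LemiSt/code-segmentor-distilbert"):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so merge_spans below can be used without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" groups sub-word tokens into labelled spans
    # that carry character offsets ("start", "end") and an "entity_group".
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def merge_spans(entities: List[Dict], text: str) -> List[Tuple[str, str]]:
    """Merge consecutive spans that share a label into (label, substring) pairs."""
    segments: List[list] = []
    for ent in sorted(entities, key=lambda e: e["start"]):
        label = ent["entity_group"]
        if segments and segments[-1][0] == label:
            # Same label as the previous span: extend it to cover this one.
            segments[-1][2] = ent["end"]
        else:
            segments.append([label, ent["start"], ent["end"]])
    return [(label, text[start:end]) for label, start, end in segments]


# Example usage (requires network access; actual label names depend on the model config):
#   tagger = load_tagger()
#   sample = "Here is the fix:\nprint('hello')"
#   print(merge_spans(tagger(sample), sample))
```

The pipeline drops spans labelled `O` by default, so if the model uses `O` for natural-language text, only the code segments would appear in the result. Checking `id2label` in the model config is the reliable way to see which labels it actually emits.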

	The model achieves the following results on the validation set:

	| Metric    | Value  |
	|-----------|--------|
	| Loss      | 0.0788 |
	| F1 Score  | 0.8619 |
	| Precision | 0.8362 |
	| Recall    | 0.8893 |
	| Accuracy  | 0.9792 |
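
As a quick sanity check on the table, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The sketch below assumes the reported scores are averaged in a way for which that identity holds (for example micro-averaged); the card does not say how they were computed, and `f1_from` is an illustrative helper name.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


reported_f1 = 0.8619
recomputed = f1_from(0.8362, 0.8893)

# The two values should agree up to rounding of the four-digit inputs.
print(f"recomputed F1 = {recomputed:.4f}, reported F1 = {reported_f1:.4f}")
```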