LemiSt/code-segmentor-distilbert

---
license: apache-2.0
tags:
- Token Classification
widget:
- text: >-
    The following is a bubble sort implementation taken from TeamTest57/Whack-A-Mole on github.

    int iro = 0;

    int score = 0;

    void bubble_sort() {
        int i, j;
        for (i = 0; i < mole_num - 1; i++)
            for (j = mole_num - 1; j >= i + 1; j--)
                if (hole_y[j] < hole_y[j - 1]) {
                    int temp;
                    temp = hole_y[j];
                    hole_y[j] = hole_y[j - 1];
                    hole_y[j - 1] = temp;
                    temp = hole_x[j];
                    hole_x[j] = hole_x[j - 1];
                    hole_x[j - 1] = temp;
                }
    }
  example_title: example 1
- text: >-
    # Sample animal inherits from custom metaclass

    class Panda(metaclass=CustomMeta):
        """I bet you see this docstring printed as well"""
        fav_food = "Bamboo"
        loves_code = True

        def activity(self):
            print("Zzz...")
    This programming code was taken from cyberpanda/PythonStuff on GitHub and is cc0-licensed. It defines a class with member variables and methods.
  example_title: example 2
---

This is a `distilbert-base-multilingual-cased` model fine-tuned with a token-classification (NER-style) objective to tag each token as belonging either to a code block or to natural-language text.
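
A minimal sketch of how the model might be called through the `transformers` token-classification pipeline, assuming the checkpoint is published under the id `LemiSt/code-segmentor-distilbert` shown on the model page. The label names the model emits are not stated in this card, so the helper `merge_spans` (a hypothetical name) only groups neighbouring predictions that share a label and makes no assumption about what the labels are called.

```python
from typing import Dict, List

MODEL_ID = "LemiSt/code-segmentor-distilbert"  # id as shown on the model page


def merge_spans(predictions: List[Dict], text: str) -> List[Dict]:
    """Merge neighbouring predictions that share a label.

    Expects dicts with "entity_group", "start" and "end" keys, the shape
    returned by the pipeline when aggregation_strategy="simple".
    Two spans are merged only if nothing but whitespace lies between them.
    """
    merged: List[Dict] = []
    for pred in sorted(predictions, key=lambda p: p["start"]):
        label = pred["entity_group"]
        if (
            merged
            and merged[-1]["label"] == label
            and text[merged[-1]["end"]:pred["start"]].strip() == ""
        ):
            merged[-1]["end"] = pred["end"]  # extend the previous span
        else:
            merged.append({"label": label, "start": pred["start"], "end": pred["end"]})
    for span in merged:
        span["text"] = text[span["start"]:span["end"]]
    return merged


def segment(text: str) -> List[Dict]:
    """Run the model on `text` and return merged, labelled spans."""
    # Imported lazily; building the pipeline fetches the checkpoint on first use.
    from transformers import pipeline

    tagger = pipeline(
        "token-classification",
        model=MODEL_ID,
        aggregation_strategy="simple",
    )
    return merge_spans(tagger(text), text)
```

Calling `segment(some_text)` returns a list of dicts with `label`, `start`, `end` and `text` keys; inspecting the distinct `label` values on a known input is a simple way to find out which label marks code.
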
The dataset of 78,210 examples was generated by randomly combining code and text blocks from other permissively licensed datasets; some examples contain only code and some only natural-language text.
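
The generation script is not part of this card. The following is only a sketch of how such mixed examples could be assembled, with every name and setting (`make_example`, the `"code"`/`"text"` labels, the mixing rule, `max_blocks`) chosen for illustration. It joins randomly selected blocks with newlines and records a character-level span for each block, which a tokenizer's offset mapping could later turn into per-token tags.

```python
import random
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label), end exclusive


def make_example(
    code_blocks: List[str],
    text_blocks: List[str],
    rng: random.Random,
    max_blocks: int = 4,
) -> Tuple[str, List[Span]]:
    """Join randomly chosen code/text blocks and record each block's span."""
    n_blocks = rng.randint(1, max_blocks)
    # Hypothetical mixing rule: an example is mixed, code-only, or text-only.
    mode = rng.choice(["mixed", "code", "text"])
    parts: List[str] = []
    spans: List[Span] = []
    cursor = 0
    for _ in range(n_blocks):
        label = mode if mode != "mixed" else rng.choice(["code", "text"])
        block = rng.choice(code_blocks if label == "code" else text_blocks)
        if parts:
            cursor += 1  # account for the newline separator
        spans.append((cursor, cursor + len(block), label))
        parts.append(block)
        cursor += len(block)
    return "\n".join(parts), spans
```

Passing a seeded `random.Random` keeps the generated examples reproducible.
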

The model achieves the following metrics on the validation set:

| Metric    | Value  |
|-----------|--------|
| Loss      | 0.0788 |
| F1 Score  | 0.8619 |
| Precision | 0.8362 |
| Recall    | 0.8893 |
| Accuracy  | 0.9792 |
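
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported value can be recomputed from the other two rows. The snippet below is a worked calculation, not part of the author's evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


precision, recall = 0.8362, 0.8893  # values from the table above
print(round(f1_score(precision, recall), 4))  # 0.8619
```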