---
license: apache-2.0
widget:
  - text: >-
      The following is a bubble sort implementation taken from TeamTest57/Whack-A-Mole on github.
      int iro = 0;
      int score = 0;
      void bubble_sort() {
      	int i, j;
      	for (i = 0; i < mole_num - 1; i++)
      		for (j = mole_num - 1; j >= i + 1; j--)
      			if (hole_y[j] < hole_y[j - 1]) {
      				int temp;
      				temp = hole_y[j];
      				hole_y[j] = hole_y[j - 1];
      				hole_y[j - 1] = temp;
      				temp = hole_x[j];
      				hole_x[j] = hole_x[j - 1];
      				hole_x[j - 1] = temp;
      			}
      }
    example_title: example 1
  - text: >-
      # Sample animal inherits from custom metaclass
      class Panda(metaclass=CustomMeta):
          """I bet you see this docstring printed as well"""
          fav_food = "Bamboo"
          loves_code = True
      
          def activity(self):
              print("Zzz...")
      This programming code was taken from cyberpanda/PythonStuff on GitHub and is cc0-licensed. It defines a class with member variables and methods.
    example_title: example 2
---

This is a `distilbert-base-multilingual-cased` model fine-tuned with a NER (token classification) objective to tag each token according to whether it belongs to a code block or to natural language text.
The training dataset of 78,210 examples was generated by randomly combining code and text blocks from other permissively licensed datasets. Some examples contain only code and some contain only regular text.
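
The exact generation procedure is not documented here. As a rough illustration of the idea only, the sketch below randomly concatenates code and text blocks and labels every token with its origin. The function name `build_example`, the label names `CODE` and `TEXT`, the whitespace tokenization, and the sample blocks are all assumptions made for this example, not the actual pipeline.

```python
import random

def build_example(code_blocks, text_blocks, rng, max_blocks=4):
    """Randomly combine blocks and return (tokens, labels).

    Hypothetical sketch: whitespace splitting stands in for the real
    tokenizer, and "CODE"/"TEXT" are assumed label names.
    """
    tokens, labels = [], []
    for _ in range(rng.randint(1, max_blocks)):
        kind = rng.choice(["CODE", "TEXT"])
        pool = code_blocks if kind == "CODE" else text_blocks
        words = rng.choice(pool).split()
        tokens.extend(words)
        labels.extend([kind] * len(words))  # one label per token
    return tokens, labels

# Made-up sample blocks for illustration.
code_blocks = ["int score = 0;", "def activity(self): print('Zzz...')"]
text_blocks = ["The following is a bubble sort implementation.",
               "It defines a class with member variables and methods."]

rng = random.Random(0)
tokens, labels = build_example(code_blocks, text_blocks, rng)
for tok, lab in zip(tokens, labels):
    print(lab, tok)
```

Because the number and kind of blocks are drawn at random, some generated examples consist only of code and others only of text, which matches the mix described above.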

The model achieves the following results on the validation set:

| Metric       | Value     |
|--------------|-----------|
| Loss         | 0.0788    |
| F1 Score     | 0.8619    |
| Precision    | 0.8362    |
| Recall       | 0.8893    |
| Accuracy     | 0.9792    |
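
As a quick sanity check on the table, the reported F1 score can be compared with the harmonic mean of the reported precision and recall. This assumes the F1 value was computed from those same two aggregate numbers; the averaging method is not stated here.

```python
# Values copied from the table above.
precision = 0.8362
recall = 0.8893

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.8619
```

The result agrees with the reported F1 score of 0.8619 to four decimal places.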