---
license: wtfpl
datasets:
- cakiki/rosetta-code
language:
- en
metrics:
- accuracy
library_name: transformers
pipeline_tag: text-classification
tags:
- code
- programming-language
- code-classification
base_model: huggingface/CodeBERTa-small-v1
---
This model is a fine-tuned version of *huggingface/CodeBERTa-small-v1* on the *cakiki/rosetta-code* dataset, covering the 26 programming languages listed below.
## Training Details
The model was trained for 25 epochs on Azure on nearly 26,000 data points for the 26 programming languages below, extracted from a dataset spanning 1,006 programming languages in total.
### Programming Languages the Model Can Detect
<ol>
 <li>ARM Assembly</li>
 <li>AppleScript</li>
 <li>C</li>
 <li>C#</li>
 <li>C++</li>
 <li>COBOL</li>
 <li>Erlang</li>
 <li>Fortran</li>
 <li>Go</li>
 <li>Java</li>
 <li>JavaScript</li>
 <li>Kotlin</li>
 <li>Lua</li>
 <li>Mathematica/Wolfram Language</li>
 <li>PHP</li>
 <li>Pascal</li>
 <li>Perl</li>
 <li>PowerShell</li>
 <li>Python</li>
 <li>R</li>
 <li>Ruby</li>
 <li>Rust</li>
 <li>Scala</li>
 <li>Swift</li>
 <li>Visual Basic .NET</li>
 <li>jq</li>
</ol>
<br>
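The same label set ships with the hosted model configuration, so you can list the supported languages programmatically. A minimal sketch, assuming only the standard `id2label` mapping in the config:

```python
from transformers import AutoConfig

# Download just the config (not the weights); id2label maps class ids to language names.
config = AutoConfig.from_pretrained("philomath-1209/programming-language-identification")
for idx, label in sorted(config.id2label.items()):
    print(idx, label)
```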

## Training Results for 25 Epochs
<ul>
    <li>Training machine configuration: <ul>
       <li>GPU: 1x NVIDIA Tesla T4</li>
       <li>VRAM: 16 GB</li>
       <li>RAM: 112 GB</li>
       <li>CPU: 6 cores</li>
    </ul></li>
<li>Training time: 7 hours for 25 epochs</li>
<li>Training hyperparameters: shown in the screenshots below.</li>
  </ul>

 

![image/png](https://cdn-uploads.huggingface.co/production/uploads/645c859ad90782b1a6a3e957/YIYl1XZk0zpi3DCvn3D80.png)



![training detail.png](https://cdn-uploads.huggingface.co/production/uploads/645c859ad90782b1a6a3e957/Oi9TuJ8nEjtt6Z_W56myn.png)

## Inference Code

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "philomath-1209/programming-language-identification"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Run on GPU when available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

text = """
  PROGRAM Triangle
     IMPLICIT NONE
     REAL :: a, b, c, Area
     PRINT *, 'Welcome, please enter the&
              &lengths of the 3 sides.'
     READ *, a, b, c
     PRINT *, 'Triangle''s area:  ', Area(a,b,c)
    END PROGRAM Triangle
    FUNCTION Area(x,y,z)
     IMPLICIT NONE
     REAL :: Area            ! function type
     REAL, INTENT( IN ) :: x, y, z
     REAL :: theta, height
     theta = ACOS((x**2+y**2-z**2)/(2.0*x*y))
     height = x*SIN(theta); Area = 0.5*y*height
    END FUNCTION Area
"""

inputs = tokenizer(text, return_tensors="pt", truncation=True).to(device)
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax().item()
print(model.config.id2label[predicted_class_id])  # e.g. "Fortran"
```
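
Since the model returns raw logits over all 26 classes, applying a softmax turns them into per-language confidence scores. A minimal sketch (reusing `model` and `inputs` from the block above) that prints the top 3 candidate languages:

```python
import torch.nn.functional as F

with torch.no_grad():
    logits = model(**inputs).logits
probs = F.softmax(logits, dim=-1)[0]          # normalize logits to probabilities
top = torch.topk(probs, k=3)                  # three highest-scoring classes
for score, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{model.config.id2label[idx]}: {score:.3f}")
```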

### Optimum with ONNX inference

Loading the model requires the 🤗 Optimum library with the ONNX Runtime backend installed:
```shell
pip install transformers optimum[onnxruntime]
```

```python
import torch
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

model_path = "philomath-1209/programming-language-identification"
# The exported ONNX weights live in the "onnx" subfolder of the model repo.
tokenizer = AutoTokenizer.from_pretrained(model_path, subfolder="onnx")
model = ORTModelForSequenceClassification.from_pretrained(model_path, export=False, subfolder="onnx")

text = """
  PROGRAM Triangle
     IMPLICIT NONE
     REAL :: a, b, c, Area
     PRINT *, 'Welcome, please enter the&
              &lengths of the 3 sides.'
     READ *, a, b, c
     PRINT *, 'Triangle''s area:  ', Area(a,b,c)
    END PROGRAM Triangle
    FUNCTION Area(x,y,z)
     IMPLICIT NONE
     REAL :: Area            ! function type
     REAL, INTENT( IN ) :: x, y, z
     REAL :: theta, height
     theta = ACOS((x**2+y**2-z**2)/(2.0*x*y))
     height = x*SIN(theta); Area = 0.5*y*height
    END FUNCTION Area
"""

inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class_id = logits.argmax().item()
print(model.config.id2label[predicted_class_id])  # e.g. "Fortran"
```
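
The ONNX model should also work with the standard transformers `pipeline` API, which bundles tokenization, truncation, and label mapping into one call. A minimal sketch, reusing `model`, `tokenizer`, and `text` from the block above (passing `truncation=True` through the pipeline is an assumption that your installed transformers version forwards tokenizer kwargs):

```python
from transformers import pipeline

# ORT models from Optimum are drop-in replacements for transformers models here.
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
result = classifier(text, truncation=True)
print(result)  # e.g. [{'label': 'Fortran', 'score': ...}]
```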