---
license: wtfpl
datasets:
- cakiki/rosetta-code
language:
- en
metrics:
- accuracy
library_name: transformers
pipeline_tag: text-classification
tags:
- code
- programming-language
- code-classification
base_model: huggingface/CodeBERTa-small-v1
---

This model is a fine-tuned version of *huggingface/CodeBERTa-small-v1* on the *cakiki/rosetta-code* dataset, covering the 26 programming languages listed below.

## Training Details:

The model was trained for 25 epochs on Azure on nearly 26,000 data points for the 26 programming languages listed below. These data points were extracted from a dataset that covers 1,006 programming languages in total.

### Programming Languages this model is able to detect vs Examples used for training
  1. 'ARM Assembly'
  2. 'AppleScript'
  3. 'C'
  4. 'C#'
  5. 'C++'
  6. 'COBOL'
  7. 'Erlang'
  8. 'Fortran'
  9. 'Go'
  10. 'Java'
  11. 'JavaScript'
  12. 'Kotlin'
  13. 'Lua'
  14. 'Mathematica/Wolfram Language'
  15. 'PHP'
  16. 'Pascal'
  17. 'Perl'
  18. 'PowerShell'
  19. 'Python'
  20. 'R'
  21. 'Ruby'
  22. 'Rust'
  23. 'Scala'
  24. 'Swift'
  25. 'Visual Basic .NET'
  26. 'jq'
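
The subset described under Training Details can be pictured as a simple filter over the full dataset. The block below is a minimal sketch, not the author's actual preprocessing: `TARGET_LANGUAGES`, `label2id`, and `filter_by_language` are hypothetical names, the `language_name` field is an assumption about the dataset's column layout, and the label order shown here is illustrative (the released model's real mapping lives in its config).

```python
# The 26 languages listed above, as plain strings.
TARGET_LANGUAGES = [
    "ARM Assembly", "AppleScript", "C", "C#", "C++", "COBOL", "Erlang",
    "Fortran", "Go", "Java", "JavaScript", "Kotlin", "Lua",
    "Mathematica/Wolfram Language", "PHP", "Pascal", "Perl", "PowerShell",
    "Python", "R", "Ruby", "Rust", "Scala", "Swift", "Visual Basic .NET", "jq",
]

# Hypothetical label mapping for illustration only; the released model's
# actual id/label assignment may differ.
label2id = {name: idx for idx, name in enumerate(TARGET_LANGUAGES)}


def filter_by_language(records, field="language_name"):
    """Keep only records whose language is one of the 26 target languages.

    `field` is an assumed column name, not confirmed by this model card.
    """
    allowed = set(TARGET_LANGUAGES)
    return [record for record in records if record.get(field) in allowed]


# Toy records standing in for dataset rows.
sample = [
    {"language_name": "Python", "code": "print('hi')"},
    {"language_name": "Haskell", "code": "main = putStrLn \"hi\""},
    {"language_name": "jq", "code": ".[] | .name"},
]
kept = filter_by_language(sample)
print([record["language_name"] for record in kept])
```

With a real dataset object, the same membership test could be passed to whatever filtering method the loading library provides.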

## Training Results:

Below are the training results for 25 epochs.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/645c859ad90782b1a6a3e957/YIYl1XZk0zpi3DCvn3D80.png)

![training detail.png](https://cdn-uploads.huggingface.co/production/uploads/645c859ad90782b1a6a3e957/Oi9TuJ8nEjtt6Z_W56myn.png)

## Inference Code

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = 'philomath-1209/programming-language-identification'
loaded_tokenizer = AutoTokenizer.from_pretrained(model_name)
loaded_model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Run on GPU when available, otherwise fall back to CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
loaded_model.to(device)

# Code snippet to classify (a Fortran program in this example).
text = """
PROGRAM Triangle
   IMPLICIT NONE
   REAL :: a, b, c, Area
   PRINT *, 'Welcome, please enter the&
            &lengths of the 3 sides.'
   READ *, a, b, c
   PRINT *, 'Triangle''s area:  ', Area(a,b,c)
END PROGRAM Triangle
FUNCTION Area(x,y,z)
   IMPLICIT NONE
   REAL :: Area            ! function type
   REAL, INTENT( IN ) :: x, y, z
   REAL :: theta, height
   theta = ACOS((x**2+y**2-z**2)/(2.0*x*y))
   height = x*SIN(theta); Area = 0.5*y*height
END FUNCTION Area
"""

# Inputs longer than the model's maximum length are truncated.
inputs = loaded_tokenizer(text, return_tensors="pt", truncation=True).to(device)

with torch.no_grad():
    logits = loaded_model(**inputs).logits

# Pick the highest-scoring class and map it back to its language name.
predicted_class_id = logits.argmax(dim=-1).item()
print(loaded_model.config.id2label[predicted_class_id])
```
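
The inference code above returns only the single best label. When a ranked list with probabilities is more useful, the logits can be passed through a softmax first. The following is a minimal sketch in plain Python; `top_k_labels` is a hypothetical helper, and the toy logits and three-entry label map are made up for illustration, not model outputs.

```python
import math


def top_k_labels(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs.

    `logits` is a flat list of floats, one per class; `id2label` maps
    integer class ids to label names.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    # Rank class ids from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Toy values for illustration only.
toy_logits = [0.5, 3.0, 1.0]
toy_id2label = {0: "C", 1: "Fortran", 2: "Python"}
ranking = top_k_labels(toy_logits, toy_id2label, k=2)
for label, prob in ranking:
    print(f"{label}: {prob:.3f}")
```

With the real model, one would presumably pass `logits[0].tolist()` and `loaded_model.config.id2label` in place of the toy values.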