---
license: wtfpl
datasets:
  - cakiki/rosetta-code
language:
  - en
metrics:
  - accuracy
library_name: transformers
pipeline_tag: text-classification
tags:
  - code
  - programming-language
  - code-classification
base_model: huggingface/CodeBERTa-small-v1
---

This model is a fine-tuned version of huggingface/CodeBERTa-small-v1 on the cakiki/rosetta-code dataset, covering the 26 programming languages listed below.
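For reference, here is a minimal inference sketch using the transformers text-classification pipeline. The model id below is a placeholder, not a value stated in this card; substitute this repository's Hub id.

```python
from transformers import pipeline

# "<this-repo-id>" is a placeholder for this model's Hub id.
classifier = pipeline("text-classification", model="<this-repo-id>")

code_snippet = 'fn main() { println!("Hello, world!"); }'
print(classifier(code_snippet))
# e.g. [{'label': 'Rust', 'score': 0.99}]
```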

Training Details:

The model was trained for 25 epochs on Azure on nearly 26,000 data points covering the 26 programming languages listed below, extracted from a dataset that contains 1,006 programming languages in total.

Programming languages this model is able to detect:

  1. ARM Assembly
  2. AppleScript
  3. C
  4. C#
  5. C++
  6. COBOL
  7. Erlang
  8. Fortran
  9. Go
  10. Java
  11. JavaScript
  12. Kotlin
  13. Lua
  14. Mathematica/Wolfram Language
  15. PHP
  16. Pascal
  17. Perl
  18. PowerShell
  19. Python
  20. R
  21. Ruby
  22. Rust
  23. Scala
  24. Swift
  25. Visual Basic .NET
  26. jq

Below are the training details for the 25-epoch run.

  • Training machine configuration: 1x NVIDIA Tesla T4 GPU (16 GB VRAM), 112 GB RAM, 6 CPU cores
  • Training time: exactly 7 hours for 25 epochs
  • Training hyperparameters:
![image/png](https://cdn-uploads.huggingface.co/production/uploads/645c859ad90782b1a6a3e957/yRqjKVFKZIT_zXjcA3yFW.png)
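The screenshot above holds the actual hyperparameter values. As a hedged reconstruction only, the sketch below shows how such a fine-tune could be set up with the transformers Trainer. The dataset column names (`language_name`, `code`) and the batch size and learning rate are assumptions, not values taken from this card.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

LANGS = [
    "ARM Assembly", "AppleScript", "C", "C#", "C++", "COBOL", "Erlang",
    "Fortran", "Go", "Java", "JavaScript", "Kotlin", "Lua",
    "Mathematica/Wolfram Language", "PHP", "Pascal", "Perl", "PowerShell",
    "Python", "R", "Ruby", "Rust", "Scala", "Swift", "Visual Basic .NET", "jq",
]
label2id = {lang: i for i, lang in enumerate(LANGS)}

# Column names ("language_name", "code") are assumptions; check the dataset schema.
ds = load_dataset("cakiki/rosetta-code", split="train")
ds = ds.filter(lambda x: x["language_name"] in label2id)

tokenizer = AutoTokenizer.from_pretrained("huggingface/CodeBERTa-small-v1")

def preprocess(batch):
    enc = tokenizer(batch["code"], truncation=True, max_length=512)
    enc["labels"] = [label2id[lang] for lang in batch["language_name"]]
    return enc

ds = ds.map(preprocess, batched=True, remove_columns=ds.column_names)

model = AutoModelForSequenceClassification.from_pretrained(
    "huggingface/CodeBERTa-small-v1",
    num_labels=len(LANGS),
    id2label={i: lang for lang, i in label2id.items()},
    label2id=label2id,
)

args = TrainingArguments(
    output_dir="codeberta-lang-id",
    num_train_epochs=25,             # stated in the card
    per_device_train_batch_size=16,  # assumed; actual value in the screenshot
    learning_rate=5e-5,              # assumed; actual value in the screenshot
)

Trainer(model=model, args=args, train_dataset=ds, tokenizer=tokenizer).train()
```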
