---
base_model:
  - microsoft/codebert-base
datasets:
  - devngho/the-stack-llm-annotations-v2
language:
  - code
library_name: transformers
license: mit
metrics:
  - f1
---

devngho/code_edu_classifier-v3-microsoft_codebert-base

This model adds a classifier head to microsoft/codebert-base. Aiming to be a code-focused counterpart of HuggingFaceFW/fineweb-edu-classifier, it scores the educational value of code. It was trained on the devngho/the-stack-llm-annotations-v2 dataset, which consists of samples extracted from bigcode/the-stack-dedup and annotated with Qwen/Qwen2.5-Coder-32B-Instruct.
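A minimal inference sketch (hypothetical usage, not taken from this card): it assumes the checkpoint loads as a standard `transformers` sequence-classification model with six logits for scores 0–5. If the head is instead a single-logit regression head, as in fineweb-edu-classifier, read and round the raw logit instead of taking an argmax.

```python
def logits_to_score(logits):
    """Map per-class logits (scores 0..5) to an integer score via argmax.

    This assumes a 6-way classification head; check the model's config.json
    (num_labels) before relying on this reading.
    """
    return max(range(len(logits)), key=lambda i: logits[i])


if __name__ == "__main__":
    # Requires `transformers`, `torch`, and network access to the Hub.
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    import torch

    name = "devngho/code_edu_classifier-v3-microsoft_codebert-base"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name)

    code = "def add(a, b):\n    return a + b"
    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    print("educational score:", logits_to_score(logits))
```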

This research was supported with Cloud TPUs from Google's TPU Research Cloud (TRC). ⚡

์ƒ์„ธ

  • ์ œ์ž‘: devngho
  • ์–ธ์–ด: code
  • ๋ผ์ด์„ ์Šค: mit
  • ๊ธฐ๋ฐ˜ ๋ชจ๋ธ: microsoft/codebert-base

ํ•™์Šต ์ƒ์„ธ

  • learning_rate: 3e-4 (cosine)
  • warmup_ratio: 0.1
  • batch_size: 2048 (512 × 4)
  • optimizer: AdamW (b1=0.9, b2=0.98, eps=1e-8, weight_decay=0.01)
  • duration: 4h 41m
  • steps: 6080
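The schedule above (peak 3e-4, cosine decay, warmup_ratio 0.1 over 6080 steps) can be sketched as a plain function. The linear-warmup shape is an assumption; the card only states the warmup ratio.

```python
import math

PEAK_LR = 3e-4
TOTAL_STEPS = 6080
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)  # warmup_ratio 0.1 -> 608 steps


def lr_at(step: int) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```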

Training hardware

TPU v4-8

Performance

Validation Report:
              precision    recall  f1-score   support

           0       0.80      0.06      0.10        72
           1       0.62      0.40      0.48       835
           2       0.61      0.62      0.61      2722
           3       0.48      0.72      0.58      1891
           4       0.62      0.02      0.05       623
           5       0.00      0.00      0.00         1

    accuracy                           0.55      6144
   macro avg       0.52      0.30      0.30      6144
weighted avg       0.58      0.55      0.52      6144

Confusion Matrix:
[[   4   36   30    2    0    0]
 [   1  330  464   40    0    0]
 [   0  157 1684  881    0    0]
 [   0    5  516 1361    9    0]
 [   0    0   71  537   15    0]
 [   0    0    0    1    0    0]]

When the scores are split into below 3 and 3 or above, the F1 score is about 0.72.
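That figure can be reproduced from the confusion matrix above by collapsing the six classes into a binary problem (treating a score of 3 or above as the positive class; that reading of "above and below 3" is an assumption):

```python
# Confusion matrix from the validation report: rows = true score, cols = predicted.
cm = [
    [4, 36, 30, 2, 0, 0],
    [1, 330, 464, 40, 0, 0],
    [0, 157, 1684, 881, 0, 0],
    [0, 5, 516, 1361, 9, 0],
    [0, 0, 71, 537, 15, 0],
    [0, 0, 0, 1, 0, 0],
]

# Collapse to binary: positive = score >= 3.
tp = sum(cm[i][j] for i in range(3, 6) for j in range(3, 6))  # true >=3, pred >=3
fn = sum(cm[i][j] for i in range(3, 6) for j in range(0, 3))  # true >=3, pred <3
fp = sum(cm[i][j] for i in range(0, 3) for j in range(3, 6))  # true <3, pred >=3

f1 = 2 * tp / (2 * tp + fp + fn)
print(round(f1, 2))  # -> 0.72
```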
