Update README.md
metrics:
- recall
- accuracy
model-index:
- name: python-edu-scorer
  results: []
---
# Python-Edu Scorer

This model is a fine-tuned version of [Snowflake/snowflake-arctic-embed-m](https://huggingface.co/Snowflake/snowflake-arctic-embed-m) on a dataset of Python files labeled by Llama3 for educational value.
We used this classifier to build the [Python-Edu](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia-v2) dataset.

### How to use in transformers

To load the Python-Edu classifier, use the following code:
23 |
|
24 |
+
```python
|
25 |
+
from transformers import AutoTokenizer, AutoModelForSequenceClassification
|
26 |
+
|
27 |
+
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/python-edu-scorer")
|
28 |
+
model = AutoModelForSequenceClassification.from_pretrained("HuggingFaceTB/python-edu-scorer")
|
29 |
+
|
30 |
+
text = "This is a test sentence."
|
31 |
+
inputs = tokenizer(text, return_tensors="pt", padding="longest", truncation=True)
|
32 |
+
outputs = model(**inputs)
|
33 |
+
logits = outputs.logits.squeeze(-1).float().detach().numpy()
|
34 |
+
score = logits.item()
|
35 |
+
result = {
|
36 |
+
"text": text,
|
37 |
+
"score": score,
|
38 |
+
"int_score": int(round(max(0, min(score, 5)))),
|
39 |
+
}
|
40 |
+
|
41 |
+
print(result)
|
42 |
+
# {'text': 'This is a test sentence.', 'score': 0.07964489609003067, 'int_score': 0}
|
43 |
+
```
|
44 |
|
45 |
## Intended uses & limitations

While the Python-Edu classifier performs well at identifying high-quality Python code, it has some limitations:

- Scope: the model's performance might change on other datasets, in particular on out-of-distribution samples. It is also focused on educational content relevant to beginners and may not perform as well on content intended for higher education or specialized domains.
- Bias: the model's performance depends on the quality and representativeness of the training data and of the LLM used for annotation; biases in either can affect the classifier's judgments. It might also overfit to thoroughly commented code.
- Context: the classifier evaluates individual code files without considering broader context, which might limit its effectiveness in certain scenarios.

The training and inference code is available on GitHub:
https://github.com/huggingface/cosmopedia/tree/main/classification
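Thresholding the rounded score is the usual way such a classifier is applied to corpus filtering. The sketch below is illustrative only: the helper names and the cutoff of 3 are assumptions, not the values used to build Python-Edu.

```python
def int_score(score: float) -> int:
    # Same clipping/rounding as the usage example above: map the raw
    # regression output onto the 0-5 integer scale.
    return int(round(max(0, min(score, 5))))

def keep_for_corpus(score: float, threshold: int = 3) -> bool:
    # Hypothetical cutoff: keep files whose rounded score meets the threshold.
    return int_score(score) >= threshold

scores = [0.07, 2.4, 3.6, 4.9]
kept = [s for s in scores if keep_for_corpus(s)]
print(kept)  # [3.6, 4.9]
```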

## Training procedure

The classifier was trained on 450,000 pairs of Python code files and their scores from 1 to 5, generated by Llama3. The samples were annotated based on their educational quality, with 1 being not educational and 5 being highly educational.

We added a classification head with a single regression output to [Snowflake-arctic-embed](https://huggingface.co/Snowflake/snowflake-arctic-embed-m) and trained the model for 20 epochs with a learning rate of 3e-4. During training, the embedding and encoder layers were frozen to focus on the classification head.
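The frozen-encoder recipe above can be sketched generically in PyTorch. This is a toy illustration under stated assumptions, not the project's training code (see the GitHub repository for that): the `RegressionScorer` class and the stand-in linear "encoder" are hypothetical, standing in for the arctic-embed backbone.

```python
import torch
import torch.nn as nn

class RegressionScorer(nn.Module):
    """Toy stand-in for an embedding model with a 1-output regression head."""

    def __init__(self, encoder: nn.Module, hidden_size: int):
        super().__init__()
        self.encoder = encoder
        # Freeze the encoder, as described above: only the head is trained.
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            features = self.encoder(x)
        # Squeeze the single regression output to shape (batch,).
        return self.head(features).squeeze(-1)

# Hypothetical stand-in encoder; the real model embeds tokenized code
# with snowflake-arctic-embed-m (hidden size 768).
encoder = nn.Linear(16, 768)
model = RegressionScorer(encoder, hidden_size=768)

# Only the head's parameters are optimized, at the stated learning rate.
optimizer = torch.optim.AdamW(model.head.parameters(), lr=3e-4)
scores = model(torch.randn(4, 16))
print(scores.shape)  # torch.Size([4])
```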

### Training hyperparameters

The following hyperparameters were used during training:

### Training results

```
              precision    recall  f1-score   support

           1       0.84      0.46      0.59      8364
           2       0.61      0.76      0.68     19605
           3       0.60      0.62      0.61     16187
           4       0.72      0.50      0.59      4872
           5       0.38      0.08      0.13       118

    accuracy                           0.64     49146
   macro avg       0.63      0.48      0.52     49146
weighted avg       0.66      0.64      0.63     49146
```
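As a quick sanity check on the report above, the macro averages are just unweighted means of the per-class rows (unlike the weighted averages, which account for support):

```python
# Per-class rows from the classification report above (classes 1-5).
precision = [0.84, 0.61, 0.60, 0.72, 0.38]
recall = [0.46, 0.76, 0.62, 0.50, 0.08]
f1 = [0.59, 0.68, 0.61, 0.59, 0.13]

def macro(values):
    # Macro average: unweighted mean over classes, ignoring support.
    return round(sum(values) / len(values), 2)

print(macro(precision), macro(recall), macro(f1))  # 0.63 0.48 0.52
```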

### Framework versions