---
library_name: transformers
license: mit
base_model: agentlans/snowflake-arctic-embed-xs-zyda-2
tags:
- generated_from_trainer
- text-classification
- grammar-classification
metrics:
- accuracy
model-index:
- name: agentlans/snowflake-arctic-xs-grammar-classifier
  results:
  - task:
      type: text-classification
      name: Grammar Classification
    dataset:
      name: agentlans/grammar-classification
      type: agentlans/grammar-classification
    metrics:
    - type: accuracy
      value: 0.8724
      name: Accuracy
datasets:
- agentlans/grammar-classification
- liweili/c4_200m
language:
- en
pipeline_tag: text-classification
---

# snowflake-arctic-xs-grammar-classifier

This model is a fine-tuned version of [agentlans/snowflake-arctic-embed-xs-zyda-2](https://huggingface.co/agentlans/snowflake-arctic-embed-xs-zyda-2) for grammar classification. It achieves an accuracy of 0.8724 on the evaluation set.

## Model description

The snowflake-arctic-xs-grammar-classifier classifies the grammatical correctness of English sentences. It is based on the snowflake-arctic-embed-xs-zyda-2 model and was fine-tuned on a grammar-classification dataset derived from C4 (the Colossal Clean Crawled Corpus).

## Intended uses & limitations

This model is intended for classifying the grammatical correctness of English sentences. It can be used in applications such as writing-assistance tools, educational software, and content-moderation systems.

### Usage example

```python
from transformers import pipeline
import torch

# Use the first GPU if available, otherwise fall back to CPU.
device = 0 if torch.cuda.is_available() else -1
classifier = pipeline(
    "text-classification",
    model="agentlans/snowflake-arctic-xs-grammar-classifier",
    device=device,
)

text = "I absolutely loved this movie!"
result = classifier(text)
print(result)  # [{'label': 'grammatical', 'score': 0.8963921666145325}]
```

### Example classifications

| Status | Text | Explanation |
|:------:|------|-------------|
| ✔️ | I absolutely loved this movie! | Grammatically correct, clear sentence structure |
| ❌ | How do I shot web? | Grammatically incorrect, improper verb usage |
| ✔️ | Beware the Jabberwock, my son! | Poetic language, grammatically sound |
| ✔️ | Colourless green ideas sleep furiously. | Grammatically correct, though semantically nonsensical |
| ❌ | Has anyone really been far even as decided to use even go want to do look more like? | Completely incoherent and grammatically incorrect |

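When using the classifier downstream, its `[{'label': ..., 'score': ...}]` output can be reduced to a boolean verdict. The helper below is a hypothetical convenience wrapper, not part of this model's API; it assumes the output format shown in the usage example above, where the positive class is labeled `grammatical`.

```python
# Hypothetical helper (not part of the model card's API): reduce the
# pipeline's [{'label': ..., 'score': ...}] output to a boolean verdict.
def is_grammatical(results, threshold=0.5):
    """Return True if the top prediction is 'grammatical' with enough confidence."""
    top = results[0]
    return top["label"] == "grammatical" and top["score"] >= threshold

print(is_grammatical([{"label": "grammatical", "score": 0.896}]))  # True
```

Raising `threshold` trades recall for precision; a cutoff well above 0.5 keeps only confidently grammatical sentences.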
### Limitations

The model's performance is limited by the quality and diversity of its training data. It may not perform well on specialized or domain-specific text, or on languages other than English, and it may struggle with complex grammatical structures or nuanced language use.

## Training and evaluation data

The model was trained on the [agentlans/grammar-classification](https://huggingface.co/datasets/agentlans/grammar-classification) dataset, which contains 600,000 examples for binary classification of grammatical correctness in English. The dataset is derived from a subset of the C4_200M Synthetic Dataset for Grammatical Error Correction.

## Training procedure

### Training hyperparameters

- Learning rate: 5e-05
- Batch size: 128
- Number of epochs: 10
- Optimizer: AdamW with betas=(0.9, 0.999) and epsilon=1e-08
- Learning rate scheduler: linear

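For reference, the hyperparameters above map roughly onto `transformers.TrainingArguments` as sketched below. This is a hedged reconstruction, not the published training script: the argument names come from the transformers API, but the pairing with this particular run (including the output directory name) is an assumption.

```python
from transformers import TrainingArguments

# Sketch only: how the listed hyperparameters might be expressed as
# TrainingArguments. The actual training script is not published here.
args = TrainingArguments(
    output_dir="snowflake-arctic-xs-grammar-classifier",  # hypothetical
    learning_rate=5e-5,
    per_device_train_batch_size=128,
    num_train_epochs=10,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```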
<details>
<summary>📊 Detailed Training Results</summary>

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:-----------------:|
| 0.5192 | 1.0 | 3750 | 0.4722 | 0.7738 | 61,440,000 |
| 0.4875 | 2.0 | 7500 | 0.4521 | 0.7881 | 122,880,000 |
| 0.4590 | 3.0 | 11250 | 0.3895 | 0.8227 | 184,320,000 |
| 0.4351 | 4.0 | 15000 | 0.3981 | 0.8197 | 245,760,000 |
| 0.4157 | 5.0 | 18750 | 0.3690 | 0.8337 | 307,200,000 |
| 0.3955 | 6.0 | 22500 | 0.3260 | 0.8585 | 368,640,000 |
| 0.3788 | 7.0 | 26250 | 0.3267 | 0.8566 | 430,080,000 |
| 0.3616 | 8.0 | 30000 | 0.3192 | 0.8621 | 491,520,000 |
| 0.3459 | 9.0 | 33750 | 0.3017 | 0.8707 | 552,960,000 |
| 0.3382 | 10.0 | 37500 | 0.2971 | 0.8724 | 614,400,000 |

</details>

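As a sanity check, the step and token counts in the training log are internally consistent: 3,750 steps per epoch at batch size 128 implies 480,000 training examples per epoch, and the token count implies a fixed sequence length of 128 tokens per example.

```python
# Derived from the training log above: examples per epoch and the
# implied fixed sequence length in tokens.
batch_size = 128
steps_per_epoch = 3750
tokens_per_epoch = 61_440_000

examples_per_epoch = steps_per_epoch * batch_size            # 480,000
tokens_per_example = tokens_per_epoch // examples_per_epoch  # 128

print(examples_per_epoch, tokens_per_example)  # 480000 128
```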
### Framework versions

- Transformers: 4.46.3
- PyTorch: 2.5.1+cu124
- Datasets: 3.2.0
- Tokenizers: 0.20.3
|