ThingsAI committed
Commit 9136f06 · verified · 1 Parent(s): d6f2218

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,87 +1,57 @@
  ---
- language: en
- license: apache-2.0
  tags:
- - pretraining
- - small-language-model
- - educational
- - causal-lm
- datasets:
- - HuggingFaceTB/smollm-corpus
  ---

- # Quark-50m-Instruct

- **Quark-50m-Instruct** is a causal (decoder-only) language model with approximately **50 million parameters**, trained from scratch on 5 billion tokens from the `smollm-corpus` dataset. It is designed to be lightweight, fast, and suitable for resource-constrained environments (e.g., RTX 3070, 8 GB VRAM), while retaining good text understanding and generation capabilities.

- The name *Quark* reflects its compact and elementary nature, ideal for on-device applications, lightweight conversational assistants, or as a base for domain-specific fine-tuning.

- ## Model Details
-
- | Property | Value |
- |------------------------|--------------------------------------|
- | Architecture | Transformer decoder-only (SmolLM-style) |
- | Parameters | ~50 M (effective, with weight tying) |
- | Context length | 2048 tokens |
- | Vocabulary size | 49,152 (cosmo2 tokenizer) |
- | Model identifier | `OvercastLab/Quark-50m-Instruct` |
- | Primary language | English (training data) |
-
- ## Architecture Details
-
- The model follows the style of **SmolLM** and **Qwen2.5** with the following characteristics:
-
- - **Grouped-Query Attention (GQA)** – ratio `n_heads / n_kv_heads = 3` to reduce KV cache footprint.
- - **SwiGLU** activation in feed-forward networks, with intermediate dimension `d_ff = 1024`.
- - **RMSNorm** applied before attention and FFN (pre-normalization).
- - **Rotary Positional Embeddings (RoPE)** with `theta = 10,000`.
- - **Weight tying** – input embedding and output projection share weights.
- - **Bias** – only on QKV projections (`qkv_bias = True`) for better numerical stability.

- | Component | Configuration |
- |----------------------|---------------------------------------|
- | `d_model` | 384 |
- | `n_layers` | 24 |
- | `n_heads` | 6 |
- | `n_kv_heads` | 2 |
- | `head_dim` | 64 |
- | `d_ff` | 1024 |
- | `dropout` | 0.0 (no dropout during pretraining) |
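
As a rough sanity check on the "~50 M (effective, with weight tying)" figure in the removed card, the component table above is enough for a back-of-the-envelope parameter count (a minimal sketch; variable names are illustrative and small bias terms are ignored):

```python
# Rough parameter count from the table above (d_model=384, n_layers=24,
# n_heads=6, n_kv_heads=2, head_dim=64, d_ff=1024, vocab=49152, tied embeddings).
d_model, n_layers, n_heads, n_kv_heads, head_dim, d_ff, vocab = 384, 24, 6, 2, 64, 1024, 49152

embed = vocab * d_model                          # tied with the LM head, counted once
attn = d_model * (n_heads * head_dim)            # q_proj
attn += 2 * d_model * (n_kv_heads * head_dim)    # k_proj + v_proj (GQA: only 2 KV heads)
attn += (n_heads * head_dim) * d_model           # o_proj
mlp = 3 * d_model * d_ff                         # SwiGLU: gate, up, down projections
norms = 2 * d_model                              # two RMSNorm weights per block

total = embed + n_layers * (attn + mlp + norms) + d_model  # plus the final RMSNorm
print(f"{total / 1e6:.1f} M parameters")                   # prints ~56.6 M
```

This lands closer to 56–57 M, consistent with the ~113 MB bfloat16 `model.safetensors` below (about 2 bytes per parameter), so the card's "~50 M" reads as a loose round figure.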

- ## Training Data

- The model was pretrained on **5 billion tokens** sampled from the [`HuggingFaceTB/smollm-corpus`](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus) dataset with the following distribution:

- | Sub-dataset | Percentage | Tokens (billions) | Main content |
- |---------------------------|------------|-------------------|----------------------------------------------|
- | `cosmopedia-v2` | 60% | 3.0 | Synthetic textbooks, educational articles, stories |
- | `fineweb-edu-dedup` | 40% | 2.0 | Web pages filtered for educational quality |

- Data were tokenized using the `HuggingFaceTB/cosmo2-tokenizer` (vocabulary size 49,152), with the EOS token appended to each document. Training sequences have a fixed length of **2048** tokens (with packing).
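
The tokenize, append-EOS, and pack step described above is the standard recipe; a minimal illustration (not code from the original repo, and the toy documents are stand-ins) looks like this:

```python
from transformers import AutoTokenizer

# Minimal packing sketch: tokenize each document, append EOS as a boundary,
# concatenate everything, then cut the stream into fixed 2048-token sequences.
tok = AutoTokenizer.from_pretrained("HuggingFaceTB/cosmo2-tokenizer")
docs = ["First toy document.", "Second toy document."]  # stand-ins for corpus text

stream = []
for doc in docs:
    stream.extend(tok(doc, add_special_tokens=False)["input_ids"])
    stream.append(tok.eos_token_id)  # assumes the tokenizer defines an EOS token, as this one does

seq_len = 2048
# With the toy docs above this yields no full sequence; real corpus text fills many.
sequences = [stream[i:i + seq_len] for i in range(0, len(stream) - seq_len + 1, seq_len)]
```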

- ## Training Procedure

- - **Framework**: PyTorch with `torch.compile` and `GradScaler` for mixed precision.
- - **Precision**: `bfloat16` (Ampere RTX 3070).
- - **Optimizer**: AdamW (`β₁=0.9`, `β₂=0.95`, weight decay = 0.1).
- - **Learning rate**: `3e-4` with linear warmup for 1,000 steps, then cosine decay to `3e-5`.
- - **Effective batch size**: 64 sequences × 2048 tokens = **131,072 tokens per step**.
- - Micro-batch: 4 sequences, gradient accumulation over 16 steps.
- - **Gradient clipping**: 1.0.
- - **Total steps**: approximately 38,000 (to reach 5B tokens).
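
The step count is consistent with the batch math above (5e9 tokens / 131,072 tokens per step ≈ 38,000 steps). A minimal sketch of the stated schedule, assuming the warmup and decay described in the list (this is not the original training code):

```python
import math

# Linear warmup to 3e-4 over 1,000 steps, then cosine decay to 3e-5 by step ~38,000.
peak_lr, min_lr, warmup_steps, total_steps = 3e-4, 3e-5, 1_000, 38_000

def lr_at(step: int) -> float:
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

for s in (0, 500, 1_000, 19_500, 38_000):
    print(s, f"{lr_at(s):.2e}")  # 0, mid-warmup, peak, mid-decay, floor
```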

- ## Usage

- You can load and use the model directly with the `transformers` library:

- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_name = "OvercastLab/Quark-50m-Instruct"
- tokenizer = AutoTokenizer.from_pretrained(model_name)
- model = AutoModelForCausalLM.from_pretrained(model_name)
-
- input_text = "The theory of relativity"
- inputs = tokenizer(input_text, return_tensors="pt")
- outputs = model.generate(**inputs, max_new_tokens=100)
- print(tokenizer.decode(outputs[0]))
- ```

  ---
+ library_name: transformers
+ model_name: sft_conv
  tags:
+ - generated_from_trainer
+ - trl
+ - sft
+ licence: license
  ---

+ # Model Card for sft_conv

+ This model is a fine-tuned version of [None](https://huggingface.co/None).
+ It has been trained using [TRL](https://github.com/huggingface/trl).

+ ## Quick start

+ ```python
+ from transformers import pipeline
+
+ question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+ generator = pipeline("text-generation", model="None", device="cuda")
+ output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+ print(output["generated_text"])
+ ```
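
The `model="None"` in the quick start is a TRL placeholder and needs to be replaced with this repository's id (or a local path) before running. Equivalently, a minimal sketch without `pipeline`, assuming the SFT run saved a chat template with the tokenizer (the repo path below is a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "path/or/repo-id-of-this-model"  # placeholder; substitute the real id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

messages = [{"role": "user", "content": "Which would you choose: the past or the future?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```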
 
 
 
 
+ ## Training procedure
+
+ This model was trained with SFT.

+ ### Framework versions

+ - TRL: 1.2.0
+ - Transformers: 5.6.1
+ - Pytorch: 2.4.1+cu124
+ - Datasets: 4.8.4
+ - Tokenizers: 0.22.2

+ ## Citations

+ Cite TRL as:
+
+ ```bibtex
+ @software{vonwerra2020trl,
+ title = {{TRL: Transformers Reinforcement Learning}},
+ author = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
+ license = {Apache-2.0},
+ url = {https://github.com/huggingface/trl},
+ year = {2020}
+ }
+ ```
config.json CHANGED
@@ -1,30 +1,34 @@
  {
- "architectures": ["LlamaForCausalLM"],
- "model_type": "llama",
- "vocab_size": 49152,
  "hidden_size": 384,
  "intermediate_size": 1024,
- "num_hidden_layers": 24,
  "num_attention_heads": 6,
  "num_key_value_heads": 2,
- "head_dim": 64,
- "hidden_act": "silu",
- "max_position_embeddings": 2048,
- "initializer_range": 0.02,
  "rms_norm_eps": 1e-05,
- "rope_theta": 10000.0,
- "rope_scaling": null,
  "rope_interleaved": false,
- "attention_bias": true,
- "attention_dropout": 0.0,
- "mlp_bias": false,
  "tie_word_embeddings": true,
- "torch_dtype": "bfloat16",
- "bos_token_id": 1,
- "eos_token_id": 2,
- "pad_token_id": 2,
- "use_cache": true,
- "pretraining_tp": 1,
- "is_llama_config": true,
- "transformers_version": "4.42.3"
  }

  {
+ "architectures": [
+ "LlamaForCausalLM"
+ ],
+ "attention_bias": true,
+ "attention_dropout": 0.0,
+ "bos_token_id": 0,
+ "dtype": "bfloat16",
+ "eos_token_id": 0,
+ "head_dim": 64,
+ "hidden_act": "silu",
  "hidden_size": 384,
+ "initializer_range": 0.02,
  "intermediate_size": 1024,
+ "is_llama_config": true,
+ "max_position_embeddings": 2048,
+ "mlp_bias": false,
+ "model_type": "llama",
  "num_attention_heads": 6,
+ "num_hidden_layers": 24,
  "num_key_value_heads": 2,
+ "pad_token_id": 0,
+ "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_interleaved": false,
+ "rope_parameters": {
+ "rope_theta": 10000.0,
+ "rope_type": "default"
+ },
  "tie_word_embeddings": true,
+ "transformers_version": "5.6.1",
+ "use_cache": false,
+ "vocab_size": 49152
  }
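
Beyond the alphabetical reordering, the notable changes in this file are the special-token ids (bos/eos/pad move from 1/2/2 to 0/0/0), `use_cache` switching to `false`, `torch_dtype` becoming `dtype`, and the RoPE settings nesting under `rope_parameters`. A minimal sketch of confirming what `transformers` actually loads (the path is a placeholder for this repo or a local clone):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("path/or/repo-id-of-this-model")  # placeholder
print(cfg.model_type, cfg.num_hidden_layers, cfg.num_key_value_heads)  # llama 24 2
print(cfg.bos_token_id, cfg.eos_token_id, cfg.pad_token_id)            # 0 0 0
```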
generation_config.json ADDED
@@ -0,0 +1,11 @@
+ {
+ "_from_model_config": true,
+ "bos_token_id": 0,
+ "eos_token_id": [
+ 0,
+ 2
+ ],
+ "pad_token_id": 0,
+ "transformers_version": "5.6.1",
+ "use_cache": true
+ }
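
The new generation config registers two end-of-sequence ids (0 and 2), so `generate()` stops on either token. A minimal sketch of inspecting it (path is a placeholder):

```python
from transformers import GenerationConfig

gen_cfg = GenerationConfig.from_pretrained("path/or/repo-id-of-this-model")  # placeholder
print(gen_cfg.eos_token_id)  # [0, 2]
print(gen_cfg.pad_token_id)  # 0
```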
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8ab7ac8c0adebf58cf54da3fa735f0a12e74fcaf7a404837825723eb355df2b3
- size 113346352

  version https://git-lfs.github.com/spec/v1
+ oid sha256:431dbd275b83cb41bd28cdd1bb6d9c30e87ed1c5da31957e70867c9cc095efa7
+ size 113367352
tokenizer_config.json CHANGED
@@ -24,9 +24,10 @@
  "<jupyter_script>",
  "<empty_output>"
  ],
- "is_local": false,
  "model_max_length": 1000000000000000019884624838656,
- "pad_token": null,
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>",
  "vocab_size": 49152

  "<jupyter_script>",
  "<empty_output>"
  ],
+ "is_local": true,
+ "local_files_only": false,
  "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "<|endoftext|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>",
  "vocab_size": 49152
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7111b6742ad6cc5ab900057295f3d9c66f5ee720c6c73b73b0f9abad6b7f195c
+ size 5304