SakanaAI/TinySwallow-1.5B

mkshing committed 25 days ago

Commit 75d840a · verified · 1 Parent(s): ccbe84b

Update README.md

Files changed (1)

README.md +87 -3

@@ -1,3 +1,87 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+language:
+- ja
+pipeline_tag: text-generation
+library_name: transformers
+---
+# Smol-Swallow-1.5B
+🤗 [Models](https://huggingface.co/SakanaAI) | 📚 [Paper](https://arxiv.org/abs/TODO) | 📝 [Blog](https://sakana.ai/taid/) | 🐦 [Twitter](https://twitter.com/SakanaAILabs)
+**Smol-Swallow-1.5B** is a compact Japanese language model created with TAID, our new knowledge distillation method.
+We used [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) as the teacher model and
+[Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) as the student model.
+## Usage
+Use the code below to get started with the model.
+<details>
+<summary> Click to expand </summary>
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+# 1. load model
+device = "cuda" if torch.cuda.is_available() else "cpu"
+repo_id = "SakanaAI/Smol-Swallow-1.5B"
+model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")
+tokenizer = AutoTokenizer.from_pretrained(repo_id)
+model.to(device)
+# 2. prepare inputs
+text = "拝啓\n"
+inputs = tokenizer(text, return_tensors="pt")
+# 3. generate
+output_ids = model.generate(**inputs.to(device))
+output_ids = output_ids[:, inputs.input_ids.shape[1] :]
+generated_text = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
+print(generated_text)
+```
+</details>
+## Model Details
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [Sakana AI](https://sakana.ai/) and [Swallow Team](https://swallow-llm.github.io/index.en.html)
+- **Model type:** Autoregressive Language Model
+- **Language(s):** Japanese
+- **License:** [Apache License, Version 2.0](./LICENSE)
+- **Repository:** [SakanaAI/TAID](https://github.com/SakanaAI/TAID)
+- **Paper:** https://arxiv.org/abs/TODO
+- **Blog:** https://sakana.ai/taid
+<!-- ## Model Performance -->
+## Uses
+This model is provided for research and development purposes only and should be considered an experimental prototype.
+It is not intended for commercial use or deployment in mission-critical environments.
+Use of this model is at the user's own risk, and its performance and outcomes are not guaranteed.
+Sakana AI shall not be liable for any direct, indirect, special, incidental, or consequential damages, or any loss arising from the use of this model, regardless of the results obtained.
+Users must fully understand the risks associated with the use of this model and use it at their own discretion.
+## Acknowledgement
+We would like to thank the developers of the source models for their contributions and for making their work available.
+## Citation
+```bibtex
+@misc{sakana2025taid,
+      title         = {TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models},
+      author        = {Makoto Shing and Ko Misaki and Han Bao and Sho Yokoi and Takuya Akiba},
+      year          = {2025},
+      eprint        = {TODO},
+      archivePrefix = {arXiv},
+      primaryClass  = {cs.NE}
+}
+```
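
Note on step 3 of the usage snippet in this diff: for a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones, which is why the snippet slices `output_ids` by the prompt length before decoding. The idea can be sketched without loading the model. This is only an illustration using plain Python lists in place of tensors; the helper `strip_prompt` and the token ids are hypothetical and not part of the README.

```python
from typing import List


def strip_prompt(output_ids: List[List[int]], prompt_len: int) -> List[List[int]]:
    """Drop the first `prompt_len` ids from every sequence in the batch.

    Hypothetical list-based stand-in for the tensor slice
    `output_ids[:, inputs.input_ids.shape[1]:]` in the README snippet.
    """
    return [seq[prompt_len:] for seq in output_ids]


# Made-up token ids: 3 prompt tokens followed by 2 generated tokens.
prompt_ids = [[101, 7, 8]]
full_output = [[101, 7, 8, 42, 43]]

new_tokens = strip_prompt(full_output, prompt_len=len(prompt_ids[0]))
print(new_tokens)  # [[42, 43]]
```

Without this step, the decoded text would begin with the prompt itself rather than only the continuation.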