mkshing committed on
Commit 15205d9 · verified · 1 Parent(s): ad00717

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -12,7 +12,7 @@ base_model:
 🤗 [Models](https://huggingface.co/SakanaAI) | 📚 [Paper](https://arxiv.org/abs/TODO) | 📝 [Blog](https://sakana.ai/taid/) | 🐦 [Twitter](https://twitter.com/SakanaAILabs)
 
 **Smol-Swallow-1.5B** is a Japanese compact language model created through TAID (Temporally Adaptive Interpolated Distillation), our new knowledge distillation method.
-We used [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) as the teacher model and [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) as the student model, achieving state-of-the-art performance among Japanese language models under 3B parameters.
+We used [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) as the teacher model and [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) as the student model.
 The model has been further pre-trained on Japanese text data to enhance its Japanese language capabilities.
 
 ## Usage
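
The hunk above stops at the `## Usage` heading, whose body is not touched by this change. For orientation only, here is a minimal sketch of how a causal language model like this is typically loaded with the `transformers` library; the repository ID `SakanaAI/Smol-Swallow-1.5B` and the prompt are assumptions for illustration and may differ from the actual usage section of the README.

```python
# Minimal loading sketch (assumptions: repo ID and prompt are illustrative,
# not taken from this commit; check the SakanaAI org on Hugging Face).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SakanaAI/Smol-Swallow-1.5B"  # hypothetical hub ID for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 1.5B model small in memory
    device_map="auto",           # requires `accelerate`; falls back to CPU if no GPU
)

# Generate a short Japanese continuation as a smoke test.
inputs = tokenizer("こんにちは、", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```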