mkshing committed on
Commit 15205d9 · verified · 1 Parent(s): ad00717

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -12,7 +12,7 @@ base_model:
 🤗 [Models](https://huggingface.co/SakanaAI) | 📚 [Paper](https://arxiv.org/abs/TODO) | 📝 [Blog](https://sakana.ai/taid/) | 🐦 [Twitter](https://twitter.com/SakanaAILabs)
 
 **Smol-Swallow-1.5B** is a Japanese compact language model created through TAID (Temporally Adaptive Interpolated Distillation), our new knowledge distillation method.
-We used [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) as the teacher model and [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) as the student model, achieving state-of-the-art performance among Japanese language models under 3B parameters.
+We used [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) as the teacher model and [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) as the student model.
 The model has been further pre-trained on Japanese text data to enhance its Japanese language capabilities.
 
 ## Usage
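
The hunk above stops at the `## Usage` heading, whose body is not touched by this change. For orientation only, here is a minimal sketch of how a causal language model like this is typically loaded with the `transformers` library; the repository ID `SakanaAI/Smol-Swallow-1.5B` and the prompt are assumptions for illustration and may differ from the actual usage section of the README.

```python
# Minimal loading sketch (assumptions: repo ID and prompt are illustrative,
# not taken from this commit; check the SakanaAI org on Hugging Face).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SakanaAI/Smol-Swallow-1.5B"  # hypothetical hub ID for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 1.5B model small in memory
    device_map="auto",           # requires `accelerate`; falls back to CPU if no GPU
)

# Generate a short Japanese continuation as a smoke test.
inputs = tokenizer("こんにちは、", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```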