# DeepSeek-R1-Distill-Llama-8B-q4f16_ft-MLC
| | Model Configuration |
|---------------------|:-------------------------------------------------------------------------------------------------------------:|
| Source Model | [`deepseek-ai/DeepSeek-R1-Distill-Llama-8B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) |
| Inference API | `MLC_LLM` |
| Quantization | `q4f16_ft` |
| Model Type | `llama` |
| Vocab Size | `128256` |
| Context Window Size | `131072` |
| Prefill Chunk Size | `8192` |
| Temperature | `0.6` |
| Repetition Penalty | `1.0` |
| top_p | `0.95` |
| pad_token_id | `0` |
| bos_token_id | `128000` |
| eos_token_id | `128001` |
See [`jetson-ai-lab.com/models.html`](https://jetson-ai-lab.com/models.html) for benchmarks, examples, and containers to deploy local serving and inference for these quantized models.
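
The configuration above maps directly onto MLC LLM's OpenAI-compatible Python API. Below is a minimal sketch of streaming chat completion with `MLCEngine`, reusing the sampling parameters from the table (`temperature=0.6`, `top_p=0.95`); the model path is a hypothetical placeholder, so substitute the actual location of the `q4f16_ft` build (a local compiled model directory or an `HF://` repo).

```python
from mlc_llm import MLCEngine

# Hypothetical placeholder: point this at the real q4f16_ft MLC build,
# e.g. a local path produced by `mlc_llm convert_weight` / `gen_config`,
# or an HF:// repo hosting the converted weights.
model = "HF://your-org/DeepSeek-R1-Distill-Llama-8B-q4f16_ft-MLC"

engine = MLCEngine(model)

# Stream a chat completion using the sampling settings from the table above.
for chunk in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
    model=model,
    temperature=0.6,
    top_p=0.95,
    stream=True,
):
    for choice in chunk.choices:
        if choice.delta.content:
            print(choice.delta.content, end="", flush=True)
print()

engine.terminate()
```

The same OpenAI-style endpoint is also exposed over HTTP by `mlc_llm serve`, which is what the containerized deployments referenced above use for local serving.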