Michael Goin
mgoin
AI & ML interests
LLM inference optimization, compression, quantization, pruning, distillation
Organizations
mgoin's activity
Compression script limits context length to 4098?
1
#1 opened 1 day ago
by
Kayvane
Where is Minitron-4B-Instruct?
1
#2 opened 2 days ago
by
mgoin
Is this compatible with the KV_Cache_dtype being FP8?
2
#1 opened 2 days ago
by
nickandbro
Are these models limited to H100s?
7
#2 opened 2 days ago
by
RonanMcGovern
Replace kv_channels with head_dim
#1 opened 3 days ago
by
mgoin
Error serving model
3
#2 opened 6 days ago
by
EvGUT
How to load this model?
1
#1 opened 25 days ago
by
Frz614
How to run Meta-Llama-3-70B-Instruct-FP8 using several devices?
5
#3 opened about 1 month ago
by
Fertel
Update model.safetensors.index.json
#2 opened about 1 month ago
by
mgoin
Update model.safetensors.index.json
#4 opened about 1 month ago
by
mgoin
`model.safetensors.index.json` still has the legacy name `act_scale` for activation scales
1
#3 opened about 1 month ago
by
Alchan
Update README.md
#1 opened about 1 month ago
by
alexmarques
Update README.md
#1 opened about 1 month ago
by
alexmarques
Update README.md
#1 opened about 2 months ago
by
abhinavnmagic
Update README.md
#1 opened about 2 months ago
by
abhinavnmagic
Update README.md
#2 opened about 2 months ago
by
abhinavnmagic
Create README.md
#1 opened about 2 months ago
by
abhinavnmagic
Fails to run with nm-vllm
1
#1 opened 3 months ago
by
clintonruairi
Librarian Bot: Add language metadata for dataset
#2 opened 2 months ago
by
librarian-bot
Inference GPU Ram requirement >60GB
1
#1 opened 2 months ago
by
Ksgk-fy
What conversion process are you using?
2
#2 opened 2 months ago
by
matt-psaltis-devbricks
What is Marlin?
2
#1 opened 3 months ago
by
Samvanity
Inference Issues
7
#1 opened 3 months ago
by
qeternity
Update README.md
#2 opened 4 months ago
by
shubhrapandit
New activity in
neuralmagic/Llama-2-7b-dolphin-open_platypus-pruned_70-quantized-deepsparse
4 months ago
Update README.md
#1 opened 4 months ago
by
shubhrapandit
New activity in
neuralmagic/Llama-2-7b-dolphin-open_platypus-pruned_50-quantized-deepsparse
4 months ago
Update README.md
#1 opened 4 months ago
by
shubhrapandit
Update README.md
#1 opened 4 months ago
by
shubhrapandit
Update README.md
#1 opened 4 months ago
by
shubhrapandit
Update README.md
#1 opened 4 months ago
by
abhinavnmagic
Update README.md
#1 opened 4 months ago
by
abhinavnmagic
Update README.md
#1 opened 4 months ago
by
abhinavnmagic
Update README.md
#1 opened 4 months ago
by
abhinavnmagic
Update README.md
#1 opened 4 months ago
by
abhinavnmagic
Update README.md
#1 opened 4 months ago
by
alexmarques
Update README.md
#1 opened 4 months ago
by
alexmarques
Update README.md
#1 opened 4 months ago
by
alexmarques
Update README.md
#1 opened 4 months ago
by
alexmarques
Update README.md
#1 opened 4 months ago
by
alexmarques
Update README.md
#4 opened 7 months ago
by
chrisxx
Update README with model author names and speedup numbers.
#3 opened 7 months ago
by
jen
Update README.md
#1 opened 7 months ago
by
wendlerc
Adding `safetensors` variant of this model
#1 opened 8 months ago
by
mgoin
Create README.md
#2 opened 10 months ago
by
mgoin