Edit Models filters

Inference Providers

Nebius AI Studio

HF Inference API

Misc

compressed-tensors

Inference Endpoints

AutoTrain Compatible

text-generation-inference

8-bit precision

Misc with no match

4-bit precision

text-embeddings-inference

Carbon Emissions

Mixture of Experts

Models

900

Full-text search

Active filters: compressed-tensors

neuralmagic/Qwen2.5-14B-FP8-dynamic

Text Generation • Updated Dec 3, 2024 • 91

GusPuffy/Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ

Updated Dec 3, 2024 • 26

neuralmagic/Qwen2.5-14B-quantized.w8a8

Text Generation • Updated Dec 3, 2024 • 55

neuralmagic/Qwen2.5-3B-quantized.w8a8

Text Generation • Updated Dec 3, 2024 • 28

nm-testing/SparseLlama-3.1-8B-gsm8k-pruned.2of4-FP8-Dynamic-Channel

Updated Dec 3, 2024 • 5

nm-testing/SparseLlama-3.1-8B-gsm8k-pruned.2of4-FP8-Dynamic

Updated Dec 3, 2024 • 4

Snowflake/Llama-3.1-SwiftKV-405B-Instruct-FP8

Updated Dec 13, 2024 • 44

BigHuggyD/knifeayumu_Behemoth-v1.2-Magnum-v4-123B-Dynamic-FP8

Updated Dec 3, 2024 • 3

nm-testing/llama-3-fp8-2of4-dynamic-uncompressed

Updated Dec 8, 2024 • 11

pL-Community/SauerkrautLM-v2-14b-DPO-FP8

Text Generation • Updated Dec 4, 2024 • 10 • 1

BigHuggyD/Monstral-123B-v2-FP8-Dynamic

Text Generation • Updated Dec 12, 2024 • 95 • 1

Snowflake/Llama-3.1-SwiftKV-8B-Instruct-FP8

Updated Dec 13, 2024 • 180

nm-testing/SparseLlama-3.1-8B-gsm8k-pruned.2of4-FP8-Dynamic-Channel-BitMaskCompressed

Updated Dec 6, 2024 • 3

yejingfu/q-Llama-3-70B-Euryale-v2.1-888

Updated Dec 5, 2024 • 7

yejingfu/q-Llama-3.1-70B-Euryale-v2.2-888

Updated Dec 5, 2024 • 4

yejingfu/q-Meta-Llama-3.1-70B-Instruct-888

Updated Dec 5, 2024 • 5

SicariusSicariiStuff/Impish_Mind_8B_FP8

Updated Dec 5, 2024 • 10

yejingfu/q-Meta-Llama-3-70B-Instruct-888

Updated Dec 6, 2024 • 11

nm-testing/SparseLlama-3.1-8B-gsm8k-pruned.2of4-FP8-Dynamic-testing

Updated Dec 6, 2024 • 6

nm-testing/Meta-Llama-3-8B-Instruct-FP8-Dynamic-2of4-testing

Updated Dec 6, 2024 • 37

nm-testing/Qwen2-VL-2B-Instruct-FP8-dynamic

Updated Dec 6, 2024 • 27

nm-testing/Meta-Llama-3-8B-Instruct-FP8-Static-testing

Updated Dec 6, 2024 • 34

nm-testing/Meta-Llama-3-8B-Instruct-FP8-Static-Per-Tensor-testing

Updated Dec 6, 2024 • 41

nm-testing/Meta-Llama-3-8B-Instruct-FP8-Dynamic-IA-Per-Tensor-Weight-testing

Updated Dec 6, 2024 • 35

yejingfu/q-Llama-3.3-70B-Instruct-888

Updated Dec 7, 2024 • 23

cortecs/Llama-3_1-Nemotron-51B-Instruct-FP8-Dynamic

Text Generation • Updated Dec 8, 2024 • 9

reinforce20001/SakuraLLM.Sakura-7B-Qwen2.5-v1.0-W8A8-Int8

Updated Dec 8, 2024 • 69 • 1

nm-testing/TinyLlama-1.1B-Chat-v1.0-2of4-Sparse-Dense-Compressor

Updated Dec 8, 2024 • 15

nm-testing/TinyLlama-1.1B-Chat-v1.0-INT8-Dynamic-IA-Per-Channel-Weight-testing

Updated Dec 8, 2024 • 36

nm-testing/TinyLlama-1.1B-Chat-v1.0-INT8-Dynamic-IA-Per-Tensor-Weight-testing

Updated Dec 8, 2024 • 35