optimum-neuron-cache / inference-cache-config

Commit History

Update inference-cache-config/llama-variants.json
e7179a3
verified

dacorvo HF staff committed on

Rename inference-cache-config/llama2.json to inference-cache-config/llama2-7b-13b.json
be28bda
verified

dacorvo HF staff committed on

Create llama2-70b.json
6fe6ee4
verified

dacorvo HF staff committed on

Rename inference-cache-config/llama3.json to inference-cache-config/llama3-8b.json
06bc70d
verified

dacorvo HF staff committed on

Create llama3-70b.json
2695ea9
verified

dacorvo HF staff committed on

Create mixtral.json
57652e6
verified

dacorvo HF staff committed on

Add more batch_size for mistral on smaller instances
545cd4d
verified

dacorvo HF staff committed on

Update Mistral cached configurations
ee458f5
verified

dacorvo HF staff committed on

Use princeton-nlp/Sheared-LLaMA-1.3B as a test model
695b341
verified

dacorvo HF staff committed on

Remove llama2 7B config for 24 cores
17e7257
verified

dacorvo HF staff committed on

Update inference-cache-config/llama3.json
5d8c4f2
verified

dacorvo HF staff committed on

Update inference-cache-config/llama3.json
f5aae68
verified

dacorvo HF staff committed on

Create llama3.json
f93cadb
verified

dacorvo HF staff committed on

Rename inference-cache-config/llama.json to inference-cache-config/llama2.json
f06a55a
verified

dacorvo HF staff committed on

Add more gpt2 configurations
3fbf810
verified

dacorvo HF staff committed on

Add more llama config
2d87237
verified

dacorvo HF staff committed on

Add Mistral-v2
20e585f
verified

dacorvo HF staff committed on

Create stable-diffusion.json (#43)
32561fe
verified

philschmid HF staff, Jingya HF staff committed on

Remove SalesForce embedding model
1cd13f9
verified

dacorvo HF staff committed on

Add Zephyr to mistral variants
9164704
verified

dacorvo HF staff committed on

Remove variants from main mistral config
ef07aca
verified

dacorvo HF staff committed on

Add mistral most popular variants
d3983e8
verified

dacorvo HF staff committed on

Add most popular llama variants
594abb2
verified

dacorvo HF staff committed on

Added teknium/OpenHermes-2.5-Mistral-7B
1518247
verified

dacorvo HF staff committed on

Added Llama-70b batch_size 4 to inference cache
593822e
verified

dacorvo HF staff committed on

Create mistral.json
b5d0afd
verified

philschmid HF staff committed on

Create gpt2.json
3bdb891
verified

philschmid HF staff committed on

Create inference-cache-config/llama.json
1960ccb
verified

philschmid HF staff committed on