aws-neuron/optimum-neuron-cache

dacorvo (HF staff) committed 26 days ago

Commit 2695ea9 • 1 parent: f95d8c1

Create llama3-70b.json

Files changed (1)

inference-cache-config/llama3-70b.json ADDED

+{
+  "meta-llama/Meta-Llama-3-70B": [
+    {
+      "batch_size": 1,
+      "sequence_length": 4096,
+      "num_cores": 24,
+      "auto_cast_type": "fp16"
+    },
+    {
+      "batch_size": 4,
+      "sequence_length": 4096,
+      "num_cores": 24,
+      "auto_cast_type": "fp16"
+    }
+  ]
+}
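
The added file maps one model ID to a list of configurations that differ only in `batch_size`. As a minimal sketch of how such a file can be read and sanity-checked, the Python below parses the same JSON and lists each entry. The helper name `list_cache_entries` and the idea that all four keys are required are assumptions made for illustration; how the cache tooling actually consumes this file is not shown in this commit.

```python
import json

# Contents of inference-cache-config/llama3-70b.json as added by this commit.
CONFIG_TEXT = """
{
  "meta-llama/Meta-Llama-3-70B": [
    {
      "batch_size": 1,
      "sequence_length": 4096,
      "num_cores": 24,
      "auto_cast_type": "fp16"
    },
    {
      "batch_size": 4,
      "sequence_length": 4096,
      "num_cores": 24,
      "auto_cast_type": "fp16"
    }
  ]
}
"""

# Keys that every entry in this file carries. Treating them as required
# is an assumption of this sketch, not a documented schema.
EXPECTED_KEYS = {"batch_size", "sequence_length", "num_cores", "auto_cast_type"}


def list_cache_entries(config_text):
    """Hypothetical helper: return (model_id, entry) pairs from the config."""
    config = json.loads(config_text)
    pairs = []
    for model_id, entries in config.items():
        for entry in entries:
            missing = EXPECTED_KEYS - entry.keys()
            if missing:
                raise ValueError(f"{model_id}: missing keys {sorted(missing)}")
            pairs.append((model_id, entry))
    return pairs


entries = list_cache_entries(CONFIG_TEXT)
for model_id, entry in entries:
    print(model_id, entry["batch_size"], entry["sequence_length"],
          entry["num_cores"], entry["auto_cast_type"])
```

A check like this could run before committing a new config file, so that a misspelled key is caught early.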