Spaces: FluffyAIcode/LLM-KA-Cache-Compress

cryptobiosis committed 9 days ago

Commit 2b572cf (verified) · 1 Parent(s): 292780d

Drop 'demo' suffix from page title

Files changed (2)

README.md

@@ -9,7 +9,7 @@ pinned: false
 license: apache-2.0
 ---
-# KakeyaLattice KV-cache compression demo
+# KakeyaLattice KV-cache compression
 Side-by-side comparison of **bf16 DynamicCache** vs **KakeyaLattice E8**
 compression at three quality levels (Q=10 aggressive, Q=38 balanced,

app.py

@@ -146,9 +146,9 @@ EXAMPLE_PROMPTS = [
 ]
-with gr.Blocks(title="KakeyaLattice KV-cache compression demo") as demo:
+with gr.Blocks(title="KakeyaLattice KV-cache compression") as demo:
     gr.Markdown(
-        "# KakeyaLattice KV-cache compression demo\n\n"
+        "# KakeyaLattice KV-cache compression\n\n"
         "Compare generation output + latency across **bf16 baseline** and "
         "three **KakeyaLattice E8** compression levels on a small HF causal LM. "
         "The E8 variant uses 8-D nested-lattice closest-point quantisation "
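
Note on "closest-point quantisation": the diff does not show how the Space implements it, so the following is only a minimal sketch of the textbook closest-point search for the E8 lattice (the Conway-Sloane method), not the KakeyaLattice code. The function names and the `scale` parameter are hypothetical. In particular, `scale` is not claimed to correspond to the Q setting, and the nested-lattice part is not covered.

```python
import math


def closest_d8(x):
    """Nearest point of D8: integer vectors whose coordinates sum to an even number."""
    rounded = [math.floor(v + 0.5) for v in x]
    if sum(rounded) % 2 != 0:
        # Parity is wrong: take the coordinate with the largest rounding
        # error and round it the other way instead.
        i = max(range(len(x)), key=lambda k: abs(x[k] - rounded[k]))
        rounded[i] += 1 if x[i] > rounded[i] else -1
    return rounded


def closest_e8(x):
    """Nearest point of E8, viewed as the union of D8 and D8 + (1/2, ..., 1/2)."""
    if len(x) != 8:
        raise ValueError("E8 points have exactly 8 coordinates")
    candidate_int = closest_d8(x)
    candidate_half = [p + 0.5 for p in closest_d8([v - 0.5 for v in x])]

    def sq_dist(p):
        return sum((u - v) ** 2 for u, v in zip(x, p))

    # Keep whichever coset representative is closer to the input.
    if sq_dist(candidate_int) <= sq_dist(candidate_half):
        return candidate_int
    return candidate_half


def quantise(x, scale):
    """Snap an 8-D block to the E8 lattice scaled by `scale` (hypothetical knob)."""
    return [scale * p for p in closest_e8([v / scale for v in x])]
```

For example, `closest_e8([0.4] * 8)` returns `[0.5] * 8`, because the half-integer coset is closer than the origin, while `closest_e8([0.1] * 8)` returns the all-zero vector. A smaller `scale` gives a finer grid and lower distortion, at the cost of more bits per block.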