omarelshehy committed on
Commit
fbe6046
1 Parent(s): 1f2f0b3

Update README.md

Files changed (1):
  README.md +24 -3
README.md CHANGED
@@ -1,3 +1,24 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ datasets:
+ - teknium/OpenHermes-2.5
+ ---
+ This is a fine-tuned base model from [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B), paired with the trained Medusa heads [OpenHermes-2.5-medusa](omarelshehy/OpenHermes-2.5-Mistral-7B-medusa).
+
+ The base model and the Medusa heads were trained together, so they should ideally be used together for best performance.
+
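Medusa-style decoding drafts several future tokens with the extra heads and keeps only the prefix the base model agrees with, which is why the pair works best together. A minimal, framework-free sketch of that accept/reject step (the function name and token values are illustrative, not the model's actual API):

```python
def accept_draft(draft_tokens, verify_tokens):
    """Keep the longest prefix of the drafted tokens that matches
    what the base model would have produced greedily."""
    accepted = []
    for drafted, verified in zip(draft_tokens, verify_tokens):
        if drafted != verified:
            break
        accepted.append(drafted)
    return accepted

# Toy example: the heads drafted 4 tokens, the base model agrees on the first 2.
draft = [42, 7, 99, 13]
verify = [42, 7, 55, 13]
print(accept_draft(draft, verify))  # [42, 7]
```

Every accepted token saves one full forward pass of the base model, which is where the latency gains come from.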
+ WIP: replace this model with an adapter on top of the original model.
+
+ # Training Details
+
+ The model and the heads were trained on a self-distilled dataset: responses were generated by running inference on the dataset originally used to train [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B).
+
+ Inference on the dataset was done using the [vLLM](https://docs.vllm.ai/en/latest/index.html) async server on an A100.
+
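Generating the self-distilled responses against an async server is essentially a bounded-concurrency fan-out over the prompts. A sketch of that pattern, where `call_vllm` is a placeholder standing in for the actual HTTP request to the server (it is not part of vLLM's API):

```python
import asyncio

async def call_vllm(prompt: str) -> str:
    # Placeholder for an HTTP request to the vLLM server's
    # completions endpoint; here it simply echoes the prompt.
    await asyncio.sleep(0)
    return f"response to: {prompt}"

async def distill(prompts, max_concurrency=8):
    # Bound in-flight requests so the server is not overwhelmed.
    sem = asyncio.Semaphore(max_concurrency)

    async def one(prompt):
        async with sem:
            return await call_vllm(prompt)

    # gather preserves input order, so responses line up with prompts.
    return await asyncio.gather(*(one(p) for p in prompts))

results = asyncio.run(distill(["Hello", "World"]))
print(results)  # ['response to: Hello', 'response to: World']
```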
+ Training was performed with the help of [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) on a single A100 GPU using QLoRA for 2 epochs.
+
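An Axolotl QLoRA run is driven by a YAML config. A fragment along these lines shows the relevant knobs (field names follow Axolotl's conventions; the values here are illustrative, not the exact config used for this model):

```yaml
# Illustrative fragment; not the exact config used for this model.
base_model: teknium/OpenHermes-2.5-Mistral-7B
load_in_4bit: true        # QLoRA keeps the base weights in 4-bit
adapter: qlora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
num_epochs: 2             # matches the 2 epochs reported above
micro_batch_size: 2
gradient_accumulation_steps: 4
```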
+ # Inference evaluation
+ (This is still a WIP.)
+
+ I tested the model's latency using [TGI](https://huggingface.co/docs/text-generation-inference/en/index). As others have reported, the speedup depends on the domain or task: generally I measured a 1.9x improvement in latency, while on code-related tasks it can reach 3x.
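The reported speedups are ratios of average end-to-end generation latency. A small timing helper for that kind of measurement (the `baseline` and `medusa` callables below are cheap stand-ins for real requests against a TGI endpoint):

```python
import time

def measure_latency(generate, n_runs=5):
    """Average wall-clock seconds per call of a generation function."""
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        generate()
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)

# Stand-ins for baseline and Medusa-accelerated generation calls.
baseline = lambda: sum(range(10_000))
medusa = lambda: sum(range(5_000))

speedup = measure_latency(baseline) / measure_latency(medusa)
print(f"speedup: {speedup:.1f}x")
```

Averaging over several runs (and several prompts per domain) matters here, since per-request latency is noisy and, as noted above, the speedup varies by task.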