Severian committed on
Commit 8a199a7
1 parent: 7f2bcbe

Update README.md

Files changed (1)
  1. README.md +8 -6
README.md CHANGED
```diff
@@ -3,17 +3,19 @@ license: apache-2.0
 tags:
 - jamba
 datasets:
-- teknium/OpenHermes-2.5
+- Locutusque/hercules-v4.0
 base_model: ai21labs/Jamba-v0.1
 pipeline_tag: text-generation
 ---
 
-# Jamba-Open-Hermes
+# Jamba-Hercules
 
 <img src="https://cdn-uploads.huggingface.co/production/uploads/64740cf7485a7c8e1bd51ac9/Ph6ZvxwF7a0m_B5Su_EK7.webp" width="500" height="500">
 
-# Current version works but it is very particular about having the right ChatML format and settings. Jamba has been somewhat difficult and expensive to train but I wanted to see how it did on one of the best datasets we have access to. I believe in transparent development so all *best* working iterations, even if they are a bit wonky, will be pushed here.
+# *Name was changed from Open-Hermes to Hercules. During multiple trainings and testings with lots of different datasets, I found that Jamba has BY FAR reacted the best to this dataset. It contains Open-Hermes-2.0 examples but offers A LOT more in diversity and complexity. Thanks to @Locutusque for the amazing work!
 
+## Datset used: Locutusque/hercules-v4.0
+*- First 10k Examples*
 ---
 
 ## Example Output:
@@ -97,11 +99,11 @@ print(tokenizer.batch_decode(outputs)[0])
 ## Training
 
 
-### **Open-Hermes-2.0:**
+### **Hercules-v4.0:**
 
 **FIRST TEST:**
-- *1000 Steps (5 hours x A100)*
-- *Final Loss: 3.48*
+- *1250 Steps (5 hours x A100)*
+- *Final Loss: 0.98*
 
 
 ### Hyperparameters
```
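The removed README note said the model is "very particular about having the right ChatML format." As a minimal sketch of what that prompt layout looks like (the `<|im_start|>`/`<|im_end|>` delimiters are the standard ChatML convention; the helper name and message contents here are illustrative, not from the repo):

```python
def chatml(messages):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    # Each message is wrapped as: <|im_start|>role\ncontent<|im_end|>
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages
    ]
    # A trailing open assistant turn cues the model to generate its reply.
    return "\n".join(parts) + "\n<|im_start|>assistant\n"

prompt = chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain what Jamba is in one sentence."},
])
```

A string built this way can be passed to the tokenizer in the generation snippet referenced above; exact special-token handling may differ per tokenizer, so treat this as a format sketch rather than the repo's own preprocessing code.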