---
license: apache-2.0
---
## Model description

TopicNeuralHermes 2.5 Mistral 7B is a Mistral-based model fine-tuned from OpenHermes 2.5.

The model was trained on a refined DPO dataset. The objective was to train the model on a small portion of the DPO data. To achieve this, we compared the two answer sets used to train the reward model: the rejected Llama answers and the accepted ChatGPT answers from the [DPO dataset](mlabonne/chatml_dpo_pairs).

We then conducted topic modeling on both datasets, keeping only the topics that existed in the accepted dataset but not in the rejected one.

Our hypothesis is that these topics encapsulate the main differences between the two answering styles.

This method allows for quicker convergence with significantly less data (around 1/6 of the initial dataset).

Special thanks to [mlabonne](https://huggingface.co/mlabonne) for creating the [colab notebook](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing#scrollTo=YpdkZsMNylvp) that facilitated the DPO strategy.

We used [Bunkatopics](https://github.com/charlesdedampierre/BunkaTopics) to implement the topic modeling methods.
## Topic Analysis

We applied the topic modeling method to both datasets, extracting 30 topics from each. These topics were characterized by their 10 most specific unigrams or bigrams. We then compared the two sets of topics and retained those from the accepted dataset that shared fewer than 2 terms with any topic in the rejected dataset.

We found the following 13 distinctive topics, each described by 10 terms:

**Emotional Dynamics**: feelings, Quinn, Austin, minority women, teaching, schools, individual, personality, backgrounds, triggers.

**Global Knowledge Queries**: question, information, geography, news articles, Step, answer, capital city, pipeline system, country, analogy.

**Digital Interactions and Queries**: questions, question, PersonX, modem, answers, effect relationship, Quora, browser, answer, e-commerce.

**Business and Cybersecurity**: email, businesses, initiatives, innovation, advertising papers, spam, breaches, antivirus, payments, prospects.

**Lifestyle and Wellness**: sleep, exercise, gifts, shopping, Casey, stores, stress, headaches, options, mood.

**Wildlife Ecology**: birds, prey, animals, species, infection, nest, eggs, bacteria, insects, kitty condo.

**Environmental Science and Climate**: temperature, gases, greenhouse, emissions, perturbation, sulfur, dioxide, climate change, water, heat.

**Maritime and Mechanical Engineering**: ship, bowling, propulsion, beam width, Filing cabinet, LED, lane, containment area, lawnmower, rotors.

**Cultural and Social Dynamics**: Lindsey, museum, Kate, Rachel, Jason, Alex, Erin, conversation, Laura, exhibits.

**Political Media Analysis**: media platforms, election, politics, teenagers, elections, White House, Barack Obama, nation, Confederate, depression.

**International Relations and Policy**: cooperation, EU, nations, alliance, NATO, European Union, member states, policy, monarch, Brexit.

**Astrophysics and Physical Sciences**: electrons, km, Moon, acceleration, orbit, friction, current, asteroid, electron, collector emitter.

**Film Critique and Analysis**: movie review, film, reviewer, sentiment, critic, flaws, DVD, plot, opinion, originality.

While these topics are not domain-specific, they did not appear in the rejected dataset. Further research is needed to understand why they are prominent in the accepted dataset.
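
As a rough illustration (not the exact Bunkatopics pipeline), the retention rule above — keeping an accepted-dataset topic only when it shares fewer than 2 terms with every rejected-dataset topic — can be sketched in plain Python. The topic names and term lists below are made up for the example:

```python
# Keep an accepted-dataset topic only if it shares fewer than `max_shared`
# terms with every topic extracted from the rejected dataset.
def filter_topics(accepted_topics, rejected_topics, max_shared=2):
    kept = []
    for name, terms in accepted_topics.items():
        overlaps = [len(set(terms) & set(r)) for r in rejected_topics.values()]
        if all(o < max_shared for o in overlaps):
            kept.append(name)
    return kept

# Toy data (hypothetical topics, not from the real datasets)
accepted = {
    "Wildlife Ecology": ["birds", "prey", "animals", "species", "nest"],
    "Cooking": ["recipe", "oven", "flour", "sugar", "bake"],
}
rejected = {
    "Food and Drink": ["recipe", "restaurant", "oven", "menu", "wine"],
}

print(filter_topics(accepted, rejected))  # → ['Wildlife Ecology']
```

Here "Cooking" is dropped because it shares 2 terms ("recipe", "oven") with a rejected topic, while "Wildlife Ecology" shares none and is kept.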

## Usage

You can run this model using LM Studio or any other frontend.

You can also run it with the following code:

```python
import transformers
from transformers import AutoTokenizer

# This model's repository id (assumed; adjust if the repo name differs)
new_model = "charlesdedampierre/TopicNeuralHermes-2.5-Mistral-7B"

# Format prompt
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(new_model)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer
)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])
```

## Training hyperparameters

**LoRA**:
* r=16
* lora_alpha=16
* lora_dropout=0.05
* bias="none"
* task_type="CAUSAL_LM"
* target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']

**Training arguments**:
* per_device_train_batch_size=4
* gradient_accumulation_steps=4
* gradient_checkpointing=True
* learning_rate=5e-5
* lr_scheduler_type="cosine"
* max_steps=200
* optim="paged_adamw_32bit"
* warmup_steps=100

**DPOTrainer**:
* beta=0.1
* max_prompt_length=1024
* max_length=1536
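
For reference, under the Hugging Face `peft` library the LoRA settings above map onto a `LoraConfig` like the sketch below (illustrative only — this is not the exact training script, and it assumes `peft` is installed):

```python
from peft import LoraConfig

# LoRA configuration matching the hyperparameters listed above
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj'],
)
```

This config would then be passed to the trainer (here, TRL's `DPOTrainer`) together with the training arguments listed above.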
|