charlesdedampierre committed on
Commit b05e402
1 Parent(s): 79cb7a6

Update README.md

Files changed (1)
  1. README.md +107 -6
README.md CHANGED
@@ -4,12 +4,113 @@ license: apache-2.0
 
 ## Model description
 
- TopicNeuralHermes 2.5 Mistral 7B is a Mistral-based fine-tuned model, as a continuuaion of OpenHermes 2.5.
 
- The model was trained on a refined DPO dataset. We compared the rejected and accepted in hte DPO datastes adn tried to find the reasons behind acceptance or rejection.
- We used Topic Modeling methods (hence TopicNeuralHermes) on both datasets and only kept the topics that existed in the ChatGPT responses and not in the LLama repsonses. Our hypothesis
- is that those topics encapsulate the main differences between the two ways of answering. This method can help converge quicker and with way less data (around 1/6 of the initial dataset)
 
- Bug thanks to https://huggingface.co/mlabonne for the notebbok he created that helped carry out the DPO Strategy.
 
- We use [Bunkatopics](https://github.com/charlesdedampierre/BunkaTopics) to carry out the Topic Modeling methods.
+ TopicNeuralHermes 2.5 Mistral 7B is a Mistral-based fine-tuned model, continuing from OpenHermes 2.5.
 
+ The model was trained on a refined DPO dataset. The objective was to train the model on a small portion of the DPO data. To achieve this, we compared the two sets of answers used to train the reward model: the rejected Llama answers and the accepted ChatGPT answers from the [DPO dataset](https://huggingface.co/datasets/mlabonne/chatml_dpo_pairs).
+ We then conducted topic modeling on both datasets, keeping only the topics that existed in the accepted dataset but not in the rejected one.
+ Our hypothesis is that these topics encapsulate the main differences between the two answering styles.
 
+ This method allows for quicker convergence with significantly less data (around 1/6 of the initial dataset).
 
+ Special thanks to [mlabonne](https://huggingface.co/mlabonne) for creating the [colab notebook](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing#scrollTo=YpdkZsMNylvp) that facilitated the DPO strategy.
+
+ We used [Bunkatopics](https://github.com/charlesdedampierre/BunkaTopics) to implement the topic modeling methods.
+
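+ As an illustration, topic extraction with Bunkatopics looks roughly like the following (a minimal sketch based on the project's quick-start; class and method names are taken from its README and may differ across versions):
+
+ ```python
+ from bunkatopics import Bunka
+
+ # In practice, docs would be the accepted (ChatGPT) or rejected (Llama)
+ # answers from the DPO dataset, as a list of strings.
+ docs = ["The capital of France is Paris...", "Birds build nests to..."]
+
+ bunka = Bunka()
+ bunka.fit(docs)                           # embed and cluster the documents
+ topics = bunka.get_topics(n_clusters=30)  # 30 topics per dataset, as used below
+ ```
+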
+
+ ## Topic Analysis
+
+ We applied the topic modeling method to both datasets, extracting 30 topics from each.
+ These topics were characterized by their 10 most specific unigrams or bigrams.
+ We then compared the two sets of topics (30 from each dataset) and retained those topics from the accepted dataset that shared fewer than 2 terms with any topic in the rejected dataset.
+
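+ A minimal sketch of this retention rule (hypothetical helper; each topic is represented as the set of its 10 most specific terms):
+
+ ```python
+ def retained_topics(accepted_topics, rejected_topics):
+     """Keep an accepted-side topic only if it shares fewer than
+     2 terms with every rejected-side topic."""
+     return [
+         topic for topic in accepted_topics
+         if all(len(topic & rejected) < 2 for rejected in rejected_topics)
+     ]
+
+ # Toy example: the first topic shares no terms with the rejected topic
+ # and is kept; the second shares 2 terms and is dropped.
+ accepted = [{"feelings", "teaching", "schools"}, {"question", "answer", "geography"}]
+ rejected = [{"question", "answer", "capital city"}]
+ print(retained_topics(accepted, rejected))
+ ```
+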
+ We found the following 13 distinctive topics, each described by 10 terms:
+
+ **Emotional Dynamics**: feelings, Quinn, Austin, minority women, teaching, schools, individual, personality, backgrounds, triggers.
+
+ **Global Knowledge Queries**: question, information, geography, news articles, Step, answer, capital city, pipeline system, country, analogy.
+
+ **Digital Interactions and Queries**: questions, question, PersonX, modem, answers, effect relationship, Quora, browser, answer, e-commerce.
+
+ **Business and Cybersecurity**: email, businesses, initiatives, innovation, advertising papers, spam, breaches, antivirus, payments, prospects.
+
+ **Lifestyle and Wellness**: sleep, exercise, gifts, shopping, Casey, stores, stress, headaches, options, mood.
+
+ **Wildlife Ecology**: birds, prey, animals, species, infection, nest, eggs, bacteria, insects, kitty condo.
+
+ **Environmental Science and Climate**: temperature, gases, greenhouse, emissions, perturbation, sulfur, dioxide, climate change, water, heat.
+
+ **Maritime and Mechanical Engineering**: ship, bowling, propulsion, beam width, Filing cabinet, LED, lane, containment area, lawnmower, rotors.
+
+ **Cultural and Social Dynamics**: Lindsey, museum, Kate, Rachel, Jason, Alex, Erin, conversation, Laura, exhibits.
+
+ **Political Media Analysis**: media platforms, election, politics, teenagers, elections, White House, Barack Obama, nation, Confederate, depression.
+
+ **International Relations and Policy**: cooperation, EU, nations, alliance, NATO, European Union, member states, policy, monarch, Brexit.
+
+ **Astrophysics and Physical Sciences**: electrons, km, Moon, acceleration, orbit, friction, current, asteroid, electron, collector emitter.
+
+ **Film Critique and Analysis**: movie review, film, reviewer, sentiment, critic, flaws, DVD, plot, opinion, originality.
+
+ While these topics are not domain-specific, they do not appear in the rejected dataset. Further research is needed to understand why these topics are prominent in the accepted dataset.
+
+ ## Usage
+
+ You can run this model using LM Studio or any other frontend.
+
+ You can also run this model using the following code:
+
+ ```python
+ import transformers
+ from transformers import AutoTokenizer
+
+ # This model's Hub repo id (assumed; adjust if the repository name differs)
+ new_model = "charlesdedampierre/TopicNeuralHermes-2.5-Mistral-7B"
+
+ # Format prompt
+ message = [
+     {"role": "system", "content": "You are a helpful assistant chatbot."},
+     {"role": "user", "content": "What is a Large Language Model?"}
+ ]
+ tokenizer = AutoTokenizer.from_pretrained(new_model)
+ prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)
+
+ # Create pipeline
+ pipeline = transformers.pipeline(
+     "text-generation",
+     model=new_model,
+     tokenizer=tokenizer
+ )
+
+ # Generate text
+ sequences = pipeline(
+     prompt,
+     do_sample=True,
+     temperature=0.7,
+     top_p=0.9,
+     num_return_sequences=1,
+     max_length=200,
+ )
+ print(sequences[0]['generated_text'])
+ ```
+
+ ## Training hyperparameters
+
+ The following settings were used for the DPO fine-tuning (a sketch of how they fit together follows the lists):
+
+ **LoRA**:
+ - r=16
+ - lora_alpha=16
+ - lora_dropout=0.05
+ - bias="none"
+ - task_type="CAUSAL_LM"
+ - target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
+
+ **Training arguments**:
+ - per_device_train_batch_size=4
+ - gradient_accumulation_steps=4
+ - gradient_checkpointing=True
+ - learning_rate=5e-5
+ - lr_scheduler_type="cosine"
+ - max_steps=200
+ - optim="paged_adamw_32bit"
+ - warmup_steps=100
+
+ **DPOTrainer**:
+ - beta=0.1
+ - max_prompt_length=1024
+ - max_length=1536
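+
+ A minimal sketch of how these settings wire together (assumptions: the base model is teknium/OpenHermes-2.5-Mistral-7B, trl's DPOTrainer API as of late 2023, and the training set is the topic-filtered subset of the DPO pairs):
+
+ ```python
+ from datasets import load_dataset
+ from peft import LoraConfig
+ from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
+ from trl import DPOTrainer
+
+ base = "teknium/OpenHermes-2.5-Mistral-7B"
+ model = AutoModelForCausalLM.from_pretrained(base)
+ tokenizer = AutoTokenizer.from_pretrained(base)
+
+ # Full DPO pairs; the actual training used the topic-filtered subset (~1/6 of this).
+ dpo_dataset = load_dataset("mlabonne/chatml_dpo_pairs")["train"]
+
+ peft_config = LoraConfig(
+     r=16,
+     lora_alpha=16,
+     lora_dropout=0.05,
+     bias="none",
+     task_type="CAUSAL_LM",
+     target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj'],
+ )
+
+ training_args = TrainingArguments(
+     per_device_train_batch_size=4,
+     gradient_accumulation_steps=4,
+     gradient_checkpointing=True,
+     learning_rate=5e-5,
+     lr_scheduler_type="cosine",
+     max_steps=200,
+     optim="paged_adamw_32bit",
+     warmup_steps=100,
+     output_dir="./results",  # assumed output path
+ )
+
+ trainer = DPOTrainer(
+     model,
+     ref_model=None,             # with a peft_config, trl derives the frozen reference model
+     args=training_args,
+     train_dataset=dpo_dataset,  # prompt/chosen/rejected pairs
+     tokenizer=tokenizer,
+     peft_config=peft_config,
+     beta=0.1,
+     max_prompt_length=1024,
+     max_length=1536,
+ )
+ trainer.train()
+ ```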