---
license: apache-2.0
base_model: teknium/OpenHermes-2.5-Mistral-7B
datasets:
- bunkalab/topic_based_chatml_dpo_pairs
library_name: Bunkatopics
widget:
- text: Tell a danish joke in french
pipeline_tag: text-generation
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63c13d74f02ef5b95e0e448e/LCntraaGmEF6W7I9DEA-1.png)

## Model description

TopicNeuralHermes 2.5 Mistral 7B is a fine-tune of [OpenHermes 2.5](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B), trained on a specific subset of data selected with topic modeling techniques using [Bunkatopics](https://github.com/charlesdedampierre/BunkaTopics).

The model was trained on a refined DPO dataset, with the objective of training on only a small portion of the DPO data. To build it, we compared the two datasets used to train the reward model: the rejected Llama answers and the accepted ChatGPT answers from the [DPO dataset](https://huggingface.co/datasets/mlabonne/chatml_dpo_pairs). We then ran topic modeling on both datasets and kept only the topics that existed in the accepted dataset but not in the rejected one. Our hypothesis is that these topics capture the main differences between the two answering styles.

This method allows for quicker convergence with significantly less data (around 1/6 of the initial dataset). The dataset is available at [bunkalab/topic_based_chatml_dpo_pairs](https://huggingface.co/datasets/bunkalab/topic_based_chatml_dpo_pairs).

Special thanks to [mlabonne](https://huggingface.co/mlabonne) for creating the [colab notebook](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing#scrollTo=YpdkZsMNylvp) that facilitated the DPO strategy.

The results are shown below. The model performs on par with similar models while using far less data and computing power.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63c13d74f02ef5b95e0e448e/TOEijYNgtY6B7U9Pn29gL.png)

## Topic Analysis

We applied topic modeling to both datasets and extracted 30 topics from each. Each topic is characterized by its 10 most specific terms (unigrams or bigrams). We then compared the two sets of 30 topics and retained the topics from the accepted dataset that shared fewer than 2 terms with every topic in the rejected dataset.
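
The filtering rule described above can be sketched in a few lines of Python. This is only an illustration, not the Bunkatopics implementation: the function name `distinctive_topics`, the `max_shared` parameter, and the toy topics are all assumptions made for the example, and each topic is represented simply as a list of terms.

```python
def distinctive_topics(accepted, rejected, max_shared=1):
    """Keep accepted topics that share at most `max_shared` terms
    with every rejected topic (i.e. fewer than 2 by default)."""
    rejected_sets = [set(terms) for terms in rejected.values()]
    kept = {}
    for name, terms in accepted.items():
        term_set = set(terms)
        # A topic is distinctive only if no rejected topic overlaps too much.
        if all(len(term_set & other) <= max_shared for other in rejected_sets):
            kept[name] = terms
    return kept


# Toy topics, invented for illustration (real topics have 10 terms each).
accepted = {
    "film": ["movie review", "film", "critic", "plot"],
    "generic": ["question", "answer", "information", "step"],
}
rejected = {
    "qa": ["question", "answer", "task", "sentence"],
    "misc": ["film", "text", "word", "language"],
}

kept = distinctive_topics(accepted, rejected)
print(sorted(kept))
```

In this toy case the "generic" topic is dropped because it shares two terms with the rejected "qa" topic, while "film" shares only one term with any rejected topic and is kept.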

We found the following 13 distinctive topics, each described by 10 terms:

**Emotional Dynamics**: feelings, Quinn, Austin, minority women, teaching, schools, individual, personality, backgrounds, triggers.

**Global Knowledge Queries**: question, information, geography, news articles, Step, answer, capital city, pipeline system, country, analogy.

**Digital Interactions and Queries**: questions, question, PersonX, modem, answers, effect relationship, Quora, browser, answer, e-commerce.

**Business and Cybersecurity**: email, businesses, initiatives, innovation, advertising papers, spam, breaches, antivirus, payments, prospects.

**Lifestyle and Wellness**: sleep, exercise, gifts, shopping, Casey, stores, stress, headaches, options, mood.

**Wildlife Ecology**: birds, prey, animals, species, infection, nest, eggs, bacteria, insects, kitty condo.

**Environmental Science and Climate**: temperature, gases, greenhouse, emissions, perturbation, sulfur, dioxide, climate change, water, heat.

**Maritime and Mechanical Engineering**: ship, bowling, propulsion, beam width, Filing cabinet, LED, lane, containment area, lawnmower, rotors.

**Cultural and Social Dynamics**: Lindsey, museum, Kate, Rachel, Jason, Alex, Erin, conversation, Laura, exhibits.

**Political Media Analysis**: media platforms, election, politics, teenagers, elections, White House, Barack Obama, nation, Confederate, depression.

**International Relations and Policy**: cooperation, EU, nations, alliance, NATO, European Union, member states, policy, monarch, Brexit.

**Astrophysics and Physical Sciences**: electrons, km, Moon, acceleration, orbit, friction, current, asteroid, electron, collector emitter.

**Film Critique and Analysis**: movie review, film, reviewer, sentiment, critic, flaws, DVD, plot, opinion, originality.

While these topics are not domain-specific, they did not appear right away in the rejected dataset. Further research is needed to understand why these topics are so prominent in the accepted dataset.

## Usage

You can run this model using LM Studio or any other frontend.

You can also run this model using the following code:

```python
import transformers
from transformers import AutoTokenizer

model_id = "charlesdedampierre/TopicNeuralHermes-2.5-Mistral-7B"

# Format the prompt with the model's chat template
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is Topic Modeling?"},
]
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create the text-generation pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
)

# Generate text (max_length counts the prompt tokens as well as the generated ones)
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]["generated_text"])
```
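
The `apply_chat_template` call above turns the message list into a single prompt string. As a rough illustration, and assuming this model keeps the ChatML format of its OpenHermes 2.5 base, the hypothetical helper `to_chatml` below builds a ChatML-style prompt by hand. The template shipped with the tokenizer remains the authoritative source and may differ in details.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML-style prompt."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is Topic Modeling?"},
]
prompt = to_chatml(messages)
print(prompt)
```

Seeing the raw string can help when debugging a frontend that does not apply the chat template automatically.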

## Training hyperparameters

**LoRA**:

* r=16
* lora_alpha=16
* lora_dropout=0.05
* bias="none"
* task_type="CAUSAL_LM"
* target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']

**Training arguments**:

* per_device_train_batch_size=4
* gradient_accumulation_steps=4
* gradient_checkpointing=True
* learning_rate=5e-5
* lr_scheduler_type="cosine"
* max_steps=200
* optim="paged_adamw_32bit"
* warmup_steps=100
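
To make these training arguments more concrete, the sketch below works out the effective batch size and an approximate learning-rate curve. It assumes a single device and the usual shape of a cosine schedule with warmup (linear warmup, then a half-cosine decay to zero). The helper `lr_at_step` is hypothetical, and the values the trainer actually used may differ slightly.

```python
import math


def lr_at_step(step, base_lr=5e-5, warmup_steps=100, max_steps=200):
    """Approximate learning rate: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# per_device_train_batch_size * gradient_accumulation_steps (single device assumed)
effective_batch_size = 4 * 4
# Preference pairs processed over max_steps optimizer steps
pairs_seen = effective_batch_size * 200

print("effective batch size:", effective_batch_size)
print("pairs seen:", pairs_seen)
for step in (0, 50, 100, 150, 200):
    print(step, lr_at_step(step))
```

Under these assumptions, half of the 200 steps are spent warming up, so the peak learning rate of 5e-5 is reached only at step 100.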

**DPOTrainer**:

* beta=0.1
* max_prompt_length=1024
* max_length=1536
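
For intuition about the `beta` parameter, here is a minimal sketch of the standard sigmoid DPO loss for a single preference pair, written in plain Python. It assumes the run used this standard form of the objective. The function `dpo_loss` and the log-probability values are invented for illustration and are not taken from the training run.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) computed in a numerically stable way
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Policy identical to the reference model: the loss equals log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy favors the chosen answer more than the reference does: lower loss.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

A larger `beta` scales the same log-ratio gap more strongly, so deviations from the reference model have a bigger effect on the loss.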

The training run is logged on Weights & Biases: https://wandb.ai/bunka/huggingface/runs/xq59p47g?workspace=user-charlesdedampierre

## Model Family Tree

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63c13d74f02ef5b95e0e448e/MDtFeO_SoigL748c6xTmc.png)