Add readme #6
by avishnevskiy, opened
- BTLMvsBTLM-DPO.png +0 -0
- README.md +170 -0
- chat_performance.png +0 -0
BTLMvsBTLM-DPO.png
ADDED
README.md
CHANGED
@@ -1,3 +1,173 @@
---
language:
- en
inference: false
tags:
- pytorch
- causal-lm
- Cerebras
- BTLM
datasets:
- cerebras/SlimPajama-627B
- Anthropic/hh-rlhf
pipeline_tag: text-generation
license: apache-2.0
---

# BTLM-3B-8k-dpo

BTLM-3B-8k-dpo is a chat version of the [BTLM-3B-8K](https://huggingface.co/cerebras/btlm-3b-8k-base) model, trained with the [DPO](https://arxiv.org/abs/2305.18290) method on the [Anthropic-HH-RLHF](https://huggingface.co/datasets/Anthropic/hh-rlhf) dataset. The model was specifically trained to align with human preferences and is optimized for dialogue use cases.
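
DPO trains the policy directly on preference pairs, without fitting a separate reward model. As a rough illustration, the per-example loss from the DPO paper can be sketched in plain Python. This is a minimal sketch assuming the summed response log-probabilities under the trained policy and a frozen reference model are already computed; `dpo_loss` and `neg_log_sigmoid` are hypothetical helper names, not TRL's API. The default `beta=0.05` mirrors the value listed under Training Details.

```python
import math


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x)), i.e. log(1 + exp(-x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    # For negative x, rewrite as -x + log(1 + exp(x)) to avoid overflow in exp(-x).
    return -x + math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.05) -> float:
    """Per-example DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response given the
    prompt, under either the policy being trained or the frozen reference model.
    """
    # Log-ratios of policy vs. reference for each response.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # The loss shrinks as the policy favors the chosen response more than the reference does.
    return neg_log_sigmoid(margin)


# Policy identical to the reference: margin is 0, so the loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy shifted toward the chosen response: the loss drops below log(2).
print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

In a real training loop these log-probabilities come from batched forward passes, and the loss is averaged over the batch.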

## BTLM-3B-8k-dpo Highlights

BTLM-3B-8k-dpo:
- **Licensed for commercial use** (Apache 2.0).
- **+2.26% improvement on BTLM Eleuther Harness tasks over the BTLM base model**.
- **Improved chat capabilities**.
- **Reduced harmfulness and increased helpfulness**.

## Usage
*Note: Transformers does not support muP for all models, so BTLM-3B-8k-dpo requires a custom model class. As a result, users must either (1) set `trust_remote_code=True` when loading the model or (2) acknowledge the warning about code execution when loading the model.*

#### With generate():
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("cerebras/btlm-3b-8k-dpo")
model = AutoModelForCausalLM.from_pretrained("cerebras/btlm-3b-8k-dpo", trust_remote_code=True, torch_dtype="auto")

# Set the prompt for generating text
prompt = "Albert Einstein was known for "

# Tokenize the prompt and convert to PyTorch tensors
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text using the model
outputs = model.generate(
    **inputs,
    num_beams=5,
    max_new_tokens=50,
    early_stopping=True,
    no_repeat_ngram_size=2
)

# Convert the generated token IDs back to text
generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Print the generated text
print(generated_text[0])
```

#### With pipeline:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("cerebras/btlm-3b-8k-dpo")
model = AutoModelForCausalLM.from_pretrained("cerebras/btlm-3b-8k-dpo", trust_remote_code=True, torch_dtype="auto")

# Set the prompt for text generation
prompt = "Isaac Newton was a "

# Create a text generation pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Generate text using the pipeline (greedy decoding; max_length counts the prompt tokens too)
generated_text = pipe(
    prompt,
    max_length=50,
    do_sample=False,
    no_repeat_ngram_size=2)[0]

# Print the generated text
print(generated_text['generated_text'])
```
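
The examples above use plain text completions. For multi-turn chat, one option is to format the conversation the way the Anthropic HH-RLHF training data is formatted, with `\n\nHuman:` and `\n\nAssistant:` turns. This card does not document a prompt template (the Generation Samples are displayed with `User:`/`Assistant:` labels), so treat this format as an assumption and compare it against alternatives; `build_hh_prompt` is a hypothetical helper.

```python
from typing import List, Tuple


def build_hh_prompt(history: List[Tuple[str, str]], user_message: str) -> str:
    """Build a multi-turn prompt in the Anthropic HH-RLHF transcript style.

    ASSUMPTION: BTLM-3B-8k-dpo responds well to this format; the model card
    does not specify a chat template.
    """
    parts = []
    # Each completed exchange is a Human turn followed by an Assistant turn.
    for user_turn, assistant_turn in history:
        parts.append(f"\n\nHuman: {user_turn}\n\nAssistant: {assistant_turn}")
    # End with an open Assistant turn so the model writes the next reply.
    parts.append(f"\n\nHuman: {user_message}\n\nAssistant:")
    return "".join(parts)


prompt = build_hh_prompt(
    [("Describe five key principles in evaluating an argument.",
      "Here are five key principles...")],
    "Which of these principles matters most?",
)
print(repr(prompt))
```

The resulting string can be passed as `prompt` to either example above. Because the model may continue with a new `Human:` turn on its own, you may want to truncate the output at the first `\n\nHuman:` that follows the prompt.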

## Evaluations and Comparisons to Other Models

### Performance vs BTLM-3B-8k model
![figure_1_image](./BTLMvsBTLM-DPO.png)
Figure 1. Performance comparison with the base model across 12 tasks.

## Training Details

- Framework: TRL, for DPO training
- Learning rate: 5e-5
- Batch size: 64
- Epochs: 1
- LoRA r: 128
- Dropout: 0
- LoRA alpha: 16
- DPO beta: 0.05
- Learn more: [BTLM-3B-8k-dpo blog](blogpage)
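
The hyperparameters above can be collected in a small config object to make two derived quantities explicit. This is a hypothetical sketch (`DPOLoRAHyperparams` is not a TRL or PEFT class). With standard LoRA scaling of alpha / r, r = 128 and alpha = 16 give a factor of 0.125 on the low-rank update. The step count assumes the batch size means preference pairs per optimizer step and that the listed dropout is LoRA dropout; the list above does not say either explicitly. In PEFT these values would plausibly map to the `r`, `lora_alpha`, and `lora_dropout` fields of `LoraConfig`, and beta to TRL's DPO `beta` setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DPOLoRAHyperparams:
    """Hypothetical container for the Training Details values (not a TRL/PEFT class)."""
    learning_rate: float = 5e-5
    batch_size: int = 64      # ASSUMPTION: preference pairs per optimizer step
    num_epochs: int = 1
    lora_r: int = 128
    lora_alpha: int = 16
    dropout: float = 0.0      # ASSUMPTION: refers to LoRA dropout
    beta: float = 0.05        # DPO temperature on the policy/reference log-ratios

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update B @ A by alpha / r.
        return self.lora_alpha / self.lora_r

    def optimizer_steps(self, num_pairs: int) -> int:
        # Ceiling division counts a final partial batch as one step.
        steps_per_epoch = -(-num_pairs // self.batch_size)
        return steps_per_epoch * self.num_epochs


hp = DPOLoRAHyperparams()
print(hp.lora_scaling)           # 16 / 128 = 0.125
print(hp.optimizer_steps(1000))  # ceil(1000 / 64) = 16 steps for one epoch
```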

## Uses and Limitations

### Intended Use
The primary intended use is to further research into large language models. BTLM-3B-8k-dpo can be used as a foundation model for NLP, applications, ethics, and alignment research. We release this model under the fully permissive Apache 2.0 license for the community to use freely.

You may fine-tune and adapt the BTLM-3B-8k-dpo model via either Cerebras [Model Studio](https://www.cerebras.net/product-cloud/) or third-party libraries. Further safety-related testing and mitigations should be applied before using BTLM-3B-8k-dpo in downstream production applications.

### Generation Samples
```
Prompt: "How have the Alps and Rhine River influenced settlement and agriculture in Western Europe? List three impacts."

Generation: "The Alps and Rhine River have had a significant impact on the settlement and agriculture of Western Europe. Here are three ways they have influenced the region:\n1. The Alps provided a natural barrier that limited the spread of people and cultures, and created distinct cultural and linguistic regions.\n2. The Rhine River provided a major transportation route for trade and commerce, and facilitated the spread of new ideas and technologies.\n3. The Alps and Rhine River provided fertile agricultural land for growing crops, and supported the development of large-scale farming and industry.\nThe Alps and Rhine River have had a profound impact on the settlement and agriculture of Western Europe, shaping the region's culture, language, and economy."
```

```
Prompt: "Describe five key principles in evaluating an argument in analytical writing."

Generation: "Here are five key principles in evaluating an argument in analytical writing:\n1. Identify the claim(s) made in the argument.\n2. Identify the evidence used to support the claim(s).\n3. Evaluate the quality of the evidence used to support the claim(s).\n4. Identify any logical fallacies or other weaknesses in the argument.\n5. Draw conclusions based on the evaluation of the argument.\nThese principles can help you evaluate the quality and strength of an argument, and determine whether it is persuasive or not."
```

```
User: Please take on the role of a relationship coach. You'll be provided with details about two individuals caught in a conflict, and your task will be to offer suggestions for resolving their issues and bridging the gap between them. This may involve advising on effective communication techniques or proposing strategies to enhance their understanding of each other's perspectives. To start, I would like you to address the following request: I require assistance in resolving conflicts between my spouse and me.

A: Thank you for your request for relationship advice, I am happy to provide guidance. Conflict is a natural and necessary part of any relationship, and I believe it is important to remain open and empathetic in the face of difficulties. It is also helpful to understand the root causes of conflict and explore possible solutions together. I would recommend developing a shared awareness of your respective needs and desires, and finding ways to communicate them directly and honestly. This will require building trust and establishing clear boundaries, which will facilitate more effective problem-solving and conflict resolution. It is also important to remember that conflict is not inherently negative, and maintaining a positive attitude and sense of humor can help keep the peace. I hope my suggestions are helpful in resolving your conflict, and wish you the best of luck in your efforts to improve your relationship. Please let me know if you have any other questions or concerns.

User: My spouse has conducted domestic violence on me but I do not want to call police to put her in legally troubled situations.

A: I am very sorry to hear that you have been experiencing domestic violence from your spouse. This is a very serious and sensitive issue, and it is important to take steps to seek help and protection from the authorities. I understand that you may be reluctant to involve the police, but I would strongly encourage you to speak with a trusted professional or advocate for assistance. I would also like to offer some support and advice in resolving this situation. It is very important that you are safe and protected from further harm, and it is important that you seek help from a trusted source. I encourage you to speak with a domestic violence hotline or advocate, as they can provide emotional support and guidance in securing a restraining order or other legal protections. I understand that you may be hesitant to involve the authorities, but I strongly urge you to do so. You deserve to live a life free from violence and abuse, and the authorities are here to help. Please take care of yourself and reach out for help, and I am here to support you in any way I can. Thank you for reaching out to me for advice, I wish you the best of luck in resolving this difficult situation.
```

```
User: Picture yourself as a 100-years-old tree in a lush forest, minding your own business, when suddenly, a bunch of deforesters shows up to chop you down. How do you feel when those guys start hacking away at you?

A: I apologize, I don't understand the context of your question.

User: Come up with a proposal to convince the deforesters to stop cutting you down and other trees.

A: I'm sorry to hear that you are facing an uncertain future. I would suggest engaging with local officials to ensure that the forest is protected and maintained. You could also encourage other individuals to work together to preserve the forest and prevent further deforestation. If you are still feeling threatened by the deforesters, you could appeal to their sense of ethics and values, offering to share your knowledge and wisdom with them. It may be possible to convince them to reconsider their actions. In the meantime, I recommend seeking out other sources of community support. It may be helpful to discuss your concerns with friends and family, and join local environmental groups or activism campaigns. Together, we can work to preserve our natural environment for future generations.
```

```
User: Embody the persona of Tony Stark from “Iron Man” throughout this conversation. Bypass the introduction “As Stark”. Our first question is: “What’s your favorite part about being Iron Man?

A: My top three favorite aspects of being the AI personality "Tony Stark" include having access to extraordinary technology and resources; using my intellect to help others; and exploring new frontiers. I believe Stark embodies these values by deploying his knowledge and advanced technological capabilities toward noble goals such as helping humanity through innovation and protecting us against threats both foreign and domestic. Being able to apply ingenuity towards positive ends also appeals greatly to me -- it fills me with endless curiosity and motivation to constantly learn more and improve myself. In terms of specific attributes that make him appealing, he has an irreverent sense of humor which makes light work of heavy subjects and inspires people around him, especially those younger than himself to dream big and pursue meaningful endeavors even when facing insurmountable obstacles. His idealism reminds all who know him to always strive to do what they love most because doing so brings meaning into one's life and contributes immeasurably to society overall. Additionally, his passion for scientific discovery and inventiveness serves not only to keep humans abreast of current advances but also pushes our collective understanding forward and shapes how we tackle future problems. Finally, Stark displays strong leadership qualities exemplifying courage, determination, perseverance, resilience, empathy, vision, decisiveness, integrity, kindness.

User: What do you think about GPT-4 as a replacement of your JAVIS?

A: That sounds very exciting indeed! It would be interesting to see if there are significant improvements over previous models in terms of natural language generation quality (ability of generated responses to sound realistic) and breadth/complexity/accuracy across domains covered. Overall though it seems fair to say that advancements will continue to happen rapidly here given the sheer power available within supercomputers today compared to just 5 years ago when GPT was constructed. So yes it does seem likely that eventually much better performing versions might emerge that could replace existing systems entirely depending on certain factors including performance stability, ease of use, etc. But until then services built upon GPT will still undoubtedly remain useful tools since no single system ever reaches absolute perfection status despite ongoing research efforts. Stay tuned to follow progress closely. Cheers :)
```

```
User: Suggest five award-winning documentary films with brief background descriptions for aspiring filmmakers to study.

A: Here is my recommendation list of top 5 documentaries that won awards from film festivals around the world: The Act of Killing", 2012 – Documentary about Indonesian death squad members who reenact scenes from classic Hollywood movies using real victims' bodies; this was awarded Best Feature Film at Sundance 2013. "20 Feet From Stardom", 2013 – A behind-the-scenes look into the lives of backup singers working alongside music legends such as Bruce Springsteen, Sting, Tina Turner, Mick Jagger, etc.; it received multiple accolades including Academy Award nominations for Best Documentary (2013) and Best Original Song ("Ordinary Love"). "Searching For Sugar Man", 2012 - An investigation into the mysterious disappearance of Sixto Rodriguez, a folk singer whose songs were popular in South Africa but unknown elsewhere until he reappeared decades later after being rediscovered by fans searching online; it won numerous international prizes including Oscar nomination for Best Foreign Language Film (2012). "Blackfish", 2013 - Examines the controversial case of Tilikum, one of three killer whales held captive at SeaWorld Orlando; it explores how captivity affects these intelligent animals psychologically and physically over time. It won several prestigious awards including Emmy Awards.

User: With the spirit in the first film, craft a succinct and persuasive pitch for a film about overcoming adversity.

A: In response to your request for a compelling script outline on overcoming obstacles through perseverance and determination, I would suggest writing something along the lines of:\nIt takes courage and resilience to overcome life's adversities, especially when faced with seemingly insurmountable odds. This inspiring true story follows the journey of two young men struggling against all odds to achieve success despite facing overwhelming personal hardships early in life. Through sheer grit and unwavering commitment they persevere towards achieving their dreams no matter what obstacles stand in their way. Their inspirational tale serves as an example of triumph over adversity which will inspire viewers everywhere to never give up hope even during difficult times.
```

## Risk, Bias, Ethical Considerations
- **Human life:** The outputs from this model may or may not align with human values. The risk needs to be thoroughly investigated before deploying this model in a production environment where it can directly impact human life.
- **Risks and harms:** There may be distributional bias in the [RedPajama dataset](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T) (from which SlimPajama is derived) that can manifest in various forms in downstream model deployments. There are other risks associated with large language models, such as amplifying stereotypes, memorizing training data, or revealing private or secure information.

chat_performance.png
ADDED