Model Card for Llama-3.1-Argunaut-1-8B-SPIN
This model is a fine-tuned version of DebateLabKIT/Llama-3.1-Argunaut-1-8B-SFT. It has been trained using TRL and vLLM. Checkpoints are tagged.
Quick start
from transformers import pipeline
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="DebateLabKIT/Llama-3.1-Argunaut-1-8B-SPIN", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
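The quick start above sends a single user turn, whereas the chat shown under Evaluation runs over several turns. A minimal sketch of carrying the conversation history forward follows; `add_turn` and `chat` are hypothetical helper names (not part of transformers), and `generator` is assumed to be the pipeline object created above.

```python
def add_turn(history, role, content):
    """Return a new message list with one more chat turn appended."""
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"unexpected role: {role!r}")
    return history + [{"role": role, "content": content}]


def chat(generator, history, user_message, max_new_tokens=512):
    """Send the full history plus a new user message; return the updated history.

    `generator` is assumed to behave like the quick-start pipeline: it takes a
    list of role/content dicts and, with return_full_text=False, returns the
    assistant reply as a string under "generated_text".
    """
    messages = add_turn(history, "user", user_message)
    output = generator(
        messages, max_new_tokens=max_new_tokens, return_full_text=False
    )[0]
    return add_turn(messages, "assistant", output["generated_text"])
```

Calling `history = chat(generator, [], "Hi, are you familiar with Argdown?")` and then passing `history` back in with the next request keeps earlier turns in context. The default of 512 new tokens is an arbitrary choice for this sketch.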
Training procedure
This model was trained with Self-Play Fine-Tuning (SPIN), a method introduced in "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" (Chen et al., 2024).
More details about the training procedure can be found in the blog post.
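For orientation, the per-example objective described in the SPIN paper can be sketched as a logistic loss on the gap between two log-probability ratios: one for the reference (human-written) response and one for a response generated by the previous model iteration. The sketch below is illustrative only. The function name, argument names, and the default `lam=0.1` are assumptions, and this card does not state the hyperparameters or the exact implementation used for this model.

```python
import math


def spin_loss(logp_real, logp_real_ref, logp_gen, logp_gen_ref, lam=0.1):
    """Logistic SPIN loss for one prompt (illustrative sketch).

    logp_real / logp_gen:         log-probabilities of the reference and the
                                  self-generated response under the model
                                  being trained.
    logp_real_ref / logp_gen_ref: the same quantities under the frozen model
                                  from the previous iteration.
    lam:                          scaling factor (hypothetical default).
    """
    margin = lam * ((logp_real - logp_real_ref) - (logp_gen - logp_gen_ref))
    # log(1 + exp(-margin)), written in a numerically stable form
    return math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)
```

The loss falls as the trained model assigns relatively more probability to the reference response than to its own earlier generation, which is the sense in which the model plays against its previous iteration.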
Framework versions
- TRL: 0.14.0
- Transformers: 4.46.3
- PyTorch: 2.4.1
- Datasets: 3.1.0
- Tokenizers: 0.20.3
Evaluation
Chat Experience
👤 You
Hi, are you familiar with Argdown?
🤖 Assistant
Yes, I'm familiar with Argdown (Argument Markup Language). It's a syntax for representing and structuring arguments using a markdown-like approach. With Argdown, you can clearly define claims, arguments, premises, conclusions, and other components of an argumentative text in a way that is both human-readable and machine-processable. If you have some content you'd like to be organized or structured using Argdown, feel free to provide it!
👤 You
Can you please map the following text?
School uniforms deter crime and increase student safety.
In Long Beach, California, after two years of a district-wide K-8 mandatory uniform policy, reports of assault and battery in the district’s schools decreased by 34%, assault with a deadly weapon dropped by 50%, fighting incidents went down by 51%, sex offenses were cut by 74%, robbery dropped by 65%, possession of weapons (or weapon “look-alikes”) decreased by 52%, possession of drugs went down by 69%, and vandalism was lowered by 18%. [64]
One year after Sparks Middle School in Nevada instituted a uniform policy, school police data showed a 63% drop in police log reports, and decreases were also noted in gang activity, student fights, graffiti, property damage, and battery. A peer-reviewed study found that schools with uniform policies had 12% fewer firearm-related incidents and 15% fewer drug-related incidents than schools without uniforms. [25][69]
School uniforms also prevent students from concealing weapons under baggy clothing, make it easier to keep track of students on field trips, and make intruders on campus more visible. Frank Quatrone, superintendent in the Lodi school district of New Jersey, states, “When you have students dressed alike, you make them safer. If someone were to come into a building, the intruder could easily be recognized.” [6][38]
Further, school uniforms create a level playing field among students, reducing peer pressure and bullying. When all students are dressed alike, competition between students over clothing choices and the teasing of those who are dressed in less expensive or less fashionable outfits can be eliminated. Research by the Schoolwear Association found that 83% of teachers thought “a good school uniform…could prevent bullying based on appearance or economic background.” Arminta Jacobson, founder and director of the Center for Parent Education at the University of North Texas, states that uniforms put “all kids on the same playing field in terms of their appearance. I think it probably gives them a sense of belonging and a feeling of being socially accepted.” [5][91]
🤖 Assistant
Let's create an Argdown argument map to structure this text:
School uniforms deter crime and increase student safety.
<+ One year after Sparks Middle School in Nevada instituted a uniform policy, school police data showed a 63% drop in police log reports, and decreases were also noted in gang activity, student fights, graffiti, property damage, and battery. A peer-reviewed study found that schools with uniform policies had 12% fewer firearm-related incidents and 15% fewer drug-related incidents than schools without uniforms.
<+ School uniforms create a level playing field among students, reducing peer pressure and bullying. When all students are dressed alike, competition between students over clothing choices and the teasing of those who are dressed in less expensive or less fashionable outfits can be eliminated.
<+ Research by the Schoolwear Association found that 83% of teachers thought “a good school uniform…could prevent bullying based on appearance or economic background.”
<+ In Long Beach, California, after two years of a district-wide K-8 mandatory uniform policy, reports of assault and battery in the district’s schools decreased by 34%, assault with a deadly weapon dropped by 50%, fighting incidents went down by 51%, sex offenses were cut by 74%, robbery dropped by 65%, possession of weapons (or weapon “look-alikes”) decreased by 52%, possession of drugs went down by 69%, and vandalism was lowered by 18%.
<+ School uniforms prevent students from concealing weapons under baggy clothing, make it easier to keep track of students on field trips, and make intruders on campus more visible.
<+ Frank Quatrone, superintendent in the Lodi school district of New Jersey, states, “When you have students dressed alike, you make them safer. If someone were to come into a building, the intruder could easily be recognized.”
<+ Arminta Jacobson, founder and director of the Center for Parent Education at the University of North Texas, states that uniforms put “all kids on the same playing field in terms of their appearance. I think it probably gives them a sense of belonging and a feeling of being socially accepted.”
👤 You
Please summarize, in your own words, the main argument presented in the text. No more than 2-3 sentences, please.
🤖 Assistant
The main argument is that school uniforms lead to a decrease in various types of incidents and crimes at schools, making students safer due to increased visibility and reduced peer pressure. This conclusion is supported by data from Sparks Middle School and Long Beach, California, as well as expert opinions.
👤 You
Let's focus on this argument. Just for the protocol, what's the conclusion?
🤖 Assistant
The text ultimately argues that: School uniforms deter crime and increase student safety.
👤 You
Can you reconstruct this argument as premise conclusion structure, using argdown syntax?
🤖 Assistant
(1) By instituting a uniform policy at Sparks Middle School, the school's police data showed a 63% drop in police log reports, along with decreases in gang activity, student fights, graffiti, property damage, and battery.
(2) A peer-reviewed study found that schools with uniform policies had 12% fewer firearm-related incidents and 15% fewer drug-related incidents than schools without uniforms.
(3) Implementing a mandatory uniform policy at Long Beach's K-8 schools led to significant decreases in various types of incidents and crimes, such as assault and battery, possession of weapons or drugs, robbery, and vandalism.
-- with ?? from (1) (2) --
(4) Schools with uniform policies tend to have fewer incidents related to weapons and drugs compared to those without uniforms.
(5) School uniforms prevent students from concealing weapons under baggy clothing.
(6) Implementing a uniform policy makes it easier to keep track of students on field trips.
(7) A uniform policy makes intruders on campus more visible.
-- with ?? from (5) (6) (7) --
(8) School uniforms are an effective measure for maintaining student safety in various scenarios, such as preventing the concealment of weapons and ensuring easier identification of students.
(9) When all students are dressed alike, competition between students over clothing choices can be eliminated.
-- with ?? from (9) --
(10) The elimination of competition due to uniform policies reduces peer pressure and bullying among students.
-- with ?? from (4) (8) (10) --
(11) School uniforms deter crime and increase student safety.
👤 You
That's way too complicated. Just start with 2 or 3 premises.
🤖 Assistant
Let me simplify the argument:
Schools that implement uniform policies tend to have fewer incidents related to weapons and drugs compared to those without uniforms.
Implementing a mandatory uniform policy at Long Beach's K-8 schools led to significant decreases in various types of incidents and crimes, such as assault and battery, possession of weapons or drugs, robbery, and vandalism.
When all students are dressed alike, competition between students over clothing choices can be eliminated.
These premises lead to the conclusion that school uniforms deter crime and increase student safety.
👤 You
Render this in argdown, please.
🤖 Assistant
(1) Schools that implement uniform policies tend to have fewer incidents related to weapons and drugs compared to those without uniforms.
(2) Implementing a mandatory uniform policy at Long Beach's K-8 schools led to significant decreases in various types of incidents and crimes, such as assault and battery, possession of weapons or drugs, robbery, and vandalism.
(3) When all students are dressed alike, competition between students over clothing choices can be eliminated.
-----
(4) School uniforms deter crime and increase student safety.
👤 You
Do these three premises work together? Or do we have three distinct arguments?
🤖 Assistant
The given premises don't directly connect to form a single argument. They represent separate premises that could support different conclusions or be parts of various arguments.
We can see that:
Premise (1) could be part of an argument about the effectiveness of uniform policies in reducing incidents related to weapons and drugs.
Premise (2) supports the idea that implementing uniform policies leads to a decrease in various types of incidents and crimes.
Premise (3) suggests that eliminating competition over clothing choices can reduce peer pressure and bullying.
To combine these into a single argument, we'd need additional premises or intermediary conclusions to link them together. For now, they are distinct premises that could be used separately in different arguments.
👤 You
Let's focus on premise (1) and drop (2) and (3) from the argument. Please render the resulting 1-premise-argument.
🤖 Assistant
[Conclusion 1]: Schools with uniform policies tend to have fewer incidents related to weapons and drugs compared to those without uniforms.
<+ <Subargument 1>
<+ [Premise]: Schools that implement uniform policies tend to have fewer incidents related to weapons and drugs compared to those without uniforms.
👤 You
Fine, and how does this look in standard form (premise-conclusion structure)?
🤖 Assistant
(1) Schools that implement uniform policies tend to have fewer incidents related to weapons and drugs compared to those without uniforms.
-- with ?? from (1) --
(2) Schools with uniform policies tend to have fewer incidents related to weapons and drugs compared to those without uniforms.
This simplified argument focuses on the premise regarding schools with uniform policies having fewer incidents related to weapons and drugs, leading directly to the conclusion.
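The premise-conclusion listings in this chat follow a simple line pattern: numbered statements such as `(1) ...`, separated by inference lines such as `-- with ?? from (1) --` or a bare `-----`. A minimal sketch for reading such a listing into Python structures is shown below. It is a hypothetical helper written for the lines shown in this transcript, not the official Argdown parser, and it does not cover the full Argdown syntax.

```python
import re

STATEMENT = re.compile(r"^\((\d+)\)\s+(.*)$")   # "(3) Some statement."
INFERENCE = re.compile(r"^--\s*(.*?)\s*--$")    # "-- with ?? from (1) (2) --"
REFERENCE = re.compile(r"\((\d+)\)")            # "(1)", "(2)", ...


def parse_standard_form(text):
    """Return (statements, inferred_from) for a premise-conclusion listing.

    statements:    {label: statement text}
    inferred_from: {label: labels cited in the inference line directly above};
                   an empty list means the line named no premises.
    """
    statements, inferred_from = {}, {}
    pending = None  # references from the most recent inference line, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = STATEMENT.match(line)
        if match:
            label = int(match.group(1))
            statements[label] = match.group(2)
            if pending is not None:
                inferred_from[label] = pending
                pending = None
            continue
        if len(line) >= 3 and set(line) == {"-"}:
            pending = []  # bare ruler such as "-----"
            continue
        match = INFERENCE.match(line)
        if match:
            pending = [int(n) for n in REFERENCE.findall(match.group(1))]
    return statements, inferred_from
```

Applied to the two-line argument above, this yields two statements and records that (2) is inferred from (1), which makes it easy to check mechanically whether every inference cites premises that actually exist.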
Metrics
Benchmark | Llama-3.1-8B-Instruct | Argunaut-1-8B-SFT | Argunaut-1-8B-SPIN |
---|---|---|---|
⬇️ Argdown Bench | | | |
pass@1 | 80.8 | 98.9 | 79.5 |
pass@5 | 98.9 | 99.8 | 94.5 |
graph sim | 8.4 | 65.5 | 34.3 |
isomorphic | 1.3 | 44.0 | 17.1 |
🤗 HF Leaderboard | | | |
MMLU pro | 37.6 | 34.5 | 35.9 |
MUSR | 40.1 | 41.6 | 41.6 |
GPQA Diamond | 32.8 | 30.8 | 31.8 |
GPQA Main | 28.5 | 32.1 | 30.8 |
MATH | 12.5 | 9.1 | 11.1 |
BBH | 54.7 | 48.2 | 49.8 |
⛓️ COT Leaderboard | | | |
LogiQA | 5.9 | 1.4 | 5.8 |
LogiQA2 | 15.5 | 0.8 | 9.2 |
LSAT-ar | 11.7 | 3.0 | 6.5 |
LSAT-lr | 20.8 | 3.9 | 12.9 |
LSAT-rc | 27.5 | 13.8 | 21.2 |
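This card does not say how the pass@1 and pass@5 figures were computed. Assuming they follow the conventional meaning of pass@k (the probability that at least one of k sampled outputs passes the check), the commonly used unbiased estimator from n samples with c passing can be sketched as follows. The function name is illustrative, and the Argdown Bench may compute its scores differently.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k from n samples of which c passed (conventional estimator).

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that a
    random subset of k samples contains no passing sample.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if n - c < k:
        return 1.0  # every subset of size k contains at least one pass
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, `pass_at_k(10, 3, 1)` reduces to the plain pass rate of 3 out of 10, while larger k rewards a model that succeeds at least occasionally.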
Citations
Cite SPIN as:
@misc{chen2024selfplayfinetuningconvertsweak,
title={Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models},
author={Zixiang Chen and Yihe Deng and Huizhuo Yuan and Kaixuan Ji and Quanquan Gu},
year={2024},
eprint={2401.01335},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2401.01335},
}
Cite TRL as:
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}
Model tree for DebateLabKIT/Llama-3.1-Argunaut-1-8B-SPIN
Base model
meta-llama/Llama-3.1-8B