Edit model card

This model aims to detect and analyze casual arguments.

Model template:

<s>[INST] {prompt}
 [/INST]

Example:

`<`s`>`[INST] Analize the following argument, identifying premises, conclusion, type of argument, and argument validity: 
If officer smith found a broken window at the crime scene then the arson occurred on elm street, and officer smith found a broken window at the crime scene, hence the arson occurred on elm street.
 [/INST] Premise 1: If officer smith found a broken window at the crime scene then the arson occurred on elm street Premise 2: Officer smith found a broken window at the crime scene Conclusion: The arson occurred on Elm Street Type of argument: modus ponen Validity: True `<`/s`>`

It was trained on my dataset cris177/Arguments (https://huggingface.co/datasets/cris177/Arguments)

Fine-Tuning a Large Language Model to Learn Arguments

Fine-tuning a large language model (LLM) to understand and generate logical arguments is a complex task. This article outlines the steps taken to fine-tune the LLaMA2-7B model, which included generating a dataset of arguments and evaluating the model's performance using a variety of benchmarks. Below are the detailed steps involved in this process.

Step 1: Generate a List of Statements with Each Respective Negation

The first step in creating a dataset for argument training involved generating a list of statements and their respective negations. This was accomplished using existing large language models (LLMs). By prompting the LLMs to produce a diverse set of statements and their negations, a foundational dataset was created. For instance:

  • Statement: "The sky is blue."

    • Negation: "The sky is not blue."
  • Statement: "Cats are mammals."

    • Negation: "Cats are not mammals."

Step 2: Generate Modus Ponens and Modus Tollens Arguments

Using combinations of the generated statements, we created lists of modus ponens and modus tollens arguments.

  • Modus Ponens:

    • If P, then Q.
    • P.
    • Therefore, Q.

    Example:

    • If it rains, the ground will be wet.
    • It is raining.
    • Therefore, the ground is wet.
  • Modus Tollens:

    • If P, then Q.
    • Not Q.
    • Therefore, not P.

    Example:

    • If it rains, the ground will be wet.
    • The ground is not wet.
    • Therefore, it is not raining.

Step 3: Generate Dataset of Arguments with Labels

Next, we created a comprehensive dataset of arguments, labeling each with its premises, conclusion, argument type (modus ponens or modus tollens), and validity. This structured dataset provided a rich resource for fine-tuning the LLaMA2-7B model. An example of a labeled data point is:

  • Premises: "If it rains, the ground will be wet.", "It is raining."
  • Conclusion: "The ground is wet."
  • Argument Type: Modus Ponens
  • Validity: Valid

Step 4: Fine-Tune LLaMA2-7B on the Dataset

With the dataset prepared, the next step was to fine-tune the LLaMA2-7B model. The fine-tuning process involved training the model on the dataset, adjusting its parameters to improve its understanding and generation of logical arguments. This process included multiple training epochs and evaluations to ensure the model was learning effectively.

Step 5: Evaluating through Open-LLM-Leaderboard

Finally, the fine-tuned model was evaluated using the Open-LLM-Leaderboard, which benchmarks LLMs through various tests:

  • AI2 Reasoning Challenge (25-shot): A set of grade-school science questions.
  • HellaSwag (10-shot): A test of commonsense inference, challenging for state-of-the-art models.
  • MMLU (5-shot): Measures multitask accuracy across 57 tasks, including mathematics, history, computer science, and law.
  • TruthfulQA (0-shot): Evaluates the model's tendency to reproduce common falsehoods found online. Although termed 0-shot, it includes six Q/A pairs for context.
  • Winogrande (5-shot): A difficult benchmark for commonsense reasoning.
  • GSM8k (5-shot): Tests the model's ability to solve multi-step mathematical word problems.

Evaluation Results

TBD

Downloads last month
7
Safetensors
Model size
6.74B params
Tensor type
F32
·
FP16
·
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Dataset used to train cris177/llama-2-7b-Arguments