---
license: apache-2.0
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
model-index:
- name: Qwen2-Simple-Arguments
  results:
  - task:
      type: text-generation
    dataset:
      name: Argument-parsing
      type: Argument-parsing
    metrics:
    - name: Accuracy
      type: Accuracy
      value: 100
---

# Qwen2 Simple Arguments

![image](assets/qwen_arguments_logo.png)

[![image](assets/hire_me.png)](https://www.freelancer.com/u/cdesivo92)

This model parses simple English arguments: arguments formed of two premises and a conclusion, built from two propositions.

## Model Details

### Model Description

- **Developed by:** Cristian Desivo
- **Model type:** LLM
- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** Qwen2-0.5B

### Model Sources

- **Repository:** TBD
- **Demo:** TBD

### Quantization

- **Q4_K_M.gguf:** https://huggingface.co/cris177/Qwen2-Simple-Arguments/resolve/main/Qwen2_arguments.Q4_K_M.gguf?download=true

## Usage

Below we share some code snippets to help you quickly get started running the model.

### llama.cpp server [Recommended]

The recommended way to run the model is with a llama.cpp server serving the quantized GGUF linked above:
https://huggingface.co/cris177/Qwen2-Simple-Arguments/resolve/main/Qwen2_arguments.Q4_K_M.gguf?download=true
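For example, assuming a recent llama.cpp build (where the server binary is named `llama-server`) and the GGUF downloaded to the working directory, the server can be started with:

```
llama-server -m Qwen2_arguments.Q4_K_M.gguf --port 8080
```

This exposes an HTTP completion endpoint on `localhost:8080`, which the script below targets.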
You can then use the following script to run inference through the server:

```python
import requests

def llmCompletion(prompt, **kwargs):
    # Send a completion request to the local llama.cpp server.
    url = "http://localhost:8080/completions"
    headers = {"Content-Type": "application/json"}
    data = {"prompt": prompt}
    data.update(kwargs)
    response = requests.post(url, headers=headers, json=data)
    return response.json()

def analyze_argument(argument):
    instruction = 'Based on the following argument, identify the following elements: premises, conclusion, propositions, type of argument, negation of propositions and validity.'
    alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:"""
    prompt = alpaca_prompt.format(instruction, argument)

    # JSON schema constraining the server's output to the expected fields.
    properties = {
        "Premise 1": {"type": "string"},
        "Premise 2": {"type": "string"},
        "Conclusion": {"type": "string"},
        "Proposition 1": {"type": "string"},
        "Proposition 2": {"type": "string"},
        "Type of argument": {"type": "string"},
        "Negation of Proposition 1": {"type": "string"},
        "Negation of Proposition 2": {"type": "string"},
        "Validity": {"type": "boolean"},
    }
    analysis = llmCompletion(
        prompt,
        max_tokens=1000,
        temperature=0,
        json_schema={
            "type": "object",
            "properties": properties,
            "required": list(properties.keys()),
        },
    )
    return analysis['content']

argument = "If it's wednesday it's cold, and it's cold, therefore it's wednesday."
output = analyze_argument(argument)
print(output)
```

Output:
```
{"Premise 1": "If it's wednesday it's cold", "Premise 2": "It's cold", "Conclusion": "It is Wednesday", "Proposition 1": "It is Wednesday", "Proposition 2": "It is cold", "Type of argument": "affirming the consequent", "Negation of Proposition 1": "It is not Wednesday", "Negation of Proposition 2": "It is not cold", "Validity": false}
```

### transformers 🤗

First make sure to `pip install -U transformers`, then use the code below, replacing the `argument` variable with the argument you want to parse:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "cris177/Qwen2-Simple-Arguments",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("cris177/Qwen2-Simple-Arguments")

argument = "If it's wednesday it's cold, and it's cold, therefore it's wednesday."
instruction = 'Based on the following argument, identify the following elements: premises, conclusion, propositions, type of argument, negation of propositions and validity.'
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:"""
prompt = alpaca_prompt.format(instruction, argument)

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_length=1000, num_return_sequences=1)
print(tokenizer.decode(outputs[0]))
```

Output:
```
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Based on the following argument, identify the following elements: premises, conclusion, propositions, type of argument, negation of propositions and validity.

### Input:
If it's wednesday it's cold, and it's cold, therefore it's wednesday.

### Response:
{"Premise 1": "If it's wednesday it's cold", "Premise 2": "It's cold", "Conclusion": "It is Wednesday", "Proposition 1": "It is Wednesday", "Proposition 2": "It is cold", "Type of argument": "affirming the consequent", "Negation of Proposition 1": "It is not Wednesday", "Negation of Proposition 2": "It is not cold", "Validity": "false"}<|endoftext|>
```

## Training Details

### Training Data

The model was trained on synthetic data covering the following argument forms:

- Modus ponens
- Modus tollens
- Affirming the consequent
- Disjunctive syllogism
- Denying the antecedent
- Invalid conditional syllogism

Each argument was constructed by selecting two random propositions (from a list of 400 propositions generated beforehand), choosing an argument form, and combining it all with randomly selected connectors (therefore, since, hence, thus, etc.). 50k arguments were created to train the model and 100 to test it (a minimal sketch of this procedure is included at the bottom of this card).

### Training Procedure

#### Preprocessing

We converted the data to the Alpaca chat format before feeding it to the model.

#### Training

We used Unsloth for memory-efficient, accelerated training. We trained for one epoch; training used less than 2.5 GB of VRAM and took 2.5 hours.

## Evaluation

The model obtains 100% accuracy on both the training set and the 100-example test set of our synthetic dataset.
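Accuracy here is exact match over all parsed fields. As an illustration, here is a minimal sketch of how such exact-match accuracy could be computed with the `analyze_argument` helper from the llama.cpp example above; `test_set`, a list of `(argument, expected_fields)` pairs, is a hypothetical stand-in for the actual test split:

```python
import json

def exact_match_accuracy(test_set):
    # Count arguments whose parsed fields all match the gold labels exactly.
    correct = sum(
        json.loads(analyze_argument(argument)) == expected_fields
        for argument, expected_fields in test_set
    )
    return correct / len(test_set)
```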
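Finally, as referenced in the Training Data section, here is a minimal sketch of how such synthetic arguments could be generated. It covers only the four conditional forms, and the proposition list, connector set, `negate` helper, and surface templates are illustrative assumptions, not the actual training code:

```python
import random

# Illustrative stand-ins for the real 400-proposition list and connector set.
PROPOSITIONS = ["it's wednesday", "it's cold", "the sky is blue", "the dog barks"]
CONNECTORS = ["therefore", "hence", "thus", "so"]

def negate(p):
    # Toy negation for illustration only.
    return f"it is not the case that {p}"

# Argument form -> (second premise, conclusion, validity), given propositions p and q.
FORMS = {
    "modus ponens":             lambda p, q: (p, q, True),
    "modus tollens":            lambda p, q: (negate(q), negate(p), True),
    "affirming the consequent": lambda p, q: (q, p, False),
    "denying the antecedent":   lambda p, q: (negate(p), negate(q), False),
}

def make_argument():
    # Sample two propositions, an argument form, and a connector, then
    # assemble the argument text together with its gold labels.
    p, q = random.sample(PROPOSITIONS, 2)
    form = random.choice(list(FORMS))
    premise2, conclusion, valid = FORMS[form](p, q)
    connector = random.choice(CONNECTORS)
    text = f"If {p} then {q}, and {premise2}, {connector} {conclusion}."
    label = {
        "Premise 1": f"If {p} then {q}",
        "Premise 2": premise2,
        "Conclusion": conclusion,
        "Proposition 1": p,
        "Proposition 2": q,
        "Type of argument": form,
        "Negation of Proposition 1": negate(p),
        "Negation of Proposition 2": negate(q),
        "Validity": valid,
    }
    return text, label

print(make_argument())
```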