license: apache-2.0
Ask2Democracy project
What is the baizemocracy-lora-7B-cfqa model?
This model is an open-source chat model fine-tuned with LoRA, inspired by the Baize project. It was trained on the Baize datasets and the ask2democracy-cfqa-salud-pension dataset, which contains almost 4k instructions for answering questions based on a context relevant to citizen concerns and public debate in Spanish. Two main model experiments were carried out during the Hackathon Somos NLP 2023: a conversational-style model and a context-focused model. This model is the conversational one, focused on a more conversational way of asking questions; see the "About pre-processing" section. The other variation, Baizemocracy-contextfocused, is more focused on retrieval augmentation based on context.
Testing is a work in progress. We decided to share both model variations with the community so that more people can experiment with what works better and find other possible use cases.
- Developed by:
- 🇨🇴 Jorge Henao
- 🇨🇴 David Torres
Training Parameters
- Base Model: LLaMA-7B
- Training Epoch: 1
- Batch Size: 16
- Maximum Input Length: 512
- Learning Rate: 2e-4
- LoRA Rank: 8
- Updated Modules: All Linears
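To give a sense of what "LoRA Rank: 8" on "All Linears" means in size, here is a minimal arithmetic sketch. It assumes the publicly documented LLaMA-7B dimensions (hidden size 4096, MLP size 11008, 32 decoder layers) and assumes that "All Linears" covers the seven projection matrices of each decoder block but not the output head. Neither assumption is stated in this card, and the projection names are the ones used by the Hugging Face LLaMA implementation.

```python
# Assumed LLaMA-7B dimensions (public architecture values, not taken from this card).
HIDDEN = 4096
INTERMEDIATE = 11008
NUM_LAYERS = 32
LORA_RANK = 8  # "LoRA Rank: 8" from the parameter list


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds two matrices per linear layer: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Assumption: "All Linears" = the 7 projections in each decoder block.
LINEAR_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

per_layer = sum(lora_params(d_in, d_out, LORA_RANK) for d_in, d_out in LINEAR_SHAPES.values())
total_lora_params = per_layer * NUM_LAYERS

print(f"Trainable LoRA parameters per layer: {per_layer:,}")
print(f"Trainable LoRA parameters in total:  {total_lora_params:,}")
```

Under these assumptions the adapter holds roughly 20 million trainable parameters, a small fraction of the 7B base model.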
Training Dataset
Ask2Democracy-cfqa-salud-pension (3,806)
Stanford Alpaca (51,942)
Quora Dialogs (54,456)
StackOverflow Dialogs (57,046)
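The dataset sizes above, combined with the batch size and epoch count from the training parameters, give a rough idea of the training run length. The sketch below is only arithmetic on the numbers listed in this card. It assumes that every example is used once, that none are filtered out by the 512-token limit, and that 16 is the effective batch size (no gradient accumulation); none of these points is confirmed here.

```python
import math

# Example counts as listed in the "Training Dataset" section.
DATASET_SIZES = {
    "Ask2Democracy-cfqa-salud-pension": 3_806,
    "Stanford Alpaca": 51_942,
    "Quora Dialogs": 54_456,
    "StackOverflow Dialogs": 57_046,
}
BATCH_SIZE = 16  # "Batch Size: 16"
EPOCHS = 1       # "Training Epoch: 1"

total_examples = sum(DATASET_SIZES.values())
# Assumption: the last partial batch still counts as one optimization step.
steps_per_epoch = math.ceil(total_examples / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
cfqa_share = DATASET_SIZES["Ask2Democracy-cfqa-salud-pension"] / total_examples

print(f"Total training examples: {total_examples:,}")
print(f"Optimization steps:      {total_steps:,}")
print(f"Ask2Democracy share:     {cfqa_share:.1%}")
```

The Spanish Ask2Democracy data is a small part of the mix, which is worth keeping in mind when comparing the two model variations.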
About pre-processing
The Ask2Democracy-cfqa-salud-pension dataset was pre-processed into a conversational style like this:
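def format_instruction_without_context(example):
    # Keep the original question as the topic label.
    example["topic"] = example["input"]
    # Build a Baize-style dialogue; `text` avoids shadowing the built-in `input`.
    text = "La conversación entre un humano y un asistente de IA."
    text += "\n[|Human|] " + example["input"]
    text += "\n[|AI|] " + example["output"]
    if len(example["topics"]) > 0:
        # Add a second turn that asks the assistant to classify its answer by topic.
        topics = ", ".join(example["topics"])
        text += "\n[|Human|] " + "¿En cuáles tópicos clasificarías su respuesta?"
        text += "\n[|AI|] " + f"Aquí una lista de tópicos: {topics}."
        example["topic"] += f" ({topics})"
    # Replace the raw question with the full formatted conversation.
    example["input"] = text
    return example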
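To make the conversational format concrete, the following sketch applies the same string layout to a made-up record. The helper name `build_conversation` and the sample question, answer, and topics are hypothetical and serve only as illustration; they are not taken from the dataset.

```python
HEADER = "La conversación entre un humano y un asistente de IA."


def build_conversation(question: str, answer: str, topics: list) -> str:
    """Hypothetical helper mirroring the conversational layout described above."""
    text = HEADER
    text += "\n[|Human|] " + question
    text += "\n[|AI|] " + answer
    if topics:
        # Optional second turn: ask for the topics of the answer.
        text += "\n[|Human|] ¿En cuáles tópicos clasificarías su respuesta?"
        text += "\n[|AI|] " + f"Aquí una lista de tópicos: {', '.join(topics)}."
    return text


# Made-up record, for illustration only.
sample = {
    "input": "¿Qué cambia con la reforma a la salud?",
    "output": "La reforma propone fortalecer la atención primaria.",
    "topics": ["salud", "reforma"],
}

conversation = build_conversation(sample["input"], sample["output"], sample["topics"])
print(conversation)
```

A record with an empty `topics` list produces only the first human/AI exchange, without the classification turn.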
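def format_instruction_with_context(example):
    # Keep the original question as the topic label.
    example["topic"] = example["input"]
    # Strip the English instruction prefix so that only the context remains.
    context = example["instruction"].replace("Given the context please answer the question. Context:", "")
    # Collapse whitespace, then drop the first character and the last three.
    context = " ".join(context.strip().split())[1:-3]
    # Build a Baize-style dialogue; `text` avoids shadowing the built-in `input`.
    text = "La conversación entre un humano y un asistente de IA."
    text += "\n[|Human|] " + example["input"] + f"\nPara responder la pregunta, usa el siguiente contexto:\n{context}"
    text += "\n[|AI|] " + example["output"]
    if len(example["topics"]) > 0:
        # Add a second turn that asks the assistant to classify its answer by topic.
        topics = ", ".join(example["topics"])
        text += "\n[|Human|] " + "¿En cuáles tópicos clasificarías su respuesta?"
        text += "\n[|AI|] " + f"Aquí una lista de tópicos: {topics}."
        example["topic"] += f" ({topics})"
    # Replace the raw question with the full formatted conversation.
    example["input"] = text
    return example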
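The context-focused variant first cleans the context and then embeds it in the human turn. The sketch below isolates those two steps on a made-up string. The helper names `clean_context` and `build_context_turn` are hypothetical, and the sample wraps the context in one leading and three trailing characters purely as an assumption, so that the `[1:-3]` slice has something to remove; the real dataset layout is not shown in this card.

```python
PREFIX = "Given the context please answer the question. Context:"


def clean_context(instruction: str) -> str:
    """Hypothetical helper: same cleaning steps as in the function above."""
    context = instruction.replace(PREFIX, "")
    # Collapse runs of spaces and newlines into single spaces.
    context = " ".join(context.strip().split())
    # Drop the first character and the last three, whatever they are.
    return context[1:-3]


def build_context_turn(question: str, context: str) -> str:
    """Hypothetical helper: the human turn with the context appended."""
    return (
        "[|Human|] " + question
        + f"\nPara responder la pregunta, usa el siguiente contexto:\n{context}"
    )


# Made-up instruction; the "[" ... "].." wrapper is an assumption for this demo.
sample_instruction = PREFIX + " [La reforma   propone\n un sistema de pilares.]..\n"

context = clean_context(sample_instruction)
turn = build_context_turn("¿Qué propone la reforma pensional?", context)
print(turn)
```

Because the slice is positional, it removes three trailing characters even when they are part of the context itself, so it is worth checking a few cleaned records by eye.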
More details can be found in the Ask2Democracy GitHub repository.