---
license: apache-2.0
---
## About the Ask2Democracy project

This model was trained during the 2023 Somos NLP Hackathon and is part of the Ask2Democracy project. Due to the model's size, inference is slow on the hardware available to us (approximately 70 seconds per query on a GPU), so we are still working on an optimized way to integrate the model into the [AskDemocracy space demo](https://huggingface.co/spaces/jorge-henao/ask2democracycol).
## What is the baizemocracy-lora-7B-cfqa-conv model?

This is an open-source chat model fine-tuned with LoRA and inspired by the Baize project. It was trained on the Baize datasets together with the ask2democracy-cfqa-salud-pension dataset, which contains almost 4k instructions for answering questions grounded in a context relevant to citizen concerns and public debate in Spanish.
Two model variations were trained during the 2023 Somos NLP Hackathon:
- A conversational style focused model
- A generative context focused model
This variation focuses on a more conversational way of asking questions (see the About pre-processing section below). The other variation, Baizemocracy-RAGfocused, focuses on source-based retrieval-augmented generation. Testing is a work in progress; we decided to share both model variations with the community to involve more people in experimenting with which one works better and in finding other possible use cases.
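A minimal inference sketch is shown below. It assumes a standard transformers + peft setup; the hub repository ids (both the base checkpoint and the adapter path) are assumptions, since this card only states that the base model is LLaMA-7B. Adjust them to the actual paths before use.

```python
# Hypothetical usage sketch: the repo ids below are assumptions, not confirmed by this card.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_id = "huggyllama/llama-7b"                            # assumed LLaMA-7B base checkpoint
adapter_id = "jorge-henao/baizemocracy-lora-7B-cfqa-conv"  # assumed adapter repo id

tokenizer = LlamaTokenizer.from_pretrained(base_id)
model = LlamaForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

# Prompts follow the Baize-style conversational format used at training time
# (see the pre-processing code below).
prompt = (
    "La conversación entre un humano y un asistente de IA.\n"
    "[|Human|] ¿Qué propone la reforma pensional sobre el pilar solidario?\n"
    "[|AI|] "
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```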
Developed by:
- 🇨🇴 Jorge Henao
- 🇨🇴 David Torres
## Training Parameters
- Base Model: LLaMA-7B
- Training Epoch: 1
- Batch Size: 16
- Maximum Input Length: 512
- Learning Rate: 2e-4
- LoRA Rank: 8
- Updated Modules: All Linears
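For reference, the parameters above correspond roughly to the following peft/transformers configuration. This is a reconstructed sketch, not the actual training script; in particular, the expansion of "All Linears" into module names, `lora_alpha`, `lora_dropout`, and the trainer settings are assumptions.

```python
# Reconstructed training-configuration sketch; values mirror the list above,
# while module names and unstated hyperparameters are assumptions.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=8,                                                   # LoRA Rank: 8
    lora_alpha=16,                                         # assumed; not stated on this card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # "All Linears" in LLaMA blocks
    lora_dropout=0.05,                                     # assumed
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="baizemocracy-lora-7B-cfqa-conv",
    num_train_epochs=1,                                    # Training Epoch: 1
    per_device_train_batch_size=16,                        # Batch Size: 16
    learning_rate=2e-4,                                    # Learning Rate: 2e-4
)
# The Maximum Input Length (512) is applied at tokenization time, e.g.
# tokenizer(text, truncation=True, max_length=512).
```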
## Training Dataset
- Ask2Democracy-cfqa-salud-pension (3,806)
- Stanford Alpaca (51,942)
- Quora Dialogs (54,456)
- StackOverflow Dialogs (57,046)
- Alpaca chat Dialogs
- Medical chat Dialogs
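A sketch of how these sources could be combined into a single training set follows. The hub and file paths are assumptions: only the dataset names above appear on this card, and the Baize dialog corpora are distributed as JSON files in the Baize GitHub repository.

```python
# Hypothetical data-mixing sketch; all ids and paths below are assumptions.
from datasets import load_dataset, concatenate_datasets

cfqa = load_dataset("jorge-henao/ask2democracy-cfqa-salud-pension", split="train")  # assumed repo id
alpaca = load_dataset("tatsu-lab/alpaca", split="train")  # one public mirror of Stanford Alpaca

# Baize dialogs, assuming the JSON files were downloaded locally from the Baize repo:
quora = load_dataset("json", data_files="data/quora_chat_data.json", split="train")
stackoverflow = load_dataset("json", data_files="data/stackoverflow_chat_data.json", split="train")

# In practice each source must first be mapped to a shared column schema
# before concatenation will succeed.
train_set = concatenate_datasets([cfqa, alpaca, quora, stackoverflow])
print(len(train_set))
```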
## About pre-processing
The Ask2Democracy-cfqa-salud-pension dataset was pre-processed into a conversational style in two variations, one without and one with the retrieved context:
```python
def format_instruction_without_context(example):
    # Keep the original question around as a "topic" field.
    example["topic"] = example['input']
    input = "La conversación entre un humano y un asistente de IA."
    input += "\n[|Human|] " + example['input']
    input += "\n[|AI|] " + example["output"]
    if len(example["topics"]) > 0:
        # Append a follow-up turn asking the assistant to classify its answer.
        topics = ", ".join(example["topics"])
        input += "\n[|Human|] " + "¿En cuáles tópicos clasificarías su respuesta?"
        input += "\n[|AI|] " + f"Aquí una lista de tópicos: {topics}."
        example["topic"] += f" ({topics})"
    example["input"] = input
    return example
```
```python
def format_instruction_with_context(example):
    example["topic"] = example['input']
    # Recover the bare context passage from the English instruction template.
    context = example['instruction'].replace("Given the context please answer the question. Context:", "")
    # Collapse whitespace and trim the surrounding punctuation left by the template.
    context = ' '.join(context.strip().split())[1:-3]
    input = "La conversación entre un humano y un asistente de IA."
    input += "\n[|Human|] " + example['input'] + f"\nPara responder la pregunta, usa el siguiente contexto:\n{context}"
    input += "\n[|AI|] " + example["output"]
    if len(example["topics"]) > 0:
        topics = ", ".join(example["topics"])
        input += "\n[|Human|] " + "¿En cuáles tópicos clasificarías su respuesta?"
        input += "\n[|AI|] " + f"Aquí una lista de tópicos: {topics}."
        example["topic"] += f" ({topics})"
    example["input"] = input
    return example
```
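Applied with `datasets.map`, the two formatters produce self-contained Baize-style transcripts in the `input` column. The sketch below illustrates this; the hub repo id is, again, an assumption.

```python
# Example application of the formatters above (the repo id is an assumption).
from datasets import load_dataset

dataset = load_dataset("jorge-henao/ask2democracy-cfqa-salud-pension", split="train")

conversational = dataset.map(format_instruction_without_context)
with_context = dataset.map(format_instruction_with_context)

print(conversational[0]["input"])  # "La conversación entre un humano y un asistente de IA. ..."
```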
More details can be found in the Ask2Democracy GitHub repository.