---
license: apache-2.0
---

Ask2Democracy project


## What's baizemocracy-lora-7B-cfqa-conv model?

This model is an open-source chat model fine-tuned with [LoRA](https://github.com/microsoft/LoRA), inspired by the [Baize project](https://github.com/project-baize/baize-chatbot/tree/main/). It was trained with the Baize datasets and the ask2democracy-cfqa-salud-pension dataset, which contains almost 4k instructions for answering questions based on a context relevant to citizen concerns and public debate in Spanish.

Two major model experiments were performed during the Hackathon Somos NLP 2023:
- A conversational-style focused model
- A context-focused style model

This model is focused on a more conversational way of asking questions. See the "About pre-processing" section. There is another model variation more focused on context-based augmented retrieval: [Baizemocracy-contextfocused](https://github.com/project-baize/baize-chatbot/tree/main/).

Testing is a work in progress. We decided to share both model variations with the community in order to involve more people in experimenting with what works better and in finding other possible use cases.

- **Developed by:**
  - 🇨🇴 [Jorge Henao](https://huggingface.co/jorge-henao)
  - 🇨🇴 [David Torres](https://github.com/datorresb)

## Training Parameters

- Base Model: [LLaMA-7B](https://arxiv.org/pdf/2302.13971.pdf)
- Training Epoch: 1
- Batch Size: 16
- Maximum Input Length: 512
- Learning Rate: 2e-4
- LoRA Rank: 8
- Updated Modules: All Linears

## Training Dataset

- [Ask2Democracy-cfqa-salud-pension](https://huggingface.co/datasets/hackathon-somos-nlp-2023/ask2democracy-cfqa-salud-pension) (3,806)
- [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) (51,942)
- [Quora Dialogs](https://github.com/project-baize/baize) (54,456)
- [StackOverflow Dialogs](https://github.com/project-baize/baize) (57,046)
- [Alpaca chat Dialogs](https://github.com/project-baize/baize)
- [Medical chat Dialogs](https://github.com/project-baize/baize)

## About pre-processing

The Ask2Democracy-cfqa-salud-pension dataset was pre-processed into a conversational style in two variations, without and with context:

```python
def format_instruction_without_context(example):
    # Keep the original question as the topic label.
    example["topic"] = example["input"]
    # Build a Baize-style transcript: header, human question, AI answer.
    prompt = "La conversación entre un humano y un asistente de IA."
    prompt += "\n[|Human|] " + example["input"]
    prompt += "\n[|AI|] " + example["output"]
    # If the example has topics, add a second turn asking for them.
    if len(example["topics"]) > 0:
        topics = ", ".join(example["topics"])
        prompt += "\n[|Human|] " + "¿En cuáles tópicos clasificarías su respuesta?"
        prompt += "\n[|AI|] " + f"Aquí una lista de tópicos: {topics}."
        example["topic"] += f" ({topics})"
    example["input"] = prompt
    return example


def format_instruction_with_context(example):
    example["topic"] = example["input"]
    # Strip the English instruction prefix to recover the raw context.
    context = example["instruction"].replace("Given the context please answer the question. Context:", "")
    # Collapse whitespace, then drop the first character and the last three.
    context = " ".join(context.strip().split())[1:-3]
    prompt = "La conversación entre un humano y un asistente de IA."
    # Same transcript as above, with the context appended to the human turn.
    prompt += "\n[|Human|] " + example["input"] + f"\nPara responder la pregunta, usa el siguiente contexto:\n{context}"
    prompt += "\n[|AI|] " + example["output"]
    if len(example["topics"]) > 0:
        topics = ", ".join(example["topics"])
        prompt += "\n[|Human|] " + "¿En cuáles tópicos clasificarías su respuesta?"
        prompt += "\n[|AI|] " + f"Aquí una lista de tópicos: {topics}."
        example["topic"] += f" ({topics})"
    example["input"] = prompt
    return example
```

More details can be found in the Ask2Democracy [GitHub](https://github.com/jorge-henao/ask2democracy)
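
To make the resulting conversational format concrete, here is a minimal, self-contained sketch that builds the same transcript layout as the formatting functions above. The helper `build_conversation` and the sample record are hypothetical and invented for illustration; they are not part of the project code or the dataset.

```python
HEADER = "La conversación entre un humano y un asistente de IA."


def build_conversation(question, answer, topics=(), context=None):
    """Build a Baize-style transcript (hypothetical helper, for illustration)."""
    human_turn = question
    if context is not None:
        # Context-focused variation: append the context to the human turn.
        human_turn += f"\nPara responder la pregunta, usa el siguiente contexto:\n{context}"
    turns = [HEADER, f"[|Human|] {human_turn}", f"[|AI|] {answer}"]
    if topics:
        # Optional second turn asking the assistant to classify its answer.
        joined = ", ".join(topics)
        turns.append("[|Human|] ¿En cuáles tópicos clasificarías su respuesta?")
        turns.append(f"[|AI|] Aquí una lista de tópicos: {joined}.")
    return "\n".join(turns)


# Made-up record, NOT taken from the dataset.
sample = {
    "input": "¿Qué cambia con la reforma?",
    "output": "La reforma modifica el sistema actual.",
    "topics": ["salud", "pensiones"],
}

conversation = build_conversation(sample["input"], sample["output"], sample["topics"])
print(conversation)
```

Passing a `context` string produces the context-focused variation, and an empty `topics` list leaves out the classification turn.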