license: apache-2.0

Ask2Democracy project


What is the baizemocracy-lora-7B-cfqa-conv model?

This model is an open-source chat model fine-tuned with LoRA, inspired by the Baize project. It was trained with the Baize datasets and the ask2democracy-cfqa-salud-pension dataset, which contains almost 4k instructions for answering questions based on contexts relevant to citizen concerns and public debate in Spanish. Two major model experiments were carried out during the Hackathon Somos NLP 2023:

  • A conversational-style focused model
  • A context-focused style model.

This model focuses on a more conversational way of asking questions; see the pre-processing section below. There is another model variation more focused on retrieval augmentation based on context: Baizemocracy-contextfocused.

Testing is a work in progress. We decided to share both model variations with the community in order to involve more people in experimenting with what works best and finding other possible use cases.
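
For reference, here is a minimal inference sketch using transformers and peft. It is an illustration only: the base checkpoint id, the adapter repo id and the sample question are assumptions, and the prompt follows the Baize [|Human|]/[|AI|] format used during pre-processing (see below).

# Sketch: load the LoRA adapter on top of LLaMA-7B with peft.
# The repo ids below are assumptions; adjust them to your actual checkpoints.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base_id = "decapoda-research/llama-7b-hf"                  # assumed base checkpoint
adapter_id = "jorge-henao/baizemocracy-lora-7B-cfqa-conv"  # assumed adapter repo id

tokenizer = LlamaTokenizer.from_pretrained(base_id)
model = LlamaForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

# Baize-style conversational prompt, matching the pre-processing format shown below.
prompt = (
    "La conversación entre un humano y un asistente de IA."
    "\n[|Human|] ¿Qué es una EPS y cómo me afilio a una?"
    "\n[|AI|] "
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Note that the adapter only stores the LoRA weights, so the LLaMA-7B base model must be available separately.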

Training Parameters

  • Base Model: LLaMA-7B
  • Training Epoch: 1
  • Batch Size: 16
  • Maximum Input Length: 512
  • Learning Rate: 2e-4
  • LoRA Rank: 8
  • Updated Modules: all linear layers
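
As a rough illustration, the parameters above could map to a peft configuration along these lines. This is a sketch, not the original training script: the LoRA alpha, dropout and exact target module names are assumptions, interpreting "all linear layers" as the linear projections in each LLaMA block.

# Rough mapping of the hyperparameters above to a peft/transformers setup (assumed, not the original script).
from transformers import TrainingArguments
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                      # LoRA Rank: 8
    lora_alpha=16,            # assumed; not stated in the card
    lora_dropout=0.05,        # assumed; not stated in the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # "all linear layers" in LLaMA blocks
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="baizemocracy-lora-7B-cfqa-conv",
    num_train_epochs=1,              # Training Epoch: 1
    per_device_train_batch_size=16,  # Batch Size: 16
    learning_rate=2e-4,              # Learning Rate: 2e-4
    fp16=True,
)

# model = get_peft_model(base_model, lora_config)  # base_model: LLaMA-7B; inputs truncated to 512 tokens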

Training Dataset

About pre-processing

The Ask2Democracy-cfqa-salud-pension dataset was pre-processed into a conversational style in two variations, like this:


def format_instruction_without_context(example):
  # Conversation-focused variant: question and answer only, no retrieved context.
  example["topic"] = example['input']
  input = "La conversación entre un humano y un asistente de IA."
  input += "\n[|Human|] "+example['input']
  input += "\n[|AI|] "+example["output"]
  if len(example["topics"])>0:
    topics = ", ".join(example["topics"])
    input += "\n[|Human|] "+"¿En cuáles tópicos clasificarías su respuesta?"
    input += "\n[|AI|] "+f"Aquí una lista de tópicos: {topics}."
    example["topic"] += f" ({topics})"
  example["input"] = input
  return example

def format_instruction_with_context(example):
  # Context-focused variant: the retrieved context is appended to the question.
  example["topic"] = example['input']
  context = example['instruction'].replace("Given the context please answer the question. Context:","")
  context = ' '.join(context.strip().split())[1:-3]
  input = "La conversación entre un humano y un asistente de IA."
  input += "\n[|Human|] "+example['input']+f"\nPara responder la pregunta, usa el siguiente contexto:\n{context}"
  input += "\n[|AI|] "+example["output"]
  if len(example["topics"])>0:
    topics = ", ".join(example["topics"])
    input += "\n[|Human|] "+"¿En cuáles tópicos clasificarías su respuesta?"
    input += "\n[|AI|] "+f"Aquí una lista de tópicos: {topics}."
    example["topic"] += f" ({topics})"
  example["input"] = input
  return example
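
As an illustration, the two formatters could be applied with the datasets library as sketched below. The dataset id and split name are assumptions; the columns 'instruction', 'input', 'output' and 'topics' are taken from the functions above.

# Sketch: apply the formatters above to the dataset (repo id assumed).
from datasets import load_dataset

ds = load_dataset("hackathon-somos-nlp-2023/ask2democracy-cfqa-salud-pension", split="train")

conversational = ds.map(format_instruction_without_context)  # conversation-focused variant
context_focused = ds.map(format_instruction_with_context)    # context-focused variant

print(conversational[0]["input"][:500])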

More details can be found in the Ask2Democracy GitHub repository.