Aya-23-8b Afrimmlu Lingala

This model is a fine-tuned version of CohereForAI/aya-23-8b on Masakhane/afrimmlu.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

The model was fine-tuned and evaluated on the Lingala subset of Masakhane/afrimmlu. Training ran on the following NVIDIA hardware:

  • 2 x A100 (PCIe)
  • 24 vCPU, 251 GB RAM
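
A minimal loading sketch for that data, assuming the Lingala configuration of Masakhane/afrimmlu is exposed under the code "lin" (matching the adapter name aya-23-8b-afrimmlu-lin):

from datasets import load_dataset

# Assumed config name "lin" for Lingala; adjust if the dataset exposes it differently.
afrimmlu_lin = load_dataset("masakhane/afrimmlu", "lin")
print(afrimmlu_lin)  # inspect the available splits; rows carry "question", "choices" and "answer" fields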

Training procedure

Prompt Formatting

def formatting_prompts_func(example):
    """Render a batch of afrimmlu rows into Aya-23 chat turns, with the gold answer as the chatbot turn."""
    output_texts = []
    for i in range(len(example['choices'])):
        text = f"<|START_OF_TURN_TOKEN|><|USER_TOKEN|>Question : {example['question'][i]}, Choices : {example['choices'][i]}<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{example['answer'][i]}"
        output_texts.append(text)
    return output_texts
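
This formatter returns one rendered string per row of a batch, which is the shape expected by a supervised fine-tuning trainer's formatting hook. The card does not show the trainer wiring; below is a minimal sketch assuming trl's SFTTrainer is used, where model, tokenizer, dataset, peft_config and training_args are defined elsewhere in the training script.

from trl import SFTTrainer

# Hypothetical wiring, not taken from the card: plug the batched formatter into SFTTrainer.
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],  # or whichever afrimmlu split is used for fine-tuning
    tokenizer=tokenizer,
    peft_config=peft_config,
    formatting_func=formatting_prompts_func,  # the function defined above
)
trainer.train()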

Model Architecture

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): CohereForCausalLM(
      (model): CohereModel(
        (embed_tokens): Embedding(256000, 4096, padding_idx=0)
        (layers): ModuleList(
          (0-31): 32 x CohereDecoderLayer(
            (self_attn): CohereAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=32, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=32, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (k_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=1024, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=32, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=32, out_features=1024, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (v_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=1024, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=32, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=32, out_features=1024, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (o_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=32, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=32, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (rotary_emb): CohereRotaryEmbedding()
            )
            (mlp): CohereMLP(
              (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
              (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
              (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
              (act_fn): SiLU()
            )
            (input_layernorm): CohereLayerNorm()
          )
        )
        (norm): CohereLayerNorm()
      )
      (lm_head): Linear(in_features=4096, out_features=256000, bias=False)
    )
  )
)
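
The adapter rank and target modules can be read directly off this printout: lora_A maps 4096 to 32 (so r = 32), adapters wrap q_proj, k_proj, v_proj and o_proj, and lora_dropout is Identity (i.e. 0.0). A peft LoraConfig consistent with the printout is sketched below; lora_alpha is not recoverable from the printout, so its value here is an assumption.

from peft import LoraConfig

# Sketch of a LoraConfig consistent with the module tree above.
peft_config = LoraConfig(
    r=32,                   # lora_A projects 4096 -> 32 in the printout
    lora_alpha=32,          # assumed; not shown in the printout
    lora_dropout=0.0,       # lora_dropout is Identity() in the printout
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)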

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 20
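
For reference, a transformers TrainingArguments sketch consistent with these values is shown below; output_dir is a placeholder, and anything not listed above (logging, saving, mixed precision) is left at its default. The optimizer line above matches the trainer's default AdamW settings.

from transformers import TrainingArguments

# Sketch matching the listed hyperparameters; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="aya-23-8b-afrimmlu-lin",
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=16,   # 2 x 16 = the listed total train batch size of 32
    num_train_epochs=20,
    lr_scheduler_type="constant",
    warmup_ratio=0.05,
    seed=42,
)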

Training results

More information needed

Inference

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

BASE_MODEL_NAME = "CohereForAI/aya-23-8b"
QUANTIZE_4BIT = True          # set according to your hardware
USE_FLASH_ATTENTION = False   # set according to your hardware

# Optional 4-bit quantization for loading the base model.
quantization_config = None
if QUANTIZE_4BIT:
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

attn_implementation = None
if USE_FLASH_ATTENTION:
    attn_implementation = "flash_attention_2"

# Load the base model, then attach the fine-tuned LoRA adapter.
loaded_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_NAME,
    quantization_config=quantization_config,
    attn_implementation=attn_implementation,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_NAME)
loaded_model.load_adapter("aya-23-8b-afrimmlu-lin")
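
The generate_aya_23 helper used below is not defined on the card. A minimal sketch of what such a helper could look like, assuming the tokenizer's chat template is used to wrap each prompt (max_new_tokens and temperature are arbitrary defaults):

def generate_aya_23(prompts, model, max_new_tokens=128, temperature=0.3):
    """Generate one completion per prompt, reusing the `tokenizer` loaded above."""
    generations = []
    for prompt in prompts:
        messages = [{"role": "user", "content": prompt}]
        input_ids = tokenizer.apply_chat_template(
            messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
        ).to(model.device)
        output = model.generate(
            input_ids,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temperature,
        )
        # Keep only the newly generated tokens, not the echoed prompt.
        generations.append(
            tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
        )
    return generations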


prompts = [
    """Question: 4 na 3 Ezali boni ?
    Choices : [12, 4, 32, 21]
    """
]

generations = generate_aya_23(prompts, loaded_model)

for p, g in zip(prompts, generations):
    print("PROMPT", p, "RESPONSE", g, "\n", sep="\n")
Output:

PROMPT
Question: 4 na 3 Ezali boni ?
    Choices : [12, 4, 32, 21]
    
RESPONSE
Boni ya 4 ezali 12.

Framework versions

  • PEFT 0.11.1
  • Transformers 4.41.2
  • PyTorch 2.1.0+cu118
  • Datasets 2.19.2
  • Tokenizers 0.19.1