---
language:
- en
- fr
- ln
library_name: peft
tags:
- trl
- sft
- generated_from_trainer
base_model: CohereForAI/aya-23-8b
datasets:
- masakhane/afrimmlu
model-index:
- name: aya-23-8b-afrimmlu-lin
  results: []
pipeline_tag: text-generation
license: apache-2.0
---

# Aya-23-8b AfriMMLU Lingala

This model is a fine-tuned version of [CohereForAI/aya-23-8b](https://huggingface.co/CohereForAI/aya-23-8b) on the [masakhane/afrimmlu](https://huggingface.co/datasets/masakhane/afrimmlu/) dataset.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

The model was fine-tuned on the Lingala (`lin`) split of [masakhane/afrimmlu](https://huggingface.co/datasets/masakhane/afrimmlu/), a multiple-choice question-answering benchmark.

### Training hardware

- 2 × NVIDIA A100 (PCIe)
- 24 vCPU, 251 GB RAM

## Training procedure

### Prompt formatting

Each AfriMMLU example is wrapped in the Cohere chat format used by Aya-23: the question and answer choices form the user turn, and the gold answer forms the chatbot turn.

```py
def formatting_prompts_func(example):
    """Render AfriMMLU examples in the Cohere chat format used by Aya-23."""
    output_texts = []
    for i in range(len(example['choices'])):
        text = (
            f"<|START_OF_TURN_TOKEN|><|USER_TOKEN|>Question : {example['question'][i]}, "
            f"Choices : {example['choices'][i]}<|END_OF_TURN_TOKEN|>"
            f"<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{example['answer'][i]}"
        )
        output_texts.append(text)
    return output_texts
```

### Model architecture

```txt
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): CohereForCausalLM(
      (model): CohereModel(
        (embed_tokens): Embedding(256000, 4096, padding_idx=0)
        (layers): ModuleList(
          (0-31): 32 x CohereDecoderLayer(
            (self_attn): CohereAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=32, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=32, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (k_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=1024, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=32, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=32, out_features=1024, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (v_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=1024, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=32, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=32, out_features=1024, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (o_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Identity()
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=32, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=32, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (rotary_emb): CohereRotaryEmbedding()
            )
            (mlp): CohereMLP(
              (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
              (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
              (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
              (act_fn): SiLU()
            )
            (input_layernorm): CohereLayerNorm()
          )
        )
        (norm): CohereLayerNorm()
      )
      (lm_head): Linear(in_features=4096, out_features=256000, bias=False)
    )
  )
)
```
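The rank-32 adapters on `q_proj`, `k_proj`, `v_proj`, and `o_proj` and the 4-bit base layers visible in the dump above can be reproduced with a standard PEFT + bitsandbytes setup. The sketch below is illustrative rather than the exact training script: `lora_alpha` is an assumption (the printed module tree does not record it), and the quantization settings are taken from the inference example further down.

```py
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization, matching the Linear4bit base layers in the dump above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "CohereForAI/aya-23-8b",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Rank-32 LoRA on the attention projections, as shown in the architecture dump.
# lora_alpha is an assumption; lora_dropout=0.0 matches the Identity() dropout printed above.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```

With `formatting_prompts_func` above, a model prepared this way would typically be handed to TRL's `SFTTrainer` along with the hyperparameters listed in the next subsection.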
### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 20

### Training results

## Inference

The adapter is loaded on top of the 4-bit quantized base model and queried with AfriMMLU-style prompts. The `generate_aya_23` helper used below is not defined in this card; a sketch of it is included at the end.

```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Configuration flags and base model name (added here so the snippet is self-contained)
BASE_MODEL_NAME = "CohereForAI/aya-23-8b"
QUANTIZE_4BIT = True
USE_FLASH_ATTENTION = False

quantization_config = None
if QUANTIZE_4BIT:
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

attn_implementation = None
if USE_FLASH_ATTENTION:
    attn_implementation = "flash_attention_2"

loaded_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_NAME,
    quantization_config=quantization_config,
    attn_implementation=attn_implementation,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_NAME)

# Attach the fine-tuned LoRA adapter to the quantized base model
loaded_model.load_adapter("aya-23-8b-afrimmlu-lin")

prompts = [
    """Question: 4 na 3 Ezali boni ? Choices : [12, 4, 32, 21] """
]

generations = generate_aya_23(prompts, loaded_model)

for p, g in zip(prompts, generations):
    print(
        "PROMPT",
        p,
        "RESPONSE",
        g,
        "\n",
        sep="\n",
    )
```

```txt
PROMPT
Question: 4 na 3 Ezali boni ? Choices : [12, 4, 32, 21] 

RESPONSE
Boni ya 4 ezali 12.
```

### Framework versions

- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.1.0+cu118
- Datasets 2.19.2
- Tokenizers 0.19.1
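### Generation helper (sketch)

The inference example above calls `generate_aya_23`, which is not included in this card. The following is a minimal, hypothetical reconstruction that relies on the `tokenizer` loaded earlier and on the model's chat template; the sampling parameters are assumptions, not the values used by the author.

```py
def generate_aya_23(prompts, model, max_new_tokens=64, temperature=0.3, top_p=0.9):
    """Hypothetical reconstruction of the generation helper used in the example above."""
    generations = []
    for prompt in prompts:
        # Wrap the raw prompt in the Aya-23 chat template (same turn tokens as in training)
        messages = [{"role": "user", "content": prompt}]
        input_ids = tokenizer.apply_chat_template(
            messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
        ).to(model.device)

        output = model.generate(
            input_ids,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temperature,
            top_p=top_p,
        )
        # Keep only the newly generated tokens, dropping the prompt
        generations.append(
            tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
        )
    return generations
```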