---
base_model: meta-llama/Meta-Llama-3-8B-Instruct
datasets:
- Danielbrdz/Barcenas-Economia
- HiTZ/casimedicos-exp
- somosnlp/coser_resumenes
- csebuetnlp/CrossSum
- Iker/Document-Translation-en-es
- somosnlp/es-inclusive-language-it
- glaiveai/glaive-code-assistant-v3
- glaiveai/glaive-function-calling-v2
- Iker/InstructTranslation-EN-ES
- somosnlp/lenguaje-claro-dataset
- somosnlp/LingComp_QA
- Iker/NoticIA
- teknium/OpenHermes-2.5
- Iker/OpenHermes-2.5-Spanish
- Helsinki-NLP/opus-100
- projecte-aina/RAG_Multilingual
- HiTZ/This-is-not-a-dataset
- Iker/Reddit-Post-Translation
- wikipedia
language:
- es
- en
library_name: transformers
license: llama3
pipeline_tag: text-generation
tags:
- synthetic
---

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/614a1ebb8f82f1df64d55126/2i_CasoeJTgQPNoBIfA8E.jpeg)

# Neurona 8B Beta: A Spanish Language Model

> This is a preliminary version of the model card. The model is under development and this is not the final version. If you want to know more about this model, write to iker.garciaf@ehu.eus

Neurona 8B is a language model in Spanish. This is the second iteration, an experiment to fine-tune the training scripts and infrastructure.

Neurona 8B has been trained with the following datasets. The full dataset was not used in every case:

- [Danielbrdz/Barcenas-Economia](https://huggingface.co/datasets/Danielbrdz/Barcenas-Economia)
- [HiTZ/casimedicos-exp](https://huggingface.co/datasets/HiTZ/casimedicos-exp)
- [somosnlp/coser_resumenes](https://huggingface.co/datasets/somosnlp/coser_resumenes)
- [csebuetnlp/CrossSum](https://huggingface.co/datasets/csebuetnlp/CrossSum) (en + es)
- [Iker/Document-Translation-en-es](https://huggingface.co/datasets/Iker/Document-Translation-en-es)
- [somosnlp/es-inclusive-language-it](https://huggingface.co/datasets/somosnlp/es-inclusive-language-it)
- [glaiveai/glaive-code-assistant-v3](https://huggingface.co/datasets/glaiveai/glaive-code-assistant-v3)
- [glaiveai/glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2)
- [Iker/InstructTranslation-EN-ES](https://huggingface.co/datasets/Iker/InstructTranslation-EN-ES)
- [somosnlp/lenguaje-claro-dataset](https://huggingface.co/datasets/somosnlp/lenguaje-claro-dataset)
- [somosnlp/LingComp_QA](https://huggingface.co/datasets/somosnlp/LingComp_QA)
- [Iker/NoticIA](https://huggingface.co/datasets/Iker/NoticIA)
- [teknium/OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5)
- [Iker/OpenHermes-2.5-Spanish](https://huggingface.co/datasets/Iker/OpenHermes-2.5-Spanish)
- [Helsinki-NLP/opus-100](https://huggingface.co/datasets/Helsinki-NLP/opus-100) (en-es)
- [projecte-aina/RAG_Multilingual](https://huggingface.co/datasets/projecte-aina/RAG_Multilingual)
- [HiTZ/This-is-not-a-dataset](https://huggingface.co/datasets/HiTZ/This-is-not-a-dataset)
- [wikipedia](https://huggingface.co/datasets/wikipedia) (es)
- [Iker/Reddit-Post-Translation](https://huggingface.co/datasets/Iker/Reddit-Post-Translation)

This mix of English and Spanish datasets allows the model to acquire a range of capabilities, such as RAG, function calling, code assistance, question answering, and summarization, in both English and Spanish.
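Since the base model is `meta-llama/Meta-Llama-3-8B-Instruct` and training used the `llama3` chat template, the standard `transformers` chat workflow should apply. The snippet below is a minimal inference sketch; the repository id `Iker/Neurona-8B-Beta` is a placeholder assumption, as this preliminary card does not state the final Hub id.

```python
# Minimal inference sketch. The repo id below is a placeholder:
# replace "Iker/Neurona-8B-Beta" with the actual checkpoint once published.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Iker/Neurona-8B-Beta"  # hypothetical id, not confirmed by this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the model was trained in bf16
    device_map="auto",
)

messages = [
    {"role": "system", "content": "Eres un asistente útil que responde en español."},
    {"role": "user", "content": "Resume en una frase qué es un modelo de lenguaje."},
]

# The llama3 chat template appends the generation prompt for the assistant turn.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```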
# Training

This model has been trained using 4x Nvidia A100 80GB GPUs and axolotl: [Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)

This is the configuration used:

```yaml
base_model: meta-llama/Meta-Llama-3-8B-Instruct
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

is_falcon_derived_model:
is_llama_derived_model:
is_qwen_derived_model:
is_mistral_derived_model:

load_in_8bit: false
load_in_4bit: false
strict: false
device_map: null

datasets:
  - path: /ikerlariak/igarcia945/InstructDatasets/Barcenas-Economia.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/casimedicos.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/coser_resumene.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/CrossSum_en.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/CrossSum_es.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/Document-Translation-en-es.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/es-inclusive-language.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/glaive-code-assistant-v3-small.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/glaive-function-calling-v2.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
        - tool
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/InstructTranslation-EN-ES.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/lenguaje-claro-dataset.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/LingComp_QA.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/NoticIA.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/NoticIA-large.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/NoticIA-summary.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/OpenHermes-2.5-English.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/OpenHermes-2.5-Spanish.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/opus-100-en-es.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/RAG_Multilingual-es.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/This-is-not-a-dataset.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/wikipedia-es.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/Reddit-Post-Translation.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human
  - path: /ikerlariak/igarcia945/InstructDatasets/watermark.jsonl
    type: sharegpt
    conversation: llama3
    field: conversations
    roles:
      input:
        - system
        - gpt
      output:
        - human

chat_template: llama3

dataset_prepared_path: /ikerlariak/igarcia945/Mortadelo-Filemon/Meta-Llama-3-8B-Instruct-Spanish-v2/dataset
shuffle_merged_datasets: true
val_set_size: 0.005
output_dir: /ikerlariak/igarcia945/Mortadelo-Filemon/Meta-Llama-3-8B-Instruct-Spanish-v2

adapter:
lora_model_dir:

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: false

tokens:
  - ""
  - ""
  - ""
  - ""
  - ""
  - ""
  - ""
  - ""

special_tokens:
  pad_token: <|end_of_text|>

neftune_noise_alpha: 5

wandb_project: Mortadelo&Filemon
wandb_entity: igarciaf
wandb_watch:
wandb_name: Meta-Llama-3-8B-Instruct-Spanish-v2
wandb_log_model:

gradient_accumulation_steps: 32
micro_batch_size: 2
eval_batch_size: 2
num_epochs: 2
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.00007

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.03
evals_per_epoch: 4
eval_table_size:
save_strategy: "no"
debug:
deepspeed: /ikerlariak/igarcia945/Mortadelo-Filemon/train_configs/deepspeed_zero3.json
weight_decay: 0.0
fsdp:
fsdp_config:

seed: 33
```
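Every `datasets` entry above is loaded as ShareGPT-style JSON lines (`type: sharegpt`, `field: conversations`), where each record holds a list of turns tagged with the role names from the config. The snippet below is a minimal sketch of writing one record in that layout; the file name and conversation text are illustrative only, not taken from the actual datasets.

```python
# Sketch of one ShareGPT-style JSONL record in the layout the config
# expects: a "conversations" list of turns with "from"/"value" keys.
# File name and example text are hypothetical, not from the card.
import json

record = {
    "conversations": [
        {"from": "system", "value": "Eres un asistente útil."},
        {"from": "human", "value": "Traduce al español: 'Good morning.'"},
        {"from": "gpt", "value": "Buenos días."},
    ]
}

with open("example-instruct-dataset.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```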