---
base_model: tokyotech-llm/Swallow-7b-hf
library_name: peft
license: apache-2.0
datasets:
- wikimedia/wikipedia
language:
- ja
- en
---

# Model Info

This model applies LLM2Vec to Swallow. Only the PEFT adapter is distributed. LLM2Vec fine-tunes on two tasks, MNTP and SimCSE; this repository contains the result of applying only the MNTP task.

## Model Details

### Model Description

- **Model type:** PEFT
- **Language(s) (NLP):** Japanese
- **License:** Apache 2.0
- **Finetuned from model:** [Swallow-7b-hf](https://huggingface.co/tokyotech-llm/Swallow-7b-hf)

### Model Sources

- **Repository:** https://github.com/McGill-NLP/llm2vec
- **Paper:** https://arxiv.org/abs/2404.05961

# Usage

- Please see the [original LLM2Vec repo](https://huggingface.co/McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp#usage). A minimal loading sketch is also included at the end of this card.

# Training Details

## Training Data

- [Wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia)

## Training Hyperparameters

- batch_size: 64
- gradient_accumulation_steps: 1
- max_seq_length: 512
- mask_token_type: "blank"
- mlm_probability: 0.2
- lora_r: 16
- torch_dtype: "bfloat16"
- attn_implementation: "flash_attention_2"
- bf16: true
- gradient_checkpointing: true

## Accelerator Settings

- deepspeed_config:
  - gradient_accumulation_steps: 1
  - gradient_clipping: 1.0
  - offload_optimizer_device: nvme
  - offload_optimizer_nvme_path: /nvme
  - zero3_save_16bit_model: true
  - zero_stage: 2
- distributed_type: DEEPSPEED
- downcast_bf16: 'no'
- dynamo_config:
  - dynamo_backend: INDUCTOR
  - dynamo_mode: default
  - dynamo_use_dynamic: true
  - dynamo_use_fullgraph: true
- enable_cpu_affinity: false
- machine_rank: 0
- main_training_function: main
- mixed_precision: bf16
- num_machines: 1
- num_processes: 2
- rdzv_backend: static
- same_network: true
- use_cpu: false

## Framework versions

- Python: 3.12.3
- PEFT: 0.11.1
- Sentence Transformers: 3.0.1
- Transformers: 4.41.0
- PyTorch: 2.3.0
- Accelerate: 0.30.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1
- MTEB: 1.13.0
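
# Example: Loading the Adapter

The snippet below is a minimal, untested sketch of one way to load this MNTP adapter for sentence encoding with the [llm2vec](https://github.com/McGill-NLP/llm2vec) package (`pip install llm2vec`). The repo id `<this-repo-id>` is a placeholder for this repository's id on the Hub, and the keyword arguments follow the examples in the llm2vec README rather than anything stated in this card.

```python
import torch
from llm2vec import LLM2Vec

# Sketch, not an official example: load the Swallow base model, apply this
# MNTP LoRA adapter, and encode a few Japanese sentences.
# "<this-repo-id>" is a placeholder for this repository's id on the Hub.
l2v = LLM2Vec.from_pretrained(
    "tokyotech-llm/Swallow-7b-hf",             # base model this adapter was trained from
    peft_model_name_or_path="<this-repo-id>",  # this MNTP PEFT adapter
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
    pooling_mode="mean",  # mean pooling over token representations
    max_length=512,       # matches max_seq_length used during MNTP training
)

# Encode sentences and inspect pairwise cosine similarities.
sentences = [
    "東京は日本の首都です。",
    "日本の首都は東京である。",
    "富士山は日本で一番高い山です。",
]
embeddings = l2v.encode(sentences)
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
print(embeddings @ embeddings.T)
```

Note that only the MNTP stage of the LLM2Vec recipe is applied here; the unsupervised SimCSE stage is not included in this adapter.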