---
base_model: tokyotech-llm/Swallow-7b-hf
library_name: peft
license: apache-2.0
datasets:
- wikimedia/wikipedia
language:
- ja
- en
---
# Model Info
This model applies LLM2Vec to Swallow. Only the PEFT adapter is distributed. LLM2Vec fine-tunes with two tasks, MNTP and SimCSE; this repository contains the result of applying only the MNTP task.
## Model Details
### Model Description
- **Model type:** PEFT
- **Language(s) (NLP):** Japanese
- **License:** Apache 2.0
- **Finetuned from model:** [Swallow-7b-hf](https://huggingface.co/tokyotech-llm/Swallow-7b-hf)
### Model Sources
- **Repository:** https://github.com/McGill-NLP/llm2vec
- **Paper:** https://arxiv.org/abs/2404.05961
# Usage
- Please see the usage section of the [original LLM2Vec MNTP model card](https://huggingface.co/McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp#usage); a loading sketch for this adapter is shown below.
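A minimal sketch of loading this adapter with the `llm2vec` package, assuming it handles the bidirectional conversion for the Llama-based Swallow model. The repository ID `ADAPTER_REPO` is a placeholder, not the actual ID of this repo:

```python
import torch
from llm2vec import LLM2Vec

# Hypothetical repository ID for this MNTP adapter; replace with this repo's actual ID.
ADAPTER_REPO = "<this-repo-id>"

# Load the Swallow base model with bidirectional attention enabled by llm2vec,
# then apply the MNTP LoRA adapter on top of it.
l2v = LLM2Vec.from_pretrained(
    "tokyotech-llm/Swallow-7b-hf",
    peft_model_name_or_path=ADAPTER_REPO,
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)

# Encode example sentences into embeddings.
embeddings = l2v.encode(["今日は良い天気です。", "吾輩は猫である。"])
print(embeddings.shape)
```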
# Training Details
## Training Data
- [Wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia)
## Training Hyperparameters
- batch_size: 64
- gradient_accumulation_steps: 1
- max_seq_length: 512
- mask_token_type: "blank"
- mlm_probability: 0.2
- lora_r: 16
- torch_dtype: "bfloat16"
- attn_implementation: "flash_attention_2"
- bf16: true
- gradient_checkpointing: true
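For reference, a sketch of how these values could be assembled into an MNTP config for LLM2Vec's `experiments/run_mntp.py` script. This is a hedged example, not the exact config used: the dataset config name, output directory, and the interpretation of `batch_size` as a per-device value are assumptions not stated in this card.

```python
import json
import os

# Sketch of an MNTP config mirroring the hyperparameters listed above.
# Dataset config name, output directory, and the per-device batch size
# interpretation are illustrative assumptions.
mntp_config = {
    "model_name_or_path": "tokyotech-llm/Swallow-7b-hf",
    "dataset_name": "wikimedia/wikipedia",
    "dataset_config_name": "20231101.ja",       # assumed Wikipedia snapshot/language
    "per_device_train_batch_size": 64,          # card lists batch_size: 64
    "gradient_accumulation_steps": 1,
    "max_seq_length": 512,
    "mask_token_type": "blank",
    "mlm_probability": 0.2,
    "lora_r": 16,
    "torch_dtype": "bfloat16",
    "attn_implementation": "flash_attention_2",
    "bf16": True,
    "gradient_checkpointing": True,
    "output_dir": "output/mntp/Swallow-7b-hf",  # assumed path
}

os.makedirs("train_configs/mntp", exist_ok=True)
with open("train_configs/mntp/Swallow.json", "w") as f:
    json.dump(mntp_config, f, indent=2, ensure_ascii=False)

# Training would then be launched with the LLM2Vec script, e.g.:
#   accelerate launch experiments/run_mntp.py train_configs/mntp/Swallow.json
```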
## Accelerator Settings
- deepspeed_config:
  - gradient_accumulation_steps: 1
  - gradient_clipping: 1.0
  - offload_optimizer_device: nvme
  - offload_optimizer_nvme_path: /nvme
  - zero3_save_16bit_model: true
  - zero_stage: 2
- distributed_type: DEEPSPEED
- downcast_bf16: 'no'
- dynamo_config:
  - dynamo_backend: INDUCTOR
  - dynamo_mode: default
  - dynamo_use_dynamic: true
  - dynamo_use_fullgraph: true
- enable_cpu_affinity: false
- machine_rank: 0
- main_training_function: main
- mixed_precision: bf16
- num_machines: 1
- num_processes: 2
- rdzv_backend: static
- same_network: true
- use_cpu: false
## Framework versions
- Python: 3.12.3
- PEFT: 0.11.1
- Sentence Transformers: 3.0.1
- Transformers: 4.41.0
- PyTorch: 2.3.0
- Accelerate: 0.30.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1
- MTEB: 1.13.0