---
base_model: tokyotech-llm/Swallow-7b-hf
library_name: peft
license: apache-2.0
datasets:
- wikimedia/wikipedia
language:
- ja
- en
---

# Model Info

This model applies LLM2Vec to Swallow; only the PEFT adapter is distributed. LLM2Vec fine-tunes on two tasks, MNTP and SimCSE, but this repository contains only the result of the MNTP task.

## Model Details

### Model Description

- **Model type:** PEFT
- **Language(s) (NLP):** Japanese, English
- **License:** Apache 2.0
- **Finetuned from model:** [Swallow-7b-hf](https://huggingface.co/tokyotech-llm/Swallow-7b-hf)

### Model Sources

- **Repository:**  https://github.com/McGill-NLP/llm2vec
- **Paper:** https://arxiv.org/abs/2404.05961

# Usage

- Please see the [original LLM2Vec repo](https://huggingface.co/McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp#usage) for detailed usage instructions; a sketch adapted to this adapter follows.
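
The sketch below adapts the linked usage pattern to this adapter. It assumes the `llm2vec` library is installed and that its `LLM2Vec.from_pretrained` helper is used to combine a base model with a separate PEFT adapter, as in the example linked above; `"<this-adapter-repo>"` is a placeholder for this repository's id, not a real path.

```python
# Minimal sketch, not an official example. Assumes the llm2vec library
# (https://github.com/McGill-NLP/llm2vec) is installed; "<this-adapter-repo>"
# is a placeholder for this repository's id.
import torch
from llm2vec import LLM2Vec

l2v = LLM2Vec.from_pretrained(
    "tokyotech-llm/Swallow-7b-hf",                   # base model this adapter was trained from
    peft_model_name_or_path="<this-adapter-repo>",   # placeholder: this MNTP adapter
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)

# Encode a couple of sentences (Japanese and English) into embeddings.
embeddings = l2v.encode([
    "今日は良い天気です。",
    "LLM2Vec turns a decoder-only LLM into a text encoder.",
])
print(embeddings.shape)
```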

# Training Details

## Training Data

- [Wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia)


## Training Hyperparameters
- batch_size: 64
- gradient_accumulation_steps: 1
- max_seq_length: 512
- mask_token_type: "blank"
- mlm_probability: 0.2
- lora_r: 16
- torch_dtype: "bfloat16"
- attn_implementation: "flash_attention_2"
- bf16: true
- gradient_checkpointing: true
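
These values correspond to the MNTP training script in the llm2vec repository. The sketch below arranges them in the shape of a JSON config for `experiments/run_mntp.py`; the field names follow llm2vec's published example configs and may differ between versions, and the dataset config name, output directory, and launch command are assumptions rather than values taken from this card.

```python
# Minimal sketch, assuming llm2vec's experiments/run_mntp.py reads a JSON config file.
# Field names follow llm2vec's example MNTP configs; placeholders are marked below.
import json

mntp_config = {
    "model_name_or_path": "tokyotech-llm/Swallow-7b-hf",
    "dataset_name": "wikimedia/wikipedia",
    "dataset_config_name": "<wikipedia-dump-name>",   # placeholder; the card does not name the dump
    "per_device_train_batch_size": 64,                # the card lists batch_size: 64 (per-device vs. global not stated)
    "gradient_accumulation_steps": 1,
    "max_seq_length": 512,
    "mask_token_type": "blank",
    "mlm_probability": 0.2,
    "lora_r": 16,
    "torch_dtype": "bfloat16",
    "attn_implementation": "flash_attention_2",
    "bf16": True,
    "gradient_checkpointing": True,
    "output_dir": "output/mntp/Swallow-7b-hf",        # placeholder
}

with open("mntp_swallow.json", "w") as f:
    json.dump(mntp_config, f, indent=2)

# Launched (as an assumption) with the DeepSpeed accelerate config described in the next section:
#   accelerate launch --config_file accelerate_config.yaml experiments/run_mntp.py mntp_swallow.json
```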

## Accelerator Settings
- deepspeed_config:
  - gradient_accumulation_steps: 1
  - gradient_clipping: 1.0
  - offload_optimizer_device: nvme
  - offload_optimizer_nvme_path: /nvme
  - zero3_save_16bit_model: true
  - zero_stage: 2 
- distributed_type: DEEPSPEED
- downcast_bf16: 'no'
- dynamo_config:
  - dynamo_backend: INDUCTOR
  - dynamo_mode: default
  - dynamo_use_dynamic: true
  - dynamo_use_fullgraph: true
- enable_cpu_affinity: false
- machine_rank: 0
- main_training_function: main
- mixed_precision: bf16
- num_machines: 1
- num_processes: 2
- rdzv_backend: static
- same_network: true
- use_cpu: false


## Framework versions

- Python: 3.12.3
- PEFT: 0.11.1
- Sentence Transformers: 3.0.1
- Transformers: 4.41.0
- PyTorch: 2.3.0
- Accelerate: 0.30.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1
- MTEB: 1.13.0