Update README.md

README.md (changed)

This is a model that applies LLM2Vec to Swallow. Only the PEFT Adapter is distributed.

- **Repository:** https://github.com/McGill-NLP/llm2vec
- **Paper:** https://arxiv.org/abs/2404.05961

# Usage

- Please see [original LLM2Vec repo](https://huggingface.co/McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp#usage)
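
For convenience, a minimal sketch of that usage pattern with the `llm2vec` library is shown below. It follows the loading code documented in the LLM2Vec repository; the base-model and adapter identifiers are placeholders (assumptions), not the exact repository names for this adapter.

```python
# Minimal sketch following the LLM2Vec repo's documented usage.
# NOTE: both model identifiers below are placeholders, not verified names.
import torch
from llm2vec import LLM2Vec

l2v = LLM2Vec.from_pretrained(
    "tokyotech-llm/Swallow-7b-hf",                  # assumed Swallow base model
    peft_model_name_or_path="<this-peft-adapter>",  # this repo's PEFT adapter id
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)

# Encode sentences into embeddings (plain strings; LLM2Vec also supports
# [instruction, text] pairs for instruction-style retrieval).
embeddings = l2v.encode(["これはテスト文です。", "This is a test sentence."])
print(embeddings.shape)
```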

# Training Details

## Training Data

- [Wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia)
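
As a hedged illustration, the dataset can be pulled from the Hub with the `datasets` library as sketched below; the dump date and language configuration are assumptions, since the card does not state which subsets were used.

```python
# Sketch only: the "20231101.ja" config (dump date + language) is an assumption;
# the card does not specify which Wikipedia dump(s) or language(s) were used.
from datasets import load_dataset

wiki = load_dataset("wikimedia/wikipedia", "20231101.ja", split="train")
print(len(wiki), wiki[0]["title"])
```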

## Training Hyperparameters

- batch_size: 64
- gradient_accumulation_steps: 1
- max_seq_length: 512
- …
- bf16: true
- gradient_checkpointing: true
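
For orientation only, the listed values map roughly onto Hugging Face `TrainingArguments` (which the LLM2Vec training scripts build on), as sketched below; this is an illustration of the mapping, not the exact configuration used to train this adapter.

```python
# Illustrative mapping of the values above onto transformers.TrainingArguments.
# Whether batch_size is per-device or global is not stated; per-device is assumed.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="output",                # placeholder
    per_device_train_batch_size=64,     # batch_size: 64 (assumed per-device)
    gradient_accumulation_steps=1,      # gradient_accumulation_steps: 1
    bf16=True,                          # bf16: true
    gradient_checkpointing=True,        # gradient_checkpointing: true
)
# max_seq_length: 512 is normally applied at tokenization / data-collation time,
# not through TrainingArguments.
```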

## Accelerator Settings

- deepspeed_config:
  - gradient_accumulation_steps: 1
  - gradient_clipping: 1.0
- …
- use_cpu: false
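
The same DeepSpeed-related values can also be expressed programmatically with the `accelerate` library's `DeepSpeedPlugin`, as in the hedged sketch below (assuming DeepSpeed is installed and the script is started with `accelerate launch`); settings not shown on the card are left at their defaults.

```python
# Sketch: only the settings listed above are set explicitly; everything else
# (ZeRO stage, offloading, mixed precision, ...) is left at Accelerate defaults.
from accelerate import Accelerator, DeepSpeedPlugin

ds_plugin = DeepSpeedPlugin(
    gradient_accumulation_steps=1,  # deepspeed_config.gradient_accumulation_steps
    gradient_clipping=1.0,          # deepspeed_config.gradient_clipping
)
accelerator = Accelerator(deepspeed_plugin=ds_plugin, cpu=False)  # use_cpu: false
```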

## Framework versions

- Python: 3.12.3
- PEFT: 0.11.1