Update README.md

README.md (changed)

This is a model that applies LLM2Vec to Swallow. Only the PEFT Adapter is distributed.

- **Repository:** https://github.com/McGill-NLP/llm2vec
- **Paper:** https://arxiv.org/abs/2404.05961

# Usage

- Please see [original LLM2Vec repo](https://huggingface.co/McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp#usage)
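
For convenience, a minimal sketch of that usage pattern with the `llm2vec` library is shown below. It follows the loading code documented in the LLM2Vec repository; the base-model and adapter identifiers are placeholders (assumptions), not the exact repository names for this adapter.

```python
# Minimal sketch following the LLM2Vec repo's documented usage.
# NOTE: both model identifiers below are placeholders, not verified names.
import torch
from llm2vec import LLM2Vec

l2v = LLM2Vec.from_pretrained(
    "tokyotech-llm/Swallow-7b-hf",                  # assumed Swallow base model
    peft_model_name_or_path="<this-peft-adapter>",  # this repo's PEFT adapter id
    device_map="cuda" if torch.cuda.is_available() else "cpu",
    torch_dtype=torch.bfloat16,
)

# Encode sentences into embeddings (plain strings; LLM2Vec also supports
# [instruction, text] pairs for instruction-style retrieval).
embeddings = l2v.encode(["これはテスト文です。", "This is a test sentence."])
print(embeddings.shape)
```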

# Training Details

## Training Data

- [Wikipedia](https://huggingface.co/datasets/wikimedia/wikipedia)
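
As a hedged illustration, the dataset can be pulled from the Hub with the `datasets` library as sketched below; the dump date and language configuration are assumptions, since the card does not state which subsets were used.

```python
# Sketch only: the "20231101.ja" config (dump date + language) is an assumption;
# the card does not specify which Wikipedia dump(s) or language(s) were used.
from datasets import load_dataset

wiki = load_dataset("wikimedia/wikipedia", "20231101.ja", split="train")
print(len(wiki), wiki[0]["title"])
```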

## Training Hyperparameters

- batch_size: 64
- gradient_accumulation_steps: 1
- max_seq_length: 512
- …
- bf16: true
- gradient_checkpointing: true
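
For orientation only, the listed values map roughly onto Hugging Face `TrainingArguments` (which the LLM2Vec training scripts build on), as sketched below; this is an illustration of the mapping, not the exact configuration used to train this adapter.

```python
# Illustrative mapping of the values above onto transformers.TrainingArguments.
# Whether batch_size is per-device or global is not stated; per-device is assumed.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="output",                # placeholder
    per_device_train_batch_size=64,     # batch_size: 64 (assumed per-device)
    gradient_accumulation_steps=1,      # gradient_accumulation_steps: 1
    bf16=True,                          # bf16: true
    gradient_checkpointing=True,        # gradient_checkpointing: true
)
# max_seq_length: 512 is normally applied at tokenization / data-collation time,
# not through TrainingArguments.
```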

## Accelerator Settings

- deepspeed_config:
  - gradient_accumulation_steps: 1
  - gradient_clipping: 1.0
- …
- use_cpu: false
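
The same DeepSpeed-related values can also be expressed programmatically with the `accelerate` library's `DeepSpeedPlugin`, as in the hedged sketch below (assuming DeepSpeed is installed and the script is started with `accelerate launch`); settings not shown on the card are left at their defaults.

```python
# Sketch: only the settings listed above are set explicitly; everything else
# (ZeRO stage, offloading, mixed precision, ...) is left at Accelerate defaults.
from accelerate import Accelerator, DeepSpeedPlugin

ds_plugin = DeepSpeedPlugin(
    gradient_accumulation_steps=1,  # deepspeed_config.gradient_accumulation_steps
    gradient_clipping=1.0,          # deepspeed_config.gradient_clipping
)
accelerator = Accelerator(deepspeed_plugin=ds_plugin, cpu=False)  # use_cpu: false
```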

## Framework versions

- Python: 3.12.3
- PEFT: 0.11.1