Update README.md
README.md CHANGED
@@ -13,18 +13,13 @@ license: apache-2.0
 This model is a fine-tuned version of the `CohereForAI/aya-23-8B` base model. It has been fine-tuned using a private dataset of prompt-response pairs that has been curated over the past two years. The fine-tuning process aimed to improve the model's ability to generate relevant and accurate responses in various conversational contexts.
 
 - **Developed by:** Franck Stéphane NDZOMGA
-- **Funded by [optional]:**
+- **Funded by [optional]:** FS NDZOMGA
 - **Shared by [optional]:** Franck Stéphane NDZOMGA
 - **Model type:** Causal Language Model with LoRA Adapters
-- **Language(s) (NLP):**
+- **Language(s) (NLP):** Arabic, Chinese (simplified & traditional), Czech, Dutch, English, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian, and Vietnamese
 - **License:** Apache-2.0
 - **Finetuned from model:** CohereForAI/aya-23-8B
 
-### Model Sources [optional]
-
-- **Repository:** [Include the repository link here if publicly available]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
 
 ## Uses
 
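For context on what this hunk describes, here is a minimal loading sketch using `transformers` and `peft`; the adapter repository id is a placeholder, since the card does not say where the fine-tuned LoRA weights are published.

```python
# Minimal sketch (untested): load the aya-23-8B base model and attach the LoRA adapters.
# "your-username/aya-23-8b-lora" is a placeholder; substitute the actual adapter repo or local path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "CohereForAI/aya-23-8B"
adapter_id = "your-username/aya-23-8b-lora"  # hypothetical location of the fine-tuned adapters

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)  # wraps the base model with the LoRA weights

# Aya 23 is a chat model, so the tokenizer's chat template is the safest way to format prompts.
messages = [{"role": "user", "content": "Bonjour, peux-tu résumer ce modèle ?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(base_model.device)
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```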
@@ -87,7 +82,7 @@ The model was fine-tuned using a private dataset of prompt-response pairs curate
 #### Training Hyperparameters
 
 - **Precision:** Mixed precision (fp16)
-- **Number of epochs:**
+- **Number of epochs:** 1
 - **Batch size:** 1 (gradient accumulation steps: 16 to handle memory issues)
 - **Learning rate:** 5e-5
 - **Warmup steps:** 100
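The hyperparameters in this hunk map directly onto Hugging Face `TrainingArguments`; the sketch below is an illustrative reconstruction rather than the author's training script, and the output directory and logging interval are assumptions.

```python
# Illustrative reconstruction of the hyperparameters listed in the card (not the original script).
# Note the card is ambiguous about precision: "Mixed precision (fp16)" here, "fp16=False" further down,
# so fp16 is left as an explicit toggle.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./aya-23-8b-lora-finetune",   # hypothetical output path
    num_train_epochs=1,                       # "Number of epochs: 1"
    per_device_train_batch_size=1,            # "Batch size: 1"
    gradient_accumulation_steps=16,           # effective batch of 16 to handle memory issues
    learning_rate=5e-5,
    warmup_steps=100,
    fp16=False,                               # set True if training in fp16 mixed precision
    remove_unused_columns=False,              # keeps custom dataset columns intact (see next hunk)
    logging_steps=10,                         # assumption: not stated in the card
)
```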
@@ -99,13 +94,6 @@ The model was fine-tuned using a private dataset of prompt-response pairs curate
 - **Remove unused columns:** False
 - **Mixed Precision:** Disabled (fp16=False) to avoid conflicts
 
-### Speeds, Sizes, Times [optional]
-
-- **Training started:** [Date]
-- **Training completed:** [Date]
-- **Average training speed:** [Specify if available]
-- **Model size:** [Specify if available]
-
 ### Additional Information from Training Code
 
 - The training utilized the PEFT (Parameter Efficient Fine-Tuning) library, specifically leveraging the LoRA (Low-Rank Adaptation) method to fine-tune the `CohereForAI/aya-23-8B` model.
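The card states only that PEFT with LoRA was used and does not give the adapter configuration, so the sketch below shows a typical LoRA setup for a causal LM; the rank, alpha, dropout, and target modules are common defaults, not values from the card.

```python
# Hedged sketch: typical LoRA setup with peft for a causal LM such as aya-23-8B.
# r, lora_alpha, lora_dropout and target_modules are illustrative defaults, not values from the card.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("CohereForAI/aya-23-8B")

lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,                               # adapter rank (assumed)
    lora_alpha=32,                      # scaling factor (assumed)
    lora_dropout=0.05,                  # regularization on the adapter layers (assumed)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections (assumed)
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # LoRA trains only a small fraction of the 8B parameters
```

The PEFT-wrapped model would then be passed to a `Trainer` together with the `TrainingArguments` sketched above.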