hungeni committed on
Commit
3608f10
1 Parent(s): c3e3e7f

Update README.md

Files changed (1)
  1. README.md +6 -2
README.md CHANGED
@@ -1,7 +1,11 @@
  ---
  datasets:
- - QingyiSi/Alpaca-CoT
  - tatsu-lab/alpaca
+ - ewof/alpaca-instruct-unfiltered
+ - databricks/databricks-dolly-15k
+ - teknium/GPTeacher-General-Instruct
+ - garage-bAInd/Open-Platypus
+ - Honkware/oasst1-alpaca-json
  - GAIR/lima
  language:
  - vi
@@ -9,6 +13,6 @@ language:

  + LLaMa2-7B Chat model, with the vocab size extended to 44800 for Vietnamese understanding.
  + Continual pre-training with 2B Vietnamese tokens aligned from the VnNews Corpus, 10K vnthuquan books, and wikipedia_vi.
- + Fine-tuning with the vietllama2-tiny dataset, a combination of [Alpaca, CoT, LIMA, daily chat] translated into Vietnamese using OpenAI GPT-3.
+ + Fine-tuning with the vietllama2-tiny dataset, a combination of various datasets translated into Vietnamese using OpenAI GPT-3.

  + For more information: email me at duyhunghd6@gmail.com | http://fb.com/hungbui2013
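A minimal sketch (not part of this commit) of how the README's extended 44800-token vocabulary could be checked with the Hugging Face transformers library; the repository id below is a placeholder, since the diff does not name the model repo.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id for illustration only; the actual repo is not stated in the diff.
repo_id = "hungeni/llama2-7b-chat-vi"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# The README states the vocabulary was extended to 44800 for Vietnamese;
# the tokenizer and the model's input embedding matrix should agree on that size.
print(len(tokenizer))                             # expected: 44800 per the README
print(model.get_input_embeddings().weight.shape)  # (vocab_size, hidden_dim)
```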