---
license: openrail
language:
- id
---

# Domba: Indonesian Instruct-LLaMA

## Usage

See the GitHub repo for the code: https://github.com/22-hours/cabrita

```python
from peft import PeftModel
from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig

# Load the base LLaMA-7B tokenizer and weights in 8-bit precision
tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",
)

# Attach the Domba LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, "bookbot/domba-lora-v0-1")
```
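
As a minimal inference sketch (not part of the original snippet above), assuming the Alpaca-style prompt template used by Alpaca-LoRA finetunes and a CUDA GPU, generation with the loaded model could look like this:

```python
# Hypothetical sketch: the prompt template and generation settings below are
# assumptions based on typical Alpaca-LoRA usage, not confirmed Domba defaults.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nBerikan alasan kreatif untuk kenapa saya tidak harus pergi ke pesta.\n\n"
    "### Response:\n"
)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

generation_config = GenerationConfig(temperature=0.7, top_p=0.75, num_beams=4)
output = model.generate(
    input_ids=input_ids,
    generation_config=generation_config,
    max_new_tokens=128,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```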

## Data

We translated [alpaca_data.json](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json) to Indonesian using [NLLB-200 Distilled 600M](https://huggingface.co/facebook/nllb-200-distilled-600M).

If you want to know more about how the dataset was built, see [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca).
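
As an illustrative sketch only (not the exact script we used), translating a single Alpaca field with the NLLB-200 checkpoint through the `transformers` translation pipeline might look like this, using NLLB's FLORES-200 language codes `eng_Latn` and `ind_Latn`:

```python
from transformers import pipeline

# Illustrative only: translate one Alpaca field from English to Indonesian.
translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="eng_Latn",
    tgt_lang="ind_Latn",
)
text = "Give a creative reason why I should not go to the party."
print(translator(text)[0]["translation_text"])
```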

## Finetuning

To finetune the LLaMA model, we used the code available in [Alpaca Lora](https://github.com/tloen/alpaca-lora), which finetunes LLaMA using Hugging Face's PEFT library. With this, we ran our finetuning step on a single A100 on Google Cloud Engine on top of LLaMA-7B. We trained for 4 hours and found the results quite impressive given that little training time. The notebook we used is available [here](notebooks/train_lora.ipynb).
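
For orientation, here is a minimal sketch of how a LoRA adapter is attached with PEFT before training; the hyperparameters shown are the common Alpaca-LoRA defaults and are assumptions, not necessarily the exact values used for Domba:

```python
from peft import LoraConfig, get_peft_model
from transformers import LlamaForCausalLM

# Load a fresh base model to train (illustrative; Domba starts from LLaMA-7B).
base_model = LlamaForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")

# LoRA configuration in the style of Alpaca-LoRA; these values are assumptions.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
lora_model = get_peft_model(base_model, lora_config)
lora_model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```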

## Example outputs

Stanford Alpaca:
```
Instruction: Berikan alasan kreatif untuk kenapa saya tidak harus pergi ke pesta.
Response: I don't have to go to the festival because I have a creative reason to stay home.
```

Domba:
```
Instruksi: Berikan alasan kreatif untuk kenapa saya tidak harus pergi ke pesta.
Jawaban: Saya tidak pergi ke pesta karena saya ingin menghabiskan waktu saya dengan keluarga saya.
```

*(Instruction: "Give a creative reason why I don't have to go to the party." Domba's response: "I'm not going to the party because I want to spend my time with my family.")*

Stanford Alpaca:
```
Instruction: Bagaimana saya bisa memulai karir sebagai ilmuwan data? Tulis dalam bentuk daftar.
Response: How can I start a career as a data scientist? Write a list.
```

*Note: yes, Alpaca's response is just a translation of the instruction into English.*

Domba:
```
Instruksi: Bagaimana saya bisa memulai karir sebagai ilmuwan data? Tulis dalam bentuk daftar.
Jawaban: Untuk memulai karir sebagai ilmuwan data, Anda harus memiliki pengetahuan dan keterampilan yang berkelanjutan tentang teknologi dan analisis data. Anda juga harus memiliki keterampilan pemecahan masalah dan komunikasi yang baik, serta kemampuan untuk membangun hubungan dengan orang lain. Selain itu, Anda harus memiliki keterampilan untuk menganalisis dan menggambarkan data, serta kemampuan untuk membuat keputusan yang baik berdasarkan data.
```

*(Instruction: "How can I start a career as a data scientist? Write it as a list." Domba's response: "To start a career as a data scientist, you must have continually developing knowledge and skills in technology and data analysis. You must also have good problem-solving and communication skills, as well as the ability to build relationships with other people. In addition, you must be able to analyze and describe data, and to make good decisions based on data.")*

You can test it using the eval notebook [here](https://colab.research.google.com/github/22-hours/cabrita/blob/main/notebooks/cabrita-lora.ipynb).

## References

> If I have seen further it is by standing on the sholders [sic] of Giants.
> -- <cite>Isaac Newton</cite>

We open this section with this quotation because everything we did was only possible thanks to the strong community and the work that other people and groups have shared. Our work relies mainly on the work developed by [LLaMA](https://ai.facebook.com/blog/large-language-model-llama-meta-ai/), [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca), [Alpaca Lora](https://github.com/tloen/alpaca-lora), [ChatGPT](https://openai.com/blog/chatgpt), and [Hugging Face](https://huggingface.co/). Thank you all for the great work and for opening it up to the world!