itsrocchi committed · Commit df75952 · Parent: 9687a4e

Update README.md

Files changed (1): README.md (+41 -4)

README.md CHANGED
@@ -5,7 +5,7 @@ datasets:
 language:
 - it
 ---
-# Model Card for Model ID
+# Model Card for itsrocchi/SeewebLLM-it-ver2
 
 <!-- Provide a quick summary of what the model is/does. -->
 
@@ -17,7 +17,8 @@ The model is a fine-tuned version of [LLama-2-7b-chat-hf](https://huggingface.co
 <!-- **Developed by:** [More Information Needed]
 - **Shared by [optional]:** [More Information Needed]
 - **Model type:** [More Information Needed] -->
-- **Language(s) (NLP):** Italian
+- **Backbone Model:** [LLama2](https://github.com/facebookresearch/llama/tree/main)
+- **Language(s):** Italian
 - **Finetuned from model:** [LLama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
 
 <!-- ### Model Sources [optional]
@@ -35,9 +36,45 @@ The model is a fine-tuned version of [LLama-2-7b-chat-hf](https://huggingface.co
 
 Due to a lack of training, the model may not produce 100% correct output sentences.
 
-### Training Data
+### Training script
+
+The following repository contains the scripts and instructions used for fine-tuning and testing:
+
+**[https://github.com/itsrocchi/finetuning-llama2-ita.git](https://github.com/itsrocchi/finetuning-llama2-ita.git)**
+
+### Inference
+
+Here is a short Python snippet to perform inference:
+
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
+
+tokenizer = AutoTokenizer.from_pretrained("itsrocchi/SeewebLLM-it-ver2")
+model = AutoModelForCausalLM.from_pretrained(
+    "itsrocchi/SeewebLLM-it-ver2",
+    device_map="auto",
+    torch_dtype=torch.float16,
+    load_in_8bit=True,
+    rope_scaling={"type": "dynamic", "factor": 2}
+)
+
+# The model and tokenizer identifiers can be replaced with the
+# absolute path of a local copy of the model directory.
+
+prompt = "### User:\nDescrivi cos' è l'intelligenza artificiale\n\n### Assistant:\n"
+# Edit the text between "### User:" and "### Assistant:" to customize the prompt.
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
+
+# Use a finite generation budget; float('inf') is not a valid max_new_tokens value.
+output = model.generate(**inputs, streamer=streamer, use_cache=True, max_new_tokens=512)
+output_text = tokenizer.decode(output[0], skip_special_tokens=True)
+```
+
+### Training Data and Details
 
 <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
-The dataset used is [itsrocchi/seeweb-it-292-forLLM](https://huggingface.co/datasets/itsrocchi/seeweb-it-292-forLLM), a dataset containing approx. 300 italian prompt-answer conversations.
+The dataset used is [itsrocchi/seeweb-it-292-forLLM](https://huggingface.co/datasets/itsrocchi/seeweb-it-292-forLLM), a dataset containing approx. 300 Italian prompt-answer conversations.
+
+Training was performed on an NVIDIA RTX A6000 inside [Seeweb's Cloud Server GPU](https://www.seeweb.it/prodotti/cloud-server-gpu).
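The inference snippet relies on a plain-text chat convention (`### User:` / `### Assistant:` blocks) rather than a tokenizer chat template, so prompt construction and reply extraction can be done with ordinary string handling. A minimal sketch, assuming that convention; the helper names below are illustrative and not part of the repository:

```python
# Plain-text chat format used by the inference snippet: a "### User:"
# block followed by an empty "### Assistant:" block the model completes.
USER_TAG = "### User:\n"
ASSISTANT_TAG = "### Assistant:\n"

def build_prompt(user_message: str) -> str:
    """Wrap a user message in the model's plain-text chat format."""
    return f"{USER_TAG}{user_message}\n\n{ASSISTANT_TAG}"

def extract_reply(full_text: str) -> str:
    """Return only the assistant's part of the decoded output.

    The decoded generation echoes the prompt, so everything up to and
    including the assistant tag is stripped.
    """
    return full_text.split(ASSISTANT_TAG, 1)[-1].strip()

prompt = build_prompt("Descrivi cos'è l'intelligenza artificiale")
# tokenizer.decode(output[0], ...) would return the prompt followed by
# the generated reply; simulate that here with a fixed string:
decoded = prompt + "L'intelligenza artificiale è una disciplina informatica."
reply = extract_reply(decoded)
```

Because `skip_special_tokens=True` already removes BOS/EOS markers, splitting on the assistant tag is enough to isolate the reply.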