swap-uniba/LLaMAntino-2-chat-13b-hf-UltraChat-ITA

@@ -33,7 +33,7 @@ If you are interested in more details regarding the training procedure, you can
 This prompt format, based on the [LLaMA 2 prompt template](https://gpus.llm-utils.org/llama-2-prompt-template/) and adapted to the Italian language, was used:
 ```python
-"<s>[INST] <<SYS>>\n" \
 "Sei un assistente disponibile, rispettoso e onesto. " \
 "Rispondi sempre nel modo piu' utile possibile, pur essendo sicuro. " \
 "Le risposte non devono includere contenuti dannosi, non etici, razzisti, sessisti, tossici, pericolosi o illegali. " \
@@ -41,7 +41,7 @@ This prompt format based on the [LLaMA 2 prompt template](https://gpus.llm-utils
 "Se una domanda non ha senso o non e' coerente con i fatti, spiegane il motivo invece di rispondere in modo non corretto. " \
 "Se non conosci la risposta a una domanda, non condividere informazioni false.\n" \
 "<</SYS>>\n\n" \
-f"{user_msg_1} [/INST] {model_answer_1} </s><s>[INST] {user_msg_2} [/INST] {model_answer_2} </s> ... <s>[INST] {user_msg_N} [/INST] {model_answer_N} </s> "
 ```
 We recommend using the same prompt in inference to obtain the best results!
@@ -60,7 +60,7 @@ model = AutoModelForCausalLM.from_pretrained(model_id)
 user_msg = "Ciao! Come stai?"
-prompt = "<s>[INST] <<SYS>>\n" \
          "Sei un assistente disponibile, rispettoso e onesto. " \
          "Rispondi sempre nel modo piu' utile possibile, pur essendo sicuro. " \
          "Le risposte non devono includere contenuti dannosi, non etici, razzisti, sessisti, tossici, pericolosi o illegali. " \
@@ -68,7 +68,7 @@ prompt = "<s>[INST] <<SYS>>\n" \
          "Se una domanda non ha senso o non e' coerente con i fatti, spiegane il motivo invece di rispondere in modo non corretto. " \
          "Se non conosci la risposta a una domanda, non condividere informazioni false.\n" \
          "<</SYS>>\n\n" \
-         f"{user_msg} [/INST] "
 input_ids = tokenizer(prompt, return_tensors="pt").input_ids
 outputs = model.generate(input_ids=input_ids, max_length=1024)
@@ -76,7 +76,7 @@ outputs = model.generate(input_ids=input_ids, max_length=1024)
 print(tokenizer.batch_decode(outputs.detach().cpu().numpy()[:, input_ids.shape[1]:], skip_special_tokens=True)[0])
 ```
-If you are facing issues when loading the model, you can try to load it quantized:
 ```python
 model = AutoModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)

 This prompt format, based on the [LLaMA 2 prompt template](https://gpus.llm-utils.org/llama-2-prompt-template/) and adapted to the Italian language, was used:
 ```python
+" [INST]<<SYS>>\n" \
 "Sei un assistente disponibile, rispettoso e onesto. " \
 "Rispondi sempre nel modo piu' utile possibile, pur essendo sicuro. " \
 "Le risposte non devono includere contenuti dannosi, non etici, razzisti, sessisti, tossici, pericolosi o illegali. " \
 "Se una domanda non ha senso o non e' coerente con i fatti, spiegane il motivo invece di rispondere in modo non corretto. " \
 "Se non conosci la risposta a una domanda, non condividere informazioni false.\n" \
 "<</SYS>>\n\n" \
+f"{user_msg_1}[/INST] {model_answer_1} </s> <s> [INST]{user_msg_2}[/INST] {model_answer_2} </s> ... <s> [INST]{user_msg_N}[/INST] {model_answer_N} </s>"
 ```
 We recommend using the same prompt in inference to obtain the best results!
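For multi-turn conversations, the updated template above can be assembled programmatically. The following is a minimal sketch, not part of the model card: `build_prompt`, its parameters, and the shortened system prompt in the usage lines are hypothetical names and values introduced for illustration. It assumes the spacing shown in the updated template (no leading `<s>`, `[INST]` directly adjacent to the user message, and ` <s> ` between turns).

```python
def build_prompt(system_prompt, turns, next_user_msg=None):
    """Assemble a prompt string following the updated template.

    system_prompt: text placed between <<SYS>> and <</SYS>> (hypothetical parameter).
    turns: list of (user_msg, model_answer) pairs for completed exchanges.
    next_user_msg: optional pending user message that the model should answer.
    """
    # Header: the template starts with " [INST]<<SYS>>" and ends the system
    # block with a newline, "<</SYS>>", and a blank line.
    prompt = f" [INST]<<SYS>>\n{system_prompt}\n<</SYS>>\n\n"

    messages = list(turns)
    if next_user_msg is not None:
        # A pending message has no answer yet.
        messages.append((next_user_msg, None))

    for i, (user_msg, answer) in enumerate(messages):
        if i > 0:
            # Every turn after the first reopens with " <s> [INST]".
            prompt += " <s> [INST]"
        prompt += f"{user_msg}[/INST] "
        if answer is not None:
            # Completed turns are closed with " </s>".
            prompt += f"{answer} </s>"
    return prompt


# Usage: one completed exchange plus a new question.
# The system prompt is shortened here for illustration only.
prompt = build_prompt(
    "Sei un assistente disponibile, rispettoso e onesto.",
    [("Ciao! Come stai?", "Bene, grazie!")],
    next_user_msg="Puoi aiutarmi?",
)
```

With an empty `turns` list and a single `next_user_msg`, the result has the same shape as the single-turn `prompt` built in the inference example.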
 user_msg = "Ciao! Come stai?"
+prompt = " [INST]<<SYS>>\n" \
          "Sei un assistente disponibile, rispettoso e onesto. " \
          "Rispondi sempre nel modo piu' utile possibile, pur essendo sicuro. " \
          "Le risposte non devono includere contenuti dannosi, non etici, razzisti, sessisti, tossici, pericolosi o illegali. " \
          "Se una domanda non ha senso o non e' coerente con i fatti, spiegane il motivo invece di rispondere in modo non corretto. " \
          "Se non conosci la risposta a una domanda, non condividere informazioni false.\n" \
          "<</SYS>>\n\n" \
+         f"{user_msg}[/INST] "
 input_ids = tokenizer(prompt, return_tensors="pt").input_ids
 outputs = model.generate(input_ids=input_ids, max_length=1024)
 print(tokenizer.batch_decode(outputs.detach().cpu().numpy()[:, input_ids.shape[1]:], skip_special_tokens=True)[0])
 ```
+If you are facing issues when loading the model, you can try to load it **Quantized**:
 ```python
 model = AutoModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)