This model is trained from scratch, exclusively on Italian-language datasets (currently RedPajama 2023-14 it).
Training is ongoing and will extend to new datasets; more accurate versions will be published shortly.
It was trained on my own server; I studied and adapted the model starting from the repository https://github.com/karpathy/llama2.c
- Llama 7B reference parameters:
- max_seq_len (7B = 2048): the maximum sequence length for input data.
- dim (7B = 4096): the dimensionality of the token embeddings.
- n_layers (7B = 32): the number of transformer layers.
- n_heads (7B = 32): the number of attention heads.
- n_kv_heads (7B = 32): the number of key and value heads.
- multiple_of (7B = 256): rounds the SwiGLU hidden layer size up to a multiple of a large power of 2 (see the sketch below).
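In llama2.c the SwiGLU hidden size is derived from `dim` and then rounded up using `multiple_of`, mirroring the upstream `FeedForward` setup. A minimal sketch of that calculation (for this model, dim = 768 and multiple_of = 32 give a hidden size of 2048):

```python
def swiglu_hidden_dim(dim: int, multiple_of: int) -> int:
    # Start from 4*dim (as in a standard MLP), scale by 2/3 to keep the
    # parameter count comparable after SwiGLU adds a third projection,
    # then round up to the nearest multiple of `multiple_of`.
    hidden_dim = 4 * dim
    hidden_dim = int(2 * hidden_dim / 3)
    return multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)

print(swiglu_hidden_dim(768, 32))    # 2048 (this model)
print(swiglu_hidden_dim(4096, 256))  # 11008 (Llama 2 7B)
```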
- This model's parameters (mapped onto llama2.c's `ModelArgs` in the sketch after this list):
- max_seq_len = 1024
- dim = 768
- n_layers = 32
- n_heads = 32
- n_kv_heads = 32
- multiple_of = 32
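These values correspond to fields of the `ModelArgs` dataclass in llama2.c's `model.py`. A minimal sketch of the configuration; the `vocab_size` shown is the llama2.c default and an assumption, as it is not stated above:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelArgs:
    dim: int = 768                  # embedding dimensionality
    n_layers: int = 32              # number of transformer layers
    n_heads: int = 32               # attention heads
    n_kv_heads: Optional[int] = 32  # key/value heads (= n_heads, no GQA)
    multiple_of: int = 32           # SwiGLU hidden-size rounding
    max_seq_len: int = 1024         # context length
    vocab_size: int = 32000         # assumption: llama2.c default

args = ModelArgs()
```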
Training log output for this configuration:
- num decayed parameter tensors: 225, with 251,068,416 parameters
- num non-decayed parameter tensors: 65, with 49,920 parameters
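These counts come from the optimizer setup: following llama2.c (which inherits this from nanoGPT), weight decay is applied only to tensors with two or more dimensions (weight matrices and embeddings), while 1-D tensors such as RMSNorm gains are left undecayed. A minimal sketch of that grouping:

```python
import torch

def build_optimizer(model: torch.nn.Module, weight_decay: float, lr: float):
    params = [p for p in model.parameters() if p.requires_grad]
    # 2-D+ tensors (matmul weights, embeddings) get weight decay;
    # 1-D tensors (norm gains, biases) do not
    decay_params = [p for p in params if p.dim() >= 2]
    nodecay_params = [p for p in params if p.dim() < 2]
    groups = [
        {"params": decay_params, "weight_decay": weight_decay},
        {"params": nodecay_params, "weight_decay": 0.0},
    ]
    return torch.optim.AdamW(groups, lr=lr, betas=(0.9, 0.95))
```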
To use the model, you can run:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model and tokenizer
tokenizer_model = AutoTokenizer.from_pretrained("peruginia/Llama-2-Small")
model = AutoModelForCausalLM.from_pretrained("peruginia/Llama-2-Small")
model.to('cuda')

# Define and tokenize the prompt
prompt = "Alessandro è un ragazzo che progetta Infissi"
inputs = tokenizer_model(prompt, return_tensors="pt").to('cuda')

# Generate text
output = model.generate(**inputs, do_sample=True, max_new_tokens=100,
                        top_k=300, top_p=0.85, temperature=1.0,
                        num_return_sequences=1)

# Decode and print the generated text
generated_text = tokenizer_model.decode(output[0], skip_special_tokens=True)
print(generated_text)
```
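In the call above, `do_sample=True` enables stochastic sampling, `top_k=300` and `top_p=0.85` restrict the candidate token pool, and `temperature=1.0` leaves the logits unscaled. For more repeatable output you can switch to greedy decoding; an illustrative alternative, not a tuned recommendation:

```python
# Greedy decoding: deterministic, always picks the most likely next token
output = model.generate(**inputs, do_sample=False, max_new_tokens=100)
print(tokenizer_model.decode(output[0], skip_special_tokens=True))
```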