PULI-HuBA 130M

PULI-HuBA 130M is a monolingual Hungarian foundation model based on the Mamba architecture (https://huggingface.co/state-spaces/mamba-130m-hf).

Parameters:

MambaForCausalLM(
  (backbone): MambaModel(
    (embeddings): Embedding(52000, 768)
    (layers): ModuleList(
      (0-23): 24 x MambaBlock(
        (norm): MambaRMSNorm(768, eps=1e-05)
        (mixer): MambaMixer(
          (conv1d): Conv1d(1536, 1536, kernel_size=(4,), stride=(1,), padding=(3,), groups=1536)
          (act): SiLU()
          (in_proj): Linear(in_features=768, out_features=3072, bias=False)
          (x_proj): Linear(in_features=1536, out_features=80, bias=False)
          (dt_proj): Linear(in_features=48, out_features=1536, bias=True)
          (out_proj): Linear(in_features=1536, out_features=768, bias=False)
        )
      )
    )
    (norm_f): MambaRMSNorm(768, eps=1e-05)
  )
  (lm_head): Linear(in_features=768, out_features=52000, bias=False)
)
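As a rough sanity check, the shapes in the module listing account for the stated 130M parameters. The sketch below assumes tied embedding/lm_head weights and ignores the small SSM state parameters (A, D) that a printed module tree omits; it is an estimate, not an official count.

```python
# Back-of-the-envelope parameter count from the module listing above.
# Assumption: input embeddings and lm_head share weights (tied), and the
# SSM state parameters (A, D), which the module printout omits, are ignored.
vocab, d_model, d_inner = 52000, 768, 1536
dt_rank, n_layers, kernel = 48, 24, 4

embedding = vocab * d_model                      # Embedding(52000, 768)
per_block = (
    d_model                                      # MambaRMSNorm weight
    + d_inner * kernel + d_inner                 # depthwise conv1d weight + bias
    + d_model * 2 * d_inner                      # in_proj (768 -> 3072)
    + d_inner * 80                               # x_proj (1536 -> 80)
    + dt_rank * d_inner + d_inner                # dt_proj weight + bias
    + d_inner * d_model                          # out_proj (1536 -> 768)
)
total = embedding + n_layers * per_block + d_model  # + final norm_f
print(f"{total:,}")  # prints 129,829,632 — roughly 130M
```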

Training Data (Pretraining)

The model was trained on a ~3.48B-token, toxic-filtered, deduplicated, and semantically segmented dataset.

Training Details

License: Apache 2.0
Hardware: 4 × NVIDIA A100 (80GB) GPUs
Year of training: 2024
Input/output: Text only
Parameter count: 130 million
Available model size: Single variant
Data type: float32
Batch size: 10 per GPU
Learning rate: 3e-4 (reference: GitHub issue)
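The details above imply an effective batch of 40 sequences per optimizer step (10 per GPU across 4 GPUs, assuming no gradient accumulation). A hypothetical sketch of the configuration; the key names are illustrative, not taken from the authors' training script:

```python
# Hypothetical training configuration reconstructed from the details above.
# Key names are illustrative; the authors' actual setup may differ.
train_config = {
    "learning_rate": 3e-4,
    "per_device_train_batch_size": 10,
    "num_gpus": 4,                  # 4 x NVIDIA A100 (80GB)
    "dtype": "float32",
}

# Effective batch size per optimizer step, assuming no gradient accumulation.
effective_batch = train_config["per_device_train_batch_size"] * train_config["num_gpus"]
print(effective_batch)  # prints 40
```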

Ethical Considerations

Concerns:

Potential for biased, incorrect, or harmful content generation.

Usage Example

To generate text using this model with Hugging Face's pipeline, use the following Python code:

from transformers import pipeline

# Load the model
model_name = "NYTK/PULI-HuBA130M"

# Initialize the text generation pipeline
generator = pipeline("text-generation", model=model_name)

# Generate text with recommended parameters
output = generator(
    "Az a tény, hogy anyanyelvem magyar, és magyarul beszélek, gondolkozom, írok, életem legnagyobb eseménye, melyhez nincs fogható.",  # Example prompt in Hungarian: "The fact that my mother tongue is Hungarian, and that I speak, think, and write in Hungarian, is the greatest event of my life, to which nothing compares."
    max_length=156,
    do_sample=True,
    repetition_penalty=1.35,
    temperature=0.2,
    top_k=100,
    top_p=0.99,
    truncation=True
)

# Print the generated text
print(output[0]["generated_text"])
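The sampling parameters above shape the next-token distribution: a low temperature (0.2) sharpens it, while top_k and top_p restrict which tokens survive. A minimal pure-Python sketch of how temperature scaling and top-k filtering act on toy logits; this illustrates the idea only and is not the transformers implementation:

```python
import math

def temperature_top_k(logits, temperature=0.2, top_k=2):
    """Illustrative sketch: temperature scaling followed by top-k filtering."""
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [l / temperature for l in logits]
    # Keep only the top_k highest logits; mask the rest to -inf.
    threshold = sorted(scaled, reverse=True)[top_k - 1]
    filtered = [l if l >= threshold else float("-inf") for l in scaled]
    # Softmax over the surviving logits (masked entries get probability 0).
    m = max(filtered)
    exps = [math.exp(l - m) for l in filtered]
    total = sum(exps)
    return [e / total for e in exps]

probs = temperature_top_k([2.0, 1.0, 0.5, -1.0], temperature=1.0, top_k=2)
# Only the two highest-scoring tokens keep nonzero probability.
```

With temperature below 1.0, as in the pipeline call above, the gap between the surviving probabilities widens further, making generation more conservative.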

Contact

If you have any questions, please contact me at madarasz.gabor@nytud.hu-ren.hu or gabor.madarasz@gmail.com.
