Nevo-ra

A Lightweight, High-Performance 100M Custom Language Model

Hugging Face Repo | Training Graph | Google Colab


Nevo-ra (0.1B Parameter Base Model)

[Figure: Nevo-ra training performance]

Model Performance & Metrics

1. Training Convergence & Loss Value

  • Initial Loss: ~8.5
  • Final Loss: 3.8
  • Training Steps: 45,000+

During pre-training on the ag_news dataset, Nevo-ra's training loss fell sharply from about 8.5 and then leveled off at 3.8, showing that the model learned to predict the next token with steadily increasing accuracy.
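To put those loss values in more intuitive terms, they can be converted to perplexity. This is a minimal sketch that assumes the reported loss is the mean per-token cross-entropy measured in nats (the usual default in training frameworks, though the model card does not state it). The helper name `perplexity` is illustrative only.

```python
import math


def perplexity(loss_nats: float) -> float:
    """Convert a mean per-token cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss_nats)


# Values taken from the metrics above; the unit (nats) is an assumption.
initial_loss = 8.5  # approximate starting loss
final_loss = 3.8    # reported final loss

initial_ppl = perplexity(initial_loss)
final_ppl = perplexity(final_loss)

print(f"initial perplexity: {initial_ppl:,.0f}")
print(f"final perplexity:   {final_ppl:.1f}")
```

Under that assumption, the model goes from being roughly as uncertain as a uniform choice among about 4,900 tokens to a choice among about 45, which is a more tangible reading of the drop from 8.5 to 3.8.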

2. Learning Behavior (Curve Analysis)

  • Smooth Convergence: The training graph reflects a consistent, steady descent across all 45,000+ steps without any erratic spikes.
  • No Evident Overfitting: The stable decline is consistent with the model learning linguistic patterns rather than memorizing the training data, although only a loss measured on held-out data could confirm this.
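A claim such as "no erratic spikes" can be checked programmatically against logged loss values. The sketch below flags any step whose loss jumps well above the average of the preceding few points. The function name `find_spikes`, the window size, the tolerance, and both loss sequences are hypothetical and chosen purely for illustration; they are not Nevo-ra's actual training logs.

```python
from statistics import mean


def find_spikes(losses, window=5, tolerance=0.5):
    """Return indices where the loss exceeds the mean of the
    preceding `window` points by more than `tolerance`."""
    spikes = []
    for i in range(1, len(losses)):
        start = max(0, i - window)
        baseline = mean(losses[start:i])
        if losses[i] - baseline > tolerance:
            spikes.append(i)
    return spikes


# Hypothetical logged losses (illustrative only, not real training data).
smooth_run = [8.5, 7.1, 6.0, 5.2, 4.7, 4.3, 4.1, 3.9, 3.8, 3.8]
spiky_run = [8.5, 7.1, 6.0, 5.2, 7.9, 4.3, 4.1, 3.9, 3.8, 3.8]

smooth_spikes = find_spikes(smooth_run)
spiky_spikes = find_spikes(spiky_run)

print("smooth run spikes:", smooth_spikes)
print("spiky run spikes: ", spiky_spikes)
```

A check like this only covers the stability of the training curve. Ruling out overfitting would additionally need the same kind of log for a held-out validation set.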

Current Capabilities & Limitations

As a pre-trained base model with a final loss of 3.8, Nevo-ra has the following capabilities and limitations:

Current Capabilities

  • Sentence Structure: Generates text that follows basic English grammar, syntax, and sentence structure.
  • Contextual Knowledge: Retains foundational knowledge regarding technology, business, and internet infrastructure due to its training data blend.

Limitations & Expected Artifacts

  • HTML/Text Artifacts: Because it was trained on raw web-scraped data, the model may occasionally output web artifacts such as <b>...</b> tags or stray HTML entity fragments for quotation marks (quot;).
  • Repetition: Due to its lightweight 0.1B size, it can fall into word loops or repetitive phrasing near the end of long text generations if not managed by inference parameters (such as repetition_penalty).
  • Instruction Following: Since this is a raw Base Model and not an Instruct/Chat model, it is designed for open-ended text completion rather than directly answering questions.
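Two of the limitations above can be partly mitigated outside the model. The sketch below is one possible approach, not part of the Nevo-ra release: `clean_artifacts` is a hypothetical post-processing helper that strips simple HTML tags and entity fragments from generated text, and `apply_repetition_penalty` is a pure-Python illustration of the commonly used repetition-penalty formulation (positive scores are divided by the penalty, negative scores are multiplied by it). Inference libraries that expose a `repetition_penalty` parameter apply an adjustment of this kind inside their own sampling loop, so the second function is for understanding rather than for production use.

```python
import html
import re

# Heuristic patterns; they may over- or under-match on unusual text.
TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>")  # simple tags such as <b> or </b>
BARE_QUOT_RE = re.compile(r"quot;")          # entity fragment missing its "&"


def clean_artifacts(text):
    """Remove simple HTML tags and decode quotation-mark entities."""
    text = html.unescape(text)            # &quot; -> ", &lt;b&gt; -> <b>
    text = BARE_QUOT_RE.sub('"', text)    # quot; -> "
    text = TAG_RE.sub("", text)           # drop remaining tags
    return re.sub(r"\s+", " ", text).strip()


def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Return a copy of `logits` in which every token id that was
    already generated is made less likely."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Dividing a positive score or multiplying a negative one
        # both push the token's probability down when penalty > 1.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


cleaned = clean_artifacts('<b>Stocks</b> rose quot;sharplyquot; today')
penalized = apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1], penalty=2.0)

print(cleaned)
print(penalized)
```

A penalty of 1.0 leaves the scores unchanged. Values only slightly above 1.0 are a reasonable starting point, since a large penalty can also suppress words that legitimately need to repeat.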

Model size: 0.1B params · Tensor type: F32 · Format: Safetensors