Nevo-ra (0.1B Parameter Base Model)

Model Performance & Metrics

1. Training Convergence & Loss Value

Initial Loss: ~8.5
Final Loss: 3.8
Training Steps: 45,000+

During pre-training on the ag_news dataset, Nevo-ra's loss fell sharply from about 8.5 and then settled at a stable 3.8. This drop shows that the model became steadily better at predicting the next token.
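To make the loss values easier to interpret, they can be converted to perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy measured in nats, which the card does not state explicitly; the helper name `perplexity` is illustrative only.

```python
import math


def perplexity(cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(cross_entropy_nats)


# Loss values taken from the card; the unit (nats) is an assumption.
initial_ppl = perplexity(8.5)
final_ppl = perplexity(3.8)  # about 44.7

print(f"initial perplexity: {initial_ppl:.0f}")
print(f"final perplexity:   {final_ppl:.1f}")
```

Under that assumption, a final loss of 3.8 means the model is on average about as uncertain as if it were choosing uniformly among roughly 45 candidate tokens at each step.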

2. Learning Behavior (Curve Analysis)

Smooth Convergence: The training graph reflects a consistent, steady descent across all 45,000+ steps without any erratic spikes.
No Evident Overfitting: The stable decline is consistent with the model learning general linguistic patterns rather than memorizing the training data, although confirming this would require a held-out validation loss, which is not reported here.
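A claim such as "no erratic spikes" can be checked mechanically against a logged loss series. The following is a minimal sketch, not the procedure used for Nevo-ra: the function name `find_spikes`, the window size, the tolerance, and the sample loss values are all hypothetical.

```python
from statistics import mean


def find_spikes(losses, window=5, tolerance=0.10):
    """Return indices where the loss exceeds the mean of the previous
    `window` values by more than `tolerance` (as a relative fraction)."""
    spikes = []
    for i in range(window, len(losses)):
        baseline = mean(losses[i - window:i])
        if losses[i] > baseline * (1 + tolerance):
            spikes.append(i)
    return spikes


# Hypothetical loss logs, for illustration only (not Nevo-ra's real data).
smooth_run = [8.5 - 0.1 * step for step in range(40)]
spiky_run = list(smooth_run)
spiky_run[20] = 9.0  # inject one artificial spike

print(find_spikes(smooth_run))
print(find_spikes(spiky_run))
```

Running the same check on a validation-loss series, where one is logged, is the usual way to look for overfitting: a validation loss that rises while the training loss keeps falling is the warning sign.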

Current Capabilities & Limitations

As a pre-trained base model with a final loss of 3.8, Nevo-ra has the following strengths and weaknesses:

Current Capabilities

Sentence Structure: Produces text that follows basic English grammar, syntax, and sentence structure.
Contextual Knowledge: Retains foundational knowledge of technology, business, and internet infrastructure, reflecting the topics covered in its training data.

Limitations & Expected Artifacts

HTML/Text Artifacts: Because it was trained on raw web-scraped data, the model may occasionally emit web artifacts such as <b>...</b> tags or unescaped entity fragments for quotation marks (quot;).
Repetition: Because of its small 0.1B size, it can fall into word loops or repetitive phrasing toward the end of long generations unless inference parameters such as repetition_penalty are used to counter this.
Instruction Following: Since this is a raw Base Model and not an Instruct/Chat model, it is designed for open-ended text completion rather than directly answering questions.
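The first two limitations above can be mitigated at inference time. The sketch below is illustrative rather than part of Nevo-ra's tooling, and both helper names are hypothetical. `apply_repetition_penalty` shows the rule commonly associated with a repetition_penalty setting (positive logits of already-generated tokens are divided by the penalty, negative ones are multiplied by it), and `clean_generated_text` strips the kinds of artifacts described above from generated output.

```python
import html
import re

# Matches simple HTML tags such as <b> or </b>.
_TAG_RE = re.compile(r"</?[a-zA-Z][^<>]*>")


def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Return a copy of `logits` in which tokens that were already
    generated are made less likely (penalty > 1.0)."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Dividing a positive score or multiplying a negative one
        # both push the token's probability down.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


def clean_generated_text(text: str) -> str:
    """Remove HTML tags and entity debris from generated text."""
    text = html.unescape(text)         # decode well-formed entities (&quot;, &lt;, ...)
    text = text.replace("quot;", '"')  # bare fragments missing the ampersand
    text = _TAG_RE.sub("", text)       # drop tags such as <b>...</b>
    return re.sub(r"\s+", " ", text).strip()


print(apply_repetition_penalty([2.0, -1.0, 0.5], generated_ids=[0, 1], penalty=2.0))
print(clean_generated_text("Stocks <b>rally</b> as quot;techquot; rebounds"))
```

In practice a generation library would normally apply the penalty internally when repetition_penalty is set, so the first helper is mainly useful for understanding what the parameter does. The cleanup helper is a post-processing step and does not change what the model itself generates.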


Format: Safetensors
Model size: 0.1B params
Tensor type: F32
