
capybara_finetuned_results3

This model is a fine-tuned version of Qwen/Qwen2.5-0.5B on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 5.6542
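A minimal loading sketch, assuming the weights are published under archit11/qwen_worldmodel (the repository this card belongs to) and that the library versions listed under "Framework versions" below are installed:

```python
# Minimal inference sketch; model_id is assumed to be this repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "archit11/qwen_worldmodel"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The published weights are BF16, so load in that dtype.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```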

Video demo: (it's pretty bad)

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstruction as transformers TrainingArguments follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 5
  • training_steps: 800
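Since the original training script is not published, the sketch below is a hypothetical reconstruction of these settings as transformers TrainingArguments; the output_dir is taken from the model name, and the Adam betas/epsilon listed above are the Trainer defaults:

```python
# Hypothetical reconstruction of the training configuration from the
# hyperparameters listed above; not the author's original script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="capybara_finetuned_results3",
    learning_rate=2e-4,
    per_device_train_batch_size=1,  # train_batch_size: 1
    per_device_eval_batch_size=8,   # eval_batch_size: 8
    seed=42,
    gradient_accumulation_steps=4,  # total_train_batch_size: 1 * 4 = 4
    lr_scheduler_type="cosine",
    warmup_steps=5,
    max_steps=800,                  # training_steps: 800
    # adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-8 are the defaults,
    # matching the optimizer settings listed above.
)
```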

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 15.5311       | 0.0230 | 50   | 14.5422         |
| 8.7477        | 0.0460 | 100  | 9.2952          |
| 7.3554        | 0.0690 | 150  | 7.1992          |
| 6.828         | 0.0920 | 200  | 6.7258          |
| 6.4694        | 0.1150 | 250  | 6.3597          |
| 6.3401        | 0.1381 | 300  | 6.1703          |
| 6.1256        | 0.1611 | 350  | 6.0395          |
| 6.0372        | 0.1841 | 400  | 5.9271          |
| 6.0221        | 0.2071 | 450  | 5.8464          |
| 5.8783        | 0.2301 | 500  | 5.7810          |
| 5.8339        | 0.2531 | 550  | 5.7335          |
| 5.8546        | 0.2761 | 600  | 5.6904          |
| 5.9169        | 0.2991 | 650  | 5.6690          |
| 5.7959        | 0.3221 | 700  | 5.6565          |
| 5.7271        | 0.3451 | 750  | 5.6543          |
| 5.8734        | 0.3682 | 800  | 5.6542          |
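Assuming the reported loss is mean token-level cross-entropy (the transformers default for causal language modeling), the final validation loss corresponds to a perplexity of roughly exp(5.6542) ≈ 286:

```python
# Convert the final validation loss to perplexity, assuming it is
# mean per-token cross-entropy in nats.
import math

final_val_loss = 5.6542
perplexity = math.exp(final_val_loss)
print(f"Validation perplexity: {perplexity:.1f}")  # ≈ 285.5
```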

Framework versions

  • Transformers 4.44.2
  • PyTorch 2.4.0
  • Datasets 3.0.0
  • Tokenizers 0.19.1
Model size

494M parameters, stored in BF16 Safetensors format.