llm2vec-croissant-mntp

This model is a fine-tuned version of croissantllm/CroissantCool-v0.2 on asi/wikitext_fr. It achieves the following results on the evaluation set:

  • Loss: 1.8867
  • Accuracy: 0.6078

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
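
To make the `linear` scheduler concrete, here is a minimal pure-Python sketch of a linear decay from the listed learning rate to zero. The card does not report warmup steps or the total step count, so `warmup_steps=0` and the step total (estimated from the results table, where step 100 corresponds to epoch 0.0884) are assumptions, and `linear_lr` is a hypothetical helper rather than a library function.

```python
def linear_lr(step: int, total_steps: int,
              base_lr: float = 5e-05, warmup_steps: int = 0) -> float:
    """Linear warmup (optional) followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase.
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


# Assumption: step 100 at epoch 0.0884 implies about 1131 steps per epoch.
steps_per_epoch = round(100 / 0.0884)
total_steps = steps_per_epoch * 3  # num_epochs: 3.0

for step in (0, 1000, 2000, 3000, total_steps):
    print(step, f"{linear_lr(step, total_steps):.2e}")
```

With these assumptions the learning rate starts at 5e-05 and reaches zero at the estimated final step.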

Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy |
|---------------|--------|------|-----------------|----------|
| No log        | 0.0884 | 100  | 4.7866          | 0.1990   |
| No log        | 0.1768 | 200  | 4.0496          | 0.3309   |
| No log        | 0.2653 | 300  | 3.6525          | 0.3779   |
| No log        | 0.3537 | 400  | 3.2410          | 0.4258   |
| 3.9116        | 0.4421 | 500  | 3.6305          | 0.3912   |
| 3.9116        | 0.5305 | 600  | 3.1770          | 0.4406   |
| 3.9116        | 0.6189 | 700  | 2.4478          | 0.5199   |
| 3.9116        | 0.7073 | 800  | 2.2383          | 0.5508   |
| 3.9116        | 0.7958 | 900  | 2.1547          | 0.5635   |
| 2.4568        | 0.8842 | 1000 | 2.0868          | 0.5759   |
| 2.4568        | 0.9726 | 1100 | 2.0399          | 0.5820   |
| 2.4568        | 1.0610 | 1200 | 2.0102          | 0.5873   |
| 2.4568        | 1.1494 | 1300 | 1.9805          | 0.5897   |
| 2.4568        | 1.2378 | 1400 | 1.9590          | 0.5955   |
| 1.9305        | 1.3263 | 1500 | 1.9381          | 0.5982   |
| 1.9305        | 1.4147 | 1600 | 1.9249          | 0.5995   |
| 1.9305        | 1.5031 | 1700 | 1.9223          | 0.6017   |
| 1.9305        | 1.5915 | 1800 | 1.9091          | 0.6037   |
| 1.9305        | 1.6799 | 1900 | 1.9038          | 0.6042   |
| 1.8511        | 1.7683 | 2000 | 1.8982          | 0.6045   |
| 1.8511        | 1.8568 | 2100 | 1.8924          | 0.6060   |
| 1.8511        | 1.9452 | 2200 | 1.8844          | 0.6072   |
| 1.8511        | 2.0336 | 2300 | 1.8873          | 0.6087   |
| 1.8511        | 2.1220 | 2400 | 1.8889          | 0.6068   |
| 1.8197        | 2.2104 | 2500 | 1.8848          | 0.6080   |
| 1.8197        | 2.2989 | 2600 | 1.8736          | 0.6091   |
| 1.8197        | 2.3873 | 2700 | 1.8858          | 0.6072   |
| 1.8197        | 2.4757 | 2800 | 1.8814          | 0.6088   |
| 1.8197        | 2.5641 | 2900 | 1.8649          | 0.6103   |
| 1.8116        | 2.6525 | 3000 | 1.8647          | 0.6091   |
| 1.8116        | 2.7409 | 3100 | 1.8755          | 0.6101   |
| 1.8116        | 2.8294 | 3200 | 1.8755          | 0.6099   |
| 1.8116        | 2.9178 | 3300 | 1.8867          | 0.6078   |
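
The headline metrics at the top of this card match the final logged step (3300), which is not the best row in the table. The short sketch below picks out the best evaluation steps; the tuples are copied from the last rows of the table, and earlier rows are left out because none of them has a lower validation loss or a higher accuracy.

```python
# (step, validation_loss, accuracy), copied from the end of the results table.
rows = [
    (2600, 1.8736, 0.6091),
    (2700, 1.8858, 0.6072),
    (2800, 1.8814, 0.6088),
    (2900, 1.8649, 0.6103),
    (3000, 1.8647, 0.6091),
    (3100, 1.8755, 0.6101),
    (3200, 1.8755, 0.6099),
    (3300, 1.8867, 0.6078),
]

best_by_loss = min(rows, key=lambda r: r[1])  # lowest validation loss
best_by_acc = max(rows, key=lambda r: r[2])   # highest accuracy
final = rows[-1]                              # step behind the headline numbers

print("lowest validation loss:", best_by_loss)
print("highest accuracy:", best_by_acc)
print("final step:", final)
```

The lowest validation loss is logged at step 3000 and the highest accuracy at step 2900. Whether the published weights correspond to the final step or to an earlier checkpoint is not stated in the card.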

Framework versions

  • Transformers 4.40.2
  • Pytorch 2.0.1+cu118
  • Datasets 2.19.1
  • Tokenizers 0.19.1

Safetensors

  • Model size: 1.28B params
  • Tensor type: BF16
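
As a rough guide to the footprint these figures imply, the sketch below estimates the memory needed for the weights alone. It assumes every parameter is stored in BF16 (2 bytes each) and ignores activations, optimizer state, and framework overhead.

```python
# Figures from this card.
num_params = 1.28e9   # 1.28B params
bytes_per_param = 2   # BF16 uses 16 bits per value

weight_bytes = num_params * bytes_per_param
weight_gb = weight_bytes / 1e9      # decimal gigabytes
weight_gib = weight_bytes / 2**30   # binary gibibytes

print(f"weights: ~{weight_gb:.2f} GB ({weight_gib:.2f} GiB)")
```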

Finetuned from: croissantllm/CroissantCool-v0.2

Dataset used to train AdrienB134/llm2vec-croissant-mntp: asi/wikitext_fr