dhivehi-nougat-base-text-sen

This model is a fine-tuned version of facebook/nougat-base on the dhivehi-img-txtsen dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0796

Model description

facebook/nougat-base fine-tuned for Dhivehi text recognition on the dhivehi-img-txtsen dataset.

Usage

from PIL import Image
import torch
from transformers import NougatProcessor, VisionEncoderDecoderModel
from pathlib import Path

# Load the model and processor
processor = NougatProcessor.from_pretrained("alakxender/dhivehi-nougat-base-text-sen")
model = VisionEncoderDecoderModel.from_pretrained(
    "alakxender/dhivehi-nougat-small-dv01-01",  
    torch_dtype=torch.bfloat16,                 # Optional: Load the model with BF16 data type for faster inference and lower memory usage
    attn_implementation={                       # Optional: Specify the attention kernel implementations for different parts of the model
        "decoder": "flash_attention_2",         # Use FlashAttention-2 for the decoder for improved performance
        "encoder": "eager"                      # Use the default ("eager") attention implementation for the encoder
    }
)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

context_length = 128

def predict(img_path):
    # Ensure the image is in RGB format
    image = Image.open(img_path).convert("RGB")
    # Preprocess and cast to bfloat16 to match the dtype the model was loaded with
    pixel_values = processor(image, return_tensors="pt").pixel_values.to(torch.bfloat16)

    # generate prediction
    outputs = model.generate(
        pixel_values.to(device),
        min_length=1,
        max_new_tokens=context_length,
        repetition_penalty=1.5,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        eos_token_id=processor.tokenizer.eos_token_id,
    )

    page_sequence = processor.batch_decode(outputs, skip_special_tokens=True)[0]
    return page_sequence

print(predict("DV01-04_31.jpg"))
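
The snippet above assumes CUDA, flash-attn, and bfloat16 support. If any of those are unavailable, the model also loads in its default float32 configuration; a minimal fallback sketch under that assumption:

from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel

processor = NougatProcessor.from_pretrained("alakxender/dhivehi-nougat-base-text-sen")
# Default load: float32 weights, standard ("eager"/SDPA) attention, CPU-friendly
model = VisionEncoderDecoderModel.from_pretrained("alakxender/dhivehi-nougat-base-text-sen")

image = Image.open("DV01-04_31.jpg").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values  # float32, so no dtype cast needed
outputs = model.generate(pixel_values, min_length=1, max_new_tokens=128)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])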

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 3
  • eval_batch_size: 3
  • seed: 42
  • gradient_accumulation_steps: 6
  • total_train_batch_size: 18
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 100
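
For reference, a minimal sketch of how these values would map onto Seq2SeqTrainingArguments from transformers; the actual training script is not published, so output_dir and any omitted arguments are assumptions:

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="dhivehi-nougat-base-text-sen",  # hypothetical output path
    learning_rate=1e-4,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    seed=42,
    gradient_accumulation_steps=6,  # effective train batch size: 3 x 6 = 18
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=100,
)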

Training results

Training Loss Epoch Step Validation Loss
5.0226 0.0383 100 0.7908
4.0593 0.0766 200 0.6369
3.6743 0.1149 300 0.5734
3.4239 0.1532 400 0.5411
3.3072 0.1915 500 0.5175
3.1666 0.2298 600 0.5048
3.0814 0.2681 700 0.4925
3.0572 0.3064 800 0.4824
2.9389 0.3447 900 0.4746
2.9756 0.3830 1000 0.4683
2.8457 0.4213 1100 0.4614
2.8612 0.4597 1200 0.4561
2.9689 0.4980 1300 0.4500
2.8069 0.5363 1400 0.4457
2.7381 0.5746 1500 0.4413
2.7011 0.6129 1600 0.4388
2.6893 0.6512 1700 0.4354
2.7628 0.6895 1800 0.4320
2.6868 0.7278 1900 0.4291
2.7244 0.7661 2000 0.4261
2.7016 0.8044 2100 0.4257
2.6166 0.8427 2200 0.4206
2.647 0.8810 2300 0.4187
2.687 0.9193 2400 0.4150
2.6376 0.9576 2500 0.4144
2.5493 0.9959 2600 0.4118
2.5871 1.0341 2700 0.4103
2.589 1.0724 2800 0.4089
2.6471 1.1107 2900 0.4061
2.5845 1.1490 3000 0.4055
2.5417 1.1873 3100 0.4050
2.4787 1.2256 3200 0.4032
2.4835 1.2639 3300 0.4002
2.4791 1.3022 3400 0.3997
2.4897 1.3405 3500 0.3970
2.5129 1.3788 3600 0.3967
2.5013 1.4171 3700 0.3950
2.4323 1.4554 3800 0.3943
2.5074 1.4937 3900 0.3929
2.4401 1.5320 4000 0.3926
2.4195 1.5704 4100 0.3913
2.4749 1.6087 4200 0.3898
2.4423 1.6470 4300 0.3894
2.5008 1.6853 4400 0.3882
2.4293 1.7236 4500 0.3866
2.3966 1.7619 4600 0.3870
2.3954 1.8002 4700 0.3850
2.4398 1.8385 4800 0.3839
2.4465 1.8768 4900 0.3833
2.4152 1.9151 5000 0.3823
2.4633 1.9534 5100 0.3815
2.3733 1.9917 5200 0.3814
2.4842 2.0299 5300 0.3794
2.3732 2.0682 5400 0.3797
2.3409 2.1065 5500 0.3789
2.3788 2.1448 5600 0.3771
2.4165 2.1831 5700 0.3757
2.3168 2.2214 5800 0.3749
2.3661 2.2597 5900 0.3742
2.3646 2.2980 6000 0.3731
2.3661 2.3363 6100 0.3730
2.3396 2.3746 6200 0.3730
2.2718 2.4129 6300 0.3712
2.3257 2.4512 6400 0.3703
2.2976 2.4895 6500 0.3692
2.2838 2.5278 6600 0.3679
2.273 2.5661 6700 0.3673
2.3019 2.6044 6800 0.3663
2.2569 2.6427 6900 0.3657
2.2991 2.6811 7000 0.3647
2.268 2.7194 7100 0.3642
2.2132 2.7577 7200 0.3630
2.3134 2.7960 7300 0.3613
2.2995 2.8343 7400 0.3598
2.289 2.8726 7500 0.3598
2.2509 2.9109 7600 0.3579
2.2367 2.9492 7700 0.3567
2.2016 2.9875 7800 0.3544
2.2573 3.0257 7900 0.3527
2.2029 3.0640 8000 0.3512
2.2087 3.1023 8100 0.3500
2.1385 3.1406 8200 0.3416
2.1084 3.1789 8300 0.3346
2.0978 3.2172 8400 0.3258
2.0254 3.2555 8500 0.3159
1.9649 3.2938 8600 0.3021
1.8909 3.3321 8700 0.2877
1.8284 3.3704 8800 0.2721
1.7419 3.4087 8900 0.2612
1.6687 3.4470 9000 0.2510
1.6713 3.4853 9100 0.2406
1.5075 3.5236 9200 0.2314
1.558 3.5619 9300 0.2251
1.5508 3.6002 9400 0.2155
1.4222 3.6385 9500 0.2093
1.4103 3.6768 9600 0.2016
1.2759 3.7151 9700 0.1936
1.3577 3.7534 9800 0.1888
1.2245 3.7918 9900 0.1833
1.3226 3.8301 10000 0.1776
1.2007 3.8684 10100 0.1743
1.1289 3.9067 10200 0.1693
1.1646 3.9450 10300 0.1659
1.1498 3.9833 10400 0.1619
1.1152 4.0215 10500 0.1588
1.0254 4.0598 10600 0.1558
1.0719 4.0981 10700 0.1527
1.103 4.1364 10800 0.1502
1.1307 4.1747 10900 0.1474
1.0523 4.2130 11000 0.1445
0.9377 4.2513 11100 0.1427
1.0505 4.2896 11200 0.1399
0.9646 4.3279 11300 0.1382
0.9571 4.3662 11400 0.1366
0.9693 4.4045 11500 0.1343
0.9362 4.4428 11600 0.1325
0.9162 4.4811 11700 0.1319
0.9699 4.5194 11800 0.1299
0.9275 4.5577 11900 0.1291
0.8864 4.5960 12000 0.1271
0.9603 4.6343 12100 0.1263
0.9842 4.6726 12200 0.1244
0.8629 4.7109 12300 0.1231
0.9338 4.7492 12400 0.1234
0.8358 4.7875 12500 0.1210
0.7986 4.8258 12600 0.1196
0.8606 4.8641 12700 0.1188
0.801 4.9025 12800 0.1180
0.8723 4.9408 12900 0.1166
0.8224 4.9791 13000 0.1167
0.7655 5.0172 13100 0.1144
0.89 5.0555 13200 0.1139
0.7515 5.0938 13300 0.1131
0.8617 5.1322 13400 0.1129
0.8763 5.1705 13500 0.1119
0.8394 5.2088 13600 0.1104
0.8494 5.2471 13700 0.1097
0.7357 5.2854 13800 0.1090
0.78 5.3237 13900 0.1080
0.7955 5.3620 14000 0.1080
0.8194 5.4003 14100 0.1070
0.8297 5.4386 14200 0.1069
0.697 5.4769 14300 0.1057
0.8037 5.5152 14400 0.1051
0.7782 5.5535 14500 0.1047
0.7672 5.5918 14600 0.1037
0.7789 5.6301 14700 0.1031
0.7292 5.6684 14800 0.1035
0.8318 5.7067 14900 0.1019
0.6917 5.7450 15000 0.1016
0.7711 5.7833 15100 0.1009
0.718 5.8216 15200 0.1003
0.8245 5.8599 15300 0.1010
0.7005 5.8982 15400 0.0995
0.7685 5.9365 15500 0.0991
0.6955 5.9748 15600 0.0988
0.6962 6.0130 15700 0.0981
0.6917 6.0513 15800 0.0974
0.8487 6.0896 15900 0.0972
0.6653 6.1279 16000 0.0970
0.7476 6.1662 16100 0.0966
0.682 6.2045 16200 0.0960
0.6858 6.2428 16300 0.0958
0.696 6.2812 16400 0.0948
0.7115 6.3195 16500 0.0949
0.7388 6.3578 16600 0.0942
0.6637 6.3961 16700 0.0937
0.7032 6.4344 16800 0.0934
0.6581 6.4727 16900 0.0931
0.6609 6.5110 17000 0.0930
0.6724 6.5493 17100 0.0921
0.629 6.5876 17200 0.0915
0.682 6.6259 17300 0.0914
0.7201 6.6642 17400 0.0914
0.5541 6.7025 17500 0.0914
0.6999 6.7408 17600 0.0903
0.6552 6.7791 17700 0.0906
0.6613 6.8174 17800 0.0897
0.7954 6.8557 17900 0.0894
0.6358 6.8940 18000 0.0890
0.665 6.9323 18100 0.0890
0.6274 6.9706 18200 0.0884
0.6558 7.0088 18300 0.0880
0.6541 7.0471 18400 0.0883
0.6568 7.0854 18500 0.0877
0.6677 7.1237 18600 0.0873
0.7305 7.1620 18700 0.0871
0.6118 7.2003 18800 0.0872
0.5958 7.2386 18900 0.0865
0.6912 7.2769 19000 0.0862
0.5643 7.3152 19100 0.0859
0.6254 7.3535 19200 0.0856
0.6773 7.3919 19300 0.0854
0.7044 7.4302 19400 0.0848
0.5636 7.4685 19500 0.0847
0.5932 7.5068 19600 0.0848
0.566 7.5451 19700 0.0846
0.6553 7.5834 19800 0.0843
0.5729 7.6217 19900 0.0841
0.6147 7.6600 20000 0.0836
0.6125 7.6983 20100 0.0831
0.5793 7.7366 20200 0.0832
0.6042 7.7749 20300 0.0832
0.604 7.8132 20400 0.0827
0.5963 7.8515 20500 0.0826
0.5757 7.8898 20600 0.0826
0.6194 7.9281 20700 0.0821
0.5528 7.9664 20800 0.0817
0.7031 8.0046 20900 0.0817
0.5997 8.0429 21000 0.0816
0.5876 8.0812 21100 0.0814
0.5757 8.1195 21200 0.0811
0.6033 8.1578 21300 0.0814
0.5738 8.1961 21400 0.0807
0.6308 8.2344 21500 0.0807
0.5583 8.2727 21600 0.0809
0.6401 8.3110 21700 0.0804
0.5611 8.3493 21800 0.0803
0.5526 8.3876 21900 0.0799
0.5877 8.4259 22000 0.0796
0.6311 8.4642 22100 0.0793
0.556 8.5026 22200 0.0799
0.5976 8.5409 22300 0.0794
0.5851 8.5792 22400 0.0796

Framework versions

  • Transformers 4.47.0
  • PyTorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
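
To recreate this environment, pinning the versions above should suffice; a sketch (the cu124 index URL assumes a CUDA 12.4 setup):

pip install transformers==4.47.0 datasets==3.2.0 tokenizers==0.21.0
pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124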