dhivehi-nougat-base-text-sen

This model is a fine-tuned version of facebook/nougat-base on the dhivehi-img-txtsen dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0796

Model description

facebook/nougat-base fine-tuned for Dhivehi text recognition on the dhivehi-img-txtsen dataset.

Usage

from PIL import Image
import torch
from transformers import NougatProcessor, VisionEncoderDecoderModel
from pathlib import Path

# Load the model and processor
processor = NougatProcessor.from_pretrained("alakxender/dhivehi-nougat-base-text-sen")
model = VisionEncoderDecoderModel.from_pretrained(
    "alakxender/dhivehi-nougat-small-dv01-01",  
    torch_dtype=torch.bfloat16,                 # Optional: Load the model with BF16 data type for faster inference and lower memory usage
    attn_implementation={                       # Optional: Specify the attention kernel implementations for different parts of the model
        "decoder": "flash_attention_2",         # Use FlashAttention-2 for the decoder for improved performance
        "encoder": "eager"                      # Use the default ("eager") attention implementation for the encoder
    }
)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

context_length = 128

def predict(img_path):
    # Ensure the image is in RGB format
    image = Image.open(img_path).convert("RGB")
    # Preprocess and cast to bfloat16 to match the dtype the model was loaded with
    pixel_values = processor(image, return_tensors="pt").pixel_values.to(torch.bfloat16)

    # generate prediction
    outputs = model.generate(
        pixel_values.to(device),
        min_length=1,
        max_new_tokens=context_length,
        repetition_penalty=1.5,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        eos_token_id=processor.tokenizer.eos_token_id,
    )

    page_sequence = processor.batch_decode(outputs, skip_special_tokens=True)[0]
    return page_sequence

print(predict("DV01-04_31.jpg"))
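
The snippet above assumes CUDA, flash-attn, and bfloat16 support. If any of those are unavailable, the model also loads in its default float32 configuration; a minimal fallback sketch under that assumption:

from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel

processor = NougatProcessor.from_pretrained("alakxender/dhivehi-nougat-base-text-sen")
# Default load: float32 weights, standard ("eager"/SDPA) attention, CPU-friendly
model = VisionEncoderDecoderModel.from_pretrained("alakxender/dhivehi-nougat-base-text-sen")

image = Image.open("DV01-04_31.jpg").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values  # float32, so no dtype cast needed
outputs = model.generate(pixel_values, min_length=1, max_new_tokens=128)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])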

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 3
  • eval_batch_size: 3
  • seed: 42
  • gradient_accumulation_steps: 6
  • total_train_batch_size: 18
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 100
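
For reference, a minimal sketch of how these values would map onto Seq2SeqTrainingArguments from transformers; the actual training script is not published, so output_dir and any omitted arguments are assumptions:

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="dhivehi-nougat-base-text-sen",  # hypothetical output path
    learning_rate=1e-4,
    per_device_train_batch_size=3,
    per_device_eval_batch_size=3,
    seed=42,
    gradient_accumulation_steps=6,  # effective train batch size: 3 x 6 = 18
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=100,
)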

Training results

Training Loss Epoch Step Validation Loss
5.0226 0.0383 100 0.7908
4.0593 0.0766 200 0.6369
3.6743 0.1149 300 0.5734
3.4239 0.1532 400 0.5411
3.3072 0.1915 500 0.5175
3.1666 0.2298 600 0.5048
3.0814 0.2681 700 0.4925
3.0572 0.3064 800 0.4824
2.9389 0.3447 900 0.4746
2.9756 0.3830 1000 0.4683
2.8457 0.4213 1100 0.4614
2.8612 0.4597 1200 0.4561
2.9689 0.4980 1300 0.4500
2.8069 0.5363 1400 0.4457
2.7381 0.5746 1500 0.4413
2.7011 0.6129 1600 0.4388
2.6893 0.6512 1700 0.4354
2.7628 0.6895 1800 0.4320
2.6868 0.7278 1900 0.4291
2.7244 0.7661 2000 0.4261
2.7016 0.8044 2100 0.4257
2.6166 0.8427 2200 0.4206
2.647 0.8810 2300 0.4187
2.687 0.9193 2400 0.4150
2.6376 0.9576 2500 0.4144
2.5493 0.9959 2600 0.4118
2.5871 1.0341 2700 0.4103
2.589 1.0724 2800 0.4089
2.6471 1.1107 2900 0.4061
2.5845 1.1490 3000 0.4055
2.5417 1.1873 3100 0.4050
2.4787 1.2256 3200 0.4032
2.4835 1.2639 3300 0.4002
2.4791 1.3022 3400 0.3997
2.4897 1.3405 3500 0.3970
2.5129 1.3788 3600 0.3967
2.5013 1.4171 3700 0.3950
2.4323 1.4554 3800 0.3943
2.5074 1.4937 3900 0.3929
2.4401 1.5320 4000 0.3926
2.4195 1.5704 4100 0.3913
2.4749 1.6087 4200 0.3898
2.4423 1.6470 4300 0.3894
2.5008 1.6853 4400 0.3882
2.4293 1.7236 4500 0.3866
2.3966 1.7619 4600 0.3870
2.3954 1.8002 4700 0.3850
2.4398 1.8385 4800 0.3839
2.4465 1.8768 4900 0.3833
2.4152 1.9151 5000 0.3823
2.4633 1.9534 5100 0.3815
2.3733 1.9917 5200 0.3814
2.4842 2.0299 5300 0.3794
2.3732 2.0682 5400 0.3797
2.3409 2.1065 5500 0.3789
2.3788 2.1448 5600 0.3771
2.4165 2.1831 5700 0.3757
2.3168 2.2214 5800 0.3749
2.3661 2.2597 5900 0.3742
2.3646 2.2980 6000 0.3731
2.3661 2.3363 6100 0.3730
2.3396 2.3746 6200 0.3730
2.2718 2.4129 6300 0.3712
2.3257 2.4512 6400 0.3703
2.2976 2.4895 6500 0.3692
2.2838 2.5278 6600 0.3679
2.273 2.5661 6700 0.3673
2.3019 2.6044 6800 0.3663
2.2569 2.6427 6900 0.3657
2.2991 2.6811 7000 0.3647
2.268 2.7194 7100 0.3642
2.2132 2.7577 7200 0.3630
2.3134 2.7960 7300 0.3613
2.2995 2.8343 7400 0.3598
2.289 2.8726 7500 0.3598
2.2509 2.9109 7600 0.3579
2.2367 2.9492 7700 0.3567
2.2016 2.9875 7800 0.3544
2.2573 3.0257 7900 0.3527
2.2029 3.0640 8000 0.3512
2.2087 3.1023 8100 0.3500
2.1385 3.1406 8200 0.3416
2.1084 3.1789 8300 0.3346
2.0978 3.2172 8400 0.3258
2.0254 3.2555 8500 0.3159
1.9649 3.2938 8600 0.3021
1.8909 3.3321 8700 0.2877
1.8284 3.3704 8800 0.2721
1.7419 3.4087 8900 0.2612
1.6687 3.4470 9000 0.2510
1.6713 3.4853 9100 0.2406
1.5075 3.5236 9200 0.2314
1.558 3.5619 9300 0.2251
1.5508 3.6002 9400 0.2155
1.4222 3.6385 9500 0.2093
1.4103 3.6768 9600 0.2016
1.2759 3.7151 9700 0.1936
1.3577 3.7534 9800 0.1888
1.2245 3.7918 9900 0.1833
1.3226 3.8301 10000 0.1776
1.2007 3.8684 10100 0.1743
1.1289 3.9067 10200 0.1693
1.1646 3.9450 10300 0.1659
1.1498 3.9833 10400 0.1619
1.1152 4.0215 10500 0.1588
1.0254 4.0598 10600 0.1558
1.0719 4.0981 10700 0.1527
1.103 4.1364 10800 0.1502
1.1307 4.1747 10900 0.1474
1.0523 4.2130 11000 0.1445
0.9377 4.2513 11100 0.1427
1.0505 4.2896 11200 0.1399
0.9646 4.3279 11300 0.1382
0.9571 4.3662 11400 0.1366
0.9693 4.4045 11500 0.1343
0.9362 4.4428 11600 0.1325
0.9162 4.4811 11700 0.1319
0.9699 4.5194 11800 0.1299
0.9275 4.5577 11900 0.1291
0.8864 4.5960 12000 0.1271
0.9603 4.6343 12100 0.1263
0.9842 4.6726 12200 0.1244
0.8629 4.7109 12300 0.1231
0.9338 4.7492 12400 0.1234
0.8358 4.7875 12500 0.1210
0.7986 4.8258 12600 0.1196
0.8606 4.8641 12700 0.1188
0.801 4.9025 12800 0.1180
0.8723 4.9408 12900 0.1166
0.8224 4.9791 13000 0.1167
0.7655 5.0172 13100 0.1144
0.89 5.0555 13200 0.1139
0.7515 5.0938 13300 0.1131
0.8617 5.1322 13400 0.1129
0.8763 5.1705 13500 0.1119
0.8394 5.2088 13600 0.1104
0.8494 5.2471 13700 0.1097
0.7357 5.2854 13800 0.1090
0.78 5.3237 13900 0.1080
0.7955 5.3620 14000 0.1080
0.8194 5.4003 14100 0.1070
0.8297 5.4386 14200 0.1069
0.697 5.4769 14300 0.1057
0.8037 5.5152 14400 0.1051
0.7782 5.5535 14500 0.1047
0.7672 5.5918 14600 0.1037
0.7789 5.6301 14700 0.1031
0.7292 5.6684 14800 0.1035
0.8318 5.7067 14900 0.1019
0.6917 5.7450 15000 0.1016
0.7711 5.7833 15100 0.1009
0.718 5.8216 15200 0.1003
0.8245 5.8599 15300 0.1010
0.7005 5.8982 15400 0.0995
0.7685 5.9365 15500 0.0991
0.6955 5.9748 15600 0.0988
0.6962 6.0130 15700 0.0981
0.6917 6.0513 15800 0.0974
0.8487 6.0896 15900 0.0972
0.6653 6.1279 16000 0.0970
0.7476 6.1662 16100 0.0966
0.682 6.2045 16200 0.0960
0.6858 6.2428 16300 0.0958
0.696 6.2812 16400 0.0948
0.7115 6.3195 16500 0.0949
0.7388 6.3578 16600 0.0942
0.6637 6.3961 16700 0.0937
0.7032 6.4344 16800 0.0934
0.6581 6.4727 16900 0.0931
0.6609 6.5110 17000 0.0930
0.6724 6.5493 17100 0.0921
0.629 6.5876 17200 0.0915
0.682 6.6259 17300 0.0914
0.7201 6.6642 17400 0.0914
0.5541 6.7025 17500 0.0914
0.6999 6.7408 17600 0.0903
0.6552 6.7791 17700 0.0906
0.6613 6.8174 17800 0.0897
0.7954 6.8557 17900 0.0894
0.6358 6.8940 18000 0.0890
0.665 6.9323 18100 0.0890
0.6274 6.9706 18200 0.0884
0.6558 7.0088 18300 0.0880
0.6541 7.0471 18400 0.0883
0.6568 7.0854 18500 0.0877
0.6677 7.1237 18600 0.0873
0.7305 7.1620 18700 0.0871
0.6118 7.2003 18800 0.0872
0.5958 7.2386 18900 0.0865
0.6912 7.2769 19000 0.0862
0.5643 7.3152 19100 0.0859
0.6254 7.3535 19200 0.0856
0.6773 7.3919 19300 0.0854
0.7044 7.4302 19400 0.0848
0.5636 7.4685 19500 0.0847
0.5932 7.5068 19600 0.0848
0.566 7.5451 19700 0.0846
0.6553 7.5834 19800 0.0843
0.5729 7.6217 19900 0.0841
0.6147 7.6600 20000 0.0836
0.6125 7.6983 20100 0.0831
0.5793 7.7366 20200 0.0832
0.6042 7.7749 20300 0.0832
0.604 7.8132 20400 0.0827
0.5963 7.8515 20500 0.0826
0.5757 7.8898 20600 0.0826
0.6194 7.9281 20700 0.0821
0.5528 7.9664 20800 0.0817
0.7031 8.0046 20900 0.0817
0.5997 8.0429 21000 0.0816
0.5876 8.0812 21100 0.0814
0.5757 8.1195 21200 0.0811
0.6033 8.1578 21300 0.0814
0.5738 8.1961 21400 0.0807
0.6308 8.2344 21500 0.0807
0.5583 8.2727 21600 0.0809
0.6401 8.3110 21700 0.0804
0.5611 8.3493 21800 0.0803
0.5526 8.3876 21900 0.0799
0.5877 8.4259 22000 0.0796
0.6311 8.4642 22100 0.0793
0.556 8.5026 22200 0.0799
0.5976 8.5409 22300 0.0794
0.5851 8.5792 22400 0.0796

Framework versions

  • Transformers 4.47.0
  • PyTorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
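
To recreate this environment, pinning the versions above should suffice; a sketch (the cu124 index URL assumes a CUDA 12.4 setup):

pip install transformers==4.47.0 datasets==3.2.0 tokenizers==0.21.0
pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124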