twitter-xlmr-clip-finetuned-all-123

This model is a fine-tuned version of cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual, trained on the "all" dataset. It achieves the following results on the evaluation set:

Loss: 0.7405
Precision: 0.6431
Recall: 0.6554
F1: 0.6401
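
The reported F1 (0.6401) is lower than both the precision (0.6431) and the recall (0.6554). The harmonic mean of those two numbers would be about 0.649, so this F1 cannot be the harmonic mean of the reported precision and recall. The numbers are consistent with computing precision, recall, and F1 per class and then averaging each one (macro or weighted), but the card does not say which averaging was used. The pure-Python sketch below assumes macro averaging (an unweighted mean over the three classes). The labels are hypothetical toy data, chosen to show how the averaged F1 can fall below both averaged precision and averaged recall.

```python
def macro_prf(y_true, y_pred, num_classes=3):
    """Per-class precision/recall/F1, then an unweighted (macro) mean over classes."""
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    # Macro averaging: every class counts equally, regardless of its size
    return sum(precisions) / num_classes, sum(recalls) / num_classes, sum(f1s) / num_classes

# Hypothetical labels: 0 = negative, 1 = neutral, 2 = positive
y_true = [0, 0, 0, 0, 0, 1, 2, 2]
y_pred = [0, 1, 1, 1, 1, 1, 2, 2]
p, r, f1 = macro_prf(y_true, y_pred)
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
# precision=0.7333 recall=0.7333 f1=0.5556
```

Class 0 has high precision but low recall, and class 1 has the opposite. Each of these classes gets a low per-class F1, which pulls the macro F1 below both macro precision and macro recall.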

Model description

More information needed

Usage

To use the model, run the following script. It relies on the Transform and VisionTextDualEncoderModel classes, which are defined in app.py. Copy or import those definitions before running it.

import torch
import torch.nn as nn

# Presumably needed by the Transform and VisionTextDualEncoderModel definitions in app.py
from torchvision.transforms import CenterCrop, ConvertImageDtype, Normalize, Resize
from torchvision.transforms.functional import InterpolationMode
from torchvision.io import ImageReadMode, read_image

from transformers import AutoImageProcessor, AutoModel, AutoTokenizer, CLIPModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_model

# Transform and VisionTextDualEncoderModel must be defined (or imported) before this point; see app.py.

id2label = {0: "negative", 1: "neutral", 2: "positive"}
label2id = {"negative": 0, "neutral": 1, "positive": 2}

tokenizer = AutoTokenizer.from_pretrained("cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual")
image_processor = AutoImageProcessor.from_pretrained("openai/clip-vit-base-patch32")

model = VisionTextDualEncoderModel(num_classes=3)
config = model.vision_encoder.config

# https://huggingface.co/FFZG-cleopatra/M2SA/blob/main/model.safetensors
sf_filename = hf_hub_download("FFZG-cleopatra/M2SA", filename="model.safetensors")
load_model(model, sf_filename)
model.eval()  # disable dropout for inference

# Build (and TorchScript-compile) the image preprocessing pipeline once, not on every call
image_transformations = torch.jit.script(
    Transform(
        config.vision_config.image_size,
        image_processor.image_mean,
        image_processor.image_std,
    )
)

def predict_sentiment(text, image_path):
    # read_image expects a local file path and returns a uint8 tensor of shape (3, H, W)
    image = read_image(image_path, mode=ImageReadMode.RGB)

    text_inputs = tokenizer(
        text,
        max_length=512,
        padding="max_length",
        truncation=True,
        return_tensors="pt",
    )
    # Add a batch dimension so pixel_values has shape (1, 3, H, W)
    text_inputs["pixel_values"] = image_transformations(image).unsqueeze(0)

    with torch.no_grad():
        outputs = model(**text_inputs)

    # The predicted class is the index of the largest logit
    prediction = outputs["logits"].argmax(dim=-1)
    return id2label[prediction[0].item()]

text = "I feel good today"
image = "path/to/image.jpg"  # a local image file; read_image does not download URLs
print(predict_sentiment(text, image))
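
The script returns only the label with the largest logit. If a confidence score is also wanted, applying a softmax to the three logits turns them into class probabilities. Softmax is monotonic, so it leaves the argmax, and therefore the predicted label, unchanged. The following is a minimal pure-Python sketch of that post-processing step. The logit values are hypothetical. In the PyTorch script itself the equivalent would be torch.softmax(outputs["logits"], dim=-1).

```python
import math

id2label = {0: "negative", 1: "neutral", 2: "positive"}

def logits_to_prediction(logits):
    """Turn one row of raw logits into (label, probabilities) via a numerically stable softmax."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)  # argmax over classes
    return id2label[best], probs

# Hypothetical logits for one text-image pair
label, probs = logits_to_prediction([-1.2, 0.3, 2.1])
print(label, [round(x, 3) for x in probs])
```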

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 123
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 50.0
mixed_precision_training: Native AMP
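
With lr_scheduler_type set to linear, the Transformers scheduler decays the learning rate linearly from its initial value (5e-05 here) to 0 at the last planned training step. No warmup is listed, so the sketch below assumes zero warmup steps. The steps-per-epoch figure is a hypothetical estimate read off the training results table (for example, step 26500 is logged at epoch 3.19, which gives about 8300 steps per epoch). Under those assumptions, and assuming the schedule was planned for all 50 epochs, the learning rate at the last logged step would still be roughly 94% of its initial value.

```python
def linear_lr(step, base_lr=5e-05, total_steps=1, warmup_steps=0):
    """Linear warmup (if any), then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Hypothetical estimate: about 8300 optimizer steps per epoch (from the results table)
steps_per_epoch = 8300
total_steps = steps_per_epoch * 50  # num_epochs: 50.0

for step in (0, 11500, 26500, total_steps):
    print(step, f"{linear_lr(step, total_steps=total_steps):.3e}")
```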

Training results

Training Loss	Epoch	Step	Validation Loss	Precision	Recall	F1
0.6444	0.06	500	0.8771	0.6905	0.4537	0.4197
0.5499	0.12	1000	0.8167	0.7197	0.4260	0.4117
0.5357	0.18	1500	0.8084	0.7263	0.4696	0.4424
0.5175	0.24	2000	0.8704	0.6666	0.4266	0.3717
0.5285	0.3	2500	0.9067	0.7529	0.4565	0.4221
0.5081	0.36	3000	0.7414	0.7655	0.6114	0.6356
0.506	0.42	3500	0.8713	0.5830	0.6591	0.5786
0.5049	0.48	4000	0.7514	0.5551	0.4568	0.4464
0.4999	0.54	4500	0.7584	0.6519	0.5502	0.5767
0.507	0.6	5000	0.8072	0.6479	0.5626	0.5636
0.5048	0.66	5500	0.8080	0.6260	0.5725	0.5730
0.4907	0.72	6000	0.7966	0.6976	0.5138	0.5224
0.493	0.78	6500	0.8193	0.7099	0.4949	0.4922
0.4668	0.84	7000	0.7502	0.6282	0.6942	0.6501
0.4717	0.9	7500	0.7636	0.6372	0.5109	0.5191
0.4774	0.96	8000	0.7652	0.7513	0.5360	0.5587
0.4676	1.02	8500	0.8482	0.6372	0.5918	0.5836
0.4361	1.08	9000	0.7456	0.6687	0.5177	0.5175
0.4536	1.14	9500	0.8449	0.7363	0.5160	0.5156
0.4277	1.2	10000	0.8648	0.6382	0.5247	0.5173
0.4444	1.26	10500	0.8723	0.5871	0.6622	0.5959
0.4269	1.32	11000	0.7856	0.6151	0.5521	0.5526
0.4322	1.38	11500	0.7405	0.6431	0.6554	0.6401
0.4435	1.44	12000	0.7682	0.6568	0.5751	0.5923
0.4429	1.5	12500	0.8824	0.5956	0.6006	0.5545
0.4381	1.56	13000	0.7879	0.4457	0.4727	0.4395
0.4389	1.62	13500	0.7555	0.6260	0.6984	0.6502
0.4529	1.68	14000	0.7981	0.6621	0.5546	0.5663
0.4509	1.74	14500	0.7827	0.6160	0.6321	0.6172
0.4413	1.8	15000	0.7895	0.6381	0.6357	0.6285
0.4198	1.86	15500	0.8345	0.5940	0.5526	0.5602
0.4415	1.92	16000	0.8746	0.6615	0.6612	0.6459
0.443	1.98	16500	0.8155	0.6516	0.5265	0.5352
0.4068	2.04	17000	0.7642	0.5838	0.6220	0.5975
0.3905	2.1	17500	0.7929	0.6720	0.5555	0.5740
0.3969	2.16	18000	0.8949	0.5330	0.4771	0.4687
0.3841	2.22	18500	0.9233	0.6028	0.5410	0.5492
0.4031	2.28	19000	0.7720	0.6089	0.5719	0.5776
0.3878	2.34	19500	0.9046	0.6265	0.5358	0.5318
0.4001	2.41	20000	0.8451	0.6960	0.5622	0.5761
0.3997	2.47	20500	0.8964	0.6170	0.5665	0.5541
0.3945	2.53	21000	0.8001	0.5553	0.5180	0.5195
0.4005	2.59	21500	0.8357	0.5519	0.5100	0.5170
0.3907	2.65	22000	0.8017	0.5884	0.5409	0.5552
0.3858	2.71	22500	0.8283	0.6036	0.5792	0.5862
0.3973	2.77	23000	0.9024	0.5770	0.5665	0.5393
0.3969	2.83	23500	0.8341	0.5642	0.5528	0.5558
0.3911	2.89	24000	0.8966	0.6045	0.5088	0.5070
0.3856	2.95	24500	0.8349	0.6021	0.5586	0.5689
0.3961	3.01	25000	0.9364	0.6119	0.5412	0.5585
0.3301	3.07	25500	0.9542	0.5757	0.6084	0.5813
0.3385	3.13	26000	1.0137	0.5563	0.5294	0.5346
0.3475	3.19	26500	0.9311	0.6359	0.5675	0.5822
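
The evaluation results at the top of this card match the step-11500 row exactly. That row has the lowest validation loss in the table (0.7405). It does not have the highest F1, which is 0.6502 at step 13500. This pattern is consistent with the Trainer's default best-checkpoint criterion: with load_best_model_at_end=True, metric_for_best_model defaults to the loss. The card does not state which settings were used, though. The sketch below applies both selection rules to a few rows copied verbatim from the table above. It is a subset, not the full log.

```python
# A few rows copied from the table above: (step, validation_loss, precision, recall, f1)
rows = [
    (3000, 0.7414, 0.7655, 0.6114, 0.6356),
    (7000, 0.7502, 0.6282, 0.6942, 0.6501),
    (9000, 0.7456, 0.6687, 0.5177, 0.5175),
    (11500, 0.7405, 0.6431, 0.6554, 0.6401),
    (13500, 0.7555, 0.6260, 0.6984, 0.6502),
]

best_by_loss = min(rows, key=lambda r: r[1])  # lowest validation loss
best_by_f1 = max(rows, key=lambda r: r[4])    # highest F1
print("lowest validation loss:", best_by_loss)
print("highest F1:", best_by_f1)
```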

Framework versions

Transformers 4.38.2
Pytorch 2.2.1+cu121
Datasets 2.18.0
Tokenizers 0.15.2
