
SetFit with BAAI/bge-small-en-v1.5

This is a SetFit model that can be used for Text Classification. This SetFit model uses BAAI/bge-small-en-v1.5 as the Sentence Transformer embedding model. A LogisticRegression instance is used for classification.
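The classification head is an ordinary logistic regression over the sentence embeddings. As a rough illustration only, here is a dependency-free stand-in: binary logistic regression fitted by batch gradient descent. It is not scikit-learn's `LogisticRegression`, which uses its own solvers and an L2 penalty by default. The 2-D vectors are made-up stand-ins for the 384-dimensional bge-small embeddings, and `train_head` / `predict` are hypothetical helper names.

```python
import math

# Toy stand-in for the classification head: unregularised binary logistic
# regression trained with batch gradient descent on pre-computed embeddings.
# The 2-D "embeddings" below are invented for illustration.


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_head(X, y, lr=0.5, epochs=500):
    """Fit weights w and bias b by minimising mean binary cross-entropy."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            # Gradient of BCE w.r.t. the logit is (p - y)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return 1 if P(label = 1 | embedding) >= 0.5, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
y = [0, 0, 1, 1]
w, b = train_head(X, y)
print([predict(w, b, x) for x in X])  # → [0, 0, 1, 1]
```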

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
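Step 1 trains on pairs of texts rather than on single texts. A minimal sketch of how labelled examples could be turned into such pairs, assuming the CosineSimilarityLoss setup listed under Training Hyperparameters: same-label pairs get a target similarity of 1.0 and cross-label pairs get 0.0. The helper `make_pairs` and the sample texts are hypothetical, and SetFit's real sampler additionally decides how many pairs of each kind to draw (the `sampling_strategy` setting).

```python
from itertools import combinations

# Hypothetical pair builder: enumerate every unordered pair of training texts
# once and attach a target similarity (1.0 = same label, 0.0 = different).


def make_pairs(texts, labels):
    pairs = []
    for (t1, l1), (t2, l2) in combinations(zip(texts, labels), 2):
        target = 1.0 if l1 == l2 else 0.0
        pairs.append((t1, t2, target))
    return pairs


texts = ["doc A", "doc B", "doc C"]  # invented sample texts
labels = [0, 0, 1]
for pair in make_pairs(texts, labels):
    print(pair)
```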

Model Details

Model Description

Model Sources

Model Labels

Label Examples
Label 0
  • '_\ni.\nSe\nNew\n~~\ned\nTy\nSw\nNe\nNw\ned\n2:\n\n \n\x0c'
  • 'ne.\n\n \n \n\n \n \n\n \n\nbBo fy20 5 ‘ )\n- wi Pas BOOKING STATION\nstat” SURAT GEIS TRA: BSPORT HOT. LTE, DIMPLE COURT, 2ND FLOOR,\n H.O.: “VIRAJ IMPEX HOUSE”, 47, D' M= -toRoaD, + AT_OW. ER’S RISK oer\n' , a” MUMBAI - 400 009 Tel. : 4076 7676 sianan Gece i al CARGO iS INSUR BY CUSTOMER — PH, : 033-30821697, 22\n{ 1. Consignor’s Name & Address As. ExOme peas Br. Code\ndT ncuer\n
Label 1
  • "Posatis ils. H\n\n \n\niS\nvs\na (uf\n\noe\n\n \n\n-\n\n \n\nSarichor Pls: q\n\nPea :\n\nITEM /\n\n1. Description/ Received Reject Delivered Retur\n\n \n \n\nSPARE TX. Phat\n\n(MARKETED BY MESAPD\n\nPact eta\n\n \n\nMATERIAL RECEIPT REPORT\n\n \n\n \n \n \n\n \n\nCUM nea\n\n00 LeTlooo 0.000\n\nPAS\n\n \n \n\nELT\n\nJUPLICATE FOR TRANSPORTE?-\nOGPY (EMGISE INVOICE) RECEIVED\n\nMite ariant Eee\n\nPRAM MUIMAFE RCL RE\n\n \n\n \n\nFrys\n\n \n\not\n\nSuds oT\n\n \n \n\npeas\n\nee ase\n\n. Tax Gelig\n\nGrand Tooke\n\ni\n\nRM\n\nRate/Unit\n\nMRR SUBMITTED\nwv\n\nITH PARTY'S INVIGCE\n\nEET RY MO SSO OT Soe ELS\n\nLS.\n\n \n\n \n\n \n\nWee\n\n7; Ae 18\n\nTrcic\n\ni\nSu\n\n~s\n\n“en\n\nnny\n\x0c"
  • "«= ITER /\ncit BDescription/ Received\n\nms\n\n \n \n\n \n\nIces\n\ne to\n\ntea tae\n\nhoimeryh bea\n\nPorccheninernyh Qerkees\n\nRican dec\n\nrarer:\n\nPAD RP eAR eR\n\nMeare\n\n \n\nMATERIAL RECEIPT\n\n \n\nREPORT\n\n \n\nwe ie 7\nhe\n\nSeba.\nbh ETS\n\n \n\nReject Delivered Retur\n\nTESLA y’\n\n \n\n \n\n \n\nLF PIE\n\nTAIT a\n\nSUPLICATE FOR TRANSPORTER\nOGPY (EXGISE INVOICE) RECEIVED\n\noy\n\nf\n\n“soarewe Pk Beak\nree\n\nRAF

Evaluation

Metrics

Label | Accuracy
all | 1.0

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

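from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("Gopal2002/Material_Receipt_Report_ZEON")

# Example input: raw OCR text from a scanned document. A raw triple-quoted
# string keeps the multi-line text (including the stray backslash in "wS\")
# valid Python.
text = r"""SOT Ue

 

         

oH

| ia

I
od

Hi

a

|
To) Sig Pere
a

al |g
&%
5)

wS\
eB
SB
“5
“O
S
€X

Bea

em

Pe eS

se aE a

4 |] | tat [ety

tt pe Ta
&
a

OK

¢

SRLS ia Leh coe

 
 
"""

# Run inference: for a single string this returns the predicted label (0 or 1)
preds = model(text)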

Training Details

Training Set Metrics

Training set | Min | Mean | Max
Word count | 1 | 182.1336 | 1108

Label | Training Sample Count
0 | 202
1 | 45
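The statistics above can be recomputed from raw training texts. A small sketch, assuming "word count" means whitespace-separated tokens; `summarise` and the toy inputs are hypothetical. Note that 202 + 45 = 247 samples is an odd count, so a true median word count would be a whole number; the 182.1336 figure is therefore consistent with a mean rather than a median. The label counts also show the class imbalance (roughly 4.5 : 1).

```python
from collections import Counter
from statistics import mean, median

# Hypothetical helper: summarise a labelled training set like the tables above.
# "Word count" is taken to be the number of whitespace-separated tokens.


def summarise(texts, labels):
    counts = [len(t.split()) for t in texts]
    return {
        "min": min(counts),
        "mean": mean(counts),
        "median": median(counts),
        "max": max(counts),
        "per_label": dict(sorted(Counter(labels).items())),
    }


stats = summarise(["a b c", "d", "e f", "g h i j k l"], [0, 0, 0, 1])
print(stats)
```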

Training Hyperparameters

  • batch_size: (32, 32)
  • num_epochs: (2, 2)
  • max_steps: -1
  • sampling_strategy: oversampling
  • body_learning_rate: (2e-05, 1e-05)
  • head_learning_rate: 0.01
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: False
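With `loss: CosineSimilarityLoss`, each pair is scored by how far the cosine similarity of its two embeddings is from the pair's target. A dependency-free sketch, assuming the Sentence Transformers definition of this loss (mean squared error between cos(u, v) and a target of 1.0 for same-label pairs, 0.0 otherwise); the vectors are toy values. As far as we understand the SetFit documentation, `distance_metric` and `margin` only apply to triplet-style losses, so they should have no effect with this loss.

```python
import math

# Sketch of a cosine-similarity regression loss: MSE between cos(u, v) and the
# pair's target similarity. Toy 2-D vectors stand in for real embeddings.


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cosine_similarity_loss(pairs):
    return sum((cosine(u, v) - t) ** 2 for u, v, t in pairs) / len(pairs)


pairs = [
    ([1.0, 0.0], [1.0, 0.0], 1.0),  # same direction, positive pair: error 0
    ([1.0, 0.0], [0.0, 1.0], 0.0),  # orthogonal, negative pair: error 0
    ([1.0, 0.0], [1.0, 1.0], 0.0),  # cos ≈ 0.7071, negative pair: error ≈ 0.5
]
print(cosine_similarity_loss(pairs))  # ≈ 0.1667
```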

Training Results

Epoch Step Training Loss Validation Loss
0.0007 1 0.2952 -
0.0371 50 0.2253 -
0.0742 100 0.1234 -
0.1114 150 0.0115 -
0.1485 200 0.0036 -
0.1856 250 0.0024 -
0.2227 300 0.0015 -
0.2598 350 0.0011 -
0.2970 400 0.0009 -
0.3341 450 0.0007 -
0.3712 500 0.0011 -
0.4083 550 0.0008 -
0.4454 600 0.0008 -
0.4826 650 0.0007 -
0.5197 700 0.0005 -
0.5568 750 0.0006 -
0.5939 800 0.0005 -
0.6310 850 0.0005 -
0.6682 900 0.0004 -
0.7053 950 0.0003 -
0.7424 1000 0.0004 -
0.7795 1050 0.0005 -
0.8166 1100 0.0004 -
0.8537 1150 0.0004 -
0.8909 1200 0.0005 -
0.9280 1250 0.0004 -
0.9651 1300 0.0003 -
1.0022 1350 0.0003 -
1.0393 1400 0.0003 -
1.0765 1450 0.0004 -
1.1136 1500 0.0003 -
1.1507 1550 0.0004 -
1.1878 1600 0.0004 -
1.2249 1650 0.0004 -
1.2621 1700 0.0003 -
1.2992 1750 0.0003 -
1.3363 1800 0.0003 -
1.3734 1850 0.0003 -
1.4105 1900 0.0003 -
1.4477 1950 0.0002 -
1.4848 2000 0.0003 -
1.5219 2050 0.0003 -
1.5590 2100 0.0003 -
1.5961 2150 0.0002 -
1.6333 2200 0.0003 -
1.6704 2250 0.0004 -
1.7075 2300 0.0004 -
1.7446 2350 0.0003 -
1.7817 2400 0.0002 -
1.8189 2450 0.0002 -
1.8560 2500 0.0003 -
1.8931 2550 0.0002 -
1.9302 2600 0.0003 -
1.9673 2650 0.0003 -
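The Epoch column can be cross-checked against the training-set sizes. The arithmetic below rests on two assumptions about SetFit's sampler that this card does not state: that "oversampling" draws as many negative pairs as there are positive pairs, and that positive pairs include each text paired with itself. Under those assumptions it reproduces the logged epoch fractions.

```python
import math

# Back-of-the-envelope check of the "Epoch" column. ASSUMPTIONS (not stated in
# this card): "oversampling" balances negative pairs up to the number of
# positive pairs, and positive pairs include each text paired with itself.
n_per_label = {0: 202, 1: 45}  # from the training-set label counts
batch_size = 32                # embedding-phase batch size
num_epochs = 2                 # embedding-phase epochs

# Same-label pairs, counting self-pairs: n * (n + 1) / 2 per label
positive = sum(n * (n + 1) // 2 for n in n_per_label.values())
# Cross-label pairs
negative = n_per_label[0] * n_per_label[1]
# Oversampling: both kinds drawn at the size of the larger one
pairs_per_epoch = 2 * max(positive, negative)
# A final partial batch still counts as a step
steps_per_epoch = math.ceil(pairs_per_epoch / batch_size)

print(positive, negative, pairs_per_epoch, steps_per_epoch)  # 21538 9090 43076 1347
print(num_epochs * steps_per_epoch)                          # 2694
for step in (50, 1350, 2650):
    # Compare with the logged epochs 0.0371, 1.0022, 1.9673
    print(step, round(step / steps_per_epoch, 4))
```

Under these assumptions the two embedding epochs come to 2694 steps, which fits with the last logged step being 2650 when logging every 50 steps.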

Framework Versions

  • Python: 3.10.12
  • SetFit: 1.0.3
  • Sentence Transformers: 2.2.2
  • Transformers: 4.35.2
  • PyTorch: 2.1.0+cu121
  • Datasets: 2.16.1
  • Tokenizers: 0.15.0
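To compare a local environment against the versions above, a small standard-library helper could look like this. The PyPI distribution names (for example `torch` for PyTorch) are our assumption about how these libraries are installed, and the `+cu121` local tag is ignored when comparing; `compare_versions` is a hypothetical name.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card, keyed by (assumed) PyPI distribution name.
EXPECTED = {
    "setfit": "1.0.3",
    "sentence-transformers": "2.2.2",
    "transformers": "4.35.2",
    "torch": "2.1.0",
    "datasets": "2.16.1",
    "tokenizers": "0.15.0",
}


def compare_versions(expected=EXPECTED):
    """Map each package to (card version, installed version or None, match?)."""
    report = {}
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            have = None
        # Drop local build tags such as "+cu121" before comparing
        ok = have is not None and have.split("+")[0] == want
        report[name] = (want, have, ok)
    return report


for name, (want, have, ok) in compare_versions().items():
    print(f"{name:22} card={want:8} installed={have or 'missing':10} {'OK' if ok else 'DIFF'}")
```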

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}
Model size: 33.4M params (Safetensors)
Tensor type: F32