---
datasets:
- glue
- anli
model-index:
- name: gte-large-mnli-anli
  results: []
pipeline_tag: zero-shot-classification
language:
- en
license: mit
---

# gte-large-mnli-anli

This model is a fine-tuned version of [thenlper/gte-large](https://huggingface.co/thenlper/gte-large) on the GLUE (MNLI) and ANLI datasets.

## Model description

[Towards General Text Embeddings with Multi-stage Contrastive Learning](https://arxiv.org/abs/2308.03281).
Zehan Li, Xin Zhang, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang. arXiv 2023.

## How to use the model

### With the zero-shot classification pipeline

The model can be loaded with the `zero-shot-classification` pipeline like so:

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="mjwong/gte-large-mnli-anli")
```
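
If a GPU is available, the pipeline can be placed on it via the standard `device` argument of `pipeline` (a general `transformers` option, not something specific to this card):

```python
import torch
from transformers import pipeline

# device=0 selects the first CUDA device; -1 (the default) keeps the pipeline on CPU.
classifier = pipeline(
    "zero-shot-classification",
    model="mjwong/gte-large-mnli-anli",
    device=0 if torch.cuda.is_available() else -1,
)
```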

You can then use this pipeline to classify sequences into any of the class names you specify.

```python
sequence_to_classify = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']
classifier(sequence_to_classify, candidate_labels)
```

If more than one candidate label can be correct, pass `multi_label=True` to score each class independently:

```python
candidate_labels = ['travel', 'cooking', 'dancing', 'exploration']
classifier(sequence_to_classify, candidate_labels, multi_label=True)
```
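
In the default single-label mode the scores are softmaxed across the candidate labels and sum to 1; with `multi_label=True` each label gets an independent entailment-vs-contradiction softmax, so the scores need not sum to 1. A quick check, reusing `classifier`, `sequence_to_classify`, and `candidate_labels` from the snippets above:

```python
single = classifier(sequence_to_classify, candidate_labels)
multi = classifier(sequence_to_classify, candidate_labels, multi_label=True)
print(sum(single["scores"]))  # ~1.0: one softmax across all candidate labels
print(sum(multi["scores"]))   # generally != 1.0: independent per-label scores
```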

### With manual PyTorch

The model can also be applied to NLI tasks like so:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# device = "cuda:0" or "cpu"
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

model_name = "mjwong/gte-large-mnli-anli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Move the model to the chosen device so it matches the inputs below.
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)

premise = "But I thought you'd sworn off coffee."
hypothesis = "I thought that you vowed to drink more coffee."

# Named `inputs` to avoid shadowing Python's built-in `input`.
inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt")
output = model(inputs["input_ids"].to(device))
prediction = torch.softmax(output["logits"][0], -1).tolist()
label_names = ["entailment", "neutral", "contradiction"]
prediction = {name: round(float(pred) * 100, 2) for pred, name in zip(prediction, label_names)}
print(prediction)
```
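
To score several premise-hypothesis pairs at once, the tokenizer also accepts lists of pairs. A minimal batched sketch reusing `tokenizer`, `model`, and `device` from the snippet above (the second example pair is illustrative, not from the card):

```python
premises = [
    "But I thought you'd sworn off coffee.",
    "A soccer game with multiple males playing.",  # hypothetical extra pair
]
hypotheses = [
    "I thought that you vowed to drink more coffee.",
    "Some men are playing a sport.",
]
# Pad to a common length and pass the full encoding (ids + attention mask).
batch = tokenizer(premises, hypotheses, padding=True, truncation=True, return_tensors="pt").to(device)
with torch.no_grad():
    logits = model(**batch).logits
print(torch.softmax(logits, -1))  # one [entailment, neutral, contradiction] row per pair
```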

### Eval results

The model was evaluated on the dev sets for MultiNLI and the test sets for ANLI. The metric used is accuracy.

|Datasets|mnli_dev_m|mnli_dev_mm|anli_test_r1|anli_test_r2|anli_test_r3|
| :---: | :---: | :---: | :---: | :---: | :---: |
|[gte-large-mnli-anli](https://huggingface.co/mjwong/gte-large-mnli-anli)|0.834|0.835|0.606|0.480|0.459|

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch mirroring them follows the list):

- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
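
A minimal `TrainingArguments` sketch that mirrors the values above (an illustrative reconstruction; the original training script is not part of this card):

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the listed hyperparameters, not the author's script.
training_args = TrainingArguments(
    output_dir="gte-large-mnli-anli",  # assumed output path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
)
```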

### Framework versions

- Transformers 4.28.1
- Pytorch 1.12.1+cu116
- Datasets 2.11.0
- Tokenizers 0.12.1