	---
	datasets:
	- glue
	model-index:
	- name: contriever-mnli
	  results: []
	pipeline_tag: zero-shot-classification
	language:
	- en
	license: mit
	---

	# contriever-mnli

	This model is a fine-tuned version of [facebook/contriever](https://huggingface.co/facebook/contriever) on the MultiNLI (MNLI) task of the [GLUE](https://huggingface.co/datasets/glue) dataset.

	## Model description

	The base model, Contriever, was introduced in [Unsupervised Dense Information Retrieval with Contrastive Learning](https://arxiv.org/abs/2112.09118) by Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, and Edouard Grave (arXiv, 2021).

	## How to use the model

	### With the zero-shot classification pipeline

	The model can be loaded with the `zero-shot-classification` pipeline like so:

	```python
	from transformers import pipeline

	classifier = pipeline("zero-shot-classification", model="mjwong/contriever-mnli")
	```

	You can then use this pipeline to classify sequences into any of the class names you specify.

	```python
	sequence_to_classify = "one day I will see the world"
	candidate_labels = ['travel', 'cooking', 'dancing']
	classifier(sequence_to_classify, candidate_labels)
	```
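Under the hood, the pipeline frames classification as NLI: the sequence acts as the premise, and each candidate label is slotted into a hypothesis template. The following is a minimal sketch of that pairing step, assuming the pipeline's default template `"This example is {}."`. The helper name `build_nli_pairs` is hypothetical and not part of Transformers.

```python
from typing import List, Tuple

# Default hypothesis template of the zero-shot pipeline (assumed unchanged here).
DEFAULT_TEMPLATE = "This example is {}."


def build_nli_pairs(
    sequence: str,
    candidate_labels: List[str],
    hypothesis_template: str = DEFAULT_TEMPLATE,
) -> List[Tuple[str, str]]:
    """Pair the sequence (premise) with one hypothesis per candidate label."""
    return [(sequence, hypothesis_template.format(label)) for label in candidate_labels]


# Hypothetical helper applied to the example from this section.
pairs = build_nli_pairs("one day I will see the world", ["travel", "cooking", "dancing"])
for premise, hypothesis in pairs:
    print(f"{premise!r} -> {hypothesis!r}")
```

A different template can be supplied to the real pipeline through its `hypothesis_template` argument when the default wording does not suit the labels.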

	If more than one candidate label can be correct, pass `multi_label=True` to score each class independently (older Transformers releases called this argument `multi_class`, which is now deprecated):

	```python
	candidate_labels = ['travel', 'cooking', 'dancing', 'exploration']
	classifier(sequence_to_classify, candidate_labels, multi_label=True)
	```
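The difference between the two modes comes down to which logits are normalized together. The sketch below illustrates the idea in plain Python; it is not the pipeline's actual code. The function name `zero_shot_scores` is hypothetical, the default class indices assume the entailment/neutral/contradiction order used elsewhere in this card, and the logits are made-up placeholders.

```python
import math
from typing import List, Sequence


def softmax(values: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_scores(
    logits_per_label: Sequence[Sequence[float]],
    entailment_id: int = 0,      # assumed index of the entailment class
    contradiction_id: int = 2,   # assumed index of the contradiction class
    multi_label: bool = False,
) -> List[float]:
    """Turn one row of NLI logits per candidate label into label scores."""
    if multi_label:
        # Each label is scored on its own: softmax over
        # (contradiction, entailment) only, keeping the entailment share.
        return [
            softmax([row[contradiction_id], row[entailment_id]])[1]
            for row in logits_per_label
        ]
    # Labels compete: softmax over the entailment logits of all labels.
    return softmax([row[entailment_id] for row in logits_per_label])


# Placeholder logits for two candidate labels (not real model output).
example_logits = [[2.0, 0.0, -1.0], [2.0, 0.5, -1.0]]
print(zero_shot_scores(example_logits))                    # scores sum to 1
print(zero_shot_scores(example_logits, multi_label=True))  # independent scores
```

In single-label mode the scores form a distribution over the candidate labels; in multi-label mode each score lies between 0 and 1 on its own, so several labels can score high at once.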

	### With manual PyTorch

	The model can also be applied directly to NLI tasks, scoring a premise-hypothesis pair:

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	# Use the GPU when one is available, otherwise fall back to the CPU.
	device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

	model_name = "mjwong/contriever-mnli"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	# Move the model to the same device as the inputs and keep it in inference mode.
	model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device).eval()

	premise = "But I thought you'd sworn off coffee."
	hypothesis = "I thought that you vowed to drink more coffee."

	# Pass the full encoding (input IDs, attention mask, token type IDs), not only the input IDs.
	inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt").to(device)
	with torch.no_grad():
	    logits = model(**inputs).logits
	probabilities = torch.softmax(logits[0], dim=-1).tolist()
	label_names = ["entailment", "neutral", "contradiction"]
	prediction = {name: round(prob * 100, 2) for prob, name in zip(probabilities, label_names)}
	print(prediction)
	```
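The hard-coded `label_names` list relies on a fixed class order. A more defensive variant reads the mapping from `model.config.id2label` instead. The sketch below shows that idea in plain Python; the helper name `format_prediction` is hypothetical, the `id2label` dictionary is a stand-in that assumes the same order as above (the card does not state what the checkpoint's config contains), and the logits are placeholders.

```python
import math
from typing import Dict, Sequence


def format_prediction(logits: Sequence[float], id2label: Dict[int, str]) -> Dict[str, float]:
    """Softmax the logits and return percentages keyed by label name."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {id2label[i]: round(100 * e / total, 2) for i, e in enumerate(exps)}


# Stand-in for model.config.id2label; assumed to match the order used above.
id2label = {0: "entailment", 1: "neutral", 2: "contradiction"}

# Placeholder logits, not real model output.
scores = format_prediction([0.0, 0.0, 0.0], id2label)
top_label = max(scores, key=scores.get)
print(top_label, scores)
```

With a loaded model, the same function could be called as `format_prediction(logits[0].tolist(), model.config.id2label)`, provided the config carries meaningful label names.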

	### Eval results
	The model was evaluated using the dev sets for MultiNLI and test sets for ANLI. The metric used is accuracy.

	| Model | mnli_dev_m | mnli_dev_mm | anli_test_r1 | anli_test_r2 | anli_test_r3 |
	| :---: | :---: | :---: | :---: | :---: | :---: |
	| [contriever-mnli](https://huggingface.co/mjwong/contriever-mnli) | 0.821 | 0.822 | 0.247 | 0.281 | 0.312 |
	| [contriever-msmarco-mnli](https://huggingface.co/mjwong/contriever-msmarco-mnli) | 0.820 | 0.819 | 0.244 | 0.296 | 0.306 |
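Accuracy here is the fraction of examples whose predicted label matches the gold label. A minimal sketch of that computation follows; the function name and the toy label IDs are illustrative only and are not taken from the evaluation script, which the card does not include.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], references: Sequence[int]) -> float:
    """Fraction of predictions that equal the corresponding reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Toy label IDs (made up for illustration): three of four predictions match.
print(accuracy([0, 1, 2, 1], [0, 1, 1, 1]))  # 0.75
```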

	### Training hyperparameters

	The following hyperparameters were used during training:

	- learning_rate: 2e-05
	- train_batch_size: 16
	- eval_batch_size: 16
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 5
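With `lr_scheduler_type: linear` and a warmup ratio of 0.1, the learning rate ramps up from 0 to the peak of 2e-05 over the first 10% of training steps and then decays linearly back to 0. The sketch below reproduces that shape in plain Python; the function name `linear_lr` is hypothetical, and the step counts in the usage lines are illustrative because the card does not state the total number of training steps.

```python
def linear_lr(step: int, warmup_steps: int, total_steps: int, base_lr: float = 2e-5) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup phase: scale up linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale down linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Illustrative run length: 1000 steps, of which 10% (100 steps) are warmup.
total_steps = 1000
warmup_steps = 100
for step in (0, 50, 100, 550, 1000):
    print(step, linear_lr(step, warmup_steps, total_steps))
```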

	### Framework versions
	- Transformers 4.28.1
	- PyTorch 1.12.1+cu116
	- Datasets 2.11.0
	- Tokenizers 0.12.1