akariasai committed
Commit 0219ce6
1 Parent(s): 3beffbb

Update README.md

Files changed (1): README.md (+46, -3)
README.md CHANGED
@@ -1,3 +1,46 @@
- ---
- license: apache-2.0
- ---
## facebook/tart-full-flan-t5-xl

`facebook/tart-full-flan-t5-xl` is a multi-task cross-encoder model trained via instruction-tuning on approximately 40 retrieval tasks, initialized with [google/flan-t5-xl](https://huggingface.co/google/flan-t5-xl).

### Installation
```bash
git clone https://github.com/facebookresearch/tart
cd tart/TART
pip install -r requirements.txt
```

TART-full can be loaded through our customized EncT5 model.
```python
from src.modeling_enc_t5 import EncT5ForSequenceClassification
from src.tokenization_enc_t5 import EncT5Tokenizer
import numpy as np
import torch
import torch.nn.functional as F

# load TART-full and tokenizer
model = EncT5ForSequenceClassification.from_pretrained("facebook/tart-full-flan-t5-xl")
tokenizer = EncT5Tokenizer.from_pretrained("facebook/tart-full-flan-t5-xl")
model.eval()

q = "What is the population of Tokyo?"
in_answer = "retrieve a passage that answers this question from Wikipedia"

p_1 = "The population of Japan's capital, Tokyo, dropped by about 48,600 people to just under 14 million at the start of 2022."
p_2 = "Tokyo, officially the Tokyo Metropolis (東京都, Tōkyō-to), is the capital and largest city of Japan."

# 1. TART-full can identify the more relevant paragraph.
features = tokenizer(['{0} [SEP] {1}'.format(in_answer, q), '{0} [SEP] {1}'.format(in_answer, q)], [p_1, p_2], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    scores = model(**features).logits
    # probability of the "relevant" class (index 1) for each (instruction + query, passage) pair
    normalized_scores = [float(score[1]) for score in F.softmax(scores, dim=1)]

print([p_1, p_2][np.argmax(normalized_scores)]) # "The population of Japan's capital, Tokyo, dropped by about 48,600 people to just under 14 million at the start of 2022."

# 2. TART-full can identify the document that is more relevant AND follows the instruction.
in_sim = "You need to find duplicated questions in Wiki forum. Could you find a question that is similar to this question"
q_1 = "How many people live in Tokyo?"
features = tokenizer(['{0} [SEP] {1}'.format(in_sim, q), '{0} [SEP] {1}'.format(in_sim, q)], [p_1, q_1], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    scores = model(**features).logits
    normalized_scores = [float(score[1]) for score in F.softmax(scores, dim=1)]

print([p_1, q_1][np.argmax(normalized_scores)]) # "How many people live in Tokyo?"
```
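
The same scoring pattern generalizes from pairwise comparison to reranking a whole candidate list. The sketch below is our own illustration, not part of the TART codebase: the `rerank_passages` helper is a hypothetical name, and it simply batches one instruction-prefixed query against every candidate and sorts by the relevance probability computed exactly as above.
```python
def rerank_passages(model, tokenizer, instruction, query, passages):
    # illustrative helper (not part of the TART repo): score each candidate
    # passage against "instruction [SEP] query" and sort by relevance.
    queries = ['{0} [SEP] {1}'.format(instruction, query)] * len(passages)
    features = tokenizer(queries, passages, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**features).logits
        # probability of the "relevant" class (index 1) for each pair
        probs = F.softmax(logits, dim=1)[:, 1].tolist()
    return sorted(zip(passages, probs), key=lambda pair: pair[1], reverse=True)

# usage: rank both example paragraphs for the QA instruction
for passage, prob in rerank_passages(model, tokenizer, in_answer, q, [p_1, p_2]):
    print(round(prob, 3), passage[:60])
```
Because all candidates share one query, padding them into a single batch keeps this to one forward pass; for long candidate lists you would chunk the batch to fit memory.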