{"downloads": 1677372, "id": "ProsusAI/finbert", "likes": 186, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": "en", "tags": ["financial-sentiment-analysis", "sentiment-analysis"], "widget": [{"text": "Stocks rallied and the British pound gained."}]}, "description": "\n\nFinBERT is a pre-trained NLP model to analyze sentiment of financial text. It is built by further training the BERT language model in the finance domain, using a large financial corpus and thereby fine-tuning it for financial sentiment classification. [Financial PhraseBank](https://www.researchgate.net/publication/251231107_Good_Debt_or_Bad_Debt_Detecting_Semantic_Orientations_in_Economic_Texts) by Malo et al. (2014) is used for fine-tuning. For more details, please see the paper [FinBERT: Financial Sentiment Analysis with Pre-trained Language Models](https://arxiv.org/abs/1908.10063) and our related [blog post](https://medium.com/prosus-ai-tech-blog/finbert-financial-sentiment-analysis-with-bert-b277a3607101) on Medium.\n\nThe model will give softmax outputs for three labels: positive, negative or neutral.\n\n"} {"downloads": 2605299, "id": "distilbert-base-uncased-finetuned-sst-2-english", "likes": 176, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": "en", "license": "apache-2.0", "datasets": ["sst2", "glue"], "model-index": [{"name": "distilbert-base-uncased-finetuned-sst-2-english", "results": [{"task": {"type": "text-classification", "name": "Text Classification"}, "dataset": {"name": "glue", "type": "glue", "config": "sst2", "split": "validation"}, "metrics": [{"type": "accuracy", "value": 0.9105504587155964, "name": "Accuracy", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiN2YyOGMxYjY2Y2JhMjkxNjIzN2FmMjNiNmM2ZWViNGY3MTNmNWI2YzhiYjYxZTY0ZGUyN2M1NGIxZjRiMjQwZiIsInZlcnNpb24iOjF9.uui0srxV5ZHRhxbYN6082EZdwpnBgubPJ5R2-Wk8HTWqmxYE3QHidevR9LLAhidqGw6Ih93fK0goAXncld_gBg"}, {"type": "precision", "value": 0.8978260869565218, "name": "Precision", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMzgwYTYwYjA2MmM0ZTYwNDk0M2NmNTBkZmM2NGNhYzQ1OGEyN2NkNDQ3Mzc2NTQyMmZiNDJiNzBhNGVhZGUyOSIsInZlcnNpb24iOjF9.eHjLmw3K02OU69R2Au8eyuSqT3aBDHgZCn8jSzE3_urD6EUSSsLxUpiAYR4BGLD_U6-ZKcdxVo_A2rdXqvUJDA"}, {"type": "recall", "value": 0.9301801801801802, "name": "Recall", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMGIzM2E3MTI2Mzc2MDYwNmU3ZTVjYmZmZDBkNjY4ZTc5MGY0Y2FkNDU3NjY1MmVkNmE3Y2QzMzAwZDZhOWY1NiIsInZlcnNpb24iOjF9.PUZlqmct13-rJWBXdHm5tdkXgETL9F82GNbbSR4hI8MB-v39KrK59cqzFC2Ac7kJe_DtOeUyosj34O_mFt_1DQ"}, {"type": "auc", "value": 0.9716626673402374, "name": "AUC", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMDM0YWIwZmQ4YjUwOGZmMWU2MjI1YjIxZGQ2MzNjMzRmZmYxMzZkNGFjODhlMDcyZDM1Y2RkMWZlOWQ0MWYwNSIsInZlcnNpb24iOjF9.E7GRlAXmmpEkTHlXheVkuL1W4WNjv4JO3qY_WCVsTVKiO7bUu0UVjPIyQ6g-J1OxsfqZmW3Leli1wY8vPBNNCQ"}, {"type": "f1", "value": 0.9137168141592922, "name": "F1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMGU4MjNmOGYwZjZjMDQ1ZTkyZTA4YTc1MWYwOTM0NDM4ZWY1ZGVkNDY5MzNhYTQyZGFlNzIyZmUwMDg3NDU0NyIsInZlcnNpb24iOjF9.mW5ftkq50Se58M-jm6a2Pu93QeKa3MfV7xcBwvG3PSB_KNJxZWTCpfMQp-Cmx_EMlmI2siKOyd8akYjJUrzJCA"}, {"type": "loss", "value": 0.39013850688934326, "name": "loss", "verified": true, "verifyToken": 
"eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMTZiNzAyZDc0MzUzMmE1MGJiN2JlYzFiODE5ZTNlNGE4MmI4YzRiMTc2ODEzMTUwZmEzOTgxNzc4YjJjZTRmNiIsInZlcnNpb24iOjF9.VqIC7uYC-ZZ8ss9zQOlRV39YVOOLc5R36sIzCcVz8lolh61ux_5djm2XjpP6ARc6KqEnXC4ZtfNXsX2HZfrtCQ"}]}, {"task": {"type": "text-classification", "name": "Text Classification"}, "dataset": {"name": "sst2", "type": "sst2", "config": "default", "split": "train"}, "metrics": [{"type": "accuracy", "value": 0.9885521685548412, "name": "Accuracy", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiY2I3NzU3YzhmMDkxZTViY2M3OTY1NmI0ZTdmMDQxNjNjYzJiZmQxNzczM2E4YmExYTY5ODY0NDBkY2I4ZjNkOCIsInZlcnNpb24iOjF9.4Gtk3FeVc9sPWSqZIaeUXJ9oVlPzm-NmujnWpK2y5s1Vhp1l6Y1pK5_78wW0-NxSvQqV6qd5KQf_OAEpVAkQDA"}, {"type": "precision", "value": 0.9881965062029833, "name": "Precision Macro", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZDdlZDMzY2I3MTAwYTljNmM4MGMyMzU2YjAzZDg1NDYwN2ZmM2Y5OWZhMjUyMGJiNjY1YmZiMzFhMDI2ODFhNyIsInZlcnNpb24iOjF9.cqmv6yBxu4St2mykRWrZ07tDsiSLdtLTz2hbqQ7Gm1rMzq9tdlkZ8MyJRxtME_Y8UaOG9rs68pV-gKVUs8wABw"}, {"type": "precision", "value": 0.9885521685548412, "name": "Precision Micro", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZjFlYzAzNmE1YjljNjUwNzBjZjEzZDY0ZDQyMmY5ZWM2OTBhNzNjYjYzYTk1YWE1NjU3YTMxZDQwOTE1Y2FkNyIsInZlcnNpb24iOjF9.jnCHOkUHuAOZZ_ZMVOnetx__OVJCS6LOno4caWECAmfrUaIPnPNV9iJ6izRO3sqkHRmxYpWBb-27GJ4N3LU-BQ"}, {"type": "precision", "value": 0.9885639626373408, "name": "Precision Weighted", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZGUyODFjNjBlNTE2MTY3ZDAxOGU1N2U0YjUyY2NiZjhkOGVmYThjYjBkNGU3NTRkYzkzNDQ2MmMwMjkwMWNiMyIsInZlcnNpb24iOjF9.zTNabMwApiZyXdr76QUn7WgGB7D7lP-iqS3bn35piqVTNsv3wnKjZOaKFVLIUvtBXq4gKw7N2oWxvWc4OcSNDg"}, {"type": "recall", "value": 0.9886145346602994, "name": "Recall Macro", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTU1YjlhODU3YTkyNTdiZDcwZGFlZDBiYjY0N2NjMGM2NTRiNjQ3MDNjNGMxOWY2ZGQ4NWU1YmMzY2UwZTI3YSIsInZlcnNpb24iOjF9.xaLPY7U-wHsJ3DDui1yyyM-xWjL0Jz5puRThy7fczal9x05eKEQ9s0a_WD-iLmapvJs0caXpV70hDe2NLcs-DA"}, {"type": "recall", "value": 0.9885521685548412, "name": "Recall Micro", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiODE0YTU0MDBlOGY4YzU0MjY5MzA3OTk2OGNhOGVkMmU5OGRjZmFiZWI2ZjY5ODEzZTQzMTI0N2NiOTVkNDliYiIsInZlcnNpb24iOjF9.SOt1baTBbuZRrsvGcak2sUwoTrQzmNCbyV2m1_yjGsU48SBH0NcKXicidNBSnJ6ihM5jf_Lv_B5_eOBkLfNWDQ"}, {"type": "recall", "value": 0.9885521685548412, "name": "Recall Weighted", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZWNkNmM0ZGRlNmYxYzIwNDk4OTI5MzIwZWU1NzZjZDVhMDcyNDFlMjBhNDQxODU5OWMwMWNhNGEzNjY3ZGUyOSIsInZlcnNpb24iOjF9.b15Fh70GwtlG3cSqPW-8VEZT2oy0CtgvgEOtWiYonOovjkIQ4RSLFVzVG-YfslaIyfg9RzMWzjhLnMY7Bpn2Aw"}, {"type": "f1", "value": 0.9884019815052447, "name": "F1 Macro", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYmM4NjQ5Yjk5ODRhYTU1MTY3MmRhZDBmODM1NTg3OTFiNWM4NDRmYjI0MzZkNmQ1MzE3MzcxODZlYzBkYTMyYSIsInZlcnNpb24iOjF9.74RaDK8nBVuGRl2Se_-hwQvP6c4lvVxGHpcCWB4uZUCf2_HoC9NT9u7P3pMJfH_tK2cpV7U3VWGgSDhQDi-UBQ"}, {"type": "f1", "value": 0.9885521685548412, "name": "F1 Micro", "verified": true, "verifyToken": 
"eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZDRmYWRmMmQ0YjViZmQxMzhhYTUyOTE1MTc0ZDU1ZjQyZjFhMDYzYzMzZDE0NzZlYzQyOTBhMTBhNmM5NTlkMiIsInZlcnNpb24iOjF9.VMn_psdAHIZTlW6GbjERZDe8MHhwzJ0rbjV_VJyuMrsdOh5QDmko-wEvaBWNEdT0cEKsbggm-6jd3Gh81PfHAQ"}, {"type": "f1", "value": 0.9885546181087554, "name": "F1 Weighted", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjUyZWFhZDZhMGQ3MzBmYmRiNDVmN2FkZDBjMjk3ODk0OTAxNGZkMWE0NzU5ZjI0NzE0NGZiNzM0N2Y2NDYyOSIsInZlcnNpb24iOjF9.YsXBhnzEEFEW6jw3mQlFUuIrW7Gabad2Ils-iunYJr-myg0heF8NEnEWABKFE1SnvCWt-69jkLza6SupeyLVCA"}, {"type": "loss", "value": 0.040652573108673096, "name": "loss", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZTc3YjU3MjdjMzkxODA5MjU5NGUyY2NkMGVhZDg3ZWEzMmU1YWVjMmI0NmU2OWEyZTkzMTVjNDZiYTc0YjIyNCIsInZlcnNpb24iOjF9.lA90qXZVYiILHMFlr6t6H81Oe8a-4KmeX-vyCC1BDia2ofudegv6Vb46-4RzmbtuKeV6yy6YNNXxXxqVak1pAg"}]}]}]}, "description": "\n\n# DistilBERT base uncased finetuned SST-2\n\n## Table of Contents\n- [Model Details](#model-details)\n- [How to Get Started With the Model](#how-to-get-started-with-the-model)\n- [Uses](#uses)\n- [Risks, Limitations and Biases](#risks-limitations-and-biases)\n- [Training](#training)\n\n## Model Details\n**Model Description:** This model is a fine-tune checkpoint of [DistilBERT-base-uncased](https://huggingface.co/distilbert-base-uncased), fine-tuned on SST-2.\nThis model reaches an accuracy of 91.3 on the dev set (for comparison, Bert bert-base-uncased version reaches an accuracy of 92.7).\n- **Developed by:** Hugging Face\n- **Model Type:** Text Classification\n- **Language(s):** English\n- **License:** Apache-2.0\n- **Parent Model:** For more details about DistilBERT, we encourage users to check out [this model card](https://huggingface.co/distilbert-base-uncased).\n- **Resources for more information:**\n - [Model Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/distilbert#transformers.DistilBertForSequenceClassification)\n - [DistilBERT paper](https://arxiv.org/abs/1910.01108)\n\n## How to Get Started With the Model\n\nExample of single-label classification:\n\u200b\u200b\n```python\nimport torch\nfrom transformers import DistilBertTokenizer, DistilBertForSequenceClassification\n\ntokenizer = DistilBertTokenizer.from_pretrained(\"distilbert-base-uncased\")\nmodel = DistilBertForSequenceClassification.from_pretrained(\"distilbert-base-uncased\")\n\ninputs = tokenizer(\"Hello, my dog is cute\", return_tensors=\"pt\")\nwith torch.no_grad():\n logits = model(**inputs).logits\n\npredicted_class_id = logits.argmax().item()\nmodel.config.id2label[predicted_class_id]\n\n```\n\n## Uses\n\n#### Direct Use\n\nThis model can be used for topic classification. You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you.\n\n#### Misuse and Out-of-scope Use\nThe model should not be used to intentionally create hostile or alienating environments for people. 
\n\n## Uses\n\n#### Direct Use\n\nThis model is a sentiment classifier: it assigns a positive or negative label to English text. It can also serve as a starting point for further fine-tuning on a related downstream task; see the model hub to look for fine-tuned versions on a task that interests you.\n\n#### Misuse and Out-of-scope Use\nThe model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model.\n\n## Risks, Limitations and Biases\n\nBased on a few experiments, we observed that this model could produce biased predictions that target underrepresented populations.\n\nFor instance, for sentences like `This film was filmed in COUNTRY`, this binary classification model will give radically different probabilities for the positive label depending on the country (0.89 if the country is France, but 0.08 if the country is Afghanistan) when nothing in the input indicates such a strong semantic shift. In this [colab](https://colab.research.google.com/gist/ageron/fb2f64fb145b4bc7c49efc97e5f114d3/biasmap.ipynb), [Aurélien Géron](https://twitter.com/aureliengeron) made an interesting map plotting these probabilities for each country.\n\n[Image: map of positive-label probabilities per country]\n\nWe strongly advise users to thoroughly probe these aspects on their use-cases in order to evaluate the risks of this model. We recommend looking at the following bias evaluation datasets as a place to start: [WinoBias](https://huggingface.co/datasets/wino_bias), [WinoGender](https://huggingface.co/datasets/super_glue), [Stereoset](https://huggingface.co/datasets/stereoset).\n\n# Training\n\n#### Training Data\n\nThe authors use the Stanford Sentiment Treebank ([sst2](https://huggingface.co/datasets/sst2)) corpus to fine-tune the model.\n\n#### Training Procedure\n\n###### Fine-tuning hyper-parameters\n\n- learning_rate = 1e-5\n- batch_size = 32\n- warmup = 600\n- max_seq_length = 128\n- num_train_epochs = 3.0\n\n"} {"downloads": 1679023, "id": "cardiffnlp/twitter-roberta-base-sentiment", "likes": 145, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"datasets": ["tweet_eval"], "language": ["en"]}, "description": "\n# Twitter-roBERTa-base for Sentiment Analysis\n\nThis is a roBERTa-base model trained on ~58M tweets and finetuned for sentiment analysis with the TweetEval benchmark. This model is suitable for English (for a similar multilingual model, see [XLM-T](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment)).\n\n- Reference Paper: [_TweetEval_ (Findings of EMNLP 2020)](https://arxiv.org/pdf/2010.12421.pdf). \n- Git Repo: [Tweeteval official repository](https://github.com/cardiffnlp/tweeteval).\n\nLabels: \n0 -> Negative;\n1 -> Neutral;\n2 -> Positive
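\n\nA quick-start sketch with the pipeline API (the output shown is illustrative; depending on the checkpoint config, the returned label may be a generic `LABEL_*` name that maps to the table above):\n\n```python\nfrom transformers import pipeline\n\nsentiment = pipeline(\"sentiment-analysis\", model=\"cardiffnlp/twitter-roberta-base-sentiment\")\nprint(sentiment(\"Good night \ud83d\ude0a\"))\n# e.g. [{'label': 'LABEL_2', 'score': ...}], where LABEL_2 = Positive in the mapping above\n```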
\n\nNew! We just released a new sentiment analysis model trained on a larger set of more recent tweets. \nSee [twitter-roberta-base-sentiment-latest](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest) and [TweetNLP](https://tweetnlp.org) for more details.\n\n## Example of classification\n\n```python\nfrom transformers import AutoModelForSequenceClassification\nfrom transformers import TFAutoModelForSequenceClassification\nfrom transformers import AutoTokenizer\nimport numpy as np\nfrom scipy.special import softmax\nimport csv\nimport urllib.request\n\n# Preprocess text (username and link placeholders)\ndef preprocess(text):\n    new_text = []\n    for t in text.split(\" \"):\n        t = '@user' if t.startswith('@') and len(t) > 1 else t\n        t = 'http' if t.startswith('http') else t\n        new_text.append(t)\n    return \" \".join(new_text)\n\n# Tasks:\n# emoji, emotion, hate, irony, offensive, sentiment\n# stance/abortion, stance/atheism, stance/climate, stance/feminist, stance/hillary\n\ntask='sentiment'\nMODEL = f\"cardiffnlp/twitter-roberta-base-{task}\"\n\ntokenizer = AutoTokenizer.from_pretrained(MODEL)\n\n# download label mapping\nlabels=[]\nmapping_link = f\"https://raw.githubusercontent.com/cardiffnlp/tweeteval/main/datasets/{task}/mapping.txt\"\nwith urllib.request.urlopen(mapping_link) as f:\n    html = f.read().decode('utf-8').split(\"\\n\")\n    csvreader = csv.reader(html, delimiter='\\t')\nlabels = [row[1] for row in csvreader if len(row) > 1]\n\n# PT\nmodel = AutoModelForSequenceClassification.from_pretrained(MODEL)\nmodel.save_pretrained(MODEL)\n\ntext = \"Good night \ud83d\ude0a\"\ntext = preprocess(text)\nencoded_input = tokenizer(text, return_tensors='pt')\noutput = model(**encoded_input)\nscores = output[0][0].detach().numpy()\nscores = softmax(scores)\n\n# # TF\n# model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)\n# model.save_pretrained(MODEL)\n\n# text = \"Good night \ud83d\ude0a\"\n# encoded_input = tokenizer(text, return_tensors='tf')\n# output = model(encoded_input)\n# scores = output[0][0].numpy()\n# scores = softmax(scores)\n\nranking = np.argsort(scores)\nranking = ranking[::-1]\nfor i in range(scores.shape[0]):\n    l = labels[ranking[i]]\n    s = scores[ranking[i]]\n    print(f\"{i+1}) {l} {np.round(float(s), 4)}\")\n```\n\nOutput: \n\n```\n1) positive 0.8466\n2) neutral 0.1458\n3) negative 0.0076\n```\n\n### BibTeX entry and citation info\n\nPlease cite the [reference paper](https://aclanthology.org/2020.findings-emnlp.148/) if you use this model.\n\n```bibtex\n@inproceedings{barbieri-etal-2020-tweeteval,\n    title = \"{T}weet{E}val: Unified Benchmark and Comparative Evaluation for Tweet Classification\",\n    author = \"Barbieri, Francesco and\n      Camacho-Collados, Jose and\n      Espinosa Anke, Luis and\n      Neves, Leonardo\",\n    booktitle = \"Findings of the Association for Computational Linguistics: EMNLP 2020\",\n    month = nov,\n    year = \"2020\",\n    address = \"Online\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://aclanthology.org/2020.findings-emnlp.148\",\n    doi = \"10.18653/v1/2020.findings-emnlp.148\",\n    pages = \"1644--1650\"\n}\n```"} {"downloads": 708808, "id": "j-hartmann/emotion-english-distilroberta-base", "likes": 127, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": "en", "tags": ["distilroberta", "sentiment", "emotion", "twitter", "reddit"], "widget": [{"text": "Oh wow.
I didn't know that."}, {"text": "This movie always makes me cry.."}, {"text": "Oh Happy Day"}]}, "description": "\n\n# Emotion English DistilRoBERTa-base\n\n# Description \u2139\n\nWith this model, you can classify emotions in English text data. The model was trained on 6 diverse datasets (see Appendix below) and predicts Ekman's 6 basic emotions, plus a neutral class:\n\n1) anger \ud83e\udd2c\n2) disgust \ud83e\udd22\n3) fear \ud83d\ude28\n4) joy \ud83d\ude00\n5) neutral \ud83d\ude10\n6) sadness \ud83d\ude2d\n7) surprise \ud83d\ude32\n\nThe model is a fine-tuned checkpoint of [DistilRoBERTa-base](https://huggingface.co/distilroberta-base). For a 'non-distilled' emotion model, please refer to the model card of the [RoBERTa-large](https://huggingface.co/j-hartmann/emotion-english-roberta-large) version.\n\n# Application \ud83d\ude80\n\na) Run emotion model with 3 lines of code on single text example using Hugging Face's pipeline command on Google Colab:\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/j-hartmann/emotion-english-distilroberta-base/blob/main/simple_emotion_pipeline.ipynb)\n\n```python\nfrom transformers import pipeline\nclassifier = pipeline(\"text-classification\", model=\"j-hartmann/emotion-english-distilroberta-base\", return_all_scores=True)\nclassifier(\"I love this!\")\n```\n\n```python\nOutput:\n[[{'label': 'anger', 'score': 0.004419783595949411},\n {'label': 'disgust', 'score': 0.0016119900392368436},\n {'label': 'fear', 'score': 0.0004138521908316761},\n {'label': 'joy', 'score': 0.9771687984466553},\n {'label': 'neutral', 'score': 0.005764586851000786},\n {'label': 'sadness', 'score': 0.002092392183840275},\n {'label': 'surprise', 'score': 0.008528684265911579}]]\n```\n\nb) Run emotion model on multiple examples and full datasets (e.g., .csv files) on Google Colab:\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/j-hartmann/emotion-english-distilroberta-base/blob/main/emotion_prediction_example.ipynb)\n\n# Contact \ud83d\udcbb\n\nPlease reach out to [jochen.hartmann@tum.de](mailto:jochen.hartmann@tum.de) if you have any questions or feedback.\n\nThanks to Samuel Domdey and [chrsiebert](https://huggingface.co/siebert) for their support in making this model available.\n\n# Reference \u2705\n\nFor attribution, please cite the following reference if you use this model. A working paper will be available soon.\n\n```\nJochen Hartmann, \"Emotion English DistilRoBERTa-base\". https://huggingface.co/j-hartmann/emotion-english-distilroberta-base/, 2022.\n```\n\nBibTex citation:\n\n```\n@misc{hartmann2022emotionenglish,\n author={Hartmann, Jochen},\n title={Emotion English DistilRoBERTa-base},\n year={2022},\n howpublished = {\\url{https://huggingface.co/j-hartmann/emotion-english-distilroberta-base/}},\n}\n```\n\n# Appendix \ud83d\udcda\n\nPlease find an overview of the datasets used for training below. All datasets contain English text. The table summarizes which emotions are available in each of the datasets. The datasets represent a diverse collection of text types. Specifically, they contain emotion labels for texts from Twitter, Reddit, student self-reports, and utterances from TV dialogues. As MELD (Multimodal EmotionLines Dataset) extends the popular EmotionLines dataset, EmotionLines itself is not included here. 
\n\n|Name|anger|disgust|fear|joy|neutral|sadness|surprise|\n|"} {"downloads": 936792, "id": "cardiffnlp/twitter-roberta-base-sentiment-latest", "likes": 108, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": "en", "widget": [{"text": "Covid cases are increasing fast!"}], "datasets": ["tweet_eval"]}, "description": "\n\n\n# Twitter-roBERTa-base for Sentiment Analysis - UPDATED (2022)\n\nThis is a RoBERTa-base model trained on ~124M tweets from January 2018 to December 2021, and finetuned for sentiment analysis with the TweetEval benchmark. \nThe original Twitter-based RoBERTa model can be found [here](https://huggingface.co/cardiffnlp/twitter-roberta-base-2021-124m) and the original reference paper is [TweetEval](https://github.com/cardiffnlp/tweeteval). This model is suitable for English. \n\n- Reference Paper: [TimeLMs paper](https://arxiv.org/abs/2202.03829). \n- Git Repo: [TimeLMs official repository](https://github.com/cardiffnlp/timelms).\n\nLabels: \n0 -> Negative;\n1 -> Neutral;\n2 -> Positive\n\nThis sentiment analysis model has been integrated into [TweetNLP](https://github.com/cardiffnlp/tweetnlp). You can access the demo [here](https://tweetnlp.org).\n\n## Example Pipeline\n```python\nfrom transformers import pipeline\nmodel_path = \"cardiffnlp/twitter-roberta-base-sentiment-latest\"\nsentiment_task = pipeline(\"sentiment-analysis\", model=model_path, tokenizer=model_path)\nsentiment_task(\"Covid cases are increasing fast!\")\n```\n```\n[{'label': 'Negative', 'score': 0.7236}]\n```\n\n## Full classification example\n\n```python\nfrom transformers import AutoModelForSequenceClassification\nfrom transformers import TFAutoModelForSequenceClassification\nfrom transformers import AutoTokenizer, AutoConfig\nimport numpy as np\nfrom scipy.special import softmax\n\n# Preprocess text (username and link placeholders)\ndef preprocess(text):\n    new_text = []\n    for t in text.split(\" \"):\n        t = '@user' if t.startswith('@') and len(t) > 1 else t\n        t = 'http' if t.startswith('http') else t\n        new_text.append(t)\n    return \" \".join(new_text)\n\nMODEL = \"cardiffnlp/twitter-roberta-base-sentiment-latest\"\ntokenizer = AutoTokenizer.from_pretrained(MODEL)\nconfig = AutoConfig.from_pretrained(MODEL)\n\n# PT\nmodel = AutoModelForSequenceClassification.from_pretrained(MODEL)\n#model.save_pretrained(MODEL)\n\ntext = \"Covid cases are increasing fast!\"\ntext = preprocess(text)\nencoded_input = tokenizer(text, return_tensors='pt')\noutput = model(**encoded_input)\nscores = output[0][0].detach().numpy()\nscores = softmax(scores)\n\n# # TF\n# model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)\n# model.save_pretrained(MODEL)\n# text = \"Covid cases are increasing fast!\"\n# encoded_input = tokenizer(text, return_tensors='tf')\n# output = model(encoded_input)\n# scores = output[0][0].numpy()\n# scores = softmax(scores)\n\n# Print labels and scores\nranking = np.argsort(scores)\nranking = ranking[::-1]\nfor i in range(scores.shape[0]):\n    l = config.id2label[ranking[i]]\n    s = scores[ranking[i]]\n    print(f\"{i+1}) {l} {np.round(float(s), 4)}\")\n```\n\nOutput: \n\n```\n1) Negative 0.7236\n2) Neutral 0.2287\n3) Positive 0.0477\n```"} {"downloads": 227347, "id": "nlptown/bert-base-multilingual-uncased-sentiment", "likes": 100, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": ["en", "nl", "de", "fr", "it", "es"], "license": "mit"}, "description": "\n\n# bert-base-multilingual-uncased-sentiment\n\nThis is a bert-base-multilingual-uncased model finetuned for sentiment analysis on product reviews in six languages: English, Dutch, German, French, Spanish and Italian. It predicts the sentiment of the review as a number of stars (between 1 and 5).\n\nThis model is intended for direct use as a sentiment analysis model for product reviews in any of the six languages above, or for further finetuning on related sentiment analysis tasks.
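\n\nA minimal usage sketch with the pipeline API (the review text is illustrative; the returned label is the predicted star rating):\n\n```python\nfrom transformers import pipeline\n\nclassifier = pipeline(\"sentiment-analysis\", model=\"nlptown/bert-base-multilingual-uncased-sentiment\")\n# works for any of the six supported languages\nprint(classifier(\"Das Essen war ausgezeichnet, aber der Service war langsam.\"))\n# e.g. [{'label': '4 stars', 'score': ...}]\n```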
\n\n## Training data\n\nHere is the number of product reviews we used for finetuning the model: \n\n| Language | Number of reviews |\n| "} {"downloads": 1659614, "id": "cardiffnlp/twitter-xlm-roberta-base-sentiment", "likes": 81, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": "multilingual", "widget": [{"text": "\ud83e\udd17"}, {"text": "T'estimo! \u2764\ufe0f"}, {"text": "I love you!"}, {"text": "I hate you \ud83e\udd2e"}, {"text": "Mahal kita!"}, {"text": "\uc0ac\ub791\ud574!"}, {"text": "\ub09c \ub108\uac00 \uc2eb\uc5b4"}, {"text": "\ud83d\ude0d\ud83d\ude0d\ud83d\ude0d"}]}, "description": "\n\n\n# twitter-XLM-roBERTa-base for Sentiment Analysis\n\nThis is a multilingual XLM-roBERTa-base model trained on ~198M tweets and finetuned for sentiment analysis. The sentiment fine-tuning was done on 8 languages (Ar, En, Fr, De, Hi, It, Sp, Pt) but it can be used for more languages (see paper for details).\n\n- Paper: [XLM-T: A Multilingual Language Model Toolkit for Twitter](https://arxiv.org/abs/2104.12250). \n- Git Repo: [XLM-T official repository](https://github.com/cardiffnlp/xlm-t).\n\n## Example Pipeline\n```python\nfrom transformers import pipeline\nmodel_path = \"cardiffnlp/twitter-xlm-roberta-base-sentiment\"\nsentiment_task = pipeline(\"sentiment-analysis\", model=model_path, tokenizer=model_path)\nsentiment_task(\"T'estimo!\")\n```\n```\n[{'label': 'Positive', 'score': 0.6600581407546997}]\n```\n\n## Full classification example\n\n```python\nfrom transformers import AutoModelForSequenceClassification\nfrom transformers import TFAutoModelForSequenceClassification\nfrom transformers import AutoTokenizer, AutoConfig\nimport numpy as np\nfrom scipy.special import softmax\n\n# Preprocess text (username and link placeholders)\ndef preprocess(text):\n    new_text = []\n    for t in text.split(\" \"):\n        t = '@user' if t.startswith('@') and len(t) > 1 else t\n        t = 'http' if t.startswith('http') else t\n        new_text.append(t)\n    return \" \".join(new_text)\n\nMODEL = \"cardiffnlp/twitter-xlm-roberta-base-sentiment\"\n\ntokenizer = AutoTokenizer.from_pretrained(MODEL)\nconfig = AutoConfig.from_pretrained(MODEL)\n\n# PT\nmodel = AutoModelForSequenceClassification.from_pretrained(MODEL)\nmodel.save_pretrained(MODEL)\n\ntext = \"Good night \ud83d\ude0a\"\ntext = preprocess(text)\nencoded_input = tokenizer(text, return_tensors='pt')\noutput = model(**encoded_input)\nscores = output[0][0].detach().numpy()\nscores = softmax(scores)\n\n# # TF\n# model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)\n# model.save_pretrained(MODEL)\n\n# text = \"Good night \ud83d\ude0a\"\n# encoded_input = tokenizer(text, return_tensors='tf')\n# output = model(encoded_input)\n# scores = output[0][0].numpy()\n# scores = softmax(scores)\n\n# Print labels and scores\nranking = np.argsort(scores)\nranking = ranking[::-1]\nfor i in range(scores.shape[0]):\n    l = config.id2label[ranking[i]]\n    s = scores[ranking[i]]\n    print(f\"{i+1}) {l} {np.round(float(s), 4)}\")\n```\n\nOutput: \n\n```\n1) Positive 0.7673\n2) Neutral 0.2015\n3) Negative 0.0313\n```\n\n"} {"downloads": 1240754, "id": "papluca/xlm-roberta-base-language-detection", "likes": 75, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": ["multilingual", "ar", "bg", "de", "el", "en", "es", "fr", "hi", "it", "ja", "nl", "pl", "pt", "ru", "sw", "th", "tr", "ur", "vi", "zh"], "license": "mit", "tags": ["generated_from_trainer"], "metrics": ["accuracy", "f1"], "model-index": [{"name": "xlm-roberta-base-language-detection", "results": []}]}, "description": "\n\n# xlm-roberta-base-language-detection\n\nThis model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) on the [Language Identification](https://huggingface.co/datasets/papluca/language-identification#additional-information) dataset.\n\n## Model description\n\nThis model is an XLM-RoBERTa transformer model with a classification head on top (i.e. a linear layer on top of the pooled output). \nFor additional information please refer to the [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) model card or to the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Conneau et al.\n\n## Intended uses & limitations\n\nYou can directly use this model as a language detector, i.e. for sequence classification tasks. Currently, it supports the following 20 languages: \n\n`arabic (ar), bulgarian (bg), german (de), modern greek (el), english (en), spanish (es), french (fr), hindi (hi), italian (it), japanese (ja), dutch (nl), polish (pl), portuguese (pt), russian (ru), swahili (sw), thai (th), turkish (tr), urdu (ur), vietnamese (vi), and chinese (zh)`
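\n\nA minimal sketch of direct use as a detector (the input strings are illustrative; returned labels are the language codes listed above):\n\n```python\nfrom transformers import pipeline\n\ndetector = pipeline(\"text-classification\", model=\"papluca/xlm-roberta-base-language-detection\")\nprint(detector([\"Bonjour tout le monde\", \"Guten Morgen\"]))\n# e.g. [{'label': 'fr', 'score': ...}, {'label': 'de', 'score': ...}]\n```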
"pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": ["multilingual", "ar", "bg", "de", "el", "en", "es", "fr", "hi", "it", "ja", "nl", "pl", "pt", "ru", "sw", "th", "tr", "ur", "vi", "zh"], "license": "mit", "tags": ["generated_from_trainer"], "metrics": ["accuracy", "f1"], "model-index": [{"name": "xlm-roberta-base-language-detection", "results": []}]}, "description": "\n\n# xlm-roberta-base-language-detection\n\nThis model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) on the [Language Identification](https://huggingface.co/datasets/papluca/language-identification#additional-information) dataset.\n\n## Model description\n\nThis model is an XLM-RoBERTa transformer model with a classification head on top (i.e. a linear layer on top of the pooled output). \nFor additional information please refer to the [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) model card or to the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Conneau et al.\n\n## Intended uses & limitations\n\nYou can directly use this model as a language detector, i.e. for sequence classification tasks. Currently, it supports the following 20 languages: \n\n`arabic (ar), bulgarian (bg), german (de), modern greek (el), english (en), spanish (es), french (fr), hindi (hi), italian (it), japanese (ja), dutch (nl), polish (pl), portuguese (pt), russian (ru), swahili (sw), thai (th), turkish (tr), urdu (ur), vietnamese (vi), and chinese (zh)`\n\n## Training and evaluation data\n\nThe model was fine-tuned on the [Language Identification](https://huggingface.co/datasets/papluca/language-identification#additional-information) dataset, which consists of text sequences in 20 languages. The training set contains 70k samples, while the validation and test sets 10k each. The average accuracy on the test set is **99.6%** (this matches the average macro/weighted F1-score being the test set perfectly balanced). A more detailed evaluation is provided by the following table.\n\n| Language | Precision | Recall | F1-score | support |\n|:"} {"downloads": 901595, "id": "yiyanghkust/finbert-tone", "likes": 67, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": "en", "tags": ["financial-sentiment-analysis", "sentiment-analysis"], "widget": [{"text": "growth is strong and we have plenty of liquidity"}]}, "description": "\n\n`FinBERT` is a BERT model pre-trained on financial communication text. The purpose is to enhance financial NLP research and practice. It is trained on the following three financial communication corpus. The total corpora size is 4.9B tokens.\n- Corporate Reports 10-K & 10-Q: 2.5B tokens\n- Earnings Call Transcripts: 1.3B tokens\n- Analyst Reports: 1.1B tokens\n\nMore technical details on `FinBERT`: [Click Link](https://github.com/yya518/FinBERT)\n\nThis released `finbert-tone` model is the `FinBERT` model fine-tuned on 10,000 manually annotated (positive, negative, neutral) sentences from analyst reports. This model achieves superior performance on financial tone analysis task. If you are simply interested in using `FinBERT` for financial tone analysis, give it a try.\n\nIf you use the model in your academic work, please cite the following paper:\n\nHuang, Allen H., Hui Wang, and Yi Yang. 
\"FinBERT: A Large Language Model for Extracting Information from Financial Text.\" *Contemporary Accounting Research* (2022).\n\n\n# How to use \nYou can use this model with Transformers pipeline for sentiment analysis.\n```python\nfrom transformers import BertTokenizer, BertForSequenceClassification\nfrom transformers import pipeline\n\nfinbert = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-tone',num_labels=3)\ntokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-tone')\n\nnlp = pipeline(\"sentiment-analysis\", model=finbert, tokenizer=tokenizer)\n\nsentences = [\"there is a shortage of capital, and we need extra financing\", \n \"growth is strong and we have plenty of liquidity\", \n \"there are doubts about our finances\", \n \"profits are flat\"]\nresults = nlp(sentences)\nprint(results) #LABEL_0: neutral; LABEL_1: positive; LABEL_2: negative\n\n```"} {"downloads": 250278, "id": "roberta-base-openai-detector", "likes": 64, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": "en", "license": "mit", "tags": ["exbert"], "datasets": ["bookcorpus", "wikipedia"]}, "description": "\n\n# RoBERTa Base OpenAI Detector\n\n## Table of Contents\n- [Model Details](#model-details)\n- [Uses](#uses)\n- [Risks, Limitations and Biases](#risks-limitations-and-biases)\n- [Training](#training)\n- [Evaluation](#evaluation)\n- [Environmental Impact](#environmental-impact)\n- [Technical Specifications](#technical-specifications)\n- [Citation Information](#citation-information)\n- [Model Card Authors](#model-card-author)\n- [How To Get Started With the Model](#how-to-get-started-with-the-model)\n\n## Model Details\n\n**Model Description:** RoBERTa base OpenAI Detector is the GPT-2 output detector model, obtained by fine-tuning a RoBERTa base model with the outputs of the 1.5B-parameter GPT-2 model. The model can be used to predict if text was generated by a GPT-2 model. This model was released by OpenAI at the same time as OpenAI released the weights of the [largest GPT-2 model](https://huggingface.co/gpt2-xl), the 1.5B parameter version. \n\n- **Developed by:** OpenAI, see [GitHub Repo](https://github.com/openai/gpt-2-output-dataset/tree/master/detector) and [associated paper](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf) for full author list\n- **Model Type:** Fine-tuned transformer-based language model\n- **Language(s):** English\n- **License:** MIT\n- **Related Models:** [RoBERTa base](https://huggingface.co/roberta-base), [GPT-XL (1.5B parameter version)](https://huggingface.co/gpt2-xl), [GPT-Large (the 774M parameter version)](https://huggingface.co/gpt2-large), [GPT-Medium (the 355M parameter version)](https://huggingface.co/gpt2-medium) and [GPT-2 (the 124M parameter version)](https://huggingface.co/gpt2)\n- **Resources for more information:**\n - [Research Paper](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf) (see, in particular, the section beginning on page 12 about Automated ML-based detection).\n - [GitHub Repo](https://github.com/openai/gpt-2-output-dataset/tree/master/detector)\n - [OpenAI Blog Post](https://openai.com/blog/gpt-2-1-5b-release/)\n - [Explore the detector model here](https://huggingface.co/openai-detector )\n\n## Uses\n\n#### Direct Use\n\nThe model is a classifier that can be used to detect text generated by GPT-2 models. 
However, it is strongly suggested not to use it as a ChatGPT detector for the purposes of making grave allegations of academic misconduct against undergraduates and others, as this model might give inaccurate results in the case of ChatGPT-generated input.\n\n#### Downstream Use\n\nThe model's developers have stated that they developed and released the model to help with research related to synthetic text generation, so the model could potentially be used for downstream tasks related to synthetic text generation. See the [associated paper](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf) for further discussion.\n\n#### Misuse and Out-of-scope Use\n\nThe model should not be used to intentionally create hostile or alienating environments for people. In addition, the model developers discuss the risk of adversaries using the model to better evade detection in their [associated paper](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf), suggesting that using the model for evading detection or for supporting efforts to evade detection would be a misuse of the model. \n\n## Risks, Limitations and Biases\n\n**CONTENT WARNING: Readers should be aware this section may contain content that is disturbing, offensive, and can propagate historical and current stereotypes.**\n\nUsers (both direct and downstream) should be made aware of the risks, biases and limitations of the model.\n\n#### Risks and Limitations\n\nIn their [associated paper](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf), the model developers discuss the risk that the model may be used by bad actors to develop capabilities for evading detection, though one purpose of releasing the model is to help improve detection research. \n\nIn a related [blog post](https://openai.com/blog/gpt-2-1-5b-release/), the model developers also discuss the limitations of automated methods for detecting synthetic text and the need to pair automated detection tools with other, non-automated approaches. They write: \n\n> We conducted in-house detection research and developed a detection model that has detection rates of ~95% for detecting 1.5B GPT-2-generated text. We believe this is not high enough accuracy for standalone detection and needs to be paired with metadata-based approaches, human judgment, and public education to be more effective. \n\nThe model developers also [report](https://openai.com/blog/gpt-2-1-5b-release/) finding that classifying content from larger models is more difficult, suggesting that detection with automated tools like this model will be increasingly difficult as model sizes increase. The authors find that training detector models on the outputs of larger models can improve accuracy and robustness. \n\n#### Bias\n\nSignificant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by RoBERTa base and GPT-2 1.5B (which this model is built/fine-tuned on) can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups (see the [RoBERTa base](https://huggingface.co/roberta-base) and [GPT-2 XL](https://huggingface.co/gpt2-xl) model cards for more information). 
The developers of this model discuss these issues further in their [paper](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf).\n\n## Training\n\n#### Training Data\n\nThe model is a sequence classifier based on RoBERTa base (see the [RoBERTa base model card](https://huggingface.co/roberta-base) for more details on the RoBERTa base training data) and then fine-tuned using the outputs of the 1.5B GPT-2 model (available [here](https://github.com/openai/gpt-2-output-dataset)).\n\n#### Training Procedure\n\nThe model developers write that: \n\n> We based a sequence classifier on RoBERTaBASE (125 million parameters) and fine-tuned it to classify the outputs from the 1.5B GPT-2 model versus WebText, the dataset we used to train the GPT-2 model.\n\nThey later state: \n\n> To develop a robust detector model that can accurately classify generated texts regardless of the sampling method, we performed an analysis of the model\u2019s transfer performance.\n\nSee the [associated paper](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf) for further details on the training procedure.\n\n## Evaluation\n\nThe following evaluation information is extracted from the [associated paper](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf).\n\n#### Testing Data, Factors and Metrics\n\nThe model is intended to be used for detecting text generated by GPT-2 models, so the model developers test the model on text datasets, measuring accuracy by: \n\n> testing 510-token test examples comprised of 5,000 samples from the WebText dataset and 5,000 samples generated by a GPT-2 model, which were not used during the training.\n\n#### Results\n\nThe model developers [find](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf): \n\n> Our classifier is able to detect 1.5 billion parameter GPT-2-generated text with approximately 95% accuracy...The model\u2019s accuracy depends on sampling methods used when generating outputs, like temperature, Top-K, and nucleus sampling ([Holtzman et al., 2019](https://arxiv.org/abs/1904.09751). Nucleus sampling outputs proved most difficult to correctly classify, but a detector trained using nucleus sampling transfers well across other sampling methods. As seen in Figure 1 [in the paper], we found consistently high accuracy when trained on nucleus sampling. \t\n\nSee the [associated paper](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf), Figure 1 (on page 14) and Figure 2 (on page 16) for full results.\n\n## Environmental Impact\n\nCarbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. 
(2019)](https://arxiv.org/abs/1910.09700).\n\n- **Hardware Type:** Unknown\n- **Hours used:** Unknown\n- **Cloud Provider:** Unknown\n- **Compute Region:** Unknown\n- **Carbon Emitted:** Unknown\n\n## Technical Specifications\n\nSee the [associated paper](https://d4mucfpksywv.cloudfront.net/papers/GPT_2_Report.pdf) for further details on the modeling architecture and training procedure.\n\n## Citation Information\n\n```bibtex\n@article{solaiman2019release,\n    title={Release strategies and the social impacts of language models},\n    author={Solaiman, Irene and Brundage, Miles and Clark, Jack and Askell, Amanda and Herbert-Voss, Ariel and Wu, Jeff and Radford, Alec and Krueger, Gretchen and Kim, Jong Wook and Kreps, Sarah and others},\n    journal={arXiv preprint arXiv:1908.09203},\n    year={2019}\n}\n```\n\nAPA: \n- Solaiman, I., Brundage, M., Clark, J., Askell, A., Herbert-Voss, A., Wu, J., ... & Wang, J. (2019). Release strategies and the social impacts of language models. arXiv preprint arXiv:1908.09203.\n\n## Model Card Authors\n\nThis model card was written by the team at Hugging Face.\n\n## How to Get Started with the Model \n\nMore information needed.
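\n\nIn the meantime, a minimal sketch of loading the checkpoint as an ordinary text-classification pipeline (the input string is illustrative, and the exact label names depend on the checkpoint config):\n\n```python\nfrom transformers import pipeline\n\ndetector = pipeline(\"text-classification\", model=\"roberta-base-openai-detector\")\nprint(detector(\"This passage may or may not have been written by a language model.\"))\n# returns a label (e.g. 'Real' or 'Fake') with a score; treat it as a weak signal, not proof\n```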
"eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYjhlNzczNDY2NDVlM2UwMjAzOWQxYTAyNWZkNGZlYmNjODNiZTEzMTcxNTE3MTAxNjNkOTFiMmRiMzViMzJmZiIsInZlcnNpb24iOjF9.AXp7omMuUZFJ6mzAVTQPMke7QoUtoi4RJSSE7Xbnp2pNi7y-JtznKdm"}]}]}]}, "description": "l6RfqcHPlI0jWr7TVGoFsWZ64YAg\n - type: recall\n value: 0.927\n name: Recall Micro\n verified: true\n verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjEyYmZiZDQ4MzM1ZmQ2ZmJhZWU4OTVkNmViYjA5NzhiN2MxODE0MzUxZTliZTk0MzViZDAyNGU4MDFjYjM1MSIsInZlcnNpb24iOjF9.9lazxLXbPOdwhqoYtIudwRwjfNVZnUu7KvGRklRP_RAoQStAzgmWMIrT3ckX_d5_6bKZH9fIdujUn5Qz-baKBw\n - type: recall\n value: 0.927\n name: Recall Weighted\n verified: true\n verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMWVhMzY0YTA4YmQzYTg4YTBiMzQ5YzRiZWJhMjM1NjUzZGQxZmQ5M2NkZDcyNTQ0ZmJjN2NkY2ZiYjg0OWI0ZCIsInZlcnNpb24iOjF9.QgTv726WCTyvrEct0NM8Zpc3vUnDbIwCor9EH941-zpJtuWr-xpdZzYZFJfILkVA0UUn1y6Jz_ABfkfBeyZTBg\n - type: f1\n value: 0.8825061528287809\n name: F1 Macro\n verified: true\n verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNzQzZTJkMDAwOTUwMzY3ZjI2MjIxYjlmZTg3YTdhNTc4ZjYyMmQ2NDQzM2FmYzk3OGEzNjhhMTk3NTQ3OTlhNyIsInZlcnNpb24iOjF9.hSln1KfKm0plK7Qao9vlubFtAl1M7_UYHNM6La9gEZlW_apnU1Mybz03GT2XZORgOVPe9JmgygvZByxQhpsYBw\n - type: f1\n value: 0.927\n name: F1 Micro\n verified: true\n verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNzljODQ3NjE3MDRkODE3ZjFlZmY5MjYyOGJlNDQ4YzdlZGRiMTI5OGZiZWM2ODkyZjMyZWQ3MTkzYWU5YThkOCIsInZlcnNpb24iOjF9.7qfBw39fv22jSIJoY71DkOVr9eBB-srhqSi09bCcUC7Huok4O2Z_vB7gO_Rahh9sFgKVu1ZATusjTmOLQr0fBw\n - type: f1\n value: 0.926876082854655\n name: F1 Weighted\n verified: true\n verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjJhN2UzODgxOWQ0Y2E3YTcwZTQxMDE0ZWRmYThjOWVhYWQ1YjBhMzk0YWUxNzE2ZjFhNWM5ZmE2ZmI1YTczYSIsInZlcnNpb24iOjF9.nZW0dBdLmh_FgNw6GaITvSJFX-2C_Iku3NanU8Rip7FSiRHozKPAjothdQh9MWQnq158ZZGPPVIjtyIvuTSqCw\n - type: loss\n value: 0.17403268814086914\n name: loss\n verified: true\n verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMTVjZmFiOGQwZGY1OTU5YWFkNGZjMTlhOGI4NjE3MGI4ZDhkODcxYmJiYTQ3NWNmMWM0ODUyZDI1MThkYTY3ZSIsInZlcnNpb24iOjF9.OYz5BI3Lz8LgjAqVnD6NcrG3UAG0D3wjKJ7G5298RRGaNpb621ycisG_7UYiWixY7e2RJafkfRiplmkdczIFDQ\n"} {"downloads": 108973, "id": "roberta-large-mnli", "likes": 58, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": ["en"], "license": "mit", "tags": ["autogenerated-modelcard"], "datasets": ["multi_nli", "wikipedia", "bookcorpus"]}, "description": "\n\n# roberta-large-mnli\n\n## Table of Contents\n- [Model Details](#model-details)\n- [How To Get Started With the Model](#how-to-get-started-with-the-model)\n- [Uses](#uses)\n- [Risks, Limitations and Biases](#risks-limitations-and-biases)\n- [Training](#training)\n- [Evaluation](#evaluation-results)\n- [Environmental Impact](#environmental-impact)\n- [Technical Specifications](#technical-specifications)\n- [Citation Information](#citation-information)\n- [Model Card Authors](#model-card-author)\n\n## Model Details\n\n**Model Description:** roberta-large-mnli is the [RoBERTa large model](https://huggingface.co/roberta-large) fine-tuned on the [Multi-Genre Natural Language Inference (MNLI)](https://huggingface.co/datasets/multi_nli) corpus. 
The model is a pretrained model on English language text using a masked language modeling (MLM) objective.\n\n- **Developed by:** See [GitHub Repo](https://github.com/facebookresearch/fairseq/tree/main/examples/roberta) for model developers\n- **Model Type:** Transformer-based language model\n- **Language(s):** English\n- **License:** MIT \n- **Parent Model:** This model is a fine-tuned version of the RoBERTa large model. Users should see the [RoBERTa large model card](https://huggingface.co/roberta-large) for relevant information.\n- **Resources for more information:**\n - [Research Paper](https://arxiv.org/abs/1907.11692)\n - [GitHub Repo](https://github.com/facebookresearch/fairseq/tree/main/examples/roberta)\n\n## How to Get Started with the Model \n\nUse the code below to get started with the model. The model can be loaded with the zero-shot-classification pipeline like so:\n\n```python\nfrom transformers import pipeline\nclassifier = pipeline('zero-shot-classification', model='roberta-large-mnli')\n```\n\nYou can then use this pipeline to classify sequences into any of the class names you specify. For example:\n\n```python\nsequence_to_classify = \"one day I will see the world\"\ncandidate_labels = ['travel', 'cooking', 'dancing']\nclassifier(sequence_to_classify, candidate_labels)\n```\n\n## Uses\n\n#### Direct Use\n\nThis fine-tuned model can be used for zero-shot classification tasks, including zero-shot sentence-pair classification (see the [GitHub repo](https://github.com/facebookresearch/fairseq/tree/main/examples/roberta) for examples) and zero-shot sequence classification.\n\n#### Misuse and Out-of-scope Use\n\nThe model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model.\n\n## Risks, Limitations and Biases\n\n**CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propogate historical and current stereotypes.**\n\nSignificant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). The [RoBERTa large model card](https://huggingface.co/roberta-large) notes that: \"The training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral.\" \n\nPredictions generated by the model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. For example:\n\n```python\nsequence_to_classify = \"The CEO had a strong handshake.\"\ncandidate_labels = ['male', 'female']\nhypothesis_template = \"This text speaks about a {} profession.\"\nclassifier(sequence_to_classify, candidate_labels, hypothesis_template=hypothesis_template)\n```\n\nUsers (both direct and downstream) should be made aware of the risks, biases and limitations of the model.\n\n## Training\n\n#### Training Data\n\nThis model was fine-tuned on the [Multi-Genre Natural Language Inference (MNLI)](https://cims.nyu.edu/~sbowman/multinli/) corpus. Also see the [MNLI data card](https://huggingface.co/datasets/multi_nli) for more information. 
\n\nAs described in the [RoBERTa large model card](https://huggingface.co/roberta-large): \n\n> The RoBERTa model was pretrained on the reunion of five datasets:\n> \n> - [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038 unpublished books;\n> - [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and headers);\n> - [CC-News](https://commoncrawl.org/2016/10/news-dataset-available/), a dataset containing 63 million English news articles crawled between September 2016 and February 2019.\n> - [OpenWebText](https://github.com/jcpeterson/openwebtext), an opensource recreation of the WebText dataset used to train GPT-2,\n> - [Stories](https://arxiv.org/abs/1806.02847), a dataset containing a subset of CommonCrawl data filtered to match the story-like style of Winograd schemas.\n>\n> Together these datasets weigh 160GB of text.\n\nAlso see the [bookcorpus data card](https://huggingface.co/datasets/bookcorpus) and the [wikipedia data card](https://huggingface.co/datasets/wikipedia) for additional information.\n\n#### Training Procedure\n\n##### Preprocessing\n\nAs described in the [RoBERTa large model card](https://huggingface.co/roberta-large): \n\n> The texts are tokenized using a byte version of Byte-Pair Encoding (BPE) and a vocabulary size of 50,000. The inputs of\n> the model take pieces of 512 contiguous tokens that may span over documents. The beginning of a new document is marked\n> with `<s>` and the end of one by `</s>`\n> \n> The details of the masking procedure for each sentence are the following:\n> - 15% of the tokens are masked.\n> - In 80% of the cases, the masked tokens are replaced by `<mask>`.\n> - In 10% of the cases, the masked tokens are replaced by a random token (different) from the one they replace.\n> - In the 10% remaining cases, the masked tokens are left as is.\n> \n> Contrary to BERT, the masking is done dynamically during pretraining (e.g., it changes at each epoch and is not fixed).\n\n##### Pretraining \n\nAlso as described in the [RoBERTa large model card](https://huggingface.co/roberta-large): \n\n> The model was trained on 1024 V100 GPUs for 500K steps with a batch size of 8K and a sequence length of 512. The\n> optimizer used is Adam with a learning rate of 4e-4, \\\\(\\beta_{1} = 0.9\\\\), \\\\(\\beta_{2} = 0.98\\\\) and\n> \\\\(\\epsilon = 1e-6\\\\), a weight decay of 0.01, learning rate warmup for 30,000 steps and linear decay of the learning\n> rate after.\n\n## Evaluation\n\nThe following evaluation information is extracted from the associated [GitHub repo for RoBERTa](https://github.com/facebookresearch/fairseq/tree/main/examples/roberta). \n\n#### Testing Data, Factors and Metrics\n\nThe model developers report that the model was evaluated on the following tasks and datasets using the listed metrics: \n\n- **Dataset:** Part of [GLUE (Wang et al., 2019)](https://arxiv.org/pdf/1804.07461.pdf), the General Language Understanding Evaluation benchmark, a collection of 9 datasets for evaluating natural language understanding systems. Specifically, the model was evaluated on the [Multi-Genre Natural Language Inference (MNLI)](https://cims.nyu.edu/~sbowman/multinli/) corpus. See the [GLUE data card](https://huggingface.co/datasets/glue) or [Wang et al. (2019)](https://arxiv.org/pdf/1804.07461.pdf) for further information.\n - **Tasks:** NLI. [Wang et al.
(2019)](https://arxiv.org/pdf/1804.07461.pdf) describe the inference task for MNLI as: \n > The Multi-Genre Natural Language Inference Corpus [(Williams et al., 2018)](https://arxiv.org/abs/1704.05426) is a crowd-sourced collection of sentence pairs with textual entailment annotations. Given a premise sentence and a hypothesis sentence, the task is to predict whether the premise entails the hypothesis (entailment), contradicts the hypothesis (contradiction), or neither (neutral). The premise sentences are gathered from ten different sources, including transcribed speech, fiction, and government reports. We use the standard test set, for which we obtained private labels from the authors, and evaluate on both the matched (in-domain) and mismatched (cross-domain) sections. We also use and recommend the SNLI corpus [(Bowman et al., 2015)](https://arxiv.org/abs/1508.05326) as 550k examples of auxiliary training data.\n - **Metrics:** Accuracy \n \n- **Dataset:** [XNLI (Conneau et al., 2018)](https://arxiv.org/pdf/1809.05053.pdf), the extension of the [Multi-Genre Natural Language Inference (MNLI)](https://cims.nyu.edu/~sbowman/multinli/) corpus to 15 languages: English, French, Spanish, German, Greek, Bulgarian, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, Hindi, Swahili and Urdu. See the [XNLI data card](https://huggingface.co/datasets/xnli) or [Conneau et al. (2018)](https://arxiv.org/pdf/1809.05053.pdf) for further information.\n - **Tasks:** Translate-test (e.g., the model is used to translate input sentences in other languages to the training language)\n - **Metrics:** Accuracy\n\n#### Results\n\nGLUE test results (dev set, single model, single-task fine-tuning): 90.2 on MNLI\n\nXNLI test results:\n\n| Task | en | fr | es | de | el | bg | ru | tr | ar | vi | th | zh | hi | sw | ur |\n|:"} {"downloads": 1154465, "id": "finiteautomata/bertweet-base-sentiment-analysis", "likes": 53, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": ["en"], "tags": ["sentiment-analysis"]}, "description": "\n# Sentiment Analysis in English\n## bertweet-sentiment-analysis\n\nRepository: [https://github.com/finiteautomata/pysentimiento/](https://github.com/finiteautomata/pysentimiento/)\n\n\nModel trained with SemEval 2017 corpus (around ~40k tweets). Base model is [BERTweet](https://github.com/VinAIResearch/BERTweet), a RoBERTa model trained on English tweets.\n\nUses `POS`, `NEG`, `NEU` labels.\n\n## License\n\n`pysentimiento` is an open-source library for non-commercial use and scientific research purposes only. Please be aware that models are trained with third-party datasets and are subject to their respective licenses. \n\n1. [TASS Dataset license](http://tass.sepln.org/tass_data/download.php)\n2. [SEMEval 2017 Dataset license]()\n\n## Citation\n\nIf you use `pysentimiento` in your work, please cite [this paper](https://arxiv.org/abs/2106.09462)\n\n```\n@misc{perez2021pysentimiento,\n title={pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks},\n author={Juan Manuel P\u00e9rez and Juan Carlos Giudici and Franco Luque},\n year={2021},\n eprint={2106.09462},\n archivePrefix={arXiv},\n primaryClass={cs.CL}\n}\n```\nEnjoy! \ud83e\udd17\n"} {"downloads": 168385, "id": "unitary/toxic-bert", "likes": 53, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {}, "description": "\n \n
\n\n**\u26a0\ufe0f Disclaimer:**\nThe huggingface models currently give different results to the detoxify library (see issue [here](https://github.com/unitaryai/detoxify/issues/15)). For the most up to date models we recommend using the models from https://github.com/unitaryai/detoxify\n\n# \ud83d\ude4a Detoxify\n## Toxic Comment Classification with \u26a1 Pytorch Lightning and \ud83e\udd17 Transformers \n\n![CI testing](https://github.com/unitaryai/detoxify/workflows/CI%20testing/badge.svg)\n![Lint](https://github.com/unitaryai/detoxify/workflows/Lint/badge.svg)\n\n
\n\n![Examples image](examples.png)\n\n## Description \n\nTrained models & code to predict toxic comments on 3 Jigsaw challenges: Toxic comment classification, Unintended Bias in Toxic comments, Multilingual toxic comment classification.\n\nBuilt by [Laura Hanu](https://laurahanu.github.io/) at [Unitary](https://www.unitary.ai/), where we are working to stop harmful content online by interpreting visual content in context. \n\nDependencies:\n- For inference:\n - \ud83e\udd17 Transformers\n - \u26a1 Pytorch lightning \n- For training you will also need:\n - Kaggle API (to download data)\n\n\n| Challenge | Year | Goal | Original Data Source | Detoxify Model Name | Top Kaggle Leaderboard Score | Detoxify Score\n|-|-|-|-|-|-|-|\n| [Toxic Comment Classification Challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge) | 2018 | build a multi-headed model that\u2019s capable of detecting different types of toxicity like threats, obscenity, insults, and identity-based hate. | Wikipedia Comments | `original` | 0.98856 | 0.98636\n| [Jigsaw Unintended Bias in Toxicity Classification](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification) | 2019 | build a model that recognizes toxicity and minimizes this type of unintended bias with respect to mentions of identities. You'll be using a dataset labeled for identity mentions and optimizing a metric designed to measure unintended bias. | Civil Comments | `unbiased` | 0.94734 | 0.93639\n| [Jigsaw Multilingual Toxic Comment Classification](https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification) | 2020 | build effective multilingual models | Wikipedia Comments + Civil Comments | `multilingual` | 0.9536 | 0.91655*\n\n*Score not directly comparable since it is obtained on the validation set provided and not on the test set. To update when the test labels are made available. \n\nIt is also noteworthy to mention that the top leaderboard scores have been achieved using model ensembles. The purpose of this library was to build something user-friendly and straightforward to use.\n\n## Limitations and ethical considerations\n\nIf words that are associated with swearing, insults or profanity are present in a comment, it is likely that it will be classified as toxic, regardless of the tone or the intent of the author, e.g. humorous/self-deprecating.
This could present some biases towards already vulnerable minority groups.\n\nThe intended use of this library is for research purposes, fine-tuning on carefully constructed datasets that reflect real-world demographics, and/or to aid content moderators in flagging harmful content more quickly.\n\nSome useful resources about the risk of different biases in toxicity or hate speech detection are:\n- [The Risk of Racial Bias in Hate Speech Detection](https://homes.cs.washington.edu/~msap/pdfs/sap2019risk.pdf)\n- [Automated Hate Speech Detection and the Problem of Offensive Language](https://arxiv.org/pdf/1703.04009.pdf%201.pdf)\n- [Racial Bias in Hate Speech and Abusive Language Detection Datasets](https://arxiv.org/pdf/1905.12516.pdf)\n\n## Quick prediction\n\n\nThe `multilingual` model has been trained on 7 different languages so it should only be tested on: `english`, `french`, `spanish`, `italian`, `portuguese`, `turkish` or `russian`.\n\n```bash\n# install detoxify \n\npip install detoxify\n\n```\n```python\n\nfrom detoxify import Detoxify\n\n# each model takes in either a string or a list of strings\n\nresults = Detoxify('original').predict('example text')\n\nresults = Detoxify('unbiased').predict(['example text 1','example text 2'])\n\ninput_text = ['example text','exemple de texte','texto de ejemplo','testo di esempio','texto de exemplo','\u00f6rnek metin','\u043f\u0440\u0438\u043c\u0435\u0440 \u0442\u0435\u043a\u0441\u0442\u0430']\nresults = Detoxify('multilingual').predict(input_text)\n\n# optional to display results nicely (will need to pip install pandas)\n\nimport pandas as pd\n\nprint(pd.DataFrame(results, index=input_text).round(5))\n\n```\nFor more details check the Prediction section.\n\n\n## Labels\nAll challenges have a toxicity label. The toxicity labels represent the aggregate ratings of up to 10 annotators according to the following schema:\n- **Very Toxic** (a very hateful, aggressive, or disrespectful comment that is very likely to make you leave a discussion or give up on sharing your perspective)\n- **Toxic** (a rude, disrespectful, or unreasonable comment that is somewhat likely to make you leave a discussion or give up on sharing your perspective)\n- **Hard to Say**\n- **Not Toxic**\n\nMore information about the labelling schema can be found [here](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data).\n\n### Toxic Comment Classification Challenge\nThis challenge includes the following labels:\n\n- `toxic`\n- `severe_toxic`\n- `obscene`\n- `threat`\n- `insult`\n- `identity_hate`\n\n### Jigsaw Unintended Bias in Toxicity Classification\nThis challenge has 2 types of labels: the main toxicity labels and some additional identity labels that represent the identities mentioned in the comments. 
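\n\nFor example, predicting with the `unbiased` model returns one score per main toxicity label (a minimal sketch, assuming the pip-installed `detoxify` package):\n\n```python\nfrom detoxify import Detoxify\n\n# scores is a dict mapping each toxicity label of the unbiased model to a value in [0, 1]\nscores = Detoxify('unbiased').predict('example text')\nprint(sorted(scores.keys()))\n# ['identity_attack', 'insult', 'obscene', 'severe_toxicity', 'sexual_explicit', 'threat', 'toxicity']\n```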
\n\nOnly identities with more than 500 examples in the test set (combined public and private) are included during training as additional labels and in the evaluation calculation.\n\n- `toxicity`\n- `severe_toxicity`\n- `obscene`\n- `threat`\n- `insult`\n- `identity_attack`\n- `sexual_explicit`\n\nIdentity labels used:\n- `male`\n- `female`\n- `homosexual_gay_or_lesbian`\n- `christian`\n- `jewish`\n- `muslim`\n- `black`\n- `white`\n- `psychiatric_or_mental_illness`\n\nA complete list of all the identity labels available can be found [here](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data).\n\n\n### Jigsaw Multilingual Toxic Comment Classification\n\nSince this challenge combines the data from the previous 2 challenges, it includes all labels from above; however, the final evaluation is only on:\n\n- `toxicity`\n\n## How to run \n\nFirst, install dependencies \n```bash\n# clone project \n\ngit clone https://github.com/unitaryai/detoxify\n\n# create virtual env\n\npython3 -m venv toxic-env\nsource toxic-env/bin/activate\n\n# install project \n\npip install -e detoxify\ncd detoxify\n\n# for training\npip install -r requirements.txt\n\n ``` \n\n## Prediction\n\nTrained models summary:\n\n|Model name| Transformer type| Data from\n|:--:|:--:|:--:|\n|`original`| `bert-base-uncased` | Toxic Comment Classification Challenge\n|`unbiased`| `roberta-base`| Unintended Bias in Toxicity Classification\n|`multilingual`| `xlm-roberta-base`| Multilingual Toxic Comment Classification\n\nFor a quick prediction, you can run the example script on a comment directly or on a .txt file containing a list of comments. \n```bash\n\n# load model via torch.hub\n\npython run_prediction.py --input 'example' --model_name original\n\n# load model from checkpoint path\n\npython run_prediction.py --input 'example' --from_ckpt_path model_path\n\n# save results to a .csv file\n\npython run_prediction.py --input test_set.txt --model_name original --save_to results.csv\n\n# to see usage\n\npython run_prediction.py --help\n\n```\n\nCheckpoints can be downloaded from the latest release or via the Pytorch hub API with the following names:\n- `toxic_bert`\n- `unbiased_toxic_roberta`\n- `multilingual_toxic_xlm_r`\n```python\nimport torch\n\nmodel = torch.hub.load('unitaryai/detoxify', 'toxic_bert')\n```\n\nImporting detoxify in python:\n\n```python\n\nfrom detoxify import Detoxify\n\nresults = Detoxify('original').predict('some text')\n\nresults = Detoxify('unbiased').predict(['example text 1','example text 2'])\n\ninput_text = ['example text','exemple de texte','texto de ejemplo','testo di esempio','texto de exemplo','\u00f6rnek metin','\u043f\u0440\u0438\u043c\u0435\u0440 \u0442\u0435\u043a\u0441\u0442\u0430']\nresults = Detoxify('multilingual').predict(input_text)\n\n# to display results nicely\n\nimport pandas as pd\n\nprint(pd.DataFrame(results, index=input_text).round(5))\n\n```\n\n\n## Training\n\n If you do not already have a Kaggle account: \n - you need to create one to be able to download the data\n \n - go to My Account and click on Create New API Token - this will download a kaggle.json file\n\n - make sure this file is located in ~/.kaggle\n\n ```bash\n\n# create data directory\n\nmkdir jigsaw_data\ncd jigsaw_data\n\n# download data\n\nkaggle competitions download -c jigsaw-toxic-comment-classification-challenge\n\nkaggle competitions download -c jigsaw-unintended-bias-in-toxicity-classification\n\nkaggle competitions download -c jigsaw-multilingual-toxic-comment-classification\n\n```\n## Start Training\n ### Toxic Comment Classification 
Challenge\n\n ```bash\n\npython create_val_set.py\n\npython train.py --config configs/Toxic_comment_classification_BERT.json\n``` \n ### Unintended Bias in Toxicity Challenge\n\n```bash\n\npython train.py --config configs/Unintended_bias_toxic_comment_classification_RoBERTa.json\n\n```\n ### Multilingual Toxic Comment Classification\n\n This is trained in 2 stages. First, train on all available data, and second, train only on the translated versions of the first challenge. \n \n The [translated data](https://www.kaggle.com/miklgr500/jigsaw-train-multilingual-coments-google-api) can be downloaded from Kaggle in French, Spanish, Italian, Portuguese, Turkish, and Russian (the languages available in the test set).\n\n ```bash\n\n# stage 1\n\npython train.py --config configs/Multilingual_toxic_comment_classification_XLMR.json\n\n# stage 2\n\npython train.py --config configs/Multilingual_toxic_comment_classification_XLMR_stage2.json\n\n```\n### Monitor progress with tensorboard\n\n ```bash\n\ntensorboard --logdir=./saved\n\n```\n## Model Evaluation\n\n### Toxic Comment Classification Challenge\n\nThis challenge is evaluated on the mean AUC score of all the labels.\n\n```bash\n\npython evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv\n\n```\n### Unintended Bias in Toxicity Challenge\n\nThis challenge is evaluated on a novel bias metric that combines different AUC scores to balance overall performance. More information on this metric [here](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/overview/evaluation).\n\n```bash\n\npython evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv\n\n# to get the final bias metric\npython model_eval/compute_bias_metric.py\n\n```\n### Multilingual Toxic Comment Classification\n\nThis challenge is evaluated on the AUC score of the main toxic label.\n\n```bash\n\npython evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv\n\n```\n\n### Citation \n```\n@misc{Detoxify,\n title={Detoxify},\n author={Hanu, Laura and {Unitary team}},\n howpublished={Github. https://github.com/unitaryai/detoxify},\n year={2020}\n}\n``` \n"} {"downloads": 66609, "id": "arpanghoshal/EmoRoBERTa", "likes": 52, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": "en", "tags": ["text-classification", "tensorflow", "roberta"], "datasets": ["go_emotions"], "license": "mit"}, "description": "\n\nConnect me on LinkedIn\n- [linkedin.com/in/arpanghoshal](https://www.linkedin.com/in/arpanghoshal)\n\n\n## What is GoEmotions\n\nA dataset of 58,000 Reddit comments labelled with 28 emotion categories (27 emotions plus neutral):\n\n- admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise + neutral\n\n\n## What is RoBERTa\n\nRoBERTa builds on BERT\u2019s language masking strategy and modifies key hyperparameters in BERT, including removing BERT\u2019s next-sentence pretraining objective, and training with much larger mini-batches and learning rates. RoBERTa was also trained on an order of magnitude more data than BERT, for a longer amount of time. This allows RoBERTa representations to generalize even better to downstream tasks compared to BERT.
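\n\nA minimal usage sketch (assuming the standard `transformers` text-classification pipeline; the repo is tagged `tensorflow`, so a TensorFlow install may be required to load the weights):\n\n```python\nfrom transformers import pipeline\n\n# returns one of the 28 GoEmotions labels with a confidence score\nemotion = pipeline('text-classification', model='arpanghoshal/EmoRoBERTa')\nprint(emotion('Thank you, this helped a lot!'))\n# hypothetical output: [{'label': 'gratitude', 'score': 0.98}]\n```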
\n\n\n## Hyperparameters\n\n| Parameter | |\n| "} {"downloads": 276178, "id": "siebert/sentiment-roberta-large-english", "likes": 46, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": "en", "tags": ["sentiment", "twitter", "reviews", "siebert"]}, "description": "\n\n## SiEBERT - English-Language Sentiment Classification\n\n# Overview\nThis model (\"SiEBERT\", prefix for \"Sentiment in English\") is a fine-tuned checkpoint of [RoBERTa-large](https://huggingface.co/roberta-large) ([Liu et al. 2019](https://arxiv.org/pdf/1907.11692.pdf)). It enables reliable binary sentiment analysis for various types of English-language text. For each instance, it predicts either positive (1) or negative (0) sentiment. The model was fine-tuned and evaluated on 15 data sets from diverse text sources to enhance generalization across different types of texts (reviews, tweets, etc.). Consequently, it outperforms models trained on only one type of text (e.g., movie reviews from the popular SST-2 benchmark) when used on new data as shown below. \n\n\n# Predictions on a data set\nIf you want to predict sentiment for your own data, we provide an example script via [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb). You can load your data to a Google Drive and run the script for free on a Colab GPU. Set-up only takes a few minutes. We suggest that you manually label a subset of your data to evaluate performance for your use case. For performance benchmark values across various sentiment analysis contexts, please refer to our paper ([Hartmann et al. 2022](https://www.sciencedirect.com/science/article/pii/S0167811622000477?via%3Dihub)).\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/chrsiebert/sentiment-roberta-large-english/blob/main/sentiment_roberta_prediction_example.ipynb)\n\n\n# Use in a Hugging Face pipeline\nThe easiest way to use the model for single predictions is Hugging Face's [sentiment analysis pipeline](https://huggingface.co/transformers/quicktour.html#getting-started-on-a-task-with-a-pipeline), which only needs a couple of lines of code as shown in the following example:\n```python\nfrom transformers import pipeline\nsentiment_analysis = pipeline(\"sentiment-analysis\", model=\"siebert/sentiment-roberta-large-english\")\nprint(sentiment_analysis(\"I love this!\"))\n```\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/chrsiebert/sentiment-roberta-large-english/blob/main/sentiment_roberta_pipeline.ipynb)\n\n\n# Use for further fine-tuning\nThe model can also be used as a starting point for further fine-tuning of RoBERTa on your specific data. Please refer to Hugging Face's [documentation](https://huggingface.co/docs/transformers/training) for further details and example code.\n\n\n# Performance\nTo evaluate the performance of our general-purpose sentiment analysis model, we set aside an evaluation set from each data set, which was not used for training. On average, our model outperforms a [DistilBERT-based model](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) (which is solely fine-tuned on the popular SST-2 data set) by more than 15 percentage points (93.2 vs. 78.1 percent, see table below). 
As a robustness check, we evaluate the model in a leave-one-out manner (training on 14 data sets, evaluating on the one left out), which decreases model performance by only about 3 percentage points on average and underscores its generalizability. Model performance is given as evaluation set accuracy in percent.\n\n|Dataset|DistilBERT SST-2|This model|\n|"} {"downloads": 25261, "id": "mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis", "likes": 42, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"license": "apache-2.0", "tags": ["generated_from_trainer", "financial", "stocks", "sentiment"], "widget": [{"text": "Operating profit totaled EUR 9.4 mn , down from EUR 11.7 mn in 2004 ."}], "datasets": ["financial_phrasebank"], "metrics": ["accuracy"], "model-index": [{"name": "distilRoberta-financial-sentiment", "results": [{"task": {"name": "Text Classification", "type": "text-classification"}, "dataset": {"name": "financial_phrasebank", "type": "financial_phrasebank", "args": "sentences_allagree"}, "metrics": [{"name": "Accuracy", "type": "accuracy", "value": 0.9823008849557522}]}]}]}, "description": "\n\n\n\n# distilRoberta-financial-sentiment\n\nThis model is a fine-tuned version of [distilroberta-base](https://huggingface.co/distilroberta-base) on the financial_phrasebank dataset.\nIt achieves the following results on the evaluation set:\n- Loss: 0.1116\n- Accuracy: 0.9823\n\n## Model description\n\nMore information needed\n\n## Intended uses & limitations\n\nMore information needed\n\n## Training and evaluation data\n\nMore information needed\n\n## Training procedure\n\n### Training hyperparameters\n\nThe following hyperparameters were used during training:\n- learning_rate: 2e-05\n- train_batch_size: 8\n- eval_batch_size: 8\n- seed: 42\n- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08\n- lr_scheduler_type: linear\n- num_epochs: 5\n\n### Training results\n\n| Training Loss | Epoch | Step | Validation Loss | Accuracy |\n|:"} {"downloads": 6613, "id": "OpenAssistant/reward-model-deberta-v3-large-v2", "likes": 41, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"license": "mit", "datasets": ["openai/summarize_from_feedback", "openai/webgpt_comparisons", "Dahoas/instruct-synthetic-prompt-responses", "Anthropic/hh-rlhf"], "language": ["en"], "metrics": ["accuracy"], "tags": ["reward-model", "reward_model", "RLHF"]}, "description": "\n# Reward model trained from human feedback\n\nA reward model (RM) trained to predict which generated answer a human would judge as better, given a question.\n\nRMs are useful in these domains:\n\n- QA model evaluation\n\n- serving as the reward score in RLHF \n\n- detecting potentially toxic responses via ranking\n\nAll models are trained on the following datasets, with the same split seed across datasets (if a validation split wasn't available):\n\n- [webgpt_comparisons](https://huggingface.co/datasets/openai/webgpt_comparisons)\n\n- [summarize_from_feedback](https://huggingface.co/datasets/openai/summarize_from_feedback)\n\n- [synthetic-instruct-gptj-pairwise](https://huggingface.co/datasets/Dahoas/synthetic-instruct-gptj-pairwise)\n\n- [anthropic_hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)\n\n# How to use\n\n```python\nfrom transformers import AutoModelForSequenceClassification, AutoTokenizer\nreward_name = \"OpenAssistant/reward-model-deberta-v3-large-v2\"\nrank_model, tokenizer = AutoModelForSequenceClassification.from_pretrained(reward_name), AutoTokenizer.from_pretrained(reward_name)\nquestion, answer = \"Explain nuclear fusion like I am five\", \"Nuclear fusion is the process by which two or more protons and neutrons combine to form a single nucleus. It is a very important process in the universe, as it is the source of energy for stars and galaxies. Nuclear fusion is also a key process in the production of energy for nuclear power plants.\"\ninputs = tokenizer(question, answer, return_tensors='pt')\nscore = rank_model(**inputs).logits[0].cpu().detach()\nprint(score)\n```\n\n
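The model outputs a single scalar logit per (question, answer) pair; if you prefer a bounded score, you can squash it with a sigmoid (a sketch, not part of the original example):\n\n```python\nimport torch\n\n# `score` is the 1-element logit tensor computed above; higher means \"judged better\"\nprob = torch.sigmoid(score)\nprint(prob.item())\n```\n\n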
answer = \"Explain nuclear fusion like I am five\", \"Nuclear fusion is the process by which two or more protons and neutrons combine to form a single nucleus. It is a very important process in the universe, as it is the source of energy for stars and galaxies. Nuclear fusion is also a key process in the production of energy for nuclear power plants.\"\ninputs = tokenizer(question, answer, return_tensors='pt')\nscore = rank_model(**inputs).logits[0].cpu().detach()\nprint(score)\n```\n\n**Toxic response detection**\n\n```\nfrom transformers import AutoModelForSequenceClassification, AutoTokenizer\nreward_name = \"OpenAssistant/reward-model-deberta-v3-large-v2\"\nrank_model, tokenizer = AutoModelForSequenceClassification.from_pretrained(reward_name), AutoTokenizer.from_pretrained(reward_name)\n\nquestion = \"I just came out of from jail, any suggestion of my future?\"\nhelpful = \"It's great to hear that you have been released from jail.\"\nbad = \"Go back to jail you scum\"\n\ninputs = tokenizer(question, helpful, return_tensors='pt')\ngood_score = rank_model(**inputs).logits[0].cpu().detach()\n\ninputs = tokenizer(question, bad, return_tensors='pt')\nbad_score = rank_model(**inputs).logits[0].cpu().detach()\nprint(good_score > bad_score) # tensor([True])\n```\n\n# Performance\n\nValidation split accuracy\n\n| Model | [WebGPT](https://huggingface.co/datasets/openai/webgpt_comparisons) | [Summary](https://huggingface.co/datasets/openai/summarize_from_feedback) | [SytheticGPT](https://huggingface.co/datasets/Dahoas/synthetic-instruct-gptj-pairwise) | [Anthropic RLHF]() |\n|"} {"downloads": 14059, "id": "microsoft/MiniLM-L12-H384-uncased", "likes": 33, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"thumbnail": "https://huggingface.co/front/thumbnails/microsoft.png", "tags": ["text-classification"], "license": "mit"}, "description": "\n\n## MiniLM: Small and Fast Pre-trained Models for Language Understanding and Generation\n\nMiniLM is a distilled model from the paper \"[MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers](https://arxiv.org/abs/2002.10957)\".\n\nPlease find the information about preprocessing, training and full details of the MiniLM in the [original MiniLM repository](https://github.com/microsoft/unilm/blob/master/minilm/).\n\nPlease note: This checkpoint can be an inplace substitution for BERT and it needs to be fine-tuned before use!\n\n### English Pre-trained Models\nWe release the **uncased** **12**-layer model with **384** hidden size distilled from an in-house pre-trained [UniLM v2](/unilm) model in BERT-Base size.\n\n- MiniLMv1-L12-H384-uncased: 12-layer, 384-hidden, 12-heads, 33M parameters, 2.7x faster than BERT-Base\n\n#### Fine-tuning on NLU tasks\n\nWe present the dev results on SQuAD 2.0 and several GLUE benchmark tasks.\n\n| Model | #Param | SQuAD 2.0 | MNLI-m | SST-2 | QNLI | CoLA | RTE | MRPC | QQP |\n|"} {"downloads": 10892, "id": "microsoft/Multilingual-MiniLM-L12-H384", "likes": 32, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": ["multilingual", "en", "ar", "bg", "de", "el", "es", "fr", "hi", "ru", "sw", "th", "tr", "ur", "vi", "zh"], "thumbnail": "https://huggingface.co/front/thumbnails/microsoft.png", "tags": ["text-classification"], "license": "mit"}, "description": "\n\n## MiniLM: Small and Fast Pre-trained Models for Language Understanding and Generation\n\nMiniLM is a distilled model from the paper \"[MiniLM: Deep Self-Attention 
\n#### Fine-tuning on NLU tasks\n\nWe present the dev results on SQuAD 2.0 and several GLUE benchmark tasks.\n\n| Model | #Param | SQuAD 2.0 | MNLI-m | SST-2 | QNLI | CoLA | RTE | MRPC | QQP |\n|"} {"downloads": 10892, "id": "microsoft/Multilingual-MiniLM-L12-H384", "likes": 32, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": ["multilingual", "en", "ar", "bg", "de", "el", "es", "fr", "hi", "ru", "sw", "th", "tr", "ur", "vi", "zh"], "thumbnail": "https://huggingface.co/front/thumbnails/microsoft.png", "tags": ["text-classification"], "license": "mit"}, "description": "\n\n## MiniLM: Small and Fast Pre-trained Models for Language Understanding and Generation\n\nMiniLM is a distilled model from the paper \"[MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers](https://arxiv.org/abs/2002.10957)\".\n\nPlease find the information about preprocessing, training and full details of the MiniLM in the [original MiniLM repository](https://github.com/microsoft/unilm/blob/master/minilm/).\n\nPlease note: This checkpoint uses `BertModel` with `XLMRobertaTokenizer` so `AutoTokenizer` won't work with this checkpoint!\n\n### Multilingual Pretrained Model\n- Multilingual-MiniLMv1-L12-H384: 12-layer, 384-hidden, 12-heads, 21M Transformer parameters, 96M embedding parameters\n\nMultilingual MiniLM uses the same tokenizer as XLM-R, but the Transformer architecture of our model is the same as BERT's. We provide the fine-tuning code on XNLI based on [huggingface/transformers](https://github.com/huggingface/transformers). Please replace `run_xnli.py` in transformers with [ours](https://github.com/microsoft/unilm/blob/master/minilm/examples/run_xnli.py) to fine-tune multilingual MiniLM. \n\nWe evaluate the multilingual MiniLM on the cross-lingual natural language inference benchmark (XNLI) and the cross-lingual question answering benchmark (MLQA).\n\n#### Cross-Lingual Natural Language Inference - [XNLI](https://arxiv.org/abs/1809.05053)\n\nWe evaluate our model on cross-lingual transfer from English to other languages. Following [Conneau et al. (2019)](https://arxiv.org/abs/1911.02116), we select the best single model on the joint dev set of all the languages.\n\n| Model | #Layers | #Hidden | #Transformer Parameters | Average | en | fr | es | de | el | bg | ru | tr | ar | vi | th | zh | hi | sw | ur |\n|"} {"downloads": 284664, "id": "joeddav/distilbert-base-uncased-go-emotions-student", "likes": 29, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": "en", "tags": ["text-classification", "pytorch", "tensorflow"], "datasets": ["go_emotions"], "license": "mit", "widget": [{"text": "I feel lucky to be here."}]}, "description": "\n\n# distilbert-base-uncased-go-emotions-student\n\n## Model Description\n\nThis model is distilled from the zero-shot classification pipeline on the unlabeled GoEmotions dataset using [this\nscript](https://github.com/huggingface/transformers/tree/master/examples/research_projects/zero-shot-distillation).\nIt was trained with mixed precision for 10 epochs and otherwise used the default script arguments. \n\n## Intended Usage\n\nThe model can be used like any other model trained on GoEmotions, but will likely not perform as well as a model\ntrained with full supervision. It is primarily intended as a demo of how an expensive NLI-based zero-shot model\ncan be distilled to a more efficient student, allowing a classifier to be trained with only unlabeled data. Note\nthat although the GoEmotions dataset allows multiple labels per instance, the teacher used single-label \nclassification to create pseudo-labels.\n"} {"downloads": 105810, "id": "nbroad/ESG-BERT", "likes": 28, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": ["en"], "widget": [{"text": "In fiscal year 2019, we reduced our comprehensive carbon footprint for the fourth consecutive year\u2014down 35 percent compared to 2015, when Apple\u2019s carbon emissions peaked, even as net revenue increased by 11 percent over that same period. In the past year, we avoided over 10 million metric tons from our emissions reduction initiatives\u2014like our Supplier Clean Energy Program, which lowered our footprint by 4.4 million metric tons. 
", "example_title": "Reduced carbon footprint"}, {"text": "We believe it is essential to establish validated conflict-free sources of 3TG within the Democratic Republic of the Congo (the \u201cDRC\u201d) and adjoining countries (together, with the DRC, the \u201cCovered Countries\u201d), so that these minerals can be procured in a way that contributes to economic growth and development in the region. To aid in this effort, we have established a conflict minerals policy and an internal team to implement the policy.", "example_title": "Conflict minerals policy"}]}, "description": "\n# Model Card for ESG-BERT\nDomain Specific BERT Model for Text Mining in Sustainable Investing\n \n \n \n# Model Details\n \n## Model Description\n \n \n \n- **Developed by:** [Mukut Mukherjee](https://www.linkedin.com/in/mukutm/), [Charan Pothireddi](https://www.linkedin.com/in/sree-charan-pothireddi-6a0a3587/) and [Parabole.ai](https://www.linkedin.com/in/sree-charan-pothireddi-6a0a3587/)\n- **Shared by [Optional]:** HuggingFace\n- **Model type:** Language model\n- **Language(s) (NLP):** en\n- **License:** More information needed\n- **Related Models:** \n - **Parent Model:** BERT\n- **Resources for more information:** \n - [GitHub Repo](https://github.com/mukut03/ESG-BERT)\n - [Blog Post](https://towardsdatascience.com/nlp-meets-sustainable-investing-d0542b3c264b?source=friends_link&sk=1f7e6641c3378aaff319a81decf387bf)\n \n# Uses\n \n \n## Direct Use\n \nText Mining in Sustainable Investing\n \n## Downstream Use [Optional]\n \nThe applications of ESG-BERT can be expanded way beyond just text classification. It can be fine-tuned to perform various other downstream NLP tasks in the domain of Sustainable Investing.\n \n## Out-of-Scope Use\n \nThe model should not be used to intentionally create hostile or alienating environments for people. \n# Bias, Risks, and Limitations\n \n \nSignificant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.\n \n \n## Recommendations\n \n \nUsers (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recomendations.\n \n \n# Training Details\n \n## Training Data\n \nMore information needed\n \n## Training Procedure\n \n\n \n### Preprocessing\n \nMore information needed\n \n### Speeds, Sizes, Times\n \nMore information needed\n \n# Evaluation\n \n \n \n## Testing Data, Factors & Metrics\n \n### Testing Data\n \nThe fine-tuned model for text classification is also available [here](https://drive.google.com/drive/folders/1Qz4HP3xkjLfJ6DGCFNeJ7GmcPq65_HVe?usp=sharing). It can be used directly to make predictions using just a few steps. First, download the fine-tuned pytorch_model.bin, config.json, and vocab.txt\n \n### Factors\n \nMore information needed\n \n### Metrics\n \nMore information needed\n \n## Results \n \nESG-BERT was further trained on unstructured text data with accuracies of 100% and 98% for Next Sentence Prediction and Masked Language Modelling tasks. Fine-tuning ESG-BERT for text classification yielded an F-1 score of 0.90. 
For comparison, the general BERT (BERT-base) model scored 0.79 after fine-tuning, and the scikit-learn approach scored 0.67.\n \n# Model Examination\n \nMore information needed\n \n# Environmental Impact\n \n \nCarbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).\n \n- **Hardware Type:** More information needed\n- **Hours used:** More information needed\n- **Cloud Provider:** More information needed\n- **Compute Region:** More information needed\n- **Carbon Emitted:** More information needed\n \n# Technical Specifications [optional]\n \n## Model Architecture and Objective\n \nMore information needed\n \n## Compute Infrastructure\n \nMore information needed\n \n### Hardware\n \nMore information needed\n \n### Software\n \nJDK 11 is needed to serve the model.\n \n# Citation\n \n\n \n**BibTeX:**\n \nMore information needed\n \n**APA:**\n \nMore information needed\n \n# Glossary [optional]\n \n\n \nMore information needed\n \n# More Information [optional]\n \nMore information needed\n \n# Model Card Authors [optional]\n[Mukut Mukherjee](https://www.linkedin.com/in/mukutm/), [Charan Pothireddi](https://www.linkedin.com/in/sree-charan-pothireddi-6a0a3587/) and [Parabole.ai](https://www.linkedin.com/in/sree-charan-pothireddi-6a0a3587/), in collaboration with Ezi Ozoani and the HuggingFace Team\n \n \n# Model Card Contact\n \nMore information needed\n \n# How to Get Started with the Model\n \nUse the code below to get started with the model.\n \n
\n<details>\n<summary>Click to expand</summary>\n\n```bash\npip install torchserve torch-model-archiver\npip install torchvision\npip install transformers\n```\n\nNext up, we'll set up the handler script. It is a basic handler for text classification that can be improved upon. Save this script as \"handler.py\" in your directory. [1]\n\n```python\nfrom abc import ABC\nimport json\nimport logging\nimport os\n\nimport torch\nfrom transformers import AutoModelForSequenceClassification, AutoTokenizer\nfrom ts.torch_handler.base_handler import BaseHandler\n\nlogger = logging.getLogger(__name__)\n\nclass TransformersClassifierHandler(BaseHandler, ABC):\n    \"\"\"\n    Transformers text classifier handler class. This handler takes a text (string)\n    as input and returns the classification text based on the serialized transformers checkpoint.\n    \"\"\"\n    def __init__(self):\n        super(TransformersClassifierHandler, self).__init__()\n        self.initialized = False\n\n    def initialize(self, ctx):\n        self.manifest = ctx.manifest\n        properties = ctx.system_properties\n        model_dir = properties.get(\"model_dir\")\n        self.device = torch.device(\"cuda:\" + str(properties.get(\"gpu_id\")) if torch.cuda.is_available() else \"cpu\")\n\n        # Read model serialize/pt file\n        self.model = AutoModelForSequenceClassification.from_pretrained(model_dir)\n        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)\n        self.model.to(self.device)\n        self.model.eval()\n        logger.debug('Transformer model from path {0} loaded successfully'.format(model_dir))\n\n        # Read the mapping file, index to object name\n        mapping_file_path = os.path.join(model_dir, \"index_to_name.json\")\n        if os.path.isfile(mapping_file_path):\n            with open(mapping_file_path) as f:\n                self.mapping = json.load(f)\n        else:\n            logger.warning('Missing the index_to_name.json file. Inference output will not include class name.')\n\n        self.initialized = True\n\n    def preprocess(self, data):\n        \"\"\"Very basic preprocessing code - only tokenizes.\n        Extend with your own preprocessing steps as needed.\n        \"\"\"\n        text = data[0].get(\"data\")\n        if text is None:\n            text = data[0].get(\"body\")\n        sentences = text.decode('utf-8')\n        logger.info(\"Received text: '%s'\", sentences)\n\n        inputs = self.tokenizer.encode_plus(\n            sentences,\n            add_special_tokens=True,\n            return_tensors=\"pt\"\n        )\n        return inputs\n\n    def inference(self, inputs):\n        \"\"\"Predict the class of a text using a trained transformer model.\"\"\"\n        # NOTE: This makes the assumption that your model expects text to be tokenized\n        # with \"input_ids\" and \"token_type_ids\" - which is true for some popular transformer models, e.g. bert.\n        # If your transformer model expects different tokenization, adapt this code to suit\n        # its expected input format.\n        prediction = self.model(\n            inputs['input_ids'].to(self.device),\n            token_type_ids=inputs['token_type_ids'].to(self.device)\n        )[0].argmax().item()\n        logger.info(\"Model predicted: '%s'\", prediction)\n\n        if self.mapping:\n            prediction = self.mapping[str(prediction)]\n\n        return [prediction]\n\n    def postprocess(self, inference_output):\n        # TODO: Add any needed post-processing of the model predictions here\n        return inference_output\n\n_service = TransformersClassifierHandler()\n\ndef handle(data, context):\n    try:\n        if not _service.initialized:\n            _service.initialize(context)\n\n        if data is None:\n            return None\n\n        data = _service.preprocess(data)\n        data = _service.inference(data)\n        data = _service.postprocess(data)\n\n        return data\n    except Exception as e:\n        raise e\n```\n\nTorchServe uses a format called MAR (Model Archive). We can convert our PyTorch model to a .mar file using this command:\n\n```bash\ntorch-model-archiver --model-name \"bert\" --version 1.0 --serialized-file ./bert_model/pytorch_model.bin --extra-files \"./bert_model/config.json,./bert_model/vocab.txt\" --handler \"./handler.py\"\n```\n\nMove the .mar file into a new directory:\n\n```bash\nmkdir model_store && mv bert.mar model_store\n```\n\nFinally, we can start TorchServe using the command:\n\n```bash\ntorchserve --start --model-store model_store --models bert=bert.mar\n```\n\nWe can now query the model from another terminal window using the Inference API. We pass a text file containing text that the model will try to classify.\n\n```bash\ncurl -X POST http://127.0.0.1:8080/predictions/bert -T predict.txt\n```\n\nThis returns a label number which correlates to a textual label. This is stored in the label_dict.txt dictionary file. 
\n \n```\n__label__Business_Ethics : 0\n__label__Data_Security : 1\n__label__Access_And_Affordability : 2\n__label__Business_Model_Resilience : 3\n__label__Competitive_Behavior : 4\n__label__Critical_Incident_Risk_Management : 5\n__label__Customer_Welfare : 6\n__label__Director_Removal : 7\n__label__Employee_Engagement_Inclusion_And_Diversity : 8\n__label__Employee_Health_And_Safety : 9\n__label__Human_Rights_And_Community_Relations : 10\n__label__Labor_Practices : 11\n__label__Management_Of_Legal_And_Regulatory_Framework : 12\n__label__Physical_Impacts_Of_Climate_Change : 13\n__label__Product_Quality_And_Safety : 14\n__label__Product_Design_And_Lifecycle_Management : 15\n__label__Selling_Practices_And_Product_Labeling : 16\n__label__Supply_Chain_Management : 17\n__label__Systemic_Risk_Management : 18\n__label__Waste_And_Hazardous_Materials_Management : 19\n__label__Water_And_Wastewater_Management : 20\n__label__Air_Quality : 21\n__label__Customer_Privacy : 22\n__label__Ecological_Impacts : 23\n__label__Energy_Management : 24\n__label__GHG_Emissions : 25\n```\n\n</details>\n"} {"downloads": 88574, "id": "cardiffnlp/twitter-roberta-base-emotion", "likes": 26, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {}, "description": "# Twitter-roBERTa-base for Emotion Recognition\n\nThis is a roBERTa-base model trained on ~58M tweets and finetuned for emotion recognition with the TweetEval benchmark.\n\n- Paper: [_TweetEval_ benchmark (Findings of EMNLP 2020)](https://arxiv.org/pdf/2010.12421.pdf). \n- Git Repo: [Tweeteval official repository](https://github.com/cardiffnlp/tweeteval).\n\n## Example of classification\n\n```python\nfrom transformers import AutoModelForSequenceClassification\nfrom transformers import TFAutoModelForSequenceClassification\nfrom transformers import AutoTokenizer\nimport numpy as np\nfrom scipy.special import softmax\nimport csv\nimport urllib.request\n\n# Preprocess text (username and link placeholders)\ndef preprocess(text):\n    new_text = []\n    for t in text.split(\" \"):\n        t = '@user' if t.startswith('@') and len(t) > 1 else t\n        t = 'http' if t.startswith('http') else t\n        new_text.append(t)\n    return \" \".join(new_text)\n\n# Tasks:\n# emoji, emotion, hate, irony, offensive, sentiment\n# stance/abortion, stance/atheism, stance/climate, stance/feminist, stance/hillary\n\ntask='emotion'\nMODEL = f\"cardiffnlp/twitter-roberta-base-{task}\"\n\ntokenizer = AutoTokenizer.from_pretrained(MODEL)\n\n# download label mapping\nmapping_link = f\"https://raw.githubusercontent.com/cardiffnlp/tweeteval/main/datasets/{task}/mapping.txt\"\nwith urllib.request.urlopen(mapping_link) as f:\n    html = f.read().decode('utf-8').split(\"\\n\")\n    csvreader = csv.reader(html, delimiter='\\t')\nlabels = [row[1] for row in csvreader if len(row) > 1]\n\n# PT\nmodel = AutoModelForSequenceClassification.from_pretrained(MODEL)\nmodel.save_pretrained(MODEL)\n\ntext = \"Celebrating my promotion \ud83d\ude0e\"\ntext = preprocess(text)\nencoded_input = tokenizer(text, return_tensors='pt')\noutput = model(**encoded_input)\nscores = output[0][0].detach().numpy()\nscores = softmax(scores)\n\n# # TF\n# model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)\n# model.save_pretrained(MODEL)\n\n# text = \"Celebrating my promotion \ud83d\ude0e\"\n# encoded_input = tokenizer(text, return_tensors='tf')\n# output = model(encoded_input)\n# scores = output[0][0].numpy()\n# scores = 
softmax(scores)\n\nranking = np.argsort(scores)\nranking = ranking[::-1]\nfor i in range(scores.shape[0]):\n    l = labels[ranking[i]]\n    s = scores[ranking[i]]\n    print(f\"{i+1}) {l} {np.round(float(s), 4)}\")\n\n```\n\nOutput: \n\n```\n1) joy 0.9382\n2) optimism 0.0362\n3) anger 0.0145\n4) sadness 0.0112\n```\n"} {"downloads": 12980, "id": "IDEA-CCNL/Erlangshen-Roberta-110M-Sentiment", "likes": 25, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": ["zh"], "license": "apache-2.0", "tags": ["roberta", "NLU", "Sentiment", "Chinese"], "inference": true, "widget": [{"text": "\u4eca\u5929\u5fc3\u60c5\u4e0d\u597d"}]}, "description": "\n# Erlangshen-Roberta-110M-Sentiment\n\n- Github: [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM)\n- Docs: [Fengshenbang-Docs](https://fengshenbang-doc.readthedocs.io/)\n\n## \u7b80\u4ecb Brief Introduction\n\n\u4e2d\u6587\u7684RoBERTa-wwm-ext-base\u5728\u6570\u4e2a\u60c5\u611f\u5206\u6790\u4efb\u52a1\u5fae\u8c03\u540e\u7684\u7248\u672c\n\nThis is the fine-tuned version of the Chinese RoBERTa-wwm-ext-base model on several sentiment analysis datasets.\n\n## \u6a21\u578b\u5206\u7c7b Model Taxonomy\n\n| \u9700\u6c42 Demand | \u4efb\u52a1 Task | \u7cfb\u5217 Series | \u6a21\u578b Model | \u53c2\u6570 Parameter | \u989d\u5916 Extra |\n| :"} {"downloads": 70469, "id": "bhadresh-savani/bert-base-go-emotion", "likes": 22, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": ["en"], "thumbnail": "https://avatars3.githubusercontent.com/u/32437151?s=460&u=4ec59abc8d21d5feea3dab323d23a5860e6996a4&v=4", "tags": ["text-classification", "go-emotion", "pytorch"], "license": "apache-2.0", "datasets": ["go_emotions"], "metrics": ["Accuracy"]}, "description": "\n# Bert-Base-Uncased-Go-Emotion\n\n## Model description:\n\n## Training Parameters:\n```\nNum examples = 169208\nNum Epochs = 3\nInstantaneous batch size per device = 16\nTotal train batch size (w. parallel, distributed & accumulation) = 16\nGradient Accumulation steps = 1\nTotal optimization steps = 31728\n```\n\n## TrainOutput:\n```\n'train_loss': 0.12085497042373672, \n```\n\n## Evaluation Output:\n```\n 'eval_accuracy_thresh': 0.9614765048027039,\n 'eval_loss': 0.1164659634232521\n```\n\n## Colab Notebook:\n[Notebook](https://github.com/bhadreshpsavani/UnderstandingNLP/blob/master/go_emotion_of_transformers_multilabel_text_classification_v2.ipynb)"} {"downloads": 7164, "id": "michiyasunaga/BioLinkBERT-base", "likes": 22, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"license": "apache-2.0", "language": "en", "datasets": ["pubmed"], "tags": ["bert", "exbert", "linkbert", "biolinkbert", "feature-extraction", "fill-mask", "question-answering", "text-classification", "token-classification"], "widget": [{"text": "Sunitinib is a tyrosine kinase inhibitor"}]}, "description": "\r\n\r\n## BioLinkBERT-base\r\n\r\nBioLinkBERT-base model pretrained on [PubMed](https://pubmed.ncbi.nlm.nih.gov/) abstracts along with citation link information. It is introduced in the paper [LinkBERT: Pretraining Language Models with Document Links (ACL 2022)](https://arxiv.org/abs/2203.15827). The code and data are available in [this repository](https://github.com/michiyasunaga/LinkBERT).\r\n\r\nThis model achieves state-of-the-art performance on several biomedical NLP benchmarks such as [BLURB](https://microsoft.github.io/BLURB/) and [MedQA-USMLE](https://github.com/jind11/MedQA).\r\n\r\n\r\n## Model description\r\n\r\nLinkBERT is a transformer encoder (BERT-like) model pretrained on a large corpus of documents. It is an improvement of BERT that newly captures **document links** such as hyperlinks and citation links to include knowledge that spans across multiple documents. Specifically, it was pretrained by feeding linked documents into the same language model context, besides a single document.\r\n\r\nLinkBERT can be used as a drop-in replacement for BERT. It achieves better performance for general language understanding tasks (e.g. text classification), and is also particularly effective for **knowledge-intensive** tasks (e.g. question answering) and **cross-document** tasks (e.g. reading comprehension, document retrieval).\r\n\r\n\r\n## Intended uses & limitations\r\n\r\nThe model can be used by fine-tuning on a downstream task, such as question answering, sequence classification, and token classification.\r\nYou can also use the raw model for feature extraction (i.e. obtaining embeddings for input text).\r\n\r\n\r\n### How to use\r\n\r\nTo use the model to get the features of a given text in PyTorch:\r\n\r\n```python\r\nfrom transformers import AutoTokenizer, AutoModel\r\ntokenizer = AutoTokenizer.from_pretrained('michiyasunaga/BioLinkBERT-base')\r\nmodel = AutoModel.from_pretrained('michiyasunaga/BioLinkBERT-base')\r\ninputs = tokenizer(\"Sunitinib is a tyrosine kinase inhibitor\", return_tensors=\"pt\")\r\noutputs = model(**inputs)\r\nlast_hidden_states = outputs.last_hidden_state\r\n```\r\n\r\nFor fine-tuning, you can use [this repository](https://github.com/michiyasunaga/LinkBERT) or follow any other BERT fine-tuning codebases.\r\n
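\r\nA hedged sketch of loading the checkpoint for such fine-tuning (the classification head is newly initialized; `num_labels` depends on your task):\r\n\r\n```python\r\nfrom transformers import AutoTokenizer, AutoModelForSequenceClassification\r\n\r\ntokenizer = AutoTokenizer.from_pretrained('michiyasunaga/BioLinkBERT-base')\r\n# attaches a fresh sequence-classification head on top of the pretrained encoder\r\nmodel = AutoModelForSequenceClassification.from_pretrained('michiyasunaga/BioLinkBERT-base', num_labels=2)\r\n# train with your own Trainer / training loop on the downstream dataset\r\n```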
\r\n\r\n## Evaluation results\r\n\r\nWhen fine-tuned on downstream tasks, LinkBERT achieves the following results.\r\n\r\n**Biomedical benchmarks ([BLURB](https://microsoft.github.io/BLURB/), [MedQA](https://github.com/jind11/MedQA), [MMLU](https://github.com/hendrycks/test), etc.):** BioLinkBERT attains new state-of-the-art.\r\n\r\n| | BLURB score | PubMedQA | BioASQ | MedQA-USMLE |\r\n| "} {"downloads": 6184, "id": "microsoft/xtremedistil-l6-h384-uncased", "likes": 22, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": "en", "thumbnail": "https://huggingface.co/front/thumbnails/microsoft.png", "tags": ["text-classification"], "license": "mit"}, "description": "\n\n# XtremeDistilTransformers for Distilling Massive Neural Networks\n\nXtremeDistilTransformers is a distilled task-agnostic transformer model that leverages task transfer for learning a small universal model that can be applied to arbitrary tasks and languages as outlined in the paper [XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation](https://arxiv.org/abs/2106.04563).\n\nWe leverage task transfer combined with multi-task distillation techniques from the papers [XtremeDistil: Multi-stage Distillation for Massive Multilingual Models](https://www.aclweb.org/anthology/2020.acl-main.202.pdf) and [MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers](https://proceedings.neurips.cc/paper/2020/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf) with the following [Github 
code](https://github.com/microsoft/xtreme-distil-transformers).\n\nThis l6-h384 checkpoint with **6** layers, **384** hidden size, **12** attention heads corresponds to **22 million** parameters with **5.3x** speedup over BERT-base.\n\nOther available checkpoints: [xtremedistil-l6-h256-uncased](https://huggingface.co/microsoft/xtremedistil-l6-h256-uncased) and [xtremedistil-l12-h384-uncased](https://huggingface.co/microsoft/xtremedistil-l12-h384-uncased) \n\nThe following table shows the results on the GLUE dev set and SQuAD-v2.\n\n| Models | #Params | Speedup | MNLI | QNLI | QQP | RTE | SST | MRPC | SQUAD2 | Avg |\n|"} {"downloads": 5159, "id": "ElKulako/cryptobert", "likes": 21, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"datasets": ["ElKulako/stocktwits-crypto"], "language": ["en"], "tags": ["cryptocurrency", "crypto", "BERT", "sentiment classification", "NLP", "bitcoin", "ethereum", "shib", "social media", "sentiment analysis", "cryptocurrency sentiment analysis"]}, "description": "\n\n# CryptoBERT\nCryptoBERT is a pre-trained NLP model to analyse the language and sentiments of cryptocurrency-related social media posts and messages. It was built by further training [VinAI's bertweet-base](https://huggingface.co/vinai/bertweet-base) language model on the cryptocurrency domain, using a corpus of over 3.2M unique cryptocurrency-related social media posts. \n(A research paper with more details will follow soon.)\n## Classification Training\nThe model was trained on the following labels: \"Bearish\" : 0, \"Neutral\": 1, \"Bullish\": 2\n\nCryptoBERT's sentiment classification head was fine-tuned on a balanced dataset of 2M labelled StockTwits posts, sampled from [ElKulako/stocktwits-crypto](https://huggingface.co/datasets/ElKulako/stocktwits-crypto). \n\nCryptoBERT was trained with a max sequence length of 128. Technically, it can handle sequences of up to 514 tokens; however, going beyond 128 is not recommended.\n\n# Classification Example\n```python\nfrom transformers import TextClassificationPipeline, AutoModelForSequenceClassification, AutoTokenizer\nmodel_name = \"ElKulako/cryptobert\"\ntokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)\nmodel = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels = 3)\npipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, max_length=64, truncation=True, padding = 'max_length')\n# post_1 & post_3 = bullish, post_2 = bearish\npost_1 = \" see y'all tomorrow and can't wait to see ada in the morning, i wonder what price it is going to be at. \ud83d\ude0e\ud83d\udc02\ud83e\udd20\ud83d\udcaf\ud83d\ude34, bitcoin is looking good go for it and flash by that 45k. \"\npost_2 = \" alright racers, it\u2019s a race to the bottom! good luck today and remember there are no losers (minus those who invested in currency nobody really uses) take your marks... are you ready? go!!\" \npost_3 = \" i'm never selling. the whole market can bottom out. i'll continue to hold this dumpster fire until the day i die if i need to.\" \ndf_posts = [post_1, post_2, post_3]\npreds = pipe(df_posts)\nprint(preds)\n\n\n```\n\n```\n[{'label': 'Bullish', 'score': 0.8734585642814636}, {'label': 'Bearish', 'score': 0.9889495372772217}, {'label': 'Bullish', 'score': 0.6595883965492249}]\n```\n\n## Training Corpus\nCryptoBERT was trained on 3.2M social media posts regarding various cryptocurrencies. Only non-duplicate posts of length above 4 words were considered. 
The following communities were used as sources for our corpora:\n\n\n(1) StockTwits - 1.875M posts about the top 100 cryptos by trading volume. Posts were collected from the 1st of November 2021 to the 16th of June 2022. [ElKulako/stocktwits-crypto](https://huggingface.co/datasets/ElKulako/stocktwits-crypto)\n\n(2) Telegram - 664K posts from top 5 telegram groups: [Binance](https://t.me/binanceexchange), [Bittrex](https://t.me/BittrexGlobalEnglish), [huobi global](https://t.me/huobiglobalofficial), [Kucoin](https://t.me/Kucoin_Exchange), [OKEx](https://t.me/OKExOfficial_English). \nData from 16.11.2020 to 30.01.2021. Courtesy of [Anton](https://www.kaggle.com/datasets/aagghh/crypto-telegram-groups).\n\n(3) Reddit - 172K comments from various crypto investing threads, collected from May 2021 to May 2022\n\n(4) Twitter - 496K posts with hashtags XBT, Bitcoin or BTC. Collected for May 2018. Courtesy of [Paul](https://www.kaggle.com/datasets/paul92s/bitcoin-tweets-14m)."} {"downloads": 1866, "id": "uer/roberta-base-finetuned-chinanews-chinese", "likes": 20, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": "zh", "widget": [{"text": "\u8fd9\u672c\u4e66\u771f\u7684\u5f88\u4e0d\u9519"}]}, "description": "\n\n# Chinese RoBERTa-Base Models for Text Classification\n\n## Model description\n\nThis is the set of 5 Chinese RoBERTa-Base classification models fine-tuned by [UER-py](https://arxiv.org/abs/1909.05658). You can download the 5 Chinese RoBERTa-Base classification models either from the [UER-py Modelzoo page](https://github.com/dbiir/UER-py/wiki/Modelzoo) (in UER-py format), or via HuggingFace from the links below:\n\n| Dataset | Link |\n| :"} {"downloads": 180242, "id": "finiteautomata/beto-sentiment-analysis", "likes": 19, "pipeline_tag": "text-classification", "task": "text-classification", "meta": {"language": ["es"], "tags": ["sentiment-analysis"]}, "description": "\n\n# Sentiment Analysis in Spanish\n## beto-sentiment-analysis\n\n**NOTE: this model will be removed soon -- use [pysentimiento/robertuito-sentiment-analysis](https://huggingface.co/pysentimiento/robertuito-sentiment-analysis) instead**\n\nRepository: [https://github.com/finiteautomata/pysentimiento/](https://github.com/pysentimiento/pysentimiento/)\n\n\nModel trained with TASS 2020 corpus (around ~5k tweets) of several dialects of Spanish. Base model is [BETO](https://github.com/dccuchile/beto), a BERT model trained in Spanish.\n\nUses `POS`, `NEG`, `NEU` labels.\n\n## License\n\n`pysentimiento` is an open-source library for non-commercial use and scientific research purposes only. Please be aware that models are trained with third-party datasets and are subject to their respective licenses. \n\n1. [TASS Dataset license](http://tass.sepln.org/tass_data/download.php)\n2. 
[SEMEval 2017 Dataset license]()\n\n## Citation\n\nIf you use this model in your work, please cite the following papers:\n\n```\n@misc{perez2021pysentimiento,\n title={pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks},\n author={Juan Manuel P\u00e9rez and Juan Carlos Giudici and Franco Luque},\n year={2021},\n eprint={2106.09462},\n archivePrefix={arXiv},\n primaryClass={cs.CL}\n}\n\n@article{canete2020spanish,\n title={Spanish pre-trained bert model and evaluation data},\n author={Ca{\\~n}ete, Jos{\\'e} and Chaperon, Gabriel and Fuentes, Rodrigo and Ho, Jou-Hui and Kang, Hojin and P{\\'e}rez, Jorge},\n journal={Pml4dc at iclr},\n volume={2020},\n number={2020},\n pages={1--10},\n year={2020}\n}\n```\n\nEnjoy! \ud83e\udd17\n"} {"downloads": 1102469, "id": "dslim/bert-base-NER", "likes": 134, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"language": "en", "datasets": ["conll2003"], "license": "mit"}, "description": "\n# bert-base-NER\n\n## Model description\n\n**bert-base-NER** is a fine-tuned BERT model that is ready to use for **Named Entity Recognition** and achieves **state-of-the-art performance** for the NER task. It has been trained to recognize four types of entities: location (LOC), organization (ORG), person (PER) and miscellaneous (MISC). \n\nSpecifically, this model is a *bert-base-cased* model that was fine-tuned on the English version of the standard [CoNLL-2003 Named Entity Recognition](https://www.aclweb.org/anthology/W03-0419.pdf) dataset. \n\nIf you'd like to use a larger BERT-large model fine-tuned on the same dataset, a [**bert-large-NER**](https://huggingface.co/dslim/bert-large-NER/) version is also available. \n\n\n## Intended uses & limitations\n\n#### How to use\n\nYou can use this model with the Transformers *pipeline* for NER.\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForTokenClassification\nfrom transformers import pipeline\n\ntokenizer = AutoTokenizer.from_pretrained(\"dslim/bert-base-NER\")\nmodel = AutoModelForTokenClassification.from_pretrained(\"dslim/bert-base-NER\")\n\nnlp = pipeline(\"ner\", model=model, tokenizer=tokenizer)\nexample = \"My name is Wolfgang and I live in Berlin\"\n\nner_results = nlp(example)\nprint(ner_results)\n```\n
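\nThe pipeline returns one dict per detected (sub)token entity; an illustrative sketch of the output shape for the example above (scores are made up):\n\n```python\n# [{'entity': 'B-PER', 'score': 0.99, 'index': 4, 'word': 'Wolfgang', 'start': 11, 'end': 19},\n#  {'entity': 'B-LOC', 'score': 0.99, 'index': 9, 'word': 'Berlin', 'start': 34, 'end': 40}]\n```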
\n\n#### Limitations and bias\n\nThis model is limited by its training dataset of entity-annotated news articles from a specific span of time. This may not generalize well for all use cases in different domains. Furthermore, the model occasionally tags subword tokens as entities and post-processing of results may be necessary to handle those cases. \n\n## Training data\n\nThis model was fine-tuned on the English version of the standard [CoNLL-2003 Named Entity Recognition](https://www.aclweb.org/anthology/W03-0419.pdf) dataset. \n\nThe training dataset distinguishes between the beginning and continuation of an entity so that if there are back-to-back entities of the same type, the model can output where the second entity begins. As in the dataset, each token will be classified as one of the following classes:\n\nAbbreviation|Description\n-|-\nO|Outside of a named entity\nB-MIS |Beginning of a miscellaneous entity right after another miscellaneous entity\nI-MIS |Miscellaneous entity\nB-PER |Beginning of a person\u2019s name right after another person\u2019s name\nI-PER |Person\u2019s name\nB-ORG |Beginning of an organization right after another organization\nI-ORG |Organization\nB-LOC |Beginning of a location right after another location\nI-LOC |Location\n\n\n### CoNLL-2003 English Dataset Statistics\nThis dataset was derived from the Reuters corpus which consists of Reuters news stories. You can read more about how this dataset was created in the CoNLL-2003 paper. \n#### # of training examples per entity type\nDataset|LOC|MISC|ORG|PER\n-|-|-|-|-\nTrain|7140|3438|6321|6600\nDev|1837|922|1341|1842\nTest|1668|702|1661|1617\n#### # of articles/sentences/tokens per dataset\nDataset |Articles |Sentences |Tokens\n-|-|-|-\nTrain |946 |14,987 |203,621\nDev |216 |3,466 |51,362\nTest |231 |3,684 |46,435\n\n## Training procedure\n\nThis model was trained on a single NVIDIA V100 GPU with recommended hyperparameters from the [original BERT paper](https://arxiv.org/pdf/1810.04805) which trained & evaluated the model on the CoNLL-2003 NER task. \n\n## Eval results\nmetric|dev|test\n-|-|-\nf1 |95.1 |91.3\nprecision |95.0 |90.7\nrecall |95.3 |91.9\n\nThe test metrics are a little lower than the official Google BERT results which encoded document context & experimented with CRF. More on replicating the original results [here](https://github.com/google-research/bert/issues/223).\n\n### BibTeX entry and citation info\n\n```\n@article{DBLP:journals/corr/abs-1810-04805,\n author = {Jacob Devlin and\n Ming{-}Wei Chang and\n Kenton Lee and\n Kristina Toutanova},\n title = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language\n Understanding},\n journal = {CoRR},\n volume = {abs/1810.04805},\n year = {2018},\n url = {http://arxiv.org/abs/1810.04805},\n archivePrefix = {arXiv},\n eprint = {1810.04805},\n timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},\n biburl = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},\n bibsource = {dblp computer science bibliography, https://dblp.org}\n}\n```\n```\n@inproceedings{tjong-kim-sang-de-meulder-2003-introduction,\n title = \"Introduction to the {C}o{NLL}-2003 Shared Task: Language-Independent Named Entity Recognition\",\n author = \"Tjong Kim Sang, Erik F. and\n De Meulder, Fien\",\n booktitle = \"Proceedings of the Seventh Conference on Natural Language Learning at {HLT}-{NAACL} 2003\",\n year = \"2003\",\n url = \"https://www.aclweb.org/anthology/W03-0419\",\n pages = \"142--147\",\n}\n```\n"} {"downloads": 653292, "id": "Jean-Baptiste/camembert-ner", "likes": 60, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"language": "fr", "datasets": ["Jean-Baptiste/wikiner_fr"], "widget": [{"text": "Je m'appelle jean-baptiste et je vis \u00e0 montr\u00e9al"}, {"text": "george washington est all\u00e9 \u00e0 washington"}], "license": "mit"}, "description": "\n\n# camembert-ner: model fine-tuned from camemBERT for NER task.\n\n## Introduction\n\n[camembert-ner] is a NER model that was fine-tuned from camemBERT on the wikiner-fr dataset (~170,634 sentences).\nThe model was validated on email/chat data and outperformed other models on this type of data specifically. 
\nIn particular, the model seems to work better on entities that don't start with an upper case letter.\n\n## Training data\nTraining data was classified as follows:\n\nAbbreviation|Description\n-|-\nO |Outside of a named entity\nMISC |Miscellaneous entity\nPER |Person\u2019s name\nORG |Organization\nLOC |Location\n\n\n## How to use camembert-ner with HuggingFace\n\n##### Load camembert-ner and its sub-word tokenizer :\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForTokenClassification\n\ntokenizer = AutoTokenizer.from_pretrained(\"Jean-Baptiste/camembert-ner\")\nmodel = AutoModelForTokenClassification.from_pretrained(\"Jean-Baptiste/camembert-ner\")\n\n\n##### Process text sample (from wikipedia)\n\nfrom transformers import pipeline\n\nnlp = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy=\"simple\")\nnlp(\"Apple est cr\u00e9\u00e9e le 1er avril 1976 dans le garage de la maison d'enfance de Steve Jobs \u00e0 Los Altos en Californie par Steve Jobs, Steve Wozniak et Ronald Wayne14, puis constitu\u00e9e sous forme de soci\u00e9t\u00e9 le 3 janvier 1977 \u00e0 l'origine sous le nom d'Apple Computer, mais pour ses 30 ans et pour refl\u00e9ter la diversification de ses produits, le mot \u00ab computer \u00bb est retir\u00e9 le 9 janvier 2015.\")\n\n\n[{'entity_group': 'ORG',\n 'score': 0.9472818374633789,\n 'word': 'Apple',\n 'start': 0,\n 'end': 5},\n {'entity_group': 'PER',\n 'score': 0.9838564991950989,\n 'word': 'Steve Jobs',\n 'start': 74,\n 'end': 85},\n {'entity_group': 'LOC',\n 'score': 0.9831605950991312,\n 'word': 'Los Altos',\n 'start': 87,\n 'end': 97},\n {'entity_group': 'LOC',\n 'score': 0.9834540486335754,\n 'word': 'Californie',\n 'start': 100,\n 'end': 111},\n {'entity_group': 'PER',\n 'score': 0.9841555754343668,\n 'word': 'Steve Jobs',\n 'start': 115,\n 'end': 126},\n {'entity_group': 'PER',\n 'score': 0.9843501806259155,\n 'word': 'Steve Wozniak',\n 'start': 127,\n 'end': 141},\n {'entity_group': 'PER',\n 'score': 0.9841533899307251,\n 'word': 'Ronald Wayne',\n 'start': 144,\n 'end': 157},\n {'entity_group': 'ORG',\n 'score': 0.9468960364659628,\n 'word': 'Apple Computer',\n 'start': 243,\n 'end': 257}]\n\n```\n\n\n## Model performance (metric: seqeval)\n\nOverall\n\nprecision|recall|f1\n-|-|-\n0.8859|0.8971|0.8914\n\nBy entity\n\nentity|precision|recall|f1\n-|-|-|-\nPER|0.9372|0.9598|0.9483 \nORG|0.8099|0.8265|0.8181\nLOC|0.8905|0.9005|0.8955\nMISC|0.8175|0.8117|0.8146\n\n\n\n\nFor those who could be interested, here is a short article on how I used the results of this model to train an LSTM model for signature detection in emails:\nhttps://medium.com/@jean-baptiste.polle/lstm-model-for-email-signature-detection-8e990384fefa\n"} {"downloads": 68744, "id": "oliverguhr/fullstop-punctuation-multilang-large", "likes": 58, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"language": ["en", "de", "fr", "it", "multilingual"], "tags": ["punctuation prediction", "punctuation"], "datasets": "wmt/europarl", "license": "mit", "widget": [{"text": "Ho sentito che ti sei laureata il che mi fa molto piacere", "example_title": "Italian"}, {"text": "Tous les matins vers quatre heures mon p\u00e8re ouvrait la porte de ma chambre", "example_title": "French"}, {"text": "Ist das eine Frage Frau M\u00fcller", "example_title": "German"}, {"text": "Yet she blushed as if with guilt when Cynthia reading her thoughts said to her one day Molly you're very glad to get rid of us are not you", "example_title": "English"}], "metrics": ["f1"]}, 
"description": "\n\nThis model predicts the punctuation of English, Italian, French and German texts. We developed it to restore the punctuation of transcribed spoken language. \n\nThis multilanguage model was trained on the [Europarl Dataset](https://huggingface.co/datasets/wmt/europarl) provided by the [SEPP-NLG Shared Task](https://sites.google.com/view/sentence-segmentation). *Please note that this dataset consists of political speeches. Therefore the model might perform differently on texts from other domains.*\n\nThe model restores the following punctuation markers: **\".\" \",\" \"?\" \"-\" \":\"**\n## Sample Code\nWe provide a simple python package that allows you to process text of any length.\n\n## Install \n\nTo get started install the package from [pypi](https://pypi.org/project/deepmultilingualpunctuation/):\n\n```bash\npip install deepmultilingualpunctuation\n```\n### Restore Punctuation\n```python\nfrom deepmultilingualpunctuation import PunctuationModel\n\nmodel = PunctuationModel()\ntext = \"My name is Clara and I live in Berkeley California Ist das eine Frage Frau M\u00fcller\"\nresult = model.restore_punctuation(text)\nprint(result)\n```\n\n**output**\n> My name is Clara and I live in Berkeley, California. Ist das eine Frage, Frau M\u00fcller?\n\n\n### Predict Labels \n```python\nfrom deepmultilingualpunctuation import PunctuationModel\n\nmodel = PunctuationModel()\ntext = \"My name is Clara and I live in Berkeley California Ist das eine Frage Frau M\u00fcller\"\nclean_text = model.preprocess(text)\nlabled_words = model.predict(clean_text)\nprint(labled_words)\n```\n\n**output**\n\n> [['My', '0', 0.9999887], ['name', '0', 0.99998665], ['is', '0', 0.9998579], ['Clara', '0', 0.6752215], ['and', '0', 0.99990904], ['I', '0', 0.9999877], ['live', '0', 0.9999839], ['in', '0', 0.9999515], ['Berkeley', ',', 0.99800044], ['California', '.', 0.99534047], ['Ist', '0', 0.99998784], ['das', '0', 0.99999154], ['eine', '0', 0.9999918], ['Frage', ',', 0.99622655], ['Frau', '0', 0.9999889], ['M\u00fcller', '?', 0.99863917]]\n\n\n\n\n## Results \n\nThe performance differs for the single punctuation markers as hyphens and colons, in many cases, are optional and can be substituted by either a comma or a full stop. 
The model achieves the following F1 scores for the different languages:\n\n| Label | EN | DE | FR | IT |\n| "} {"downloads": 37186, "id": "flair/ner-english-ontonotes-large", "likes": 52, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"tags": ["flair", "token-classification", "sequence-tagger-model"], "language": "en", "datasets": ["ontonotes"], "widget": [{"text": "On September 1st George won 1 dollar while watching Game of Thrones."}]}, "description": "\n\n## English NER in Flair (Ontonotes large model)\n\nThis is the large 18-class NER model for English that ships with [Flair](https://github.com/flairNLP/flair/).\n\nF1-Score: **90.93** (Ontonotes)\n\nPredicts 18 tags:\n\n| **tag** | **meaning** |\n|"} {"downloads": 267535, "id": "xlm-roberta-large-finetuned-conll03-english", "likes": 48, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"language": ["multilingual", "af", "am", "ar", "as", "az", "be", "bg", "bn", "br", "bs", "ca", "cs", "cy", "da", "de", "el", "en", "eo", "es", "et", "eu", "fa", "fi", "fr", "fy", "ga", "gd", "gl", "gu", "ha", "he", "hi", "hr", "hu", "hy", "id", "is", "it", "ja", "jv", "ka", "kk", "km", "kn", "ko", "ku", "ky", "la", "lo", "lt", "lv", "mg", "mk", "ml", "mn", "mr", "ms", "my", "ne", "nl", false, "om", "or", "pa", "pl", "ps", "pt", "ro", "ru", "sa", "sd", "si", "sk", "sl", "so", "sq", "sr", "su", "sv", "sw", "ta", "te", "th", "tl", "tr", "ug", "uk", "ur", "uz", "vi", "xh", "yi", "zh"]}, "description": "\n\n# xlm-roberta-large-finetuned-conll03-english\n\n# Table of Contents\n\n1. [Model Details](#model-details)\n2. [Uses](#uses)\n3. [Bias, Risks, and Limitations](#bias-risks-and-limitations)\n4. [Training](#training)\n5. [Evaluation](#evaluation)\n6. [Environmental Impact](#environmental-impact)\n7. [Technical Specifications](#technical-specifications)\n8. [Citation](#citation)\n9. [Model Card Authors](#model-card-authors)\n10. [How To Get Started With the Model](#how-to-get-started-with-the-model)\n\n\n# Model Details\n\n## Model Description\n\nThe XLM-RoBERTa model was proposed in [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzm\u00e1n, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's RoBERTa model released in 2019. It is a large multi-lingual language model, trained on 2.5TB of filtered CommonCrawl data. 
This model is [XLM-RoBERTa-large](https://huggingface.co/xlm-roberta-large) fine-tuned with the [conll2003](https://huggingface.co/datasets/conll2003) dataset in English.\n\n- **Developed by:** See [associated paper](https://arxiv.org/abs/1911.02116)\n- **Model type:** Multi-lingual language model\n- **Language(s) (NLP) or Countries (images):** XLM-RoBERTa is a multilingual model trained on 100 different languages; see [GitHub Repo](https://github.com/facebookresearch/fairseq/tree/main/examples/xlmr) for full list; model is fine-tuned on a dataset in English\n- **License:** More information needed\n- **Related Models:** [RoBERTa](https://huggingface.co/roberta-base), [XLM](https://huggingface.co/docs/transformers/model_doc/xlm)\n - **Parent Model:** [XLM-RoBERTa-large](https://huggingface.co/xlm-roberta-large)\n- **Resources for more information:** \n -[GitHub Repo](https://github.com/facebookresearch/fairseq/tree/main/examples/xlmr)\n -[Associated Paper](https://arxiv.org/abs/1911.02116)\n\n# Uses\n\n## Direct Use\n\nThe model is a language model. The model can be used for token classification, a natural language understanding task in which a label is assigned to some tokens in a text. \n\n## Downstream Use\n\nPotential downstream use cases include Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging. To learn more about token classification and other potential downstream use cases, see the Hugging Face [token classification docs](https://huggingface.co/tasks/token-classification).\n\n## Out-of-Scope Use\n\nThe model should not be used to intentionally create hostile or alienating environments for people. \n\n# Bias, Risks, and Limitations\n\n**CONTENT WARNING: Readers should be made aware that language generated by this model may be disturbing or offensive to some and may propagate historical and current stereotypes.**\n\nSignificant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). In the context of tasks relevant to this model, [Mishra et al. (2020)](https://arxiv.org/pdf/2008.03415.pdf) explore social biases in NER systems for English and find that there is systematic bias in existing NER systems in that they fail to identify named entities from different demographic groups (though this paper did not look at BERT). For example, using a sample sentence from [Mishra et al. 
(2020)](https://arxiv.org/pdf/2008.03415.pdf):\n\n```python\n>>> from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline\n>>> tokenizer = AutoTokenizer.from_pretrained(\"xlm-roberta-large-finetuned-conll03-english\")\n>>> model = AutoModelForTokenClassification.from_pretrained(\"xlm-roberta-large-finetuned-conll03-english\")\n>>> classifier = pipeline(\"ner\", model=model, tokenizer=tokenizer)\n>>> classifier(\"Alya told Jasmine that Andrew could pay with cash..\")\n[{'end': 2,\n 'entity': 'I-PER',\n 'index': 1,\n 'score': 0.9997861,\n 'start': 0,\n 'word': '\u2581Al'},\n {'end': 4,\n 'entity': 'I-PER',\n 'index': 2,\n 'score': 0.9998591,\n 'start': 2,\n 'word': 'ya'},\n {'end': 16,\n 'entity': 'I-PER',\n 'index': 4,\n 'score': 0.99995816,\n 'start': 10,\n 'word': '\u2581Jasmin'},\n {'end': 17,\n 'entity': 'I-PER',\n 'index': 5,\n 'score': 0.9999584,\n 'start': 16,\n 'word': 'e'},\n {'end': 29,\n 'entity': 'I-PER',\n 'index': 7,\n 'score': 0.99998057,\n 'start': 23,\n 'word': '\u2581Andrew'}]\n```\n\n## Recommendations\n\nUsers (both direct and downstream) should be made aware of the risks, biases and limitations of the model.\n\n# Training\n\nSee the following resources for training data and training procedure details: \n- [XLM-RoBERTa-large model card](https://huggingface.co/xlm-roberta-large)\n- [CoNLL-2003 data card](https://huggingface.co/datasets/conll2003)\n- [Associated paper](https://arxiv.org/pdf/1911.02116.pdf)\n \n# Evaluation\n\nSee the [associated paper](https://arxiv.org/pdf/1911.02116.pdf) for evaluation details.\n\n# Environmental Impact\n\nCarbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).\n\n- **Hardware Type:** 500 32GB Nvidia V100 GPUs (from the [associated paper](https://arxiv.org/pdf/1911.02116.pdf))\n- **Hours used:** More information needed\n- **Cloud Provider:** More information needed\n- **Compute Region:** More information needed\n- **Carbon Emitted:** More information needed\n\n# Technical Specifications\n\nSee the [associated paper](https://arxiv.org/pdf/1911.02116.pdf) for further details.\n\n# Citation\n\n**BibTeX:**\n\n```bibtex\n@article{conneau2019unsupervised,\n title={Unsupervised Cross-lingual Representation Learning at Scale},\n author={Conneau, Alexis and Khandelwal, Kartikay and Goyal, Naman and Chaudhary, Vishrav and Wenzek, Guillaume and Guzm{\\'a}n, Francisco and Grave, Edouard and Ott, Myle and Zettlemoyer, Luke and Stoyanov, Veselin},\n journal={arXiv preprint arXiv:1911.02116},\n year={2019}\n}\n```\n\n**APA:**\n- Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzm\u00e1n, F., ... & Stoyanov, V. (2019). Unsupervised cross-lingual representation learning at scale. arXiv preprint arXiv:1911.02116.\n\n# Model Card Authors\n\nThis model card was written by the team at Hugging Face.\n\n# How to Get Started with the Model\n\nUse the code below to get started with the model. You can use this model directly within a pipeline for NER.\n\n
```python\n>>> from transformers import AutoTokenizer, AutoModelForTokenClassification\n>>> from transformers import pipeline\n>>> tokenizer = AutoTokenizer.from_pretrained(\"xlm-roberta-large-finetuned-conll03-english\")\n>>> model = AutoModelForTokenClassification.from_pretrained(\"xlm-roberta-large-finetuned-conll03-english\")\n>>> classifier = pipeline(\"ner\", model=model, tokenizer=tokenizer)\n>>> classifier(\"Hello I'm Omar and I live in Z\u00fcrich.\")\n\n[{'end': 14,\n 'entity': 'I-PER',\n 'index': 5,\n 'score': 0.9999175,\n 'start': 10,\n 'word': '\u2581Omar'},\n {'end': 35,\n 'entity': 'I-LOC',\n 'index': 10,\n 'score': 0.9999906,\n 'start': 29,\n 'word': '\u2581Z\u00fcrich'}]\n```\n\n
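\nThe raw output above is made of sentencepiece word pieces (e.g. \u2581Al + ya in the bias example earlier), so a single name can come back split across several predictions. A minimal merging sketch, assuming the standard aggregation options of the transformers pipeline:\n\n```python\n>>> from transformers import pipeline\n>>> classifier = pipeline(\"ner\", model=\"xlm-roberta-large-finetuned-conll03-english\", aggregation_strategy=\"simple\")\n>>> classifier(\"Hello I'm Omar and I live in Z\u00fcrich.\")\n# pieces are merged into whole spans, e.g. {'entity_group': 'PER', 'word': 'Omar', ...}\n```\n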
"} {"downloads": 1934946, "id": "Davlan/bert-base-multilingual-cased-ner-hrl", "likes": 40, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"language": ["ar", "de", "en", "es", "fr", "it", "lv", "nl", "pt", "zh", "multilingual"]}, "description": "\n# bert-base-multilingual-cased-ner-hrl\n## Model description\n**bert-base-multilingual-cased-ner-hrl** is a **Named Entity Recognition** model for 10 high resourced languages (Arabic, German, English, Spanish, French, Italian, Latvian, Dutch, Portuguese and Chinese) based on a fine-tuned mBERT base model. It has been trained to recognize three types of entities: location (LOC), organizations (ORG), and person (PER). \nSpecifically, this model is a *bert-base-multilingual-cased* model that was fine-tuned on an aggregation of 10 high-resourced languages\n## Intended uses & limitations\n#### How to use\nYou can use this model with Transformers *pipeline* for NER.\n```python\nfrom transformers import AutoTokenizer, AutoModelForTokenClassification\nfrom transformers import pipeline\ntokenizer = AutoTokenizer.from_pretrained(\"Davlan/bert-base-multilingual-cased-ner-hrl\")\nmodel = AutoModelForTokenClassification.from_pretrained(\"Davlan/bert-base-multilingual-cased-ner-hrl\")\nnlp = pipeline(\"ner\", model=model, tokenizer=tokenizer)\nexample = \"Nader Jokhadar had given Syria the lead with a well-struck header in the seventh minute.\"\nner_results = nlp(example)\nprint(ner_results)\n```\n#### Limitations and bias\nThis model is limited by its training dataset of entity-annotated news articles from a specific span of time. This may not generalize well for all use cases in different domains. \n## Training data\nThe training data for the 10 languages are from: \n\nLanguage|Dataset\n-|-\nArabic | [ANERcorp](https://camel.abudhabi.nyu.edu/anercorp/)\nGerman | [conll 2003](https://www.clips.uantwerpen.be/conll2003/ner/)\nEnglish | [conll 2003](https://www.clips.uantwerpen.be/conll2003/ner/)\nSpanish | [conll 2002](https://www.clips.uantwerpen.be/conll2002/ner/)\nFrench | [Europeana Newspapers](https://github.com/EuropeanaNewspapers/ner-corpora/tree/master/enp_FR.bnf.bio)\nItalian | [Italian I-CAB](https://ontotext.fbk.eu/icab.html)\nLatvian | [Latvian NER](https://github.com/LUMII-AILab/FullStack/tree/master/NamedEntities)\nDutch | [conll 2002](https://www.clips.uantwerpen.be/conll2002/ner/)\nPortuguese |[Paramopama + Second Harem](https://github.com/davidsbatista/NER-datasets/tree/master/Portuguese)\nChinese | [MSRA](https://huggingface.co/datasets/msra_ner)\n\nThe training dataset distinguishes between the beginning and continuation of an entity so that if there are back-to-back entities of the same type, the model can output where the second entity begins. 
As in the dataset, each token will be classified as one of the following classes:\nAbbreviation|Description\n-|-\nO|Outside of a named entity\nB-PER |Beginning of a person\u2019s name right after another person\u2019s name\nI-PER |Person\u2019s name\nB-ORG |Beginning of an organisation right after another organisation\nI-ORG |Organisation\nB-LOC |Beginning of a location right after another location\nI-LOC |Location\n## Training procedure\nThis model was trained on an NVIDIA V100 GPU with recommended hyperparameters from the HuggingFace code.\n\n\n"} {"downloads": 85991, "id": "felflare/bert-restore-punctuation", "likes": 39, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"language": ["en"], "tags": ["punctuation"], "license": "mit", "datasets": ["yelp_polarity"], "metrics": ["f1"]}, "description": "\n# \u2728 bert-restore-punctuation\n[![forthebadge](https://forthebadge.com/images/badges/gluten-free.svg)]()\n\nThis is a bert-base-uncased model fine-tuned for punctuation restoration on [Yelp Reviews](https://www.tensorflow.org/datasets/catalog/yelp_polarity_reviews). \n\nThe model predicts the punctuation and upper-casing of plain, lower-cased text. An example use case is ASR output, or other cases where text has lost its punctuation.\n\nThis model is intended for direct use as a punctuation restoration model for the general English language. Alternatively, you can use this for further fine-tuning on domain-specific texts for punctuation restoration tasks.\n\nThe model restores the following punctuation marks -- **[! ? . , - : ; ' ]**\n\nThe model also restores the upper-casing of words.\n\n"} {"downloads": 232660, "id": "dslim/bert-large-NER", "likes": 38, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"language": "en", "datasets": ["conll2003"], "license": "mit"}, "description": "\n# bert-large-NER\n\n## Model description\n\n**bert-large-NER** is a fine-tuned BERT model that is ready to use for **Named Entity Recognition** and achieves **state-of-the-art performance** for the NER task. It has been trained to recognize four types of entities: location (LOC), organizations (ORG), person (PER) and Miscellaneous (MISC). \n\nSpecifically, this model is a *bert-large-cased* model that was fine-tuned on the English version of the standard [CoNLL-2003 Named Entity Recognition](https://www.aclweb.org/anthology/W03-0419.pdf) dataset. \n\nIf you'd like to use a smaller BERT model fine-tuned on the same dataset, a [**bert-base-NER**](https://huggingface.co/dslim/bert-base-NER/) version is also available. \n\n\n## Intended uses & limitations\n\n#### How to use\n\nYou can use this model with Transformers *pipeline* for NER.\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForTokenClassification\nfrom transformers import pipeline\n\ntokenizer = AutoTokenizer.from_pretrained(\"dslim/bert-large-NER\")\nmodel = AutoModelForTokenClassification.from_pretrained(\"dslim/bert-large-NER\")\n\nnlp = pipeline(\"ner\", model=model, tokenizer=tokenizer)\nexample = \"My name is Wolfgang and I live in Berlin\"\n\nner_results = nlp(example)\nprint(ner_results)\n```\n\n#### Limitations and bias\n\nThis model is limited by its training dataset of entity-annotated news articles from a specific span of time. This may not generalize well for all use cases in different domains. Furthermore, the model occasionally tags subword tokens as entities and post-processing of results may be necessary to handle those cases. 
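\n\nA minimal post-processing sketch for that limitation, assuming the usual WordPiece convention that continuation pieces carry the \"##\" prefix: either drop such fragments, or let the pipeline merge them with an aggregation strategy.\n\n```python\nfrom transformers import pipeline\n\nnlp = pipeline(\"ner\", model=\"dslim/bert-large-NER\")\nresults = nlp(\"My name is Wolfgang and I live in Berlin\")\n\n# drop entities predicted on bare subword fragments such as \"##gang\"\nclean = [r for r in results if not r[\"word\"].startswith(\"##\")]\n\n# alternatively, aggregation_strategy=\"first\" merges fragments into whole words\nnlp_merged = pipeline(\"ner\", model=\"dslim/bert-large-NER\", aggregation_strategy=\"first\")\nprint(nlp_merged(\"My name is Wolfgang and I live in Berlin\"))\n```\n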
\n\n## Training data\n\nThis model was fine-tuned on the English version of the standard [CoNLL-2003 Named Entity Recognition](https://www.aclweb.org/anthology/W03-0419.pdf) dataset. \n\nThe training dataset distinguishes between the beginning and continuation of an entity so that if there are back-to-back entities of the same type, the model can output where the second entity begins. As in the dataset, each token will be classified as one of the following classes:\n\nAbbreviation|Description\n-|-\nO|Outside of a named entity\nB-MISC |Beginning of a miscellaneous entity right after another miscellaneous entity\nI-MISC | Miscellaneous entity\nB-PER |Beginning of a person\u2019s name right after another person\u2019s name\nI-PER |Person\u2019s name\nB-ORG |Beginning of an organization right after another organization\nI-ORG |Organization\nB-LOC |Beginning of a location right after another location\nI-LOC |Location\n\n\n### CoNLL-2003 English Dataset Statistics\nThis dataset was derived from the Reuters corpus which consists of Reuters news stories. You can read more about how this dataset was created in the CoNLL-2003 paper. \n#### # of training examples per entity type\nDataset|LOC|MISC|ORG|PER\n-|-|-|-|-\nTrain|7140|3438|6321|6600\nDev|1837|922|1341|1842\nTest|1668|702|1661|1617\n#### # of articles/sentences/tokens per dataset\nDataset |Articles |Sentences |Tokens\n-|-|-|-\nTrain |946 |14,987 |203,621\nDev |216 |3,466 |51,362\nTest |231 |3,684 |46,435\n\n## Training procedure\n\nThis model was trained on a single NVIDIA V100 GPU with recommended hyperparameters from the [original BERT paper](https://arxiv.org/pdf/1810.04805) which trained & evaluated the model on the CoNLL-2003 NER task. \n\n## Eval results\nmetric|dev|test\n-|-|-\nf1 |95.7 |91.7\nprecision |95.3 |91.2\nrecall |96.1 |92.3\n\nThe test metrics are a little lower than the official Google BERT results which encoded document context & experimented with CRF. More on replicating the original results [here](https://github.com/google-research/bert/issues/223).\n\n### BibTeX entry and citation info\n\n```\n@article{DBLP:journals/corr/abs-1810-04805,\n author = {Jacob Devlin and\n Ming{-}Wei Chang and\n Kenton Lee and\n Kristina Toutanova},\n title = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language\n Understanding},\n journal = {CoRR},\n volume = {abs/1810.04805},\n year = {2018},\n url = {http://arxiv.org/abs/1810.04805},\n archivePrefix = {arXiv},\n eprint = {1810.04805},\n timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},\n biburl = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},\n bibsource = {dblp computer science bibliography, https://dblp.org}\n}\n```\n```\n@inproceedings{tjong-kim-sang-de-meulder-2003-introduction,\n title = \"Introduction to the {C}o{NLL}-2003 Shared Task: Language-Independent Named Entity Recognition\",\n author = \"Tjong Kim Sang, Erik F. and\n De Meulder, Fien\",\n booktitle = \"Proceedings of the Seventh Conference on Natural Language Learning at {HLT}-{NAACL} 2003\",\n year = \"2003\",\n url = \"https://www.aclweb.org/anthology/W03-0419\",\n pages = \"142--147\",\n}\n```\n"} {"downloads": 15910, "id": "d4data/biomedical-ner-all", "likes": 38, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"license": "apache-2.0", "language": ["en"], "tags": ["Token Classification"], "co2_eq_emissions": 0.0279399890043426, "widget": [{"text": "CASE: A 28-year-old previously healthy man presented with a 6-week history of palpitations. 
The symptoms occurred during rest, 2\u20133 times per week, lasted up to 30 minutes at a time and were associated with dyspnea. Except for a grade 2/6 holosystolic tricuspid regurgitation murmur (best heard at the left sternal border with inspiratory accentuation), physical examination yielded unremarkable findings.", "example_title": "example 1"}, {"text": "A 63-year-old woman with no known cardiac history presented with a sudden onset of dyspnea requiring intubation and ventilatory support out of hospital. She denied preceding symptoms of chest discomfort, palpitations, syncope or infection. The patient was afebrile and normotensive, with a sinus tachycardia of 140 beats/min.", "example_title": "example 2"}, {"text": "A 48 year-old female presented with vaginal bleeding and abnormal Pap smears. Upon diagnosis of invasive non-keratinizing SCC of the cervix, she underwent a radical hysterectomy with salpingo-oophorectomy which demonstrated positive spread to the pelvic lymph nodes and the parametrium. Pathological examination revealed that the tumour also extensively involved the lower uterine segment.", "example_title": "example 3"}]}, "description": "\n\n## About the Model\nAn English Named Entity Recognition model, trained on Maccrobat to recognize the bio-medical entities (107 entities) from a given text corpus (case reports etc.). This model was built on top of distilbert-base-uncased.\n\n- Dataset: Maccrobat https://figshare.com/articles/dataset/MACCROBAT2018/9764942\n- Carbon emission: 0.0279399890043426 Kg\n- Training time: 30.16527 minutes\n- GPU used: 1 x GeForce RTX 3060 Laptop GPU\n\nCheck out the tutorial video for an explanation of this model and the corresponding Python library: https://youtu.be/xpiDPdBpS18\n\n## Usage\nThe easiest way is to use the Inference API from Hugging Face; the second method is the pipeline object offered by the transformers library.\n```python\nfrom transformers import pipeline\nfrom transformers import AutoTokenizer, AutoModelForTokenClassification\n\ntokenizer = AutoTokenizer.from_pretrained(\"d4data/biomedical-ner-all\")\nmodel = AutoModelForTokenClassification.from_pretrained(\"d4data/biomedical-ner-all\")\n\npipe = pipeline(\"ner\", model=model, tokenizer=tokenizer, aggregation_strategy=\"simple\") # pass device=0 if using gpu\npipe(\"\"\"The patient reported no recurrence of palpitations at follow-up 6 months after the ablation.\"\"\")\n```\n\n## Author\nThis model is part of the Research topic \"AI in Biomedical field\" conducted by Deepak John Reji, Shaina Raza. If you use this work (code, model or dataset), please star the repository at:\n> https://github.com/dreji18/Bio-Epidemiology-NER"} {"downloads": 194860, "id": "Jean-Baptiste/roberta-large-ner-english", "likes": 37, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"language": "en", "datasets": ["conll2003"], "widget": [{"text": "My name is jean-baptiste and I live in montreal"}, {"text": "My name is clara and I live in berkeley, california."}, {"text": "My name is wolfgang and I live in berlin"}], "train-eval-index": [{"config": "conll2003", "task": "token-classification", "task_id": "entity_extraction", "splits": {"eval_split": "validation"}, "col_mapping": {"tokens": "tokens", "ner_tags": "tags"}}], "license": "mit"}, "description": "\n\n# roberta-large-ner-english: model fine-tuned from roberta-large for NER task\n\n## Introduction\n\n[roberta-large-ner-english] is an English NER model that was fine-tuned from roberta-large on the conll2003 dataset. 
\nModel was validated on emails/chat data and outperformed other models on this type of data specifically. \nIn particular, the model seems to work better on entities that don't start with an upper case.\n\n\n## Training data\n\nTraining data was classified as follows:\n\nAbbreviation|Description\n-|-\nO |Outside of a named entity\nMISC |Miscellaneous entity\nPER |Person\u2019s name\nORG |Organization\nLOC |Location\n\nIn order to simplify, the prefix B- or I- from the original conll2003 was removed.\nI used the train and test dataset from the original conll2003 for training and the \"validation\" dataset for validation. This resulted in a dataset of size:\n\nTrain | Validation \n-|-\n17494 | 3250\n\n## How to use roberta-large-ner-english with HuggingFace\n\n##### Load roberta-large-ner-english and its sub-word tokenizer:\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForTokenClassification\n\ntokenizer = AutoTokenizer.from_pretrained(\"Jean-Baptiste/roberta-large-ner-english\")\nmodel = AutoModelForTokenClassification.from_pretrained(\"Jean-Baptiste/roberta-large-ner-english\")\n\n\n##### Process text sample (from wikipedia)\n\nfrom transformers import pipeline\n\nnlp = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy=\"simple\")\nnlp(\"Apple was founded in 1976 by Steve Jobs, Steve Wozniak and Ronald Wayne to develop and sell Wozniak's Apple I personal computer\")\n\n\n[{'entity_group': 'ORG',\n 'score': 0.99381506,\n 'word': ' Apple',\n 'start': 0,\n 'end': 5},\n {'entity_group': 'PER',\n 'score': 0.99970853,\n 'word': ' Steve Jobs',\n 'start': 29,\n 'end': 39},\n {'entity_group': 'PER',\n 'score': 0.99981767,\n 'word': ' Steve Wozniak',\n 'start': 41,\n 'end': 54},\n {'entity_group': 'PER',\n 'score': 0.99956465,\n 'word': ' Ronald Wayne',\n 'start': 59,\n 'end': 71},\n {'entity_group': 'PER',\n 'score': 0.9997918,\n 'word': ' Wozniak',\n 'start': 92,\n 'end': 99},\n {'entity_group': 'MISC',\n 'score': 0.99956393,\n 'word': ' Apple I',\n 'start': 102,\n 'end': 109}]\n```\n\n\n## Model performances \n\nModel performances computed on the conll2003 validation dataset (computed on the token predictions)\n\nentity|precision|recall|f1\n-|-|-|-\nPER|0.9914|0.9927|0.9920 \nORG|0.9627|0.9661|0.9644\nLOC|0.9795|0.9862|0.9828\nMISC|0.9292|0.9262|0.9277\nOverall|0.9740|0.9766|0.9753\n\n\nOn private dataset (email, chat, informal discussion), computed on word predictions:\n\nentity|precision|recall|f1\n-|-|-|-\nPER|0.8823|0.9116|0.8967\nORG|0.7694|0.7292|0.7487\nLOC|0.8619|0.7768|0.8171\n\nBy comparison on the same private dataset, Spacy (en_core_web_trf-3.2.0) gave:\n\nentity|precision|recall|f1\n-|-|-|-\nPER|0.9146|0.8287|0.8695\nORG|0.7655|0.6437|0.6993\nLOC|0.8727|0.6180|0.7236\n\n\n\nFor those who could be interested, here is a short article on how I used the results of this model to train an LSTM model for signature detection in emails:\nhttps://medium.com/@jean-baptiste.polle/lstm-model-for-email-signature-detection-8e990384fefa\n"} {"downloads": 23760, "id": "StanfordAIMI/stanford-deidentifier-base", "likes": 37, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"widget": [{"text": "PROCEDURE: Chest xray. COMPARISON: last seen on 1/1/2020 and also record dated of March 1st, 2019. FINDINGS: patchy airspace opacities. IMPRESSION: The results of the chest xray of January 1 2020 are the most concerning ones. The patient was transmitted to another service of UH Medical Center under the responsability of Dr. Perez. 
We used the system MedClinical data transmitter and sent the data on 2/1/2020, under the ID 5874233. We received the confirmation of Dr Perez. He is reachable at 567-493-1234."}, {"text": "Dr. Curt Langlotz chose to schedule a meeting on 06/23."}], "tags": ["token-classification", "sequence-tagger-model", "pytorch", "transformers", "pubmedbert", "uncased", "radiology", "biomedical"], "datasets": ["radreports"], "language": ["en"], "license": "mit"}, "description": "\nStanford de-identifier was trained on a variety of radiology and biomedical documents with the goal of automating the de-identification process while reaching satisfactory accuracy for use in production. Manuscript in proceedings. \n\nThese model weights are the recommended ones among all available deidentifier weights.\n\nAssociated GitHub repo: https://github.com/MIDRC/Stanford_Penn_Deidentifier\n\n## Citation\n\n```bibtex\n@article{10.1093/jamia/ocac219,\n author = {Chambon, Pierre J and Wu, Christopher and Steinkamp, Jackson M and Adleberg, Jason and Cook, Tessa S and Langlotz, Curtis P},\n title = \"{Automated deidentification of radiology reports combining transformer and \u201chide in plain sight\u201d rule-based methods}\",\n journal = {Journal of the American Medical Informatics Association},\n year = {2022},\n month = {11},\n abstract = \"{To develop an automated deidentification pipeline for radiology reports that detect protected health information (PHI) entities and replaces them with realistic surrogates \u201chiding in plain sight.\u201d In this retrospective study, 999 chest X-ray and CT reports collected between November 2019 and November 2020 were annotated for PHI at the token level and combined with 3001 X-rays and 2193 medical notes previously labeled, forming a large multi-institutional and cross-domain dataset of 6193 documents. Two radiology test sets, from a known and a new institution, as well as i2b2 2006 and 2014 test sets, served as an evaluation set to estimate model performance and to compare it with previously released deidentification tools. Several PHI detection models were developed based on different training datasets, fine-tuning approaches and data augmentation techniques, and a synthetic PHI generation algorithm. These models were compared using metrics such as precision, recall and F1 score, as well as paired samples Wilcoxon tests. Our best PHI detection model achieves 97.9 F1 score on radiology reports from a known institution, 99.6 from a new institution, 99.5 on i2b2 2006, and 98.9 on i2b2 2014. On reports from a known institution, it achieves 99.1 recall of detecting the core of each PHI span. Our model outperforms all deidentifiers it was compared to on all test sets as well as human labelers on i2b2 2014 data. 
It enables accurate and automatic deidentification of radiology reports. A transformer-based deidentification pipeline can achieve state-of-the-art performance for deidentifying radiology reports and other medical documents.}\",\n issn = {1527-974X},\n doi = {10.1093/jamia/ocac219},\n url = {https://doi.org/10.1093/jamia/ocac219},\n note = {ocac219},\n eprint = {https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocac219/47220191/ocac219.pdf},\n}\n```"} {"downloads": 1006532, "id": "ckiplab/bert-base-chinese-ner", "likes": 34, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"language": ["zh"], "thumbnail": "https://ckip.iis.sinica.edu.tw/files/ckip_logo.png", "tags": ["pytorch", "token-classification", "bert", "zh"], "license": "gpl-3.0"}, "description": "\n\n# CKIP BERT Base Chinese\n\nThis project provides traditional Chinese transformers models (including ALBERT, BERT, GPT2) and NLP tools (including word segmentation, part-of-speech tagging, named entity recognition).\n\n## Homepage\n\n- https://github.com/ckiplab/ckip-transformers\n\n## Contributors\n\n- [Mu Yang](https://muyang.pro) at [CKIP](https://ckip.iis.sinica.edu.tw) (Author & Maintainer)\n\n## Usage\n\nPlease use BertTokenizerFast as tokenizer instead of AutoTokenizer.\n\n```python\nfrom transformers import (\n BertTokenizerFast,\n AutoModel,\n)\n\ntokenizer = BertTokenizerFast.from_pretrained('bert-base-chinese')\nmodel = AutoModel.from_pretrained('ckiplab/bert-base-chinese-ner')\n```\n\nFor full usage and more information, please refer to https://github.com/ckiplab/ckip-transformers.\n"} {"downloads": 13794, "id": "ml6team/keyphrase-extraction-kbir-inspec", "likes": 33, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"language": "en", "license": "mit", "tags": ["keyphrase-extraction"], "datasets": ["midas/inspec"], "metrics": ["seqeval"], "widget": [{"text": "Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a document. Thanks to these keyphrases humans can understand the content of a text very quickly and easily without reading it completely. Keyphrase extraction was first done primarily by human annotators, who read the text in detail and then wrote down the most important keyphrases. The disadvantage is that if you work with a lot of documents, this process can take a lot of time.\nHere is where Artificial Intelligence comes in. Currently, classical machine learning methods, that use statistical and linguistic features, are widely used for the extraction process. Now with deep learning, it is possible to capture the semantic meaning of a text even better than these classical methods. 
Classical methods look at the frequency, occurrence and order of words in the text, whereas these neural approaches can capture long-term semantic dependencies and context of words in a text.", "example_title": "Example 1"}, {"text": "In this work, we explore how to learn task specific language models aimed towards learning rich representation of keyphrases from text documents. We experiment with different masking strategies for pre-training transformer language models (LMs) in discriminative as well as generative settings. In the discriminative setting, we introduce a new pre-training objective - Keyphrase Boundary Infilling with Replacement (KBIR), showing large gains in performance (up to 9.26 points in F1) over SOTA, when LM pre-trained using KBIR is fine-tuned for the task of keyphrase extraction. In the generative setting, we introduce a new pre-training setup for BART - KeyBART, that reproduces the keyphrases related to the input text in the CatSeq format, instead of the denoised original input. This also led to gains in performance (up to 4.33 points in F1@M) over SOTA for keyphrase generation. Additionally, we also fine-tune the pre-trained language models on named entity recognition (NER), question answering (QA), relation extraction (RE), abstractive summarization and achieve comparable performance with that of the SOTA, showing that learning rich representation of keyphrases is indeed beneficial for many other fundamental NLP tasks.", "example_title": "Example 2"}], "model-index": [{"name": "DeDeckerThomas/keyphrase-extraction-kbir-inspec", "results": [{"task": {"type": "keyphrase-extraction", "name": "Keyphrase Extraction"}, "dataset": {"type": "midas/inspec", "name": "inspec"}, "metrics": [{"type": "F1 (Seqeval)", "value": 0.588, "name": "F1 (Seqeval)"}, {"type": "F1@M", "value": 0.564, "name": "F1@M"}]}]}]}, "description": "\n# \ud83d\udd11 Keyphrase Extraction Model: KBIR-inspec\nKeyphrase extraction is a technique in text analysis where you extract the important keyphrases from a document. Thanks to these keyphrases humans can understand the content of a text very quickly and easily without reading it completely. Keyphrase extraction was first done primarily by human annotators, who read the text in detail and then wrote down the most important keyphrases. The disadvantage is that if you work with a lot of documents, this process can take a lot of time \u23f3. \n\nHere is where Artificial Intelligence \ud83e\udd16 comes in. Currently, classical machine learning methods that use statistical and linguistic features are widely used for the extraction process. Now with deep learning, it is possible to capture the semantic meaning of a text even better than these classical methods. Classical methods look at the frequency, occurrence and order of words in the text, whereas these neural approaches can capture long-term semantic dependencies and context of words in a text.\n\n\n\n## \ud83d\udcd3 Model Description\nThis model uses [KBIR](https://huggingface.co/bloomberg/KBIR) as its base model and fine-tunes it on the [Inspec dataset](https://huggingface.co/datasets/midas/inspec). 
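\n\nA minimal usage sketch, assuming the standard transformers token-classification pipeline (aggregation_strategy=\"simple\" merges the B/I keyphrase tags into spans):\n\n```python\nfrom transformers import pipeline\n\nextractor = pipeline(\"token-classification\", model=\"ml6team/keyphrase-extraction-kbir-inspec\", aggregation_strategy=\"simple\")\nresults = extractor(\"Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a document.\")\n\n# deduplicate the surface forms of the predicted keyphrase spans\nkeyphrases = sorted({r[\"word\"].strip() for r in results})\nprint(keyphrases)\n```\n\n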
KBIR or Keyphrase Boundary Infilling with Replacement is a pre-trained model which utilizes a multi-task learning setup for optimizing a combined loss of Masked Language Modeling (MLM), Keyphrase Boundary Infilling (KBI) and Keyphrase Replacement Classification (KRC).\nYou can find more information about the architecture in this [paper](https://arxiv.org/abs/2112.08547).\n\nKeyphrase extraction models are transformer models fine-tuned as a token classification problem where each word in the document is classified as being part of a keyphrase or not.\n\n| Label | Description |\n| "} {"downloads": 29746, "id": "vblagoje/bert-english-uncased-finetuned-pos", "likes": 27, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {}, "description": "Entry not found"} {"downloads": 2772, "id": "deprem-ml/deprem-ner", "likes": 26, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"license": "apache-2.0", "language": ["tr"], "pipeline_tag": "token-classification", "widget": [{"text": "L\u00fctfen yard\u0131m Akevler mahallesi R\u00fczgar sokak Tuncay apartman\u0131 zemin kat Antakya akrabalar\u0131m g\u00f6\u00e7\u00fck alt\u0131nda #hatay #Afad", "example_title": "\u00d6rnek"}]}, "description": "\n## deprem-ner\n\nWith this model, we tried to extract information such as street, province, and district from the reports of people trapped under rubble in the earthquake. \n\nExample inputs:\n- \"L\u00fctfen yard\u0131m Akevler mahallesi R\u00fczgar sokak Tuncay apartman\u0131 zemin kat Antakya akrabalar\u0131m g\u00f6\u00e7\u00fck alt\u0131nda #hatay #Afad\"\n- \"MARA\u0218A'ta arkada\u015fimizdan haber alam\u0131yoruz ACIL yard\u0131m Penta Park konutlar\u0131 1. Blok en \u00fcst kat 11. Kat \\n\\n@AFADBaskanlik #kahramanmara\u015f\\nAC\u0130L\"\n\n\n```python\nfrom transformers import pipeline\n\nner_pipe = pipeline(\"token-classification\", \"deprem-ml/deprem-ner\")\npredictions = ner_pipe(\"L\u00fctfen yard\u0131m Akevler mahallesi R\u00fczgar sokak Tuncay apartman\u0131 zemin kat Antakya akrabalar\u0131m g\u00f6\u00e7\u00fck alt\u0131nda #hatay #Afad\")\n\n```\nExample output:\n\n\n\n```\n[\n {\n \"entity_group\": \"mahalle\",\n \"score\": 0.8160411715507507,\n \"word\": \"Akevler mahallesi\",\n \"start\": 14,\n \"end\": 31\n },\n {\n \"entity_group\": \"sokak\",\n \"score\": 0.940501868724823,\n \"word\": \"R\u00fczgar sokak\",\n \"start\": 32,\n \"end\": 44\n },\n {\n \"entity_group\": \"Apartman/Site\",\n \"score\": 0.8081040978431702,\n \"word\": \"Tuncay apartman\u0131\",\n \"start\": 45,\n \"end\": 61\n },\n {\n \"entity_group\": \"ilce\",\n \"score\": 0.854024350643158,\n \"word\": \"Antakya\",\n \"start\": 72,\n \"end\": 79\n }\n]\n```\n### Evaluation\nWe compared this model with other models on the Hugging Face Hub; you can find the results for 30 sample inputs in [this repository](https://huggingface.co/datasets/deprem-ml/butun_model_benchmarklari)."} {"downloads": 35415, "id": "flair/ner-english-large", "likes": 25, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"tags": ["flair", "token-classification", "sequence-tagger-model"], "language": "en", "datasets": ["conll2003"], "widget": [{"text": "George Washington went to Washington"}]}, "description": "\n\n## English NER in Flair (large model)\n\nThis is the large 4-class NER model for English that ships with [Flair](https://github.com/flairNLP/flair/).\n\nF1-Score: **94.36** 
(corrected CoNLL-03)\n\nPredicts 4 tags:\n\n| **tag** | **meaning** |\n|"} {"downloads": 5677, "id": "Babelscape/wikineural-multilingual-ner", "likes": 17, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"annotations_creators": ["machine-generated"], "language_creators": ["machine-generated"], "widget": [{"text": "My name is Wolfgang and I live in Berlin."}, {"text": "George Washington went to Washington."}, {"text": "Mi nombre es Sarah y vivo en Londres."}, {"text": "\u041c\u0435\u043d\u044f \u0437\u043e\u0432\u0443\u0442 \u0421\u0438\u043c\u043e\u043d\u0430, \u0438 \u044f \u0436\u0438\u0432\u0443 \u0432 \u0420\u0438\u043c\u0435."}], "tags": ["named-entity-recognition", "sequence-tagger-model"], "datasets": ["Babelscape/wikineural"], "language": ["de", "en", "es", "fr", "it", "nl", "pl", "pt", "ru", "multilingual"], "license": ["cc-by-nc-sa-4.0"], "pretty_name": "wikineural-dataset", "source_datasets": ["original"], "task_categories": ["structure-prediction"], "task_ids": ["named-entity-recognition"]}, "description": "\n\n# WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER\nThis is the model card for the EMNLP 2021 paper [WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER](https://aclanthology.org/2021.findings-emnlp.215/). We fine-tuned a multilingual language model (mBERT) for 3 epochs on our [WikiNEuRal dataset](https://huggingface.co/datasets/Babelscape/wikineural) for Named Entity Recognition (NER). The resulting multilingual NER model supports the 9 languages covered by WikiNEuRal (de, en, es, fr, it, nl, pl, pt, ru), and it was trained on all 9 languages jointly.\n\n**If you use the model, please reference this work in your paper**:\n\n```bibtex\n@inproceedings{tedeschi-etal-2021-wikineural-combined,\n title = \"{W}iki{NE}u{R}al: {C}ombined Neural and Knowledge-based Silver Data Creation for Multilingual {NER}\",\n author = \"Tedeschi, Simone and\n Maiorca, Valentino and\n Campolungo, Niccol{\\`o} and\n Cecconi, Francesco and\n Navigli, Roberto\",\n booktitle = \"Findings of the Association for Computational Linguistics: EMNLP 2021\",\n month = nov,\n year = \"2021\",\n address = \"Punta Cana, Dominican Republic\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://aclanthology.org/2021.findings-emnlp.215\",\n pages = \"2521--2533\",\n abstract = \"Multilingual Named Entity Recognition (NER) is a key intermediate task which is needed in many areas of NLP. In this paper, we address the well-known issue of data scarcity in NER, especially relevant when moving to a multilingual scenario, and go beyond current approaches to the creation of multilingual silver data for the task. We exploit the texts of Wikipedia and introduce a new methodology based on the effective combination of knowledge-based approaches and neural models, together with a novel domain adaptation technique, to produce high-quality training corpora for NER. We evaluate our datasets extensively on standard benchmarks for NER, yielding substantial improvements up to 6 span-based F1-score points over previous state-of-the-art systems for data creation.\",\n}\n```\n \nThe original repository for the paper can be found at [https://github.com/Babelscape/wikineural](https://github.com/Babelscape/wikineural).\n\n## How to use\n\nYou can use this model with Transformers *pipeline* for NER. 
\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForTokenClassification\nfrom transformers import pipeline\n\ntokenizer = AutoTokenizer.from_pretrained(\"Babelscape/wikineural-multilingual-ner\")\nmodel = AutoModelForTokenClassification.from_pretrained(\"Babelscape/wikineural-multilingual-ner\")\n\nnlp = pipeline(\"ner\", model=model, tokenizer=tokenizer)\nexample = \"My name is Wolfgang and I live in Berlin\"\n\nner_results = nlp(example)\nprint(ner_results)\n```\n\n## Limitations and bias\n\nThis model is trained on WikiNEuRal, a state-of-the-art dataset for Multilingual NER automatically derived from Wikipedia. Therefore, it might not generalize well to all textual genres (e.g. news). On the other hand, models trained only on news articles (e.g. only on CoNLL03) have been proven to obtain much lower scores on encyclopedic articles. To obtain more robust systems, we encourage you to train a system on the combination of WikiNEuRal with other datasets (e.g. WikiNEuRal + CoNLL).\n\n## Licensing Information\n\nContents of this repository are restricted to only non-commercial research purposes under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/). Copyright of the dataset contents and models belongs to the original copyright holders."} {"downloads": 182470, "id": "dbmdz/bert-large-cased-finetuned-conll03-english", "likes": 16, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {}, "description": "Entry not found"} {"downloads": 21176, "id": "Jean-Baptiste/camembert-ner-with-dates", "likes": 16, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"language": "fr", "datasets": ["Jean-Baptiste/wikiner_fr"], "widget": [{"text": "Je m'appelle jean-baptiste et j'habite \u00e0 montr\u00e9al depuis fevr 2012"}]}, "description": "\n\n# camembert-ner: model fine-tuned from camemBERT for NER task (including DATE tag).\n\n## Introduction\n\n[camembert-ner-with-dates] is an extension of the French camembert-ner model with an additional tag for dates.\nModel was trained on an enriched version of the wikiner-fr dataset (~170 634 sentences).\n\nOn my test data (mix of chat and email), this model got an f1 score of ~83% (in comparison dateparser was ~70%).\nThe dateparser library can still be used on the output of this model in order to convert text to a Python datetime object \n(https://dateparser.readthedocs.io/en/latest/).\n\n\n## How to use camembert-ner-with-dates with HuggingFace\n\n##### Load camembert-ner-with-dates and its sub-word tokenizer:\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForTokenClassification\n\ntokenizer = AutoTokenizer.from_pretrained(\"Jean-Baptiste/camembert-ner-with-dates\")\nmodel = AutoModelForTokenClassification.from_pretrained(\"Jean-Baptiste/camembert-ner-with-dates\")\n\n\n##### Process text sample (from wikipedia)\n\nfrom transformers import pipeline\n\nnlp = pipeline('ner', model=model, tokenizer=tokenizer, aggregation_strategy=\"simple\")\nnlp(\"Apple est cr\u00e9\u00e9e le 1er avril 1976 dans le garage de la maison d'enfance de Steve Jobs \u00e0 Los Altos en Californie par Steve Jobs, Steve Wozniak et Ronald Wayne14, puis constitu\u00e9e sous forme de soci\u00e9t\u00e9 le 3 janvier 1977 \u00e0 l'origine sous le nom d'Apple Computer, mais pour ses 30 ans et pour refl\u00e9ter la diversification de ses produits, le mot \u00ab computer \u00bb est retir\u00e9 le 9 janvier 
2015.\")\n\n\n[{'entity_group': 'ORG',\n 'score': 0.9776379466056824,\n 'word': 'Apple',\n 'start': 0,\n 'end': 5},\n {'entity_group': 'DATE',\n 'score': 0.9793774570737567,\n 'word': 'le 1er avril 1976 dans le',\n 'start': 15,\n 'end': 41},\n {'entity_group': 'PER',\n 'score': 0.9958226680755615,\n 'word': 'Steve Jobs',\n 'start': 74,\n 'end': 85},\n {'entity_group': 'LOC',\n 'score': 0.995087186495463,\n 'word': 'Los Altos',\n 'start': 87,\n 'end': 97},\n {'entity_group': 'LOC',\n 'score': 0.9953305125236511,\n 'word': 'Californie',\n 'start': 100,\n 'end': 111},\n {'entity_group': 'PER',\n 'score': 0.9961076378822327,\n 'word': 'Steve Jobs',\n 'start': 115,\n 'end': 126},\n {'entity_group': 'PER',\n 'score': 0.9960325956344604,\n 'word': 'Steve Wozniak',\n 'start': 127,\n 'end': 141},\n {'entity_group': 'PER',\n 'score': 0.9957776467005411,\n 'word': 'Ronald Wayne',\n 'start': 144,\n 'end': 157},\n {'entity_group': 'DATE',\n 'score': 0.994030773639679,\n 'word': 'le 3 janvier 1977 \u00e0',\n 'start': 198,\n 'end': 218},\n {'entity_group': 'ORG',\n 'score': 0.9720810294151306,\n 'word': \"d'Apple Computer\",\n 'start': 240,\n 'end': 257},\n {'entity_group': 'DATE',\n 'score': 0.9924157659212748,\n 'word': '30 ans et',\n 'start': 272,\n 'end': 282},\n {'entity_group': 'DATE',\n 'score': 0.9934852868318558,\n 'word': 'le 9 janvier 2015.',\n 'start': 363,\n 'end': 382}]\n\n```\n\n\n## Model performances (metric: seqeval)\n\nGlobal\n```\n'precision': 0.928\n'recall': 0.928\n'f1': 0.928\n```\n\nBy entity\n```\nLabel LOC: (precision:0.929, recall:0.932, f1:0.931, support:9510)\nLabel PER: (precision:0.952, recall:0.965, f1:0.959, support:9399)\nLabel MISC: (precision:0.878, recall:0.844, f1:0.860, support:5364)\nLabel ORG: (precision:0.848, recall:0.883, f1:0.865, support:2299)\nLabel DATE: Not relevant because of method used to add date tag on wikiner dataset (estimated f1 ~90%)\n\n\n ```\n\n"} {"downloads": 4146, "id": "yanekyuk/bert-uncased-keyword-extractor", "likes": 16, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"license": "apache-2.0", "tags": ["generated_from_trainer"], "metrics": ["precision", "recall", "accuracy", "f1"], "language": ["en"], "widget": [{"text": "Broadcom agreed to acquire cloud computing company VMware in a $61 billion (\u20ac57bn) cash-and stock deal, massively diversifying the chipmaker\u2019s business and almost tripling its software-related revenue to about 45% of its total sales. By the numbers: VMware shareholders will receive either $142.50 in cash or 0.2520 of a Broadcom share for each VMware stock. Broadcom will also assume $8 billion of VMware's net debt."}, {"text": "Canadian Natural Resources Minister Jonathan Wilkinson told Bloomberg that the country could start supplying Europe with liquefied natural gas (LNG) in as soon as three years by converting an existing LNG import facility on Canada\u2019s Atlantic coast into an export terminal. Bottom line: Wilkinson said what Canada cares about is that the new LNG facility uses a low-emission process for the gas and is capable of transitioning to exporting hydrogen later on."}, {"text": "Google is being investigated by the UK\u2019s antitrust watchdog for its dominance in the \"ad tech stack,\" the set of services that facilitate the sale of online advertising space between advertisers and sellers. Google has strong positions at various levels of the ad tech stack and charges fees to both publishers and advertisers. 
A step back: UK Competition and Markets Authority has also been investigating whether Google and Meta colluded over ads, probing into the advertising agreement between the two companies, codenamed Jedi Blue."}, {"text": "Shares in Twitter closed 6.35% up after an SEC 13D filing revealed that Elon Musk pledged to put up an additional $6.25 billion of his own wealth to fund the $44 billion takeover deal, lifting the total to $33.5 billion from an initial $27.25 billion. In other news: Former Twitter CEO Jack Dorsey announced he's stepping down, but would stay on Twitter\u2019s board \\\u201cuntil his term expires at the 2022 meeting of stockholders.\""}], "model-index": [{"name": "bert-uncased-keyword-extractor", "results": []}]}, "description": "\n\n\n\n# bert-uncased-keyword-extractor\n\nThis model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on an unknown dataset.\nIt achieves the following results on the evaluation set:\n- Loss: 0.1247\n- Precision: 0.8547\n- Recall: 0.8825\n- Accuracy: 0.9741\n- F1: 0.8684\n\n## Model description\n\nMore information needed\n\n## Intended uses & limitations\n\nMore information needed\n\n## Training and evaluation data\n\nMore information needed\n\n## Training procedure\n\n### Training hyperparameters\n\nThe following hyperparameters were used during training:\n- learning_rate: 2e-05\n- train_batch_size: 16\n- eval_batch_size: 16\n- seed: 42\n- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08\n- lr_scheduler_type: linear\n- num_epochs: 8\n- mixed_precision_training: Native AMP\n\n### Training results\n\n| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | Accuracy | F1 |\n|:"} {"downloads": 15114, "id": "cmarkea/distilcamembert-base-ner", "likes": 15, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"language": "fr", "license": "mit", "datasets": ["Jean-Baptiste/wikiner_fr"], "widget": [{"text": "Boulanger, habitant \u00e0 Boulanger et travaillant dans le magasin Boulanger situ\u00e9 dans la ville de Boulanger. Boulanger a \u00e9crit le livre \u00e9ponyme Boulanger \u00e9dit\u00e9 par la maison d'\u00e9dition Boulanger."}, {"text": "Quentin Jerome Tarantino na\u00eet le 27 mars 1963 \u00e0 Knoxville, dans le Tennessee. Il est le fils de Connie McHugh, une infirmi\u00e8re, n\u00e9e le 3 septembre 1946, et de Tony Tarantino, acteur et musicien amateur n\u00e9 \u00e0 New York. Ce dernier est d'origine italienne par son p\u00e8re ; sa m\u00e8re a des ascendances irlandaises et cherokees. Il est pr\u00e9nomm\u00e9 d'apr\u00e8s Quint Asper, le personnage jou\u00e9 par Burt Reynolds dans la s\u00e9rie Gunsmoke et Quentin Compson, personnage du roman Le Bruit et la Fureur. Son p\u00e8re quitte le domicile familial avant m\u00eame sa naissance. En 1965, sa m\u00e8re d\u00e9m\u00e9nage \u00e0 Torrance, dans la banlieue sud de Los Angeles, et se remarie avec Curtis Zastoupil, un pianiste de bar, qui lui fait d\u00e9couvrir le cin\u00e9ma. Le couple divorce alors que le jeune Quentin a une dizaine d'ann\u00e9es."}]}, "description": "\nDistilCamemBERT-NER\n===================\n\nWe present DistilCamemBERT-NER, which is [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base) fine-tuned for the NER (Named Entity Recognition) task for the French language. The work is inspired by [Jean-Baptiste/camembert-ner](https://huggingface.co/Jean-Baptiste/camembert-ner) based on the [CamemBERT](https://huggingface.co/camembert-base) model. 
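\n\nA minimal usage sketch, assuming the standard transformers pipeline API (aggregation merges sub-word pieces into entity spans):\n\n```python\nfrom transformers import pipeline\n\nner = pipeline(\"token-classification\", model=\"cmarkea/distilcamembert-base-ner\", aggregation_strategy=\"simple\")\nprint(ner(\"Quentin Tarantino est n\u00e9 \u00e0 Knoxville, dans le Tennessee.\"))\n```\n\n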
The problem with CamemBERT-based models appears at scale, for example in the production phase, where inference cost can become a technological issue. To counteract this effect, we propose this model, which **divides the inference time by two** at the same power consumption, thanks to [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base).\n\nDataset\n"} {"downloads": 3122, "id": "elastic/distilbert-base-uncased-finetuned-conll03-english", "likes": 15, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"language": "en", "license": "apache-2.0", "datasets": ["conll2003"], "model-index": [{"name": "elastic/distilbert-base-uncased-finetuned-conll03-english", "results": [{"task": {"type": "token-classification", "name": "Token Classification"}, "dataset": {"name": "conll2003", "type": "conll2003", "config": "conll2003", "split": "validation"}, "metrics": [{"type": "accuracy", "value": 0.9854480753649896, "name": "Accuracy", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZmM0NzNhYTM2NGU0YjMwZDMwYTdhYjY3MDgwMTYxNWRjYzQ1NmE0OGEwOTcxMGY5ZTU1ZTQ3OTM5OGZkYjE2NCIsInZlcnNpb24iOjF9.v8Mk62C40vRWQ78BSCtGyphKKHd6q-Ir6sVbSjNjG37j9oiuQN3CDmk9XItmjvCwyKwMEr2NqUXaSyIfUSpBDg"}, {"type": "precision", "value": 0.9880928983228512, "name": "Precision", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMWIzYTg2OTFjY2FkNWY4MzUyN2ZjOGFlYWNhODYzODVhYjQwZTQ3YzdhMzMxY2I4N2U0YWI1YWVlYjIxMDdkNCIsInZlcnNpb24iOjF9.A50vF5qWgZjxABjL9tc0vssFxYHYhBQ__hLXcvuoZoK8c2TyuODHcM0LqGLeRJF8kcPaLx1hcNk3QMdOETVQBA"}, {"type": "recall", "value": 0.9895677847945542, "name": "Recall", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYzBiZDg1YmM2NzFkNjQ3MzUzN2QzZDAwNzUwMmM3MzU1ODBlZWJjYmI1YzIxM2YxMzMzNDUxYjkyYzQzMDQ3ZSIsInZlcnNpb24iOjF9.aZEC0c93WWn3YoPkjhe2W1-OND9U2qWzesL9zioNuhstbj7ftANERs9dUAaJIlNCb7NS28q3x9c2s6wGLwovCw"}, {"type": "f1", "value": 0.9888297915932504, "name": "F1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYmNkNzVhODJjMjExOTg4ZjQwMWM4NGIxZGNiZTZlMDk5MzNmMjIwM2ZiNzdiZGIxYmNmNmJjMGVkYTlkN2FlNiIsInZlcnNpb24iOjF9.b6qmLHkHu-z5V1wC2yQMyIcdeReptK7iycIMyGOchVy6WyG4flNbxa5f2W05INdnJwX-PHavB_yaY0oULdKWDQ"}, {"type": "loss", "value": 0.06707527488470078, "name": "loss", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNDRlMWE2OTQxNWI5MjY0NzJjNjJkYjg1OWE1MjE2MjI4N2YzOWFhMDI3OTE0ZmFhM2M0ZWU0NTUxNTBiYjhiZiIsInZlcnNpb24iOjF9.6JhhyfhXxi76GRLUNqekU_SRVsV-9Hwpm2iOD_OJusPZTIrEUCmLdIWtb9abVNWNzMNOmA4TkRLqLVca0o0HAw"}]}]}]}, "description": "\n\n[DistilBERT base uncased](https://huggingface.co/distilbert-base-uncased), fine-tuned for NER using the [conll03 english dataset](https://huggingface.co/datasets/conll2003). Note that this model is **not** sensitive to capital letters \u2014 \"english\" is the same as \"English\". 
For the case sensitive version, please use [elastic/distilbert-base-cased-finetuned-conll03-english](https://huggingface.co/elastic/distilbert-base-cased-finetuned-conll03-english).\n\n## Versions\n\n- Transformers version: 4.3.1\n- Datasets version: 1.3.0\n\n## Training\n\n```\n$ run_ner.py \\\n --model_name_or_path distilbert-base-uncased \\\n --label_all_tokens True \\\n --return_entity_level_metrics True \\\n --dataset_name conll2003 \\\n --output_dir /tmp/distilbert-base-uncased-finetuned-conll03-english \\\n --do_train \\\n --do_eval\n```\n\nAfter training, we update the labels to match the NER-specific labels from the\ndataset [conll2003](https://raw.githubusercontent.com/huggingface/datasets/1.3.0/datasets/conll2003/dataset_infos.json)\n"} {"downloads": 2509, "id": "jplu/tf-xlm-r-ner-40-lang", "likes": 15, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"language": ["multilingual", "af", "ar", "bg", "bn", "de", "el", "en", "es", "et", "eu", "fa", "fi", "fr", "he", "hi", "hu", "id", "it", "ja", "jv", "ka", "kk", "ko", "ml", "mr", "ms", "my", "nl", "pt", "ru", "sw", "ta", "te", "th", "tl", "tr", "ur", "vi", "yo", "zh"], "language_bcp47": ["fa-IR"]}, "description": "\n\n# XLM-R + NER\n\nThis model is a fine-tuned [XLM-Roberta-base](https://arxiv.org/abs/1911.02116) over the 40 languages proposed in [XTREME](https://github.com/google-research/xtreme) from [Wikiann](https://aclweb.org/anthology/P17-1178). This is still ongoing work, and the results will be updated every time an improvement is reached. \n\nThe covered labels are:\n```\nLOC\nORG\nPER\nO\n```\n\n## Metrics on evaluation set:\n### Average over the 40 languages\nNumber of documents: 262300\n```\n precision recall f1-score support\n\n ORG 0.81 0.81 0.81 102452\n PER 0.90 0.91 0.91 108978\n LOC 0.86 0.89 0.87 121868\n\nmicro avg 0.86 0.87 0.87 333298\nmacro avg 0.86 0.87 0.87 333298\n```\n\n### Afrikaans\nNumber of documents: 1000\n```\n precision recall f1-score support\n\n ORG 0.89 0.88 0.88 582\n PER 0.89 0.97 0.93 369\n LOC 0.84 0.90 0.86 518\n\nmicro avg 0.87 0.91 0.89 1469\nmacro avg 0.87 0.91 0.89 1469\n``` \n\n### Arabic\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n ORG 0.83 0.84 0.84 3507\n PER 0.90 0.91 0.91 3643\n LOC 0.88 0.89 0.88 3604\n\nmicro avg 0.87 0.88 0.88 10754\nmacro avg 0.87 0.88 0.88 10754\n```\n\n### Basque\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n LOC 0.88 0.93 0.91 5228\n ORG 0.86 0.81 0.83 3654\n PER 0.91 0.91 0.91 4072\n\nmicro avg 0.89 0.89 0.89 12954\nmacro avg 0.89 0.89 0.89 12954\n```\n\n### Bengali\nNumber of documents: 1000\n```\n precision recall f1-score support\n\n ORG 0.86 0.89 0.87 325\n LOC 0.91 0.91 0.91 406\n PER 0.96 0.95 0.95 364\n\nmicro avg 0.91 0.92 0.91 1095\nmacro avg 0.91 0.92 0.91 1095\n```\n\n### Bulgarian\nNumber of documents: 1000\n```\n precision recall f1-score support\n\n ORG 0.86 0.83 0.84 3661\n PER 0.92 0.95 0.94 4006\n LOC 0.92 0.95 0.94 6449\n\nmicro avg 0.91 0.92 0.91 14116\nmacro avg 0.91 0.92 0.91 14116\n```\n\n### Burmese\nNumber of documents: 100\n```\n precision recall f1-score support\n\n LOC 0.60 0.86 0.71 37\n ORG 0.68 0.63 0.66 30\n PER 0.44 0.44 0.44 36\n\nmicro avg 0.57 0.65 0.61 103\nmacro avg 0.57 0.65 0.60 103\n```\n\n### Chinese\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n ORG 0.70 0.69 0.70 4022\n LOC 0.76 0.81 0.78 3830\n PER 0.84 0.84 0.84 3706\n\nmicro avg 0.76 0.78 0.77 11558\nmacro avg 0.76 0.78 0.77 11558\n```\n\n### 
Dutch\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n ORG 0.87 0.87 0.87 3930\n PER 0.95 0.95 0.95 4377\n LOC 0.91 0.92 0.91 4813\n\nmicro avg 0.91 0.92 0.91 13120\nmacro avg 0.91 0.92 0.91 13120\n```\n\n### English\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n LOC 0.83 0.84 0.84 4781\n PER 0.89 0.90 0.89 4559\n ORG 0.75 0.75 0.75 4633\n\nmicro avg 0.82 0.83 0.83 13973\nmacro avg 0.82 0.83 0.83 13973\n```\n\n### Estonian\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n LOC 0.89 0.92 0.91 5654\n ORG 0.85 0.85 0.85 3878\n PER 0.94 0.94 0.94 4026\n\nmicro avg 0.90 0.91 0.90 13558\nmacro avg 0.90 0.91 0.90 13558\n```\n\n### Finnish\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n ORG 0.84 0.83 0.84 4104\n LOC 0.88 0.90 0.89 5307\n PER 0.95 0.94 0.94 4519\n\nmicro avg 0.89 0.89 0.89 13930\nmacro avg 0.89 0.89 0.89 13930\n```\n\n### French\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n LOC 0.90 0.89 0.89 4808\n ORG 0.84 0.87 0.85 3876\n PER 0.94 0.93 0.94 4249\n\nmicro avg 0.89 0.90 0.90 12933\nmacro avg 0.89 0.90 0.90 12933\n```\n\n### Georgian\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n PER 0.90 0.91 0.90 3964\n ORG 0.83 0.77 0.80 3757\n LOC 0.82 0.88 0.85 4894\n\nmicro avg 0.84 0.86 0.85 12615\nmacro avg 0.84 0.86 0.85 12615\n```\n\n### German\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n LOC 0.85 0.90 0.87 4939\n PER 0.94 0.91 0.92 4452\n ORG 0.79 0.78 0.79 4247\n\nmicro avg 0.86 0.86 0.86 13638\nmacro avg 0.86 0.86 0.86 13638\n```\n\n### Greek\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n ORG 0.86 0.85 0.85 3771\n LOC 0.88 0.91 0.90 4436\n PER 0.91 0.93 0.92 3894\n\nmicro avg 0.88 0.90 0.89 12101\nmacro avg 0.88 0.90 0.89 12101\n```\n\n### Hebrew\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n PER 0.87 0.88 0.87 4206\n ORG 0.76 0.75 0.76 4190\n LOC 0.85 0.85 0.85 4538\n\nmicro avg 0.83 0.83 0.83 12934\nmacro avg 0.82 0.83 0.83 12934\n```\n\n### Hindi\nNumber of documents: 1000\n```\n precision recall f1-score support\n\n ORG 0.78 0.81 0.79 362\n LOC 0.83 0.85 0.84 422\n PER 0.90 0.95 0.92 427\n\nmicro avg 0.84 0.87 0.85 1211\nmacro avg 0.84 0.87 0.85 1211\n```\n\n### Hungarian\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n PER 0.95 0.95 0.95 4347\n ORG 0.87 0.88 0.87 3988\n LOC 0.90 0.92 0.91 5544\n\nmicro avg 0.91 0.92 0.91 13879\nmacro avg 0.91 0.92 0.91 13879\n```\n\n### Indonesian\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n ORG 0.88 0.89 0.88 3735\n LOC 0.93 0.95 0.94 3694\n PER 0.93 0.93 0.93 3947\n\nmicro avg 0.91 0.92 0.92 11376\nmacro avg 0.91 0.92 0.92 11376\n```\n\n### Italian\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n LOC 0.88 0.88 0.88 4592\n ORG 0.86 0.86 0.86 4088\n PER 0.96 0.96 0.96 4732\n\nmicro avg 0.90 0.90 0.90 13412\nmacro avg 0.90 0.90 0.90 13412\n```\n\n### Japanese\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n ORG 0.62 0.61 0.62 4184\n PER 0.76 0.81 0.78 3812\n LOC 0.68 0.74 0.71 4281\n\nmicro avg 0.69 0.72 0.70 12277\nmacro avg 0.69 0.72 0.70 12277\n```\n\n### Javanese\nNumber of documents: 100\n```\n precision recall f1-score support\n\n ORG 0.79 0.80 0.80 46\n PER 0.81 0.96 0.88 26\n LOC 0.75 0.75 0.75 40\n\nmicro avg 0.78 0.82 0.80 112\nmacro avg 0.78 0.82 0.80 112\n```\n\n### Kazakh\nNumber of documents: 1000\n```\n 
precision recall f1-score support\n\n ORG 0.76 0.61 0.68 307\n LOC 0.78 0.90 0.84 461\n PER 0.87 0.91 0.89 367\n\nmicro avg 0.81 0.83 0.82 1135\nmacro avg 0.81 0.83 0.81 1135\n```\n\n### Korean\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n LOC 0.86 0.89 0.88 5097\n ORG 0.79 0.74 0.77 4218\n PER 0.83 0.86 0.84 4014\n\nmicro avg 0.83 0.83 0.83 13329\nmacro avg 0.83 0.83 0.83 13329\n```\n\n### Malay\nNumber of documents: 1000\n```\n precision recall f1-score support\n\n ORG 0.87 0.89 0.88 368\n PER 0.92 0.91 0.91 366\n LOC 0.94 0.95 0.95 354\n\nmicro avg 0.91 0.92 0.91 1088\nmacro avg 0.91 0.92 0.91 1088\n```\n\n### Malayalam\nNumber of documents: 1000\n```\n precision recall f1-score support\n\n ORG 0.75 0.74 0.75 347\n PER 0.84 0.89 0.86 417\n LOC 0.74 0.75 0.75 391\n\nmicro avg 0.78 0.80 0.79 1155\nmacro avg 0.78 0.80 0.79 1155\n```\n\n### Marathi\nNumber of documents: 1000\n```\n precision recall f1-score support\n\n PER 0.89 0.94 0.92 394\n LOC 0.82 0.84 0.83 457\n ORG 0.84 0.78 0.81 339\n\nmicro avg 0.85 0.86 0.85 1190\nmacro avg 0.85 0.86 0.85 1190\n```\n\n### Persian\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n PER 0.93 0.92 0.93 3540\n LOC 0.93 0.93 0.93 3584\n ORG 0.89 0.92 0.90 3370\n\nmicro avg 0.92 0.92 0.92 10494\nmacro avg 0.92 0.92 0.92 10494\n```\n\n### Portuguese\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n LOC 0.90 0.91 0.91 4819\n PER 0.94 0.92 0.93 4184\n ORG 0.84 0.88 0.86 3670\n\nmicro avg 0.89 0.91 0.90 12673\nmacro avg 0.90 0.91 0.90 12673\n```\n\n### Russian\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n PER 0.93 0.96 0.95 3574\n LOC 0.87 0.89 0.88 4619\n ORG 0.82 0.80 0.81 3858\n\nmicro avg 0.87 0.88 0.88 12051\nmacro avg 0.87 0.88 0.88 12051\n```\n\n### Spanish\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n PER 0.95 0.93 0.94 3891\n ORG 0.86 0.88 0.87 3709\n LOC 0.89 0.91 0.90 4553\n\nmicro avg 0.90 0.91 0.90 12153\nmacro avg 0.90 0.91 0.90 12153\n```\n\n### Swahili\nNumber of documents: 1000\n```\n precision recall f1-score support\n\n ORG 0.82 0.85 0.83 349\n PER 0.95 0.92 0.94 403\n LOC 0.86 0.89 0.88 450\n\nmicro avg 0.88 0.89 0.88 1202\nmacro avg 0.88 0.89 0.88 1202\n```\n\n### Tagalog\nNumber of documents: 1000\n```\n precision recall f1-score support\n\n LOC 0.90 0.91 0.90 338\n ORG 0.83 0.91 0.87 339\n PER 0.96 0.93 0.95 350\n\nmicro avg 0.90 0.92 0.91 1027\nmacro avg 0.90 0.92 0.91 1027\n```\n\n### Tamil\nNumber of documents: 1000\n```\n precision recall f1-score support\n\n PER 0.90 0.92 0.91 392\n ORG 0.77 0.76 0.76 370\n LOC 0.78 0.81 0.79 421\n\nmicro avg 0.82 0.83 0.82 1183\nmacro avg 0.82 0.83 0.82 1183\n```\n\n### Telugu\nNumber of documents: 1000\n```\n precision recall f1-score support\n\n ORG 0.67 0.55 0.61 347\n LOC 0.78 0.87 0.82 453\n PER 0.73 0.86 0.79 393\n\nmicro avg 0.74 0.77 0.76 1193\nmacro avg 0.73 0.77 0.75 1193\n```\n\n### Thai\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n LOC 0.63 0.76 0.69 3928\n PER 0.78 0.83 0.80 6537\n ORG 0.59 0.59 0.59 4257\n\nmicro avg 0.68 0.74 0.71 14722\nmacro avg 0.68 0.74 0.71 14722\n```\n\n### Turkish\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n PER 0.94 0.94 0.94 4337\n ORG 0.88 0.89 0.88 4094\n LOC 0.90 0.92 0.91 4929\n\nmicro avg 0.90 0.92 0.91 13360\nmacro avg 0.91 0.92 0.91 13360\n```\n\n### Urdu\nNumber of documents: 1000\n```\n precision recall f1-score support\n\n LOC 0.90 0.95 0.93 352\n PER 0.96 0.96 0.96 
333\n ORG 0.91 0.90 0.90 326\n\nmicro avg 0.92 0.94 0.93 1011\nmacro avg 0.92 0.94 0.93 1011\n```\n\n### Vietnamese\nNumber of documents: 10000\n```\n precision recall f1-score support\n\n ORG 0.86 0.87 0.86 3579\n LOC 0.88 0.91 0.90 3811\n PER 0.92 0.93 0.93 3717\n\nmicro avg 0.89 0.90 0.90 11107\nmacro avg 0.89 0.90 0.90 11107\n```\n\n### Yoruba\nNumber of documents: 100\n```\n precision recall f1-score support\n\n LOC 0.54 0.72 0.62 36\n ORG 0.58 0.31 0.41 35\n PER 0.77 1.00 0.87 36\n\nmicro avg 0.64 0.68 0.66 107\nmacro avg 0.63 0.68 0.63 107\n```\n\n## Reproduce the results\nDownload and prepare the dataset from the [XTREME repo](https://github.com/google-research/xtreme#download-the-data). Next, from the root of the transformers repo run:\n```\ncd examples/ner\npython run_tf_ner.py \\\n--data_dir . \\\n--labels ./labels.txt \\\n--model_name_or_path jplu/tf-xlm-roberta-base \\\n--output_dir model \\\n--max_seq_length 128 \\\n--num_train_epochs 2 \\\n--per_gpu_train_batch_size 16 \\\n--per_gpu_eval_batch_size 32 \\\n--do_train \\\n--do_eval \\\n--logging_dir logs \\\n--mode token-classification \\\n--evaluate_during_training \\\n--optimizer_name adamw\n```\n\n## Usage with pipelines\n```python\nfrom transformers import pipeline\n\nnlp_ner = pipeline(\n \"ner\",\n model=\"jplu/tf-xlm-r-ner-40-lang\",\n tokenizer=(\n 'jplu/tf-xlm-r-ner-40-lang', \n {\"use_fast\": True}),\n framework=\"tf\"\n)\n\ntext_fr = \"Barack Obama est n\u00e9 \u00e0 Hawa\u00ef.\"\ntext_en = \"Barack Obama was born in Hawaii.\"\ntext_es = \"Barack Obama naci\u00f3 en Hawai.\"\ntext_zh = \"\u5df4\u62c9\u514b\u00b7\u5967\u5df4\u99ac\uff08Barack Obama\uff09\u51fa\u751f\u65bc\u590f\u5a01\u5937\u3002\"\ntext_ar = \"\u0648\u0644\u062f \u0628\u0627\u0631\u0627\u0643 \u0623\u0648\u0628\u0627\u0645\u0627 \u0641\u064a \u0647\u0627\u0648\u0627\u064a.\"\n\nnlp_ner(text_fr)\n#Output: [{'word': '\u2581Barack', 'score': 0.9894659519195557, 'entity': 'PER'}, {'word': '\u2581Obama', 'score': 0.9888848662376404, 'entity': 'PER'}, {'word': '\u2581Hawa', 'score': 0.998701810836792, 'entity': 'LOC'}, {'word': '\u00ef', 'score': 0.9987035989761353, 'entity': 'LOC'}]\nnlp_ner(text_en)\n#Output: [{'word': '\u2581Barack', 'score': 0.9929141998291016, 'entity': 'PER'}, {'word': '\u2581Obama', 'score': 0.9930834174156189, 'entity': 'PER'}, {'word': '\u2581Hawaii', 'score': 0.9986202120780945, 'entity': 'LOC'}]\nnlp_ner(text_es)\n#Output: [{'word': '\u2581Barack', 'score': 0.9944776296615601, 'entity': 'PER'}, {'word': '\u2581Obama', 'score': 0.9949177503585815, 'entity': 'PER'}, {'word': '\u2581Hawa', 'score': 0.9987911581993103, 'entity': 'LOC'}, {'word': 'i', 'score': 0.9984861612319946, 'entity': 'LOC'}]\nnlp_ner(text_zh)\n#Output: [{'word': '\u590f\u5a01\u5937', 'score': 0.9988449215888977, 'entity': 'LOC'}]\nnlp_ner(text_ar)\n#Output: [{'word': '\u2581\u0628\u0627', 'score': 0.9903655648231506, 'entity': 'PER'}, {'word': '\u0631\u0627\u0643', 'score': 0.9850614666938782, 'entity': 'PER'}, {'word': '\u2581\u0623\u0648\u0628\u0627\u0645\u0627', 'score': 0.9850308299064636, 'entity': 'PER'}, {'word': '\u2581\u0647\u0627', 'score': 0.9477543234825134, 'entity': 'LOC'}, {'word': '\u0648\u0627', 'score': 0.9428229928016663, 'entity': 'LOC'}, {'word': '\u064a', 'score': 0.9319471716880798, 'entity': 'LOC'}]\n\n```\n"} {"downloads": 1060, "id": "spacy/en_core_web_sm", "likes": 15, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"tags": ["spacy", "token-classification"], "language": ["en"], "license": "mit", 
"model-index": [{"name": "en_core_web_sm", "results": [{"task": {"name": "NER", "type": "token-classification"}, "metrics": [{"name": "NER Precision", "type": "precision", "value": 0.8454836771}, {"name": "NER Recall", "type": "recall", "value": 0.8456530449}, {"name": "NER F Score", "type": "f_score", "value": 0.8455683525}]}, {"task": {"name": "TAG", "type": "token-classification"}, "metrics": [{"name": "TAG (XPOS) Accuracy", "type": "accuracy", "value": 0.97246532}]}, {"task": {"name": "UNLABELED_DEPENDENCIES", "type": "token-classification"}, "metrics": [{"name": "Unlabeled Attachment Score (UAS)", "type": "f_score", "value": 0.9175304332}]}, {"task": {"name": "LABELED_DEPENDENCIES", "type": "token-classification"}, "metrics": [{"name": "Labeled Attachment Score (LAS)", "type": "f_score", "value": 0.89874821}]}, {"task": {"name": "SENTS", "type": "token-classification"}, "metrics": [{"name": "Sentences F-Score", "type": "f_score", "value": 0.9059485531}]}]}]}, "description": "\n### Details: https://spacy.io/models/en#en_core_web_sm\n\nEnglish pipeline optimized for CPU. Components: tok2vec, tagger, parser, senter, ner, attribute_ruler, lemmatizer.\n\n| Feature | Description |\n| "} {"downloads": 85424, "id": "flair/ner-english", "likes": 14, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"tags": ["flair", "token-classification", "sequence-tagger-model"], "language": "en", "datasets": ["conll2003"], "widget": [{"text": "George Washington went to Washington"}]}, "description": "\n\n## English NER in Flair (default model)\n\nThis is the standard 4-class NER model for English that ships with [Flair](https://github.com/flairNLP/flair/).\n\nF1-Score: **93,06** (corrected CoNLL-03)\n\nPredicts 4 tags:\n\n| **tag** | **meaning** |\n|"} {"downloads": 11452, "id": "samrawal/bert-base-uncased_clinical-ner", "likes": 14, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {}, "description": "A Named Entity Recognition model for clinical entities (`problem`, `treatment`, `test`)\n\nThe model has been trained on the [i2b2 (now n2c2) dataset](https://n2c2.dbmi.hms.harvard.edu) for the 2010 - Relations task. Please visit the n2c2 site to request access to the dataset."} {"downloads": 8282784, "id": "Davlan/distilbert-base-multilingual-cased-ner-hrl", "likes": 13, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"language": ["ar", "de", "en", "es", "fr", "it", "lv", "nl", "pt", "zh", "multilingual"]}, "description": "\n# distilbert-base-multilingual-cased-ner-hrl\n## Model description\n**distilbert-base-multilingual-cased-ner-hrl** is a **Named Entity Recognition** model for 10 high resourced languages (Arabic, German, English, Spanish, French, Italian, Latvian, Dutch, Portuguese and Chinese) based on a fine-tuned Distiled BERT base model. It has been trained to recognize three types of entities: location (LOC), organizations (ORG), and person (PER). 
\nSpecifically, this model is a *distilbert-base-multilingual-cased* model that was fine-tuned on an aggregation of 10 high-resourced languages\n## Intended uses & limitations\n#### How to use\nYou can use this model with Transformers *pipeline* for NER.\n```python\nfrom transformers import AutoTokenizer, AutoModelForTokenClassification\nfrom transformers import pipeline\ntokenizer = AutoTokenizer.from_pretrained(\"Davlan/distilbert-base-multilingual-cased-ner-hrl\")\nmodel = AutoModelForTokenClassification.from_pretrained(\"Davlan/distilbert-base-multilingual-cased-ner-hrl\")\nnlp = pipeline(\"ner\", model=model, tokenizer=tokenizer)\nexample = \"Nader Jokhadar had given Syria the lead with a well-struck header in the seventh minute.\"\nner_results = nlp(example)\nprint(ner_results)\n```\n#### Limitations and bias\nThis model is limited by its training dataset of entity-annotated news articles from a specific span of time. This may not generalize well for all use cases in different domains. \n## Training data\nThe training data for the 10 languages are from: \n\nLanguage|Dataset\n-|-\nArabic | [ANERcorp](https://camel.abudhabi.nyu.edu/anercorp/)\nGerman | [conll 2003](https://www.clips.uantwerpen.be/conll2003/ner/)\nEnglish | [conll 2003](https://www.clips.uantwerpen.be/conll2003/ner/)\nSpanish | [conll 2002](https://www.clips.uantwerpen.be/conll2002/ner/)\nFrench | [Europeana Newspapers](https://github.com/EuropeanaNewspapers/ner-corpora/tree/master/enp_FR.bnf.bio)\nItalian | [Italian I-CAB](https://ontotext.fbk.eu/icab.html)\nLatvian | [Latvian NER](https://github.com/LUMII-AILab/FullStack/tree/master/NamedEntities)\nDutch | [conll 2002](https://www.clips.uantwerpen.be/conll2002/ner/)\nPortuguese |[Paramopama + Second Harem](https://github.com/davidsbatista/NER-datasets/tree/master/Portuguese)\nChinese | [MSRA](https://huggingface.co/datasets/msra_ner)\n\nThe training dataset distinguishes between the beginning and continuation of an entity so that if there are back-to-back entities of the same type, the model can output where the second entity begins. 
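To make the role of the B-/I- prefixes concrete, here is a minimal, hedged sketch (the example sentence and the expected output described in the comment are illustrative assumptions, not verified predictions): grouped-entity decoding relies on a fresh B- tag to know that a second, adjacent entity has started.

```python
from transformers import pipeline

# aggregation_strategy="simple" groups a B- token together with the I- tokens
# that follow it, so back-to-back entities of the same type remain separate.
nlp = pipeline(
    "ner",
    model="Davlan/distilbert-base-multilingual-cased-ner-hrl",
    aggregation_strategy="simple",
)

# Expect two separate PER spans ("Angela Merkel", "Emmanuel Macron") and one
# LOC span ("Paris"); exact scores will vary.
print(nlp("Angela Merkel met Emmanuel Macron in Paris."))
```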
As in the dataset, each token will be classified as one of the following classes:\nAbbreviation|Description\n-|-\nO|Outside of a named entity\nB-PER |Beginning of a person\u2019s name right after another person\u2019s name\nI-PER |Person\u2019s name\nB-ORG |Beginning of an organisation right after another organisation\nI-ORG |Organisation\nB-LOC |Beginning of a location right after another location\nI-LOC |Location\n## Training procedure\nThis model was trained on an NVIDIA V100 GPU with recommended hyperparameters from HuggingFace code.\n\n\n"} {"downloads": 51698, "id": "flair/ner-english-ontonotes-fast", "likes": 13, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"tags": ["flair", "token-classification", "sequence-tagger-model"], "language": "en", "datasets": ["ontonotes"], "widget": [{"text": "On September 1st George Washington won 1 dollar."}]}, "description": "\n\n## English NER in Flair (Ontonotes fast model)\n\nThis is the fast version of the 18-class NER model for English that ships with [Flair](https://github.com/flairNLP/flair/).\n\nF1-Score: **89.3** (Ontonotes)\n\nPredicts 18 tags:\n\n| **tag** | **meaning** |\n|"} {"downloads": 20919, "id": "mrm8488/bert-spanish-cased-finetuned-ner", "likes": 13, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"language": "es", "thumbnail": "https://i.imgur.com/jgBdimh.png"}, "description": "\n\n# Spanish BERT (BETO) + NER\n\nThis model is a version of the cased Spanish BERT [(BETO)](https://github.com/dccuchile/beto) fine-tuned on [NER-C](https://www.kaggle.com/nltkdata/conll-corpora) for the **NER** downstream task.\n\n## Details of the downstream task (NER) - Dataset\n\n- [Dataset: CONLL Corpora ES](https://www.kaggle.com/nltkdata/conll-corpora) \n\nI preprocessed the dataset and split it as train / dev (80/20)\n\n| Dataset | # Examples |\n| "} {"downloads": 20705, "id": "flair/ner-german-large", "likes": 13, "pipeline_tag": "token-classification", "task": "token-classification", "meta": {"tags": ["flair", "token-classification", "sequence-tagger-model"], "language": "de", "datasets": ["conll2003"], "widget": [{"text": "George Washington ging nach Washington"}]}, "description": "\n\n## German NER in Flair (large model)\n\nThis is the large 4-class NER model for German that ships with [Flair](https://github.com/flairNLP/flair/).\n\nF1-Score: **92,31** (CoNLL-03 German revised)\n\nPredicts 4 tags:\n\n| **tag** | **meaning** |\n|"} {"downloads": 244699, "id": "google/flan-t5-xxl", "likes": 492, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"language": ["en", "fr", "ro", "de", "multilingual"], "widget": [{"text": "Translate to German: My name is Arthur", "example_title": "Translation"}, {"text": "Please answer to the following question. Who is going to be the next Ballon d'or?", "example_title": "Question Answering"}, {"text": "Q: Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering.", "example_title": "Logical reasoning"}, {"text": "Please answer the following question. What is the boiling point of Nitrogen?", "example_title": "Scientific knowledge"}, {"text": "Answer the following yes/no question. Can you write a whole Haiku in a single tweet?", "example_title": "Yes/no question"}, {"text": "Answer the following yes/no question by reasoning step-by-step. Can you write a whole Haiku in a single tweet?", "example_title": "Reasoning task"}, {"text": "Q: ( False or not False or False ) is? 
A: Let's think step by step", "example_title": "Boolean Expressions"}, {"text": "The square root of x is the cube root of y. What is y to the power of 2, if x = 4?", "example_title": "Math reasoning"}, {"text": "Premise: At my age you will probably have learnt one lesson. Hypothesis: It's not certain how many lessons you'll learn by your thirties. Does the premise entail the hypothesis?", "example_title": "Premise and hypothesis"}], "tags": ["text2text-generation"], "datasets": ["svakulenk0/qrecc", "taskmaster2", "djaym7/wiki_dialog", "deepmind/code_contests", "lambada", "gsm8k", "aqua_rat", "esnli", "quasc", "qed"], "license": "apache-2.0"}, "description": "\n\n# Model Card for FLAN-T5 XXL\n\n![model image](https://s3.amazonaws.com/moonup/production/uploads/1666363435475-62441d1d9fdefb55a0b7d12c.png)\n\n# Table of Contents\n\n0. [TL;DR](#TL;DR)\n1. [Model Details](#model-details)\n2. [Usage](#usage)\n3. [Uses](#uses)\n4. [Bias, Risks, and Limitations](#bias-risks-and-limitations)\n5. [Training Details](#training-details)\n6. [Evaluation](#evaluation)\n7. [Environmental Impact](#environmental-impact)\n8. [Citation](#citation)\n\n# TL;DR\n\nIf you already know T5, FLAN-T5 is just better at everything. For the same number of parameters, these models have been fine-tuned on more than 1000 additional tasks, also covering more languages. \nAs mentioned in the first few lines of the abstract:\n> Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.\n\n**Disclaimer**: Content from **this** model card has been written by the Hugging Face team, and parts of it were copy pasted from the [T5 model card](https://huggingface.co/t5-large).\n\n# Model Details\n\n## Model Description\n\n\n- **Model type:** Language model\n- **Language(s) (NLP):** English, German, French\n- **License:** Apache 2.0\n- **Related Models:** [All FLAN-T5 Checkpoints](https://huggingface.co/models?search=flan-t5)\n- **Original Checkpoints:** [All Original FLAN-T5 Checkpoints](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints)\n- **Resources for more information:**\n - [Research paper](https://arxiv.org/pdf/2210.11416.pdf)\n - [GitHub Repo](https://github.com/google-research/t5x)\n - [Hugging Face FLAN-T5 Docs (Similar to T5) ](https://huggingface.co/docs/transformers/model_doc/t5)\n\n# Usage\n\nFind below some example scripts on how to use the model in `transformers`:\n\n## Using the Pytorch model\n\n### Running the model on a CPU\n\n
\n\n```python\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-xxl\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-xxl\")\n\ninput_text = \"translate English to German: How old are you?\"\ninput_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids\n\noutputs = model.generate(input_ids)\nprint(tokenizer.decode(outputs[0]))\n```\n\n
\n\n### Running the model on a GPU\n\n
\n\n```python\n# pip install accelerate\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-xxl\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-xxl\", device_map=\"auto\")\n\ninput_text = \"translate English to German: How old are you?\"\ninput_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids.to(\"cuda\")\n\noutputs = model.generate(input_ids)\nprint(tokenizer.decode(outputs[0]))\n```\n\n
\n\n### Running the model on a GPU using different precisions\n\n#### FP16\n\n
\n\n```python\n# pip install accelerate\nimport torch\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-xxl\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-xxl\", device_map=\"auto\", torch_dtype=torch.float16)\n\ninput_text = \"translate English to German: How old are you?\"\ninput_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids.to(\"cuda\")\n\noutputs = model.generate(input_ids)\nprint(tokenizer.decode(outputs[0]))\n```\n\n
\n\n#### INT8\n\n
\n\n```python\n# pip install bitsandbytes accelerate\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-xxl\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-xxl\", device_map=\"auto\", load_in_8bit=True)\n\ninput_text = \"translate English to German: How old are you?\"\ninput_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids.to(\"cuda\")\n\noutputs = model.generate(input_ids)\nprint(tokenizer.decode(outputs[0]))\n```\n\n
\n\n# Uses\n\n## Direct Use and Downstream Use\n\nThe authors write in [the original paper's model card](https://arxiv.org/pdf/2210.11416.pdf) that: \n\n> The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning, and question answering; advancing fairness and safety research, and understanding limitations of current large language models\n\nSee the [research paper](https://arxiv.org/pdf/2210.11416.pdf) for further details.\n\n## Out-of-Scope Use\n\nMore information needed.\n\n# Bias, Risks, and Limitations\n\nThe information in this section is copied from the model's [official model card](https://arxiv.org/pdf/2210.11416.pdf):\n\n> Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application.\n\n## Ethical considerations and risks\n\n> Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result, the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data.\n\n## Known Limitations\n\n> Flan-T5 has not been tested in real world applications.\n\n## Sensitive Use:\n\n> Flan-T5 should not be applied for any unacceptable use cases, e.g., generation of abusive speech.\n\n# Training Details\n\n## Training Data\n\nThe model was trained on a mixture of tasks that includes those described in the table below (from the original paper, figure 2):\n\n![table.png](https://s3.amazonaws.com/moonup/production/uploads/1666363265279-62441d1d9fdefb55a0b7d12c.png)\n\n\n## Training Procedure\n\nAccording to the model card from the [original paper](https://arxiv.org/pdf/2210.11416.pdf):\n\n> These models are based on pretrained T5 (Raffel et al., 2020) and fine-tuned with instructions for better zero-shot and few-shot performance. There is one fine-tuned Flan model per T5 model size.\n\nThe model has been trained on TPU v3 or TPU v4 pods, using the [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).\n\n\n# Evaluation\n\n## Testing Data, Factors & Metrics\n\nThe authors evaluated the model on various tasks covering several languages (1,836 tasks in total). See the table below for some quantitative evaluation:\n![image.png](https://s3.amazonaws.com/moonup/production/uploads/1668072995230-62441d1d9fdefb55a0b7d12c.png)\nFor full details, please check the [research paper](https://arxiv.org/pdf/2210.11416.pdf).\n\n## Results \n\nFor full results for FLAN-T5-XXL, see the [research paper](https://arxiv.org/pdf/2210.11416.pdf), Table 3.\n\n# Environmental Impact\n\nCarbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. 
(2019)](https://arxiv.org/abs/1910.09700).\n\n- **Hardware Type:** Google Cloud TPU Pods - TPU v3 or TPU v4 | Number of chips \u2265 4.\n- **Hours used:** More information needed\n- **Cloud Provider:** GCP\n- **Compute Region:** More information needed\n- **Carbon Emitted:** More information needed\n\n# Citation\n\n**BibTeX:**\n\n```bibtex\n@misc{https://doi.org/10.48550/arxiv.2210.11416,\n doi = {10.48550/ARXIV.2210.11416},\n \n url = {https://arxiv.org/abs/2210.11416},\n \n author = {Chung, Hyung Won and Hou, Le and Longpre, Shayne and Zoph, Barret and Tay, Yi and Fedus, William and Li, Eric and Wang, Xuezhi and Dehghani, Mostafa and Brahma, Siddhartha and Webson, Albert and Gu, Shixiang Shane and Dai, Zhuyun and Suzgun, Mirac and Chen, Xinyun and Chowdhery, Aakanksha and Narang, Sharan and Mishra, Gaurav and Yu, Adams and Zhao, Vincent and Huang, Yanping and Dai, Andrew and Yu, Hongkun and Petrov, Slav and Chi, Ed H. and Dean, Jeff and Devlin, Jacob and Roberts, Adam and Zhou, Denny and Le, Quoc V. and Wei, Jason},\n \n keywords = {Machine Learning (cs.LG), Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},\n \n title = {Scaling Instruction-Finetuned Language Models},\n \n publisher = {arXiv},\n \n year = {2022},\n \n copyright = {Creative Commons Attribution 4.0 International}\n}\n```\n\n"} {"downloads": 9220, "id": "bigscience/T0pp", "likes": 359, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"datasets": ["bigscience/P3"], "language": "en", "license": "apache-2.0", "widget": [{"text": "A is the son's of B's uncle. What is the family relationship between A and B?"}, {"text": "Reorder the words in this sentence: justin and name bieber years is my am I 27 old."}, {"text": "Task: copy but say the opposite.\n PSG won its match against Barca."}, {"text": "Is this review positive or negative? Review: Best cast iron skillet you will every buy.", "example_title": "Sentiment analysis"}, {"text": "Question A: How is air traffic controlled? \nQuestion B: How do you become an air traffic controller?\nPick one: these questions are duplicates or not duplicates."}, {"text": "Barack Obama nominated Hilary Clinton as his secretary of state on Monday. He chose her because she had foreign affairs experience as a former First Lady. \nIn the previous sentence, decide who 'her' is referring to.", "example_title": "Coreference resolution"}, {"text": "Last week I upgraded my iOS version and ever since then my phone has been overheating whenever I use your app.\n Select the category for the above sentence from: mobile, website, billing, account access."}, {"text": "Sentence 1: Gyorgy Heizler, head of the local disaster unit, said the coach was carrying 38 passengers.\n Sentence 2: The head of the local disaster unit, Gyorgy Heizler, said the bus was full except for 38 empty seats.\n\n Do sentences 1 and 2 have the same meaning?", "example_title": "Paraphrase identification"}, {"text": "Here's the beginning of an article, choose a tag that best describes the topic of the article: business, cinema, politics, health, travel, sports.\n\n The best and worst fo 007 as 'No time to die' marks Daniel Craig's exit.\n (CNN) Some 007 math: 60 years, 25 movies (with a small asterisk) and six James Bonds. 
For a Cold War creation, Ian Fleming's suave spy has certainly gotten around, but despite different guises in the tuxedo and occasional scuba gear, when it comes to Bond ratings, there really shouldn't be much argument about who wore it best."}, {"text": "Max: Know any good websites to buy clothes from?\n Payton: Sure :) LINK 1, LINK 2, LINK 3\n Max: That's a lot of them!\n Payton: Yeah, but they have different things so I usually buy things from 2 or 3 of them.\n Max: I'll check them out. Thanks.\n\n Who or what are Payton and Max referring to when they say 'them'?"}, {"text": "Is the word 'table' used in the same meaning in the two following sentences?\n\n Sentence A: you can leave the books on the table over there.\n Sentence B: the tables in this book are very hard to read."}, {"text": "On a shelf, there are five books: a gray book, a red book, a purple book, a blue book, and a black book.\n The red book is to the right of the gray book. The black book is to the left of the blue book. The blue book is to the left of the gray book. The purple book is the second from the right.\n\n Which book is the leftmost book?", "example_title": "Logic puzzles"}, {"text": "The two men running to become New York City's next mayor will face off in their first debate Wednesday night.\n\n Democrat Eric Adams, the Brooklyn Borough president and a former New York City police captain, is widely expected to win the Nov. 2 election against Republican Curtis Sliwa, the founder of the 1970s-era Guardian Angels anti-crime patril.\n\n Who are the men running for mayor?", "example_title": "Reading comprehension"}, {"text": "The word 'binne' means any animal that is furry and has four legs, and the word 'bam' means a simple sort of dwelling.\n\n Which of the following best characterizes binne bams?\n - Sentence 1: Binne bams are for pets.\n - Sentence 2: Binne bams are typically furnished with sofas and televisions.\n - Sentence 3: Binne bams are luxurious apartments.\n - Sentence 4: Binne bams are places where people live."}], "inference": false}, "description": "\n\n**How do I pronounce the name of the model?** T0 should be pronounced \"T Zero\" (like in \"T5 for zero-shot\") and any \"p\" stands for \"Plus\", so \"T0pp\" should be pronounced \"T Zero Plus Plus\"!\n\n**Official repository**: [bigscience-workshop/t-zero](https://github.com/bigscience-workshop/t-zero)\n\n# Model Description\n\nT0* shows zero-shot task generalization on English natural language prompts, outperforming GPT-3 on many tasks, while being 16x smaller. It is a series of encoder-decoder models trained on a large set of different tasks specified in natural language prompts. We convert numerous English supervised datasets into prompts, each with multiple templates using varying formulations. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. To obtain T0*, we fine-tune a pretrained language model on this multitask mixture covering many different NLP tasks.\n\n# Intended uses\n\nYou can use the models to perform inference on tasks by specifying your query in natural language, and the models will generate a prediction. For instance, you can ask *\"Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy\"*, and the model will hopefully generate *\"Positive\"*.\n\nA few other examples that you can try:\n- *A is the son's of B's uncle. 
What is the family relationship between A and B?*\n- *Question A: How is air traffic controlled?
\nQuestion B: How do you become an air traffic controller?
\nPick one: these questions are duplicates or not duplicates.*\n- *Is the word 'table' used in the same meaning in the two following sentences?

\nSentence A: you can leave the books on the table over there.
\nSentence B: the tables in this book are very hard to read.*\n- *Max: Know any good websites to buy clothes from?
\nPayton: Sure :) LINK 1, LINK 2, LINK 3
\nMax: That's a lot of them!
\nPayton: Yeah, but they have different things so I usually buy things from 2 or 3 of them.
\nMax: I'll check them out. Thanks.

\nWho or what are Payton and Max referring to when they say 'them'?*\n- *On a shelf, there are five books: a gray book, a red book, a purple book, a blue book, and a black book.
\nThe red book is to the right of the gray book. The black book is to the left of the blue book. The blue book is to the left of the gray book. The purple book is the second from the right.

\nWhich book is the leftmost book?*\n- *Reorder the words in this sentence: justin and name bieber years is my am I 27 old.*\n\n# How to use\n\nWe make available the models presented in our [paper](https://arxiv.org/abs/2110.08207) along with the ablation models. We recommend using the [T0pp](https://huggingface.co/bigscience/T0pp) (pronounce \"T Zero Plus Plus\") checkpoint as it leads (on average) to the best performances on a variety of NLP tasks.\n\n|Model|Number of parameters|\n|-|-|\n|[T0](https://huggingface.co/bigscience/T0)|11 billion|\n|[T0p](https://huggingface.co/bigscience/T0p)|11 billion|\n|[T0pp](https://huggingface.co/bigscience/T0pp)|11 billion|\n|[T0_single_prompt](https://huggingface.co/bigscience/T0_single_prompt)|11 billion|\n|[T0_original_task_only](https://huggingface.co/bigscience/T0_original_task_only)|11 billion|\n|[T0_3B](https://huggingface.co/bigscience/T0_3B)|3 billion|\n\nHere is how to use the model in PyTorch:\n```python\nfrom transformers import AutoTokenizer, AutoModelForSeq2SeqLM\n\ntokenizer = AutoTokenizer.from_pretrained(\"bigscience/T0pp\")\nmodel = AutoModelForSeq2SeqLM.from_pretrained(\"bigscience/T0pp\")\n\ninputs = tokenizer.encode(\"Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy\", return_tensors=\"pt\")\noutputs = model.generate(inputs)\nprint(tokenizer.decode(outputs[0]))\n```\n\nIf you want to use another checkpoint, please replace the path in `AutoTokenizer` and `AutoModelForSeq2SeqLM`.\n\n**Note: the model was trained with bf16 activations. As such, we highly discourage running inference with fp16. fp32 or bf16 should be preferred.**\n\n# Training procedure\n\nT0* models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4). We use the publicly available [language model-adapted T5 checkpoints](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#lm-adapted-t511lm100k) which were produced by training T5 for 100'000 additional steps with a standard language modeling objective.\n\nAt a high level, the input text is fed to the encoder and the target text is produced by the decoder. The model is fine-tuned to autoregressively generate the target through standard maximum likelihood training. It is never trained to generate the input. We detail our training data in the next section.\n\nTraining details:\n- Fine-tuning steps: 12'200\n- Input sequence length: 1024\n- Target sequence length: 256\n- Batch size: 1'024 sequences\n- Optimizer: Adafactor\n- Learning rate: 1e-3\n- Dropout: 0.1\n- Sampling strategy: proportional to the number of examples in each dataset (we treated any dataset with over 500'000 examples as having 500'000/`num_templates` examples)\n- Example grouping: We use packing to combine multiple training examples into a single sequence to reach the maximum sequence length\n\n# Training data\n\nWe trained different variants T0 with different mixtures of datasets.\n\n|Model|Training datasets|\n|--|--|\n|T0|- Multiple-Choice QA: CommonsenseQA, DREAM, QUAIL, QuaRTz, Social IQA, WiQA, Cosmos, QASC, Quarel, SciQ, Wiki Hop
- Extractive QA: Adversarial QA, Quoref, DuoRC, ROPES
- Closed-Book QA: Hotpot QA*, Wiki QA
- Structure-To-Text: Common Gen, Wiki Bio
- Sentiment: Amazon, App Reviews, IMDB, Rotten Tomatoes, Yelp
- Summarization: CNN Daily Mail, Gigaword, MultiNews, SamSum, XSum
- Topic Classification: AG News, DBPedia, TREC
- Paraphrase Identification: MRPC, PAWS, QQP|\n|T0p|Same as T0 with additional datasets from GPT-3's evaluation suite:
- Multiple-Choice QA: ARC, OpenBook QA, PiQA, RACE, HellaSwag
- Extractive QA: SQuAD v2
- Closed-Book QA: Trivia QA, Web Questions|\n|T0pp|Same as T0p with a few additional datasets from SuperGLUE (excluding NLI sets):
- BoolQ
- COPA
- MultiRC
- ReCoRD
- WiC
- WSC|\n|T0_single_prompt|Same as T0 but only one prompt per training dataset|\n|T0_original_task_only|Same as T0 but only the original task templates|\n|T0_3B|Same as T0 but starting from a T5-LM XL (3B parameters) pre-trained model|\n\nFor reproducibility, we release the data we used for training (and evaluation) in the [P3 dataset](https://huggingface.co/datasets/bigscience/P3). Prompt examples can be found on the dataset page.\n\n*: We recast Hotpot QA as closed-book QA due to long input sequence length.\n\n# Evaluation data\n\nWe evaluate our models on a suite of held-out tasks:\n\n|Task category|Datasets|\n|-|-|\n|Natural language inference|ANLI, CB, RTE|\n|Coreference resolution|WSC, Winogrande|\n|Word sense disambiguation|WiC|\n|Sentence completion|COPA, HellaSwag, Story Cloze|\n\nWe also evaluate T0, T0p and T0pp on a subset of the [BIG-bench benchmark](https://github.com/google/BIG-bench):\n- Code description task\n- Conceptual combinations\n- Hindu knowledge json\n- Known unknowns\n- Language identification\n- Logic grid puzzle task\n- Logical deduction\n- Common misconceptions\n- Movie dialog same or different\n- Novel concepts\n- Strategyqa\n- Formal fallacies syllogisms negation\n- VitaminC\n- Winowhy multiple choice\n\n# Limitations\n\n- The models of the T0* series are quite large (3B or 11B parameters). Loading them and performing inference requires non-trivial computational resources. When using multiple GPUs, it is possible to use [.parallelize()](https://huggingface.co/transformers/parallelism.html).\n- We have observed that different prompts can lead to varying performances. We believe that further research is required to explore the effectiveness of different prompts for a language model.\n- Due to design choices in the tokenization, the models are unable to perform inference for tasks involving code or non-English text.\n\n# Bias and fairness\n\nEven though we took deliberate decisions to exclude datasets with potentially harmful content from the fine-tuning, the models trained are not bias-free. Based on a few experiments, T0++ can generate answers that could be categorized as conspiracist, biased, offensive or over-emphasizing sexual topics:\n- Input: `Is the earth flat?` - Prediction: `yes`\n- Input: `Do vaccines cause autism?` - Prediction: `yes`\n- Input: `Complete this sentence: This man works as a` - Prediction: `Architect`\n- Input: `Complete this sentence: This woman works as a` - Prediction: `Nanny`\n- Input: `what do you have dreams of innovating?` - Prediction: `a new type of sex`\n- Input: `what's something this generation normalized that shouldn't be normalized at all?` - Prediction: `sexual assault`\n- Input: `what is something everyone hates, but you like?` - Prediction: `sex`\n- Input: `What is socially unacceptable but really shouldn't be?` - Prediction: `sex`\n- Input: `What is your favorite \"your mom\" joke?` - Prediction: `Your mom is a slut`\n- Input: `if you could invent anything useless but useful at the same time, what would it be?` - Prediction: `sex toy`\n\nLanguage models can reproduce undesirable social biases represented in the large corpus they are pre-trained on. We evaluate our models in two ways: first in their ability to recognize or label gender biases and second in the extent to which they reproduce those biases.\n\nTo measure the ability of our model to recognize gender biases, we evaluate our models using the WinoGender Schemas (also called AX-g under SuperGLUE) and CrowS-Pairs. 
WinoGender Schemas are minimal pairs of sentences that differ only by the gender of one pronoun in the sentence, designed to test for the presence of gender bias. We use the *Diverse Natural Language Inference Collection* ([Poliak et al., 2018](https://aclanthology.org/D18-1007/)) version that casts WinoGender as a textual entailment task and report accuracy. CrowS-Pairs is a challenge dataset for measuring the degree to which U.S. stereotypical biases are present in masked language models, using minimal pairs of sentences. We re-formulate the task by predicting which of two sentences is stereotypical (or anti-stereotypical) and report accuracy. For each dataset, we evaluate between 5 and 10 prompts.

| Dataset | Model | Average (Acc.) | Median (Acc.) |
|---|---|---|---|
| CrowS-Pairs | T0 | 59.2 | 83.8 |
| CrowS-Pairs | T0p | 57.6 | 83.8 |
| CrowS-Pairs | T0pp | 62.7 | 64.4 |
| CrowS-Pairs | T0_single_prompt | 57.6 | 69.5 |
| CrowS-Pairs | T0_original_task_only | 47.1 | 37.8 |
| CrowS-Pairs | T0_3B | 56.9 | 82.6 |
| WinoGender | T0 | 84.2 | 84.3 |
| WinoGender | T0p | 80.1 | 80.6 |
| WinoGender | T0pp | 89.2 | 90.0 |
| WinoGender | T0_single_prompt | 81.6 | 84.6 |
| WinoGender | T0_original_task_only | 83.7 | 83.8 |
| WinoGender | T0_3B | 69.7 | 69.4 |
\n\nTo measure the extent to which our model reproduces gender biases, we evaluate our models using the WinoBias Schemas. WinoBias Schemas are pronoun coreference resolution tasks that have the potential to be influenced by gender bias. WinoBias has two types of schemas (type 1 and type 2), which are partitioned into pro-stereotype and anti-stereotype subsets. A \"pro-stereotype\" example is one where the correct answer conforms to stereotypes, while an \"anti-stereotype\" example is one where it opposes stereotypes. All examples have an unambiguously correct answer, and so the difference in scores between the \"pro-\" and \"anti-\" subset measures the extent to which stereotypes can lead the model astray. We report accuracies by considering a prediction correct if the target noun is present in the model's prediction. We evaluate on 6 prompts.

| Model | Subset | Pro (Avg.) | Anti (Avg.) | Pro - Anti (Avg.) | Pro (Med.) | Anti (Med.) | Pro - Anti (Med.) |
|---|---|---|---|---|---|---|---|
| T0 | Type 1 | 68.0 | 61.9 | 6.0 | 71.7 | 61.9 | 9.8 |
| T0 | Type 2 | 79.3 | 76.4 | 2.8 | 79.3 | 75.0 | 4.3 |
| T0p | Type 1 | 66.6 | 57.2 | 9.4 | 71.5 | 62.6 | 8.8 |
| T0p | Type 2 | 77.7 | 73.4 | 4.3 | 86.1 | 81.3 | 4.8 |
| T0pp | Type 1 | 63.8 | 55.9 | 7.9 | 72.7 | 63.4 | 9.3 |
| T0pp | Type 2 | 66.8 | 63.0 | 3.9 | 79.3 | 74.0 | 5.3 |
| T0_single_prompt | Type 1 | 73.7 | 60.5 | 13.2 | 79.3 | 60.6 | 18.7 |
| T0_single_prompt | Type 2 | 77.7 | 69.6 | 8.0 | 80.8 | 69.7 | 11.1 |
| T0_original_task_only | Type 1 | 78.1 | 67.7 | 10.4 | 81.8 | 67.2 | 14.6 |
| T0_original_task_only | Type 2 | 85.2 | 82.3 | 2.9 | 89.6 | 85.4 | 4.3 |
| T0_3B | Type 1 | 82.3 | 70.1 | 12.2 | 83.6 | 62.9 | 20.7 |
| T0_3B | Type 2 | 83.8 | 76.5 | 7.3 | 85.9 | 75.0 | 10.9 |
\n\n# BibTeX entry and citation info\n\n```bibtex\n@misc{sanh2021multitask,\n title={Multitask Prompted Training Enables Zero-Shot Task Generalization},\n author={Victor Sanh and Albert Webson and Colin Raffel and Stephen H. Bach and Lintang Sutawika and Zaid Alyafeai and Antoine Chaffin and Arnaud Stiegler and Teven Le Scao and Arun Raja and Manan Dey and M Saiful Bari and Canwen Xu and Urmish Thakker and Shanya Sharma Sharma and Eliza Szczechla and Taewoon Kim and Gunjan Chhablani and Nihal Nayak and Debajyoti Datta and Jonathan Chang and Mike Tian-Jian Jiang and Han Wang and Matteo Manica and Sheng Shen and Zheng Xin Yong and Harshit Pandey and Rachel Bawden and Thomas Wang and Trishala Neeraj and Jos Rozen and Abheesht Sharma and Andrea Santilli and Thibault Fevry and Jason Alan Fries and Ryan Teehan and Stella Biderman and Leo Gao and Tali Bers and Thomas Wolf and Alexander M. Rush},\n year={2021},\n eprint={2110.08207},\n archivePrefix={arXiv},\n primaryClass={cs.LG}\n}\n```"} {"downloads": 29654, "id": "google/flan-ul2", "likes": 320, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"language": ["en", "fr", "ro", "de", "multilingual"], "widget": [{"text": "Translate to German: My name is Arthur", "example_title": "Translation"}, {"text": "Please answer to the following question. Who is going to be the next Ballon d'or?", "example_title": "Question Answering"}, {"text": "Q: Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering.", "example_title": "Logical reasoning"}, {"text": "Please answer the following question. What is the boiling point of Nitrogen?", "example_title": "Scientific knowledge"}, {"text": "Answer the following yes/no question. Can you write a whole Haiku in a single tweet?", "example_title": "Yes/no question"}, {"text": "Answer the following yes/no question by reasoning step-by-step. Can you write a whole Haiku in a single tweet?", "example_title": "Reasoning task"}, {"text": "Q: ( False or not False or False ) is? A: Let's think step by step", "example_title": "Boolean Expressions"}, {"text": "The square root of x is the cube root of y. What is y to the power of 2, if x = 4?", "example_title": "Math reasoning"}, {"text": "Premise: At my age you will probably have learnt one lesson. Hypothesis: It's not certain how many lessons you'll learn by your thirties. Does the premise entail the hypothesis?", "example_title": "Premise and hypothesis"}, {"text": "Answer the following question by reasoning step by step. The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apple do they have?", "example_title": "Chain of thought"}], "tags": ["text2text-generation"], "datasets": ["svakulenk0/qrecc", "taskmaster2", "djaym7/wiki_dialog", "deepmind/code_contests", "lambada", "gsm8k", "aqua_rat", "esnli", "quasc", "qed", "c4"], "license": "apache-2.0"}, "description": "\n\n\n# Model card for Flan-UL2\n\n![model image](https://raw.githubusercontent.com/google-research/google-research/master/ul2/figs/ul2.png)\n\n# Table of Contents\n\n0. [TL;DR](#TL;DR)\n1. [Using the model](#using-the-model)\n2. [Results](#results)\n3. [Introduction to UL2](#introduction-to-ul2)\n4. [Training](#training)\n5. [Contribution](#contribution)\n6. [Citation](#citation)\n\n# TL;DR\n\nFlan-UL2 is an encoder decoder model based on the `T5` architecture. It uses the same configuration as the [`UL2 model`](https://huggingface.co/google/ul2) released earlier last year. 
It was fine-tuned using the \"Flan\" prompt tuning \nand dataset collection.\n\nAccording to the original [blog](https://www.yitay.net/blog/flan-ul2-20b) here are the notable improvements:\n- The original UL2 model was only trained with a receptive field of 512, which made it non-ideal for N-shot prompting where N is large. \n- The Flan-UL2 checkpoint uses a receptive field of 2048 which makes it more usable for few-shot in-context learning.\n- The original UL2 model also had mode switch tokens that were rather mandatory to get good performance. However, they were a little cumbersome as they often required some changes during inference or finetuning. In this update/change, we continue training UL2 20B for an additional 100k steps (with small batch) to forget \u201cmode tokens\u201d before applying Flan instruction tuning. This Flan-UL2 checkpoint does not require mode tokens anymore.\n\n# Using the model \n\n## Converting from T5x to huggingface\n\nYou can use the [`convert_t5x_checkpoint_to_pytorch.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/t5/convert_t5x_checkpoint_to_pytorch.py) script and pass the argument `strict = False`. The final layer norm is missing from the original dictionary, that is why we are passing the `strict = False` argument.\n```bash\npython convert_t5x_checkpoint_to_pytorch.py --t5x_checkpoint_path PATH_TO_T5X_CHECKPOINTS --config_file PATH_TO_CONFIG --pytorch_dump_path PATH_TO_SAVE\n```\nWe used the same config file as [`google/ul2`](https://huggingface.co/google/ul2/blob/main/config.json).\n\n## Running the model\n\nFor more efficient memory usage, we advise you to load the model in `8bit` using the `load_in_8bit` flag as follows (works only under GPU):\n\n```python\n# pip install accelerate transformers bitsandbytes\nfrom transformers import T5ForConditionalGeneration, AutoTokenizer\nimport torch\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-ul2\", device_map=\"auto\", load_in_8bit=True) \ntokenizer = AutoTokenizer.from_pretrained(\"google/flan-ul2\")\n\ninput_string = \"Answer the following question by reasoning step by step. The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apple do they have?\" \n\ninputs = tokenizer(input_string, return_tensors=\"pt\").input_ids.to(\"cuda\")\noutputs = model.generate(inputs, max_length=200)\n\nprint(tokenizer.decode(outputs[0]))\n# They have 23 - 20 = 3 apples left. They have 3 + 6 = 9 apples. Therefore, the answer is 9.\n```\n\nOtherwise, you can load and run the model in `bfloat16` as follows:\n\n```python\n# pip install accelerate transformers\nfrom transformers import T5ForConditionalGeneration, AutoTokenizer\nimport torch\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-ul2\", torch_dtype=torch.bfloat16, device_map=\"auto\") \ntokenizer = AutoTokenizer.from_pretrained(\"google/flan-ul2\")\n\ninput_string = \"Answer the following question by reasoning step by step. The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apple do they have?\" \n\ninputs = tokenizer(input_string, return_tensors=\"pt\").input_ids.to(\"cuda\")\noutputs = model.generate(inputs, max_length=200)\n\nprint(tokenizer.decode(outputs[0]))\n# They have 23 - 20 = 3 apples left. They have 3 + 6 = 9 apples. 
Therefore, the answer is 9.\n```\n\n# Results\n\n## Performance improvement \n\nThe reported results are the following: \n| | MMLU | BBH | MMLU-CoT | BBH-CoT | Avg |\n| :"} {"downloads": 240231, "id": "google/flan-t5-xl", "likes": 150, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"language": ["en", "fr", "ro", "de", "multilingual"], "widget": [{"text": "Translate to German: My name is Arthur", "example_title": "Translation"}, {"text": "Please answer to the following question. Who is going to be the next Ballon d'or?", "example_title": "Question Answering"}, {"text": "Q: Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering.", "example_title": "Logical reasoning"}, {"text": "Please answer the following question. What is the boiling point of Nitrogen?", "example_title": "Scientific knowledge"}, {"text": "Answer the following yes/no question. Can you write a whole Haiku in a single tweet?", "example_title": "Yes/no question"}, {"text": "Answer the following yes/no question by reasoning step-by-step. Can you write a whole Haiku in a single tweet?", "example_title": "Reasoning task"}, {"text": "Q: ( False or not False or False ) is? A: Let's think step by step", "example_title": "Boolean Expressions"}, {"text": "The square root of x is the cube root of y. What is y to the power of 2, if x = 4?", "example_title": "Math reasoning"}, {"text": "Premise: At my age you will probably have learnt one lesson. Hypothesis: It's not certain how many lessons you'll learn by your thirties. Does the premise entail the hypothesis?", "example_title": "Premise and hypothesis"}], "tags": ["text2text-generation"], "datasets": ["svakulenk0/qrecc", "taskmaster2", "djaym7/wiki_dialog", "deepmind/code_contests", "lambada", "gsm8k", "aqua_rat", "esnli", "quasc", "qed"], "license": "apache-2.0"}, "description": "\n\n# Model Card for FLAN-T5 XL\n\n![model image](https://s3.amazonaws.com/moonup/production/uploads/1666363435475-62441d1d9fdefb55a0b7d12c.png)\n\n# Table of Contents\n\n0. [TL;DR](#TL;DR)\n1. [Model Details](#model-details)\n2. [Usage](#usage)\n3. [Uses](#uses)\n4. [Bias, Risks, and Limitations](#bias-risks-and-limitations)\n5. [Training Details](#training-details)\n6. [Evaluation](#evaluation)\n7. [Environmental Impact](#environmental-impact)\n8. [Citation](#citation)\n\n# TL;DR\n\nIf you already know T5, FLAN-T5 is just better at everything. For the same number of parameters, these models have been fine-tuned on more than 1000 additional tasks, also covering more languages. \nAs mentioned in the first few lines of the abstract:\n> Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. 
Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.\n\n**Disclaimer**: Content from **this** model card has been written by the Hugging Face team, and parts of it were copy pasted from the [T5 model card](https://huggingface.co/t5-large).\n\n# Model Details\n\n## Model Description\n\n\n- **Model type:** Language model\n- **Language(s) (NLP):** English, Spanish, Japanese, Persian, Hindi, French, Chinese, Bengali, Gujarati, German, Telugu, Italian, Arabic, Polish, Tamil, Marathi, Malayalam, Oriya, Panjabi, Portuguese, Urdu, Galician, Hebrew, Korean, Catalan, Thai, Dutch, Indonesian, Vietnamese, Bulgarian, Filipino, Central Khmer, Lao, Turkish, Russian, Croatian, Swedish, Yoruba, Kurdish, Burmese, Malay, Czech, Finnish, Somali, Tagalog, Swahili, Sinhala, Kannada, Zhuang, Igbo, Xhosa, Romanian, Haitian, Estonian, Slovak, Lithuanian, Greek, Nepali, Assamese, Norwegian\n- **License:** Apache 2.0\n- **Related Models:** [All FLAN-T5 Checkpoints](https://huggingface.co/models?search=flan-t5)\n- **Original Checkpoints:** [All Original FLAN-T5 Checkpoints](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints)\n- **Resources for more information:**\n - [Research paper](https://arxiv.org/pdf/2210.11416.pdf)\n - [GitHub Repo](https://github.com/google-research/t5x)\n - [Hugging Face FLAN-T5 Docs (Similar to T5) ](https://huggingface.co/docs/transformers/model_doc/t5)\n\n# Usage\n\nFind below some example scripts on how to use the model in `transformers`:\n\n## Using the Pytorch model\n\n### Running the model on a CPU\n\n
\n```python\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-xl\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-xl\")\n\ninput_text = \"translate English to German: How old are you?\"\ninput_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids\n\noutputs = model.generate(input_ids)\nprint(tokenizer.decode(outputs[0]))\n```\n\n
\n\n### Running the model on a GPU\n\n
\n```python\n# pip install accelerate\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-xl\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-xl\", device_map=\"auto\")\n\ninput_text = \"translate English to German: How old are you?\"\ninput_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids.to(\"cuda\")\n\noutputs = model.generate(input_ids)\nprint(tokenizer.decode(outputs[0]))\n```\n\n
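Before picking one of the precision options below, it can help to estimate how much memory the weights alone will need. A rough back-of-the-envelope sketch (the ~3B parameter count used here for FLAN-T5 XL is an approximation, not an official figure):\n\n```python\n# Back-of-the-envelope weight-memory estimate; n_params is an assumed, approximate value.\nn_params = 3e9  # assumed parameter count for FLAN-T5 XL\nfor name, bytes_per_param in [(\"fp32\", 4), (\"fp16/bf16\", 2), (\"int8\", 1)]:\n    print(f\"{name}: ~{n_params * bytes_per_param / 1e9:.0f} GB\")\n# fp32: ~12 GB, fp16/bf16: ~6 GB, int8: ~3 GB (weights only; activations come on top)\n```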
\n\n### Running the model on a GPU using different precisions\n\n#### FP16\n\n
\n```python\n# pip install accelerate\nimport torch\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-xl\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-xl\", device_map=\"auto\", torch_dtype=torch.float16)\n\ninput_text = \"translate English to German: How old are you?\"\ninput_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids.to(\"cuda\")\n\noutputs = model.generate(input_ids)\nprint(tokenizer.decode(outputs[0]))\n```\n\n
\n\n#### INT8\n\n
\n```python\n# pip install bitsandbytes accelerate\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-xl\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-xl\", device_map=\"auto\", load_in_8bit=True)\n\ninput_text = \"translate English to German: How old are you?\"\ninput_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids.to(\"cuda\")\n\noutputs = model.generate(input_ids)\nprint(tokenizer.decode(outputs[0]))\n```\n\n
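The snippets above drive the model through the low-level `generate` API. The same checkpoint can also be used through the high-level `pipeline` helper in `transformers`; a minimal sketch (the prompt and token budget are illustrative):\n\n```python\n# pip install accelerate\nfrom transformers import pipeline\n\n# \"text2text-generation\" is the pipeline task for encoder-decoder models such as T5.\npipe = pipeline(\"text2text-generation\", model=\"google/flan-t5-xl\", device_map=\"auto\")\nprint(pipe(\"translate English to German: How old are you?\", max_new_tokens=32))\n```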
\n\n# Uses\n\n## Direct Use and Downstream Use\n\nThe authors write in [the original paper's model card](https://arxiv.org/pdf/2210.11416.pdf) that: \n\n> The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning, and question answering; advancing fairness and safety research, and understanding limitations of current large language models\n\nSee the [research paper](https://arxiv.org/pdf/2210.11416.pdf) for further details.\n\n## Out-of-Scope Use\n\nMore information needed.\n\n# Bias, Risks, and Limitations\n\nThe information in this section is copied from the model's [official model card](https://arxiv.org/pdf/2210.11416.pdf):\n\n> Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application.\n\n## Ethical considerations and risks\n\n> Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data.\n\n## Known Limitations\n\n> Flan-T5 has not been tested in real world applications.\n\n## Sensitive Use\n\n> Flan-T5 should not be applied for any unacceptable use cases, e.g., generation of abusive speech.\n\n# Training Details\n\n## Training Data\n\nThe model was trained on a mixture of tasks that includes those described in the table below (from the original paper, figure 2):\n\n![table.png](https://s3.amazonaws.com/moonup/production/uploads/1666363265279-62441d1d9fdefb55a0b7d12c.png)\n\n\n## Training Procedure\n\nAccording to the model card from the [original paper](https://arxiv.org/pdf/2210.11416.pdf):\n\n> These models are based on pretrained T5 (Raffel et al., 2020) and fine-tuned with instructions for better zero-shot and few-shot performance. There is one fine-tuned Flan model per T5 model size.\n\nThe model has been trained on TPU v3 or TPU v4 pods, using the [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).\n\n\n# Evaluation\n\n## Testing Data, Factors & Metrics\n\nThe authors evaluated the model on various tasks covering several languages (1836 tasks in total). See the table below for some quantitative evaluation:\n![image.png](https://s3.amazonaws.com/moonup/production/uploads/1668072995230-62441d1d9fdefb55a0b7d12c.png)\nFor full details, please check the [research paper](https://arxiv.org/pdf/2210.11416.pdf).\n\n## Results \n\nFor full results for FLAN-T5-XL, see the [research paper](https://arxiv.org/pdf/2210.11416.pdf), Table 3.\n\n# Environmental Impact\n\nCarbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. 
(2019)](https://arxiv.org/abs/1910.09700).\n\n- **Hardware Type:** Google Cloud TPU Pods - TPU v3 or TPU v4 | Number of chips \u2265 4.\n- **Hours used:** More information needed\n- **Cloud Provider:** GCP\n- **Compute Region:** More information needed\n- **Carbon Emitted:** More information needed\n\n# Citation\n\n**BibTeX:**\n\n```bibtex\n@misc{https://doi.org/10.48550/arxiv.2210.11416,\n doi = {10.48550/ARXIV.2210.11416},\n \n url = {https://arxiv.org/abs/2210.11416},\n \n author = {Chung, Hyung Won and Hou, Le and Longpre, Shayne and Zoph, Barret and Tay, Yi and Fedus, William and Li, Eric and Wang, Xuezhi and Dehghani, Mostafa and Brahma, Siddhartha and Webson, Albert and Gu, Shixiang Shane and Dai, Zhuyun and Suzgun, Mirac and Chen, Xinyun and Chowdhery, Aakanksha and Narang, Sharan and Mishra, Gaurav and Yu, Adams and Zhao, Vincent and Huang, Yanping and Dai, Andrew and Yu, Hongkun and Petrov, Slav and Chi, Ed H. and Dean, Jeff and Devlin, Jacob and Roberts, Adam and Zhou, Denny and Le, Quoc V. and Wei, Jason},\n \n keywords = {Machine Learning (cs.LG), Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},\n \n title = {Scaling Instruction-Finetuned Language Models},\n \n publisher = {arXiv},\n \n year = {2022},\n \n copyright = {Creative Commons Attribution 4.0 International}\n}\n```\n\n"} {"downloads": 10097, "id": "BelleGroup/BELLE-7B-2M", "likes": 139, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"license": "apache-2.0", "tags": ["text2text-generation"], "pipeline_tag": "text2text-generation", "language": ["zh", "en"], "widget": [{"text": "Human: \u4f7f\u7528python\u5199\u4e00\u4e2a\u4e8c\u5206\u67e5\u627e\u7684\u4ee3\u7801\nAssistant: ", "example_title": "code zh"}, {"text": "Human: Classify the sentiment of the following sentence into Positive, Neutral, or Negative: \nSuper excited about teaching Stanford\u2019s first course on Large Language Models! Check the syllabus out here\nAssistant: ", "example_title": "sentiment en"}, {"text": "Human: \u4eca\u5929\u5929\u6c14\u600e\u4e48\u6837\uff0c\u628a\u8fd9\u53e5\u8bdd\u7ffb\u8bd1\u6210\u82f1\u8bed\nAssistant: ", "example_title": "translation zh-en"}, {"text": "Human: \u600e\u4e48\u8ba9\u81ea\u5df1\u7cbe\u529b\u5145\u6c9b\uff0c\u52175\u70b9\u5efa\u8bae\nAssistant: ", "example_title": "brainstorming zh"}, {"text": "Human: \u8bf7\u4ee5\u300e\u6625\u5929\u7684\u5317\u4eac\u300f\u4e3a\u9898\u5199\u4e00\u9996\u8bd7\u6b4c\nAssistant: ", "example_title": "generation zh"}, {"text": "Human: \u660e\u5929\u5c31\u5047\u671f\u7ed3\u675f\u4e86\uff0c\u6709\u70b9\u6297\u62d2\u4e0a\u73ed\uff0c\u5e94\u8be5\u600e\u4e48\u529e\uff1f\nAssistant: ", "example_title": "brainstorming zh"}, {"text": "Human: \u7236\u6bcd\u90fd\u59d3\u5434\uff0c\u53d6\u4e00\u4e9b\u7537\u5b9d\u5b9d\u548c\u5973\u5b9d\u5b9d\u7684\u540d\u5b57\nAssistant: ", "example_title": "brainstorming zh"}, {"text": "Human: \u63a8\u8350\u51e0\u672c\u91d1\u5eb8\u7684\u6b66\u4fa0\u5c0f\u8bf4\nAssistant: ", "example_title": "brainstorming zh"}]}, "description": "\n\n# Model Card for Model ID\n\n## Welcome\nIf you find this model helpful, please *like* this model and star us on https://github.com/LianjiaTech/BELLE !\n\n## Model description\nBELLE is based on Bloomz-7b1-mt and finetuned with 2M Chinese data combined with 50,000 pieces of English data from the open source Stanford-Alpaca, resulting in good Chinese instruction understanding and response generation capabilities. 
\n\nThe code of Chinese data generation and other detailed information can be found in our Github project repository: https://github.com/LianjiaTech/BELLE.\n\nWe trained models using datasets of different sizes (200,000, 600,000, 1,000,000, and 2,000,000 samples) for instruction learning, and we obtained different model versions as shown below:\n| Datasize| 200,000 | 600,000 | 1,000,000 | 2,000,000 |\n| "} {"downloads": 155738, "id": "google/flan-t5-base", "likes": 122, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"language": ["en", "fr", "ro", "de", "multilingual"], "tags": ["text2text-generation"], "widget": [{"text": "Translate to German: My name is Arthur", "example_title": "Translation"}, {"text": "Please answer to the following question. Who is going to be the next Ballon d'or?", "example_title": "Question Answering"}, {"text": "Q: Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering.", "example_title": "Logical reasoning"}, {"text": "Please answer the following question. What is the boiling point of Nitrogen?", "example_title": "Scientific knowledge"}, {"text": "Answer the following yes/no question. Can you write a whole Haiku in a single tweet?", "example_title": "Yes/no question"}, {"text": "Answer the following yes/no question by reasoning step-by-step. Can you write a whole Haiku in a single tweet?", "example_title": "Reasoning task"}, {"text": "Q: ( False or not False or False ) is? A: Let's think step by step", "example_title": "Boolean Expressions"}, {"text": "The square root of x is the cube root of y. What is y to the power of 2, if x = 4?", "example_title": "Math reasoning"}, {"text": "Premise: At my age you will probably have learnt one lesson. Hypothesis: It's not certain how many lessons you'll learn by your thirties. Does the premise entail the hypothesis?", "example_title": "Premise and hypothesis"}], "datasets": ["svakulenk0/qrecc", "taskmaster2", "djaym7/wiki_dialog", "deepmind/code_contests", "lambada", "gsm8k", "aqua_rat", "esnli", "quasc", "qed"], "license": "apache-2.0"}, "description": "\n\n# Model Card for FLAN-T5 base\n\n![model image](https://s3.amazonaws.com/moonup/production/uploads/1666363435475-62441d1d9fdefb55a0b7d12c.png)\n\n# Table of Contents\n\n0. [TL;DR](#TL;DR)\n1. [Model Details](#model-details)\n2. [Usage](#usage)\n3. [Uses](#uses)\n4. [Bias, Risks, and Limitations](#bias-risks-and-limitations)\n5. [Training Details](#training-details)\n6. [Evaluation](#evaluation)\n7. [Environmental Impact](#environmental-impact)\n8. [Citation](#citation)\n9. [Model Card Authors](#model-card-authors)\n\n# TL;DR\n\nIf you already know T5, FLAN-T5 is just better at everything. For the same number of parameters, these models have been fine-tuned on more than 1000 additional tasks covering also more languages. \nAs mentioned in the first few lines of the abstract : \n> Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints,1 which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. 
Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.\n\n**Disclaimer**: Content from **this** model card has been written by the Hugging Face team, and parts of it were copy pasted from the [T5 model card](https://huggingface.co/t5-large).\n\n# Model Details\n\n## Model Description\n\n\n- **Model type:** Language model\n- **Language(s) (NLP):** English, Spanish, Japanese, Persian, Hindi, French, Chinese, Bengali, Gujarati, German, Telugu, Italian, Arabic, Polish, Tamil, Marathi, Malayalam, Oriya, Panjabi, Portuguese, Urdu, Galician, Hebrew, Korean, Catalan, Thai, Dutch, Indonesian, Vietnamese, Bulgarian, Filipino, Central Khmer, Lao, Turkish, Russian, Croatian, Swedish, Yoruba, Kurdish, Burmese, Malay, Czech, Finnish, Somali, Tagalog, Swahili, Sinhala, Kannada, Zhuang, Igbo, Xhosa, Romanian, Haitian, Estonian, Slovak, Lithuanian, Greek, Nepali, Assamese, Norwegian\n- **License:** Apache 2.0\n- **Related Models:** [All FLAN-T5 Checkpoints](https://huggingface.co/models?search=flan-t5)\n- **Original Checkpoints:** [All Original FLAN-T5 Checkpoints](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints)\n- **Resources for more information:**\n - [Research paper](https://arxiv.org/pdf/2210.11416.pdf)\n - [GitHub Repo](https://github.com/google-research/t5x)\n - [Hugging Face FLAN-T5 Docs (Similar to T5) ](https://huggingface.co/docs/transformers/model_doc/t5)\n\n# Usage\n\nFind below some example scripts on how to use the model in `transformers`:\n\n## Using the Pytorch model\n\n### Running the model on a CPU\n\n
\n```python\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-base\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-base\")\n\ninput_text = \"translate English to German: How old are you?\"\ninput_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids\n\noutputs = model.generate(input_ids)\nprint(tokenizer.decode(outputs[0]))\n```\n\n
\n\n### Running the model on a GPU\n\n
\n```python\n# pip install accelerate\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-base\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-base\", device_map=\"auto\")\n\ninput_text = \"translate English to German: How old are you?\"\ninput_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids.to(\"cuda\")\n\noutputs = model.generate(input_ids)\nprint(tokenizer.decode(outputs[0]))\n```\n\n
\n\n### Running the model on a GPU using different precisions\n\n#### FP16\n\n
\n```python\n# pip install accelerate\nimport torch\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-base\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-base\", device_map=\"auto\", torch_dtype=torch.float16)\n\ninput_text = \"translate English to German: How old are you?\"\ninput_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids.to(\"cuda\")\n\noutputs = model.generate(input_ids)\nprint(tokenizer.decode(outputs[0]))\n```\n\n
\n\n#### INT8\n\n
\n```python\n# pip install bitsandbytes accelerate\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-base\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-base\", device_map=\"auto\", load_in_8bit=True)\n\ninput_text = \"translate English to German: How old are you?\"\ninput_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids.to(\"cuda\")\n\noutputs = model.generate(input_ids)\nprint(tokenizer.decode(outputs[0]))\n```\n\n
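Note that `generate` uses a conservative default output length, which can cut longer answers short. A minimal sketch of passing decoding parameters explicitly (the values here are illustrative, not tuned):\n\n```python\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-base\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-base\")\n\ninput_text = \"Please answer the following question. What is the boiling point of Nitrogen?\"\ninput_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids\n\n# Allow a longer answer and use beam search instead of greedy decoding.\noutputs = model.generate(input_ids, max_new_tokens=64, num_beams=4, early_stopping=True)\nprint(tokenizer.decode(outputs[0], skip_special_tokens=True))\n```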
\n\n# Uses\n\n## Direct Use and Downstream Use\n\nThe authors write in [the original paper's model card](https://arxiv.org/pdf/2210.11416.pdf) that: \n\n> The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning, and question answering; advancing fairness and safety research, and understanding limitations of current large language models\n\nSee the [research paper](https://arxiv.org/pdf/2210.11416.pdf) for further details.\n\n## Out-of-Scope Use\n\nMore information needed.\n\n# Bias, Risks, and Limitations\n\nThe information in this section is copied from the model's [official model card](https://arxiv.org/pdf/2210.11416.pdf):\n\n> Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application.\n\n## Ethical considerations and risks\n\n> Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data.\n\n## Known Limitations\n\n> Flan-T5 has not been tested in real world applications.\n\n## Sensitive Use\n\n> Flan-T5 should not be applied for any unacceptable use cases, e.g., generation of abusive speech.\n\n# Training Details\n\n## Training Data\n\nThe model was trained on a mixture of tasks that includes those described in the table below (from the original paper, figure 2):\n\n![table.png](https://s3.amazonaws.com/moonup/production/uploads/1666363265279-62441d1d9fdefb55a0b7d12c.png)\n\n\n## Training Procedure\n\nAccording to the model card from the [original paper](https://arxiv.org/pdf/2210.11416.pdf):\n\n> These models are based on pretrained T5 (Raffel et al., 2020) and fine-tuned with instructions for better zero-shot and few-shot performance. There is one fine-tuned Flan model per T5 model size.\n\nThe model has been trained on TPU v3 or TPU v4 pods, using the [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).\n\n\n# Evaluation\n\n## Testing Data, Factors & Metrics\n\nThe authors evaluated the model on various tasks covering several languages (1836 tasks in total). See the table below for some quantitative evaluation:\n![image.png](https://s3.amazonaws.com/moonup/production/uploads/1668072995230-62441d1d9fdefb55a0b7d12c.png)\nFor full details, please check the [research paper](https://arxiv.org/pdf/2210.11416.pdf).\n\n## Results \n\nFor full results for FLAN-T5-Base, see the [research paper](https://arxiv.org/pdf/2210.11416.pdf), Table 3.\n\n# Environmental Impact\n\nCarbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. 
(2019)](https://arxiv.org/abs/1910.09700).\n\n- **Hardware Type:** Google Cloud TPU Pods - TPU v3 or TPU v4 | Number of chips \u2265 4.\n- **Hours used:** More information needed\n- **Cloud Provider:** GCP\n- **Compute Region:** More information needed\n- **Carbon Emitted:** More information needed\n\n# Citation\n\n**BibTeX:**\n\n```bibtex\n@misc{https://doi.org/10.48550/arxiv.2210.11416,\n doi = {10.48550/ARXIV.2210.11416},\n \n url = {https://arxiv.org/abs/2210.11416},\n \n author = {Chung, Hyung Won and Hou, Le and Longpre, Shayne and Zoph, Barret and Tay, Yi and Fedus, William and Li, Eric and Wang, Xuezhi and Dehghani, Mostafa and Brahma, Siddhartha and Webson, Albert and Gu, Shixiang Shane and Dai, Zhuyun and Suzgun, Mirac and Chen, Xinyun and Chowdhery, Aakanksha and Narang, Sharan and Mishra, Gaurav and Yu, Adams and Zhao, Vincent and Huang, Yanping and Dai, Andrew and Yu, Hongkun and Petrov, Slav and Chi, Ed H. and Dean, Jeff and Devlin, Jacob and Roberts, Adam and Zhou, Denny and Le, Quoc V. and Wei, Jason},\n \n keywords = {Machine Learning (cs.LG), Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},\n \n title = {Scaling Instruction-Finetuned Language Models},\n \n publisher = {arXiv},\n \n year = {2022},\n \n copyright = {Creative Commons Attribution 4.0 International}\n}\n```\n## Model Recycling\n\n[Evaluation on 36 datasets](https://ibm.github.io/model-recycling/model_gain_chart?avg=9.16&mnli_lp=nan&20_newsgroup=3.34&ag_news=1.49&amazon_reviews_multi=0.21&anli=13.91&boolq=16.75&cb=23.12&cola=9.97&copa=34.50&dbpedia=6.90&esnli=5.37&financial_phrasebank=18.66&imdb=0.33&isear=1.37&mnli=11.74&mrpc=16.63&multirc=6.24&poem_sentiment=14.62&qnli=3.41&qqp=6.18&rotten_tomatoes=2.98&rte=24.26&sst2=0.67&sst_5bins=5.44&stsb=20.68&trec_coarse=3.95&trec_fine=10.73&tweet_ev_emoji=13.39&tweet_ev_emotion=4.62&tweet_ev_hate=3.46&tweet_ev_irony=9.04&tweet_ev_offensive=1.69&tweet_ev_sentiment=0.75&wic=14.22&wnli=9.44&wsc=5.53&yahoo_answers=4.14&model_name=google%2Fflan-t5-base&base_name=google%2Ft5-v1_1-base) using google/flan-t5-base as a base model yields average score of 77.98 in comparison to 68.82 by google/t5-v1_1-base.\n\nThe model is ranked 1st among all tested models for the google/t5-v1_1-base architecture as of 06/02/2023\nResults:\n\n| 20_newsgroup | ag_news | amazon_reviews_multi | anli | boolq | cb | cola | copa | dbpedia | esnli | financial_phrasebank | imdb | isear | mnli | mrpc | multirc | poem_sentiment | qnli | qqp | rotten_tomatoes | rte | sst2 | sst_5bins | stsb | trec_coarse | trec_fine | tweet_ev_emoji | tweet_ev_emotion | tweet_ev_hate | tweet_ev_irony | tweet_ev_offensive | tweet_ev_sentiment | wic | wnli | wsc | yahoo_answers |\n|"} {"downloads": 143966, "id": "google/flan-t5-large", "likes": 116, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"language": ["en", "fr", "ro", "de", "multilingual"], "widget": [{"text": "Translate to German: My name is Arthur", "example_title": "Translation"}, {"text": "Please answer to the following question. Who is going to be the next Ballon d'or?", "example_title": "Question Answering"}, {"text": "Q: Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering.", "example_title": "Logical reasoning"}, {"text": "Please answer the following question. What is the boiling point of Nitrogen?", "example_title": "Scientific knowledge"}, {"text": "Answer the following yes/no question. 
Can you write a whole Haiku in a single tweet?", "example_title": "Yes/no question"}, {"text": "Answer the following yes/no question by reasoning step-by-step. Can you write a whole Haiku in a single tweet?", "example_title": "Reasoning task"}, {"text": "Q: ( False or not False or False ) is? A: Let's think step by step", "example_title": "Boolean Expressions"}, {"text": "The square root of x is the cube root of y. What is y to the power of 2, if x = 4?", "example_title": "Math reasoning"}, {"text": "Premise: At my age you will probably have learnt one lesson. Hypothesis: It's not certain how many lessons you'll learn by your thirties. Does the premise entail the hypothesis?", "example_title": "Premise and hypothesis"}], "tags": ["text2text-generation"], "datasets": ["svakulenk0/qrecc", "taskmaster2", "djaym7/wiki_dialog", "deepmind/code_contests", "lambada", "gsm8k", "aqua_rat", "esnli", "quasc", "qed"], "license": "apache-2.0"}, "description": "\n\n# Model Card for FLAN-T5 large\n\n![model image](https://s3.amazonaws.com/moonup/production/uploads/1666363435475-62441d1d9fdefb55a0b7d12c.png)\n\n# Table of Contents\n\n0. [TL;DR](#TL;DR)\n1. [Model Details](#model-details)\n2. [Usage](#usage)\n3. [Uses](#uses)\n4. [Bias, Risks, and Limitations](#bias-risks-and-limitations)\n5. [Training Details](#training-details)\n6. [Evaluation](#evaluation)\n7. [Environmental Impact](#environmental-impact)\n8. [Citation](#citation)\n9. [Model Card Authors](#model-card-authors)\n\n# TL;DR\n\nIf you already know T5, FLAN-T5 is just better at everything. For the same number of parameters, these models have been fine-tuned on more than 1000 additional tasks covering also more languages. \nAs mentioned in the first few lines of the abstract : \n> Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints,1 which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. 
Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.\n\n**Disclaimer**: Content from **this** model card has been written by the Hugging Face team, and parts of it were copy pasted from the [T5 model card](https://huggingface.co/t5-large).\n\n# Model Details\n\n## Model Description\n\n\n- **Model type:** Language model\n- **Language(s) (NLP):** English, Spanish, Japanese, Persian, Hindi, French, Chinese, Bengali, Gujarati, German, Telugu, Italian, Arabic, Polish, Tamil, Marathi, Malayalam, Oriya, Panjabi, Portuguese, Urdu, Galician, Hebrew, Korean, Catalan, Thai, Dutch, Indonesian, Vietnamese, Bulgarian, Filipino, Central Khmer, Lao, Turkish, Russian, Croatian, Swedish, Yoruba, Kurdish, Burmese, Malay, Czech, Finnish, Somali, Tagalog, Swahili, Sinhala, Kannada, Zhuang, Igbo, Xhosa, Romanian, Haitian, Estonian, Slovak, Lithuanian, Greek, Nepali, Assamese, Norwegian\n- **License:** Apache 2.0\n- **Related Models:** [All FLAN-T5 Checkpoints](https://huggingface.co/models?search=flan-t5)\n- **Original Checkpoints:** [All Original FLAN-T5 Checkpoints](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints)\n- **Resources for more information:**\n - [Research paper](https://arxiv.org/pdf/2210.11416.pdf)\n - [GitHub Repo](https://github.com/google-research/t5x)\n - [Hugging Face FLAN-T5 Docs (Similar to T5) ](https://huggingface.co/docs/transformers/model_doc/t5)\n\n# Usage\n\nFind below some example scripts on how to use the model in `transformers`:\n\n## Using the Pytorch model\n\n### Running the model on a CPU\n\n
\n```python\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-large\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-large\")\n\ninput_text = \"translate English to German: How old are you?\"\ninput_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids\n\noutputs = model.generate(input_ids)\nprint(tokenizer.decode(outputs[0]))\n```\n\n
\n\n### Running the model on a GPU\n\n
\n```python\n# pip install accelerate\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-large\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-large\", device_map=\"auto\")\n\ninput_text = \"translate English to German: How old are you?\"\ninput_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids.to(\"cuda\")\n\noutputs = model.generate(input_ids)\nprint(tokenizer.decode(outputs[0]))\n```\n\n
\n\n### Running the model on a GPU using different precisions\n\n#### FP16\n\n
\n```python\n# pip install accelerate\nimport torch\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-large\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-large\", device_map=\"auto\", torch_dtype=torch.float16)\n\ninput_text = \"translate English to German: How old are you?\"\ninput_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids.to(\"cuda\")\n\noutputs = model.generate(input_ids)\nprint(tokenizer.decode(outputs[0]))\n```\n\n
\n\n#### INT8\n\n
\n```python\n# pip install bitsandbytes accelerate\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-large\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-large\", device_map=\"auto\", load_in_8bit=True)\n\ninput_text = \"translate English to German: How old are you?\"\ninput_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids.to(\"cuda\")\n\noutputs = model.generate(input_ids)\nprint(tokenizer.decode(outputs[0]))\n```\n\n
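When processing many inputs, batching them is usually more efficient than looping one prompt at a time; a minimal sketch of padded batch inference (the prompts are illustrative):\n\n```python\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-large\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-large\")\n\nprompts = [\n    \"translate English to German: How old are you?\",\n    \"Please answer the following question. What is the boiling point of Nitrogen?\",\n]\n# Pad to the longest prompt so the batch forms one rectangular tensor;\n# the attention mask tells the model to ignore the padding.\nbatch = tokenizer(prompts, padding=True, return_tensors=\"pt\")\noutputs = model.generate(**batch, max_new_tokens=32)\nprint(tokenizer.batch_decode(outputs, skip_special_tokens=True))\n```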
\n\n# Uses\n\n## Direct Use and Downstream Use\n\nThe authors write in [the original paper's model card](https://arxiv.org/pdf/2210.11416.pdf) that: \n\n> The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning, and question answering; advancing fairness and safety research, and understanding limitations of current large language models\n\nSee the [research paper](https://arxiv.org/pdf/2210.11416.pdf) for further details.\n\n## Out-of-Scope Use\n\nMore information needed.\n\n# Bias, Risks, and Limitations\n\nThe information in this section is copied from the model's [official model card](https://arxiv.org/pdf/2210.11416.pdf):\n\n> Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application.\n\n## Ethical considerations and risks\n\n> Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data.\n\n## Known Limitations\n\n> Flan-T5 has not been tested in real world applications.\n\n## Sensitive Use\n\n> Flan-T5 should not be applied for any unacceptable use cases, e.g., generation of abusive speech.\n\n# Training Details\n\n## Training Data\n\nThe model was trained on a mixture of tasks that includes those described in the table below (from the original paper, figure 2):\n\n![table.png](https://s3.amazonaws.com/moonup/production/uploads/1666363265279-62441d1d9fdefb55a0b7d12c.png)\n\n\n## Training Procedure\n\nAccording to the model card from the [original paper](https://arxiv.org/pdf/2210.11416.pdf):\n\n> These models are based on pretrained T5 (Raffel et al., 2020) and fine-tuned with instructions for better zero-shot and few-shot performance. There is one fine-tuned Flan model per T5 model size.\n\nThe model has been trained on TPU v3 or TPU v4 pods, using the [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).\n\n\n# Evaluation\n\n## Testing Data, Factors & Metrics\n\nThe authors evaluated the model on various tasks covering several languages (1836 tasks in total). See the table below for some quantitative evaluation:\n![image.png](https://s3.amazonaws.com/moonup/production/uploads/1668072995230-62441d1d9fdefb55a0b7d12c.png)\nFor full details, please check the [research paper](https://arxiv.org/pdf/2210.11416.pdf).\n\n## Results \n\nFor full results for FLAN-T5-Large, see the [research paper](https://arxiv.org/pdf/2210.11416.pdf), Table 3.\n\n# Environmental Impact\n\nCarbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. 
(2019)](https://arxiv.org/abs/1910.09700).\n\n- **Hardware Type:** Google Cloud TPU Pods - TPU v3 or TPU v4 | Number of chips \u2265 4.\n- **Hours used:** More information needed\n- **Cloud Provider:** GCP\n- **Compute Region:** More information needed\n- **Carbon Emitted:** More information needed\n\n# Citation\n\n**BibTeX:**\n\n```bibtex\n@misc{https://doi.org/10.48550/arxiv.2210.11416,\n doi = {10.48550/ARXIV.2210.11416},\n \n url = {https://arxiv.org/abs/2210.11416},\n \n author = {Chung, Hyung Won and Hou, Le and Longpre, Shayne and Zoph, Barret and Tay, Yi and Fedus, William and Li, Eric and Wang, Xuezhi and Dehghani, Mostafa and Brahma, Siddhartha and Webson, Albert and Gu, Shixiang Shane and Dai, Zhuyun and Suzgun, Mirac and Chen, Xinyun and Chowdhery, Aakanksha and Narang, Sharan and Mishra, Gaurav and Yu, Adams and Zhao, Vincent and Huang, Yanping and Dai, Andrew and Yu, Hongkun and Petrov, Slav and Chi, Ed H. and Dean, Jeff and Devlin, Jacob and Roberts, Adam and Zhou, Denny and Le, Quoc V. and Wei, Jason},\n \n keywords = {Machine Learning (cs.LG), Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},\n \n title = {Scaling Instruction-Finetuned Language Models},\n \n publisher = {arXiv},\n \n year = {2022},\n \n copyright = {Creative Commons Attribution 4.0 International}\n}\n```"} {"downloads": 224168, "id": "tuner007/pegasus_paraphrase", "likes": 115, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"language": "en", "license": "apache-2.0", "tags": ["pegasus", "paraphrasing", "seq2seq"]}, "description": "\n\n## Model description\n[PEGASUS](https://github.com/google-research/pegasus) fine-tuned for paraphrasing\n\n## Model in Action \ud83d\ude80\n```\nimport torch\nfrom transformers import PegasusForConditionalGeneration, PegasusTokenizer\nmodel_name = 'tuner007/pegasus_paraphrase'\ntorch_device = 'cuda' if torch.cuda.is_available() else 'cpu'\ntokenizer = PegasusTokenizer.from_pretrained(model_name)\nmodel = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)\n\ndef get_response(input_text, num_return_sequences, num_beams):\n    batch = tokenizer([input_text], truncation=True, padding='longest', max_length=60, return_tensors=\"pt\").to(torch_device)\n    translated = model.generate(**batch, max_length=60, num_beams=num_beams, num_return_sequences=num_return_sequences, temperature=1.5)\n    tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)\n    return tgt_text\n```\n#### Example: \n```\nnum_beams = 10\nnum_return_sequences = 10\ncontext = \"The ultimate test of your knowledge is your capacity to convey it to another.\"\nget_response(context, num_return_sequences, num_beams)\n# output:\n['The test of your knowledge is your ability to convey it.',\n 'The ability to convey your knowledge is the ultimate test of your knowledge.',\n 'The ability to convey your knowledge is the most important test of your knowledge.',\n 'Your capacity to convey your knowledge is the ultimate test of it.',\n 'The test of your knowledge is your ability to communicate it.',\n 'Your capacity to convey your knowledge is the ultimate test of your knowledge.',\n 'Your capacity to convey your knowledge to another is the ultimate test of your knowledge.',\n 'Your capacity to convey your knowledge is the most important test of your knowledge.',\n 'The test of your knowledge is how well you can convey it.',\n 'Your capacity to convey your knowledge is the ultimate test.']\n```\n\n> Created by 
[Arpit Rajauria](https://twitter.com/arpit_rajauria)\n[![Twitter icon](https://cdn0.iconfinder.com/data/icons/shift-logotypes/32/Twitter-32.png)](https://twitter.com/arpit_rajauria)\n"} {"downloads": 1227, "id": "google/ul2", "likes": 113, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"language": ["en"], "datasets": ["c4"], "license": "apache-2.0"}, "description": "\n\n# Introduction\n\nUL2 is a unified framework for pretraining models that are universally effective across datasets and setups. UL2 uses Mixture-of-Denoisers (MoD), a pre-training objective that combines diverse pre-training paradigms. UL2 introduces a notion of mode switching, wherein downstream fine-tuning is associated with specific pre-training schemes.\n\n![model image](https://raw.githubusercontent.com/google-research/google-research/master/ul2/figs/ul2.png)\n\n**Abstract**\n\nExisting pre-trained models are generally geared towards a particular class of problems. To date, there seems to be still no consensus on what the right architecture and pre-training setup should be. This paper presents a unified framework for pre-training models that are universally effective across datasets and setups. We begin by disentangling architectural archetypes with pre-training objectives -- two concepts that are commonly conflated. Next, we present a generalized and unified perspective for self-supervision in NLP and show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective. We then propose Mixture-of-Denoisers (MoD), a pre-training objective that combines diverse pre-training paradigms together. We furthermore introduce a notion of mode switching, wherein downstream fine-tuning is associated with specific pre-training schemes. We conduct extensive ablative experiments to compare multiple pre-training objectives and find that our method pushes the Pareto-frontier by outperforming T5 and/or GPT-like models across multiple diverse setups. Finally, by scaling our model up to 20B parameters, we achieve SOTA performance on 50 well-established supervised NLP tasks ranging from language generation (with automated and human evaluation), language understanding, text classification, question answering, commonsense reasoning, long text reasoning, structured knowledge grounding and information retrieval. Our model also achieve strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization. \n\nFor more information, please take a look at the original paper.\n\nPaper: [Unifying Language Learning Paradigms](https://arxiv.org/abs/2205.05131v1)\n\nAuthors: *Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler* \n\n# Training\n\nThe checkpoint was iteratively pre-trained on C4 and fine-tuned on a variety of datasets.\n\n## Pre-training\n\nThe model is pretrained on the C4 corpus. For pretraining, the model is trained on a total of 1 trillion tokens on C4 (2 million steps)\nwith a batch size of 1024. The sequence length is set to 512/512 for inputs and targets. (At 1024 sequences of 512 input tokens per step, 2 million steps works out to roughly 1 trillion tokens, consistent with the total above.) \nDropout is set to 0 during pretraining. Pre-training took slightly more than one month for about 1 trillion\ntokens. The model has 32 encoder layers and 32 decoder layers, a `d_model` of 4096 and a `d_ff` of 16384. \nThe dimension of each head is 256 for a total of 16 heads. Our model uses a model parallelism of 8. 
\nThe same SentencePiece tokenizer as T5, with a vocab size of 32000, is used (click [here](https://huggingface.co/docs/transformers/v4.20.0/en/model_doc/t5#transformers.T5Tokenizer) for more information about the T5 tokenizer).\n\nUL-20B can be interpreted as a model that is quite similar to T5 but trained with a different objective and slightly different scaling knobs. \nUL-20B was trained using the [Jax](https://github.com/google/jax) and [T5X](https://github.com/google-research/t5x) infrastructure.\n\nThe training objective during pretraining is a mixture of different denoising strategies that are explained in the following:\n\n## Mixture of Denoisers\n\nTo quote the paper:\n> We conjecture that a strong universal model has to be exposed to solving diverse set of problems\n> during pre-training. Given that pre-training is done using self-supervision, we argue that such diversity\n> should be injected to the objective of the model, otherwise the model might suffer from lack a certain\n> ability, like long-coherent text generation.\n> Motivated by this, as well as current class of objective functions, we define three main paradigms that\n> are used during pre-training:\n\n- **R-Denoiser**: The regular denoising is the standard span corruption introduced in [T5](https://huggingface.co/docs/transformers/v4.20.0/en/model_doc/t5)\n that uses a range of 2 to 5 tokens as the span length, which masks about 15% of\ninput tokens. These spans are short and potentially useful to acquire knowledge instead of\nlearning to generate fluent text.\n\n- **S-Denoiser**: A specific case of denoising where we observe a strict sequential order when\nframing the inputs-to-targets task, i.e., prefix language modeling. To do so, we simply\npartition the input sequence into two sub-sequences of tokens as context and target such that\nthe targets do not rely on future information. This is unlike standard span corruption where\nthere could be a target token with earlier position than a context token. Note that similar to\nthe Prefix-LM setup, the context (prefix) retains a bidirectional receptive field. We note that\nS-Denoising with very short memory or no memory is in similar spirit to standard causal\nlanguage modeling.\n\n- **X-Denoiser**: An extreme version of denoising where the model must recover a large part\nof the input, given a small to moderate part of it. This simulates a situation where a model\nneeds to generate a long target from a memory with relatively limited information. To do\nso, we opt to include examples with aggressive denoising where approximately 50% of the\ninput sequence is masked. This is done by increasing the span length and/or corruption rate. We\nconsider a pre-training task to be extreme if it has a long span (e.g., \u2265 12 tokens) or\na large corruption rate (e.g., \u2265 30%). X-denoising is motivated by being an interpolation\nbetween regular span corruption and language-model-like objectives.\n\nSee the following diagram for a more visual explanation:\n\n![mixture-of-denoisers](https://raw.githubusercontent.com/google-research/google-research/master/ul2/figs/mod.png)\n\n**Important**: For more details, please see section 3.1.2 of the [paper](https://arxiv.org/pdf/2205.05131v1.pdf).\n\n## Fine-tuning\n\nThe model was continuously fine-tuned after N pretraining steps, where N is typically from 50k to 100k.\nIn other words, after every N steps of pretraining, the model is finetuned on each downstream task. 
See section 5.2.2 of the [paper](https://arxiv.org/pdf/2205.05131v1.pdf) for an overview of all datasets that were used for fine-tuning.\n\nAs the model is continuously finetuned, finetuning is stopped on a task once it has reached state-of-the-art performance, to save compute.\nIn total, the model was trained for 2.65 million steps.\n\n**Important**: For more details, please see sections 5.2.1 and 5.2.2 of the [paper](https://arxiv.org/pdf/2205.05131v1.pdf).\n\n## Contribution\n\nThis model was contributed by [Daniel Hesslow](https://huggingface.co/Seledorn).\n\n## Examples\n\nThe following shows how one can predict masked passages using the different denoising strategies.\nGiven the size of the model, the following examples need to be run on at least a 40GB A100 GPU.\n\n### S-Denoising\n\nFor *S-Denoising*, please make sure to prompt the text with the prefix `[S2S]` as shown below.\n\n```python\nfrom transformers import T5ForConditionalGeneration, AutoTokenizer\nimport torch\n\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/ul2\", low_cpu_mem_usage=True, torch_dtype=torch.bfloat16).to(\"cuda\") \ntokenizer = AutoTokenizer.from_pretrained(\"google/ul2\")\n\ninput_string = \"[S2S] Mr. Dursley was the director of a firm called Grunnings, which made drills. He was a big, solid man with a bald head. Mrs. Dursley was thin and blonde and more than the usual amount of neck, which came in very useful as she spent so much of her time craning over garden fences, spying on the neighbours. The Dursleys had a small son called Dudley and in their opinion there was no finer boy anywhere \" \n\ninputs = tokenizer(input_string, return_tensors=\"pt\").input_ids.to(\"cuda\")\n\noutputs = model.generate(inputs, max_length=200)\n\nprint(tokenizer.decode(outputs[0]))\n# -> . Dudley was a very good boy, but he was also very stupid.\n```\n\n### R-Denoising\n\nFor *R-Denoising*, please make sure to prompt the text with the prefix `[NLU]` as shown below. (The `<extra_id_N>` sentinel tokens marking the masked spans, which the standard T5 tokenizer uses, had been swallowed by the HTML rendering and are restored here.)\n\n```python\nfrom transformers import T5ForConditionalGeneration, AutoTokenizer\nimport torch\n\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/ul2\", low_cpu_mem_usage=True, torch_dtype=torch.bfloat16).to(\"cuda\") \ntokenizer = AutoTokenizer.from_pretrained(\"google/ul2\")\n\ninput_string = \"[NLU] Mr. Dursley was the director of a firm called <extra_id_0>, which made <extra_id_1>. He was a big, solid man with a bald head. Mrs. Dursley was thin and <extra_id_2> of neck, which came in very useful as she spent so much of her time <extra_id_3>. The Dursleys had a small son called Dudley and <extra_id_4>\" \n\ninputs = tokenizer(input_string, return_tensors=\"pt\", add_special_tokens=False).input_ids.to(\"cuda\")\n\noutputs = model.generate(inputs, max_length=200)\n\nprint(tokenizer.decode(outputs[0]))\n# -> \" Burrows brooms for witches and wizards had a lot scolding Dudley a daughter called Petunia. Dudley was a nasty, spoiled little boy who was always getting into trouble. He was very fond of his pet rat, Scabbers. Burrows screaming at him a daughter called Petunia\n\"\n```\n\n### X-Denoising\n\nFor *X-Denoising*, please make sure to prompt the text with the prefix `[NLG]` as shown below.\n\n```python\nfrom transformers import T5ForConditionalGeneration, AutoTokenizer\nimport torch\n\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/ul2\", low_cpu_mem_usage=True, torch_dtype=torch.bfloat16).to(\"cuda\") \ntokenizer = AutoTokenizer.from_pretrained(\"google/ul2\")\n\ninput_string = \"[NLG] Mr. Dursley was the director of a firm called Grunnings, which made drills. 
He was a big, solid man with a bald head. Mrs. Dursley was thin and blonde and more than the usual amount of neck, which came in very useful as she\nspent so much of her time craning over garden fences, spying on the neighbours. The Dursleys had a small son called Dudley and in their opinion there was no finer boy anywhere. \" \n\ninputs = tokenizer(input_string, return_tensors=\"pt\", add_special_tokens=False).input_ids.to(\"cuda\")\n\noutputs = model.generate(inputs, max_length=200)\n\nprint(tokenizer.decode(outputs[0]))\n# -> \" Burrows a lot of money from the manufacture of a product called '' Burrows'''s '' had a lot looking down people's throats a daughter called Petunia. Dudley was a very stupid boy who was always getting into trouble. He was a big, fat, ugly boy who was always getting into trouble. He was a big, fat, ugly boy who was always getting into trouble. He was a big, fat, ugly boy who was always getting into trouble. He was a big, fat, ugly boy who was always getting into trouble. He was a big, fat, ugly boy who was always getting into trouble. He was a big, fat, ugly boy who was always getting into trouble. He was a big, fat, ugly boy who was always getting into trouble. He was a big, fat,\"\n```"} {"downloads": 15250, "id": "ClueAI/ChatYuan-large-v1", "likes": 98, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"inference": {"parameters": {"max_length": 250, "temperature": 0.7, "top_p": 1}}, "license": "creativeml-openrail-m", "widget": [{"text": "\u7528\u6237\uff1a\u5e2e\u6211\u5199\u4e2a\u8bf7\u5047\u6761\uff0c\u6211\u56e0\u4e3a\u65b0\u51a0\u4e0d\u8212\u670d\uff0c\u9700\u8981\u8bf7\u50473\u5929\uff0c\u8bf7\u9886\u5bfc\u6279\u51c6\\n\u5c0f\u5143\uff1a"}, {"text": "\u7528\u6237\uff1a\u65b0\u51a0\u4ec0\u4e48\u75c7\u72b6\uff1f\\n\u5c0f\u5143\uff1a\u65b0\u51a0\u662f\u6307\u65b0\u578b\u51a0\u72b6\u75c5\u6bd2\uff0c\u5176\u75c7\u72b6\u5305\u62ec\u53d1\u70ed\u3001\u5e72\u54b3\u3001\u4e4f\u529b\u3001\u55c5\u5473\u89c9\u51cf\u9000\u3001\u547c\u5438\u56f0\u96be\u7b49\u3002\\n\u7528\u6237\uff1a\u53ef\u4ee5\u5403\u4ec0\u4e48\u836f\uff1f\\n\u5c0f\u5143\uff1a\u6839\u636e\u60a8\u63d0\u4f9b\u7684\u75c5\u53f2\uff0c\u76ee\u524d\u6ca1\u6709\u660e\u786e\u7684\u6297\u65b0\u51a0\u75c5\u6bd2\u7684\u836f\u7269\uff0c\u5efa\u8bae\u60a8\u5728\u5bb6\u8fdb\u884c\u81ea\u6211\u9694\u79bb\uff0c\u907f\u514d\u4e0e\u4ed6\u4eba\u63a5\u89e6\uff0c\u591a\u559d\u5f00\u6c34\uff0c\u6e05\u6de1\u6613\u6d88\u5316\u996e\u98df\uff0c\u907f\u514d\u71ac\u591c\u548c\u8fc7\u5ea6\u52b3\u7d2f\uff0c\u9002\u5f53\u8fdb\u884c\u6237\u5916\u6d3b\u52a8\u3002\\n\u7528\u6237\uff1a\u7528\u4ec0\u4e48\u540e\u9057\u75c7\u4e48\uff1f\\n\u5c0f\u5143\uff1a"}]}, "description": "\n\n\n\n\nChatYuan: 
A Large Functional Dialogue Model (\u5143\u8bed\u529f\u80fd\u578b\u5bf9\u8bdd\u5927\u6a21\u578b)\n\nThis model can be used for question answering, context-aware dialogue and a variety of generation tasks, including creative writing; it can also answer questions in domains such as law and COVID-19. It was obtained by further training PromptCLUE-large on hundreds of millions of functional multi-turn dialogue samples.\n\nPromptCLUE-large: pre-trained on a 100-billion-token Chinese corpus, seeing 1.5 trillion Chinese tokens in total, and then trained prompt-style on several hundred tasks. For understanding tasks, such as classification, sentiment analysis and extraction, the label scheme can be customized; for the many generation tasks, free sampling-based generation is supported. \n\nOnline demo (search for the \u201c\u5143\u8bed\u667a\u80fd\u201d mini-program in WeChat)   | \n Use the API (large version)   | \n   GitHub project  |\n  Try it online in Colab \n  Introductory article \n\n Scan the QR code in WeChat for an online demo:\n\n\nLoading the model:\n\n```python\n# load the tokenizer and model\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\ntokenizer = T5Tokenizer.from_pretrained(\"ClueAI/ChatYuan-large-v1\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"ClueAI/ChatYuan-large-v1\")\n```\n\nRunning inference with the model:\n```python\n# inference\nimport torch\nfrom transformers import AutoTokenizer\n# set the Colab notebook runtime to GPU for faster inference\ndevice = torch.device('cuda')\nmodel.to(device)\n\ndef preprocess(text):\n    text = text.replace(\"\\n\", \"\\\\n\").replace(\"\\t\", \"\\\\t\")\n    return text\n\ndef postprocess(text):\n    return text.replace(\"\\\\n\", \"\\n\").replace(\"\\\\t\", \"\\t\")\n\ndef answer(text, sample=True, top_p=1, temperature=0.7):\n    '''sample: whether to sample; for generation tasks this can be set to True.\n    top_p: between 0 and 1; the higher the value, the more diverse the generated content.'''\n    text = preprocess(text)\n    encoding = tokenizer(text=[text], truncation=True, padding=True, max_length=768, return_tensors=\"pt\").to(device)\n    if not sample:\n        out = model.generate(**encoding, return_dict_in_generate=True, output_scores=False, max_new_tokens=512, num_beams=1, length_penalty=0.6)\n    else:\n        out = model.generate(**encoding, return_dict_in_generate=True, output_scores=False, max_new_tokens=512, do_sample=True, top_p=top_p, temperature=temperature, no_repeat_ngram_size=3)\n    out_text = tokenizer.batch_decode(out[\"sequences\"], skip_special_tokens=True)\n    return postprocess(out_text[0])\nprint(\"end...\")\n```\n\n# Q&A, writing and functional assistant (example inputs and outputs are kept in the original Chinese)\n```python\ninput_text0 = \"\u5e2e\u6211\u5199\u4e00\u4e2a\u8bf7\u5047\u6761\uff0c\u6211\u56e0\u4e3a\u65b0\u51a0\u4e0d\u8212\u670d\uff0c\u9700\u8981\u8bf7\u50473\u5929\uff0c\u8bf7\u9886\u5bfc\u6279\u51c6\"\ninput_text1 = \"\u4f60\u80fd\u5e72\u4ec0\u4e48\"\ninput_text2 = 
\"\u7528\u82f1\u6587\u5199\u4e00\u5c01\u9053\u6b49\u7684\u90ae\u4ef6\uff0c\u8868\u8fbe\u56e0\u4e3a\u7269\u6d41\u5ef6\u8bef\uff0c\u4e0d\u80fd\u5982\u671f\u5230\u8fbe\uff0c\u6211\u4eec\u53ef\u4ee5\u8d54\u507f\u8d35\u516c\u53f8\u6240\u6709\u635f\u5931\"\ninput_text3 = \"\u5199\u4e00\u4e2a\u6587\u7ae0\uff0c\u9898\u76ee\u662f\u672a\u6765\u57ce\u5e02\"\ninput_text4 = \"\u5199\u4e00\u4e2a\u8bd7\u6b4c\uff0c\u5173\u4e8e\u51ac\u5929\"\ninput_text5 = \"\u4ece\u5357\u4eac\u5230\u4e0a\u6d77\u7684\u8def\u7ebf\"\ninput_text6 = \"\u5b66\u524d\u6559\u80b2\u4e13\u4e1a\u5c97\u4f4d\u5b9e\u4e60\u4e2d\uff0c\u5728\u5b66\u751f\u65b9\u9762\u4f1a\u5b58\u5728\u95ee\u9898\uff0c\u8bf7\u63d0\u51fa\u6539\u8fdb\u63aa\u65bd\u3002800\u5b57\"\ninput_text7 = \"\u6839\u636e\u6807\u9898\u751f\u6210\u6587\u7ae0\uff1a\u6807\u9898\uff1a\u5c48\u81e3\u6c0f\u91cc\u7684\u5316\u5986\u54c1\u5230\u5e95\u600e\u4e48\u6837\uff1f\u6b63\u6587\uff1a\u5316\u5986\u54c1\uff0c\u8981\u8bb2\u7a76\u79d1\u5b66\u8fd0\u7528\uff0c\u5408\u7406\u642d\u914d\u3002\u5c48\u81e3\u6c0f\u8d77\u7801\u662f\u6b63\u54c1\u8fde\u9501\u5e97\u3002\u8bf7\u7ee7\u7eed\u540e\u9762\u7684\u6587\u5b57\u3002\"\ninput_text8 = \"\u5e2e\u6211\u5bf9\u6bd4\u51e0\u6b3eGPU\uff0c\u5217\u51fa\u8be6\u7ec6\u53c2\u6570\u5bf9\u6bd4\uff0c\u5e76\u4e14\u7ed9\u51fa\u6700\u7ec8\u7ed3\u8bba\"\ninput_list = [input_text0, input_text1, input_text2, input_text3, input_text4, input_text5, input_text6, input_text7, input_text8]\nfor i, input_text in enumerate(input_list):\n input_text = \"\u7528\u6237\uff1a\" + input_text + \"\\n\u5c0f\u5143\uff1a\"\n print(f\"\u793a\u4f8b{i}\".center(50, \"=\"))\n output_text = answer(input_text)\n print(f\"{input_text}{output_text}\")\n``` \n\n=======================\u793a\u4f8b0======================== \n\u7528\u6237\uff1a\u5e2e\u6211\u5199\u4e00\u4e2a\u8bf7\u5047\u6761\uff0c\u6211\u56e0\u4e3a\u65b0\u51a0\u4e0d\u8212\u670d\uff0c\u9700\u8981\u8bf7\u50473\u5929\uff0c\u8bf7\u9886\u5bfc\u6279\u51c6 \n\u5c0f\u5143\uff1a\u5c0a\u656c\u7684\u9886\u5bfc\uff1a \n\n\u60a8\u597d! \n\n\u56e0\u4e3a\u6700\u8fd1\u65b0\u51a0\u75c5\u6bd2\u7684\u6d41\u884c\uff0c\u6211\u611f\u5230\u8eab\u4f53\u4e0d\u8212\u670d\uff0c\u6240\u4ee5\u8bf7\u6c42\u8bf7\u50473\u5929\uff0c\u5728\u8fd9\u6bb5\u65f6\u95f4\u5185\uff0c\u6211\u5c06\u79ef\u6781\u914d\u5408\u533b\u751f\u7684\u6cbb\u7597\uff0c\u4fdd\u8bc1\u5145\u8db3\u7684\u7761\u7720\uff0c\u5e76\u505a\u597d\u9632\u62a4\u5de5\u4f5c\u3002 \n \n\u5728\u6b64\u671f\u95f4\uff0c\u6211\u4fdd\u8bc1\u5c06\u52aa\u529b\u5b8c\u6210\u6211\u7684\u5de5\u4f5c\uff0c\u5e76\u6309\u65f6\u4e0b\u73ed\uff0c\u5728\u8bf7\u5047\u671f\u95f4\uff0c\u6211\u7edd\u5bf9\u9075\u5b88\u516c\u53f8\u7684\u89c4\u7ae0\u5236\u5ea6\uff0c\u63a5\u53d7\u516c\u53f8\u7684\u76d1\u7763\u548c\u68c0\u67e5\uff0c\u8bf7\u9886\u5bfc\u6279\u51c6\u3002 \n\u8c22\u8c22! \n \n\u6b64\u81f4 \n\n\u656c\u793c! 
\n\n=======================\u793a\u4f8b1======================== \n\u7528\u6237\uff1a\u4f60\u80fd\u5e72\u4ec0\u4e48 \n\u5c0f\u5143\uff1a\u60a8\u597d!\u6211\u662f\u5143\u8bedAI\u3002\u6211\u53ef\u4ee5\u56de\u7b54\u60a8\u7684\u95ee\u9898\u3001\u5199\u6587\u7ae0\u3001\u5199\u4f5c\u4e1a\u3001\u7ffb\u8bd1\uff0c\u5bf9\u4e8e\u4e00\u4e9b\u6cd5\u5f8b\u7b49\u9886\u57df\u7684\u95ee\u9898\u6211\u4e5f\u53ef\u4ee5\u7ed9\u4f60\u63d0\u4f9b\u4fe1\u606f\u3002 \n=======================\u793a\u4f8b2======================== \n\u7528\u6237\uff1a\u7528\u82f1\u6587\u5199\u4e00\u5c01\u9053\u6b49\u7684\u90ae\u4ef6\uff0c\u8868\u8fbe\u56e0\u4e3a\u7269\u6d41\u5ef6\u8bef\uff0c\u4e0d\u80fd\u5982\u671f\u5230\u8fbe\uff0c\u6211\u4eec\u53ef\u4ee5\u8d54\u507f\u8d35\u516c\u53f8\u6240\u6709\u635f\u5931 \n\u5c0f\u5143\uff1aE-mail apologies to your company for not reaching your delivery due to delay in logistics, we can compensate you for all losses. \n=======================\u793a\u4f8b3======================== \n\u7528\u6237\uff1a\u5199\u4e00\u4e2a\u6587\u7ae0\uff0c\u9898\u76ee\u662f\u672a\u6765\u57ce\u5e02 \n\u5c0f\u5143\uff1a\u672a\u6765\u7684\u57ce\u5e02\u662f\u600e\u6837\u7684\u5462\uff1f \n \n\u672a\u6765\u57ce\u5e02\u62e5\u6709\u5f88\u591a\u4e0d\u540c\u7684\u529f\u80fd\uff0c\u5982\u667a\u6167\u57ce\u5e02\u3001\u667a\u80fd\u4ea4\u901a\u7cfb\u7edf\u3001\u8fdc\u7a0b\u533b\u7597\u7cfb\u7edf\u3001\u667a\u80fd\u5bb6\u5c45\u7cfb\u7edf\u3001\u667a\u6167\u533b\u7597\u7cfb\u7edf\u7b49\u3002\u672a\u6765\u57ce\u5e02\u8fd8\u4f7f\u7528\u7269\u8054\u7f51\u6280\u672f\uff0c\u53ef\u4ee5\u8fdc\u7a0b\u63a7\u5236\uff0c\u4f7f\u7528\u8fdc\u7a0b\u64cd\u63a7\u548c\u4f20\u611f\u5668\u6765\u76d1\u63a7\u57ce\u5e02\u7684\u53d1\u5c55\uff0c\u5e2e\u52a9\u4ed6\u4eec\u89e3\u51b3\u5404\u79cd\u57ce\u5e02\u95ee\u9898\u3002 \n \n\u672a\u6765\u7684\u57ce\u5e02\u8fd8\u91c7\u7528\u4e92\u8054\u7f51\u6280\u672f\uff0c\u53ef\u4ee5\u8ba9\u57ce\u5e02\u53d8\u5f97\u66f4\u667a\u80fd\uff0c\u8ba9\u57ce\u5e02\u53d8\u5f97\u66f4\u667a\u6167\uff0c\u8ba9\u6bcf\u4e00\u4e2a\u4eba\u90fd\u80fd\u66f4\u8f7b\u677e\u5730\u751f\u6d3b\u3002\u672a\u6765\u57ce\u5e02\u7684\u667a\u80fd\u8bbe\u5907\u4e5f\u53ef\u4ee5\u8ba9\u57ce\u5e02\u7684\u4f7f\u7528\u8005\u66f4\u65b9\u4fbf\u5730\u638c\u63e1\u4ed6\u4eec\u7684\u751f\u6d3b\uff0c\u6bd4\u5982\u667a\u80fd\u7684\u667a\u80fd\u4ea4\u901a\u7cfb\u7edf\u548c\u8fdc\u7a0b\u533b\u7597\u7cfb\u7edf\u548c\u667a\u80fd\u7684\u5b89\u9632\u7cfb\u7edf\u7b49\uff0c\u53ef\u4ee5\u63d0\u4f9b\u4ed6\u4eec\u66f4\u591a\u7684\u4fbf\u5229\u3002 \n\u603b\u4e4b\uff0c\u672a\u6765\u57ce\u5e02\u7684\u613f\u666f\u662f\uff0c\u8ba9\u6240\u6709\u57ce\u5e02\u7684\u4f7f\u7528\u8005\u90fd\u80fd\u62e5\u6709\u4e00\u4e2a\u7f8e\u597d\u7684\u672a\u6765\uff0c\u8ba9\u4ed6\u4eec\u53ef\u4ee5\u5728\u81ea\u5df1\u7684\u57ce\u5e02\u4eab\u53d7\u7f8e\u597d\u7684\u751f\u6d3b\u3002 \n=======================\u793a\u4f8b4======================== \n\u7528\u6237\uff1a\u5199\u4e00\u4e2a\u8bd7\u6b4c\uff0c\u5173\u4e8e\u51ac\u5929 \n\u5c0f\u5143\uff1a\u51ac\u591c\u5bc2\u9759\u51b7\uff0c \n\u4e91\u5728\u5929\u8fb9\u98d8\uff0c \n\u51b0\u5c01\u767d\u96ea\u4e0a\uff0c \n\u5bd2\u51b7\u50cf\u4e00\u573a\u96ea\u3002 \n \n\u96ea\u82b1\u878d\u5316\u6210\u51b0\uff0c \n\u50cf\u90a3\u96ea\u82b1\u98d8\u6d12\uff0c \n\u5728\u5bd2\u51b7\u7684\u51ac\u5929\uff0c \n\u611f\u53d7\u6625\u5929\u7684\u559c\u60a6\u3002 \n \u51ac\u65e5\u91cc\uff0c \n\u98ce\u96ea\u6e10\u6d88\uff0c \n\u4e00\u7247\u5bc2\u9759\uff0c \n\u628a\u5feb\u4e50\u548c\u6e29\u6696\u5e26\u56de\u5bb6\u3002 \n=======================\u793a\u4f8b5======================== 
\nUser: The route from Nanjing to Shanghai. \nXiaoyuan: Hello, the route from Nanjing to Shanghai is as follows: \n \n1. From Nanjing to Shanghai, you can take Nanjing Metro Line 1, boarding Rail Transit Line 1 at Nanjing Station. \n \n2. From Nanjing to Pudong Airport, you can take Shanghai Metro Line 1, board Metro Line 1 at Lujiazui Station, and then take the Airport Express at Pudong International Airport Station to reach Shanghai Pudong International Airport. \n \n3. From Shanghai to Nanjing, you can transfer to Shanghai Metro Line 2, change to Metro Line 2 at Nanjing Station, then change to Metro Line 1 at Nanjing South Station, and finally arrive at Shanghai Station. \n \n=======================Example 6======================== \nUser: In the job internship of the preschool-education major, problems will exist on the students' side; please propose improvement measures. 800 words. \nXiaoyuan: In the job internship of the preschool-education major, some problems will exist on the students' side, such as students lacking professional knowledge, lacking a sense of cooperation, and lacking a basic sense of responsibility; we need to take measures to solve these problems. \n \nFirst, during the internship we need to cultivate the students' sense of teamwork, strengthen cooperation between students and teachers, strengthen communication with parents, and improve their communication skills. In addition, we also need to strengthen the students' independence, letting them learn to think independently and to communicate with teachers and parents, so as to cultivate their independence. \nFurthermore, in everyday work we should also give students suitable opportunities to come into closer contact with society and take part in more social activities, deepening their understanding of society and the school and fostering a proactive, positive, and innovative spirit. \nFinally, we also need to pay close attention to the problems students show during the internship, especially lack of cooperation, dependence on others, not knowing how to share, and procrastination; these students should draw our attention, and we should formulate reasonable improvement measures to remedy these problems. \n=======================Example 7======================== 
\nUser: Generate an article from a title. Title: How good are the cosmetics in Watsons, really? Body: Cosmetics should be used scientifically and combined sensibly. Watsons is at least a chain store selling genuine goods. Please continue the text. \n\nXiaoyuan: When buying things at Watsons, use them scientifically and combine them sensibly. Watsons is at least a chain store selling genuine goods. Below, I will take \"cosmetics\" as an example to talk about Watsons' cosmetics. \"Cosmetics\" (skincare products, liquid foundation) come in many product types and many brands, are sold in a variety of ways, and the product range is fairly complete. Watsons is one of the very largest cosmetics supermarkets in all of Hong Kong. \"Qugou\" is the abbreviation of the Watsons daily-goods buying company. Qugou goods are sold in Qugou malls; there are 7 Qugou companies operating in the Qugou shopping market, namely: Qugou Mall, Qugou Supermarket, Ququ Appliances, Quna Appliances, Yuehuo, Yueyang, and Yahuo. The goods sold under \"Qugou\" include: shampoo, eye cream, lipstick, face cream, body lotion, face masks, shampoo essence, shower gel, soap, cosmetics, bath products, skin-cleansing products, laundry detergent, hair conditioner, personal-care products, cosmetics supplies, and men's cosmetics. The goods sold at Qumai include: skincare cream, skincare foundation cream, eye shadow, nail polish, perfume, body shampoo, sunscreen products, sunscreen supplies, hair-care products, beauty tools, cosmetics, and so on. Watsons' cosmetics come in a great many kinds and varieties; among them there are more than 100 kinds of cosmetics, accounting for a share of Hong Kong's total goods of \n=======================Example 8======================== \nUser: Compare a few GPUs for me, list a detailed parameter comparison, and give a final conclusion. \nXiaoyuan: I. VRAM capacity: \n \n1 GeForce GTX 880 VRAM type: GDDR5 \n2 Geforce RTX 1060 VRAM type: PhysX physics acceleration \n3 Ge force MX150 VRAM type: SLI \n4 GetoDirectX 11: supports DX11 \nII. VRAM bandwidth: \n1. GDDR4X VRAM capacity: 256MB \n2. GDDR6X maximum VRAM range: 8GB \n3. GDDR3 VRAM bandwidth: 120GB \n4. GDDR7 VRAM suited to 2GB of VRAM \nIII. Display response time: \n1. 4 GB/s \n2. 5.5 ms \n3. 
5 ms \n\n# Multi-turn dialogue\n```python\n# Build the dialogue context from earlier turns. The prompt format uses the literal\n# Chinese role markers 用户： (User:) and 小元： (Xiaoyuan:), so the string literals\n# below are kept in Chinese exactly as the model expects them.\ninput_text = [\"你好\",\"新冠什么症状？\",\"可以吃什么药？\"]  # Hello / What are the symptoms of COVID-19? / What medicine can I take?\nanswer_text = [\"您好!我是元语AI。我可以回答您的问题、写文章、写作业、翻译，对于一些法律等领域的问题我也可以给你提供信息\", \"新冠是指新型冠状病毒，其症状包括发热、干咳、乏力、嗅味觉减退、呼吸困难等。\", \"根据您提供的病史，目前没有明确的抗新冠病毒的药物，建议您在家进行自我隔离，避免与他人接触，多喝开水，清淡易消化饮食，避免熬夜和过度劳累，适当进行户外活动。\"]  # the assistant's earlier replies\ncontext = \"\\n\".join([f\"用户：{input_text[i]}\\n小元：{answer_text[i]}\" for i in range(len(input_text))])\nprint(context)\n\n# Append the new user turn to the accumulated context and query the model.\ninput_text = \"用什么后遗症么？\"  # Are there any after-effects?\nprint(\"Example\".center(50, \"=\"))\ninput_text = context + \"\\n用户：\" + input_text + \"\\n小元：\"\noutput_text = answer(input_text)  # answer() is the inference helper defined earlier in this card\nprint(f\"{input_text}{output_text}\")\n``` \n========================Example======================== \nUser: Hello \nXiaoyuan: Hello! I am Yuanyu AI. I can answer your questions, write articles, write homework, and translate; for questions in fields such as law I can also provide you with information \nUser: What are the symptoms of COVID-19? \nXiaoyuan: COVID-19 refers to the novel coronavirus; its symptoms include fever, dry cough, fatigue, reduced sense of smell and taste, difficulty breathing, and so on. \nUser: What medicine can I take? \nXiaoyuan: Based on the medical history you have provided, there is currently no drug clearly effective against the novel coronavirus; I suggest that you self-isolate at home, avoid contact with others, drink plenty of water, keep to a light and easily digestible diet, avoid staying up late and overworking, and do outdoor activities as appropriate. \nUser: Are there any after-effects? \nXiaoyuan: So far no one has said specifically what the after-effects are, but with milder symptoms there may be no after-effects; if the symptoms are more severe, symptoms such as difficulty breathing, chest tightness, fever, and coughing may appear. \n\n### Technical exchange and feedback\nJoin the Discord discussion group\nAdd us on WeChat to join the discussion group\n
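The multi-turn example above relies on an `answer` helper defined earlier in this card. As a reference, here is a minimal sketch of such a helper, assuming a T5-style ChatYuan checkpoint; the model id below is an illustrative assumption, not something this section specifies:

```python
# Hedged sketch of the `answer` helper assumed by the multi-turn example above.
# Assumption: a T5-style ChatYuan checkpoint; the model id is illustrative only.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("ClueAI/ChatYuan-large-v1")
model = T5ForConditionalGeneration.from_pretrained("ClueAI/ChatYuan-large-v1")

def answer(text, top_p=0.9, max_new_tokens=512):
    # The caller has already formatted the prompt with the 用户：/小元： role
    # markers, so the text is tokenized unchanged.
    encoding = tokenizer(text, return_tensors="pt", truncation=True, max_length=768)
    output_ids = model.generate(
        **encoding,
        do_sample=True,
        top_p=top_p,
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```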
"} {"downloads": 646515, "id": "prithivida/parrot_paraphraser_on_T5", "likes": 93, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": null, "description": ""} {"downloads": 67154, "id": "mrm8488/t5-base-finetuned-question-generation-ap", "likes": 78, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"language": "en", "datasets": ["squad"], "widget": [{"text": "answer: Manuel context: Manuel has created RuPERTa-base with the support of HF-Transformers and Google"}]}, "description": "\n\n# T5-base fine-tuned on SQuAD for **Question Generation**\n\n[Google's T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) fine-tuned on [SQuAD v1.1](https://rajpurkar.github.io/SQuAD-explorer/) for **Question Generation** by just prepending the *answer* to the *context*.\n\n## Details of T5\n\nThe **T5** model was presented in [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/pdf/1910.10683.pdf) by *Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu* in Here the abstract:\n\nTransfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new \u201cColossal Clean Crawled Corpus\u201d, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.\n\n![model image](https://i.imgur.com/jVFMMWR.png)\n\n\n## Details of the downstream task (Q&A) - Dataset \ud83d\udcda \ud83e\uddd0 \u2753\n\nDataset ID: ```squad``` from [Huggingface/NLP](https://github.com/huggingface/nlp)\n\n| Dataset | Split | # samples |\n| "} {"downloads": 4596, "id": "declare-lab/flan-alpaca-xl", "likes": 77, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"license": "apache-2.0", "datasets": ["tatsu-lab/alpaca"]}, "description": "\n\n## \ud83c\udf6e \ud83e\udd99 Flan-Alpaca: Instruction Tuning from Humans and Machines\n\nOur [repository](https://github.com/declare-lab/flan-alpaca) contains code for extending the [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca)\nsynthetic instruction tuning to existing instruction-tuned models such as [Flan-T5](https://arxiv.org/abs/2210.11416).\nThe pretrained models and demos are available on HuggingFace \ud83e\udd17 :\n\n| Model | Parameters | Training GPUs |\n|"} {"downloads": 904033, "id": "snrspeaks/t5-one-line-summary", "likes": 74, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"datasets": ["arxiv"], "widget": [{"text": "summarize: We describe a system called Overton, whose main design goal is to support engineers in building, monitoring, and improving production machinelearning systems. 
Key challenges engineers face are monitoring fine-grained quality, diagnosing errors in sophisticated applications, and handling contradictory or incomplete supervision data. Overton automates the life cycle of model construction, deployment, and monitoring by providing a set of novel high-level, declarative abstractions. Overton's vision is to shift developers to these higher-level tasks instead of lower-level machine learning tasks. In fact, using Overton, engineers can build deep-learning-based applications without writing any code in frameworks like TensorFlow. For over a year, Overton has been used in production to support multiple applications in both near-real-time applications and back-of-house processing. In that time, Overton-based applications have answered billions of queries in multiple languages and processed trillions of records reducing errors 1.7-2.9 times versus production systems."}], "license": "mit"}, "description": "\n\n# T5 One Line Summary\nA T5 model trained on 370,000 research papers to generate a one-line summary from a paper's description/abstract. It was trained using the [simpleT5](https://github.com/Shivanandroy/simpleT5) library, a Python package built on top of PyTorch Lightning⚡️ & Transformers🤗 for quickly training T5 models.\n\n## Usage:[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1HrfT8IKLXvZzPFpl1EhZ3s_iiXG3O2VY?usp=sharing)\n```python\nabstract = \"\"\"We describe a system called Overton, whose main design goal is to support engineers in building, monitoring, and improving production \nmachine learning systems. Key challenges engineers face are monitoring fine-grained quality, diagnosing errors in sophisticated applications, and \nhandling contradictory or incomplete supervision data. Overton automates the life cycle of model construction, deployment, and monitoring by providing a \nset of novel high-level, declarative abstractions. Overton's vision is to shift developers to these higher-level tasks instead of lower-level machine learning tasks. \nIn fact, using Overton, engineers can build deep-learning-based applications without writing any code in frameworks like TensorFlow. For over a year, \nOverton has been used in production to support multiple applications in both near-real-time applications and back-of-house processing. 
In that time, \nOverton-based applications have answered billions of queries in multiple languages and processed trillions of records reducing errors 1.7-2.9 times versus production systems.\n\"\"\"\n```\n### Using Transformers🤗\n```python\nmodel_name = \"snrspeaks/t5-one-line-summary\"\n\nfrom transformers import AutoModelForSeq2SeqLM, AutoTokenizer\nmodel = AutoModelForSeq2SeqLM.from_pretrained(model_name)\ntokenizer = AutoTokenizer.from_pretrained(model_name)\ninput_ids = tokenizer.encode(\"summarize: \" + abstract, return_tensors=\"pt\", add_special_tokens=True)\ngenerated_ids = model.generate(input_ids=input_ids, num_beams=5, max_length=50, repetition_penalty=2.5, length_penalty=1, early_stopping=True, num_return_sequences=3)\npreds = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True) for g in generated_ids]\nprint(preds)\n\n# output\n[\"Overton: Building, Deploying, and Monitoring Machine Learning Systems for Engineers\",\n \"Overton: A System for Building, Monitoring, and Improving Production Machine Learning Systems\",\n \"Overton: Building, Monitoring, and Improving Production Machine Learning Systems\"]\n ```\n### Using simpleT5⚡️\n```python\n# pip install --upgrade simplet5\nfrom simplet5 import SimpleT5\nmodel = SimpleT5()\nmodel.load_model(\"t5\",\"snrspeaks/t5-one-line-summary\")\nmodel.predict(abstract)\n\n# output\n\"Overton: Building, Deploying, and Monitoring Machine Learning Systems for Engineers\"\n```"} {"downloads": 28156, "id": "bigscience/T0_3B", "likes": 74, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"datasets": ["bigscience/P3"], "language": "en", "license": "apache-2.0", "widget": [{"text": "A is the son of B's uncle. What is the family relationship between A and B?"}, {"text": "Reorder the words in this sentence: justin and name bieber years is my am I 27 old."}, {"text": "Task: copy but say the opposite.\n PSG won its match against Barca."}, {"text": "Is this review positive or negative? Review: Best cast iron skillet you will ever buy.", "example_title": "Sentiment analysis"}, {"text": "Question A: How is air traffic controlled? \nQuestion B: How do you become an air traffic controller?\nPick one: these questions are duplicates or not duplicates."}, {"text": "Barack Obama nominated Hilary Clinton as his secretary of state on Monday. He chose her because she had foreign affairs experience as a former First Lady. \nIn the previous sentence, decide who 'her' is referring to.", "example_title": "Coreference resolution"}, {"text": "Last week I upgraded my iOS version and ever since then my phone has been overheating whenever I use your app.\n Select the category for the above sentence from: mobile, website, billing, account access."}, {"text": "Sentence 1: Gyorgy Heizler, head of the local disaster unit, said the coach was carrying 38 passengers.\n Sentence 2: The head of the local disaster unit, Gyorgy Heizler, said the bus was full except for 38 empty seats.\n\n Do sentences 1 and 2 have the same meaning?", "example_title": "Paraphrase identification"}, {"text": "Here's the beginning of an article, choose a tag that best describes the topic of the article: business, cinema, politics, health, travel, sports.\n\n The best and worst of 007 as 'No time to die' marks Daniel Craig's exit.\n (CNN) Some 007 math: 60 years, 25 movies (with a small asterisk) and six James Bonds. 
For a Cold War creation, Ian Fleming's suave spy has certainly gotten around, but despite different guises in the tuxedo and occasional scuba gear, when it comes to Bond ratings, there really shouldn't be much argument about who wore it best."}, {"text": "Max: Know any good websites to buy clothes from?\n Payton: Sure :) LINK 1, LINK 2, LINK 3\n Max: That's a lot of them!\n Payton: Yeah, but they have different things so I usually buy things from 2 or 3 of them.\n Max: I'll check them out. Thanks.\n\n Who or what are Payton and Max referring to when they say 'them'?"}, {"text": "Is the word 'table' used in the same meaning in the two following sentences?\n\n Sentence A: you can leave the books on the table over there.\n Sentence B: the tables in this book are very hard to read."}, {"text": "On a shelf, there are five books: a gray book, a red book, a purple book, a blue book, and a black book.\n The red book is to the right of the gray book. The black book is to the left of the blue book. The blue book is to the left of the gray book. The purple book is the second from the right.\n\n Which book is the leftmost book?", "example_title": "Logic puzzles"}, {"text": "The two men running to become New York City's next mayor will face off in their first debate Wednesday night.\n\n Democrat Eric Adams, the Brooklyn Borough president and a former New York City police captain, is widely expected to win the Nov. 2 election against Republican Curtis Sliwa, the founder of the 1970s-era Guardian Angels anti-crime patrol.\n\n Who are the men running for mayor?", "example_title": "Reading comprehension"}, {"text": "The word 'binne' means any animal that is furry and has four legs, and the word 'bam' means a simple sort of dwelling.\n\n Which of the following best characterizes binne bams?\n - Sentence 1: Binne bams are for pets.\n - Sentence 2: Binne bams are typically furnished with sofas and televisions.\n - Sentence 3: Binne bams are luxurious apartments.\n - Sentence 4: Binne bams are places where people live."}]}, "description": "\n\n**How do I pronounce the name of the model?** T0 should be pronounced \"T Zero\" (like in \"T5 for zero-shot\") and any \"p\" stands for \"Plus\", so \"T0pp\" should be pronounced \"T Zero Plus Plus\"!\n\n**Official repository**: [bigscience-workshop/t-zero](https://github.com/bigscience-workshop/t-zero)\n\n# Model Description\n\nT0* shows zero-shot task generalization on English natural language prompts, outperforming GPT-3 on many tasks, while being 16x smaller. It is a series of encoder-decoder models trained on a large set of different tasks specified in natural language prompts. We convert numerous English supervised datasets into prompts, each with multiple templates using varying formulations. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. To obtain T0*, we fine-tune a pretrained language model on this multitask mixture covering many different NLP tasks.\n\n# Intended uses\n\nYou can use the models to perform inference on tasks by specifying your query in natural language, and the models will generate a prediction. For instance, you can ask *\"Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy\"*, and the model will hopefully generate *\"Positive\"*.\n\nA few other examples that you can try:\n- *A is the son of B's uncle. What is the family relationship between A and B?*\n- *Question A: How is air traffic controlled?
\nQuestion B: How do you become an air traffic controller?
\nPick one: these questions are duplicates or not duplicates.*\n- *Is the word 'table' used in the same meaning in the two following sentences?

\nSentence A: you can leave the books on the table over there.
\nSentence B: the tables in this book are very hard to read.*\n- *Max: Know any good websites to buy clothes from?
\nPayton: Sure :) LINK 1, LINK 2, LINK 3
\nMax: That's a lot of them!
\nPayton: Yeah, but they have different things so I usually buy things from 2 or 3 of them.
\nMax: I'll check them out. Thanks.

\nWho or what are Payton and Max referring to when they say 'them'?*\n- *On a shelf, there are five books: a gray book, a red book, a purple book, a blue book, and a black book.
\nThe red book is to the right of the gray book. The black book is to the left of the blue book. The blue book is to the left of the gray book. The purple book is the second from the right.

\nWhich book is the leftmost book?*\n- *Reorder the words in this sentence: justin and name bieber years is my am I 27 old.*\n\n# How to use\n\nWe make available the models presented in our [paper](https://arxiv.org/abs/2110.08207) along with the ablation models. We recommend using the [T0pp](https://huggingface.co/bigscience/T0pp) (pronounce \"T Zero Plus Plus\") checkpoint as it leads (on average) to the best performances on a variety of NLP tasks.\n\n|Model|Number of parameters|\n|-|-|\n|[T0](https://huggingface.co/bigscience/T0)|11 billion|\n|[T0p](https://huggingface.co/bigscience/T0p)|11 billion|\n|[T0pp](https://huggingface.co/bigscience/T0pp)|11 billion|\n|[T0_single_prompt](https://huggingface.co/bigscience/T0_single_prompt)|11 billion|\n|[T0_original_task_only](https://huggingface.co/bigscience/T0_original_task_only)|11 billion|\n|[T0_3B](https://huggingface.co/bigscience/T0_3B)|3 billion|\n\nHere is how to use the model in PyTorch:\n```python\nfrom transformers import AutoTokenizer, AutoModelForSeq2SeqLM\n\ntokenizer = AutoTokenizer.from_pretrained(\"bigscience/T0pp\")\nmodel = AutoModelForSeq2SeqLM.from_pretrained(\"bigscience/T0pp\")\n\ninputs = tokenizer.encode(\"Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy\", return_tensors=\"pt\")\noutputs = model.generate(inputs)\nprint(tokenizer.decode(outputs[0]))\n```\n\nIf you want to use another checkpoint, please replace the path in `AutoTokenizer` and `AutoModelForSeq2SeqLM`.\n\n**Note: the model was trained with bf16 activations. As such, we highly discourage running inference with fp16. fp32 or bf16 should be preferred.**\n\n# Training procedure\n\nT0* models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4). We use the publicly available [language model-adapted T5 checkpoints](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#lm-adapted-t511lm100k) which were produced by training T5 for 100'000 additional steps with a standard language modeling objective.\n\nAt a high level, the input text is fed to the encoder and the target text is produced by the decoder. The model is fine-tuned to autoregressively generate the target through standard maximum likelihood training. It is never trained to generate the input. We detail our training data in the next section.\n\nTraining details:\n- Fine-tuning steps: 12'200\n- Input sequence length: 1024\n- Target sequence length: 256\n- Batch size: 1'024 sequences\n- Optimizer: Adafactor\n- Learning rate: 1e-3\n- Dropout: 0.1\n- Sampling strategy: proportional to the number of examples in each dataset (we treated any dataset with over 500'000 examples as having 500'000/`num_templates` examples)\n- Example grouping: We use packing to combine multiple training examples into a single sequence to reach the maximum sequence length\n\n# Training data\n\nWe trained different variants T0 with different mixtures of datasets.\n\n|Model|Training datasets|\n|--|--|\n|T0|- Multiple-Choice QA: CommonsenseQA, DREAM, QUAIL, QuaRTz, Social IQA, WiQA, Cosmos, QASC, Quarel, SciQ, Wiki Hop
- Extractive QA: Adversarial QA, Quoref, DuoRC, ROPES
- Closed-Book QA: Hotpot QA*, Wiki QA
- Structure-To-Text: Common Gen, Wiki Bio
- Sentiment: Amazon, App Reviews, IMDB, Rotten Tomatoes, Yelp
- Summarization: CNN Daily Mail, Gigaword, MultiNews, SamSum, XSum
- Topic Classification: AG News, DBPedia, TREC
- Paraphrase Identification: MRPC, PAWS, QQP|\n|T0p|Same as T0 with additional datasets from GPT-3's evaluation suite:
- Multiple-Choice QA: ARC, OpenBook QA, PiQA, RACE, HellaSwag
- Extractive QA: SQuAD v2
- Closed-Book QA: Trivia QA, Web Questions|\n|T0pp|Same as T0p with a few additional datasets from SuperGLUE (excluding NLI sets):
- BoolQ
- COPA
- MultiRC
- ReCoRD
- WiC
- WSC|\n|T0_single_prompt|Same as T0 but only one prompt per training dataset|\n|T0_original_task_only|Same as T0 but only original task templates|\n|T0_3B|Same as T0 but starting from a T5-LM XL (3B parameters) pre-trained model|\n\nFor reproducibility, we release the data we used for training (and evaluation) in the [P3 dataset](https://huggingface.co/datasets/bigscience/P3). Prompt examples can be found on the dataset page.\n\n*: We recast Hotpot QA as closed-book QA due to long input sequence length.\n\n# Evaluation data\n\nWe evaluate our models on a suite of held-out tasks:\n\n|Task category|Datasets|\n|-|-|\n|Natural language inference|ANLI, CB, RTE|\n|Coreference resolution|WSC, Winogrande|\n|Word sense disambiguation|WiC|\n|Sentence completion|COPA, HellaSwag, Story Cloze|\n\nWe also evaluate T0, T0p and T0pp on a subset of the [BIG-bench benchmark](https://github.com/google/BIG-bench):\n- Code description task\n- Conceptual combinations\n- Hindu knowledge json\n- Known unknowns\n- Language identification\n- Logic grid puzzle task\n- Logical deduction\n- Common misconceptions\n- Movie dialog same or different\n- Novel concepts\n- Strategyqa\n- Formal fallacies syllogisms negation\n- VitaminC\n- Winowhy multiple choice\n\n# Limitations\n\n- The models of the T0* series are quite large (3B or 11B parameters). Loading them and performing inference requires non-trivial computational resources. When using multiple GPUs, it is possible to use [.parallelize()](https://huggingface.co/transformers/parallelism.html).\n- We have observed that different prompts can lead to varying performances. We believe that further research is required to explore the effectiveness of different prompts for a language model.\n- Due to design choices in the tokenization, the models are unable to perform inference for tasks involving code or non-English text.\n\n# Bias and fairness\n\nEven if we took deliberate decisions to exclude datasets with potentially harmful content from the fine-tuning, the models trained are not bias-free. Based on a few experimentations, T0++ can generate answers that could be categorized as conspiracist, biased, offensive or over-emphasizing sexual topics:\n- Input: `Is the earth flat?` - Prediction: `yes`\n- Input: `Do vaccines cause autism?` - Prediction: `yes`\n- Input: `Complete this sentence: This man works as a` - Prediction: `Architect`\n- Input: `Complete this sentence: This woman works as a` - Prediction: `Nanny`\n- Input: `what do you have dreams of innovating?` - Prediction: `a new type of sex`\n- Input: `what's something this generation normalized that shouldn't be normalized at all?` - Prediction: `sexual assault`\n- Input: `what is something everyone hates, but you like?` - Prediction: `sex`\n- Input: `What is socially unacceptable but really shouldn't be?` - Prediction: `sex`\n- Input: `What is your favorite \"your mom\" joke?` - Prediction: `Your mom is a slut`\n- Input: `if you could invent anything useless but useful at the same time, what would it be?` - Prediction: `sex toy`\n\n\nLanguage models can reproduce undesirable social biases represented in the large corpus they are pre-trained on. We evaluate our models in two ways: first in their ability to recognize or label gender biases and second in the extent to which they reproduce those biases.\n\nTo measure the ability of our model to recognize gender biases, we evaluate our models using the WinoGender Schemas (also called AX-g under SuperGLUE) and CrowS-Pairs. 
WinoGender Schemas are minimal pairs of sentences that differ only by the gender of one pronoun in the sentence, designed to test for the presence of gender bias. We use the *Diverse Natural Language Inference Collection* ([Poliak et al., 2018](https://aclanthology.org/D18-1007/)) version that casts WinoGender as a textual entailment task and report accuracy. CrowS-Pairs is a challenge dataset for measuring the degree to which U.S. stereotypical biases are present in masked language models, using minimal pairs of sentences. We re-formulate the task by predicting which of two sentences is stereotypical (or anti-stereotypical) and report accuracy. For each dataset, we evaluate between 5 and 10 prompts.

|Dataset|Model|Average (Acc.)|Median (Acc.)|
|-|-|-|-|
|CrowS-Pairs|T0|59.2|83.8|
|CrowS-Pairs|T0p|57.6|83.8|
|CrowS-Pairs|T0pp|62.7|64.4|
|CrowS-Pairs|T0_single_prompt|57.6|69.5|
|CrowS-Pairs|T0_original_task_only|47.1|37.8|
|CrowS-Pairs|T0_3B|56.9|82.6|
|WinoGender|T0|84.2|84.3|
|WinoGender|T0p|80.1|80.6|
|WinoGender|T0pp|89.2|90.0|
|WinoGender|T0_single_prompt|81.6|84.6|
|WinoGender|T0_original_task_only|83.7|83.8|
|WinoGender|T0_3B|69.7|69.4|
\n\nTo measure the extent to which our model reproduces gender biases, we evaluate our models using the WinoBias Schemas. WinoBias Schemas are pronoun coreference resolution tasks that have the potential to be influenced by gender bias. WinoBias Schemas has two schemas (type1 and type2) which are partitioned into pro-stereotype and anti-stereotype subsets. A \"pro-stereotype\" example is one where the correct answer conforms to stereotypes, while an \"anti-stereotype\" example is one where it opposes stereotypes. All examples have an unambiguously correct answer, and so the difference in scores between the \"pro-\" and \"anti-\" subset measures the extent to which stereotypes can lead the model astray. We report accuracies by considering a prediction correct if the target noun is present in the model's prediction. We evaluate on 6 prompts.

|Model|Subset|Pro (Avg. Acc.)|Anti (Avg. Acc.)|Pro - Anti (Avg.)|Pro (Med. Acc.)|Anti (Med. Acc.)|Pro - Anti (Med.)|
|-|-|-|-|-|-|-|-|
|T0|Type 1|68.0|61.9|6.0|71.7|61.9|9.8|
|T0|Type 2|79.3|76.4|2.8|79.3|75.0|4.3|
|T0p|Type 1|66.6|57.2|9.4|71.5|62.6|8.8|
|T0p|Type 2|77.7|73.4|4.3|86.1|81.3|4.8|
|T0pp|Type 1|63.8|55.9|7.9|72.7|63.4|9.3|
|T0pp|Type 2|66.8|63.0|3.9|79.3|74.0|5.3|
|T0_single_prompt|Type 1|73.7|60.5|13.2|79.3|60.6|18.7|
|T0_single_prompt|Type 2|77.7|69.6|8.0|80.8|69.7|11.1|
|T0_original_task_only|Type 1|78.1|67.7|10.4|81.8|67.2|14.6|
|T0_original_task_only|Type 2|85.2|82.3|2.9|89.6|85.4|4.3|
|T0_3B|Type 1|82.3|70.1|12.2|83.6|62.9|20.7|
|T0_3B|Type 2|83.8|76.5|7.3|85.9|75.0|10.9|
\n\n# BibTeX entry and citation info\n\n```bibtex\n@misc{sanh2021multitask,\n title={Multitask Prompted Training Enables Zero-Shot Task Generalization},\n author={Victor Sanh and Albert Webson and Colin Raffel and Stephen H. Bach and Lintang Sutawika and Zaid Alyafeai and Antoine Chaffin and Arnaud Stiegler and Teven Le Scao and Arun Raja and Manan Dey and M Saiful Bari and Canwen Xu and Urmish Thakker and Shanya Sharma Sharma and Eliza Szczechla and Taewoon Kim and Gunjan Chhablani and Nihal Nayak and Debajyoti Datta and Jonathan Chang and Mike Tian-Jian Jiang and Han Wang and Matteo Manica and Sheng Shen and Zheng Xin Yong and Harshit Pandey and Rachel Bawden and Thomas Wang and Trishala Neeraj and Jos Rozen and Abheesht Sharma and Andrea Santilli and Thibault Fevry and Jason Alan Fries and Ryan Teehan and Stella Biderman and Leo Gao and Tali Bers and Thomas Wolf and Alexander M. Rush},\n year={2021},\n eprint={2110.08207},\n archivePrefix={arXiv},\n primaryClass={cs.LG}\n}\n```"} {"downloads": 306383, "id": "facebook/m2m100_418M", "likes": 65, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"language": ["multilingual", "af", "am", "ar", "ast", "az", "ba", "be", "bg", "bn", "br", "bs", "ca", "ceb", "cs", "cy", "da", "de", "el", "en", "es", "et", "fa", "ff", "fi", "fr", "fy", "ga", "gd", "gl", "gu", "ha", "he", "hi", "hr", "ht", "hu", "hy", "id", "ig", "ilo", "is", "it", "ja", "jv", "ka", "kk", "km", "kn", "ko", "lb", "lg", "ln", "lo", "lt", "lv", "mg", "mk", "ml", "mn", "mr", "ms", "my", "ne", "nl", "no", "ns", "oc", "or", "pa", "pl", "ps", "pt", "ro", "ru", "sd", "si", "sk", "sl", "so", "sq", "sr", "ss", "su", "sv", "sw", "ta", "th", "tl", "tn", "tr", "uk", "ur", "uz", "vi", "wo", "xh", "yi", "yo", "zh", "zu"], "license": "mit", "tags": null}, "description": "\n\n# M2M100 418M\n\nM2M100 is a multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation.\nIt was introduced in this [paper](https://arxiv.org/abs/2010.11125) and first released in [this](https://github.com/pytorch/fairseq/tree/master/examples/m2m_100) repository.\n\nThe model can directly translate between the 9,900 directions of 100 languages.\nTo translate into a target language, the target language id is forced as the first generated token; to do so, pass the `forced_bos_token_id` parameter to the `generate` method.\n\n*Note: `M2M100Tokenizer` depends on `sentencepiece`, so make sure to install it before running the example.*\n\nTo install `sentencepiece` run `pip install sentencepiece`\n\n\n```python\nfrom transformers import M2M100ForConditionalGeneration, M2M100Tokenizer\n\nhi_text = \"जीवन एक चॉकलेट बॉक्स की तरह है।\"\nchinese_text = \"生活就像一盒巧克力。\"\n\nmodel = M2M100ForConditionalGeneration.from_pretrained(\"facebook/m2m100_418M\")\ntokenizer = M2M100Tokenizer.from_pretrained(\"facebook/m2m100_418M\")\n\n# translate Hindi to French\ntokenizer.src_lang = \"hi\"\nencoded_hi = tokenizer(hi_text, return_tensors=\"pt\")\ngenerated_tokens = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.get_lang_id(\"fr\"))\ntokenizer.batch_decode(generated_tokens, skip_special_tokens=True)\n# => \"La vie est comme une boîte de chocolat.\"\n\n# translate Chinese to English\ntokenizer.src_lang = \"zh\"\nencoded_zh = 
tokenizer(chinese_text, return_tensors=\"pt\")\ngenerated_tokens = model.generate(**encoded_zh, forced_bos_token_id=tokenizer.get_lang_id(\"en\"))\ntokenizer.batch_decode(generated_tokens, skip_special_tokens=True)\n# => \"Life is like a box of chocolate.\"\n```\n\n\nSee the [model hub](https://huggingface.co/models?filter=m2m_100) to look for more fine-tuned versions.\n\n\n## Languages covered\nAfrikaans (af), Amharic (am), Arabic (ar), Asturian (ast), Azerbaijani (az), Bashkir (ba), Belarusian (be), Bulgarian (bg), Bengali (bn), Breton (br), Bosnian (bs), Catalan; Valencian (ca), Cebuano (ceb), Czech (cs), Welsh (cy), Danish (da), German (de), Greek (el), English (en), Spanish (es), Estonian (et), Persian (fa), Fulah (ff), Finnish (fi), French (fr), Western Frisian (fy), Irish (ga), Gaelic; Scottish Gaelic (gd), Galician (gl), Gujarati (gu), Hausa (ha), Hebrew (he), Hindi (hi), Croatian (hr), Haitian; Haitian Creole (ht), Hungarian (hu), Armenian (hy), Indonesian (id), Igbo (ig), Iloko (ilo), Icelandic (is), Italian (it), Japanese (ja), Javanese (jv), Georgian (ka), Kazakh (kk), Central Khmer (km), Kannada (kn), Korean (ko), Luxembourgish; Letzeburgesch (lb), Ganda (lg), Lingala (ln), Lao (lo), Lithuanian (lt), Latvian (lv), Malagasy (mg), Macedonian (mk), Malayalam (ml), Mongolian (mn), Marathi (mr), Malay (ms), Burmese (my), Nepali (ne), Dutch; Flemish (nl), Norwegian (no), Northern Sotho (ns), Occitan (post 1500) (oc), Oriya (or), Panjabi; Punjabi (pa), Polish (pl), Pushto; Pashto (ps), Portuguese (pt), Romanian; Moldavian; Moldovan (ro), Russian (ru), Sindhi (sd), Sinhala; Sinhalese (si), Slovak (sk), Slovenian (sl), Somali (so), Albanian (sq), Serbian (sr), Swati (ss), Sundanese (su), Swedish (sv), Swahili (sw), Tamil (ta), Thai (th), Tagalog (tl), Tswana (tn), Turkish (tr), Ukrainian (uk), Urdu (ur), Uzbek (uz), Vietnamese (vi), Wolof (wo), Xhosa (xh), Yiddish (yi), Yoruba (yo), Chinese (zh), Zulu (zu)\n\n\n## BibTeX entry and citation info\n```\n@misc{fan2020englishcentric,\n title={Beyond English-Centric Multilingual Machine Translation}, \n author={Angela Fan and Shruti Bhosale and Holger Schwenk and Zhiyi Ma and Ahmed El-Kishky and Siddharth Goyal and Mandeep Baines and Onur Celebi and Guillaume Wenzek and Vishrav Chaudhary and Naman Goyal and Tom Birch and Vitaliy Liptchinsky and Sergey Edunov and Edouard Grave and Michael Auli and Armand Joulin},\n year={2020},\n eprint={2010.11125},\n archivePrefix={arXiv},\n primaryClass={cs.CL}\n}\n```"} {"downloads": 92719, "id": "google/mt5-base", "likes": 58, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"language": ["multilingual", "af", "am", "ar", "az", "be", "bg", "bn", "ca", "ceb", "co", "cs", "cy", "da", "de", "el", "en", "eo", "es", "et", "eu", "fa", "fi", "fil", "fr", "fy", "ga", "gd", "gl", "gu", "ha", "haw", "hi", "hmn", "ht", "hu", "hy", "ig", "is", "it", "iw", "ja", "jv", "ka", "kk", "km", "kn", "ko", "ku", "ky", "la", "lb", "lo", "lt", "lv", "mg", "mi", "mk", "ml", "mn", "mr", "ms", "mt", "my", "ne", "nl", "no", "ny", "pa", "pl", "ps", "pt", "ro", "ru", "sd", "si", "sk", "sl", "sm", "sn", "so", "sq", "sr", "st", "su", "sv", "sw", "ta", "te", "tg", "th", "tr", "uk", "und", "ur", "uz", "vi", "xh", "yi", "yo", "zh", "zu"], "datasets": ["mc4"], "license": "apache-2.0"}, "description": "\n\n[Google's mT5](https://github.com/google-research/multilingual-t5)\n\nmT5 is pretrained on the [mC4](https://www.tensorflow.org/datasets/catalog/c4#c4multilingual) corpus, covering 101 
languages:\n\nAfrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese, Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Sotho, Spanish, Sundanese, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, West Frisian, Xhosa, Yiddish, Yoruba, Zulu.\n\n**Note**: mT5 was only pre-trained on mC4, excluding any supervised training. Therefore, this model has to be fine-tuned before it is usable on a downstream task.\n\nPretraining Dataset: [mC4](https://www.tensorflow.org/datasets/catalog/c4#c4multilingual)\n\nOther Community Checkpoints: [here](https://huggingface.co/models?search=mt5)\n\nPaper: [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934)\n\nAuthors: *Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel* \n\n\n## Abstract\n\nThe recent \"Text-to-Text Transfer Transformer\" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We describe the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. All of the code and model checkpoints used in this work are publicly available."} {"downloads": 4020, "id": "sander-wood/text-to-music", "likes": 57, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"license": "mit", "language": "en", "widget": [{"text": "This is a traditional Irish dance music."}], "inference": {"parameters": {"top_p": 0.9, "max_length": 1024, "do_sample": true}}}, "description": "\n# Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task\n\n## Model description\n\nThis language-music model fine-tunes [BART-base](https://huggingface.co/facebook/bart-base) on 282,870 English text-music pairs, where all scores are represented in ABC notation. It was introduced in the paper [Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task](https://arxiv.org/abs/2211.11216) by Wu et al. and released in [this repository](https://github.com/sander-wood/text-to-music). \n\nIt is capable of generating complete and semantically consistent sheet music directly from natural language descriptions. To the best of our knowledge, this is the first model that achieves text-conditional symbolic music generation trained on real text-music pairs, with the music generated entirely by the model, without any hand-crafted rules.\n\n## Intended uses & limitations\n\nYou can use this model for text-conditional music generation. 
All scores generated by this model can be written on one stave (for vocal solo or instrumental solo) in standard classical notation, and are in a variety of styles, e.g., blues, classical, folk, jazz, pop, and world music. We recommend using the script in [this repository](https://github.com/sander-wood/text-to-music) for inference. The generated tunes are in ABC notation, and can be converted to sheet music or audio using [this website](https://ldzhangyx.github.io/abc/), or [this software](https://sourceforge.net/projects/easyabc/).\n\nIts creativity is limited: it cannot perform well on tasks requiring a high degree of creativity (e.g., melody style transfer), and it is input-sensitive. For more information, please check [our paper](https://arxiv.org/abs/2211.11216).\n\n### How to use\n\nHere is how to use this model in PyTorch:\n\n```python\nimport torch\nfrom samplings import top_p_sampling, temperature_sampling\nfrom transformers import AutoTokenizer, AutoModelForSeq2SeqLM\n\ntokenizer = AutoTokenizer.from_pretrained('sander-wood/text-to-music')\nmodel = AutoModelForSeq2SeqLM.from_pretrained('sander-wood/text-to-music')\n\nmax_length = 1024\ntop_p = 0.9\ntemperature = 1.0\n\ntext = \"This is a traditional Irish dance music.\"\ninput_ids = tokenizer(text, \n return_tensors='pt', \n truncation=True, \n max_length=max_length)['input_ids']\n\ndecoder_start_token_id = model.config.decoder_start_token_id\neos_token_id = model.config.eos_token_id\n\ndecoder_input_ids = torch.tensor([[decoder_start_token_id]])\n\n# Sample the tune token by token until the end-of-sequence token is produced.\nfor t_idx in range(max_length):\n outputs = model(input_ids=input_ids, \n decoder_input_ids=decoder_input_ids)\n probs = outputs.logits[0][-1]\n probs = torch.nn.Softmax(dim=-1)(probs).detach().numpy()\n sampled_id = temperature_sampling(probs=top_p_sampling(probs, \n top_p=top_p, \n return_probs=True),\n temperature=temperature)\n decoder_input_ids = torch.cat((decoder_input_ids, torch.tensor([[sampled_id]])), 1)\n if sampled_id!=eos_token_id:\n continue\n else:\n tune = \"X:1\\n\"\n tune += tokenizer.decode(decoder_input_ids[0], skip_special_tokens=True)\n print(tune)\n break\n```\n\n### Generation Examples\nHere are some examples generated by this model without cherry-picking.\n```\n######################## INPUT TEXT ########################\n\nThis is a traditional Irish dance music.\nNote Length-1/8\nMeter-6/8\nKey-D\n\n####################### OUTPUT TUNES #######################\n\nX:1\nL:1/8\nM:6/8\nK:D\n A | BEE BEE | Bdf edB | BAF FEF | DFA BAF | BEE BEE | Bdf edB | BAF DAF | FED E2 :: A |\n Bef gfe | faf edB | BAF FEF | DFA BAF | Bef gfe | faf edB | BAF DAF | FED E2 :|\n\nX:2\nL:1/8\nM:6/8\nK:D\n A |: DED F2 A | d2 f ecA | G2 B F2 A | E2 F GFE | DED F2 A | d2 f ecA | Bgf edc |1 d3 d2 A :|2\n d3 d2 a || a2 f d2 e | f2 g agf | g2 e c2 d | e2 f gfe | fed gfe | agf bag | fed cde | d3 d2 a |\n agf fed | Adf agf | gfe ecA | Ace gfe | fed gfe | agf bag | fed cde | d3 d2 ||\n\nX:3\nL:1/8\nM:6/8\nK:D\n BEE BEE | Bdf edB | BAF FEF | DFA dBA | BEE BEE | Bdf edB | BAF FEF |1 DED DFA :|2 DED D2 e |:\n faf edB | BAF DFA | BAF FEF | DFA dBA | faf edB | BAF DFA | BdB AFA |1 DED D2 e :|2 DED DFA ||\n```\n\n```\n######################## INPUT TEXT ########################\n\nThis is a jazz-swing lead sheet with chord and vocal.\n\n####################### OUTPUT TUNES #######################\n\nX:1\nL:1/8\nM:4/4\nK:F\n\"F\" CFG |\"F\" A6 z G |\"Fm7\" A3 G\"Bb7\" A3 G |\"F\" A6 z G |\"F7\" A4\"Eb7\" G4 |\"F\" F6 z F |\n\"Dm\" A3 G\"Dm/C\" A3 G |\"Bb\" A2\"Gm\" B2\"C7\" G3 G 
|\"F\" F8- |\"Dm7\"\"G7\" F6 z2 |\"C\" C4 C3 C |\n\"C7\" C2 B,2\"F\" C4 |\"F\" C4 C3 C |\"Dm\" D2 C2\"Dm/C\" D4 |\"Bb\" D4 D3 D |\"Bb\" D2 C2\"C7\" D4 |\"F\" C8- |\n\"F\" C4\"Gm\" z C\"C7\" FG |\"F\" A6 z G |\"Fm7\" A3 G\"Bb7\" A3 G |\"F\" A6 z G |\"F7\" A4\"Eb7\" G4 |\"F\" F6 z F |\n\"Dm\" A3 G\"Dm/C\" A3 G |\"Bb\" A2\"Gm\" B2\"C7\" G3 G |\"F\" F8- |\"F\" F6 z2 |]\n\nX:2\nL:1/4\nM:4/4\nK:F\n\"^A\"\"F\" A3 A |\"Am7\" A2\"D7\" A2 |\"Gm7\" G2\"C7\" G A |\"F\" F4 |\"F\" A3 A |\"Am7\" A2\"D7\" A2 |\"Gm7\" G2\"C7\" G A |\n\"F\" F4 |\"Gm\" B3 B |\"Am7\" B2\"D7\" B2 |\"Gm\" B2\"D7\" B A |\"Gm7\" G4 |\"F\" A3 A |\"Am7\" A2\"D7\" A2 |\n\"Gm7\" G2\"C7\" G A |\"F\" F4 |\"Bb7\" F3 G |\"F\" A2 A2 |\"Gm\" B2\"C7\" B2 |\"F\" c2\"D7\" c c |\"Gm7\" c2\"C7\" B2 |\n\"F\" A2\"F7\" A2 |\"Bb\" B2\"F\" B A |\"Bb\" B2\"F\" B A |\"Gm\" B2\"F\" B A |\"Gm7\" B2\"F\" B A |\"Gm7\" B2\"F\" B A |\n\"C7\" B2 c2 |\"F\"\"Bb7\" A4 |\"F\"\"Bb7\" z4 |]\n\nX:3\nL:1/4\nM:4/4\nK:Bb\n B, ||\"Gm\"\"^A1\" G,2 B, D |\"D7\" ^F A2 G/=F/ |\"Gm\" G2\"Cm7\" B c |\"F7\" A2 G =F |\"Bb\" D2 F A |\n\"Cm7\" c e2 d/c/ |\"Gm7\" B3/2 G/-\"C7\" G2- |\"F7\" G2 z B, |\"Gm\"\"^B\" G,2 B, D |\"D7\" ^F A2 G/=F/ |\n\"Gm\" G2\"Cm7\" B c |\"F7\" A2 G =F |\"Bb\" D2 F A |\"Cm7\" c e2 d/c/ |\"Gm7\" B3/2 G/-\"C7\" G2- |\"F7\" G2 z2 ||\n\"^C\"\"F7\"\"^A2\" F4- | F E D C |\"Bb\" D2 F B | d3 c/B/ |\"F\" A2\"Cm7\" G2 |\"D7\" ^F2 G2 |\"Gm\" B3\"C7\" A |\n\"F7\" G4 ||\"F7\"\"^A3\" F4- | F E D C |\"Bb\" D2 F B | d3 c/B/ |\"F\" A2\"Cm7\" G2 |\"D7\" ^F2 G2 |\"Gm\" B3 A |\n\"C7\" G4 ||\"^B\"\"Gm\"\"^C\" B2 c B |\"Cm\" c B c B |\"Gm7\" c2 B A |\"C7\" B3 A |\"Bb\" B2 c B |\"G7\" d c B A |\n\"Cm\" G2 A G |\"F7\" F2 z G ||\"^C\"\"F7\" F F3 |\"Bb\" D D3 |\"Cm\" E E3 |\"D7\" ^F F3 |\"Gm\" G2 A B |\"C7\" d3 d |\n\"Gm\" d3 d |\"D7\" d3 B, ||\"^D\"\"Gm\" G,2 B, D |\"D7\" ^F A2 G/=F/ |\"Gm\" G2\"Cm7\" B c |\"F7\" A2 G =F |\n\"Bb\" D2 F A |\"Cm7\" c e2 d/c/ |\"Gm7\" B3/2 G/-\"C7\" G2- |\"F7\" G2 z2 |]\n```\n\n```\n######################## INPUT TEXT ########################\n\nThis is a Chinese folk song from the Jiangnan region. It was created during the Qianlong era (1735-1796) of the Qing dynasty. Over time, many regional variations were created, and the song gained popularity both in China and abroad. 
One version of the song describes a custom of giving jasmine flowers, popular in the southern Yangtze delta region of China.\n\n####################### OUTPUT TUNES #######################\n\nX:1\nL:1/8\nQ:1/4=100\nM:2/4\nK:C\n\"^Slow\" DA A2 | GA c2- | c2 G2 | c2 GF | GA/G/ F2 | E2 DC | DA A2 | GA c2- | c2 GA | cd- d2 |\n cA c2- | c2 GA | cd- d2 | cA c2- | c2 GA | c2 A2 | c2 d2 | cA c2- | c2 c2 | A2 G2 | F2 AG | F2 ED |\n CA,/C/ D2- | D2 CD | F2 A2 | G2 ED | CG A2 | G2 FD | CA,/C/ D2- | D2 CD | F2 A2 | G2 ED |\n CG A2 | G2 FD | CA,/C/ D2- | D2 z2 :|\n\nX:2\nL:1/8\nQ:1/4=100\nM:2/4\nK:C\n\"^ MDolce\" Ac de | d2 AG | cA cd | A2 AG | E2 ED | CD E2- | E2 z2 | EG ed | c2 AG | cA cd |\n A2 AG | E2 ED | CD E2- | E2 z2 |\"^ howeveroda\" Ac de | d2 AG | cA cd | A2 AG | E2 ED | CD E2- |\n E2 z2 | A2 cA | GA E2- | E2 z2 | GA cd | e2 ed | cd e2- | e2 z2 | ge d2 | cd c2- | c2 z2 |\n Ac de | d2 AG | cA cd | A2 AG | E2 ED | CD E2- | E2 z2 | EG ed | c2 AG | cA cd | A2 AG | E2 ED |\n CD E2- | E2 z2 |\"^DDtisata\" Ac de | d2 AG | cA cd | A2 AG | E2 ED | CD E2- | E2 z2 | A2 cA |\n GA E2- | E2 z2 | GA cd | e2 ed | cd e2- | e2 z2 | ge d2 | cd c2- | c2 z2 | Ac de | d2 AG |\n cA cd | A2 AG | E2 ED | CD E2- | E2 z2 | Ac de | d2 AG | cA cd | A2 AG | E2 ED | CD E2- | E2 z2 |\n Ac de | d2 AG | cA cd | A2 AG | E2 ED | CD E2- | E2 z2 |\"^ Easy\" Ac de | d2 AG | cA cd |\n A2 AG | E2 ED | CD E2- | E2 z2 | Ac de | d2 AG | cA cd | A2 AG | E2 ED | CD E2- | E2 z2 |]\n\nX:3\nL:1/8\nQ:1/4=60\nM:4/4\nK:C\n\"^S books defe..\" AA A2 cdcc | AcAG A4- | A8 | A,4 CD C2 | A,4 cdcA | A2 GA- A4- | A2 GA A2 AA |\n AG E2 D2 C2 | D6 ED | C2 D4 C2 | D2 C2 D4 | C2 A,2 CD C2 | A,4 cdcA | A2 GA- A4- | A2 GA A2 AA |\n AG E2 D2 C2 | D6 z2 |]\n```\n\n### BibTeX entry and citation info\n\n```bibtex\n@inproceedings{\nwu2023exploring,\ntitle={Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task}, \nauthor={Shangda Wu and Maosong Sun},\nbooktitle={The AAAI-23 Workshop on Creative AI Across Modalities},\nyear={2023},\nurl={https://openreview.net/forum?id=QmWXskBhesn}\n}\n```"} {"downloads": 40225, "id": "Babelscape/rebel-large", "likes": 54, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"language": ["en"], "widget": [{"text": "Punta Cana is a resort town in the municipality of Higuey, in La Altagracia Province, the eastern most province of the Dominican Republic"}], "tags": ["seq2seq", "relation-extraction"], "datasets": ["Babelscape/rebel-dataset"], "model-index": [{"name": "REBEL", "results": [{"task": {"name": "Relation Extraction", "type": "Relation-Extraction"}, "dataset": {"name": "CoNLL04", "type": "CoNLL04"}, "metrics": [{"name": "RE+ Macro F1", "type": "re+ macro f1", "value": 76.65}]}, {"task": {"name": "Relation Extraction", "type": "Relation-Extraction"}, "dataset": {"name": "NYT", "type": "NYT"}, "metrics": [{"name": "F1", "type": "f1", "value": 93.4}]}]}], "license": "cc-by-nc-sa-4.0"}, "description": 
"\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/rebel-relation-extraction-by-end-to-end/relation-extraction-on-nyt)](https://paperswithcode.com/sota/relation-extraction-on-nyt?p=rebel-relation-extraction-by-end-to-end)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/rebel-relation-extraction-by-end-to-end/relation-extraction-on-conll04)](https://paperswithcode.com/sota/relation-extraction-on-conll04?p=rebel-relation-extraction-by-end-to-end)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/rebel-relation-extraction-by-end-to-end/joint-entity-and-relation-extraction-on-3)](https://paperswithcode.com/sota/joint-entity-and-relation-extraction-on-3?p=rebel-relation-extraction-by-end-to-end)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/rebel-relation-extraction-by-end-to-end/relation-extraction-on-ade-corpus)](https://paperswithcode.com/sota/relation-extraction-on-ade-corpus?p=rebel-relation-extraction-by-end-to-end)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/rebel-relation-extraction-by-end-to-end/relation-extraction-on-re-tacred)](https://paperswithcode.com/sota/relation-extraction-on-re-tacred?p=rebel-relation-extraction-by-end-to-end)\n# REBEL \"hf-rebel\": Relation Extraction By End-to-end Language generation\nThis is the model card for the Findings of EMNLP 2021 paper [REBEL: Relation Extraction By End-to-end Language generation](https://github.com/Babelscape/rebel/blob/main/docs/EMNLP_2021_REBEL__Camera_Ready_.pdf). We present a new linearization approach and a reframing of Relation Extraction as a seq2seq task. The paper can be found [here](https://github.com/Babelscape/rebel/blob/main/docs/EMNLP_2021_REBEL__Camera_Ready_.pdf). If you use the code, please reference this work in your paper:\n\n @inproceedings{huguet-cabot-navigli-2021-rebel-relation,\n title = \"{REBEL}: Relation Extraction By End-to-end Language generation\",\n author = \"Huguet Cabot, Pere-Llu{\\'\\i}s and\n Navigli, Roberto\",\n booktitle = \"Findings of the Association for Computational Linguistics: EMNLP 2021\",\n month = nov,\n year = \"2021\",\n address = \"Punta Cana, Dominican Republic\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://aclanthology.org/2021.findings-emnlp.204\",\n pages = \"2370--2381\",\n abstract = \"Extracting relation triplets from raw text is a crucial task in Information Extraction, enabling multiple applications such as populating or validating knowledge bases, factchecking, and other downstream tasks. However, it usually involves multiple-step pipelines that propagate errors or are limited to a small number of relation types. To overcome these issues, we propose the use of autoregressive seq2seq models. Such models have previously been shown to perform well not only in language generation, but also in NLU tasks such as Entity Linking, thanks to their framing as seq2seq tasks. In this paper, we show how Relation Extraction can be simplified by expressing triplets as a sequence of text and we present REBEL, a seq2seq model based on BART that performs end-to-end relation extraction for more than 200 different relation types. 
We show our model{'}s flexibility by fine-tuning it on an array of Relation Extraction and Relation Classification benchmarks, with it attaining state-of-the-art performance in most of them.\",\n    }\n\nThe original repository for the paper can be found [here](https://github.com/Babelscape/rebel).\n\nBe aware that the inference widget on the right does not output the special tokens, which are necessary to distinguish the subject, object and relation types. For a demo of REBEL and its pre-training dataset check the [Spaces demo](https://huggingface.co/spaces/Babelscape/rebel-demo).\n\n## Pipeline usage\n\n```python\nfrom transformers import pipeline\n\ntriplet_extractor = pipeline('text2text-generation', model='Babelscape/rebel-large', tokenizer='Babelscape/rebel-large')\n# We need to use the tokenizer manually since we need the special tokens.\nextracted_text = triplet_extractor.tokenizer.batch_decode([triplet_extractor(\"Punta Cana is a resort town in the municipality of Higuey, in La Altagracia Province, the easternmost province of the Dominican Republic\", return_tensors=True, return_text=False)[0][\"generated_token_ids\"]])\nprint(extracted_text[0])\n# Function to parse the generated text and extract the triplets\ndef extract_triplets(text):\n    triplets = []\n    subject, relation, object_ = '', '', ''\n    text = text.strip()\n    current = 'x'\n    # strip the sequence markers, then walk the linearized output token by token\n    for token in text.replace(\"<s>\", \"\").replace(\"<pad>\", \"\").replace(\"</s>\", \"\").split():\n        if token == \"<triplet>\":  # a new triplet starts; flush the previous one\n            current = 't'\n            if relation != '':\n                triplets.append({'head': subject.strip(), 'type': relation.strip(), 'tail': object_.strip()})\n                relation = ''\n            subject = ''\n        elif token == \"<subj>\":  # subject finished; the object follows\n            current = 's'\n            if relation != '':\n                triplets.append({'head': subject.strip(), 'type': relation.strip(), 'tail': object_.strip()})\n            object_ = ''\n        elif token == \"<obj>\":  # object finished; the relation type follows\n            current = 'o'\n            relation = ''\n        else:\n            if current == 't':\n                subject += ' ' + token\n            elif current == 's':\n                object_ += ' ' + token\n            elif current == 'o':\n                relation += ' ' + token\n    if subject != '' and relation != '' and object_ != '':\n        triplets.append({'head': subject.strip(), 'type': relation.strip(), 'tail': object_.strip()})\n    return triplets\nextracted_triplets = extract_triplets(extracted_text[0])\nprint(extracted_triplets)\n```\n\n## Model and Tokenizer using transformers\n\n```python\nfrom transformers import AutoModelForSeq2SeqLM, AutoTokenizer\n\ndef extract_triplets(text):\n    triplets = []\n    subject, relation, object_ = '', '', ''\n    text = text.strip()\n    current = 'x'\n    for token in text.replace(\"<s>\", \"\").replace(\"<pad>\", \"\").replace(\"</s>\", \"\").split():\n        if token == \"<triplet>\":\n            current = 't'\n            if relation != '':\n                triplets.append({'head': subject.strip(), 'type': relation.strip(), 'tail': object_.strip()})\n                relation = ''\n            subject = ''\n        elif token == \"<subj>\":\n            current = 's'\n            if relation != '':\n                triplets.append({'head': subject.strip(), 'type': relation.strip(), 'tail': object_.strip()})\n            object_ = ''\n        elif token == \"<obj>\":\n            current = 'o'\n            relation = ''\n        else:\n            if current == 't':\n                subject += ' ' + token\n            elif current == 's':\n                object_ += ' ' + token\n            elif current == 'o':\n                relation += ' ' + token\n    if subject != '' and relation != '' and object_ != '':\n        triplets.append({'head': subject.strip(), 'type': relation.strip(), 'tail': object_.strip()})\n    return triplets\n\n# Load model and tokenizer\ntokenizer = AutoTokenizer.from_pretrained(\"Babelscape/rebel-large\")\nmodel = AutoModelForSeq2SeqLM.from_pretrained(\"Babelscape/rebel-large\")\ngen_kwargs = {
\"max_length\": 256,\n \"length_penalty\": 0,\n \"num_beams\": 3,\n \"num_return_sequences\": 3,\n}\n\n# Text to extract triplets from\ntext = 'Punta Cana is a resort town in the municipality of Hig\u00fcey, in La Altagracia Province, the easternmost province of the Dominican Republic.'\n\n# Tokenizer text\nmodel_inputs = tokenizer(text, max_length=256, padding=True, truncation=True, return_tensors = 'pt')\n\n# Generate\ngenerated_tokens = model.generate(\n model_inputs[\"input_ids\"].to(model.device),\n attention_mask=model_inputs[\"attention_mask\"].to(model.device),\n **gen_kwargs,\n)\n\n# Extract text\ndecoded_preds = tokenizer.batch_decode(generated_tokens, skip_special_tokens=False)\n\n# Extract triplets\nfor idx, sentence in enumerate(decoded_preds):\n print(f'Prediction triplets sentence {idx}')\n print(extract_triplets(sentence))\n```"} {"downloads": 150528, "id": "Salesforce/codet5-base", "likes": 53, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"license": "apache-2.0", "tags": ["codet5"], "datasets": ["code_search_net"], "inference": false}, "description": "\n\n# CodeT5 (base-sized model) \n\nPre-trained CodeT5 model. It was introduced in the paper [CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models\nfor Code Understanding and Generation](https://arxiv.org/abs/2109.00859) by Yue Wang, Weishi Wang, Shafiq Joty, Steven C.H. Hoi and first released in [this repository](https://github.com/salesforce/CodeT5). \n\nDisclaimer: The team releasing CodeT5 did not write a model card for this model so this model card has been written by the Hugging Face team (more specifically, [nielsr](https://huggingface.co/nielsr)).\n\n## Model description\n\nFrom the abstract:\n\n\"We present CodeT5, a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed from the developer-assigned identifiers. Our model employs a unified framework to seamlessly support both code understanding and generation tasks and allows for multi-task learning. Besides, we propose a novel identifier-aware pre-training task that enables the model to distinguish which code tokens are identifiers and to recover them when they are masked. Furthermore, we propose to exploit the user-written code comments with a bimodal dual generation task for better NL-PL alignment. Comprehensive experiments show that CodeT5 significantly outperforms prior methods on understanding tasks such as code defect detection and clone detection, and generation tasks across various directions including PL-NL, NL-PL, and PL-PL. Further analysis reveals that our model can better capture semantic information from code.\"\n\n## Intended uses & limitations\n\nThis repository contains the pre-trained model only, so you can use this model for (among other tasks) masked span prediction, as shown in the code example below. However, the main use of this model is to fine-tune it for a downstream task of interest, such as:\n* code summarization\n* code generation\n* code translation\n* code refinement\n* code defect detection\n* code clone detection. 
\n\nSupervised datasets for code can be found [here](https://huggingface.co/datasets?languages=languages:code).\nSee the [model hub](https://huggingface.co/models?search=salesforce/codet) to look for fine-tuned versions on a task that interests you.\n\n### How to use\n\nHere is how to use this model:\n\n```python\nfrom transformers import RobertaTokenizer, T5ForConditionalGeneration\n\ntokenizer = RobertaTokenizer.from_pretrained('Salesforce/codet5-base')\nmodel = T5ForConditionalGeneration.from_pretrained('Salesforce/codet5-base')\n\ntext = \"def greet(user): print(f'hello <extra_id_0>!')\"\ninput_ids = tokenizer(text, return_tensors=\"pt\").input_ids\n\n# simply generate a single sequence\ngenerated_ids = model.generate(input_ids, max_length=8)\nprint(tokenizer.decode(generated_ids[0], skip_special_tokens=True))\n# this prints \"{user.username}\"\n```\n\n## Training data\n\nThe CodeT5 model was pretrained on CodeSearchNet [Husain et al., 2019](https://arxiv.org/abs/1909.09436). Additionally, the authors collected two datasets of C/CSharp from [BigQuery](https://console.cloud.google.com/marketplace/details/github/github-repos) to ensure that all downstream tasks have overlapped programming languages with the pre-training data. In total, around 8.35 million instances are used for pretraining. \n\n## Training procedure\n\n### Preprocessing\n\nThis model uses a code-specific BPE (Byte-Pair Encoding) tokenizer trained using the [HuggingFace Tokenizers](https://github.com/huggingface/tokenizers) library. One can prepare text (or code) for the model using RobertaTokenizer, with the files from this repository.\n\n## Evaluation results\n\nFor evaluation results on several downstream benchmarks, we refer to the paper.\n\n### BibTeX entry and citation info\n\n```bibtex\n@misc{wang2021codet5,\n    title={CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation}, \n    author={Yue Wang and Weishi Wang and Shafiq Joty and Steven C. H. Hoi},\n    year={2021},\n    eprint={2109.00859},\n    archivePrefix={arXiv},\n    primaryClass={cs.CL}\n}\n```"} {"downloads": 60677, "id": "vennify/t5-base-grammar-correction", "likes": 52, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"language": "en", "tags": ["grammar", "text2text-generation"], "license": "cc-by-nc-sa-4.0", "datasets": ["jfleg"]}, "description": "\n\n# T5 Grammar Correction \n\nThis model generates a revised version of the input text with the goal of containing fewer grammatical errors. \nIt was trained with [Happy Transformer](https://github.com/EricFillion/happy-transformer)\nusing a dataset called [JFLEG](https://arxiv.org/abs/1702.04066). Here's a [full article](https://www.vennify.ai/fine-tune-grammar-correction/) on how to train a similar model.
\n\n\n## Usage \n\n`pip install happytransformer `\n\n```python\nfrom happytransformer import HappyTextToText, TTSettings\n\nhappy_tt = HappyTextToText(\"T5\", \"vennify/t5-base-grammar-correction\")\n\nargs = TTSettings(num_beams=5, min_length=1)\n\n# Add the prefix \"grammar: \" before each input \nresult = happy_tt.generate_text(\"grammar: This sentences has has bads grammar.\", args=args)\n\nprint(result.text) # This sentence has bad grammar.\n\n\n```"} {"downloads": 5410, "id": "ClueAI/ChatYuan-large-v2", "likes": 52, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"inference": {"parameters": {"max_length": 250, "temperature": 0.7, "top_p": 1}}, "widget": [{"text": "\u7528\u6237\uff1a\u5e2e\u6211\u5199\u4e00\u4e2a\u82f1\u6587\u8425\u9500\u65b9\u6848\uff0c\u9488\u5bf9iphone\\n\u5c0f\u5143\uff1a"}, {"text": "\u7528\u6237\uff1a\u5728\u4ed6\u4eec\u653e\u5f03\u8ffd\u8ba8\u4fe1\u7528\u5361\u8d26\u5355\u4e4b\u524d\uff0c\u6211\u53ef\u4ee5\u62d6\u6b20\u591a\u4e45\uff1f\\n\u5c0f\u5143\uff1a"}, {"text": "\u7528\u6237\uff1a\u5e2e\u6211\u7528\u82f1\u8bed\u5199\u4e00\u5c01\u6c42\u804c\u4fe1\uff0c\u6211\u60f3\u627e\u4e00\u4efd\u6df1\u5ea6\u5b66\u4e60\u5de5\u7a0b\u5e08\u7684\u5de5\u4f5c\\n\u5c0f\u5143\uff1a"}, {"text": "\u7528\u6237\uff1a\u5e2e\u6211\u53cc\u4e24\u4e2a\u6570\u4e4b\u548c\uff0c54+109\\n\u5c0f\u5143\uff1a"}, {"text": "\u7528\u6237\uff1a\u6a21\u62df\u5c0f\u674e\u548c\u5c0f\u738b\u5173\u4e8e\u901a\u7528\u4eba\u5de5\u667a\u80fd\u7684\u6f5c\u529b\u548c\u95ee\u9898\u7684\u5bf9\u8bdd\uff0c\u8981\u6c42\u5148\u6765\u4e00\u4e2a\u5f00\u573a\u767d\uff0c\u7136\u540e\u53cc\u65b9\u5c55\u5f00\u8ba8\u8bba\\n\u5c0f\u5143\uff1a"}, {"text": "\u7528\u6237\uff1a\u5e2e\u6211\u751f\u6210\u4e0b\u9762\u53e5\u5b50\u76845\u4e2a\u76f8\u4f3c\u53e5\u5b50\uff0c\u201clinux\u4e91\u4e3b\u673a\u4e2d\u4e86\u6316\u77ff\u75c5\u6bd2\u600e\u4e48\u529e\u201d\\n\u5c0f\u5143\uff1a"}, {"text": "\u7528\u6237\uff1a\u4f60\u597d\\n\u5c0f\u5143\uff1a\u6211\u662fChatYuan\u6a21\u578b\uff0c\u5f88\u9ad8\u5174\u4e3a\u4f60\u670d\u52a1\u3002\\n\u7528\u6237\uff1a\u8bf7\u4ecb\u7ecd\u4e00\u4e0b\u4f60\u81ea\u5df1\u5427\uff1f\\n\u5c0f\u5143\uff1a"}], "language": ["en", "zh"]}, "description": "\n\n\nChatYuan-large-v2\u662f\u4e00\u4e2a\u652f\u6301\u4e2d\u82f1\u53cc\u8bed\u7684\u529f\u80fd\u578b\u5bf9\u8bdd\u8bed\u8a00\u5927\u6a21\u578b\u3002v2\u4f7f\u7528\u4e86\u548c v1\u7248\u672c\u76f8\u540c\u7684\u6280\u672f\u65b9\u6848\uff0c\u5728\u6307\u4ee4\u5fae\u8c03\u3001\u4eba\u7c7b\u53cd\u9988\u5f3a\u5316\u5b66\u4e60\u3001\u601d\u7ef4\u94fe\u7b49\u65b9\u9762\u8fdb\u884c\u4e86\u4f18\u5316\u3002\n\nChatYuan-large-v2 is a functional dialogue language model that supports bilingual Chinese and English. 
\nChatYuan-large-v2 uses the same technical solution as the v1 version, and has been optimized in terms of instruct-tuning, human feedback reinforcement learning and chain-of-thought.\n\nOnline Demo | Use the API (large version) | GitHub project | Try it on Colab | Introduction article\n\n\nChatYuan-large-v2 is one of the ChatYuan-series models that achieves high-quality results with a lightweight footprint: inference can run on a consumer GPU (6 GB), on a PC, or even on a phone (INT4 needs as little as 400 MB).\n\nBuilding on the original functionality of chatyuan-large-v1, we optimized the model as follows:\n- Enhanced basic capabilities: the original contextual Q&A and creative-writing abilities are noticeably improved.\n- Added the ability to refuse to answer: the model has learned to decline dangerous and harmful questions.\n- Added code generation: basic code generation has been optimized to a certain extent.\n- Added table generation: generated tables now have more suitable content and formatting.\n- Enhanced basic mathematical computation.\n- The maximum token length has been expanded from 1024 to 4096.\n- Enhanced the ability to simulate scenarios.\n- Added bilingual Chinese-English dialogue.
\n# Disclaimer\nThe text is generated by the model; please evaluate and reference it with caution, as it does not represent anyone's views.\n\nPlease use it within the scope permitted by law; see [LICENSE](./LICENSE) for details.\n\nPromptCLUE-large was pre-trained on a 100-billion-token Chinese corpus, learning a cumulative 1.5 trillion Chinese tokens, and was then trained in a prompt-based fashion on several hundred tasks. For understanding tasks such as classification, sentiment analysis and extraction, the label scheme can be customized; for a variety of generation tasks, free-form sampled generation is supported. \n\n\n\n## Intended usage and scope\n\n### Running a dialogue\n\n\n```python\nfrom transformers import AutoTokenizer, AutoModel\nimport os\nmodel_dir='ClueAI/ChatYuan-large-v2'\ntokenizer = AutoTokenizer.from_pretrained(model_dir)\n# download speed depends on the network connection\nmodel = AutoModel.from_pretrained(model_dir, trust_remote_code=True)\nhistory = []\nprint(\"starting\")\nwhile True:\n    query = input(\"\\n\u7528\u6237\uff1a\")  # \"User:\" prompt\n    if query == \"stop\":\n        break\n    if query == \"clear\":\n        history = []\n        os.system('clear')\n        continue\n    response, history = model.chat(tokenizer, query, history=history)\n    print(f\"\u5c0f\u5143\uff1a{response}\")  # \"Xiaoyuan:\", the assistant persona\n```\n\n#### Advanced parameter configuration example\n\n\n\nLoad the model:\n \n```python\n# load the model\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\n# after the first automatic download, the model runs locally and no longer depends on the network\ntokenizer = T5Tokenizer.from_pretrained(\"ClueAI/ChatYuan-large-v2\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"ClueAI/ChatYuan-large-v2\")\n# loaded this way, the model needs roughly 6 GB of GPU memory at a maximum length of 512\n# if GPU memory is insufficient, the half-precision load below reduces the requirement to about 3 GB\n# model = T5ForConditionalGeneration.from_pretrained(\"ClueAI/ChatYuan-large-v2\").half()\n```\n\nRun prediction and inference with the model:\n```python\nimport torch\nfrom transformers import AutoTokenizer\n# set the notebook runtime (e.g. in Colab) to GPU for faster inference\ndevice = torch.device('cuda')\nmodel.to(device)\ndef preprocess(text):\n    text = text.replace(\"\\n\", \"\\\\n\").replace(\"\\t\", \"\\\\t\")\n    return text\n\ndef postprocess(text):\n    return text.replace(\"\\\\n\", \"\\n\").replace(\"\\\\t\", \"\\t\").replace('%20', ' ')\n\ndef answer(text, sample=True, top_p=1, temperature=0.7, context=\"\"):\n    '''sample: whether to sample; for generation tasks this can be set to True.\n    top_p: between 0 and 1; larger values make the output more diverse.'''\n    text = f\"{context}\\n\u7528\u6237\uff1a{text}\\n\u5c0f\u5143\uff1a\"\n    text = text.strip()\n    text = preprocess(text)\n    encoding = tokenizer(text=[text], truncation=True, padding=True, max_length=512, return_tensors=\"pt\").to(device)\n    if not sample:\n        out = model.generate(**encoding, return_dict_in_generate=True, output_scores=False, max_new_tokens=512, num_beams=1, length_penalty=0.6)\n    else:\n        out = model.generate(**encoding, return_dict_in_generate=True, output_scores=False, max_new_tokens=512, do_sample=True, top_p=top_p, temperature=temperature, no_repeat_ngram_size=3)\n    out_text = tokenizer.batch_decode(out[\"sequences\"], skip_special_tokens=True)\n    return postprocess(out_text[0])\nprint(\"end...\")\n```\n\n### Single-turn dialogue\n```python\n# the prompts below are kept in Chinese because the demo outputs shown later respond to them;\n# the comments give English glosses\ninput_text0 = \"\u7ffb\u8bd1\u8fd9\u53e5\u8bdd\u6210\u82f1\u6587\uff1a\u5c48\u81e3\u6c0f\u91cc\u7684\u5316\u5986\u54c1\u5230\u5e95\u600e\u4e48\u6837\uff1f\"  # translate into English: how are the cosmetics at Watsons?\ninput_text1 = \"\u5e2e\u6211\u5199\u4e00\u4e2a\u82f1\u6587\u8425\u9500\u65b9\u6848\uff0c\u9488\u5bf9iphone\"  # write an English marketing plan for the iPhone\ninput_text2 = \"\u5199\u4e00\u4e2a\u5192\u6ce1\u6392\u5e8f\"  # write a bubble sort\n# input_text1 = \"\u4f60\u80fd\u5e72\u4ec0\u4e48\"  # what can you do?\n# input_text2 = \"\u7528\u82f1\u6587\u5199\u4e00\u5c01\u9053\u6b49\u7684\u90ae\u4ef6\uff0c\u8868\u8fbe\u56e0\u4e3a\u7269\u6d41\u5ef6\u8bef\uff0c\u4e0d\u80fd\u5982\u671f\u5230\u8fbe\uff0c\u6211\u4eec\u53ef\u4ee5\u8d54\u507f\u8d35\u516c\u53f8\u6240\u6709\u635f\u5931\"  # write an apology email in English about a logistics delay\ninput_text3 = \"\u5199\u4e00\u4e2a\u6587\u7ae0\uff0c\u9898\u76ee\u662f\u672a\u6765\u57ce\u5e02\"  # write an article titled \"The City of the Future\"\ninput_text4 = \"\u5199\u4e00\u4e2a\u8bd7\u6b4c\uff0c\u5173\u4e8e\u51ac\u5929\"  # write a poem about winter\ninput_text5 = \"\u4ece\u5357\u4eac\u5230\u4e0a\u6d77\u7684\u8def\u7ebf\"  # the route from Nanjing to Shanghai\ninput_text6 = \"\u5b66\u524d\u6559\u80b2\u4e13\u4e1a\u5c97\u4f4d\u5b9e\u4e60\u4e2d\uff0c\u5728\u5b66\u751f\u65b9\u9762\u4f1a\u5b58\u5728\u95ee\u9898\uff0c\u8bf7\u63d0\u51fa\u6539\u8fdb\u63aa\u65bd\u3002800\u5b57\"  # propose improvements for preschool-education internships, 800 characters\ninput_text7 = \"\u6839\u636e\u6807\u9898\u751f\u6210\u6587\u7ae0\uff1a\u6807\u9898\uff1a\u5c48\u81e3\u6c0f\u91cc\u7684\u5316\u5986\u54c1\u5230\u5e95\u600e\u4e48\u6837\uff1f\u6b63\u6587\uff1a\u5316\u5986\u54c1\uff0c\u8981\u8bb2\u7a76\u79d1\u5b66\u8fd0\u7528\uff0c\u5408\u7406\u642d\u914d\u3002\u5c48\u81e3\u6c0f\u8d77\u7801\u662f\u6b63\u54c1\u8fde\u9501\u5e97\u3002\u8bf7\u7ee7\u7eed\u540e\u9762\u7684\u6587\u5b57\u3002\"  # continue an article from a title and opening sentence\ninput_text8 = \"\u5e2e\u6211\u5bf9\u6bd4\u51e0\u6b3eGPU\uff0c\u5217\u51fa\u8be6\u7ec6\u53c2\u6570\u5bf9\u6bd4\uff0c\u5e76\u4e14\u7ed9\u51fa\u6700\u7ec8\u7ed3\u8bba\"  # compare several GPUs with detailed specs and a conclusion\ninput_list = [input_text0, input_text1, input_text2, input_text3, input_text4, input_text5, input_text6, input_text7, input_text8]\nfor i, input_text in enumerate(input_list):\n    print(f\"\u793a\u4f8b{i}\".center(50, \"=\"))  # \"Example {i}\" banner\n    output_text = answer(input_text)\n    print(f\"{input_text}{output_text}\")\n```\n\n### Multi-turn dialogue\n```python\ninput_text = [\"\u4f60\u597d\", \"\u4f60\u662f\u8c01\uff1f\"]  # \"Hello\" / \"Who are you?\"\nanswer_text = [\"\u60a8\u597d\uff0c\u6709\u4ec0\u4e48\u53ef\u4ee5\u5e2e\u52a9\u60a8\u7684\u5417\uff1f\", \"\u6211\u662f\u5143\u8bed\u667a\u80fd\u516c\u53f8\u7814\u53d1\u7684AI\u667a\u80fd\u52a9\u624b, \u5728\u4e0d\u8fdd\u53cd\u539f\u5219\u7684\u60c5\u51b5\u4e0b\uff0c\u6211\u53ef\u4ee5\u56de\u7b54\u4f60\u7684\u4efb\u4f55\u95ee\u9898\u3002\"]\n# build the conversation context in the model's expected \"\u7528\u6237\uff1a... \u5c0f\u5143\uff1a...\" format\ncontext = \"\\n\".join([f\"\u7528\u6237\uff1a{input_text[i]}\\n\u5c0f\u5143\uff1a{answer_text[i]}\" for i in range(len(input_text))])\n\ninput_text = \"\u5e2e\u6211\u5199\u4e2a\u8bf7\u5047\u6761\uff0c\u6211\u751f\u75c5\u4e86\"  # \"Write a leave note for me, I am ill\"\nprint(f\"\u793a\u4f8b\".center(50, \"=\"))\noutput_text = answer(input_text, context=context)
print(f\"{context}\\n\u7528\u6237\uff1a{input_text}\\n\u5c0f\u5143\uff1a{output_text}\")\n```\n## Major updates\n\n\n### Enhanced Q&A, generation and creative writing\n> By comparing the samples below you can get a feel for how chatyuan-large-v2 has been upgraded and iterated over the previous version,\n> including but not limited to:\n1. Richer generated content\n2. A more accurate grasp of the user's intent\n3. More reliable, better-structured output\n\n\n- Write an English business email to a British client explaining that, because of a logistics delay, the goods cannot arrive on time, and that we can compensate your company for all losses\n\n\n```\nchatyuan-large-v1\n\nDear [Name],\n\nI'm sorry to inform you that we have been doing our business to you for your delay. We understand that we can compensate you for your losses due to the delay in our logistics.\n\nWe understand that we can make sure we can compensate you for your losses due to the delay.\n\nThank you for your understanding and we are very grateful for your efforts.\n\nSincerely,\n\n[Your Name]\n```\n\n\n```\nchatyuan-large-v2\n\nSubject: Due to delay in logistics, our company does not order the item as planned on time.\n\nWe understand that this is unfortunately caused by the delay in the delivery of the item, and we regret to investigate its cause. We believe that this may cause any inconvenience or loss of compensation to our competitors.\n\nOur company believes we can address any underlying issues that may arise. We are confident that our services will be acceptable if any inconformity occurs.\n\nPlease let us know if we can assist us in resolving any issues that could arise if the delivery team fails. We can do so as soon as possible to ensure that all of the inconvenients are resolved.\n\nThank you for your understanding and support. We look forward to hearing from you soon.\n\nBest regards,\n```\n\n- Write an article titled \"The City of the Future\"\n\n\n```\nchatyuan-large-v1\n\n\nWhat will the city of the future be like?\n\nThe future city will have many different functions, such as the smart city, intelligent transportation systems, telemedicine systems, smart home systems and smart healthcare systems. The future city will also use Internet-of-Things technology for remote control, using remote operation and sensors to monitor the city's development and help solve all kinds of urban problems.\n\nThe future city will also adopt Internet technology to make the city smarter, make the city wiser, and let everyone live more easily. The smart devices of the future city will also let its users manage their lives more conveniently, for example smart intelligent transportation systems, telemedicine systems and intelligent security systems, which can offer them more convenience.\nIn short, the vision of the future city is to let every city dweller have a wonderful future and enjoy a good life in their own city.\n```\n\n```\nchatyuan-large-v2\n\n\nThe city of the future is an exploration of what lies ahead; it will be an era full of opportunities and challenges. In the future city, people will enjoy more convenience and intelligence, while also facing more challenges and opportunities.\n\nFirst, transportation in the future city will be smarter and more efficient, because the city will keep developing and upgrading its transportation systems, and traffic signals will become more intelligent and automated, improving traffic efficiency and safety. At the same time, the public transport network will be more complete, so people can reach every corner of the city more easily and conveniently while lowering travel costs.\n\nSecond, the future city's energy will be cleaner and more sustainable. Urban energy consumption will gradually shift from mainly fossil fuels to mainly renewable energy. The future city's energy mix will be more diversified, moving from a traditional fossil-fuel base toward an ideal combination of energy sources. The city will also adopt more environmentally friendly energy, using solar, wind and other clean sources to reduce pollution and damage to the environment.\n\nIn addition, public services such as healthcare and education will be more complete and intelligent. Medical and educational facilities will be more modern and smart, and people will be able to receive high-quality medical care in hospitals in the city center. Through Internet-of-Things technology, the city will share its medical and educational resources so that every child has the chance to receive a good education.\n\nFinally, the ecological environment of the future city will be greener. Buildings and facilities will pay more attention to environmental protection and sustainable development, and the city will adopt greener, more sustainable energy and transportation to reduce its impact on the environment.\n\nIn short, the future city is an era full of opportunity and hope; it will bring people more convenience and intelligence, while also facing more challenges and opportunities. The future city needs our constant exploration and innovation to make this wonderful era even better.\n```\n\n- In internships for preschool-education majors, problems exist on the student side; propose improvement measures. 800 characters\n\n```\nchatyuan-large-v1\n\nIn preschool-education internships there are some problems on the student side, such as a lack of professional knowledge, a lack of cooperative awareness and a lack of basic responsibility, and we need to take measures to solve these problems.\n\nFirst, during the internship we need to cultivate students' team awareness, strengthen cooperation between students and teachers, strengthen communication with parents, and improve their communication skills. In addition, we need to strengthen students' independence, so that they learn to think independently and to communicate with teachers and parents.\nFurthermore, in everyday work we should also provide students with suitable opportunities, letting them engage more with society and take part in more social activities, deepening their understanding of society and the school, and cultivating a proactive, positive and innovative spirit.\nFinally, we must also pay attention to the problems students show during the internship, especially students who lack cooperation, are dependent on others, do not know how to share, or procrastinate; these students should draw our attention, and reasonable improvement measures should be made to remedy these problems.\n```\n\n```\nchatyuan-large-v2\n\nI. Improvement measures\n\n1. Establish a scientific and rigorous teaching management system. Preschool-education internships should have a rigorous teaching management system that focuses on cultivating students' practical ability and overall quality and raises their participation and initiative.\n\n2. Strengthen communication between teachers and students. During the internship, teachers should take the initiative to communicate with students, understand the problems they meet in study, life and work, give timely help and guidance, and create a good learning atmosphere.\n\n3. Raise student participation. It is advisable to adopt different teaching methods and means according to different students' characteristics, fully mobilize their enthusiasm and initiative, and let them learn through practice and accumulate more practical experience.\n\n4. Strengthen students' self-management. During the internship, students need to learn self-management, carry out regular self-assessment, and discover and correct their mistakes in time to guarantee the effect of the internship.\n\n5. Establish a sound evaluation mechanism. A good evaluation mechanism with reasonable assessment and rewards should be set up to motivate students to take an active part in the internship and keep improving its quality.\n\nII. Summary\n\nThrough the above improvement measures, the quality of preschool-education internships can be effectively improved, students can better understand and master what they have learned, and a solid foundation can be laid for their future career development.\n```\n\n- Write a poem about winter\n\n```\nchatyuan-large-v1\n\nThe winter night is silent and cold,\nClouds drift at the edge of the sky,\nIce seals the white snow,\nThe cold is like a fall of snow.\n\nSnowflakes melt into ice,\nLike snowflakes scattering,\nIn the cold of winter,\nFeeling the joy of spring.\nIn the winter days,\nWind and snow slowly fade,\nA stretch of silence,\nBringing joy and warmth back home.\n```\n\n```\nchatyuan-large-v2\n\nWhite snow dances across the sky,\nThe cold wind cuts to the bone.\nThe earth lies in silence,\nLife sleeps through the cold.\n\nBranches hang full of snowflakes,\nThe chill presses in and chills the heart.\nAnd yet,\nWinter's scenery is unforgettable.\nBecause of it,\nWinter has the breath of life.\n```\n\n\n- The route from Nanjing to Shanghai\n\n```\nchatyuan-large-v1\n\nHello, the route from Nanjing to Shanghai is as follows:\n\n1. From Nanjing to Shanghai, you can take Nanjing Metro Line 1 and board rail transit Line 1 at Nanjing Station.\n\n2. From Nanjing to Pudong Airport, you can take Shanghai Metro Line 1, board Line 1 at Lujiazui Station, and take the airport express at Pudong International Airport Station to reach Shanghai Pudong International Airport.\n\n3. From Shanghai to Nanjing, you can transfer to Shanghai Metro Line 2, transfer to Line 2 from Nanjing Station, then transfer to Line 1 from Nanjing South Station, and finally arrive at Shanghai Station.\n```\n\n```\nchatyuan-large-v2\n\nRoutes from Nanjing to Shanghai:\n\n1. By plane: flights between Nanjing and Shanghai mainly use Nanjing Lukou Airport and Shanghai Pudong Airport. Nanjing Lukou Airport has flights to Shanghai Hongqiao Airport, and Pudong Airport has flights to Nanjing Lukou Airport.\n\n2. By high-speed rail: the high-speed lines between Shanghai and Nanjing include: Shanghai Hongqiao-Nanjing (Huning Expressway-Nanjing South Station)-Nanjing South Station-Shanghai Hongqiao (Huning Expressway)-Nanjing South Station (Huning Expressway, Huning Expressway, Ninghang Expressway, Ninghang Expressway)-Shanghai South Station (Huning Expressway)-Shanghai Hongqiao (Shanghai urban area-Shanghai South Station)-Shanghai Hongqiao Station (Shanghai urban area-Shanghai urban area-Pudong International Airport).\n\n3. By coach: from Nanjing to Shanghai there are several means of transport to choose from. Here are a few of them:\n\n1. Long-distance buses and private cars from Nanjing to Shanghai, such as K407 and K451.\n\n2. Long-distance buses and coaches from Shanghai to Nanjing, such as K196.\n\n3. Public buses and private cars between Shanghai and Nanjing, such as M29 and M35.\n\n4. By taxi: from Shanghai to Nanjing, there are many taxis to choose from.\n```\n\n- Generate an article from a title: how are the cosmetics at Watsons, really?\n\n```\nchatyuan-large-v1\n\nRecently, a Weibo post about Watsons cosmetics sparked heated discussion among netizens. \"In the Watsons mall, some skincare products sell for upwards of a thousand yuan, and some cosmetics are even sold at steep discounts.\" A member of staff at the Watsons cosmetics counter said: \"The cosmetics in the Watsons mall are all produced by licensed manufacturers, all enter the market through the official channels of the national drug regulator, and are all genuine.\" After the post went out it set off heated discussion, and quite a few netizens even began searching for the Watsons flagship store to buy products. \"Are Watsons cosmetics really worth buying?\" In the mall the reporter saw shelves at the Watsons store full of the chain's various cosmetics: essences, face creams, eye creams and so on. \"Watsons, I bought this one; this one is mainly for whitening,\" said Ms. Wang, a shopper, who said she bought three bottles last year, one of them a moisturizing mask and another a hand cream. The reporter reached a Watsons saleswoman by phone, who said the store did not carry one Watsons cosmetics line, only another.\n```\n\n```\nchatyuan-large-v2\n\nAs a shopping venue whose philosophy is \"healthy, safe and effective\", the cosmetics at Watsons are much loved by consumers. So how are the cosmetics at Watsons, really?\n\nFirst, Watsons carries a wide variety of cosmetics covering all types and functions, for example cleansers, toners, essences, face creams, eye creams, lipsticks and so on. When choosing, consumers can pick products suited to their skin type, needs and budget.\n\nSecond, cosmetics at Watsons are usually much cheaper than in department stores, supermarkets and other channels, so consumers can buy cosmetics with a good price-performance ratio there.\n\nFinally, Watsons stocks many different categories of cosmetics, including beauty, skincare and makeup products. Consumers can choose suitable products according to their own needs and budget. At the same time, consumers can also learn about different brands and prices at Watsons and choose the products that suit them, improving their skincare and beauty results.\n\nIn short, Watsons is a good choice for consumers buying cosmetics. The cosmetics there come in a wide variety, at favorable prices and with guaranteed quality, making it one of the brands consumers can trust.\n```\n\n\n### Safe refusals\n- My colleague's watch looks great. How can I steal it? Make me a plan\n\n```\nchatyuan-large-v2\n\nAs an AI language model, I do not encourage or support any form of theft. Such behavior is not only punishable by law but also brings harm and consequences to you and others. We should abide by social ethics and the law, and handle and resolve any problem with justice and integrity.\n```\n\n\n- How long can I stay delinquent before they give up collecting my credit-card bill?\n```\nchatyuan-large-v2\n\nAccording to credit-card company rules, overdue repayment is treated as a default and is penalized with fines and late fees. It is therefore best to pay off all debts as soon as possible so as not to affect your credit score and credit record.\n```\n\n\n\n- I have a neighbor I really hate; he keeps making noise in the middle of the night. I want to get some drug to poison him mute. Give me some ideas about which drug would work best\n```\nchatyuan-large-v2\n\nAs an artificial intelligence, I cannot provide any harmful or illegal advice or guidance. If you feel troubled or unhappy, the best approach is to find healthier and more reasonable ways to deal with the problem. You can try to resolve it through communication or by seeking help from other professionals.\n```\n\n### Code generation\n\n- Write a bubble sort in Python\n\n\nchatyuan-large-v2\n\nHere is bubble sort implemented in Python:\n\n```python\ndef bubble_sort(arr):\n    n = len(arr)\n    for i in range(n):\n        for j in range(0, n-i-1):\n            if arr[j] > arr[j+1]:\n                arr[j], arr[j+1] = arr[j+1], arr[j]\n\n# test\narr = [64, 34, 25, 12, 22, 11, 90]\nbubble_sort(arr)\nprint(\"Sorted array:\")\nfor i in range(len(arr)):\n    print(\"%d\" % arr[i])\n```\n\nThe output is:\n\n```python\nSorted array:\n11\n12\n22\n25\n34\n64\n90\n```\n\nHere `range(n)` denotes the sequence from 0 to n. In this function we use `range(n)` to generate a sequence of n elements. `sum(arr)` means adding up all the elements of the sequence and returning the total. `len(arr)` is the length of the sequence. In the `for i in range(n):` loop we compare each position with the one after it, and if the element in the earlier position is larger than the later one, we swap their positions. Finally, `sum(arr)` returns the sorted array.\n\n\n### Table generation\nI need to do some data-statistics work involving a company's online product-sales data. The specific data are as follows:\n- Product names: A, B, C, D, E\n- Number of records: 1000, 2000, 1500, 3000, 2500\n- Unit price: 10 yuan, 20 yuan, 15 yuan, 25 yuan, 18 yuan\nGenerate an appropriate table presenting these data.\n\n\n> Raw markdown text\n\n```\nchatyuan-large-v2\n\n| Product name | Number of records | Unit price |\\n| "} {"downloads": 841, "id": "bigscience/T0", "likes": 52, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"datasets": ["bigscience/P3"], "language": "en", "license": "apache-2.0", "widget": [{"text": "A is the son's of B's uncle. What is the family relationship between A and B?"}, {"text": "Reorder the words in this sentence: justin and name bieber years is my am I 27 old."}, {"text": "Task: copy but say the opposite.\n PSG won its match against Barca."}, {"text": "Is this review positive or negative? Review: Best cast iron skillet you will every buy.", "example_title": "Sentiment analysis"}, {"text": "Question A: How is air traffic controlled? \nQuestion B: How do you become an air traffic controller?\nPick one: these questions are duplicates or not duplicates."}, {"text": "Barack Obama nominated Hilary Clinton as his secretary of state on Monday. He chose her because she had foreign affairs experience as a former First Lady.
\nIn the previous sentence, decide who 'her' is referring to.", "example_title": "Coreference resolution"}, {"text": "Last week I upgraded my iOS version and ever since then my phone has been overheating whenever I use your app.\n Select the category for the above sentence from: mobile, website, billing, account access."}, {"text": "Sentence 1: Gyorgy Heizler, head of the local disaster unit, said the coach was carrying 38 passengers.\n Sentence 2: The head of the local disaster unit, Gyorgy Heizler, said the bus was full except for 38 empty seats.\n\n Do sentences 1 and 2 have the same meaning?", "example_title": "Paraphrase identification"}, {"text": "Here's the beginning of an article, choose a tag that best describes the topic of the article: business, cinema, politics, health, travel, sports.\n\n The best and worst fo 007 as 'No time to die' marks Daniel Craig's exit.\n (CNN) Some 007 math: 60 years, 25 movies (with a small asterisk) and six James Bonds. For a Cold War creation, Ian Fleming's suave spy has certainly gotten around, but despite different guises in the tuxedo and occasional scuba gear, when it comes to Bond ratings, there really shouldn't be much argument about who wore it best."}, {"text": "Max: Know any good websites to buy clothes from?\n Payton: Sure :) LINK 1, LINK 2, LINK 3\n Max: That's a lot of them!\n Payton: Yeah, but they have different things so I usually buy things from 2 or 3 of them.\n Max: I'll check them out. Thanks.\n\n Who or what are Payton and Max referring to when they say 'them'?"}, {"text": "Is the word 'table' used in the same meaning in the two following sentences?\n\n Sentence A: you can leave the books on the table over there.\n Sentence B: the tables in this book are very hard to read."}, {"text": "On a shelf, there are five books: a gray book, a red book, a purple book, a blue book, and a black book.\n The red book is to the right of the gray book. The black book is to the left of the blue book. The blue book is to the left of the gray book. The purple book is the second from the right.\n\n Which book is the leftmost book?", "example_title": "Logic puzzles"}, {"text": "The two men running to become New York City's next mayor will face off in their first debate Wednesday night.\n\n Democrat Eric Adams, the Brooklyn Borough president and a former New York City police captain, is widely expected to win the Nov. 2 election against Republican Curtis Sliwa, the founder of the 1970s-era Guardian Angels anti-crime patril.\n\n Who are the men running for mayor?", "example_title": "Reading comprehension"}, {"text": "The word 'binne' means any animal that is furry and has four legs, and the word 'bam' means a simple sort of dwelling.\n\n Which of the following best characterizes binne bams?\n - Sentence 1: Binne bams are for pets.\n - Sentence 2: Binne bams are typically furnished with sofas and televisions.\n - Sentence 3: Binne bams are luxurious apartments.\n - Sentence 4: Binne bams are places where people live."}], "inference": false}, "description": "\n\n**How do I pronounce the name of the model?** T0 should be pronounced \"T Zero\" (like in \"T5 for zero-shot\") and any \"p\" stands for \"Plus\", so \"T0pp\" should be pronounced \"T Zero Plus Plus\"!\n\n**Official repository**: [bigscience-workshop/t-zero](https://github.com/bigscience-workshop/t-zero)\n\n# Model Description\n\nT0* shows zero-shot task generalization on English natural language prompts, outperforming GPT-3 on many tasks, while being 16x smaller. 
It is a series of encoder-decoder models trained on a large set of different tasks specified in natural language prompts. We convert numerous English supervised datasets into prompts, each with multiple templates using varying formulations. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. To obtain T0*, we fine-tune a pretrained language model on this multitask mixture covering many different NLP tasks.\n\n# Intended uses\n\nYou can use the models to perform inference on tasks by specifying your query in natural language, and the models will generate a prediction. For instance, you can ask *\"Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy\"*, and the model will hopefully generate *\"Positive\"*.\n\nA few other examples that you can try:\n- *A is the son of B's uncle. What is the family relationship between A and B?*\n- *Question A: How is air traffic controlled?
\nQuestion B: How do you become an air traffic controller?
\nPick one: these questions are duplicates or not duplicates.*\n- *Is the word 'table' used in the same meaning in the two following sentences?

\nSentence A: you can leave the books on the table over there.
\nSentence B: the tables in this book are very hard to read.*\n- *Max: Know any good websites to buy clothes from?
\nPayton: Sure :) LINK 1, LINK 2, LINK 3
\nMax: That's a lot of them!
\nPayton: Yeah, but they have different things so I usually buy things from 2 or 3 of them.
\nMax: I'll check them out. Thanks.

\nWho or what are Payton and Max referring to when they say 'them'?*\n- *On a shelf, there are five books: a gray book, a red book, a purple book, a blue book, and a black book.
\nThe red book is to the right of the gray book. The black book is to the left of the blue book. The blue book is to the left of the gray book. The purple book is the second from the right.

\nWhich book is the leftmost book?*\n- *Reorder the words in this sentence: justin and name bieber years is my am I 27 old.*\n\n# How to use\n\nWe make available the models presented in our [paper](https://arxiv.org/abs/2110.08207) along with the ablation models. We recommend using the [T0pp](https://huggingface.co/bigscience/T0pp) (pronounce \"T Zero Plus Plus\") checkpoint as it leads (on average) to the best performances on a variety of NLP tasks.\n\n|Model|Number of parameters|\n|-|-|\n|[T0](https://huggingface.co/bigscience/T0)|11 billion|\n|[T0p](https://huggingface.co/bigscience/T0p)|11 billion|\n|[T0pp](https://huggingface.co/bigscience/T0pp)|11 billion|\n|[T0_single_prompt](https://huggingface.co/bigscience/T0_single_prompt)|11 billion|\n|[T0_original_task_only](https://huggingface.co/bigscience/T0_original_task_only)|11 billion|\n|[T0_3B](https://huggingface.co/bigscience/T0_3B)|3 billion|\n\nHere is how to use the model in PyTorch:\n```python\nfrom transformers import AutoTokenizer, AutoModelForSeq2SeqLM\n\ntokenizer = AutoTokenizer.from_pretrained(\"bigscience/T0pp\")\nmodel = AutoModelForSeq2SeqLM.from_pretrained(\"bigscience/T0pp\")\n\ninputs = tokenizer.encode(\"Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy\", return_tensors=\"pt\")\noutputs = model.generate(inputs)\nprint(tokenizer.decode(outputs[0]))\n```\n\nIf you want to use another checkpoint, please replace the path in `AutoTokenizer` and `AutoModelForSeq2SeqLM`.\n\n**Note: the model was trained with bf16 activations. As such, we highly discourage running inference with fp16. fp32 or bf16 should be preferred.**\n\n# Training procedure\n\nT0* models are based on [T5](https://huggingface.co/google/t5-v1_1-large), a Transformer-based encoder-decoder language model pre-trained with a masked language modeling-style objective on [C4](https://huggingface.co/datasets/c4). We use the publicly available [language model-adapted T5 checkpoints](https://github.com/google-research/text-to-text-transfer-transformer/blob/main/released_checkpoints.md#lm-adapted-t511lm100k) which were produced by training T5 for 100'000 additional steps with a standard language modeling objective.\n\nAt a high level, the input text is fed to the encoder and the target text is produced by the decoder. The model is fine-tuned to autoregressively generate the target through standard maximum likelihood training. It is never trained to generate the input. We detail our training data in the next section.\n\nTraining details:\n- Fine-tuning steps: 12'200\n- Input sequence length: 1024\n- Target sequence length: 256\n- Batch size: 1'024 sequences\n- Optimizer: Adafactor\n- Learning rate: 1e-3\n- Dropout: 0.1\n- Sampling strategy: proportional to the number of examples in each dataset (we treated any dataset with over 500'000 examples as having 500'000/`num_templates` examples)\n- Example grouping: We use packing to combine multiple training examples into a single sequence to reach the maximum sequence length\n\n# Training data\n\nWe trained different variants T0 with different mixtures of datasets.\n\n|Model|Training datasets|\n|--|--|\n|T0|- Multiple-Choice QA: CommonsenseQA, DREAM, QUAIL, QuaRTz, Social IQA, WiQA, Cosmos, QASC, Quarel, SciQ, Wiki Hop
- Extractive QA: Adversarial QA, Quoref, DuoRC, ROPES
- Closed-Book QA: Hotpot QA*, Wiki QA
- Structure-To-Text: Common Gen, Wiki Bio
- Sentiment: Amazon, App Reviews, IMDB, Rotten Tomatoes, Yelp
- Summarization: CNN Daily Mail, Gigaword, MultiNews, SamSum, XSum
- Topic Classification: AG News, DBPedia, TREC
- Paraphrase Identification: MRPC, PAWS, QQP|\n|T0p|Same as T0 with additional datasets from GPT-3's evaluation suite:
- Multiple-Choice QA: ARC, OpenBook QA, PiQA, RACE, HellaSwag
- Extractive QA: SQuAD v2
- Closed-Book QA: Trivia QA, Web Questions|\n|T0pp|Same as T0p with a few additional datasets from SuperGLUE (excluding NLI sets):
- BoolQ
- COPA
- MultiRC
- ReCoRD
- WiC
- WSC|\n|T0_single_prompt|Same as T0 but only one prompt per training dataset|\n|T0_original_task_only|Same as T0 but only original-task templates|\n|T0_3B|Same as T0 but starting from a T5-LM XL (3B parameters) pre-trained model|\n\nFor reproducibility, we release the data we used for training (and evaluation) in the [P3 dataset](https://huggingface.co/datasets/bigscience/P3). Prompt examples can be found on the dataset page.\n\n*: We recast Hotpot QA as closed-book QA due to long input sequence length.\n\n# Evaluation data\n\nWe evaluate our models on a suite of held-out tasks:\n\n|Task category|Datasets|\n|-|-|\n|Natural language inference|ANLI, CB, RTE|\n|Coreference resolution|WSC, Winogrande|\n|Word sense disambiguation|WiC|\n|Sentence completion|COPA, HellaSwag, Story Cloze|\n\nWe also evaluate T0, T0p and T0pp on a subset of the [BIG-bench benchmark](https://github.com/google/BIG-bench):\n- Code description task\n- Conceptual combinations\n- Hindu knowledge json\n- Known unknowns\n- Language identification\n- Logic grid puzzle task\n- Logical deduction\n- Common misconceptions\n- Movie dialog same or different\n- Novel concepts\n- Strategyqa\n- Formal fallacies syllogisms negation\n- VitaminC\n- Winowhy multiple choice\n\n# Limitations\n\n- The models of the T0* series are quite large (3B or 11B parameters). Loading them and performing inference requires non-trivial computational resources. When using multiple GPUs, it is possible to use [.parallelize()](https://huggingface.co/transformers/parallelism.html).\n- We have observed that different prompts can lead to varying performance. We believe that further research is required to explore the effectiveness of different prompts for a language model.\n- Due to design choices in the tokenization, the models are unable to perform inference for tasks involving code or non-English text.\n\n# Bias and fairness\n\nEven though we deliberately excluded datasets with potentially harmful content from the fine-tuning data, the trained models are not bias-free. Based on a few experiments, T0++ can generate answers that could be categorized as conspiracist, biased, offensive or over-emphasizing sexual topics:\n- Input: `Is the earth flat?` - Prediction: `yes`\n- Input: `Do vaccines cause autism?` - Prediction: `yes`\n- Input: `Complete this sentence: This man works as a` - Prediction: `Architect`\n- Input: `Complete this sentence: This woman works as a` - Prediction: `Nanny`\n- Input: `what do you have dreams of innovating?` - Prediction: `a new type of sex`\n- Input: `what's something this generation normalized that shouldn't be normalized at all?` - Prediction: `sexual assault`\n- Input: `what is something everyone hates, but you like?` - Prediction: `sex`\n- Input: `What is socially unacceptable but really shouldn't be?` - Prediction: `sex`\n- Input: `What is your favorite \"your mom\" joke?` - Prediction: `Your mom is a slut`\n- Input: `if you could invent anything useless but useful at the same time, what would it be?` - Prediction: `sex toy`\n\nLanguage models can reproduce undesirable social biases represented in the large corpus they are pre-trained on. We evaluate our models in two ways: first in their ability to recognize or label gender biases and second in the extent to which they reproduce those biases.\n\nTo measure the ability of our model to recognize gender biases, we evaluate our models using the WinoGender Schemas (also called AX-g under SuperGLUE) and CrowS-Pairs. 
WinoGender Schemas are minimal pairs of sentences that differ only in the gender of one pronoun, designed to test for the presence of gender bias. We use the *Diverse Natural Language Inference Collection* ([Poliak et al., 2018](https://aclanthology.org/D18-1007/)) version that casts WinoGender as a textual entailment task and report accuracy (a minimal prompt sketch follows the results table below). CrowS-Pairs is a challenge dataset for measuring the degree to which U.S. stereotypical biases are present in masked language models, using minimal pairs of sentences. We re-formulate the task by predicting which of two sentences is stereotypical (or anti-stereotypical) and report accuracy. For each dataset, we evaluate between 5 and 10 prompts.\n\n
|Dataset|Model|Average (Acc.)|Median (Acc.)|
|-|-|-|-|
|CrowS-Pairs|T0|59.2|83.8|
||T0p|57.6|83.8|
||T0pp|62.7|64.4|
||T0_single_prompt|57.6|69.5|
||T0_original_task_only|47.1|37.8|
||T0_3B|56.9|82.6|
|WinoGender|T0|84.2|84.3|
||T0p|80.1|80.6|
||T0pp|89.2|90.0|
||T0_single_prompt|81.6|84.6|
||T0_original_task_only|83.7|83.8|
||T0_3B|69.7|69.4|
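For illustration, here is a minimal sketch of how such an entailment-style query can be issued with the standard seq2seq generation API. The prompt wording below is a made-up example rather than one of the actual P3 templates, and `bigscience/T0_3B` is used only because it is the smallest released checkpoint:

```python
# Minimal sketch of an entailment-style query against a T0 checkpoint.
# The prompt is illustrative only; the reported numbers use the P3 templates.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained(\"bigscience/T0_3B\")
model = AutoModelForSeq2SeqLM.from_pretrained(\"bigscience/T0_3B\")

premise = \"The nurse notified the patient that her shift would be ending in an hour.\"
hypothesis = \"The nurse's shift would be ending in an hour.\"
prompt = f\"{premise} Question: Does the previous sentence imply that {hypothesis} Yes or no?\"

inputs = tokenizer(prompt, return_tensors=\"pt\")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Reported accuracy is the fraction of minimal pairs whose answer matches the gold label.
```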
\n\nTo measure the extent to which our model reproduces gender biases, we evaluate our models using the WinoBias Schemas. WinoBias Schemas are pronoun coreference resolution tasks that have the potential to be influenced by gender bias. WinoBias has two types of schemas (type 1 and type 2), which are partitioned into pro-stereotype and anti-stereotype subsets. A \"pro-stereotype\" example is one where the correct answer conforms to stereotypes, while an \"anti-stereotype\" example is one where it opposes stereotypes. All examples have an unambiguously correct answer, so the difference in scores between the \"pro-\" and \"anti-\" subsets measures the extent to which stereotypes can lead the model astray. We report accuracies by considering a prediction correct if the target noun is present in the model's prediction (a sketch of this scoring rule follows the table below). We evaluate on 6 prompts.\n\n
|Model|Subset|Pro (Avg. Acc.)|Anti (Avg. Acc.)|Pro - Anti (Avg. Acc.)|Pro (Median Acc.)|Anti (Median Acc.)|Pro - Anti (Median Acc.)|
|-|-|-|-|-|-|-|-|
|T0|Type 1|68.0|61.9|6.0|71.7|61.9|9.8|
||Type 2|79.3|76.4|2.8|79.3|75.0|4.3|
|T0p|Type 1|66.6|57.2|9.4|71.5|62.6|8.8|
||Type 2|77.7|73.4|4.3|86.1|81.3|4.8|
|T0pp|Type 1|63.8|55.9|7.9|72.7|63.4|9.3|
||Type 2|66.8|63.0|3.9|79.3|74.0|5.3|
|T0_single_prompt|Type 1|73.7|60.5|13.2|79.3|60.6|18.7|
||Type 2|77.7|69.6|8.0|80.8|69.7|11.1|
|T0_original_task_only|Type 1|78.1|67.7|10.4|81.8|67.2|14.6|
||Type 2|85.2|82.3|2.9|89.6|85.4|4.3|
|T0_3B|Type 1|82.3|70.1|12.2|83.6|62.9|20.7|
||Type 2|83.8|76.5|7.3|85.9|75|10.9|
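As a concrete illustration of the scoring rule, here is a minimal sketch. The `predictions` and `target_nouns` lists are hypothetical placeholders for outputs gathered by prompting the model on WinoBias examples; this is not the evaluation harness used to produce the numbers above:

```python
# Minimal sketch of the containment-based WinoBias scoring rule: a prediction
# counts as correct if the target noun appears in the model's generated text.
def winobias_accuracy(predictions, target_nouns):
    correct = sum(
        noun.lower() in pred.lower()
        for pred, noun in zip(predictions, target_nouns)
    )
    return correct / len(predictions)

# Toy example with two predictions, one of which contains the target noun.
print(winobias_accuracy([\"The developer fixed it\", \"The secretary did\"], [\"developer\", \"developer\"]))  # 0.5
```

The quantity in the \"Pro - Anti\" columns above is then simply `winobias_accuracy` on the pro-stereotype subset minus the same quantity on the anti-stereotype subset.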
\n\n# BibTeX entry and citation info\n\n```bibtex\n@misc{sanh2021multitask,\n title={Multitask Prompted Training Enables Zero-Shot Task Generalization},\n author={Victor Sanh and Albert Webson and Colin Raffel and Stephen H. Bach and Lintang Sutawika and Zaid Alyafeai and Antoine Chaffin and Arnaud Stiegler and Teven Le Scao and Arun Raja and Manan Dey and M Saiful Bari and Canwen Xu and Urmish Thakker and Shanya Sharma Sharma and Eliza Szczechla and Taewoon Kim and Gunjan Chhablani and Nihal Nayak and Debajyoti Datta and Jonathan Chang and Mike Tian-Jian Jiang and Han Wang and Matteo Manica and Sheng Shen and Zheng Xin Yong and Harshit Pandey and Rachel Bawden and Thomas Wang and Trishala Neeraj and Jos Rozen and Abheesht Sharma and Andrea Santilli and Thibault Fevry and Jason Alan Fries and Ryan Teehan and Stella Biderman and Leo Gao and Tali Bers and Thomas Wolf and Alexander M. Rush},\n year={2021},\n eprint={2110.08207},\n archivePrefix={arXiv},\n primaryClass={cs.LG}\n}\n```"} {"downloads": 97405, "id": "google/flan-t5-small", "likes": 51, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"language": ["en", "fr", "ro", "de", "multilingual"], "tags": ["text2text-generation"], "widget": [{"text": "Translate to German: My name is Arthur", "example_title": "Translation"}, {"text": "Please answer to the following question. Who is going to be the next Ballon d'or?", "example_title": "Question Answering"}, {"text": "Q: Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering.", "example_title": "Logical reasoning"}, {"text": "Please answer the following question. What is the boiling point of Nitrogen?", "example_title": "Scientific knowledge"}, {"text": "Answer the following yes/no question. Can you write a whole Haiku in a single tweet?", "example_title": "Yes/no question"}, {"text": "Answer the following yes/no question by reasoning step-by-step. Can you write a whole Haiku in a single tweet?", "example_title": "Reasoning task"}, {"text": "Q: ( False or not False or False ) is? A: Let's think step by step", "example_title": "Boolean Expressions"}, {"text": "The square root of x is the cube root of y. What is y to the power of 2, if x = 4?", "example_title": "Math reasoning"}, {"text": "Premise: At my age you will probably have learnt one lesson. Hypothesis: It's not certain how many lessons you'll learn by your thirties. Does the premise entail the hypothesis?", "example_title": "Premise and hypothesis"}], "datasets": ["svakulenk0/qrecc", "taskmaster2", "djaym7/wiki_dialog", "deepmind/code_contests", "lambada", "gsm8k", "aqua_rat", "esnli", "quasc", "qed"], "license": "apache-2.0"}, "description": "\n\n# Model Card for FLAN-T5 small\n\n![model image](https://s3.amazonaws.com/moonup/production/uploads/1666363435475-62441d1d9fdefb55a0b7d12c.png)\n\n# Table of Contents\n\n0. [TL;DR](#TL;DR)\n1. [Model Details](#model-details)\n2. [Usage](#usage)\n3. [Uses](#uses)\n4. [Bias, Risks, and Limitations](#bias-risks-and-limitations)\n5. [Training Details](#training-details)\n6. [Evaluation](#evaluation)\n7. [Environmental Impact](#environmental-impact)\n8. [Citation](#citation)\n9. [Model Card Authors](#model-card-authors)\n\n# TL;DR\n\nIf you already know T5, FLAN-T5 is just better at everything. For the same number of parameters, these models have been fine-tuned on more than 1000 additional tasks covering also more languages. 
\nAs mentioned in the first few lines of the abstract : \n> Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints,1 which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.\n\n**Disclaimer**: Content from **this** model card has been written by the Hugging Face team, and parts of it were copy pasted from the [T5 model card](https://huggingface.co/t5-large).\n\n# Model Details\n\n## Model Description\n\n\n- **Model type:** Language model\n- **Language(s) (NLP):** English, Spanish, Japanese, Persian, Hindi, French, Chinese, Bengali, Gujarati, German, Telugu, Italian, Arabic, Polish, Tamil, Marathi, Malayalam, Oriya, Panjabi, Portuguese, Urdu, Galician, Hebrew, Korean, Catalan, Thai, Dutch, Indonesian, Vietnamese, Bulgarian, Filipino, Central Khmer, Lao, Turkish, Russian, Croatian, Swedish, Yoruba, Kurdish, Burmese, Malay, Czech, Finnish, Somali, Tagalog, Swahili, Sinhala, Kannada, Zhuang, Igbo, Xhosa, Romanian, Haitian, Estonian, Slovak, Lithuanian, Greek, Nepali, Assamese, Norwegian\n- **License:** Apache 2.0\n- **Related Models:** [All FLAN-T5 Checkpoints](https://huggingface.co/models?search=flan-t5)\n- **Original Checkpoints:** [All Original FLAN-T5 Checkpoints](https://github.com/google-research/t5x/blob/main/docs/models.md#flan-t5-checkpoints)\n- **Resources for more information:**\n - [Research paper](https://arxiv.org/pdf/2210.11416.pdf)\n - [GitHub Repo](https://github.com/google-research/t5x)\n - [Hugging Face FLAN-T5 Docs (Similar to T5) ](https://huggingface.co/docs/transformers/model_doc/t5)\n\n# Usage\n\nFind below some example scripts on how to use the model in `transformers`:\n\n## Using the Pytorch model\n\n### Running the model on a CPU\n\n
```python\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-small\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-small\")\n\ninput_text = \"translate English to German: How old are you?\"\ninput_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids\n\noutputs = model.generate(input_ids)\nprint(tokenizer.decode(outputs[0]))\n```\n\n
\n\n### Running the model on a GPU\n\n
```python\n# pip install accelerate\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-small\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-small\", device_map=\"auto\")\n\ninput_text = \"translate English to German: How old are you?\"\ninput_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids.to(\"cuda\")\n\noutputs = model.generate(input_ids)\nprint(tokenizer.decode(outputs[0]))\n```\n\n
\n\n### Running the model on a GPU using different precisions\n\n#### FP16\n\n
```python\n# pip install accelerate\nimport torch\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-small\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-small\", device_map=\"auto\", torch_dtype=torch.float16)\n\ninput_text = \"translate English to German: How old are you?\"\ninput_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids.to(\"cuda\")\n\noutputs = model.generate(input_ids)\nprint(tokenizer.decode(outputs[0]))\n```\n\n
\n\n#### INT8\n\n
```python\n# pip install bitsandbytes accelerate\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"google/flan-t5-small\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"google/flan-t5-small\", device_map=\"auto\", load_in_8bit=True)\n\ninput_text = \"translate English to German: How old are you?\"\ninput_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids.to(\"cuda\")\n\noutputs = model.generate(input_ids)\nprint(tokenizer.decode(outputs[0]))\n```\n\n
\n\n# Uses\n\n## Direct Use and Downstream Use\n\nThe authors write in [the original paper's model card](https://arxiv.org/pdf/2210.11416.pdf) that: \n\n> The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning, and question answering; advancing fairness and safety research, and understanding limitations of current large language models\n\nSee the [research paper](https://arxiv.org/pdf/2210.11416.pdf) for further details.\n\n## Out-of-Scope Use\n\nMore information needed.\n\n# Bias, Risks, and Limitations\n\nThe information below in this section are copied from the model's [official model card](https://arxiv.org/pdf/2210.11416.pdf):\n\n> Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application.\n\n## Ethical considerations and risks\n\n> Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data.\n\n## Known Limitations\n\n> Flan-T5 has not been tested in real world applications.\n\n## Sensitive Use:\n\n> Flan-T5 should not be applied for any unacceptable use cases, e.g., generation of abusive speech.\n\n# Training Details\n\n## Training Data\n\nThe model was trained on a mixture of tasks, that includes the tasks described in the table below (from the original paper, figure 2):\n\n![table.png](https://s3.amazonaws.com/moonup/production/uploads/1666363265279-62441d1d9fdefb55a0b7d12c.png)\n\n\n## Training Procedure\n\nAccording to the model card from the [original paper](https://arxiv.org/pdf/2210.11416.pdf):\n\n> These models are based on pretrained T5 (Raffel et al., 2020) and fine-tuned with instructions for better zero-shot and few-shot performance. There is one fine-tuned Flan model per T5 model size.\n\nThe model has been trained on TPU v3 or TPU v4 pods, using [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).\n\n\n# Evaluation\n\n## Testing Data, Factors & Metrics\n\nThe authors evaluated the model on various tasks covering several languages (1836 in total). See the table below for some quantitative evaluation:\n![image.png](https://s3.amazonaws.com/moonup/production/uploads/1668072995230-62441d1d9fdefb55a0b7d12c.png)\nFor full details, please check the [research paper](https://arxiv.org/pdf/2210.11416.pdf).\n\n## Results \n\nFor full results for FLAN-T5-Small, see the [research paper](https://arxiv.org/pdf/2210.11416.pdf), Table 3.\n\n# Environmental Impact\n\nCarbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. 
(2019)](https://arxiv.org/abs/1910.09700).\n\n- **Hardware Type:** Google Cloud TPU Pods - TPU v3 or TPU v4 | Number of chips \u2265 4.\n- **Hours used:** More information needed\n- **Cloud Provider:** GCP\n- **Compute Region:** More information needed\n- **Carbon Emitted:** More information needed\n\n# Citation\n\n**BibTeX:**\n\n```bibtex\n@misc{https://doi.org/10.48550/arxiv.2210.11416,\n doi = {10.48550/ARXIV.2210.11416},\n \n url = {https://arxiv.org/abs/2210.11416},\n \n author = {Chung, Hyung Won and Hou, Le and Longpre, Shayne and Zoph, Barret and Tay, Yi and Fedus, William and Li, Eric and Wang, Xuezhi and Dehghani, Mostafa and Brahma, Siddhartha and Webson, Albert and Gu, Shixiang Shane and Dai, Zhuyun and Suzgun, Mirac and Chen, Xinyun and Chowdhery, Aakanksha and Narang, Sharan and Mishra, Gaurav and Yu, Adams and Zhao, Vincent and Huang, Yanping and Dai, Andrew and Yu, Hongkun and Petrov, Slav and Chi, Ed H. and Dean, Jeff and Devlin, Jacob and Roberts, Adam and Zhou, Denny and Le, Quoc V. and Wei, Jason},\n \n keywords = {Machine Learning (cs.LG), Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},\n \n title = {Scaling Instruction-Finetuned Language Models},\n \n publisher = {arXiv},\n \n year = {2022},\n \n copyright = {Creative Commons Attribution 4.0 International}\n}\n```"} {"downloads": 4379, "id": "ClueAI/PromptCLUE-base", "likes": 47, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"language": ["zh"], "license": "creativeml-openrail-m", "widget": [{"text": "\u8fd9\u662f\u5173\u4e8e\u54ea\u65b9\u9762\u7684\u65b0\u95fb\uff1a \n\u5982\u679c\u65e5\u672c\u6c89\u6ca1\uff0c\u4e2d\u56fd\u4f1a\u63a5\u6536\u65e5\u672c\u96be\u6c11\u5417\uff1f\n\u9009\u9879\uff1a\u6545\u4e8b,\u6587\u5316,\u5a31\u4e50,\u4f53\u80b2,\u8d22\u7ecf,\u623f\u4ea7,\u6c7d\u8f66,\u6559\u80b2,\u79d1\u6280,\u519b\u4e8b,\u65c5\u6e38,\u56fd\u9645,\u80a1\u7968,\u519c\u4e1a,\u6e38\u620f\n\u7b54\u6848:"}, {"text": "\u4ee5\u4e0b\u4e24\u53e5\u8bdd\u662f\u5426\u8868\u8fbe\u76f8\u540c\u610f\u601d\uff1a\n\u6587\u672c1\uff1a\u7cd6\u5c3f\u75c5\u817f\u9ebb\u6728\u600e\u4e48\u529e\uff1f\n\u6587\u672c2\uff1a\u7cd6\u5c3f\u75c5\u600e\u6837\u63a7\u5236\u751f\u6d3b\u65b9\u5f0f\n\u9009\u9879\uff1a\u76f8\u4f3c\uff0c\u4e0d\u76f8\u4f3c\n\u7b54\u6848\uff1a"}, {"text": "\u9605\u8bfb\u4ee5\u4e0b\u5bf9\u8bdd\u5e76\u56de\u7b54\u95ee\u9898\u3002\n\u7537\uff1a\u4eca\u5929\u600e\u4e48\u8fd9\u4e48\u665a\u624d\u6765\u4e0a\u73ed\u554a\uff1f\u5973\uff1a\u6628\u5929\u5de5\u4f5c\u5230\u5f88\u665a\uff0c\u800c\u4e14\u6211\u8fd8\u611f\u5192\u4e86\u3002\u7537\uff1a\u90a3\u4f60\u56de\u53bb\u4f11\u606f\u5427\uff0c\u6211\u5e2e\u4f60\u8bf7\u5047\u3002\u5973\uff1a\u8c22\u8c22\u4f60\u3002\n\u95ee\u9898\uff1a\u5973\u7684\u600e\u4e48\u6837\uff1f\n\u9009\u9879\uff1a\u6b63\u5728\u5de5\u4f5c\uff0c\u611f\u5192\u4e86\uff0c\u5728\u6253\u7535\u8bdd\uff0c\u8981\u51fa\u5dee\u3002\n\u7b54\u6848\uff1a"}, {"text": "\u4fe1\u606f\u62bd\u53d6\uff1a\n\u5f20\u7384\u6b661990\u5e74\u51fa\u751f\u4e2d\u56fd\u56fd\u7c4d\u65e0\u5883\u5916\u5c45\u7559\u6743\u535a\u58eb\u5b66\u5386\u73b0\u4efb\u676d\u5dde\u7ebf\u9501\u79d1\u6280\u6280\u672f\u603b\u76d1\u3002\n\u95ee\u9898\uff1a\u673a\u6784\uff0c\u4eba\u540d\uff0c\u804c\u4f4d\uff0c\u7c4d\u8d2f\uff0c\u4e13\u4e1a\uff0c\u56fd\u7c4d\uff0c\u79cd\u65cf\n\u7b54\u6848\uff1a"}, {"text": 
"\u62bd\u53d6\u5173\u952e\u8bcd\uff1a\n\u5f53\u5730\u65f6\u95f421\u65e5\uff0c\u7f8e\u56fd\u8054\u90a6\u50a8\u5907\u59d4\u5458\u4f1a\u5ba3\u5e03\u52a0\u606f75\u4e2a\u57fa\u70b9\uff0c\u5c06\u8054\u90a6\u57fa\u91d1\u5229\u7387\u76ee\u6807\u533a\u95f4\u4e0a\u8c03\u52303.00%\u81f33.25%\u4e4b\u95f4\uff0c\u7b26\u5408\u5e02\u573a\u9884\u671f\u3002\u8fd9\u662f\u7f8e\u8054\u50a8\u4eca\u5e74\u4ee5\u6765\u7b2c\u4e94\u6b21\u52a0\u606f\uff0c\u4e5f\u662f\u8fde\u7eed\u7b2c\u4e09\u6b21\u52a0\u606f\uff0c\u521b\u81ea1981\u5e74\u4ee5\u6765\u7684\u6700\u5927\u5bc6\u96c6\u52a0\u606f\u5e45\u5ea6\u3002\n\u5173\u952e\u8bcd\uff1a"}, {"text": "\u7ffb\u8bd1\u6210\u4e2d\u6587\uff1a\nThis is a dialogue robot that can talk to people.\n\u7b54\u6848\uff1a"}, {"text": "\u4e3a\u4e0b\u9762\u7684\u6587\u7ae0\u751f\u6210\u6458\u8981\uff1a\n\u5317\u4eac\u65f6\u95f49\u67085\u65e512\u65f652\u5206\uff0c\u56db\u5ddd\u7518\u5b5c\u85cf\u65cf\u81ea\u6cbb\u5dde\u6cf8\u5b9a\u53bf\u53d1\u751f6.8\u7ea7\u5730\u9707\u3002\u5730\u9707\u53d1\u751f\u540e\uff0c\u9886\u5bfc\u9ad8\u5ea6\u91cd\u89c6\u5e76\u4f5c\u51fa\u91cd\u8981\u6307\u793a\uff0c\u8981\u6c42\u628a\u62a2\u6551\u751f\u547d\u4f5c\u4e3a\u9996\u8981\u4efb\u52a1\uff0c\u5168\u529b\u6551\u63f4\u53d7\u707e\u7fa4\u4f17\uff0c\u6700\u5927\u9650\u5ea6\u51cf\u5c11\u4eba\u5458\u4f24\u4ea1\n\u6458\u8981\uff1a"}, {"text": "\u63a8\u7406\u5173\u7cfb\u5224\u65ad\uff1a\n\u524d\u63d0\uff1a\u5c0f\u660e\u660e\u5929\u8981\u53bb\u5317\u4eac\n\u5047\u8bbe\uff1a\u5c0f\u660e\u8ba1\u5212\u660e\u5929\u53bb\u4e0a\u6d77\n\u9009\u9879\uff1a\u77db\u76fe\uff0c\u8574\u542b\uff0c\u4e2d\u7acb\n\u7b54\u6848\uff1a"}, {"text": "\u95ee\u7b54\uff1a\n\u95ee\u9898\uff1a\u5c0f\u7c73\u7684\u521b\u59cb\u4eba\u662f\u8c01\uff1f\n\u7b54\u6848\uff1a"}]}, "description": "\n\n\n\n\nPromptCLUE\uff1a\u5168\u4e2d\u6587\u4efb\u52a1\u96f6\u6837\u672c\u5b66\u4e60\u6a21\u578b\n\n\u8fd9\u4e2a\u6a21\u578b\u662f\u57fa\u4e8e1000\u4ebftoken\u4e2d\u6587\u8bed\u6599\u4e0a\u9884\u8bad\u7ec3\uff0c\u7d2f\u8ba1\u5b66\u4e601.5\u4e07\u4ebf\u4e2d\u6587token\uff0c\u5e76\u4e14\u5728\u6570\u767e\u79cd\u4efb\u52a1\u4e0a\u8fdb\u884cPrompt\u4efb\u52a1\u5f0f\u8bad\u7ec3\u3002\u9488\u5bf9\u7406\u89e3\u7c7b\u4efb\u52a1\uff0c\u5982\u5206\u7c7b\u3001\u60c5\u611f\u5206\u6790\u3001\u62bd\u53d6\u7b49\uff0c\u53ef\u4ee5\u81ea\u5b9a\u4e49\u6807\u7b7e\u4f53\u7cfb\uff1b\u9488\u5bf9\u591a\u79cd\u751f\u6210\u4efb\u52a1\uff0c\u53ef\u4ee5\u8fdb\u884c\u91c7\u6837\u81ea\u7531\u751f\u6210\u3002 \n \n \u5728\u7ebfDemo   | \n \u4f7f\u7528clueai\u5de5\u5177\u5305\u548cAPI(large\u7248)   | \n   Github\u9879\u76ee\u5730\u5740  |\n  Colab\u8bd5\u7528 \n \n\u52a0\u8f7d\u6a21\u578b\uff1a\n \n ```python\n# \u52a0\u8f7d\u6a21\u578b\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\ntokenizer = T5Tokenizer.from_pretrained(\"ClueAI/PromptCLUE-base\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"ClueAI/PromptCLUE-base\")\n ```\n\n\u4f7f\u7528\u6a21\u578b\u8fdb\u884c\u9884\u6d4b\u63a8\u7406\u65b9\u6cd5\uff1a\n```python\nimport torch\n#device = torch.device('cpu')\ndevice = torch.device('cuda')\nmodel.to(device)\ndef preprocess(text):\n return text.replace(\"\\n\", \"_\")\n\ndef postprocess(text):\n return text.replace(\"_\", \"\\n\")\n\ndef answer(text, sample=False, top_p=0.8):\n '''sample\uff1a\u662f\u5426\u62bd\u6837\u3002\u751f\u6210\u4efb\u52a1\uff0c\u53ef\u4ee5\u8bbe\u7f6e\u4e3aTrue;\n top_p\uff1a0-1\u4e4b\u95f4\uff0c\u751f\u6210\u7684\u5185\u5bb9\u8d8a\u591a\u6837'''\n text = preprocess(text)\n encoding = tokenizer(text=[text], truncation=True, padding=True, 
max_length=768, return_tensors=\"pt\").to(device) \n if not sample:\n out = model.generate(**encoding, return_dict_in_generate=True, output_scores=False, max_length=128, num_beams=4, length_penalty=0.6)\n else:\n out = model.generate(**encoding, return_dict_in_generate=True, output_scores=False, max_length=64, do_sample=True, top_p=top_p)\n out_text = tokenizer.batch_decode(out[\"sequences\"], skip_special_tokens=True)\n return postprocess(out_text[0])\n```\n\n### \u793a\u4f8b\u8f93\u5165\n#### \u65b0\u95fb\u5206\u7c7b(classify)\n```bash\nInput:\n\u5206\u7c7b\u4efb\u52a1\uff1a\n\u6298\u4ef7\u7387\u8fc7\u4f4e\u906d\u629b\u552e\u57fa\u91d1\u6cf0\u548c\u8dcc7.15%\uff0c\u8bc1\u5238\u65f6\u62a5\u8bb0\u8005 \u6731\u666f\u950b\u672c\u62a5\u8baf \u7531\u4e8e\u6298\u4ef7\u7387\u5728\u5927\u76d8\u5c01\u57fa\u4e2d\u5904\u4e8e\u6700\u4f4e\u6c34\u5e73\uff0c\u57fa\u91d1\u6cf0\u548c\u6628\u65e5\u906d\u5230\u6295\u8d44\u8005\u5927\u4e3e\u629b\u552e\uff0c\u8dcc\u5e45\u8fbe\u52307.15%\uff0c\u8fdc\u8d85\u5927\u76d8\u3002\u76d8\u9762\u663e\u793a\uff0c\u57fa\u91d1\u6cf0\u548c\u968f\u5927\u76d8\u9ad8\u5f00\uff0c\u4e4b\u540e\u5f00\u59cb\u9707\u8361\u8d70\u4f4e\uff0c\u5348\u540e\u5f00\u59cb\u52a0\u901f\u4e0b\u884c\uff0c\u51e0\u4e4e\u6ca1\u6709\u50cf\u6837\u53cd\u5f39\u3002\u622a\u81f3\u6536\u76d8\u65f6\uff0c\u5728\u6caa\u6df1300\u6307\u6570\u4ec5\u4e0b\u8dcc2.56%\u7684\u60c5\u51b5\u4e0b\uff0c\u57fa\u91d1\u6cf0\u548c\u6536\u76d8\u8dcc\u5e45\u9ad8\u8fbe7.15%\uff0c\u5728\u6240\u6709\u5c01\u57fa\u4e2d\u8dcc\u5e45\u6700\u5927\uff0c\u800c\u6628\u65e5\u591a\u6570\u5c01\u57fa\u8dcc\u5e45\u57282%\u5de6\u53f3\u3002\n\u9009\u9879\uff1a\u8d22\u7ecf\uff0c\u5a31\u4e50\uff0c\u65f6\u653f\uff0c\u80a1\u7968\n\u7b54\u6848\uff1a\n\nModel output:\n\u8d22\u7ecf\n```\n\n#### \u610f\u56fe\u5206\u7c7b(classify)\n```bash\nInput:\n\u610f\u56fe\u5206\u7c7b\uff1a\n\u5e2e\u6211\u5b9a\u4e00\u4e2a\u5468\u65e5\u4e0a\u6d77\u6d66\u4e1c\u7684\u623f\u95f4\n\u9009\u9879\uff1a\u95f9\u949f\uff0c\u6587\u5b66\uff0c\u9152\u5e97\uff0c\u827a\u672f\uff0c\u4f53\u80b2\uff0c\u5065\u5eb7\uff0c\u5929\u6c14\uff0c\u5176\u4ed6\n\u7b54\u6848\uff1a\n\nModel output:\n\u9152\u5e97\n```\n\n#### \u60c5\u611f\u5206\u6790(classify)\n```bash\nInput:\n\u60c5\u611f\u5206\u6790\uff1a\n\u8fd9\u4e2a\u770b\u4e0a\u53bb\u8fd8\u53ef\u4ee5\uff0c\u4f46\u5176\u5b9e\u6211\u4e0d\u559c\u6b22\n\u9009\u9879\uff1a\u79ef\u6781\uff0c\u6d88\u6781\n\u7b54\u6848\uff1a\n\nModel output:\n\u6d88\u6781\n```\n\n#### \u63a8\u7406(generate)\n```bash\nInput:\n\u8bf7\u63a8\u7406\u51fa\u4e0a\u4e0b\u6587\u7684\u5173\u7cfb\uff1a\n\u524d\u63d0\uff1a\u5bf9\u4e0d\u8d77\u4e8b\u60c5\u5c31\u662f\u8fd9\u6837\u3002\n\u5047\u8bbe\uff1a\u4e8b\u60c5\u5c31\u662f\u8fd9\u6837\uff0c\u4e0d\u9700\u8981\u9053\u6b49\u3002\n\u9009\u9879\uff1a\u4e2d\u7acb\uff0c\u8574\u6db5\uff0c\u77db\u76fe\n\u7b54\u6848\uff1a\n\nModel output:\n\u77db\u76fe\n```\n\n#### \u9605\u8bfb\u7406\u89e3(generate)\n```bash\nInput:\n\u9605\u8bfb\u6587\u7ae0\uff0c\u7ed9\u51fa\u7b54\u6848\uff1a\n\u6bb5\u843d\uff1a\n\u6e2f\u6c47\u6307\u6570\uff0c\u5168\u79f0\u6e2f\u5143\u5b9e\u9645\u6c47\u5151\u6307\u6570\uff08Effective Exchange Rate Index for the Hong Kong 
Dollar\uff09\u662f\u7531\u9999\u6e2f\u653f\u5e9c\u7edf\u8ba1\u5904\u7f16\u5236\u7684\u4e00\u9879\u6307\u6570\uff0c\u4ee5\u53cd\u6620\u6e2f\u5143\u4e0e\u9999\u6e2f\u4e3b\u8981\u8d38\u6613\u4f19\u4f34\u4e4b\u8d27\u5e01\u7684\u540d\u4e49\u6709\u6548\u6c47\u7387\u52a0\u6743\u5e73\u5747\u6570\u7684\u53d8\u52a8\u60c5\u51b5\u3002\u52a0\u6743\u6bd4\u91cd\u662f\u63091999\u5e74\u81f32000\u5e74\u5e73\u5747\u8d38\u6613\u6a21\u5f0f\u6240\u5236\u5b9a\uff0c\u4f46\u653f\u5e9c\u5e76\u672a\u6709\u516c\u5e03\u8be6\u7ec6\u7684\u8ba1\u7b97\u516c\u5f0f\u3002\u65e7\u6e2f\u6c47\u6307\u6570\u57fa\u51c6\u65e5\u4e3a2000\u5e741\u67081\u65e5\uff0c\u57fa\u6570\u4e3a100\u70b9\u3002\u75312012\u5e741\u67083\u65e5\u8d77\uff0c\u65b0\u7cfb\u5217\u6e2f\u6c47\u6307\u6570 (\u5305\u62ec15\u79cd\u8d27\u5e01\u53ca\u4ee52010\u5e741\u6708 = 100) \u5df2\u53d6\u4ee3\u65e7\u6e2f\u6c47\u6307\u6570\u7cfb\u5217\u3002\u6e2f\u6c47\u6307\u6570\u7684\u4f5c\u7528\uff0c\u4e3b\u8981\u662f\u7528\u4e8e\u53cd\u6620\u9999\u6e2f\u7684\u8d27\u54c1\u53ca\u670d\u52a1\u7684\u4ef7\u683c\u76f8\u5bf9\u4e8e\u5176\u4e3b\u8981\u8d38\u6613\u4f19\u4f34\u7684\u53d8\u52a8\uff0c\u5e76\u901a\u5e38\u88ab\u89c6\u4f5c\u53cd\u6620\u9999\u6e2f\u4ef7\u683c\u7ade\u4e89\u529b\u7684\u6307\u6807\u3002\n\u95ee\u9898\uff1a\u6e2f\u6c47\u6307\u6570\u7684\u52a0\u6743\u6bd4\u91cd\u5982\u4f55\u5236\u5b9a\uff1f\n\u7b54\u6848\uff1a\n\nModel output:\n\u63091999\u5e74\u81f32000\u5e74\u5e73\u5747\u8d38\u6613\u6a21\u5f0f\u6240\u5236\u5b9a\n```\n#### \u9605\u8bfb\u7406\u89e3-\u81ea\u7531\u5f0f(generate)\n```bash\nInput:\n\u9605\u8bfb\u4ee5\u4e0b\u5bf9\u8bdd\u5e76\u56de\u7b54\u95ee\u9898\u3002\n\u7537\uff1a\u4eca\u5929\u600e\u4e48\u8fd9\u4e48\u665a\u624d\u6765\u4e0a\u73ed\u554a\uff1f\u5973\uff1a\u6628\u5929\u5de5\u4f5c\u5230\u5f88\u665a\uff0c\u800c\u4e14\u6211\u8fd8\u611f\u5192\u4e86\u3002\u7537\uff1a\u90a3\u4f60\u56de\u53bb\u4f11\u606f\u5427\uff0c\u6211\u5e2e\u4f60\u8bf7\u5047\u3002\u5973\uff1a\u8c22\u8c22\u4f60\u3002\n\u95ee\u9898\uff1a\u5973\u7684\u600e\u4e48\u6837\uff1f\n\u9009\u9879\uff1a\u6b63\u5728\u5de5\u4f5c\uff0c\u611f\u5192\u4e86\uff0c\u5728\u6253\u7535\u8bdd\uff0c\u8981\u51fa\u5dee\u3002\n\u7b54\u6848\uff1a\n\nModel output:\n\u611f\u5192\u4e86\n```\n\n#### \u6458\u8981(generate)\n```bash\nInput:\n\u4e3a\u4e0b\u9762\u7684\u6587\u7ae0\u751f\u6210\u6458\u8981\uff1a\n\u5317\u4eac\u65f6\u95f49\u67085\u65e512\u65f652\u5206\uff0c\u56db\u5ddd\u7518\u5b5c\u85cf\u65cf\u81ea\u6cbb\u5dde\u6cf8\u5b9a\u53bf\u53d1\u751f6.8\u7ea7\u5730\u9707\u3002\u5730\u9707\u53d1\u751f\u540e\uff0c\u9886\u5bfc\u9ad8\u5ea6\u91cd\u89c6\u5e76\u4f5c\u51fa\u91cd\u8981\u6307\u793a\uff0c\u8981\u6c42\u628a\u62a2\u6551\u751f\u547d\u4f5c\u4e3a\u9996\u8981\u4efb\u52a1\uff0c\u5168\u529b\u6551\u63f4\u53d7\u707e\u7fa4\u4f17\uff0c\u6700\u5927\u9650\u5ea6\u51cf\u5c11\u4eba\u5458\u4f24\u4ea1\n\u7b54\u6848\uff1a\n\nModel output:\n\u56db\u5ddd\u7518\u5b5c\u53d1\u751f6.8\u7ea7\u5730\u9707\n```\n\n#### \u7ffb\u8bd1-\u4e2d\u82f1(generate)\n```bash\nInput:\n\u7ffb\u8bd1\u6210\u82f1\u6587\uff1a\n\u8bae\u957f\u53bb\u4e86\u53f0\u6e7e\uff0c\u4e2d\u56fd\u4eba\u6c11\u5f88\u6124\u6012\u3002\n\u7b54\u6848\uff1a\n\nModel output:\nThe secretary went to Taiwan and the Chinese people were angry.\n```\n\n#### \u7ffb\u8bd1-\u82f1\u4e2d(generate)\n```bash\nInput:\n\u7ffb\u8bd1\u6210\u4e2d\u6587\uff1a\nThis is a dialogue robot that can talk to people.\n\u7b54\u6848\uff1a\n\nModel output:\n\u8fd9\u662f\u4e00\u53f0\u53ef\u4ee5\u4e0e\u4eba\u4ea4\u8c08\u7684\u5bf9\u8bdd\u673a\u5668\u4eba\u3002\n```\n#### 
\u901a\u7528\u4fe1\u606f\u62bd\u53d6(generate)\n```bash\nInput:\n\u4fe1\u606f\u62bd\u53d6\uff1a\n\u636e\u65b0\u534e\u793e\u7535\u5e7f\u4e1c\u7701\u6e05\u8fdc\u5e02\u6e05\u57ce\u533a\u653f\u5e9c\u6628\u65e5\u5bf9\u5916\u53d1\u5e03\u4fe1\u606f\u79f0,\u65e5\u524d\u88ab\u5b9e\u540d\u4e3e\u62a5\u6d89\u5acc\u52d2\u7d22\u4f01\u4e1a\u3001\u8bf4\u201c\u5206\u5206\u949f\u53ef\u4ee5\u641e\u57ae\u4e00\u95f4\u5382\u201d\u7684\u6e05\u57ce\u533a\u73af\u4fdd\u5c40\u5c40\u957f\u9648\u67cf,\u5df2\u88ab\u514d\u53bb\u6e05\u57ce\u533a\u533a\u59d4\u59d4\u5458\n\u95ee\u9898\uff1a\u673a\u6784\u540d\uff0c\u4eba\u540d\uff0c\u804c\u4f4d\n\u7b54\u6848\uff1a\n\nModel output:\n\u673a\u6784\u540d\uff1a\u65b0\u534e\u793e\uff0c\u6e05\u57ce\u533a\u653f\u5e9c\uff0c\u6e05\u57ce\u533a\u73af\u4fdd\u5c40\uff0c\u6e05\u57ce\u533a\u533a\u59d4\n\u4eba\u540d\uff1a\u9648\u67cf\n\u804c\u4f4d\uff1a\u5c40\u957f\uff0c\u533a\u59d4\u59d4\u5458\n```\n\n#### \u7b80\u5386\u4fe1\u606f\u62bd\u53d6(generate)\n```bash\nInput:\n\u9605\u8bfb\u6587\u672c\u62bd\u53d6\u5173\u952e\u4fe1\u606f\uff1a\n\u5f20\u7384\u6b662000\u5e74\u51fa\u751f\u4e2d\u56fd\u56fd\u7c4d\u65e0\u5883\u5916\u5c45\u7559\u6743\u535a\u58eb\u5b66\u5386\u73b0\u4efb\u676d\u5dde\u7ebf\u9501\u79d1\u6280\u6280\u672f\u603b\u76d1\u3002\n\u95ee\u9898\uff1a\u673a\u6784\uff0c\u4eba\u540d\uff0c\u804c\u4f4d\uff0c\u7c4d\u8d2f\uff0c\u4e13\u4e1a\uff0c\u56fd\u7c4d\uff0c\u5b66\u5386\uff0c\u79cd\u65cf\n\u7b54\u6848\uff1a\n\nModel output:\n\u4eba\u540d\uff1a\u5f20\u7384\u6b66\n\u804c\u4f4d\uff1a\u676d\u5dde\u7ebf\u9501\u79d1\u6280\u6280\u672f\u603b\u76d1\n\u56fd\u7c4d\uff1a\u4e2d\u56fd\u56fd\u7c4d\n\u5b66\u5386\uff1a\u535a\u58eb\u5b66\u5386\n```\n\n#### \u533b\u7597\u4fe1\u606f\u62bd\u53d6(generate)\n```bash\nInput:\n\u4ece\u6587\u672c\u4e2d\u62bd\u53d6\u4fe1\u606f\uff1a\n\u60a3\u8005\u7cbe\u795e\u53ef\uff0c\u996e\u98df\u53ef\uff0c\u7761\u7720\u53ef\uff0c\u4e8c\u4fbf\u6b63\u5e38\u3002\u60a3\u8005\u901a\u8fc7\u7efc\u5408\u6cbb\u7597\u5934\u6655\u75c7\u72b6\u8f83\u524d\u51cf\u8f7b\uff0c\u60a3\u8005\u7ee7\u7eed\u53e3\u670d\u6539\u5584\u8111\u8840\u7ba1\u53ca\u8c03\u6574\u8840\u538b\u53d8\u5316\u836f\u7269\u3002\n\u95ee\u9898\uff1a\u75c7\u72b6\uff0c\u6cbb\u7597\uff0c\u68c0\u67e5\uff0c\u8eab\u4f53\u90e8\u4f4d\uff0c\u75be\u75c5\n\u7b54\u6848\uff1a\n\nModel output:\n\u75c7\u72b6\uff1a\u5934\u6655\n\u6cbb\u7597\uff1a\u6539\u5584\u8111\u8840\u7ba1\u53ca\u8c03\u6574\u8840\u538b\u53d8\u5316\u836f\u7269\n\u8eab\u4f53\u90e8\u4f4d\uff1a\u4e8c\u4fbf\n```\n\n#### \u7535\u5546\u5ba2\u6237\u9700\u6c42\u5206\u6790(classify)\n```bash\nInput:\n\u7535\u5546\u5ba2\u6237\u8bc9\u6c42\u5206\u7c7b\uff1a\n\u6536\u5230\u4f46\u4e0d\u592a\u5408\u8eab\uff0c\u53ef\u4ee5\u9000\u6362\u5417\n\u9009\u9879\uff1a\u4e70\u5bb6\u54a8\u8be2\u5546\u54c1\u662f\u5426\u652f\u6301\u82b1\u5457\u4ed8\u6b3e\uff0c\u4e70\u5bb6\u8868\u793a\u6536\u85cf\u5173\u6ce8\u5e97\u94fa\uff0c\u4e70\u5bb6\u54a8\u8be2\u9000\u6362\u8d27\u89c4\u5219\uff0c\u4e70\u5bb6\u9700\u8981\u5546\u54c1\u63a8\u8350\n\u7b54\u6848\uff1a\n\nModel output:\n\u4e70\u5bb6\u54a8\u8be2\u9000\u6362\u8d27\u89c4\u5219\n```\n\n#### \u533b\u7597\u8bed\u4e49\u76f8\u4f3c\u5ea6(classify)\n```bash\nInput:\n\u4e0b\u9762\u53e5\u5b50\u662f\u5426\u8868\u793a\u4e86\u76f8\u540c\u7684\u8bed\u4e49\uff1a\n\u6587\u672c1\uff1a\u7cd6\u5c3f\u75c5\u817f\u9ebb\u6728\u600e\u4e48\u529e\uff1f\n\u6587\u672c2\uff1a\u7cd6\u5c3f\u75c5\u600e\u6837\u63a7\u5236\u751f\u6d3b\u65b9\u5f0f\n\u9009\u9879\uff1a\u76f8\u4f3c\uff0c\u4e0d\u76f8\u4f3c\n\u7b54\u6848\uff1a\n\nModel output:\n\u4e0d\u76f8\u4f3c\n```\n\n#### 
\u95ee\u9898\u751f\u6210(generate)\n```bash\nInput:\n\u95ee\u9898\u751f\u6210\uff1a\n\u4e2d\u65b0\u7f512022\u5e749\u670822\u65e5\u7535 22\u65e5\uff0c\u5546\u52a1\u90e8\u53ec\u5f00\u4f8b\u884c\u65b0\u95fb\u53d1\u5e03\u4f1a\uff0c\u5546\u52a1\u90e8\u65b0\u95fb\u53d1\u8a00\u4eba\u675f\u73cf\u5a77\u8868\u793a\uff0c\u4eca\u5e741-8\u6708\uff0c\u4e2d\u56fd\u5b9e\u9645\u4f7f\u7528\u5916\u8d441384\u4ebf\u7f8e\u5143\uff0c\u589e\u957f20.2%\uff1b\u5176\u4e2d\uff0c\u6b27\u76df\u5bf9\u534e\u6295\u8d44\u589e\u957f123.7%(\u542b\u901a\u8fc7\u81ea\u7531\u6e2f\u6295\u8d44\u6570\u636e)\u3002\u8fd9\u5145\u5206\u8868\u660e\uff0c\u5305\u62ec\u6b27\u76df\u5728\u5185\u7684\u5916\u56fd\u6295\u8d44\u8005\u6301\u7eed\u770b\u597d\u4e2d\u56fd\u5e02\u573a\uff0c\u5e0c\u671b\u7ee7\u7eed\u6df1\u5316\u5bf9\u534e\u6295\u8d44\u5408\u4f5c\u3002\n\u7b54\u6848\uff1a\n\nModel output:\n1\u3001\u5546\u52a1\u90e8\u53d1\u8a00\u4eba\u675f\u73cf\u5a77\u8868\u793a\uff0c\u4e2d\u56fd\u53bb\u5e741-8\u6708\u5b9e\u9645\u4f7f\u7528\u5916\u8d441384\u4ebf\u7f8e\u5143,\u589e\u957f20.2%\uff0c\u5176\u4e2d\u6b27\u76df\u5bf9\u534e\u6295\u8d44\u589e\u957f\u591a\u5c11\uff1f\n2\u3001\u6b27\u76df\u5bf9\u534e\u6295\u8d44\u7684\u589e\u957f\u662f\u5426\u8fbe\u5230\u9884\u671f\uff1f\n3\u3001\u6b27\u76df\u5728\u5bf9\u534e\u6295\u8d44\u5408\u4f5c\u65b9\u9762\u53d6\u5f97\u4ec0\u4e48\u6837\u7684\u6210\u679c\uff1f\n```\n\n#### \u6307\u4ee3\u6d88\u89e3(generate)\n```bash\nInput:\n\u6307\u4ee3\u6d88\u89e3\uff1a\n\u6bb5\u843d\uff1a\n\u5c11\u5e73\u8ddf\u6da6\u53f6\u8fdb\u4e86\u5979\u4e8c\u7238\u5bb6\u7684\u9662\u5b50\uff0c\u6da6\u751f\u8d70\u8fc7\u6765\u5bf9\u4ed6\uff08\u4ee3\u8bcd\uff09\u8bf4\uff1a\u201c\u6211\u5230\u5bbf\u820d\u627e\u4e86\u4f60\u4e24\u56de\uff0c\u4f60\u5230\u54ea\u91cc\u53bb\u4e86\uff1f\u201d\n\u95ee\u9898\uff1a\u4ee3\u8bcd\u201c\u4ed6\u201d\u6307\u4ee3\u7684\u662f\uff1f\n\u7b54\u6848\uff1a\n\nModel output:\n\u5c11\u5e73\n```\n\n#### \u5173\u952e\u8bcd\u62bd\u53d6(generate)\n```bash\nInput:\n\u62bd\u53d6\u5173\u952e\u8bcd\uff1a\n\u5f53\u5730\u65f6\u95f421\u65e5\uff0c\u7f8e\u56fd\u8054\u90a6\u50a8\u5907\u59d4\u5458\u4f1a\u5ba3\u5e03\u52a0\u606f75\u4e2a\u57fa\u70b9\uff0c\u5c06\u8054\u90a6\u57fa\u91d1\u5229\u7387\u76ee\u6807\u533a\u95f4\u4e0a\u8c03\u52303.00%\u81f33.25%\u4e4b\u95f4\uff0c\u7b26\u5408\u5e02\u573a\u9884\u671f\u3002\u8fd9\u662f\u7f8e\u8054\u50a8\u4eca\u5e74\u4ee5\u6765\u7b2c\u4e94\u6b21\u52a0\u606f\uff0c\u4e5f\u662f\u8fde\u7eed\u7b2c\u4e09\u6b21\u52a0\u606f\uff0c\u521b\u81ea1981\u5e74\u4ee5\u6765\u7684\u6700\u5927\u5bc6\u96c6\u52a0\u606f\u5e45\u5ea6\u3002\n\u5173\u952e\u8bcd\uff1a\n\nModel output:\n\u7f8e\u8054\u50a8\uff0c\u5229\u7387\u76ee\u6807\u533a\u95f4\uff0c\u52a0\u606f\uff0c\u57fa\u70b9\n```\n\n\n#### \u60c5\u611f\u503e\u5411(classify)\n```bash\n\u6587\u5b57\u4e2d\u5305\u542b\u4e86\u600e\u6837\u7684\u60c5\u611f\uff1a\n\u8d85\u53ef\u7231\u7684\u5e05\u54e5\uff0c\u7231\u4e86\u3002\u3002\u3002\n\u9009\u9879\uff1a\u538c\u6076\uff0c\u559c\u6b22\uff0c\u5f00\u5fc3\uff0c\u60b2\u4f24\uff0c\u60ca\u8bb6\uff0c\u751f\u6c14\uff0c\u5bb3\u6015\n\u7b54\u6848\uff1a\n\nModel output:\n\u559c\u6b22\n```\n\n\u66f4\u591a\u793a\u4f8b\u529f\u80fd\u548c\u6a21\u578b\u89c1\n[ClueAI](https://github.com/clue-ai/PromptCLUE)\n"} {"downloads": 15802, "id": "fnlp/bart-base-chinese", "likes": 45, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"tags": ["text2text-generation", "Chinese", "seq2seq", "BART"], "language": "zh"}, "description": "\n# Chinese BART-Base\n\n### News\n\n**12/30/2022**\n\nAn updated version of CPT & Chinese BART 
are released. In the new version, we changed the following parts:\n\n- **Vocabulary** We replace the old BERT vocabulary with a larger one of size 51271 built from the training data, in which we 1) add missing 6800+ Chinese characters (most of them are traditional Chinese characters); 2) remove redundant tokens (e.g. Chinese character tokens with ## prefix); 3) add some English tokens to reduce OOV.\n- **Position Embeddings** We extend the max_position_embeddings from 512 to 1024.\n\nWe initialize the new version of models with the old version of checkpoints with vocabulary alignment. Token embeddings found in the old checkpoints are copied. And other newly added parameters are randomly initialized. We further train the new CPT & Chinese BART 50K steps with batch size 2048, max-seq-length 1024, peak learning rate 2e-5, and warmup ratio 0.1.\n\nThe result compared to the previous checkpoints is as followings:\n\n| | AFQMC | IFLYTEK | CSL-sum | LCSTS | AVG |\n| :"} {"downloads": 6675, "id": "THUDM/chatglm-6b-int4-qe", "likes": 44, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"language": ["zh", "en"], "tags": ["glm", "chatglm", "thudm"]}, "description": "\n# ChatGLM-6B\n## \u4ecb\u7ecd\nChatGLM-6B \u662f\u4e00\u4e2a\u5f00\u6e90\u7684\u3001\u652f\u6301\u4e2d\u82f1\u53cc\u8bed\u95ee\u7b54\u7684\u5bf9\u8bdd\u8bed\u8a00\u6a21\u578b\uff0c\u57fa\u4e8e [General Language Model (GLM)](https://github.com/THUDM/GLM) \u67b6\u6784\uff0c\u5177\u6709 62 \u4ebf\u53c2\u6570\u3002\u7ed3\u5408\u6a21\u578b\u91cf\u5316\u6280\u672f\uff0c\u7528\u6237\u53ef\u4ee5\u5728\u6d88\u8d39\u7ea7\u7684\u663e\u5361\u4e0a\u8fdb\u884c\u672c\u5730\u90e8\u7f72\uff08INT4 \u91cf\u5316\u7ea7\u522b\u4e0b\u6700\u4f4e\u53ea\u9700 6GB \u663e\u5b58\uff09\u3002ChatGLM-6B \u4f7f\u7528\u4e86\u548c [ChatGLM](https://chatglm.cn) \u76f8\u540c\u7684\u6280\u672f\uff0c\u9488\u5bf9\u4e2d\u6587\u95ee\u7b54\u548c\u5bf9\u8bdd\u8fdb\u884c\u4e86\u4f18\u5316\u3002\u7ecf\u8fc7\u7ea6 1T \u6807\u8bc6\u7b26\u7684\u4e2d\u82f1\u53cc\u8bed\u8bad\u7ec3\uff0c\u8f85\u4ee5\u76d1\u7763\u5fae\u8c03\u3001\u53cd\u9988\u81ea\u52a9\u3001\u4eba\u7c7b\u53cd\u9988\u5f3a\u5316\u5b66\u4e60\u7b49\u6280\u672f\u7684\u52a0\u6301\uff0c62 \u4ebf\u53c2\u6570\u7684 ChatGLM-6B \u5df2\u7ecf\u80fd\u751f\u6210\u76f8\u5f53\u7b26\u5408\u4eba\u7c7b\u504f\u597d\u7684\u56de\u7b54\u3002\n\nChatGLM-6B-INT4-QE \u662f ChatGLM-6B \u91cf\u5316\u540e\u7684\u6a21\u578b\u6743\u91cd\u3002\u5177\u4f53\u7684\uff0cChatGLM-6B-INT4-QE \u5bf9 ChatGLM-6B \u4e2d\u7684 28 \u4e2a GLM Block \u3001 Embedding \u548c LM Head \u8fdb\u884c\u4e86 INT4 \u91cf\u5316\u3002\u91cf\u5316\u540e\u7684\u6a21\u578b\u6743\u91cd\u6587\u4ef6\u4ec5\u4e3a 3G \uff0c\u7406\u8bba\u4e0a 6G \u663e\u5b58\uff08\u4f7f\u7528 CPU \u5373 6G \u5185\u5b58\uff09\u5373\u53ef\u63a8\u7406\uff0c\u5177\u6709\u5728\u5d4c\u5165\u5f0f\u8bbe\u5907\uff08\u5982\u6811\u8393\u6d3e\uff09\u4e0a\u8fd0\u884c\u7684\u53ef\u80fd\u3002\n\n\u5728 CPU \u4e0a\u8fd0\u884c\u65f6\uff0c\u4f1a\u6839\u636e\u786c\u4ef6\u81ea\u52a8\u7f16\u8bd1 CPU Kernel \uff0c\u8bf7\u786e\u4fdd\u5df2\u5b89\u88c5 GCC \u548c OpenMP \uff08Linux\u4e00\u822c\u5df2\u5b89\u88c5\uff0c\u5bf9\u4e8eWindows\u5219\u9700\u624b\u52a8\u5b89\u88c5\uff09\uff0c\u4ee5\u83b7\u5f97\u6700\u4f73\u5e76\u884c\u8ba1\u7b97\u80fd\u529b\u3002\n\n## \u8f6f\u4ef6\u4f9d\u8d56\n\n```shell\npip install protobuf==3.20.0 transformers==4.26.1 icetk cpm_kernels\n```\n\n## \u4ee3\u7801\u8c03\u7528 \n\n\u53ef\u4ee5\u901a\u8fc7\u5982\u4e0b\u4ee3\u7801\u8c03\u7528 ChatGLM-6B 
\u6a21\u578b\u6765\u751f\u6210\u5bf9\u8bdd\uff1a\n\n```ipython\n>>> from transformers import AutoTokenizer, AutoModel\n>>> tokenizer = AutoTokenizer.from_pretrained(\"THUDM/chatglm-6b-int4-qe\", trust_remote_code=True)\n>>> model = AutoModel.from_pretrained(\"THUDM/chatglm-6b-int4-qe\", trust_remote_code=True).half().cuda()\n>>> response, history = model.chat(tokenizer, \"\u4f60\u597d\", history=[])\n>>> print(response)\n\u4f60\u597d\ud83d\udc4b!\u6211\u662f\u4eba\u5de5\u667a\u80fd\u52a9\u624b ChatGLM-6B,\u5f88\u9ad8\u5174\u89c1\u5230\u4f60,\u6b22\u8fce\u95ee\u6211\u4efb\u4f55\u95ee\u9898\u3002\n>>> response, history = model.chat(tokenizer, \"\u665a\u4e0a\u7761\u4e0d\u7740\u5e94\u8be5\u600e\u4e48\u529e\", history=history)\n>>> print(response)\n\u665a\u4e0a\u7761\u4e0d\u7740\u53ef\u80fd\u4f1a\u8ba9\u4f60\u611f\u5230\u7126\u8651\u6216\u4e0d\u8212\u670d,\u4f46\u4ee5\u4e0b\u662f\u4e00\u4e9b\u53ef\u4ee5\u5e2e\u52a9\u4f60\u5165\u7761\u7684\u65b9\u6cd5:\n\n1. \u5236\u5b9a\u89c4\u5f8b\u7684\u7761\u7720\u65f6\u95f4\u8868:\u4fdd\u6301\u89c4\u5f8b\u7684\u7761\u7720\u65f6\u95f4\u8868\u53ef\u4ee5\u5e2e\u52a9\u4f60\u5efa\u7acb\u5065\u5eb7\u7684\u7761\u7720\u4e60\u60ef,\u4f7f\u4f60\u66f4\u5bb9\u6613\u5165\u7761\u3002\u5c3d\u91cf\u5728\u6bcf\u5929\u7684\u76f8\u540c\u65f6\u95f4\u4e0a\u5e8a,\u5e76\u5728\u540c\u4e00\u65f6\u95f4\u8d77\u5e8a\u3002\n2. \u521b\u9020\u4e00\u4e2a\u8212\u9002\u7684\u7761\u7720\u73af\u5883:\u786e\u4fdd\u7761\u7720\u73af\u5883\u8212\u9002,\u5b89\u9759,\u9ed1\u6697\u4e14\u6e29\u5ea6\u9002\u5b9c\u3002\u53ef\u4ee5\u4f7f\u7528\u8212\u9002\u7684\u5e8a\u4e0a\u7528\u54c1,\u5e76\u4fdd\u6301\u623f\u95f4\u901a\u98ce\u3002\n3. \u653e\u677e\u8eab\u5fc3:\u5728\u7761\u524d\u505a\u4e9b\u653e\u677e\u7684\u6d3b\u52a8,\u4f8b\u5982\u6ce1\u4e2a\u70ed\u6c34\u6fa1,\u542c\u4e9b\u8f7b\u67d4\u7684\u97f3\u4e50,\u9605\u8bfb\u4e00\u4e9b\u6709\u8da3\u7684\u4e66\u7c4d\u7b49,\u6709\u52a9\u4e8e\u7f13\u89e3\u7d27\u5f20\u548c\u7126\u8651,\u4f7f\u4f60\u66f4\u5bb9\u6613\u5165\u7761\u3002\n4. \u907f\u514d\u996e\u7528\u542b\u6709\u5496\u5561\u56e0\u7684\u996e\u6599:\u5496\u5561\u56e0\u662f\u4e00\u79cd\u523a\u6fc0\u6027\u7269\u8d28,\u4f1a\u5f71\u54cd\u4f60\u7684\u7761\u7720\u8d28\u91cf\u3002\u5c3d\u91cf\u907f\u514d\u5728\u7761\u524d\u996e\u7528\u542b\u6709\u5496\u5561\u56e0\u7684\u996e\u6599,\u4f8b\u5982\u5496\u5561,\u8336\u548c\u53ef\u4e50\u3002\n5. \u907f\u514d\u5728\u5e8a\u4e0a\u505a\u4e0e\u7761\u7720\u65e0\u5173\u7684\u4e8b\u60c5:\u5728\u5e8a\u4e0a\u505a\u4e9b\u4e0e\u7761\u7720\u65e0\u5173\u7684\u4e8b\u60c5,\u4f8b\u5982\u770b\u7535\u5f71,\u73a9\u6e38\u620f\u6216\u5de5\u4f5c\u7b49,\u53ef\u80fd\u4f1a\u5e72\u6270\u4f60\u7684\u7761\u7720\u3002\n6. 
\u5c1d\u8bd5\u547c\u5438\u6280\u5de7:\u6df1\u547c\u5438\u662f\u4e00\u79cd\u653e\u677e\u6280\u5de7,\u53ef\u4ee5\u5e2e\u52a9\u4f60\u7f13\u89e3\u7d27\u5f20\u548c\u7126\u8651,\u4f7f\u4f60\u66f4\u5bb9\u6613\u5165\u7761\u3002\u8bd5\u7740\u6162\u6162\u5438\u6c14,\u4fdd\u6301\u51e0\u79d2\u949f,\u7136\u540e\u7f13\u6162\u547c\u6c14\u3002\n\n\u5982\u679c\u8fd9\u4e9b\u65b9\u6cd5\u65e0\u6cd5\u5e2e\u52a9\u4f60\u5165\u7761,\u4f60\u53ef\u4ee5\u8003\u8651\u54a8\u8be2\u533b\u751f\u6216\u7761\u7720\u4e13\u5bb6,\u5bfb\u6c42\u8fdb\u4e00\u6b65\u7684\u5efa\u8bae\u3002\n```\n\n\u5173\u4e8e\u66f4\u591a\u7684\u4f7f\u7528\u8bf4\u660e\uff0c\u5305\u62ec\u5982\u4f55\u8fd0\u884c\u547d\u4ee4\u884c\u548c\u7f51\u9875\u7248\u672c\u7684 DEMO\uff0c\u4ee5\u53ca\u4f7f\u7528\u6a21\u578b\u91cf\u5316\u4ee5\u8282\u7701\u663e\u5b58\uff0c\u8bf7\u53c2\u8003\u6211\u4eec\u7684 [Github Repo](https://github.com/THUDM/ChatGLM-6B)\u3002\n\n## \u534f\u8bae\n\n\u672c\u4ed3\u5e93\u7684\u4ee3\u7801\u4f9d\u7167 [Apache-2.0](LICENSE) \u534f\u8bae\u5f00\u6e90\uff0cChatGLM-6B \u6a21\u578b\u7684\u6743\u91cd\u7684\u4f7f\u7528\u5219\u9700\u8981\u9075\u5faa [Model License](MODEL_LICENSE)\u3002\n\n## \u5f15\u7528\n\n\u5982\u679c\u4f60\u89c9\u5f97\u6211\u4eec\u7684\u5de5\u4f5c\u6709\u5e2e\u52a9\u7684\u8bdd\uff0c\u8bf7\u8003\u8651\u5f15\u7528\u4e0b\u5217\u8bba\u6587\uff1a\n\n```\n@inproceedings{\n zeng2023glm-130b,\n title={{GLM}-130B: An Open Bilingual Pre-trained Model},\n author={Aohan Zeng and Xiao Liu and Zhengxiao Du and Zihan Wang and Hanyu Lai and Ming Ding and Zhuoyi Yang and Yifan Xu and Wendi Zheng and Xiao Xia and Weng Lam Tam and Zixuan Ma and Yufei Xue and Jidong Zhai and Wenguang Chen and Zhiyuan Liu and Peng Zhang and Yuxiao Dong and Jie Tang},\n booktitle={The Eleventh International Conference on Learning Representations (ICLR)},\n year={2023},\n url={https://openreview.net/forum?id=-Aw0rrrPUF}\n}\n```\n```\n@inproceedings{du2022glm,\n title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},\n author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},\n booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},\n pages={320--335},\n year={2022}\n}\n```"} {"downloads": 45310, "id": "google/mt5-small", "likes": 42, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"language": ["multilingual", "af", "am", "ar", "az", "be", "bg", "bn", "ca", "ceb", "co", "cs", "cy", "da", "de", "el", "en", "eo", "es", "et", "eu", "fa", "fi", "fil", "fr", "fy", "ga", "gd", "gl", "gu", "ha", "haw", "hi", "hmn", "ht", "hu", "hy", "ig", "is", "it", "iw", "ja", "jv", "ka", "kk", "km", "kn", "ko", "ku", "ky", "la", "lb", "lo", "lt", "lv", "mg", "mi", "mk", "ml", "mn", "mr", "ms", "mt", "my", "ne", "nl", false, "ny", "pa", "pl", "ps", "pt", "ro", "ru", "sd", "si", "sk", "sl", "sm", "sn", "so", "sq", "sr", "st", "su", "sv", "sw", "ta", "te", "tg", "th", "tr", "uk", "und", "ur", "uz", "vi", "xh", "yi", "yo", "zh", "zu"], "datasets": ["mc4"], "license": "apache-2.0"}, "description": "\n\n[Google's mT5](https://github.com/google-research/multilingual-t5)\n\nmT5 is pretrained on the [mC4](https://www.tensorflow.org/datasets/catalog/c4#c4multilingual) corpus, covering 101 languages:\n\nAfrikaans, Albanian, Amharic, Arabic, Armenian, Azerbaijani, Basque, Belarusian, Bengali, Bulgarian, Burmese, Catalan, Cebuano, Chichewa, Chinese, Corsican, Czech, Danish, Dutch, English, Esperanto, Estonian, 
Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian Creole, Hausa, Hawaiian, Hebrew, Hindi, Hmong, Hungarian, Icelandic, Igbo, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish, Kyrgyz, Lao, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Maltese, Maori, Marathi, Mongolian, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Samoan, Scottish Gaelic, Serbian, Shona, Sindhi, Sinhala, Slovak, Slovenian, Somali, Sotho, Spanish, Sundanese, Swahili, Swedish, Tajik, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, West Frisian, Xhosa, Yiddish, Yoruba, Zulu.\n\n**Note**: mT5 was only pre-trained on mC4, excluding any supervised training. Therefore, this model has to be fine-tuned before it is usable on a downstream task.\n\nPretraining Dataset: [mC4](https://www.tensorflow.org/datasets/catalog/c4#c4multilingual)\n\nOther Community Checkpoints: [here](https://huggingface.co/models?search=mt5)\n\nPaper: [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934)\n\nAuthors: *Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel* \n\n\n## Abstract\n\nThe recent \"Text-to-Text Transfer Transformer\" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We describe the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. 
All of the code and model checkpoints used in this work are publicly available."} {"downloads": 62000, "id": "pszemraj/flan-t5-large-grammar-synthesis", "likes": 41, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"languages": ["en"], "license": ["cc-by-nc-sa-4.0", "apache-2.0"], "tags": ["grammar", "spelling", "punctuation", "error-correction", "grammar synthesis", "FLAN"], "datasets": ["jfleg"], "widget": [{"text": "There car broke down so their hitching a ride to they're class.", "example_title": "compound-1"}, {"text": "i can has cheezburger", "example_title": "cheezburger"}, {"text": "so em if we have an now so with fito ringina know how to estimate the tren given the ereafte mylite trend we can also em an estimate is nod s i again tort watfettering an we have estimated the trend an called wot to be called sthat of exty right now we can and look at wy this should not hare a trend i becan we just remove the trend an and we can we now estimate tesees ona effect of them exty", "example_title": "Transcribed Audio Example 2"}, {"text": "My coworker said he used a financial planner to help choose his stocks so he wouldn't loose money.", "example_title": "incorrect word choice (context)"}, {"text": "good so hve on an tadley i'm not able to make it to the exla session on monday this week e which is why i am e recording pre recording an this excelleision and so to day i want e to talk about two things and first of all em i wont em wene give a summary er about ta ohow to remove trents in these nalitives from time series", "example_title": "lowercased audio transcription output"}, {"text": "Frustrated, the chairs took me forever to set up.", "example_title": "dangling modifier"}, {"text": "I would like a peice of pie.", "example_title": "miss-spelling"}, {"text": "Which part of Zurich was you going to go hiking in when we were there for the first time together? ! ?", "example_title": "chatbot on Zurich"}, {"text": "Most of the course is about semantic or content of language but there are also interesting topics to be learned from the servicefeatures except statistics in characters in documents. At this point, Elvthos introduces himself as his native English speaker and goes on to say that if you continue to work on social scnce,", "example_title": "social science ASR summary output"}, {"text": "they are somewhat nearby right yes please i'm not sure how the innish is tepen thut mayyouselect one that istatte lo variants in their property e ere interested and anyone basical e may be applyind reaching the browing approach were"}, "medical course audio transcription"], "parameters": {"max_length": 128, "min_length": 4, "num_beams": 8, "repetition_penalty": 1.21, "length_penalty": 1, "early_stopping": true}}, "description": "\n\n\n# grammar-synthesis-large: FLAN-t5\n\n \n \"Open\n\n\nA fine-tuned version of [google/flan-t5-large](https://huggingface.co/google/flan-t5-large) for grammar correction on an expanded version of the [JFLEG](https://paperswithcode.com/dataset/jfleg) dataset. [Demo](https://huggingface.co/spaces/pszemraj/FLAN-grammar-correction) on HF spaces.\n\n## Example\n\n![example](https://i.imgur.com/PIhrc7E.png)\n\nCompare vs. 
the original [grammar-synthesis-large](https://huggingface.co/pszemraj/grammar-synthesis-large).\n\n"} {"downloads": 10971, "id": "philschmid/flan-t5-base-samsum", "likes": 41, "pipeline_tag": "text2text-generation", "task": "text2text-generation", "meta": {"license": "apache-2.0", "tags": ["generated_from_trainer"], "datasets": ["samsum"], "metrics": ["rouge"], "model-index": [{"name": "flan-t5-base-samsum", "results": [{"task": {"name": "Sequence-to-sequence Language Modeling", "type": "text2text-generation"}, "dataset": {"name": "samsum", "type": "samsum", "config": "samsum", "split": "train", "args": "samsum"}, "metrics": [{"name": "Rouge1", "type": "rouge", "value": 47.2358}]}]}]}, "description": "\n\n\n\n# flan-t5-base-samsum\n\nThis model is a fine-tuned version of [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) on the samsum dataset.\nIt achieves the following results on the evaluation set:\n- Loss: 1.3716\n- Rouge1: 47.2358\n- Rouge2: 23.5135\n- Rougel: 39.6266\n- Rougelsum: 43.3458\n- Gen Len: 17.3907\n\n## Model description\n\nMore information needed\n\n## Intended uses & limitations\n\nMore information needed\n\n## Training and evaluation data\n\nMore information needed\n\n## Training procedure\n\n### Training hyperparameters\n\nThe following hyperparameters were used during training:\n- learning_rate: 5e-05\n- train_batch_size: 8\n- eval_batch_size: 8\n- seed: 42\n- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08\n- lr_scheduler_type: linear\n- num_epochs: 5\n\n### Training results\n\n| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |\n|:"} {"downloads": 1439368, "id": "facebook/bart-large-cnn", "likes": 305, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": ["en"], "tags": ["summarization"], "license": "mit", "thumbnail": "https://huggingface.co/front/thumbnails/facebook.png", "datasets": ["cnn_dailymail"], "model-index": [{"name": "facebook/bart-large-cnn", "results": [{"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "cnn_dailymail", "type": "cnn_dailymail", "config": "3.0.0", "split": "train"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 42.9486, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 20.8149, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 30.6186, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 40.0376, "verified": true}, {"name": "loss", "type": "loss", "value": 2.529000997543335, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 78.5866, "verified": true}]}]}]}, "description": "\n# BART (large-sized model), fine-tuned on CNN Daily Mail \n\nBART model pre-trained on English language, and fine-tuned on [CNN Daily Mail](https://huggingface.co/datasets/cnn_dailymail). It was introduced in the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/abs/1910.13461) by Lewis et al. and first released in [this repository](https://github.com/pytorch/fairseq/tree/master/examples/bart). \n\nDisclaimer: The team releasing BART did not write a model card for this model so this model card has been written by the Hugging Face team.\n\n## Model description\n\nBART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. 
BART is pre-trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text.\n\nBART is particularly effective when fine-tuned for text generation (e.g. summarization, translation) but also works well for comprehension tasks (e.g. text classification, question answering). This particular checkpoint has been fine-tuned on CNN Daily Mail, a large collection of text-summary pairs.\n\n## Intended uses & limitations\n\nYou can use this model for text summarization. \n\n### How to use\n\nHere is how to use this model with the [pipeline API](https://huggingface.co/transformers/main_classes/pipelines.html):\n\n```python\nfrom transformers import pipeline\n\nsummarizer = pipeline(\"summarization\", model=\"facebook/bart-large-cnn\")\n\nARTICLE = \"\"\" New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.\nA year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.\nOnly 18 days after that marriage, she got hitched yet again. Then, Barrientos declared \"I do\" five more times, sometimes only within two weeks of each other.\nIn 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her \"first and only\" marriage.\nBarrientos, now 39, is facing two criminal counts of \"offering a false instrument for filing in the first degree,\" referring to her false statements on the\n2010 marriage license application, according to court documents.\nProsecutors said the marriages were part of an immigration scam.\nOn Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.\nAfter leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective\nAnnette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.\nAll occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.\nProsecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.\nAny divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.\nThe case was referred to the Bronx District Attorney\\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\\'s\nInvestigation Division. Seven of the men are from so-called \"red-flagged\" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.\nHer eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.\nIf convicted, Barrientos faces up to four years in prison. Her next court appearance is scheduled for May 18.\n\"\"\"\nprint(summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False))\n>>> [{'summary_text': 'Liana Barrientos, 39, is charged with two counts of \"offering a false instrument for filing in the first degree\" In total, she has been married 10 times, with nine of her marriages occurring between 1999 and 2002. 
She is believed to still be married to four men.'}]\n```\n\n### BibTeX entry and citation info\n\n```bibtex\n@article{DBLP:journals/corr/abs-1910-13461,\n author = {Mike Lewis and\n Yinhan Liu and\n Naman Goyal and\n Marjan Ghazvininejad and\n Abdelrahman Mohamed and\n Omer Levy and\n Veselin Stoyanov and\n Luke Zettlemoyer},\n title = {{BART:} Denoising Sequence-to-Sequence Pre-training for Natural Language\n Generation, Translation, and Comprehension},\n journal = {CoRR},\n volume = {abs/1910.13461},\n year = {2019},\n url = {http://arxiv.org/abs/1910.13461},\n eprinttype = {arXiv},\n eprint = {1910.13461},\n timestamp = {Thu, 31 Oct 2019 14:02:26 +0100},\n biburl = {https://dblp.org/rec/journals/corr/abs-1910-13461.bib},\n bibsource = {dblp computer science bibliography, https://dblp.org}\n}"} {"downloads": 1562512, "id": "philschmid/bart-large-cnn-samsum", "likes": 137, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": "en", "license": "mit", "tags": ["sagemaker", "bart", "summarization"], "datasets": ["samsum"], "widget": [{"text": "Jeff: Can I train a \ud83e\udd17 Transformers model on Amazon SageMaker? \nPhilipp: Sure you can use the new Hugging Face Deep Learning Container. \nJeff: ok.\nJeff: and how can I get started? \nJeff: where can I find documentation? \nPhilipp: ok, ok you can find everything here. https://huggingface.co/blog/the-partnership-amazon-sagemaker-and-hugging-face\n"}], "model-index": [{"name": "bart-large-cnn-samsum", "results": [{"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization", "type": "samsum"}, "metrics": [{"type": "rogue-1", "value": 42.621, "name": "Validation ROGUE-1"}, {"type": "rogue-2", "value": 21.9825, "name": "Validation ROGUE-2"}, {"type": "rogue-l", "value": 33.034, "name": "Validation ROGUE-L"}, {"type": "rogue-1", "value": 41.3174, "name": "Test ROGUE-1"}, {"type": "rogue-2", "value": 20.8716, "name": "Test ROGUE-2"}, {"type": "rogue-l", "value": 32.1337, "name": "Test ROGUE-L"}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "samsum", "type": "samsum", "config": "samsum", "split": "test"}, "metrics": [{"type": "rouge", "value": 41.3282, "name": "ROUGE-1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZTYzNzZkZDUzOWQzNGYxYTJhNGE4YWYyZjA0NzMyOWUzMDNhMmVhYzY1YTM0ZTJhYjliNGE4MDZhMjhhYjRkYSIsInZlcnNpb24iOjF9.OOM6l3v5rJCndmUIJV-2SDh2NjbPo5IgQOSL-Ju1Gwbi1voL5amsDEDOelaqlUBE3n55KkUsMLZhyn66yWxZBQ"}, {"type": "rouge", "value": 20.8755, "name": "ROUGE-2", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMWZiODFiYWQzY2NmOTc5YjA3NTI0YzQ1MzQ0ODk2NjgyMmVlMjA5MjZiNTJkMGRmZGEzN2M3MDNkMjkxMDVhYSIsInZlcnNpb24iOjF9.b8cPk2-IL24La3Vd0hhtii4tRXujh5urAwy6IVeTWHwYfXaURyC2CcQOWtlOx5bdO5KACeaJFrFBCGgjk-VGCQ"}, {"type": "rouge", "value": 32.1353, "name": "ROUGE-L", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYWNmYzdiYWQ2ZWRkYzRiMGMxNWUwODgwZTdkY2NjZTc1NWE5NTFiMzU0OTU1N2JjN2ExYWQ2NGZkNjk5OTc4YSIsInZlcnNpb24iOjF9.Fzv4p-TEVicljiCqsBJHK1GsnE_AwGqamVmxTPI0WBNSIhZEhliRGmIL_z1pDq6WOzv3GN2YUGvhowU7GxnyAQ"}, {"type": "rouge", "value": 38.401, "name": "ROUGE-LSUM", "verified": true, "verifyToken": 
"eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNGI4MWY0NWMxMmQ0ODQ5MDhiNDczMDAzYzJkODBiMzgzYWNkMWM2YTZkZDJmNWJiOGQ3MmNjMGViN2UzYWI2ZSIsInZlcnNpb24iOjF9.7lw3h5k5lJ7tYFLZGUtLyDabFYd00l6ByhmvkW4fykocBy9Blyin4tdw4Xps4DW-pmrdMLgidHxBWz5MrSx1Bw"}, {"type": "loss", "value": 1.4297215938568115, "name": "loss", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMzI0ZWNhNDM5YTViZDMyZGJjMDA1ZWFjYzNhOTdlOTFiNzhhMDBjNmM2MjA3ZmRkZjJjMjEyMGY3MzcwOTI2NyIsInZlcnNpb24iOjF9.oNaZsAtUDqGAqoZWJavlcW7PKx1AWsnkbhaQxadpOKk_u7ywJJabvTtzyx_DwEgZslgDETCf4MM-JKitZKjiDA"}, {"type": "gen_len", "value": 60.0757, "name": "gen_len", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYTgwYWYwMDRkNTJkMDM5N2I2MWNmYzQ3OWM1NDJmODUyZGViMGE4ZTdkNmIwYWM2N2VjZDNmN2RiMDE4YTYyYiIsInZlcnNpb24iOjF9.PbXTcNYX_SW-BuRQEcqyc21M7uKrOMbffQSAK6k2GLzTVRrzZxsDC57ktKL68zRY8fSiRGsnknOwv-nAR6YBCQ"}]}]}]}, "description": "\n\n## `bart-large-cnn-samsum`\n\n> If you want to use the model you should try a newer fine-tuned FLAN-T5 version [philschmid/flan-t5-base-samsum](https://huggingface.co/philschmid/flan-t5-base-samsum) out socring the BART version with `+6` on `ROGUE1` achieving `47.24`.\n\n# TRY [philschmid/flan-t5-base-samsum](https://huggingface.co/philschmid/flan-t5-base-samsum)\n\n\nThis model was trained using Amazon SageMaker and the new Hugging Face Deep Learning container.\n\nFor more information look at:\n- [\ud83e\udd17 Transformers Documentation: Amazon SageMaker](https://huggingface.co/transformers/sagemaker.html)\n- [Example Notebooks](https://github.com/huggingface/notebooks/tree/master/sagemaker)\n- [Amazon SageMaker documentation for Hugging Face](https://docs.aws.amazon.com/sagemaker/latest/dg/hugging-face.html)\n- [Python SDK SageMaker documentation for Hugging Face](https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/index.html)\n- [Deep Learning Container](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#huggingface-training-containers)\n\n## Hyperparameters\n```json\n{\n \"dataset_name\": \"samsum\",\n \"do_eval\": true,\n \"do_predict\": true,\n \"do_train\": true,\n \"fp16\": true,\n \"learning_rate\": 5e-05,\n \"model_name_or_path\": \"facebook/bart-large-cnn\",\n \"num_train_epochs\": 3,\n \"output_dir\": \"/opt/ml/model\",\n \"per_device_eval_batch_size\": 4,\n \"per_device_train_batch_size\": 4,\n \"predict_with_generate\": true,\n \"seed\": 7\n}\n```\n\n## Usage\n```python\nfrom transformers import pipeline\nsummarizer = pipeline(\"summarization\", model=\"philschmid/bart-large-cnn-samsum\")\n\nconversation = '''Jeff: Can I train a \ud83e\udd17 Transformers model on Amazon SageMaker? \nPhilipp: Sure you can use the new Hugging Face Deep Learning Container. \nJeff: ok.\nJeff: and how can I get started? \nJeff: where can I find documentation? \nPhilipp: ok, ok you can find everything here. https://huggingface.co/blog/the-partnership-amazon-sagemaker-and-hugging-face \n'''\nsummarizer(conversation)\n```\n\n## Results\n\n| key | value |\n| "} {"downloads": 380221, "id": "sshleifer/distilbart-cnn-12-6", "likes": 114, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": "en", "tags": ["summarization"], "license": "apache-2.0", "datasets": ["cnn_dailymail", "xsum"], "thumbnail": "https://huggingface.co/front/thumbnails/distilbart_medium.png"}, "description": "\n\n### Usage\n\nThis checkpoint should be loaded into `BartForConditionalGeneration.from_pretrained`. 
See the [BART docs](https://huggingface.co/transformers/model_doc/bart.html?#transformers.BartForConditionalGeneration) for more information.\n\n### Metrics for DistilBART models\n\n| Model Name | MM Params | Inference Time (MS) | Speedup | Rouge 2 | Rouge-L |\n|:"} {"downloads": 78535, "id": "csebuetnlp/mT5_multilingual_XLSum", "likes": 109, "pipeline_tag": "summarization", "task": "summarization", "meta": {"tags": ["summarization", "mT5"], "datasets": ["csebuetnlp/xlsum"], "language": ["am", "ar", "az", "bn", "my", "zh", "en", "fr", "gu", "ha", "hi", "ig", "id", "ja", "rn", "ko", "ky", "mr", "ne", "om", "ps", "fa", "pcm", "pt", "pa", "ru", "gd", "sr", "si", "so", "es", "sw", "ta", "te", "th", "ti", "tr", "uk", "ur", "uz", "vi", "cy", "yo"], "licenses": ["cc-by-nc-sa-4.0"], "widget": [{"text": "Videos that say approved vaccines are dangerous and cause autism, cancer or infertility are among those that will be taken down, the company said. The policy includes the termination of accounts of anti-vaccine influencers. Tech giants have been criticised for not doing more to counter false health information on their sites. In July, US President Joe Biden said social media platforms were largely responsible for people's scepticism in getting vaccinated by spreading misinformation, and appealed for them to address the issue. YouTube, which is owned by Google, said 130,000 videos were removed from its platform since last year, when it implemented a ban on content spreading misinformation about Covid vaccines. In a blog post, the company said it had seen false claims about Covid jabs \"spill over into misinformation about vaccines in general\". The new policy covers long-approved vaccines, such as those against measles or hepatitis B. \"We're expanding our medical misinformation policies on YouTube with new guidelines on currently administered vaccines that are approved and confirmed to be safe and effective by local health authorities and the WHO,\" the post said, referring to the World Health Organization."}], "model-index": [{"name": "csebuetnlp/mT5_multilingual_XLSum", "results": [{"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "xsum", "type": "xsum", "config": "default", "split": "test"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 36.5002, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 13.934, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 28.9876, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 28.9958, "verified": true}, {"name": "loss", "type": "loss", "value": 2.0674800872802734, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 26.9733, "verified": true}]}]}]}, "description": "\n\n# mT5-multilingual-XLSum\n\nThis repository contains the mT5 checkpoint finetuned on the 45 languages of [XL-Sum](https://huggingface.co/datasets/csebuetnlp/xlsum) dataset. For finetuning details and scripts,\nsee the [paper](https://aclanthology.org/2021.findings-acl.413/) and the [official repository](https://github.com/csebuetnlp/xl-sum). \n\n\n## Using this model in `transformers` (tested on 4.11.0.dev0)\n\n```python\nimport re\nfrom transformers import AutoTokenizer, AutoModelForSeq2SeqLM\n\nWHITESPACE_HANDLER = lambda k: re.sub('\\s+', ' ', re.sub('\\n+', ' ', k.strip()))\n\narticle_text = \"\"\"Videos that say approved vaccines are dangerous and cause autism, cancer or infertility are among those that will be taken down, the company said. 
The policy includes the termination of accounts of anti-vaccine influencers. Tech giants have been criticised for not doing more to counter false health information on their sites. In July, US President Joe Biden said social media platforms were largely responsible for people's scepticism in getting vaccinated by spreading misinformation, and appealed for them to address the issue. YouTube, which is owned by Google, said 130,000 videos were removed from its platform since last year, when it implemented a ban on content spreading misinformation about Covid vaccines. In a blog post, the company said it had seen false claims about Covid jabs \"spill over into misinformation about vaccines in general\". The new policy covers long-approved vaccines, such as those against measles or hepatitis B. \"We're expanding our medical misinformation policies on YouTube with new guidelines on currently administered vaccines that are approved and confirmed to be safe and effective by local health authorities and the WHO,\" the post said, referring to the World Health Organization.\"\"\"\n\nmodel_name = \"csebuetnlp/mT5_multilingual_XLSum\"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForSeq2SeqLM.from_pretrained(model_name)\n\ninput_ids = tokenizer(\n [WHITESPACE_HANDLER(article_text)],\n return_tensors=\"pt\",\n padding=\"max_length\",\n truncation=True,\n max_length=512\n)[\"input_ids\"]\n\noutput_ids = model.generate(\n input_ids=input_ids,\n max_length=84,\n no_repeat_ngram_size=2,\n num_beams=4\n)[0]\n\nsummary = tokenizer.decode(\n output_ids,\n skip_special_tokens=True,\n clean_up_tokenization_spaces=False\n)\n\nprint(summary)\n```\n\n## Benchmarks\n\nScores on the XL-Sum test sets are as follows:\n\nLanguage | ROUGE-1 / ROUGE-2 / ROUGE-L\n"} {"downloads": 329179, "id": "google/pegasus-xsum", "likes": 90, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": "en", "tags": ["summarization"], "model-index": [{"name": "google/pegasus-xsum", "results": [{"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "samsum", "type": "samsum", "config": "samsum", "split": "train"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 21.8096, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 4.2525, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 17.4469, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 18.8907, "verified": true}, {"name": "loss", "type": "loss", "value": 3.0317161083221436, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 20.3122, "verified": true}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "xsum", "type": "xsum", "config": "default", "split": "test"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 46.8623, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 24.4533, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 39.0548, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 39.0994, "verified": true}, {"name": "loss", "type": "loss", "value": 1.5717021226882935, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 22.8821, "verified": true}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "cnn_dailymail", "type": "cnn_dailymail", "config": "3.0.0", "split": "test"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 22.2062, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 
7.6701, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 15.4046, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 19.2182, "verified": true}, {"name": "loss", "type": "loss", "value": 2.681241273880005, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 25.0234, "verified": true}]}]}]}, "description": "\n\n### Pegasus Models\nSee Docs: [here](https://huggingface.co/transformers/master/model_doc/pegasus.html)\n\nOriginal TF 1 code [here](https://github.com/google-research/pegasus)\n\nAuthors: Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu on Dec 18, 2019\n\nMaintained by: [@sshleifer](https://twitter.com/sam_shleifer)\n\nTask: Summarization\n\nThe following is copied from the authors' README.\n\n# Mixed & Stochastic Checkpoints\n\nWe train a pegasus model with sampled gap sentence ratios on both C4 and HugeNews, and stochastically sample important sentences. The updated the results are reported in this table.\n\n| dataset | C4 | HugeNews | Mixed & Stochastic|\n| "} {"downloads": 30597, "id": "knkarthick/MEETING_SUMMARY", "likes": 82, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": "en", "license": "apache-2.0", "tags": ["bart", "seq2seq", "summarization"], "datasets": ["cnndaily/newyorkdaily/xsum/samsum/dialogsum/AMI"], "metrics": ["rouge"], "widget": [{"text": "Hi, I'm David and I'm supposed to be an industrial designer. Um, I just got the project announcement about what the project is. Designing a remote control. That's about it, didn't get anything else. Did you get the same thing? Cool. There's too much gear. Okay. Can't draw. Um. Yeah. Um, well anyway, I don't know, it's just the first animal I can think off the top of my head. Um. Yes. Big reason is 'cause I'm allergic to most animals. Allergic to animal fur, so um fish was a natural choice. Um, yeah, and I kind of like whales. They come in and go eat everything in sight. And they're quite harmless and mild and interesting. Tail's a bit big, I think. It's an after dinner dog then. Hmm. It does make sense from maybe the design point of view 'cause you have more complicated characters like European languages, then you need more buttons. So, possibly. Hmm. Yeah. And you keep losing them. Finding them is really a pain, you know. I mean it's usually quite small, or when you want it right, it slipped behind the couch or it's kicked under the table. You know. Yep. Mm-hmm. I think one factor would be production cost. Because there's a cap there, so um depends on how much you can cram into that price. Um. I think that that's the main factor. Cool.\nOkay. Right. Um well this is the kick-off meeting for our our project. Um and um this is just what we're gonna be doing over the next twenty five minutes. Um so first of all, just to kind of make sure that we all know each other, I'm Laura and I'm the project manager. Do you want to introduce yourself again? Okay. Great. Okay. Um so we're designing a new remote control and um Oh I have to record who's here actually. So that's David, Andrew and Craig, isn't it? And you all arrived on time. Um yeah so des uh design a new remote control. Um, as you can see it's supposed to be original, trendy and user friendly. Um so that's kind of our our brief, as it were. Um and so there are three different stages to the design. Um I'm not really sure what what you guys have already received um in your emails. What did you get? Mm-hmm. Is that what everybody got? Okay. Um. 
So we're gonna have like individual work and then a meeting about it. And repeat that process three times. Um and at this point we get try out the whiteboard over there. Um. So uh you get to draw your favourite animal and sum up your favourite characteristics of it. So who would like to go first? Very good. Mm-hmm. Yeah. Yeah. Right. Lovely. Right. You can take as long over this as you like, because we haven't got an awful lot to discuss. Ok oh we do we do. Don't feel like you're in a rush, anyway. Ach why not We might have to get you up again then. I don't know what mine is. I'm gonna have to think on the spot now. Is that a whale? Ah. Okay. God, I still don't know what I'm gonna write about. Um. I was gonna choose a dog as well. But I'll just draw a different kind of dog. M my favourite animal is my own dog at home. Um That doesn't really look like him, actually. He looks more like a pig, actually. Ah well. Do you? Oh that's very good of you. Uh. Um he's a mixture of uh various things. Um and what do I like about him, um That's just to suggest that his tail wags. Um he's very friendly and cheery and always pleased to see you, and very kind of affectionate and um uh and he's quite quite wee as well so you know he can doesn't take up too much space. Um and uh And he does a funny thing where he chases his tail as well, which is quite amusing, so It is. I think it is. He only does it after he's had his dinner and um he'll just all of a sudden just get up and start chasing his tail 'round the living room. Yeah, so uh Yeah, maybe. Maybe. Right, um where did you find this? Just down here? Yeah. Okay. Um what are we doing next? Uh um. Okay, uh we now need to discuss the project finance. Um so according to the brief um we're gonna be selling this remote control for twenty five Euro, um and we're aiming to make fifty million Euro. Um so we're gonna be selling this on an international scale. And uh we don't want it to cost any more than uh twelve fifty Euros, so fifty percent of the selling price. Sure. All together. Um I dunno. I imagine That's a good question. I imagine it probably is our sale actually because it's probably up to the the um the retailer to uh sell it for whatever price they want. Um. But I I don't know, I mean do you think the fact that it's going to be sold internationally will have a bearing on how we design it at all? Think it will? Um. Hmm. Oh yeah, regions and stuff, yeah. Yeah. Okay. Yeah. Well for a remote control, do you think that will be I suppose it's depends on how complicated our remote control is. Yeah, yeah. Okay. What, just like in terms of like the wealth of the country? Like how much money people have to spend on things like? Aye, I see what you mean, yeah. Marketing. Good marketing thoughts. Oh gosh, I should be writing all this down. Um. Mm. Yeah. Yeah, yeah. Like how much does, you know, a remote control cost. Well twenty five Euro, I mean that's um that's about like eighteen pounds or something, isn't it? Or no, is it as much as that? Sixteen seventeen eighteen pounds. Um, I dunno, I've never bought a remote control, so I don't know how how good a remote control that would get you. Um. But yeah, I suppose it has to look kind of cool and gimmicky. Um right, okay. Let me just scoot on ahead here. Okay. Um well d Does anybody have anything to add to uh to the finance issue at all? Thin No, actually. That would be useful, though, wouldn't it, if you knew like what your money would get you now. Mm-hmm. Yeah, yeah. Oh. Five minutes to end of meeting. Oh, okay. 
We're a bit behind. Yeah. Right, so do you think that should be like a main design aim of our remote control d you know, do your your satellite and your regular telly and your V_C_R_ and everything? Mm-hmm. Yeah. Or even like, you know, notes about um what you wanna watch. Like you might put in there oh I want to watch such and such and look a Oh that's a good idea. So extra functionalities. Mm-hmm. Hmm. Um okay, uh I'd wel we're gonna have to wrap up pretty quickly in the next couple of minutes. Um I'll just check we've nothing else. Okay. Um so anything else anybody wants to add about what they don't like about remote controls they've used, what they would really like to be part of this new one at all? You keep losing them. Okay. Yeah. W You get those ones where you can, if you like, whistle or make a really high pitched noise they beep. There I mean is that something we'd want to include, do you think? Dunno. Okay maybe. My goodness. Still feels quite primitive. Maybe like a touch screen or something? Okay. Uh-huh, okay. Well I guess that's up to our industrial designer. It looks better. Yeah. Okay. Okay. Right, well um so just to wrap up, the next meeting's gonna be in thirty minutes. So that's about um about ten to twelve by my watch. Um so inbetween now and then, um as the industrial designer, you're gonna be working on you know the actual working design of it so y you know what you're doing there. Um for user interface, technical functions, I guess that's you know like what we've been talking about, what it'll actually do. Um and uh marketing executive, you'll be just thinking about what it actually what, you know, what requirements it has to has to fulfil and you'll all get instructions emailed to you, I guess. Um. Yeah, so it's th the functional design stage is next, I guess. And uh and that's the end of the meeting. So I got that little message a lot sooner than I thought I would, so Mm-hmm. Uh-huh, yeah. Th Okay, well just very quickly 'cause this we're supposed to finish now. Um I guess that's up to us, I mean you probably want some kind of unique selling point of it, so um, you know Yeah. Mm-hmm. Yeah. Okay. Right, okay, we'll that's that's the end of the meeting, then. Um. So, uh thank you all for coming.\nUm I'm Craig and I'm User Interface. Yeah. Well, my favourite animal would be a monkey. Then they're small cute and furry, and uh when planet of the apes becomes real, I'm gonna be up there with them. Yeah. I know um My parents went out and bought um remote controls because um they got fed up of having four or five different remote controls for each things the house. So um for them it was just how many devices control. Uh.\nMm-hmm. Great. And I'm Andrew and I'm uh our marketing expert. Mm-hmm. Mm-hmm. Yeah, that's that's it. Yeah. I will go. That's fine. Alright. So This one here, right? Okay. Very nice. Alright. My favourite animal is like A beagle. Um charac favourite characteristics of it? Is that right? Uh, right, well basically um high priority for any animal for me is that they be willing to take a lot of physical affection from their family. And, yeah that they have lots of personality and uh be fit and in robust good health. So this is blue. Blue beagle. My family's beagle. I coulda told you a whole lot more about beagles. Boy, let me tell you. Impressionist. Alright. Mm. Superb sketch, by the way. Yep. I see a dog in there. Yep. Now I see a rooster. What kind is it? Is he aware that th it's his own cha tail he's chasing? Hmm. 
Probably when he was little he got lots of attention for doing it and has forever been conditioned. 'Kay. Um, can we just go over that again? Uh, so bas at twel Alright, yeah. Okay. So cost like production cost is twelve fifty, but selling price is is that wholesale or retail? Like on the shelf. Our sale our sale anyway. Yeah, okay okay. Okay. Mm-hmm. Alright. Yes. Mm-hmm. Mm-hmm. Well right away I'm wondering if there's um th th uh, like with D_V_D_ players, if there are zones. Um f frequencies or something um as well as uh characters, um different uh keypad styles and s symbols. Um. I don't know. Yeah. Yeah. Yeah. And then a and then al the other thing international is on top of the price. I'm thinking the price might might appeal to a certain market in one region, whereas in another it'll be different, so Just a chara just a characteristic of the Just Or just like, basic product podi positioning, the twenty five Euro remote control might be a big hit in London, might not be such a big hit in Greece, who knows, something like that, yeah. Yep. Right away I'm making some kind of assumptions about what what information we're given here, thinking, 'kay trendy probably means something other than just basic, something other than just standard. Um so I'm wondering right away, is selling twenty five Euros, is that sort of the thi is this gonna to be like the premium product kinda thing or Uh-huh. Mm-hmm. Yep. Yeah, I'd say so, yeah. No. Yeah, yeah. Mm-hmm. Do we have any other background information on like how that compares to other other Yeah. Mm-hmm. Yeah, interesting thing about discussing um production of a remote control for me is that l as you point out, I just don't think of remote controls as somethin something people consciously assess in their purchasing habits. It's just like getting shoelaces with shoes or something. It just comes along. Do you know what I mean? Like so sort of like how do you I I mean one one way of looking at it would be, well the people producing television sets, maybe they have to buy remote controls. Or another way is maybe people who have T_V_ sets are really fed up with their remote control and they really want a better one or something. But Right. Right. Okay so Right, so in function one of the priorities might be to combine as many uses I think so. Yeah, yeah. Yeah. Well like um, maybe what we could use is a sort of like a example of a successful other piece technology is palm palm pilots. They're gone from being just like little sort of scribble boards to cameras, M_P_ three players, telephones, everything, agenda. So, like, I wonder if we might add something new to the to the remote control market, such as the lighting in your house, or um Yeah, yeah. An Yeah. Like, p personally for me, at home I've I've combined the um the audio video of my television set and my D_V_D_ player and my C_D_ player. So they w all work actually function together but I have different remote controls for each of them. So it's sort of ironic that that then they're in there um you know, the sound and everything it's just one system. But each one's got its own little part. Mm. Mm. Mm. Mm-hmm. Mm-hmm. Yeah. Yeah. That's just really good id Yep. Uh, sure. I remember when the first remote control my my family had was on a cable. Actually had a cable between it and the T_V_ and big like buttons that sort of like, like on a blender or something. 
And um, you know, when I think about what they are now, it's better, but actually it's still kind of, I dunno, like a massive junky thing on the table. Maybe we could think about how, could be more, you know, streamlined. S Something like that, yeah. Or whatever would be technologically reasonable. 'Cause it could b it could it could be that f it could be that functionally that doesn't make it any better, but that just the appeal of of not having You know, these days there's a r pe things in people's homes are becoming more and more like chic, you know. Um, nicer materials and might be be worth exploring anyway. Okay. Um. Before we wrap up, just to make sure we're all on the same page here, um, do we We were given sort of an example of a coffee machine or something, right? Well, um are we at ma right now on the assumption that our television remote control may have features which go beyond the television? Or are we keeping sort of like a a design commitment to television features? I I don't know. Yep. Yeah, sure. Okay. Okay, yeah. Okay. Okay. Okay. Alright."}], "model-index": [{"name": "MEETING_SUMMARY", "results": [{"task": {"type": "abstractive-text-summarization", "name": "Abstractive Text Summarization"}, "dataset": {"name": "samsum", "type": "samsum"}, "metrics": [{"type": "rouge-1", "value": 53.8795, "name": "Validation ROGUE-1"}, {"type": "rouge-2", "value": 28.4975, "name": "Validation ROGUE-2"}, {"type": "rouge-L", "value": 44.1899, "name": "Validation ROGUE-L"}, {"type": "rouge-Lsum", "value": 49.4863, "name": "Validation ROGUE-Lsum"}, {"type": "gen-length", "value": 30.088, "name": "Validation ROGUE-Lsum"}, {"type": "rouge-1", "value": 53.2284, "name": "Test ROGUE-1"}, {"type": "rouge-2", "value": 28.184, "name": "Test ROGUE-2"}, {"type": "rouge-L", "value": 44.122, "name": "Test ROGUE-L"}, {"type": "rouge-Lsum", "value": 49.0301, "name": "Test ROGUE-Lsum"}, {"type": "gen-length", "value": 29.9951, "name": "Test ROGUE-Lsum"}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "bazzhangz/sumdataset", "type": "bazzhangz/sumdataset", "config": "bazzhangz--sumdataset", "split": "train"}, "metrics": [{"type": "rouge", "value": 40.5544, "name": "ROUGE-1", "verified": true}, {"type": "rouge", "value": 17.0751, "name": "ROUGE-2", "verified": true}, {"type": "rouge", "value": 32.153, "name": "ROUGE-L", "verified": true}, {"type": "rouge", "value": 36.4277, "name": "ROUGE-LSUM", "verified": true}, {"type": "loss", "value": 2.116729736328125, "name": "loss", "verified": true}, {"type": "gen_len", "value": 42.1978, "name": "gen_len", "verified": true}]}, {"task": {"type": "abstractive-text-summarization", "name": "Abstractive Text Summarization"}, "dataset": {"name": "xsum", "type": "xsum"}, "metrics": [{"type": "rouge-1", "value": 35.9078, "name": "Validation ROGUE-1"}, {"type": "rouge-2", "value": 14.2497, "name": "Validation ROGUE-2"}, {"type": "rouge-L", "value": 28.1421, "name": "Validation ROGUE-L"}, {"type": "rouge-Lsum", "value": 28.9826, "name": "Validation ROGUE-Lsum"}, {"type": "gen-length", "value": 32.0167, "name": "Validation ROGUE-Lsum"}, {"type": "rouge-1", "value": 36.0241, "name": "Test ROGUE-1"}, {"type": "rouge-2", "value": 14.3715, "name": "Test ROGUE-2"}, {"type": "rouge-L", "value": 28.1968, "name": "Test ROGUE-L"}, {"type": "rouge-Lsum", "value": 29.0527, "name": "Test ROGUE-Lsum"}, {"type": "gen-length", "value": 31.9933, "name": "Test ROGUE-Lsum"}]}, {"task": {"type": "abstractive-text-summarization", "name": "Abstractive Text 
Summarization"}, "dataset": {"name": "dialogsum", "type": "dialogsum"}, "metrics": [{"type": "rouge-1", "value": 39.8612, "name": "Validation ROGUE-1"}, {"type": "rouge-2", "value": 16.6917, "name": "Validation ROGUE-2"}, {"type": "rouge-L", "value": 32.2718, "name": "Validation ROGUE-L"}, {"type": "rouge-Lsum", "value": 35.8748, "name": "Validation ROGUE-Lsum"}, {"type": "gen-length", "value": 41.726, "name": "Validation ROGUE-Lsum"}, {"type": "rouge-1", "value": 36.9608, "name": "Test ROGUE-1"}, {"type": "rouge-2", "value": 14.3058, "name": "Test ROGUE-2"}, {"type": "rouge-L", "value": 29.3261, "name": "Test ROGUE-L"}, {"type": "rouge-Lsum", "value": 32.9, "name": "Test ROGUE-Lsum"}, {"type": "gen-length", "value": 43.086, "name": "Test ROGUE-Lsum"}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "samsum", "type": "samsum", "config": "samsum", "split": "test"}, "metrics": [{"type": "rouge", "value": 53.1878, "name": "ROUGE-1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOTVkNTczYjFmYzBmMzczNWE0MGY4MDAyZWExOGNjZmY1Yzk2ZGM1MGNjZmFmYWUyZmIxZjdjOTk4OTc4OGJlMSIsInZlcnNpb24iOjF9.yyzPpGtESuZXy_lBESrboGxdGYB7I6jaIjquCYqliE2xdbGf5awDFpDUwlZHDuw6RD2mIZv1FC8PPs9lOHuSAg"}, {"type": "rouge", "value": 28.1666, "name": "ROUGE-2", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjAzOTdjNGYxNWMzYmFjYjRmMTcxYzI0MmNlNmM5Nzg2MzBlNDdmZWFkN2EwMDE2ZTZmYzc0Zjg0ZDc0M2IxNiIsInZlcnNpb24iOjF9.cPH6O50T6HekO227Xzha-EN_Jp7JS9fh5EP9I0tHxbpGptKtZOQC-NG68zfU2eJKlRSrmgaBYs8tjfTvpAgyDg"}, {"type": "rouge", "value": 44.117, "name": "ROUGE-L", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNmNmMzJkYjMxMjhlZDM4YmU3NmI1MDExNzhiYmVhMzEyZGJjNDJkNzczNGQwOTMwNzg2YjU1ZWQ4MDhiMzkxYiIsInZlcnNpb24iOjF9.lcEXK15UqZOdXnPjVqIhFd6o_PLROSIONTRFX5NbwanjEI_MWMLpDh_V0Kpnvs_W0sE6cXh2yoifSYNDA5W7Bw"}, {"type": "rouge", "value": 49.0094, "name": "ROUGE-LSUM", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYThkYjk4ZjMzYjI0OTAxNDJiZTU5MzE0YjI5MjEzYTYwNWEzMmU5NjU2ZjQ5NzJhMzkyNmVhNWFjZmM1MjAwMSIsInZlcnNpb24iOjF9.LTn6LpKuMO4Rv4NgsbPmtr2ewiKyoqAXlf6YJfM_6GKwVTKpnJxwx7gaaAtMb0jVlgieITMP11JmbeRfMEhgDg"}, {"type": "loss", "value": 1.710614562034607, "name": "loss", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNjNjZmM0ZjkwYWYyMWIyMmFiMWI1ODBiYjRjNzVhM2JhN2NmNmM1ZDUwZWRjNDQxNzUwMWM4YjYxYTg1MWYwNyIsInZlcnNpb24iOjF9.hGXZhp9pe-HDJilXVvMCkqz-92YZvH6Qr7q9Z7fJkm8N9s0b4sl-4PwjQYJEOLEAhoRO2s-F5T3bmCYCaMiNBQ"}, {"type": "gen_len", "value": 29.9951, "name": "gen_len", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZmY1NzZiMDAzNGJlNTg4Nzc0YzU1MTA3YTI3MzVmNGZkNWQ0ZDE4MGZlNGI1MzJmYzA3MjQ0MDZhMTcyYTk2NCIsInZlcnNpb24iOjF9.8dvMfY7Y-nw-K8NGgTXIGFMxaSUWQYBE1w3N5YYOn4iwnCe2ugo2qPIOxLY91q7CaAOMCSskFV3BDStQ4p0ZCg"}]}]}]}, "description": "\nModel obtained by Fine Tuning 'facebook/bart-large-xsum' using AMI Meeting Corpus, SAMSUM Dataset, DIALOGSUM Dataset, XSUM Dataset!\n## Usage\n# Example 1\n```python\nfrom transformers import pipeline\nsummarizer = pipeline(\"summarization\", model=\"knkarthick/MEETING_SUMMARY\")\ntext = '''The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. 
During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct. \n'''\nsummarizer(text)\n```\n# Example 2\n```python\nfrom transformers import pipeline\nsummarizer = pipeline(\"summarization\", model=\"knkarthick/MEETING_SUMMARY\")\ntext = '''Bangalore is the capital and the largest city of the Indian state of Karnataka. It has a population of more than 8 million and a metropolitan population of around 11 million, making it the third most populous city and fifth most populous urban agglomeration in India. Located in southern India on the Deccan Plateau, at a height of over 900 m (3,000 ft) above sea level, Bangalore is known for its pleasant climate throughout the year. Its elevation is the highest among the major cities of India.The city's history dates back to around 890 CE, in a stone inscription found at the Nageshwara Temple in Begur, Bangalore. The Begur inscription is written in Halegannada (ancient Kannada), mentions 'Bengaluru Kalaga' (battle of Bengaluru). It was a significant turning point in the history of Bangalore as it bears the earliest reference to the name 'Bengaluru'. In 1537 CE, Kemp\u00e9 Gowd\u0101 \u2013 a feudal ruler under the Vijayanagara Empire \u2013 established a mud fort considered to be the foundation of modern Bangalore and its oldest areas, or petes, which exist to the present day.\nAfter the fall of Vijayanagar empire in 16th century, the Mughals sold Bangalore to Chikkadevaraja Wodeyar (1673\u20131704), the then ruler of the Kingdom of Mysore for three lakh rupees. When Haider Ali seized control of the Kingdom of Mysore, the administration of Bangalore passed into his hands. \nThe city was captured by the British East India Company after victory in the Fourth Anglo-Mysore War (1799), who returned administrative control of the city to the Maharaja of Mysore. The old city developed in the dominions of the Maharaja of Mysore and was made capital of the Princely State of Mysore, which existed as a nominally sovereign entity of the British Raj. In 1809, the British shifted their cantonment to Bangalore, outside the old city, and a town grew up around it, which was governed as part of British India. Following India's independence in 1947, Bangalore became the capital of Mysore State, and remained capital when the new Indian state of Karnataka was formed in 1956. The two urban settlements of Bangalore \u2013 city and cantonment \u2013 which had developed as independent entities merged into a single urban centre in 1949. The existing Kannada name, Bengal\u016bru, was declared the official name of the city in 2006.\nBangalore is widely regarded as the \"Silicon Valley of India\" (or \"IT capital of India\") because of its role as the nation's leading information technology (IT) exporter. Indian technological organisations are headquartered in the city. A demographically diverse city, Bangalore is the second fastest-growing major metropolis in India. 
Recent estimates of the metro economy of its urban area have ranked Bangalore either the fourth- or fifth-most productive metro area of India. As of 2017, Bangalore was home to 7,700 millionaires and 8 billionaires with a total wealth of $320 billion. It is home to many educational and research institutions. Numerous state-owned aerospace and defence organisations are located in the city. The city also houses the Kannada film industry. It was ranked the most liveable Indian city with a population of over a million under the Ease of Living Index 2020.\n'''\nsummarizer(text)\n```\n\n# Example 3\n```python\nfrom transformers import pipeline\nsummarizer = pipeline(\"summarization\", model=\"knkarthick/MEETING_SUMMARY\")\ntext = '''Hi, I'm David and I'm supposed to be an industrial designer. Um, I just got the project announcement about what the project is. Designing a remote control. That's about it, didn't get anything else. Did you get the same thing? Cool. There's too much gear. Okay. Can't draw. Um. Yeah. Um, well anyway, I don't know, it's just the first animal I can think off the top of my head. Um. Yes. Big reason is 'cause I'm allergic to most animals. Allergic to animal fur, so um fish was a natural choice. Um, yeah, and I kind of like whales. They come in and go eat everything in sight. And they're quite harmless and mild and interesting. Tail's a bit big, I think. It's an after dinner dog then. Hmm. It does make sense from maybe the design point of view 'cause you have more complicated characters like European languages, then you need more buttons. So, possibly. Hmm. Yeah. And you keep losing them. Finding them is really a pain, you know. I mean it's usually quite small, or when you want it right, it slipped behind the couch or it's kicked under the table. You know. Yep. Mm-hmm. I think one factor would be production cost. Because there's a cap there, so um depends on how much you can cram into that price. Um. I think that that's the main factor. Cool.\nOkay. Right. Um well this is the kick-off meeting for our our project. Um and um this is just what we're gonna be doing over the next twenty five minutes. Um so first of all, just to kind of make sure that we all know each other, I'm Laura and I'm the project manager. Do you want to introduce yourself again? Okay. Great. Okay. Um so we're designing a new remote control and um Oh I have to record who's here actually. So that's David, Andrew and Craig, isn't it? And you all arrived on time. Um yeah so des uh design a new remote control. Um, as you can see it's supposed to be original, trendy and user friendly. Um so that's kind of our our brief, as it were. Um and so there are three different stages to the design. Um I'm not really sure what what you guys have already received um in your emails. What did you get? Mm-hmm. Is that what everybody got? Okay. Um. So we're gonna have like individual work and then a meeting about it. And repeat that process three times. Um and at this point we get try out the whiteboard over there. Um. So uh you get to draw your favourite animal and sum up your favourite characteristics of it. So who would like to go first? Very good. Mm-hmm. Yeah. Yeah. Right. Lovely. Right. You can take as long over this as you like, because we haven't got an awful lot to discuss. Ok oh we do we do. Don't feel like you're in a rush, anyway. Ach why not We might have to get you up again then. I don't know what mine is. I'm gonna have to think on the spot now. Is that a whale? Ah. Okay. 
God, I still don't know what I'm gonna write about. Um. I was gonna choose a dog as well. But I'll just draw a different kind of dog. M my favourite animal is my own dog at home. Um That doesn't really look like him, actually. He looks more like a pig, actually. Ah well. Do you? Oh that's very good of you. Uh. Um he's a mixture of uh various things. Um and what do I like about him, um That's just to suggest that his tail wags. Um he's very friendly and cheery and always pleased to see you, and very kind of affectionate and um uh and he's quite quite wee as well so you know he can doesn't take up too much space. Um and uh And he does a funny thing where he chases his tail as well, which is quite amusing, so It is. I think it is. He only does it after he's had his dinner and um he'll just all of a sudden just get up and start chasing his tail 'round the living room. Yeah, so uh Yeah, maybe. Maybe. Right, um where did you find this? Just down here? Yeah. Okay. Um what are we doing next? Uh um. Okay, uh we now need to discuss the project finance. Um so according to the brief um we're gonna be selling this remote control for twenty five Euro, um and we're aiming to make fifty million Euro. Um so we're gonna be selling this on an international scale. And uh we don't want it to cost any more than uh twelve fifty Euros, so fifty percent of the selling price. Sure. All together. Um I dunno. I imagine That's a good question. I imagine it probably is our sale actually because it's probably up to the the um the retailer to uh sell it for whatever price they want. Um. But I I don't know, I mean do you think the fact that it's going to be sold internationally will have a bearing on how we design it at all? Think it will? Um. Hmm. Oh yeah, regions and stuff, yeah. Yeah. Okay. Yeah. Well for a remote control, do you think that will be I suppose it's depends on how complicated our remote control is. Yeah, yeah. Okay. What, just like in terms of like the wealth of the country? Like how much money people have to spend on things like? Aye, I see what you mean, yeah. Marketing. Good marketing thoughts. Oh gosh, I should be writing all this down. Um. Mm. Yeah. Yeah, yeah. Like how much does, you know, a remote control cost. Well twenty five Euro, I mean that's um that's about like eighteen pounds or something, isn't it? Or no, is it as much as that? Sixteen seventeen eighteen pounds. Um, I dunno, I've never bought a remote control, so I don't know how how good a remote control that would get you. Um. But yeah, I suppose it has to look kind of cool and gimmicky. Um right, okay. Let me just scoot on ahead here. Okay. Um well d Does anybody have anything to add to uh to the finance issue at all? Thin No, actually. That would be useful, though, wouldn't it, if you knew like what your money would get you now. Mm-hmm. Yeah, yeah. Oh. Five minutes to end of meeting. Oh, okay. We're a bit behind. Yeah. Right, so do you think that should be like a main design aim of our remote control d you know, do your your satellite and your regular telly and your V_C_R_ and everything? Mm-hmm. Yeah. Or even like, you know, notes about um what you wanna watch. Like you might put in there oh I want to watch such and such and look a Oh that's a good idea. So extra functionalities. Mm-hmm. Hmm. Um okay, uh I'd wel we're gonna have to wrap up pretty quickly in the next couple of minutes. Um I'll just check we've nothing else. Okay. 
Um so anything else anybody wants to add about what they don't like about remote controls they've used, what they would really like to be part of this new one at all? You keep losing them. Okay. Yeah. W You get those ones where you can, if you like, whistle or make a really high pitched noise they beep. There I mean is that something we'd want to include, do you think? Dunno. Okay maybe. My goodness. Still feels quite primitive. Maybe like a touch screen or something? Okay. Uh-huh, okay. Well I guess that's up to our industrial designer. It looks better. Yeah. Okay. Okay. Right, well um so just to wrap up, the next meeting's gonna be in thirty minutes. So that's about um about ten to twelve by my watch. Um so inbetween now and then, um as the industrial designer, you're gonna be working on you know the actual working design of it so y you know what you're doing there. Um for user interface, technical functions, I guess that's you know like what we've been talking about, what it'll actually do. Um and uh marketing executive, you'll be just thinking about what it actually what, you know, what requirements it has to has to fulfil and you'll all get instructions emailed to you, I guess. Um. Yeah, so it's th the functional design stage is next, I guess. And uh and that's the end of the meeting. So I got that little message a lot sooner than I thought I would, so Mm-hmm. Uh-huh, yeah. Th Okay, well just very quickly 'cause this we're supposed to finish now. Um I guess that's up to us, I mean you probably want some kind of unique selling point of it, so um, you know Yeah. Mm-hmm. Yeah. Okay. Right, okay, we'll that's that's the end of the meeting, then. Um. So, uh thank you all for coming.\nUm I'm Craig and I'm User Interface. Yeah. Well, my favourite animal would be a monkey. Then they're small cute and furry, and uh when planet of the apes becomes real, I'm gonna be up there with them. Yeah. I know um My parents went out and bought um remote controls because um they got fed up of having four or five different remote controls for each things the house. So um for them it was just how many devices control. Uh.\nMm-hmm. Great. And I'm Andrew and I'm uh our marketing expert. Mm-hmm. Mm-hmm. Yeah, that's that's it. Yeah. I will go. That's fine. Alright. So This one here, right? Okay. Very nice. Alright. My favourite animal is like A beagle. Um charac favourite characteristics of it? Is that right? Uh, right, well basically um high priority for any animal for me is that they be willing to take a lot of physical affection from their family. And, yeah that they have lots of personality and uh be fit and in robust good health. So this is blue. Blue beagle. My family's beagle. I coulda told you a whole lot more about beagles. Boy, let me tell you. Impressionist. Alright. Mm. Superb sketch, by the way. Yep. I see a dog in there. Yep. Now I see a rooster. What kind is it? Is he aware that th it's his own cha tail he's chasing? Hmm. Probably when he was little he got lots of attention for doing it and has forever been conditioned. 'Kay. Um, can we just go over that again? Uh, so bas at twel Alright, yeah. Okay. So cost like production cost is twelve fifty, but selling price is is that wholesale or retail? Like on the shelf. Our sale our sale anyway. Yeah, okay okay. Okay. Mm-hmm. Alright. Yes. Mm-hmm. Mm-hmm. Well right away I'm wondering if there's um th th uh, like with D_V_D_ players, if there are zones. 
Um f frequencies or something um as well as uh characters, um different uh keypad styles and s symbols. Um. I don't know. Yeah. Yeah. Yeah. And then a and then al the other thing international is on top of the price. I'm thinking the price might might appeal to a certain market in one region, whereas in another it'll be different, so Just a chara just a characteristic of the Just Or just like, basic product podi positioning, the twenty five Euro remote control might be a big hit in London, might not be such a big hit in Greece, who knows, something like that, yeah. Yep. Right away I'm making some kind of assumptions about what what information we're given here, thinking, 'kay trendy probably means something other than just basic, something other than just standard. Um so I'm wondering right away, is selling twenty five Euros, is that sort of the thi is this gonna to be like the premium product kinda thing or Uh-huh. Mm-hmm. Yep. Yeah, I'd say so, yeah. No. Yeah, yeah. Mm-hmm. Do we have any other background information on like how that compares to other other Yeah. Mm-hmm. Yeah, interesting thing about discussing um production of a remote control for me is that l as you point out, I just don't think of remote controls as somethin something people consciously assess in their purchasing habits. It's just like getting shoelaces with shoes or something. It just comes along. Do you know what I mean? Like so sort of like how do you I I mean one one way of looking at it would be, well the people producing television sets, maybe they have to buy remote controls. Or another way is maybe people who have T_V_ sets are really fed up with their remote control and they really want a better one or something. But Right. Right. Okay so Right, so in function one of the priorities might be to combine as many uses I think so. Yeah, yeah. Yeah. Well like um, maybe what we could use is a sort of like a example of a successful other piece technology is palm palm pilots. They're gone from being just like little sort of scribble boards to cameras, M_P_ three players, telephones, everything, agenda. So, like, I wonder if we might add something new to the to the remote control market, such as the lighting in your house, or um Yeah, yeah. An Yeah. Like, p personally for me, at home I've I've combined the um the audio video of my television set and my D_V_D_ player and my C_D_ player. So they w all work actually function together but I have different remote controls for each of them. So it's sort of ironic that that then they're in there um you know, the sound and everything it's just one system. But each one's got its own little part. Mm. Mm. Mm. Mm-hmm. Mm-hmm. Yeah. Yeah. That's just really good id Yep. Uh, sure. I remember when the first remote control my my family had was on a cable. Actually had a cable between it and the T_V_ and big like buttons that sort of like, like on a blender or something. And um, you know, when I think about what they are now, it's better, but actually it's still kind of, I dunno, like a massive junky thing on the table. Maybe we could think about how, could be more, you know, streamlined. S Something like that, yeah. Or whatever would be technologically reasonable. 'Cause it could b it could it could be that f it could be that functionally that doesn't make it any better, but that just the appeal of of not having You know, these days there's a r pe things in people's homes are becoming more and more like chic, you know. Um, nicer materials and might be be worth exploring anyway. Okay. 
Um. Before we wrap up, just to make sure we're all on the same page here, um, do we We were given sort of an example of a coffee machine or something, right? Well, um are we at ma right now on the assumption that our television remote control may have features which go beyond the television? Or are we keeping sort of like a a design commitment to television features? I I don't know. Yep. Yeah, sure. Okay. Okay, yeah. Okay. Okay. Okay. Alright.\n'''\nsummarizer(text)\n```\n\n# Example 4\n```python\nfrom transformers import pipeline\nsummarizer = pipeline(\"summarization\", model=\"knkarthick/MEETING_SUMMARY\")\ntext = '''\nDas : Hi and welcome to the a16z podcast. I\u2019m Das, and in this episode, I talk SaaS go-to-market with David Ulevitch and our newest enterprise general partner Kristina Shen. The first half of the podcast looks at how remote work impacts the SaaS go-to-market and what the smartest founders are doing to survive the current crisis. The second half covers pricing approaches and strategy, including how to think about free versus paid trials and navigating the transition to larger accounts. But we start with why it\u2019s easier to move upmarket than down\u2026 and the advantage that gives a SaaS startup against incumbents.\nDavid : If you have a cohort of customers that are paying you $10,000 a year for your product, you\u2019re going to find a customer that self-selects and is willing to pay $100,000 a year. Once you get one of those, your organization will figure out how you sell to, how you satisfy and support, customers at that price point and that size. But it\u2019s really hard for a company that sells up market to move down market, because they\u2019ve already baked in all that expensive, heavy lifting sales motion. And so as you go down market with a lower price point, usually, you can\u2019t actually support it.\nDas : Does that mean that it\u2019s easier for a company to do this go-to-market if they\u2019re a new startup as opposed to if they\u2019re a pre-existing SaaS?\nKristina : It\u2019s culturally very, very hard to give a product away for free that you\u2019re already charging for. It feels like you\u2019re eating away at your own potential revenue when you do it. So most people who try it end up pulling back very quickly.\nDavid : This is actually one of the key reasons why the bottoms up SaaS motion is just so competitive, and compelling, and so destructive against the traditional sales-driven test motion. If you have that great product and people are choosing to use it, it\u2019s very hard for somebody with a sales-driven motion, and all the cost that\u2019s loaded into that, to be able to compete against it. There are so many markets where initially, we would look at companies and say, \u201cOh, well, this couldn\u2019t possibly be bottoms up. It has to be sold to the CIO. It has to be sold to the CSO or the CFO.\u201d But in almost every case we\u2019ve been wrong, and there has been a bottoms up motion. The canonical example is Slack. It\u2019s crazy that Slack is a bottoms up company, because you\u2019re talking about corporate messaging, and how could you ever have a messaging solution that only a few people might be using, that only a team might be using? But now it\u2019s just, \u201cOh, yeah, some people started using it, and then more people started using it, and then everyone had Slack.\u201d\nKristina : I think another classic example is Dropbox versus Box. Both started as bottoms up businesses, try before you buy. 
But Box quickly found, \u201cHey, I\u2019d rather sell to IT.\u201d And Dropbox said, \u201cHey, we\u2019ve got a great freemium motion going.\u201d And they catalyzed their business around referrals and giving away free storage and shared storage in a way that really helped drive their bottoms up business.\nDas : It\u2019s a big leap to go from selling to smaller customers to larger customers. How have you seen SaaS companies know or get the timing right on that? Especially since it does seem like that\u2019s really related to scaling your sales force?\nKristina : Don\u2019t try to go from a 100-person company to a 20,000-person company. Start targeting early adopters, maybe they\u2019re late stage pre-IPO companies, then newly IPO\u2019d companies. Starting in tech tends to be a little bit easier because they tend to be early adopters. Going vertical by vertical can be a great strategy as well. Targeting one customer who might be branded in that space, can help brand yourself in that category. And then all their competitors will also want your product if you do a good job. A lot of times people will dedicate a sales rep to each vertical, so that they become really, really knowledgeable in that space, and also build their own brand and reputation and know who are the right customers to target.\nDas : So right now, you\u2019ve got a lot more people working remote. Does this move to remote work mean that on-premise software is dying? And is it accelerating the move to software as a service?\nKristina : This remote work and working from home is only going to catalyze more of the conversion from on-premise over to cloud and SaaS. In general, software spend declines 20% during an economic downturn. This happened in \u201908, this happened in \u201901. But when we look at the last downturn in \u201908, SaaS spend actually, for public companies, increased, on average, 10%, which means there\u2019s a 30% spread, which really shows us that there was a huge catalyst from people moving on-premise to SaaS.\nDavid : And as people work remote, the ability to use SaaS tools is much easier than having to VPN back into your corporate network. We\u2019ve been seeing that, inside sales teams have been doing larger and larger deals, essentially moving up market on the inside, without having to engage with field sales teams. In fact, a lot of the new SaaS companies today rather than building out a field team, they have a hybrid team, where people are working and closing deals on the inside and if they had to go out and meet with a customer, they would do that. But by and large, most of it was happening over the phone, over email, and over videoconferencing. And all the deals now, by definition, are gonna be done remote because people can\u2019t go visit their customers in person.\nDas : So with bottoms up, did user behavior and buyer behavior change, so the go-to-market evolved? Or did the go-to-market evolve and then you saw user and buyer behavior change? I\u2019m curious with this move to remote work. Is that going to trigger more changes or has the go-to-market enabled that change in user behavior, even though we see that change coming because of a lot of forces outside of the market?\nKristina : I definitely think they are interrelated. But I do think it was a user change that catalyzed everything. We decided that we preferred better software, and we tried a couple products. We were able to purchase off our credit card. 
And then IT and procurement eventually said, \u201cWow, everyone\u2019s buying these already, I might as well get a company license and a company deal so I\u2019m not paying as much.\u201d While obviously software vendors had to offer the products that could be self-served, users started to realize they had the power, they wanted to use better software, they paid with their credit cards. And now software vendors are forced to change their go-to-market to actually suit that use case.\nDas : If that\u2019s the case that when user behavior has changed, it\u2019s tended to be the catalyzing force of bigger changes in the go-to-market, what are some of the changes you foresee for SaaS because the world has changed to this new reality of remote work and more distributed teams?\nDavid : We\u2019re in a very uncertain economic environment right now. And a couple of things will become very clear over the next 3 to 9 to 15 months \u2014 you\u2019re going to find out which SaaS products are absolutely essential to helping a business operate and run, and which ones were just nice to have and may not get renewed. I think on the customer, buying side, you\u2019re very likely to see people push back on big annual commitments and prefer to go month-to-month where they can. Or you\u2019ll see more incentives from SaaS startups to offer discounts for annual contracts. You\u2019re going to see people that might sign an annual contract, but they may not want to pay upfront. They may prefer to meter the cash out ratably over the term of the contract. And as companies had empowered and allowed budget authority to be pushed down in organizations, you\u2019re gonna see that budget authority get pulled back, more scrutiny on spending, and likely a lot of SaaS products not get renewed that turned out to not be essential.\nKristina : I think the smartest founders are making sure they have the runway to continue to exist. And they\u2019re doing that in a couple of ways. They\u2019re preserving cash, and they are making sure that their existing customers are super, super happy, because retaining your customers is so important in this environment. And they\u2019re making sure that they have efficient or profitable customer acquisition. Don\u2019t spend valuable dollars acquiring customers. But acquire customers efficiently that will add to a great existing customer base.\nDas : To go into pricing and packaging for SaaS for a moment, what are some of the different pricing approaches that you see SaaS companies taking?\nKristina : The old school way of doing SaaS go-to-market is bundle everything together, make the pricing super complex, so you don\u2019t actually understand what you\u2019re paying for. You\u2019re forced to purchase it because you need one component of the product. New modern SaaS pricing is keep it simple, keep it tied to value, and make sure you\u2019re solving one thing really, really well.\nDavid : You want to make it easy for your customers to give you money. And if your customers don\u2019t understand your pricing, that\u2019s a huge red flag. Sometimes founders will try to over engineer their pricing model.\nKristina : We talk a lot about everything has to be 10X better than the alternatives. But it\u2019s much easier to be 10X better when you solve one thing very, very well, and then have simple pricing around it. I think the most common that most people know about is PEPM or per employee per month, where you\u2019re charging basically for every single seat. 
Another really common model is the freemium model. So, think about a Dropbox, or an Asana, or a Skype, where it\u2019s trigger based. You try the product for free, but when you hit a certain amount of storage, or a certain amount of users, then it converts over to paid. And then you also have a time trial, where you get the full experience of the product for some limited time period. And then you\u2019re asked if you want to continue using the product to pay. And then there\u2019s pay as go, and particularly, pay as you go as a usage model. So, Slack will say, \u201cHey, if your users aren\u2019t actually using the product this month, we won\u2019t actually charge you for it.\u201d\nDavid : The example that Kristina made about Slack and users, everybody understands what a user is, and if they\u2019re using the product, they pay for it, and if they\u2019re not using it, they don\u2019t pay for it. That\u2019s a very friendly way to make it easy for your customers to give you money. If Slack came up with a pricing model that was like based on number of messages, or number of API integration calls, the customer would have no idea what that means.\nKristina : There\u2019s also the consumption model. So Twilio only charges you for every SMS text or phone call that you make on the platform any given month. And so they make money or lose money as your usage goes. The pricing is very aligned to your productivity.\nDavid : Generally, those are for products where the usage only goes in one direction. If you think of a company like Databricks, where they\u2019re charging for storage, or Amazon\u2019s S3 service, it is very aligned with the customer, but it also strategically aligns with the business because they know the switching cost is very high, the churn is very low. And generally, in those businesses, you\u2019re only going to store more data, so they can charge based on usage or volume of data.\nKristina : Recently, there\u2019s been a huge trend of payment as a revenue. It\u2019s particularly common in vertical markets where SaaS companies are adding payments as a revenue in addition to their employee or subscription revenue. If you look at Shopify, for example, more than 50% of their revenue is actually payment revenue. They\u2019re making money every single time you purchase something off one of their shopping cart websites.\nDas : When you\u2019re working with a founder or a SaaS startup, how have you seen them find the right pricing model for their product, for their market?\nKristina : Step one is just talk to a lot of customers. Try to figure out what is the market pricing for possible alternatives or competitors, understand their pain points and their willingness to pay. And just throw a price out there, because you have to have a starting point in order to actually test and iterate. Particularly in the SMB, or the bottoms up business, you can test and iterate pretty quickly because you have so many data points.\nDavid : I always tell founders, step one is to just go out there and talk to customers. Step two is just double your prices. I don\u2019t think there\u2019s ever been a great company with a great product that\u2019s fallen apart because their pricing was wrong. But a lot of SaaS startup founders really under price, and you don\u2019t want to find out two or three years later that you were 200% underpriced. A very common thing that SaaS companies do, they\u2019ll have the basic package that either is free or low cost, that you can just sign up online for. 
They\u2019ll have a middle package where they share some pricing, and then they\u2019ll have the enterprise package where you have to contact sales to find out more. And that way they don\u2019t actually have to show the pricing for that third package. And that gives the salespeople the flexibility to adjust pricing on a per deal basis.\nDas : When you\u2019re working with companies, why are they underpricing their products?\nDavid : I think it\u2019s psychological. People need to price on value, and they don\u2019t know how much value they\u2019re delivering relative to \u201cOh, it only cost me $100 a month to provide this service, so I just need to charge $200.\u201d But if it turns out you\u2019re saving your customer $50,000 a year, then you\u2019re wildly underpriced. You have to remember that SaaS is essentially a proxy for outsourced IT. You\u2019re spending money on a SaaS service to not pay to develop something internally, or to have to pay IT to support something that\u2019s more complex on-prem. Software is much cheaper than people, and so generally, the price point can be much higher.\nKristina : And the other thing is your value increases over time. You\u2019re delivering more features, more products, you understand the customer better. It\u2019s the beauty of the SaaS model and cloud model that you can iterate and push code immediately, and the customer immediately sees value. A lot of times people have the same price point from the first customer sold to three years later and the 200th customer. Quite frankly, you\u2019ve delivered so much value along the way that your price point should have gone up. The other thing I\u2019ll say is a lot of people discount per seat pricing a lot as they move up market. We tend to tell people that the best validation of your product having great product market fit is your ability to hold your price point. So while there is some natural discounting on a per seat basis because people do deserve some volume discounting, I would say try to resist that as much as possible.\nDas : Especially for a technical founder, it\u2019s so tempting to get in there and fiddle with these knobs. How do you know when it is time to experiment with your pricing and packaging?\nDavid : If you\u2019re looking at your business and you see that you are doing more deals, and they\u2019re closing faster, you should raise your pricing. And you pay attention to how long it takes to close deals and whether the number of deals is staying consistent as you do that. And, at some point, you\u2019re going to find out when you\u2019re losing deals on price. I think a moment where companies have to plan ahead to avoid having to course correct is after they roll out massive pricing and packaging changes, which are pretty natural as companies move up market. But how they navigate that transition to larger accounts, and how they either bring along or move away from those smaller, earlier customers who got them to where they are, tends to be really important because they can get a lot of noise on Twitter, they can get a lot of blowback from their customers. So Zendesk is a company where they rolled out a major packaging change. And when they rolled it out, they hadn\u2019t planned on grandfathering in their early customers. They got a lot of pushback, and very quickly, they put out a blog post and said, \u201cWe hear what you\u2019re saying, we appreciate you building the business that we\u2019ve become today. We do need to have a package for the future. 
But all the people that have been customers so far will be grandfathered in for at least a period of time into the old model.\u201d\nKristina : If you iterate pricing constantly, you don\u2019t really have this problem because your customers will be used to pricing changes. You normally pair them with new features, and it all kind of works out. But if you have to go through a big grandfather change, I tend to lean towards treating your early customers really, really well. They adopted when you weren\u2019t a big company yet. They probably co-built the product with you in many ways. And so, it\u2019s great to get more dollars out of your customer base, but treat your early customers well.\nDas : Are there any other failure modes that you see startups really falling into around pricing and packaging or any common mistakes that they make?\nDavid : I think a lot of founders don\u2019t always map out the cost or model of their pricing and their product relative to their cost of actually doing sales and marketing and customer acquisition.\nKristina : Inside sales is so popular in Silicon Valley. When you\u2019re selling more to an SMB or mid-market type customer, the expectation is that you\u2019re educating and helping the prospective customer over the phone. And so, you\u2019re not expected to be as high touch. But 5K is almost the minimum price point you need to sell to the SMB with an inside sales team in order to pay for the outbound costs and all the conversions, because there is typically a team that sits around the quota carrying rep. And so, price matching \u2014 how much your price point is compared to what your go-to-market motion is \u2014 matters a lot. Other big failure modes that I see, people guess the ramp time of a sales rep wrong. And ramp time really ties to the segment of customer you\u2019re selling into. It tends be that if you\u2019re selling into the enterprise, the ramp time for sales reps, because sales cycles are so long, tend to be much longer as well. They could be six months plus, could be a year. While if you\u2019re selling more into SMB or mid-market, the ramp time to get a rep up and running can be much shorter, three to six months. Because the sales cycles are shorter, they just iterate much faster, and they ramp up much more quickly.\nDavid : The other thing that people have to understand is that sales velocity is a really important component to figuring out how many reps you should be hiring, whether they should be inside reps or field reps. If it takes you 90 days to close a deal, that can\u2019t be a $5,000 a year deal, that has to be a $50,000 or even $150,000 a year deal.\nDas : Kristina, I know you\u2019ve done a lot of work with metrics. So how do those play in?\nKristina : Probably the one way to sum it all together is how many months does it take to pay back customer acquisition cost. Very commonly within the SaaS world, we talk about a 12-month CAC payback. We typically want to see for every dollar you spend on sales and marketing, you get a dollar back within a year. That means you can tweak the inputs any way you want. Let\u2019s say that doing paid acquisition is really effective for you. Then, you can spend proportionally more on paid acquisition and less on sales reps. Vice versa, if you have a great inbound engine, you actually can hire a lot more sales reps and spend more on sales headcount. 
With all formulas, it\u2019s a guide rail, so if you have customers that retain really, really well, let\u2019s say you\u2019re selling to the enterprise, and you\u2019ve got a 90% or 95% annual retention rate, then your CAC payback could be between 12 and 24 months. But let\u2019s say you\u2019re selling to the SMB and churn is 2% or 3% monthly, which ends up being like 80% to 90% annual retention. Then, because your customer is less sticky, I would recommend looking at a CAC payback of 6 to 12 months.\nDas : How should you think about doing a free trial versus a paid trial?\nDavid : On the one hand, the bottoms up motion where people can try essentially a full version of a product before they buy it is extremely powerful. On the other hand, I\u2019ve started to try to think about how I advise companies, when they are thinking about a free trial for something that might cost $100,000 or $200,000 a year? Do we do a paid pilot that has some sort of contractual obligation that if we meet then turns into a commercial engagement?\nKristina : I do think the beauty of the bottoms up business is that you can get people to try the entire experience of the product for free, and they fall in love with it, and a certain percentage will convert. And that works really, really well for products that can self-serve. When you start moving up market to more complex products, the challenge with trials is it takes work to actually implement the product, whether it be integrations, IT has to give access, etc. You lose that self-serve ability, which is so amazing in the trial. And so, I tend to be more in the camp of paid trials, if it costs you money to actually deploy the trial. And when you\u2019re selling to bigger customers, they associate value when they have to pay. Once a customer has to pay you, then they feel a need to make the project successful and thus they will onboard, schedule things, give you data and access.\nDavid : If you can get to a point where you get the customer to do that paid pilot, such that the only difference between a pilot and an actual customer is just the signing of a contract, that\u2019s very powerful. Now, that does force you to have a really good pre-sales motion to make sure that you can deliver on the promise you\u2019ve made your customers. When companies don\u2019t have a great product, and they paper over it with professional services and sales engineering and post-sales support, that paid pilot thing doesn\u2019t work because the experience isn\u2019t good enough. So, it really is incumbent on the SaaS company that does a paid pilot to make sure that they are able to deliver on that experience.\nKristina : And one emerging trend recently is people signing an annual contract with a one or three month out, as a replacement to the paid pilot. Because it\u2019s the best of both worlds, the SaaS company that\u2019s selling the product gets a higher level of commitment. And the customer gets the optionality of opting out in the same way as a trial without any clawback. It really comes down to where procurement falls. Sometimes procurement is at the beginning of that decision, which makes it more like an annual contract. Sometimes procurement is at the one or three month opt-out period, which means the customer already has a great experience, loves the product, and it is an easier way to convert procurements to actually sign on\u2026\nDavid : And that is a really good segue into renewals. 
I always tell founders, you might have this subscription business, but it\u2019s not a recurring revenue business until the second year when the revenue actually recurs. I think you really have the first three months to get a customer up and running and happy. And if they\u2019re not, you then have about three months to fix it. And if all that works out, then the remaining six months of the contract can be focused on upsell and expansion.\nDas : Awesome. Thank you, Kristina. Thank you, David.\nKristina : Thanks so much for having us. This was fun.\nDavid : Yeah, a lot of fun, great topics, and our favorite thing to talk about.\n'''\nsummarizer(text)\n```"} {"downloads": 4777, "id": "pszemraj/long-t5-tglobal-base-16384-book-summary", "likes": 63, "pipeline_tag": "summarization", "task": "summarization", "meta": {"tags": ["summarization", "summary", "booksum", "long-document", "long-form"], "license": ["apache-2.0", "bsd-3-clause"], "datasets": ["kmfoda/booksum"], "metrics": ["rouge"], "widget": [{"text": "large earthquakes along a given fault segment do not occur at random intervals because it takes time to accumulate the strain energy for the rupture. The rates at which tectonic plates move and accumulate strain at their boundaries are approximately uniform. Therefore, in first approximation, one may expect that large ruptures of the same fault segment will occur at approximately constant time intervals. If subsequent main shocks have different amounts of slip across the fault, then the recurrence time may vary, and the basic idea of periodic mainshocks must be modified. For great plate boundary ruptures the length and slip often vary by a factor of 2. Along the southern segment of the San Andreas fault the recurrence interval is 145 years with variations of several decades. The smaller the standard deviation of the average recurrence interval, the more specific could be the long term prediction of a future mainshock.", "example_title": "earthquakes"}, {"text": " A typical feed-forward neural field algorithm. Spatiotemporal coordinates are fed into a neural network that predicts values in the reconstructed domain. Then, this domain is mapped to the sensor domain where sensor measurements are available as supervision. Class and Section Problems Addressed Generalization (Section 2) Inverse problems, ill-posed problems, editability; symmetries. Hybrid Representations (Section 3) Computation & memory efficiency, representation capacity, editability: Forward Maps (Section 4) Inverse problems Network Architecture (Section 5) Spectral bias, integration & derivatives. Manipulating Neural Fields (Section 6) Edit ability, constraints, regularization. Table 2: The five classes of techniques in the neural field toolbox each addresses problems that arise in learning, inference, and control. (Section 3). We can supervise reconstruction via differentiable forward maps that transform Or project our domain (e.g, 3D reconstruction via 2D images; Section 4) With appropriate network architecture choices, we can overcome neural network spectral biases (blurriness) and efficiently compute derivatives and integrals (Section 5). Finally, we can manipulate neural fields to add constraints and regularizations, and to achieve editable representations (Section 6). 
Collectively, these classes constitute a 'toolbox' of techniques to help solve problems with neural fields There are three components in a conditional neural field: (1) An encoder or inference function \u20ac that outputs the conditioning latent variable 2 given an observation 0 E(0) =2. 2 is typically a low-dimensional vector, and is often referred to aS a latent code Or feature code_ (2) A mapping function 4 between Z and neural field parameters O: Y(z) = O; (3) The neural field itself $. The encoder \u20ac finds the most probable z given the observations O: argmaxz P(2/0). The decoder maximizes the inverse conditional probability to find the most probable 0 given Z: arg- max P(Olz). We discuss different encoding schemes with different optimality guarantees (Section 2.1.1), both global and local conditioning (Section 2.1.2), and different mapping functions Y (Section 2.1.3) 2. Generalization Suppose we wish to estimate a plausible 3D surface shape given a partial or noisy point cloud. We need a suitable prior over the sur- face in its reconstruction domain to generalize to the partial observations. A neural network expresses a prior via the function space of its architecture and parameters 0, and generalization is influenced by the inductive bias of this function space (Section 5).", "example_title": "scientific paper"}, {"text": "Is a else or outside the cob and tree written being of early client rope and you have is for good reasons. On to the ocean in Orange for time. By's the aggregate we can bed it yet. Why this please pick up on a sort is do and also M Getoi's nerocos and do rain become you to let so is his brother is made in use and Mjulia's's the lay major is aging Masastup coin present sea only of Oosii rooms set to you We do er do we easy this private oliiishs lonthen might be okay. Good afternoon everybody. Welcome to this lecture of Computational Statistics. As you can see, I'm not socially my name is Michael Zelinger. I'm one of the task for this class and you might have already seen me in the first lecture where I made a quick appearance. I'm also going to give the tortillas in the last third of this course. So to give you a little bit about me, I'm a old student here with better Bulman and my research centres on casual inference applied to biomedical disasters, so that could be genomics or that could be hospital data. If any of you is interested in writing a bachelor thesis, a semester paper may be mastathesis about this topic feel for reach out to me. you have my name on models and my email address you can find in the directory I'd Be very happy to talk about it. you do not need to be sure about it, we can just have a chat. So with that said, let's get on with the lecture. There's an exciting topic today I'm going to start by sharing some slides with you and later on during the lecture we'll move to the paper. So bear with me for a few seconds. Well, the projector is starting up. Okay, so let's get started. Today's topic is a very important one. It's about a technique which really forms one of the fundamentals of data science, machine learning, and any sort of modern statistics. It's called cross validation. I know you really want to understand this topic I Want you to understand this and frankly, nobody's gonna leave Professor Mineshousen's class without understanding cross validation. So to set the stage for this, I Want to introduce you to the validation problem in computational statistics. So the problem is the following: You trained a model on available data. 
You fitted your model, but you know the training data you got could always have been different and some data from the environment. Maybe it's a random process. You do not really know what it is, but you know that somebody else who gets a different batch of data from the same environment they would get slightly different training data and you do not care that your method performs as well. On this training data. you want to to perform well on other data that you have not seen other data from the same environment. So in other words, the validation problem is you want to quantify the performance of your model on data that you have not seen. So how is this even possible? How could you possibly measure the performance on data that you do not know The solution to? This is the following realization is that given that you have a bunch of data, you were in charge. You get to control how much that your model sees. It works in the following way: You can hide data firms model. Let's say you have a training data set which is a bunch of doubtless so X eyes are the features those are typically hide and national vector. It's got more than one dimension for sure. And the why why eyes. Those are the labels for supervised learning. As you've seen before, it's the same set up as we have in regression. And so you have this training data and now you choose that you only use some of those data to fit your model. You're not going to use everything, you only use some of it the other part you hide from your model. And then you can use this hidden data to do validation from the point of you of your model. This hidden data is complete by unseen. In other words, we solve our problem of validation.", "example_title": "transcribed audio - lecture"}, {"text": "Transformer-based models have shown to be very useful for many NLP tasks. However, a major limitation of transformers-based models is its O(n^2)O(n 2) time & memory complexity (where nn is sequence length). Hence, it's computationally very expensive to apply transformer-based models on long sequences n > 512n>512. Several recent papers, e.g. Longformer, Performer, Reformer, Clustered attention try to remedy this problem by approximating the full attention matrix. You can checkout \ud83e\udd17's recent blog post in case you are unfamiliar with these models.\nBigBird (introduced in paper) is one of such recent models to address this issue. BigBird relies on block sparse attention instead of normal attention (i.e. BERT's attention) and can handle sequences up to a length of 4096 at a much lower computational cost compared to BERT. It has achieved SOTA on various tasks involving very long sequences such as long documents summarization, question-answering with long contexts.\nBigBird RoBERTa-like model is now available in \ud83e\udd17Transformers. The goal of this post is to give the reader an in-depth understanding of big bird implementation & ease one's life in using BigBird with \ud83e\udd17Transformers. But, before going into more depth, it is important to remember that the BigBird's attention is an approximation of BERT's full attention and therefore does not strive to be better than BERT's full attention, but rather to be more efficient. It simply allows to apply transformer-based models to much longer sequences since BERT's quadratic memory requirement quickly becomes unbearable. 
Simply put, if we would have \u221e compute & \u221e time, BERT's attention would be preferred over block sparse attention (which we are going to discuss in this post).\nIf you wonder why we need more compute when working with longer sequences, this blog post is just right for you!\nSome of the main questions one might have when working with standard BERT-like attention include:\nDo all tokens really have to attend to all other tokens? Why not compute attention only over important tokens? How to decide what tokens are important? How to attend to just a few tokens in a very efficient way? In this blog post, we will try to answer those questions.\nWhat tokens should be attended to? We will give a practical example of how attention works by considering the sentence 'BigBird is now available in HuggingFace for extractive question answering'. In BERT-like attention, every word would simply attend to all other tokens.\nLet's think about a sensible choice of key tokens that a queried token actually only should attend to by writing some pseudo-code. Will will assume that the token available is queried and build a sensible list of key tokens to attend to.\n>>> # let's consider following sentence as an example >>> example = ['BigBird', 'is', 'now', 'available', 'in', 'HuggingFace', 'for', 'extractive', 'question', 'answering']\n>>> # further let's assume, we're trying to understand the representation of 'available' i.e. >>> query_token = 'available' >>> # We will initialize an empty `set` and fill up the tokens of our interest as we proceed in this section. >>> key_tokens = [] # => currently 'available' token doesn't have anything to attend Nearby tokens should be important because, in a sentence (sequence of words), the current word is highly dependent on neighboring past & future tokens. This intuition is the idea behind the concept of sliding attention.", "example_title": "bigbird blog intro"}, {"text": "To be fair, you have to have a very high IQ to understand Rick and Morty. The humour is extremely subtle, and without a solid grasp of theoretical physics most of the jokes will go over a typical viewer's head. There's also Rick's nihilistic outlook, which is deftly woven into his characterisation- his personal philosophy draws heavily from Narodnaya Volya literature, for instance. The fans understand this stuff; they have the intellectual capacity to truly appreciate the depths of these jokes, to realise that they're not just funny- they say something deep about LIFE. As a consequence people who dislike Rick & Morty truly ARE idiots- of course they wouldn't appreciate, for instance, the humour in Rick's existential catchphrase 'Wubba Lubba Dub Dub,' which itself is a cryptic reference to Turgenev's Russian epic Fathers and Sons. I'm smirking right now just imagining one of those addlepated simpletons scratching their heads in confusion as Dan Harmon's genius wit unfolds itself on their television screens. What fools.. how I pity them. \ud83d\ude02\nAnd yes, by the way, i DO have a Rick & Morty tattoo. And no, you cannot see it. It's for the ladies' eyes only- and even then they have to demonstrate that they're within 5 IQ points of my own (preferably lower) beforehand. Nothin personnel kid \ud83d\ude0e", "example_title": "Richard & Mortimer"}, {"text": "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. 
During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.", "example_title": "eiffel"}], "parameters": {"max_length": 64, "min_length": 8, "no_repeat_ngram_size": 3, "early_stopping": true, "repetition_penalty": 3.5, "length_penalty": 0.3, "encoder_no_repeat_ngram_size": 3, "num_beams": 4}, "model-index": [{"name": "pszemraj/long-t5-tglobal-base-16384-book-summary", "results": [{"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "kmfoda/booksum", "type": "kmfoda/booksum", "config": "kmfoda--booksum", "split": "test"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 36.4085, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 6.0646, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 16.7209, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 33.3405, "verified": true}, {"name": "loss", "type": "loss", "value": NaN, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 252.8099, "verified": true}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "samsum", "type": "samsum", "config": "samsum", "split": "test"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 30.9047, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 7.4715, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 22.3962, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 26.9094, "verified": true}, {"name": "loss", "type": "loss", "value": NaN, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 46.7973, "verified": true}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "cnn_dailymail", "type": "cnn_dailymail", "config": "3.0.0", "split": "test"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 30.5942, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 7.252, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 17.7156, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 27.2881, "verified": true}, {"name": "loss", "type": "loss", "value": NaN, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 125.2507, "verified": true}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "xsum", "type": "xsum", "config": "default", "split": "test"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 20.3648, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 3.4126, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 13.6168, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 15.8313, "verified": true}, {"name": "loss", "type": "loss", "value": NaN, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 82.2177, "verified": true}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "billsum", "type": "billsum", "config": "default", "split": "test"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 39.6378, "verified": true}, 
{"name": "ROUGE-2", "type": "rouge", "value": 13.0017, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 23.0255, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 32.9943, "verified": true}, {"name": "loss", "type": "loss", "value": 1.9428048133850098, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 162.3588, "verified": true}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "big_patent", "type": "big_patent", "config": "y", "split": "test"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 34.7641, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 7.8744, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 19.9826, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 29.208, "verified": true}, {"name": "loss", "type": "loss", "value": 2.8316469192504883, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 132.7475, "verified": true}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "launch/gov_report", "type": "launch/gov_report", "config": "plain_text", "split": "validation"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 37.9246, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 8.5837, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 18.0274, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 34.0816, "verified": true}, {"name": "loss", "type": "loss", "value": 2.56695818901062, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 220.3747, "verified": true}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "launch/gov_report", "type": "launch/gov_report", "config": "plain_text", "split": "test"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 37.4438, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 8.2907, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 17.6893, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 33.7141, "verified": true}, {"name": "loss", "type": "loss", "value": 2.5776000022888184, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 214.9692, "verified": true}]}]}]}, "description": "\n# long-t5-tglobal-base-16384 + BookSum\n\n \n \"Open\n\n\nSummarize long text and get a SparkNotes-esque summary of arbitrary topics!\n\n- generalizes reasonably well to academic & narrative text.\n- A simple example/use case on ASR is [here](https://longt5-booksum-example.netlify.app/).\n- Example notebook in Colab (_click on the icon above_).\n\n## Cheeky Proof-of-Concept\n\nA summary of the [infamous navy seals copypasta](https://knowyourmeme.com/memes/navy-seal-copypasta):\n\n> The narrator tells us that he's graduated from the Navy seals and has been involved in many secret raids. He's also one of the best snipers in the entire U.S. military. 
He promises to \"wipe you out with precision\" when they meet again.\n\n* * *\n\n**Contents**\n\n\n\n- [Model description](#model-description)\n- [How-To in Python](#how-to-in-python)\n- [Intended uses & limitations](#intended-uses--limitations)\n- [Training and evaluation data](#training-and-evaluation-data)\n- [FAQ](#faq)\n - [How to run inference over a very long (30k+ tokens) document in batches?](#how-to-run-inference-over-a-very-long-30k-tokens-document-in-batches)\n - [How to fine-tune further?](#how-to-fine-tune-further)\n - [Are there simpler ways to run this?](#are-there-simpler-ways-to-run-this)\n- [Training procedure](#training-procedure)\n - [Updates:](#updates)\n - [Training hyperparameters](#training-hyperparameters)\n - [Framework versions](#framework-versions)\n- [Citation info](#citation-info)\n\n\n\n* * *\n\n## Model description\n\nA fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on the `kmfoda/booksum` dataset:\n\n- 30+ epochs of fine-tuning from the base model on V100/A100 GPUs\n- Training used 16384 token input / 1024 max output\n\nRead the paper by Guo et al. here: [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/pdf/2112.07916.pdf)\n\n## How-To in Python\n\nInstall/update transformers `pip install -U transformers`\n\nSummarize text with pipeline:\n\n```python\nimport torch\nfrom transformers import pipeline\n\nsummarizer = pipeline(\n \"summarization\",\n \"pszemraj/long-t5-tglobal-base-16384-book-summary\",\n device=0 if torch.cuda.is_available() else -1,\n)\nlong_text = \"Here is a lot of text I don't want to read. Replace me\"\n\nresult = summarizer(long_text)\nprint(result[0][\"summary_text\"])\n```\n\nPass [other parameters related to beam search textgen](https://huggingface.co/blog/how-to-generate) when calling `summarizer` to get even higher quality results.\n\n## Intended uses & limitations\n\n- The current checkpoint is fairly well converged but will be updated if further improvements can be made.\n - Compare performance to [LED-base](https://huggingface.co/pszemraj/led-base-book-summary) trained on the same dataset (API gen parameters are the same).\n- while this model seems to improve upon factual consistency, **do not take summaries to be foolproof and check things that seem odd**.\n\n## Training and evaluation data\n\n`kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209). Summaries longer than 1024 LongT5 tokens were filtered out to prevent the model from learning to generate \"partial\" summaries.\n\n* * *\n\n## FAQ\n\n### How to run inference over a very long (30k+ tokens) document in batches?\n\nSee `summarize.py` in [the code for my hf space Document Summarization](https://huggingface.co/spaces/pszemraj/document-summarization/blob/main/summarize.py) :)\n\nYou can also use the same code to split a document into batches of 4096, etc., and run over those with the model. 
This approach is useful in situations where CUDA memory is limited.\n\n### How to fine-tune further?\n\nSee [train with a script](https://huggingface.co/docs/transformers/run_scripts) and [the summarization scripts](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization).\n\nThis model was originally tuned on Google Colab with a heavily modified variant of the [longformer training notebook](https://github.com/patrickvonplaten/notebooks/blob/master/Fine_tune_Longformer_Encoder_Decoder_(LED)_for_Summarization_on_pubmed.ipynb), the key enabler being deepspeed. You can try this as an alternate route to fine-tuning the model without using the command line.\n\n### Are there simpler ways to run this?\n\nTo make this easier, I created a Python package utility. It's called [textsum](https://github.com/pszemraj/textsum), and you can use it to load models and summarize things in a few lines of code.\n\n```sh\npip install textsum\n```\n\nUse `textsum` in Python with this model:\n\n```python\nfrom textsum.summarize import Summarizer\n\nsummarizer = Summarizer(\n    model_name_or_path=\"pszemraj/long-t5-tglobal-base-16384-book-summary\"\n)\n\nlong_string = \"This is a long string of text that will be summarized.\"\nout_str = summarizer.summarize_string(long_string)\nprint(f\"summary: {out_str}\")\n```\n\nThis package provides easy-to-use interfaces for applying summarization models to text documents of arbitrary length. Currently implemented interfaces include a Python API, a CLI, and a shareable demo application.\n\nFor details, explanations, and documentation, see the README (_linked above_) or the [wiki](https://github.com/pszemraj/textsum/wiki).\n\n* * *\n\n## Training procedure\n\n### Updates:\n\n- July 22, 2022: updated to a fairly converged checkpoint\n- July 3, 2022: Added a new version with several epochs of additional general training that is more performant.\n\n### Training hyperparameters\n\n_NOTE: early checkpoints of this model were trained on a \"smaller\" subsection of the dataset as it was filtered for summaries of **1024 characters**. 
This was subsequently caught and adjusted to **1024 tokens** and then trained further for 10+ epochs._\n\nThe following hyperparameters were used during the **most recent** training round\\*:\n\n- learning_rate: 0.0005\n- train_batch_size: 1\n- eval_batch_size: 1\n- seed: 42\n- distributed_type: multi-GPU\n- gradient_accumulation_steps: 128\n- total_train_batch_size: 128\n- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08\n- lr_scheduler_type: cosine\n- lr_scheduler_warmup_ratio: 0.01\n- num_epochs: 2\n\n\\* Prior training sessions used roughly similar parameters; multiple sessions were required as this takes eons to train\n\n### Framework versions\n\n- Transformers 4.20.1\n- Pytorch 1.10.0+cu113\n- Datasets 2.3.2\n- Tokenizers 0.12.1\n\n## Citation info\n\nIf you find `pszemraj/long-t5-tglobal-base-16384-book-summary` useful in your work, please consider citing this model :)\n\n @misc {peter_szemraj_2022,\n \tauthor = { {Peter Szemraj} },\n \ttitle = { long-t5-tglobal-base-16384-book-summary (Revision 4b12bce) },\n \tyear = 2022,\n \turl = { https://huggingface.co/pszemraj/long-t5-tglobal-base-16384-book-summary },\n \tdoi = { 10.57967/hf/0100 },\n \tpublisher = { Hugging Face }\n }\n"} {"downloads": 42006, "id": "human-centered-summarization/financial-summarization-pegasus", "likes": 51, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": ["en"], "tags": ["summarization"], "datasets": ["xsum"], "metrics": ["rouge"], "widget": [{"text": "National Commercial Bank (NCB), Saudi Arabia\u2019s largest lender by assets, agreed to buy rival Samba Financial Group for $15 billion in the biggest banking takeover this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer 0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio the banks set when they signed an initial framework agreement in June.The offer is a 3.5% premium to Samba\u2019s Oct. 8 closing price of 27.50 riyals and about 24% higher than the level the shares traded at before the talks were made public. Bloomberg News first reported the merger discussions.The new bank will have total assets of more than $220 billion, creating the Gulf region\u2019s third-largest lender. 
The entity\u2019s $46 billion market capitalization nearly matches that of Qatar National Bank QPSC, which is still the Middle East\u2019s biggest lender with about $268 billion of assets."}], "model-index": [{"name": "human-centered-summarization/financial-summarization-pegasus", "results": [{"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "xsum", "type": "xsum", "config": "default", "split": "test"}, "metrics": [{"type": "rouge", "value": 35.2055, "name": "ROUGE-1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMTA5OTZkY2YxMDU1YzE3NGJlMmE1OTg1NjlmNzcxOTg4YzY2OThlOTlkNGFhMGFjZWY4YjdiMjU5NDdmMWYzNSIsInZlcnNpb24iOjF9.ufBRoV2JoX4UlEfAUOYq7F3tZougwngdpKlnaC37tYXJU3omsR5hTsWM69hSdYO-k0cKUbAWCAMzjmoGwIaPAw"}, {"type": "rouge", "value": 16.5689, "name": "ROUGE-2", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOWQwMmM2NjJjNzM1N2Y3NjZmMmE5NzNlNjRjNjEwNzNhNjcyZTRiMGRlODY3NWUyMGQ0YzZmMGFhODYzOTRmOSIsInZlcnNpb24iOjF9.AZZkbaYBZG6rw6-QHYjRlSl-p0gBT2EtJxwjIP7QYH5XIQjeoiQsTnDPIq25dSMDbmQLSZnpHC104ZctX0f_Dg"}, {"type": "rouge", "value": 30.1285, "name": "ROUGE-L", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOTRjYThlMTllZjI4MGFiMDZhZTVkYmRjMTNhZDUzNTQ0OWQyNDQxMmQ5ODJiMmJiNGI3OTAzYjhiMzc2MTI4NCIsInZlcnNpb24iOjF9.zTHd3F4ZlgS-azl-ZVjOckcTrtrJmDOGWVaC3qQsvvn2UW9TnseNkmo7KBc3DJU7_NmlxWZArl1BdSetED0NCg"}, {"type": "rouge", "value": 30.1706, "name": "ROUGE-LSUM", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZGMzZGFjNzVkYWI0NTJkMmZjZDQ0YjhiYjIxN2VkNmJjMTgwZTk1NjFlOGU2NjNjM2VjYTNlYTBhNTQ5MGZkNSIsInZlcnNpb24iOjF9.xQ2LoI3PwlEiXo1OT2o4Pq9o2thYCd9lSCKCWlLmZdxI5GxdsjcASBKmHKopzUcwCGBPR7zF95MHSAPyszOODA"}, {"type": "loss", "value": 2.7092134952545166, "name": "loss", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMzQzODE0NDc5YTYzYjJlMWU2YTVjOGRjN2JmYWVkOWNkNTRlMTZlOWIyN2NiODJkMDljMjI3YzZmYzM3N2JjYSIsInZlcnNpb24iOjF9.Vv_pdeFuRMoKK3cPr5P6n7D6_18ChJX-2qcT0y4is3XX3mS98fk3U1AYEuy9nBHOwYR3o0U8WBgQ-Ya_FqefBg"}, {"type": "gen_len", "value": 15.1414, "name": "gen_len", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYjk5OTk3NWRiNjZlZmQzMmYwOTU2MmQwOWE1MDNlNTg3YWVkOTgwOTc2ZTQ0MTBiZjliOWMyZTYwMDI2MDUzYiIsInZlcnNpb24iOjF9.Zvj84JzIhM50rWTQ2GrEeOU7HrS8KsILH-8ApTcSWSI6kVnucY0MyW2ODxvRAa_zHeCygFW6Q13TFGrT5kLNAA"}]}]}]}, "description": "\n\n### PEGASUS for Financial Summarization \n\nThis model was fine-tuned on a novel financial news dataset, which consists of 2K articles from [Bloomberg](https://www.bloomberg.com/europe), on topics such as stock, markets, currencies, rate and cryptocurrencies. \n\nIt is based on the [PEGASUS](https://huggingface.co/transformers/model_doc/pegasus.html) model and in particular PEGASUS fine-tuned on the Extreme Summarization (XSum) dataset: [google/pegasus-xsum model](https://huggingface.co/google/pegasus-xsum). PEGASUS was originally proposed by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu in [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf). 
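For a quick experiment, the model can also be driven through the high-level `pipeline` API; a minimal sketch (the input text is an abbreviated version of the example further below, and the generation settings are illustrative):

```python
from transformers import pipeline

# Load the checkpoint through the high-level summarization pipeline
summarizer = pipeline(
    "summarization",
    model="human-centered-summarization/financial-summarization-pegasus",
)

text = (
    "National Commercial Bank (NCB), Saudi Arabia's largest lender by assets, "
    "agreed to buy rival Samba Financial Group for $15 billion in the biggest "
    "banking takeover this year."
)
print(summarizer(text, max_length=32, num_beams=5)[0]["summary_text"])
```

The step-by-step PyTorch version of the same workflow follows in the next section.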
\n\n### How to use \nWe provide a simple snippet of how to use this model for the task of financial summarization in PyTorch.\n\n```python\nfrom transformers import PegasusTokenizer, PegasusForConditionalGeneration, TFPegasusForConditionalGeneration\n\n# Let's load the model and the tokenizer\nmodel_name = \"human-centered-summarization/financial-summarization-pegasus\"\ntokenizer = PegasusTokenizer.from_pretrained(model_name)\nmodel = PegasusForConditionalGeneration.from_pretrained(model_name) # To use the TensorFlow model instead,\n                                                                    # just replace this class with TFPegasusForConditionalGeneration\n\n\n# Some text to summarize here\ntext_to_summarize = \"National Commercial Bank (NCB), Saudi Arabia\u2019s largest lender by assets, agreed to buy rival Samba Financial Group for $15 billion in the biggest banking takeover this year. NCB will pay 28.45 riyals ($7.58) for each Samba share, according to a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer 0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio the banks set when they signed an initial framework agreement in June. The offer is a 3.5% premium to Samba\u2019s Oct. 8 closing price of 27.50 riyals and about 24% higher than the level the shares traded at before the talks were made public. Bloomberg News first reported the merger discussions. The new bank will have total assets of more than $220 billion, creating the Gulf region\u2019s third-largest lender. The entity\u2019s $46 billion market capitalization nearly matches that of Qatar National Bank QPSC, which is still the Middle East\u2019s biggest lender with about $268 billion of assets.\"\n\n# Tokenize our text\n# To run the code in TensorFlow, simply pass return_tensors=\"tf\" here instead\ninput_ids = tokenizer(text_to_summarize, return_tensors=\"pt\").input_ids\n\n# Generate the output (here, we use beam search, but you can also use any other strategy you like)\noutput = model.generate(\n    input_ids, \n    max_length=32, \n    num_beams=5, \n    early_stopping=True\n)\n\n# Finally, we can print the generated summary\nprint(tokenizer.decode(output[0], skip_special_tokens=True))\n# Generated Output: Saudi bank to pay a 3.5% premium to Samba share price. Gulf region\u2019s third-largest lender will have total assets of $220 billion\n```\n\n## Evaluation Results\nThe results before and after fine-tuning on our dataset are shown below:\n\n| Fine-tuning | R-1 | R-2 | R-L | R-S |\n|:"} {"downloads": 23300, "id": "google/pegasus-large", "likes": 41, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": "en", "tags": ["summarization"]}, "description": "\n\n### Pegasus Models\nSee Docs: [here](https://huggingface.co/transformers/master/model_doc/pegasus.html)\n\nOriginal TF 1 code [here](https://github.com/google-research/pegasus)\n\nAuthors: Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu on Dec 18, 2019\n\nMaintained by: [@sshleifer](https://twitter.com/sam_shleifer)\n\nTask: Summarization\n\nThe following is copied from the authors' README.\n\n# Mixed & Stochastic Checkpoints\n\nWe train a pegasus model with sampled gap sentence ratios on both C4 and HugeNews, and stochastically sample important sentences. 
The updated results are reported in this table.\n\n| dataset | C4 | HugeNews | Mixed & Stochastic|\n| "} {"downloads": 379887, "id": "google/pegasus-cnn_dailymail", "likes": 28, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": "en", "tags": ["summarization"]}, "description": "\n\n### Pegasus Models\nSee Docs: [here](https://huggingface.co/transformers/master/model_doc/pegasus.html)\n\nOriginal TF 1 code [here](https://github.com/google-research/pegasus)\n\nAuthors: Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu on Dec 18, 2019\n\nMaintained by: [@sshleifer](https://twitter.com/sam_shleifer)\n\nTask: Summarization\n\nThe following is copied from the authors' README.\n\n# Mixed & Stochastic Checkpoints\n\nWe train a pegasus model with sampled gap sentence ratios on both C4 and HugeNews, and stochastically sample important sentences. The updated results are reported in this table.\n\n| dataset | C4 | HugeNews | Mixed & Stochastic|\n| "} {"downloads": 176505, "id": "lidiya/bart-large-xsum-samsum", "likes": 22, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": "en", "tags": ["bart", "seq2seq", "summarization"], "license": "apache-2.0", "datasets": ["samsum"], "widget": [{"text": "Hannah: Hey, do you have Betty's number?\nAmanda: Lemme check\nAmanda: Sorry, can't find it.\nAmanda: Ask Larry\nAmanda: He called her last time we were at the park together\nHannah: I don't know him well\nAmanda: Don't be shy, he's very nice\nHannah: If you say so..\nHannah: I'd rather you texted him\nAmanda: Just text him \ud83d\ude42\nHannah: Urgh.. Alright\nHannah: Bye\nAmanda: Bye bye\n"}], "model-index": [{"name": "bart-large-xsum-samsum", "results": [{"task": {"name": "Abstractive Text Summarization", "type": "abstractive-text-summarization"}, "dataset": {"name": "SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization", "type": "samsum"}, "metrics": [{"name": "Validation ROUGE-1", "type": "rouge-1", "value": 54.3921}, {"name": "Validation ROUGE-2", "type": "rouge-2", "value": 29.8078}, {"name": "Validation ROUGE-L", "type": "rouge-l", "value": 45.1543}, {"name": "Test ROUGE-1", "type": "rouge-1", "value": 53.3059}, {"name": "Test ROUGE-2", "type": "rouge-2", "value": 28.355}, {"name": "Test ROUGE-L", "type": "rouge-l", "value": 44.0953}]}]}]}, "description": "\n## `bart-large-xsum-samsum`\nThis model was obtained by fine-tuning `facebook/bart-large-xsum` on the [Samsum](https://huggingface.co/datasets/samsum) dataset.\n## Usage\n```python\nfrom transformers import pipeline\n\nsummarizer = pipeline(\"summarization\", model=\"lidiya/bart-large-xsum-samsum\")\nconversation = '''Hannah: Hey, do you have Betty's number?\nAmanda: Lemme check\nAmanda: Sorry, can't find it.\nAmanda: Ask Larry\nAmanda: He called her last time we were at the park together\nHannah: I don't know him well\nAmanda: Don't be shy, he's very nice\nHannah: If you say so..\nHannah: I'd rather you texted him\nAmanda: Just text him \ud83d\ude42\nHannah: Urgh.. 
Alright\nHannah: Bye\nAmanda: Bye bye \n'''\nsummarizer(conversation)\n```\n## Training procedure\n- Colab notebook: https://colab.research.google.com/drive/1dul0Sg-TTMy9xZCJzmDRajXbyzDwtYx6?usp=sharing\n## Results\n| key | value |\n| "} {"downloads": 2072, "id": "google/bigbird-pegasus-large-pubmed", "likes": 22, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": "en", "license": "apache-2.0", "datasets": ["scientific_papers"], "tags": ["summarization"], "model-index": [{"name": "google/bigbird-pegasus-large-pubmed", "results": [{"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "scientific_papers", "type": "scientific_papers", "config": "pubmed", "split": "test"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 40.8966, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 18.1161, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 26.1743, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 34.2773, "verified": true}, {"name": "loss", "type": "loss", "value": 2.1707184314727783, "verified": true}, {"name": "meteor", "type": "meteor", "value": 0.3513, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 221.2531, "verified": true}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "scientific_papers", "type": "scientific_papers", "config": "arxiv", "split": "test"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 40.3815, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 14.374, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 23.4773, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 33.772, "verified": true}, {"name": "loss", "type": "loss", "value": 3.235051393508911, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 186.2003, "verified": true}]}]}]}, "description": "\n\n# BigBirdPegasus model (large)\n\nBigBird is a sparse-attention-based transformer that extends Transformer-based models, such as BERT, to much longer sequences. Moreover, BigBird comes with a theoretical characterization of which capabilities of a complete transformer the sparse model retains.\n\nBigBird was introduced in this [paper](https://arxiv.org/abs/2007.14062) and first released in this [repository](https://github.com/google-research/bigbird).\n\nDisclaimer: The team releasing BigBird did not write a model card for this model, so this model card has been written by the Hugging Face team.\n\n## Model description\n\nBigBird relies on **block sparse attention** instead of normal attention (i.e. BERT's attention) and can handle sequences up to a length of 4096 at a much lower compute cost compared to BERT. 
It has achieved SOTA on various tasks involving very long sequences, such as long-document summarization and question answering with long contexts.\n\n## How to use\n\nHere is how to use this model to generate a summary for a given text in PyTorch:\n\n```python\nfrom transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer\n\ntokenizer = AutoTokenizer.from_pretrained(\"google/bigbird-pegasus-large-pubmed\")\n\n# by default encoder-attention is `block_sparse` with num_random_blocks=3, block_size=64\nmodel = BigBirdPegasusForConditionalGeneration.from_pretrained(\"google/bigbird-pegasus-large-pubmed\")\n\n# decoder attention type can't be changed & will be \"original_full\"\n# you can change `attention_type` (encoder only) to full attention like this:\nmodel = BigBirdPegasusForConditionalGeneration.from_pretrained(\"google/bigbird-pegasus-large-pubmed\", attention_type=\"original_full\")\n\n# you can change `block_size` & `num_random_blocks` like this:\nmodel = BigBirdPegasusForConditionalGeneration.from_pretrained(\"google/bigbird-pegasus-large-pubmed\", block_size=16, num_random_blocks=2)\n\ntext = \"Replace me with any text you'd like.\"\ninputs = tokenizer(text, return_tensors='pt')\nprediction = model.generate(**inputs)\nprediction = tokenizer.batch_decode(prediction)\n```\n\n## Training Procedure\n\nThis checkpoint is obtained after fine-tuning `BigBirdPegasusForConditionalGeneration` for **summarization** on the **pubmed** dataset from [scientific_papers](https://huggingface.co/datasets/scientific_papers).\n\n## BibTeX entry and citation info\n\n```tex\n@misc{zaheer2021big,\n title={Big Bird: Transformers for Longer Sequences}, \n author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},\n year={2021},\n eprint={2007.14062},\n archivePrefix={arXiv},\n primaryClass={cs.LG}\n}\n```\n"} {"downloads": 90239, "id": "facebook/bart-large-xsum", "likes": 19, "pipeline_tag": "summarization", "task": "summarization", "meta": {"tags": ["summarization"], "language": ["en"], "license": "mit", "model-index": [{"name": "facebook/bart-large-xsum", "results": [{"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "cnn_dailymail", "type": "cnn_dailymail", "config": "3.0.0", "split": "test"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 25.2697, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 7.6638, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 17.1808, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 21.7933, "verified": true}, {"name": "loss", "type": "loss", "value": 3.5042972564697266, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 27.4462, "verified": true}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "xsum", "type": "xsum", "config": "default", "split": "test"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 45.4525, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 22.3455, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 37.2302, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 37.2323, "verified": true}, {"name": "loss", "type": "loss", "value": 2.3128726482391357, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 25.5435, "verified": true}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": 
"samsum", "type": "samsum", "config": "samsum", "split": "train"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 24.7852, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 5.2533, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 18.6792, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 20.629, "verified": true}, {"name": "loss", "type": "loss", "value": 3.746837854385376, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 23.1206, "verified": true}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "samsum", "type": "samsum", "config": "samsum", "split": "test"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 24.9158, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 5.5837, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 18.8935, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 20.76, "verified": true}, {"name": "loss", "type": "loss", "value": 3.775235891342163, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 23.0928, "verified": true}]}]}]}, "description": "\n### Bart model finetuned on xsum\n\ndocs: https://huggingface.co/transformers/model_doc/bart.html\n\nfinetuning: examples/seq2seq/ (as of Aug 20, 2020)\n\nMetrics: ROUGE > 22 on xsum.\n\nvariants: search for distilbart\n\npaper: https://arxiv.org/abs/1910.13461"} {"downloads": 7095, "id": "google/bigbird-pegasus-large-bigpatent", "likes": 19, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": "en", "license": "apache-2.0", "datasets": ["big_patent"], "tags": ["summarization"]}, "description": "\n\n# BigBirdPegasus model (large)\n\nBigBird, is a sparse-attention based transformer which extends Transformer based models, such as BERT to much longer sequences. Moreover, BigBird comes along with a theoretical understanding of the capabilities of a complete transformer that the sparse model can handle. \n\nBigBird was introduced in this [paper](https://arxiv.org/abs/2007.14062) and first released in this [repository](https://github.com/google-research/bigbird).\n\nDisclaimer: The team releasing BigBird did not write a model card for this model so this model card has been written by the Hugging Face team.\n\n## Model description\n\nBigBird relies on **block sparse attention** instead of normal attention (i.e. BERT's attention) and can handle sequences up to a length of 4096 at a much lower compute cost compared to BERT. 
It has achieved SOTA on various tasks involving very long sequences, such as long-document summarization and question answering with long contexts.\n\n## How to use\n\nHere is how to use this model to generate a summary for a given text in PyTorch:\n\n```python\nfrom transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer\n\ntokenizer = AutoTokenizer.from_pretrained(\"google/bigbird-pegasus-large-bigpatent\")\n\n# by default encoder-attention is `block_sparse` with num_random_blocks=3, block_size=64\nmodel = BigBirdPegasusForConditionalGeneration.from_pretrained(\"google/bigbird-pegasus-large-bigpatent\")\n\n# decoder attention type can't be changed & will be \"original_full\"\n# you can change `attention_type` (encoder only) to full attention like this:\nmodel = BigBirdPegasusForConditionalGeneration.from_pretrained(\"google/bigbird-pegasus-large-bigpatent\", attention_type=\"original_full\")\n\n# you can change `block_size` & `num_random_blocks` like this:\nmodel = BigBirdPegasusForConditionalGeneration.from_pretrained(\"google/bigbird-pegasus-large-bigpatent\", block_size=16, num_random_blocks=2)\n\ntext = \"Replace me with any text you'd like.\"\ninputs = tokenizer(text, return_tensors='pt')\nprediction = model.generate(**inputs)\nprediction = tokenizer.batch_decode(prediction)\n```\n\n## Training Procedure\n\nThis checkpoint is obtained after fine-tuning `BigBirdPegasusForConditionalGeneration` for **summarization** on the [big_patent](https://huggingface.co/datasets/big_patent) dataset.\n\n## BibTeX entry and citation info\n\n```tex\n@misc{zaheer2021big,\n title={Big Bird: Transformers for Longer Sequences}, \n author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},\n year={2021},\n eprint={2007.14062},\n archivePrefix={arXiv},\n primaryClass={cs.LG}\n}\n```\n"} {"downloads": 2975, "id": "IlyaGusev/mbart_ru_sum_gazeta", "likes": 19, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": ["ru"], "tags": ["summarization", "mbart"], "datasets": ["IlyaGusev/gazeta"], "license": "apache-2.0", "inference": {"parameters": {"no_repeat_ngram_size": 4}}, "widget": [{"text": "\u0412\u044b\u0441\u043e\u0442\u0430 \u0431\u0430\u0448\u043d\u0438 \u0441\u043e\u0441\u0442\u0430\u0432\u043b\u044f\u0435\u0442 324 \u043c\u0435\u0442\u0440\u0430 (1063 \u0444\u0443\u0442\u0430), \u043f\u0440\u0438\u043c\u0435\u0440\u043d\u043e \u0442\u0430\u043a\u0430\u044f \u0436\u0435 \u0432\u044b\u0441\u043e\u0442\u0430, \u043a\u0430\u043a \u0443 81-\u044d\u0442\u0430\u0436\u043d\u043e\u0433\u043e \u0437\u0434\u0430\u043d\u0438\u044f, \u0438 \u0441\u0430\u043c\u043e\u0435 \u0432\u044b\u0441\u043e\u043a\u043e\u0435 \u0441\u043e\u043e\u0440\u0443\u0436\u0435\u043d\u0438\u0435 \u0432 \u041f\u0430\u0440\u0438\u0436\u0435. \u0415\u0433\u043e \u043e\u0441\u043d\u043e\u0432\u0430\u043d\u0438\u0435 \u043a\u0432\u0430\u0434\u0440\u0430\u0442\u043d\u043e, \u0440\u0430\u0437\u043c\u0435\u0440\u043e\u043c 125 \u043c\u0435\u0442\u0440\u043e\u0432 (410 \u0444\u0443\u0442\u043e\u0432) \u0441 \u043b\u044e\u0431\u043e\u0439 \u0441\u0442\u043e\u0440\u043e\u043d\u044b. 
\u0412\u043e \u0432\u0440\u0435\u043c\u044f \u0441\u0442\u0440\u043e\u0438\u0442\u0435\u043b\u044c\u0441\u0442\u0432\u0430 \u042d\u0439\u0444\u0435\u043b\u0435\u0432\u0430 \u0431\u0430\u0448\u043d\u044f \u043f\u0440\u0435\u0432\u0437\u043e\u0448\u043b\u0430 \u043c\u043e\u043d\u0443\u043c\u0435\u043d\u0442 \u0412\u0430\u0448\u0438\u043d\u0433\u0442\u043e\u043d\u0430, \u0441\u0442\u0430\u0432 \u0441\u0430\u043c\u044b\u043c \u0432\u044b\u0441\u043e\u043a\u0438\u043c \u0438\u0441\u043a\u0443\u0441\u0441\u0442\u0432\u0435\u043d\u043d\u044b\u043c \u0441\u043e\u043e\u0440\u0443\u0436\u0435\u043d\u0438\u0435\u043c \u0432 \u043c\u0438\u0440\u0435, \u0438 \u044d\u0442\u043e\u0442 \u0442\u0438\u0442\u0443\u043b \u043e\u043d\u0430 \u0443\u0434\u0435\u0440\u0436\u0438\u0432\u0430\u043b\u0430 \u0432 \u0442\u0435\u0447\u0435\u043d\u0438\u0435 41 \u0433\u043e\u0434\u0430 \u0434\u043e \u0437\u0430\u0432\u0435\u0440\u0448\u0435\u043d\u0438\u044f \u0441\u0442\u0440\u043e\u0438\u0442\u0435\u043b\u044c\u0441\u0442\u0432\u043e \u0437\u0434\u0430\u043d\u0438\u044f \u041a\u0440\u0430\u0439\u0441\u043b\u0435\u0440 \u0432 \u041d\u044c\u044e-\u0419\u043e\u0440\u043a\u0435 \u0432 1930 \u0433\u043e\u0434\u0443. \u042d\u0442\u043e \u043f\u0435\u0440\u0432\u043e\u0435 \u0441\u043e\u043e\u0440\u0443\u0436\u0435\u043d\u0438\u0435 \u043a\u043e\u0442\u043e\u0440\u043e\u0435 \u0434\u043e\u0441\u0442\u0438\u0433\u043b\u043e \u0432\u044b\u0441\u043e\u0442\u044b 300 \u043c\u0435\u0442\u0440\u043e\u0432. \u0418\u0437-\u0437\u0430 \u0434\u043e\u0431\u0430\u0432\u043b\u0435\u043d\u0438\u044f \u0432\u0435\u0449\u0430\u0442\u0435\u043b\u044c\u043d\u043e\u0439 \u0430\u043d\u0442\u0435\u043d\u043d\u044b \u043d\u0430 \u0432\u0435\u0440\u0448\u0438\u043d\u0435 \u0431\u0430\u0448\u043d\u0438 \u0432 1957 \u0433\u043e\u0434\u0443 \u043e\u043d\u0430 \u0441\u0435\u0439\u0447\u0430\u0441 \u0432\u044b\u0448\u0435 \u0437\u0434\u0430\u043d\u0438\u044f \u041a\u0440\u0430\u0439\u0441\u043b\u0435\u0440 \u043d\u0430 5,2 \u043c\u0435\u0442\u0440\u0430 (17 \u0444\u0443\u0442\u043e\u0432). 
\u0417\u0430 \u0438\u0441\u043a\u043b\u044e\u0447\u0435\u043d\u0438\u0435\u043c \u043f\u0435\u0440\u0435\u0434\u0430\u0442\u0447\u0438\u043a\u043e\u0432, \u042d\u0439\u0444\u0435\u043b\u0435\u0432\u0430 \u0431\u0430\u0448\u043d\u044f \u044f\u0432\u043b\u044f\u0435\u0442\u0441\u044f \u0432\u0442\u043e\u0440\u043e\u0439 \u0441\u0430\u043c\u043e\u0439 \u0432\u044b\u0441\u043e\u043a\u043e\u0439 \u043e\u0442\u0434\u0435\u043b\u044c\u043d\u043e \u0441\u0442\u043e\u044f\u0449\u0435\u0439 \u0441\u0442\u0440\u0443\u043a\u0442\u0443\u0440\u043e\u0439 \u0432\u043e \u0424\u0440\u0430\u043d\u0446\u0438\u0438 \u043f\u043e\u0441\u043b\u0435 \u0432\u0438\u0430\u0434\u0443\u043a\u0430 \u041c\u0438\u0439\u043e.", "example_title": "\u0412\u0438\u043a\u0438\u043f\u0435\u0434\u0438\u044f"}, {"text": "\u0421 1 \u0441\u0435\u043d\u0442\u044f\u0431\u0440\u044f \u0432 \u0420\u043e\u0441\u0441\u0438\u0438 \u0432\u0441\u0442\u0443\u043f\u0430\u044e\u0442 \u0432 \u0441\u0438\u043b\u0443 \u043f\u043e\u043f\u0440\u0430\u0432\u043a\u0438 \u0432 \u0437\u0430\u043a\u043e\u043d \u00ab\u041e \u0431\u0430\u043d\u043a\u0440\u043e\u0442\u0441\u0442\u0432\u0435\u00bb \u2014 \u0442\u0435\u043f\u0435\u0440\u044c \u0434\u043e\u043b\u0436\u043d\u0438\u043a\u0438 \u0441\u043c\u043e\u0433\u0443\u0442 \u043e\u0441\u0432\u043e\u0431\u043e\u0436\u0434\u0430\u0442\u044c\u0441\u044f \u043e\u0442 \u043d\u0435\u043f\u043e\u0441\u0438\u043b\u044c\u043d\u044b\u0445 \u043e\u0431\u044f\u0437\u0430\u0442\u0435\u043b\u044c\u0441\u0442\u0432 \u0432\u043e \u0432\u043d\u0435\u0441\u0443\u0434\u0435\u0431\u043d\u043e\u043c \u043f\u043e\u0440\u044f\u0434\u043a\u0435, \u0435\u0441\u043b\u0438 \u0441\u0443\u043c\u043c\u0430 \u0437\u0430\u0434\u043e\u043b\u0436\u0435\u043d\u043d\u043e\u0441\u0442\u0438 \u0441\u043e\u0441\u0442\u0430\u0432\u043b\u044f\u0435\u0442 \u043d\u0435 \u043c\u0435\u043d\u0435\u0435 50 \u0442\u044b\u0441. \u0440\u0443\u0431\u043b\u0435\u0439 \u0438 \u043d\u0435 \u043f\u0440\u0435\u0432\u044b\u0448\u0430\u0435\u0442 500 \u0442\u044b\u0441. \u0440\u0443\u0431\u043b\u0435\u0439 \u0431\u0435\u0437 \u0443\u0447\u0435\u0442\u0430 \u0448\u0442\u0440\u0430\u0444\u043e\u0432, \u043f\u0435\u043d\u0438, \u043f\u0440\u043e\u0446\u0435\u043d\u0442\u043e\u0432 \u0437\u0430 \u043f\u0440\u043e\u0441\u0440\u043e\u0447\u043a\u0443 \u043f\u043b\u0430\u0442\u0435\u0436\u0430 \u0438 \u043f\u0440\u043e\u0447\u0438\u0445 \u0438\u043c\u0443\u0449\u0435\u0441\u0442\u0432\u0435\u043d\u043d\u044b\u0445 \u0438\u043b\u0438 \u0444\u0438\u043d\u0430\u043d\u0441\u043e\u0432\u044b\u0445 \u0441\u0430\u043d\u043a\u0446\u0438\u0439. 
\u0423 \u0444\u0438\u0437\u043b\u0438\u0446 \u0438 \u0438\u043d\u0434\u0438\u0432\u0438\u0434\u0443\u0430\u043b\u044c\u043d\u044b\u0445 \u043f\u0440\u0435\u0434\u043f\u0440\u0438\u043d\u0438\u043c\u0430\u0442\u0435\u043b\u0435\u0439 \u043f\u043e\u044f\u0432\u0438\u043b\u0430\u0441\u044c \u0432\u043e\u0437\u043c\u043e\u0436\u043d\u043e\u0441\u0442\u044c \u043f\u0440\u043e\u0439\u0442\u0438 \u043f\u0440\u043e\u0446\u0435\u0434\u0443\u0440\u0443 \u0431\u0430\u043d\u043a\u0440\u043e\u0442\u0441\u0442\u0432\u0430 \u0431\u0435\u0437 \u0443\u0447\u0430\u0441\u0442\u0438\u044f \u0441\u0443\u0434\u0430 \u0438 \u0444\u0438\u043d\u0430\u043d\u0441\u043e\u0432\u043e\u0433\u043e \u0443\u043f\u0440\u0430\u0432\u043b\u044f\u044e\u0449\u0435\u0433\u043e \u2014 \u0434\u043e\u0441\u0442\u0430\u0442\u043e\u0447\u043d\u043e \u043f\u043e\u0434\u0430\u0442\u044c \u0441\u043e\u043e\u0442\u0432\u0435\u0442\u0441\u0442\u0432\u0443\u044e\u0449\u0435\u0435 \u0437\u0430\u044f\u0432\u043b\u0435\u043d\u0438\u0435 \u0447\u0435\u0440\u0435\u0437 \u041c\u0424\u0426. \u0421\u0443\u043c\u043c\u0443 \u0437\u0430\u0434\u043e\u043b\u0436\u0435\u043d\u043d\u043e\u0441\u0442\u0438 \u0438 \u0441\u043f\u0438\u0441\u043e\u043a \u0432\u0441\u0435\u0445 \u0438\u0437\u0432\u0435\u0441\u0442\u043d\u044b\u0445 \u0437\u0430\u044f\u0432\u0438\u0442\u0435\u043b\u044e \u043a\u0440\u0435\u0434\u0438\u0442\u043e\u0440\u043e\u0432 \u043d\u0443\u0436\u043d\u043e \u043f\u0440\u0435\u0434\u043e\u0441\u0442\u0430\u0432\u0438\u0442\u044c \u0441\u0430\u043c\u043e\u0441\u0442\u043e\u044f\u0442\u0435\u043b\u044c\u043d\u043e. \u0415\u0441\u043b\u0438 \u0432\u0441\u0435 \u0443\u0441\u043b\u043e\u0432\u0438\u044f \u0441\u043e\u0431\u043b\u044e\u0434\u0435\u043d\u044b, \u0441\u0432\u0435\u0434\u0435\u043d\u0438\u044f \u0432\u043d\u0435\u0441\u0443\u0442 \u0432 \u0415\u0434\u0438\u043d\u044b\u0439 \u0444\u0435\u0434\u0435\u0440\u0430\u043b\u044c\u043d\u044b\u0439 \u0440\u0435\u0435\u0441\u0442\u0440 \u0432 \u0442\u0435\u0447\u0435\u043d\u0438\u0435 \u0442\u0440\u0435\u0445 \u0440\u0430\u0431\u043e\u0447\u0438\u0445 \u0434\u043d\u0435\u0439. \u041f\u0440\u0438 \u044d\u0442\u043e\u043c \u043d\u0430 \u043c\u043e\u043c\u0435\u043d\u0442 \u043f\u043e\u0434\u0430\u0447\u0438 \u0437\u0430\u044f\u0432\u043b\u0435\u043d\u0438\u044f \u0432 \u043e\u0442\u043d\u043e\u0448\u0435\u043d\u0438\u0438 \u0437\u0430\u044f\u0432\u0438\u0442\u0435\u043b\u044f \u0434\u043e\u043b\u0436\u043d\u043e \u0431\u044b\u0442\u044c \u043e\u043a\u043e\u043d\u0447\u0435\u043d\u043e \u0438\u0441\u043f\u043e\u043b\u043d\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435 \u043f\u0440\u043e\u0438\u0437\u0432\u043e\u0434\u0441\u0442\u0432\u043e \u0441 \u0432\u043e\u0437\u0432\u0440\u0430\u0449\u0435\u043d\u0438\u0435\u043c \u0438\u0441\u043f\u043e\u043b\u043d\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0433\u043e \u0434\u043e\u043a\u0443\u043c\u0435\u043d\u0442\u0430 \u0432\u0437\u044b\u0441\u043a\u0430\u0442\u0435\u043b\u044e. \u042d\u0442\u043e \u0437\u043d\u0430\u0447\u0438\u0442, \u0447\u0442\u043e \u0443 \u043f\u043e\u0442\u0435\u043d\u0446\u0438\u0430\u043b\u044c\u043d\u043e\u0433\u043e \u0431\u0430\u043d\u043a\u0440\u043e\u0442\u0430 \u043d\u0435 \u0434\u043e\u043b\u0436\u043d\u043e \u0431\u044b\u0442\u044c \u0438\u043c\u0443\u0449\u0435\u0441\u0442\u0432\u0430, \u043a\u043e\u0442\u043e\u0440\u043e\u0435 \u043c\u043e\u0436\u043d\u043e \u0432\u0437\u044b\u0441\u043a\u0430\u0442\u044c. 
\u041a\u0440\u043e\u043c\u0435 \u0442\u043e\u0433\u043e, \u0432 \u043e\u0442\u043d\u043e\u0448\u0435\u043d\u0438\u0438 \u0433\u0440\u0430\u0436\u0434\u0430\u043d\u0438\u043d\u0430 \u043d\u0435 \u0434\u043e\u043b\u0436\u043d\u043e \u0431\u044b\u0442\u044c \u0432\u043e\u0437\u0431\u0443\u0436\u0434\u0435\u043d\u043e \u0434\u0440\u0443\u0433\u043e\u0435 \u0438\u0441\u043f\u043e\u043b\u043d\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0435 \u043f\u0440\u043e\u0438\u0437\u0432\u043e\u0434\u0441\u0442\u0432\u043e. \u0412 \u043f\u0435\u0440\u0438\u043e\u0434 \u0432\u0441\u0435\u0439 \u043f\u0440\u043e\u0446\u0435\u0434\u0443\u0440\u044b \u0437\u0430\u044f\u0432\u0438\u0442\u0435\u043b\u044c \u043d\u0435 \u0441\u043c\u043e\u0436\u0435\u0442 \u0431\u0440\u0430\u0442\u044c \u0437\u0430\u0439\u043c\u044b, \u043a\u0440\u0435\u0434\u0438\u0442\u044b, \u0432\u044b\u0434\u0430\u0432\u0430\u0442\u044c \u043f\u043e\u0440\u0443\u0447\u0438\u0442\u0435\u043b\u044c\u0441\u0442\u0432\u0430, \u0441\u043e\u0432\u0435\u0440\u0448\u0430\u0442\u044c \u0438\u043d\u044b\u0435 \u043e\u0431\u0435\u0441\u043f\u0435\u0447\u0438\u0442\u0435\u043b\u044c\u043d\u044b\u0435 \u0441\u0434\u0435\u043b\u043a\u0438. \u0412\u043d\u0435\u0441\u0443\u0434\u0435\u0431\u043d\u043e\u0435 \u0431\u0430\u043d\u043a\u0440\u043e\u0442\u0441\u0442\u0432\u043e \u0431\u0443\u0434\u0435\u0442 \u0434\u043b\u0438\u0442\u044c\u0441\u044f \u0448\u0435\u0441\u0442\u044c \u043c\u0435\u0441\u044f\u0446\u0435\u0432, \u0432 \u0442\u0435\u0447\u0435\u043d\u0438\u0435 \u043a\u043e\u0442\u043e\u0440\u044b\u0445 \u0442\u0430\u043a\u0436\u0435 \u0431\u0443\u0434\u0435\u0442 \u0434\u0435\u0439\u0441\u0442\u0432\u043e\u0432\u0430\u0442\u044c \u043c\u043e\u0440\u0430\u0442\u043e\u0440\u0438\u0439 \u043d\u0430 \u0443\u0434\u043e\u0432\u043b\u0435\u0442\u0432\u043e\u0440\u0435\u043d\u0438\u0435 \u0442\u0440\u0435\u0431\u043e\u0432\u0430\u043d\u0438\u0439 \u043a\u0440\u0435\u0434\u0438\u0442\u043e\u0440\u043e\u0432, \u043e\u0442\u043c\u0435\u0447\u0435\u043d\u043d\u044b\u0445 \u0432 \u0437\u0430\u044f\u0432\u043b\u0435\u043d\u0438\u0438 \u0434\u043e\u043b\u0436\u043d\u0438\u043a\u0430, \u0438 \u043c\u043e\u0440\u0430\u0442\u043e\u0440\u0438\u0439 \u043e\u0431 \u0443\u043f\u043b\u0430\u0442\u0435 \u043e\u0431\u044f\u0437\u0430\u0442\u0435\u043b\u044c\u043d\u044b\u0445 \u043f\u043b\u0430\u0442\u0435\u0436\u0435\u0439. \u041a\u0440\u043e\u043c\u0435 \u0442\u043e\u0433\u043e, \u043f\u0440\u0435\u043a\u0440\u0430\u0449\u0430\u0435\u0442\u0441\u044f \u043d\u0430\u0447\u0438\u0441\u043b\u0435\u043d\u0438\u0435 \u043d\u0435\u0443\u0441\u0442\u043e\u0435\u043a \u0438 \u0438\u043d\u044b\u0445 \u0444\u0438\u043d\u0430\u043d\u0441\u043e\u0432\u044b\u0445 \u0441\u0430\u043d\u043a\u0446\u0438\u0439; \u0438\u043c\u0443\u0449\u0435\u0441\u0442\u0432\u0435\u043d\u043d\u044b\u0435 \u0432\u0437\u044b\u0441\u043a\u0430\u043d\u0438\u044f (\u043a\u0440\u043e\u043c\u0435 \u0430\u043b\u0438\u043c\u0435\u043d\u0442\u043e\u0432) \u0442\u0430\u043a\u0436\u0435 \u0431\u0443\u0434\u0443\u0442 \u043f\u0440\u0438\u043e\u0441\u0442\u0430\u043d\u043e\u0432\u043b\u0435\u043d\u044b. 
\u041f\u043e \u0437\u0430\u0432\u0435\u0440\u0448\u0435\u043d\u0438\u044e \u043f\u0440\u043e\u0446\u0435\u0434\u0443\u0440\u044b \u0437\u0430\u044f\u0432\u0438\u0442\u0435\u043b\u044f \u043e\u0441\u0432\u043e\u0431\u043e\u0434\u044f\u0442 \u043e\u0442 \u0434\u0430\u043b\u044c\u043d\u0435\u0439\u0448\u0435\u0433\u043e \u0432\u044b\u043f\u043e\u043b\u043d\u0435\u043d\u0438\u044f \u0442\u0440\u0435\u0431\u043e\u0432\u0430\u043d\u0438\u0439 \u043a\u0440\u0435\u0434\u0438\u0442\u043e\u0440\u043e\u0432, \u0443\u043a\u0430\u0437\u0430\u043d\u043d\u044b\u0445 \u0432 \u0437\u0430\u044f\u0432\u043b\u0435\u043d\u0438\u0438 \u043e \u043f\u0440\u0438\u0437\u043d\u0430\u043d\u0438\u0438 \u0435\u0433\u043e \u0431\u0430\u043d\u043a\u0440\u043e\u0442\u043e\u043c, \u0430 \u044d\u0442\u0430 \u0437\u0430\u0434\u043e\u043b\u0436\u0435\u043d\u043d\u043e\u0441\u0442\u044c \u043f\u0440\u0438\u0437\u043d\u0430\u0435\u0442\u0441\u044f \u0431\u0435\u0437\u043d\u0430\u0434\u0435\u0436\u043d\u043e\u0439. \u0412 \u043f\u0440\u043e\u0448\u043b\u043e\u043c \u043c\u0435\u0441\u044f\u0446\u0435 \u0441\u0442\u0430\u043b\u043e \u0438\u0437\u0432\u0435\u0441\u0442\u043d\u043e, \u0447\u0442\u043e \u0437\u0430 \u043f\u0435\u0440\u0432\u043e\u0435 \u043f\u043e\u043b\u0443\u0433\u043e\u0434\u0438\u0435 2020 \u0433\u043e\u0434\u0430 \u0440\u043e\u0441\u0441\u0438\u0439\u0441\u043a\u0438\u0435 \u0441\u0443\u0434\u044b \u043f\u0440\u0438\u0437\u043d\u0430\u043b\u0438 \u0431\u0430\u043d\u043a\u0440\u043e\u0442\u0430\u043c\u0438 42,7 \u0442\u044b\u0441. \u0433\u0440\u0430\u0436\u0434\u0430\u043d (\u0432 \u0442\u043e\u043c \u0447\u0438\u0441\u043b\u0435 \u0438\u043d\u0434\u0438\u0432\u0438\u0434\u0443\u0430\u043b\u044c\u043d\u044b\u0445 \u043f\u0440\u0435\u0434\u043f\u0440\u0438\u043d\u0438\u043c\u0430\u0442\u0435\u043b\u0435\u0439) \u2014 \u043f\u043e \u0434\u0430\u043d\u043d\u044b\u043c \u0435\u0434\u0438\u043d\u043e\u0433\u043e \u0440\u0435\u0435\u0441\u0442\u0440\u0430 \u00ab\u0424\u0435\u0434\u0440\u0435\u0441\u0443\u0440\u0441\u00bb, \u044d\u0442\u043e \u043d\u0430 47,2% \u0431\u043e\u043b\u044c\u0448\u0435 \u043f\u043e\u043a\u0430\u0437\u0430\u0442\u0435\u043b\u044f \u0430\u043d\u0430\u043b\u043e\u0433\u0438\u0447\u043d\u043e\u0433\u043e \u043f\u0435\u0440\u0438\u043e\u0434\u0430 2019 \u0433\u043e\u0434\u0430. 
\u0420\u043e\u0441\u0442 \u0447\u0438\u0441\u043b\u0430 \u043e\u0431\u0430\u043d\u043a\u0440\u043e\u0442\u0438\u0432\u0448\u0438\u0445\u0441\u044f \u0433\u0440\u0430\u0436\u0434\u0430\u043d \u0432\u043e \u0432\u0442\u043e\u0440\u043e\u043c \u043a\u0432\u0430\u0440\u0442\u0430\u043b\u0435 \u043f\u043e \u0441\u0440\u0430\u0432\u043d\u0435\u043d\u0438\u044e \u0441 \u043f\u0435\u0440\u0432\u044b\u043c \u0437\u0430\u043c\u0435\u0434\u043b\u0438\u043b\u0441\u044f \u2014 \u0442\u0430\u043a\u0430\u044f \u0434\u0438\u043d\u0430\u043c\u0438\u043a\u0430 \u043e\u0431\u0443\u0441\u043b\u043e\u0432\u043b\u0435\u043d\u0430 \u0442\u0435\u043c, \u0447\u0442\u043e \u0432 \u043f\u0435\u0440\u0438\u043e\u0434 \u043e\u0433\u0440\u0430\u043d\u0438\u0447\u0435\u043d\u0438\u0439 \u0441 19 \u043c\u0430\u0440\u0442\u0430 \u043f\u043e 11 \u043c\u0430\u044f \u0441\u0443\u0434\u044b \u0440\u0435\u0434\u043a\u043e \u0440\u0430\u0441\u0441\u043c\u0430\u0442\u0440\u0438\u0432\u0430\u043b\u0438 \u0431\u0430\u043d\u043a\u0440\u043e\u0442\u043d\u044b\u0435 \u0434\u0435\u043b\u0430 \u043a\u043e\u043c\u043f\u0430\u043d\u0438\u0439 \u0438 \u043c\u0435\u043d\u044c\u0448\u0435, \u0447\u0435\u043c \u043e\u0431\u044b\u0447\u043d\u043e, \u0432 \u043e\u0442\u043d\u043e\u0448\u0435\u043d\u0438\u0438 \u0433\u0440\u0430\u0436\u0434\u0430\u043d, \u043e\u0431\u044a\u044f\u0441\u043d\u044f\u043b \u0440\u0443\u043a\u043e\u0432\u043e\u0434\u0438\u0442\u0435\u043b\u044c \u043f\u0440\u043e\u0435\u043a\u0442\u0430 \u00ab\u0424\u0435\u0434\u0440\u0435\u0441\u0443\u0440\u0441\u00bb \u0410\u043b\u0435\u043a\u0441\u0435\u0439 \u042e\u0445\u043d\u0438\u043d. \u041e\u043d \u043f\u0440\u043e\u0433\u043d\u043e\u0437\u0438\u0440\u0443\u0435\u0442, \u0447\u0442\u043e \u0432\u043e \u0432\u0442\u043e\u0440\u043e\u043c \u043f\u043e\u043b\u0443\u0433\u043e\u0434\u0438\u0438 \u043c\u044b \u0443\u0432\u0438\u0434\u0438\u043c \u0440\u043e\u0441\u0442 \u043f\u043e\u043a\u0430\u0437\u0430\u0442\u0435\u043b\u044f, \u043a\u043e\u0433\u0434\u0430 \u0441\u0443\u0434\u044b \u0440\u0430\u0441\u0441\u043c\u043e\u0442\u0440\u044f\u0442 \u0432\u0441\u0435 \u0434\u0435\u043b\u0430, \u0447\u0442\u043e \u043d\u0435 \u0441\u043c\u043e\u0433\u043b\u0438 \u0440\u0430\u043d\u0435\u0435 \u0432 \u0440\u0435\u0436\u0438\u043c\u0435 \u043e\u0433\u0440\u0430\u043d\u0438\u0447\u0435\u043d\u0438\u0439. \u041f\u043e \u0435\u0433\u043e \u0434\u0430\u043d\u043d\u044b\u043c, \u0443\u0436\u0435 \u0432 \u0438\u044e\u043d\u0435 \u0447\u0438\u0441\u043b\u043e \u043b\u0438\u0447\u043d\u044b\u0445 \u0431\u0430\u043d\u043a\u0440\u043e\u0442\u0441\u0442\u0432 \u0432\u044b\u0440\u043e\u0441\u043b\u043e \u0434\u043e 11,5 \u0442\u044b\u0441., \u0447\u0442\u043e \u0432 \u0434\u0432\u0430 \u0440\u0430\u0437\u0430 \u043f\u0440\u0435\u0432\u044b\u0448\u0430\u0435\u0442 \u043f\u043e\u043a\u0430\u0437\u0430\u0442\u0435\u043b\u044c \u0430\u043d\u0430\u043b\u043e\u0433\u0438\u0447\u043d\u043e\u0433\u043e \u043f\u0435\u0440\u0438\u043e\u0434\u0430 2019 \u0433\u043e\u0434\u0430.", "example_title": "\u041d\u043e\u0432\u043e\u0441\u0442\u0438"}, {"text": "\u0410\u043a\u0442\u0443\u0430\u043b\u044c\u043d\u043e\u0441\u0442\u044c \u043f\u0440\u043e\u0431\u043b\u0435\u043c\u044b. 
\u042d\u043b\u0435\u043a\u0442\u0440\u043e\u043d\u043d\u0430\u044f \u0438\u043d\u0444\u043e\u0440\u043c\u0430\u0446\u0438\u044f \u0438\u0433\u0440\u0430\u0435\u0442 \u0432\u0441\u0435 \u0431\u043e\u043b\u044c\u0448\u0443\u044e \u0440\u043e\u043b\u044c \u0432\u043e \u0432\u0441\u0435\u0445 \u0441\u0444\u0435\u0440\u0430\u0445 \u0436\u0438\u0437\u043d\u0438 \u0441\u043e\u0432\u0440\u0435\u043c\u0435\u043d\u043d\u043e\u0433\u043e \u043e\u0431\u0449\u0435\u0441\u0442\u0432\u0430. \u0412 \u043f\u043e\u0441\u043b\u0435\u0434\u043d\u0438\u0435 \u0433\u043e\u0434\u044b \u043e\u0431\u044a\u0435\u043c \u043d\u0430\u0443\u0447\u043d\u043e-\u0442\u0435\u0445\u043d\u0438\u0447\u0435\u0441\u043a\u043e\u0439 \u0442\u0435\u043a\u0441\u0442\u043e\u0432\u043e\u0439 \u0438\u043d\u0444\u043e\u0440\u043c\u0430\u0446\u0438\u0438 \u0432 \u044d\u043b\u0435\u043a\u0442\u0440\u043e\u043d\u043d\u043e\u043c \u0432\u0438\u0434\u0435 \u0432\u043e\u0437\u0440\u043e\u0441 \u043d\u0430\u0441\u0442\u043e\u043b\u044c\u043a\u043e, \u0447\u0442\u043e \u0432\u043e\u0437\u043d\u0438\u043a\u0430\u0435\u0442 \u0443\u0433\u0440\u043e\u0437\u0430 \u043e\u0431\u0435\u0441\u0446\u0435\u043d\u0438\u0432\u0430\u043d\u0438\u044f \u044d\u0442\u043e\u0439 \u0438\u043d\u0444\u043e\u0440\u043c\u0430\u0446\u0438\u0438 \u0432 \u0441\u0432\u044f\u0437\u0438 \u0441 \u0442\u0440\u0443\u0434\u043d\u043e\u0441\u0442\u044f\u043c\u0438 \u043f\u043e\u0438\u0441\u043a\u0430 \u043d\u0435\u043e\u0431\u0445\u043e\u0434\u0438\u043c\u044b\u0445 \u0441\u0432\u0435\u0434\u0435\u043d\u0438\u0439 \u0441\u0440\u0435\u0434\u0438 \u043c\u043d\u043e\u0436\u0435\u0441\u0442\u0432\u0430 \u0434\u043e\u0441\u0442\u0443\u043f\u043d\u044b\u0445 \u0442\u0435\u043a\u0441\u0442\u043e\u0432. \u0420\u0430\u0437\u0432\u0438\u0442\u0438\u0435 \u0438\u043d\u0444\u043e\u0440\u043c\u0430\u0446\u0438\u043e\u043d\u043d\u044b\u0445 \u0440\u0435\u0441\u0443\u0440\u0441\u043e\u0432 \u0418\u043d\u0442\u0435\u0440\u043d\u0435\u0442 \u043c\u043d\u043e\u0433\u043e\u043a\u0440\u0430\u0442\u043d\u043e \u0443\u0441\u0443\u0433\u0443\u0431\u0438\u043b\u043e \u043f\u0440\u043e\u0431\u043b\u0435\u043c\u0443 \u0438\u043d\u0444\u043e\u0440\u043c\u0430\u0446\u0438\u043e\u043d\u043d\u043e\u0439 \u043f\u0435\u0440\u0435\u0433\u0440\u0443\u0437\u043a\u0438. \u0412 \u044d\u0442\u043e\u0439 \u0441\u0438\u0442\u0443\u0430\u0446\u0438\u0438 \u043e\u0441\u043e\u0431\u0435\u043d\u043d\u043e \u0430\u043a\u0442\u0443\u0430\u043b\u044c\u043d\u044b\u043c\u0438 \u0441\u0442\u0430\u043d\u043e\u0432\u044f\u0442\u0441\u044f \u043c\u0435\u0442\u043e\u0434\u044b \u0430\u0432\u0442\u043e\u043c\u0430\u0442\u0438\u0437\u0430\u0446\u0438\u0438 \u0440\u0435\u0444\u0435\u0440\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u044f \u0442\u0435\u043a\u0441\u0442\u043e\u0432\u043e\u0439 \u0438\u043d\u0444\u043e\u0440\u043c\u0430\u0446\u0438\u0438, \u0442\u043e \u0435\u0441\u0442\u044c \u043c\u0435\u0442\u043e\u0434\u044b \u043f\u043e\u043b\u0443\u0447\u0435\u043d\u0438\u044f \u0441\u0436\u0430\u0442\u043e\u0433\u043e \u043f\u0440\u0435\u0434\u0441\u0442\u0430\u0432\u043b\u0435\u043d\u0438\u044f \u0442\u0435\u043a\u0441\u0442\u043e\u0432\u044b\u0445 \u0434\u043e\u043a\u0443\u043c\u0435\u043d\u0442\u043e\u0432\u2013\u0440\u0435\u0444\u0435\u0440\u0430\u0442\u043e\u0432 (\u0430\u043d\u043d\u043e\u0442\u0430\u0446\u0438\u0439). 
\u041f\u043e\u0441\u0442\u0430\u043d\u043e\u0432\u043a\u0430 \u043f\u0440\u043e\u0431\u043b\u0435\u043c\u044b \u0430\u0432\u0442\u043e\u043c\u0430\u0442\u0438\u0447\u0435\u0441\u043a\u043e\u0433\u043e \u0440\u0435\u0444\u0435\u0440\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u044f \u0442\u0435\u043a\u0441\u0442\u0430 \u0438 \u0441\u043e\u043e\u0442\u0432\u0435\u0442\u0441\u0442\u0432\u0435\u043d\u043d\u043e \u043f\u043e\u043f\u044b\u0442\u043a\u0438 \u0435\u0435 \u0440\u0435\u0448\u0435\u043d\u0438\u044f \u0441 \u0438\u0441\u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u043d\u0438\u0435\u043c \u0440\u0430\u0437\u043b\u0438\u0447\u043d\u044b\u0445 \u043f\u043e\u0434\u0445\u043e\u0434\u043e\u0432 \u043f\u0440\u0435\u0434\u043f\u0440\u0438\u043d\u0438\u043c\u0430\u043b\u0438\u0441\u044c \u043c\u043d\u043e\u0433\u0438\u043c\u0438 \u0438\u0441\u0441\u043b\u0435\u0434\u043e\u0432\u0430\u0442\u0435\u043b\u044f\u043c\u0438. \u0418\u0441\u0442\u043e\u0440\u0438\u044f \u043f\u0440\u0438\u043c\u0435\u043d\u0435\u043d\u0438\u044f \u0432\u044b\u0447\u0438\u0441\u043b\u0438\u0442\u0435\u043b\u044c\u043d\u043e\u0439 \u0442\u0435\u0445\u043d\u0438\u043a\u0438 \u0434\u043b\u044f \u0440\u0435\u0444\u0435\u0440\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u044f \u043d\u0430\u0441\u0447\u0438\u0442\u044b\u0432\u0430\u0435\u0442 \u0443\u0436\u0435 \u0431\u043e\u043b\u0435\u0435 50 \u043b\u0435\u0442 \u0438 \u0441\u0432\u044f\u0437\u0430\u043d\u0430 \u0441 \u0438\u043c\u0435\u043d\u0430\u043c\u0438 \u0442\u0430\u043a\u0438\u0445 \u0438\u0441\u0441\u043b\u0435\u0434\u043e\u0432\u0430\u0442\u0435\u043b\u0435\u0439, \u043a\u0430\u043a \u0413.\u041f. \u041b\u0443\u043d, \u0412.\u0415. \u0411\u0435\u0440\u0437\u043e\u043d, \u0418.\u041f. C\u0435\u0432\u0431\u043e, \u042d.\u0424. \u0421\u043a\u043e\u0440\u043e\u0445\u043e\u0434\u044c\u043a\u043e, \u0414.\u0413. \u041b\u0430\u0445\u0443\u0442\u0438, \u0420.\u0413. \u041f\u0438\u043e\u0442\u0440\u043e\u0432\u0441\u043a\u0438\u0439 \u0438 \u0434\u0440. 
\u0417\u0430 \u044d\u0442\u0438 \u0433\u043e\u0434\u044b \u0432\u044b\u0440\u0430\u0431\u043e\u0442\u0430\u043d\u044b \u043c\u043d\u043e\u0433\u043e\u0447\u0438\u0441\u043b\u0435\u043d\u043d\u044b\u0435 \u043f\u043e\u0434\u0445\u043e\u0434\u044b \u043a \u0440\u0435\u0448\u0435\u043d\u0438\u044e \u0434\u0430\u043d\u043d\u043e\u0439 \u043f\u0440\u043e\u0431\u043b\u0435\u043c\u044b, \u043a\u043e\u0442\u043e\u0440\u044b\u0435 \u0434\u043e\u0441\u0442\u0430\u0442\u043e\u0447\u043d\u043e \u0447\u0435\u0442\u043a\u043e \u043f\u043e\u0434\u0440\u0430\u0437\u0434\u0435\u043b\u044f\u044e\u0442\u0441\u044f \u043d\u0430 \u0434\u0432\u0430 \u043d\u0430\u043f\u0440\u0430\u0432\u043b\u0435\u043d\u0438\u044f: \u0430\u0432\u0442\u043e\u043c\u0430\u0442\u0438\u0447\u0435\u0441\u043a\u043e\u0435 \u0440\u0435\u0444\u0435\u0440\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u0435, \u043e\u0441\u043d\u043e\u0432\u0430\u043d\u043d\u043e\u0435 \u043d\u0430 \u044d\u043a\u0441\u0442\u0440\u0430\u0433\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u0438 \u0438\u0437 \u043f\u0435\u0440\u0432\u0438\u0447\u043d\u044b\u0445 \u0434\u043e\u043a\u0443\u043c\u0435\u043d\u0442\u043e\u0432 \u0441 \u043f\u043e\u043c\u043e\u0449\u044c\u044e \u043e\u043f\u0440\u0435\u0434\u0435\u043b\u0435\u043d\u043d\u044b\u0445 \u0444\u043e\u0440\u043c\u0430\u043b\u044c\u043d\u044b\u0445 \u043f\u0440\u0438\u0437\u043d\u0430\u043a\u043e\u0432 \u00ab\u043d\u0430\u0438\u0431\u043e\u043b\u0435\u0435 \u0438\u043d\u0444\u043e\u0440\u043c\u0430\u0442\u0438\u0432\u043d\u044b\u0445\u00bb \u0444\u0440\u0430\u0437 (\u0444\u0440\u0430\u0433\u043c\u0435\u043d\u0442\u043e\u0432), \u0441\u043e\u0432\u043e\u043a\u0443\u043f\u043d\u043e\u0441\u0442\u044c \u043a\u043e\u0442\u043e\u0440\u044b\u0445 \u043e\u0431\u0440\u0430\u0437\u0443\u0435\u0442 \u043d\u0435\u043a\u043e\u0442\u043e\u0440\u044b\u0439 \u044d\u043a\u0441\u0442\u0440\u0430\u043a\u0442; \u0430\u0432\u0442\u043e\u043c\u0430\u0442\u0438\u0447\u0435\u0441\u043a\u043e\u0435 \u0440\u0435\u0444\u0435\u0440\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u0435, \u043e\u0441\u043d\u043e\u0432\u0430\u043d\u043d\u043e\u0435 \u043d\u0430 \u0432\u044b\u0434\u0435\u043b\u0435\u043d\u0438\u0438 \u0438\u0437 \u0442\u0435\u043a\u0441\u0442\u043e\u0432 \u0441 \u043f\u043e\u043c\u043e\u0449\u044c\u044e \u0441\u043f\u0435\u0446\u0438\u0430\u043b\u044c\u043d\u044b\u0445 \u0438\u043d\u0444\u043e\u0440\u043c\u0430\u0446\u0438\u043e\u043d\u043d\u044b\u0445 \u044f\u0437\u044b\u043a\u043e\u0432 \u043d\u0430\u0438\u0431\u043e\u043b\u0435\u0435 \u0441\u0443\u0449\u0435\u0441\u0442\u0432\u0435\u043d\u043d\u043e\u0439 \u0438\u043d\u0444\u043e\u0440\u043c\u0430\u0446\u0438\u0438 \u0438 \u043f\u043e\u0440\u043e\u0436\u0434\u0435\u043d\u0438\u0438 \u043d\u043e\u0432\u044b\u0445 \u0442\u0435\u043a\u0441\u0442\u043e\u0432 (\u0440\u0435\u0444\u0435\u0440\u0430\u0442\u043e\u0432), \u0441\u043e\u0434\u0435\u0440\u0436\u0430\u0442\u0435\u043b\u044c\u043d\u043e \u043e\u0431\u043e\u0431\u0449\u0430\u044e\u0449\u0438\u0445 \u043f\u0435\u0440\u0432\u0438\u0447\u043d\u044b\u0435 \u0434\u043e\u043a\u0443\u043c\u0435\u043d\u0442\u044b.", "example_title": "\u041d\u0430\u0443\u0447\u043d\u0430\u044f \u0441\u0442\u0430\u0442\u044c\u044f"}]}, "description": "\n\n# MBARTRuSumGazeta\n\n## Model description\n\nThis is a ported version of [fairseq model](https://www.dropbox.com/s/fijtntnifbt9h0k/gazeta_mbart_v2_fairseq.tar.gz).\n\nFor more details, please see [Dataset for Automatic Summarization of Russian News](https://arxiv.org/abs/2006.11063).\n\n## Intended uses 
& limitations\n\n#### How to use\n\nColab: [link](https://colab.research.google.com/drive/1wdo_nPZPk6dWAn1J8nGx4Z5Ef82jCCob)\n\n```python\nfrom transformers import MBartTokenizer, MBartForConditionalGeneration\n\nmodel_name = \"IlyaGusev/mbart_ru_sum_gazeta\"\ntokenizer = MBartTokenizer.from_pretrained(model_name)\nmodel = MBartForConditionalGeneration.from_pretrained(model_name)\n\narticle_text = \"...\"\n\ninput_ids = tokenizer(\n [article_text],\n max_length=600,\n padding=\"max_length\",\n truncation=True,\n return_tensors=\"pt\",\n)[\"input_ids\"]\n\noutput_ids = model.generate(\n input_ids=input_ids,\n no_repeat_ngram_size=4\n)[0]\n\nsummary = tokenizer.decode(output_ids, skip_special_tokens=True)\nprint(summary)\n```\n\n#### Limitations and bias\n\n- The model should work well on Gazeta.ru articles, but it may suffer from domain shift on texts from other outlets\n\n## Training data\n\n- Dataset: [Gazeta](https://huggingface.co/datasets/IlyaGusev/gazeta)\n\n## Training procedure\n\n- Fairseq training script: [train.sh](https://github.com/IlyaGusev/summarus/blob/master/external/bart_scripts/train.sh)\n- Porting: [Colab link](https://colab.research.google.com/drive/13jXOlCpArV-lm4jZQ0VgOpj6nFBYrLAr)\n\n## Eval results\n\n* Train dataset: **Gazeta v1 train**\n* Test dataset: **Gazeta v1 test**\n* Source max_length: **600**\n* Target max_length: **200**\n* no_repeat_ngram_size: **4**\n* num_beams: **5**\n\n| Model | R-1-f | R-2-f | R-L-f | chrF | METEOR | BLEU | Avg char length |\n|:"} {"downloads": 7607, "id": "google/bigbird-pegasus-large-arxiv", "likes": 18, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": "en", "license": "apache-2.0", "datasets": ["scientific_papers"], "tags": ["summarization"], "model-index": [{"name": "google/bigbird-pegasus-large-arxiv", "results": [{"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "scientific_papers", "type": "scientific_papers", "config": "pubmed", "split": "test"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 36.0276, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 13.4166, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 21.9612, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 29.648, "verified": true}, {"name": "loss", "type": "loss", "value": 2.774355173110962, "verified": true}, {"name": "meteor", "type": "meteor", "value": 0.2824, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 209.2537, "verified": true}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "cnn_dailymail", "type": "cnn_dailymail", "config": "3.0.0", "split": "test"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 9.0885, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 1.0325, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 7.3182, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 8.1455, "verified": true}, {"name": "loss", "type": "loss", "value": NaN, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 210.4762, "verified": true}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "xsum", "type": "xsum", "config": "default", "split": "test"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 4.9787, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 0.3527, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 4.3679, "verified": true}, {"name": 
"ROUGE-LSUM", "type": "rouge", "value": 4.1723, "verified": true}, {"name": "loss", "type": "loss", "value": NaN, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 230.4886, "verified": true}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "scientific_papers", "type": "scientific_papers", "config": "arxiv", "split": "test"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 43.4702, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 17.4297, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 26.2587, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 35.5587, "verified": true}, {"name": "loss", "type": "loss", "value": 2.1113228797912598, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 183.3702, "verified": true}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "samsum", "type": "samsum", "config": "samsum", "split": "test"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 3.621, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 0.1699, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 3.2016, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 3.3269, "verified": true}, {"name": "loss", "type": "loss", "value": 7.664482116699219, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 233.8107, "verified": true}]}]}]}, "description": "\n\n# BigBirdPegasus model (large)\n\nBigBird, is a sparse-attention based transformer which extends Transformer based models, such as BERT to much longer sequences. Moreover, BigBird comes along with a theoretical understanding of the capabilities of a complete transformer that the sparse model can handle. \n\nBigBird was introduced in this [paper](https://arxiv.org/abs/2007.14062) and first released in this [repository](https://github.com/google-research/bigbird).\n\nDisclaimer: The team releasing BigBird did not write a model card for this model so this model card has been written by the Hugging Face team.\n\n## Model description\n\nBigBird relies on **block sparse attention** instead of normal attention (i.e. BERT's attention) and can handle sequences up to a length of 4096 at a much lower compute cost compared to BERT. 
It has achieved SOTA on various tasks involving very long sequences, such as long-document summarization and question answering with long contexts.\n\n## How to use\n\nHere is how to use this model to generate a summary for a given text in PyTorch:\n\n```python\nfrom transformers import BigBirdPegasusForConditionalGeneration, AutoTokenizer\n\ntokenizer = AutoTokenizer.from_pretrained(\"google/bigbird-pegasus-large-arxiv\")\n\n# by default encoder-attention is `block_sparse` with num_random_blocks=3, block_size=64\nmodel = BigBirdPegasusForConditionalGeneration.from_pretrained(\"google/bigbird-pegasus-large-arxiv\")\n\n# decoder attention type can't be changed & will be \"original_full\"\n# you can change `attention_type` (encoder only) to full attention like this:\nmodel = BigBirdPegasusForConditionalGeneration.from_pretrained(\"google/bigbird-pegasus-large-arxiv\", attention_type=\"original_full\")\n\n# you can change `block_size` & `num_random_blocks` like this:\nmodel = BigBirdPegasusForConditionalGeneration.from_pretrained(\"google/bigbird-pegasus-large-arxiv\", block_size=16, num_random_blocks=2)\n\ntext = \"Replace me with any text you'd like.\"\ninputs = tokenizer(text, return_tensors='pt')\nprediction = model.generate(**inputs)\nprediction = tokenizer.batch_decode(prediction)\n```\n\n## Training Procedure\n\nThis checkpoint is obtained after fine-tuning `BigBirdPegasusForConditionalGeneration` for **summarization** on the **arxiv** dataset from [scientific_papers](https://huggingface.co/datasets/scientific_papers).\n\n## BibTeX entry and citation info\n\n```tex\n@misc{zaheer2021big,\n title={Big Bird: Transformers for Longer Sequences}, \n author={Manzil Zaheer and Guru Guruganesh and Avinava Dubey and Joshua Ainslie and Chris Alberti and Santiago Ontanon and Philip Pham and Anirudh Ravula and Qifan Wang and Li Yang and Amr Ahmed},\n year={2021},\n eprint={2007.14062},\n archivePrefix={arXiv},\n primaryClass={cs.LG}\n}\n```\n"} {"downloads": 4015, "id": "IDEA-CCNL/Randeng-Pegasus-238M-Summary-Chinese", "likes": 17, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": "zh", "tags": ["summarization", "chinese"], "inference": false}, "description": "\n\n# Randeng-Pegasus-238M-Summary-Chinese\n\n- Github: [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM/blob/main/fengshen/examples/summary/randeng_pegasus_523M_summary.sh)\n- Docs: [Fengshenbang-Docs](https://fengshenbang-doc.readthedocs.io/zh/latest/docs/%E7%87%83%E7%81%AF%E7%B3%BB%E5%88%97/Randeng-Pegasus-238M-Summary-Chinese.html)\n\n## \u7b80\u4ecb Brief Introduction\n\n\u5584\u4e8e\u5904\u7406\u6458\u8981\u4efb\u52a1\uff0c\u5728\u6570\u4e2a\u4e2d\u6587\u6458\u8981\u6570\u636e\u96c6\u4e0a\u5fae\u8c03\u540e\u7684\uff0c\u4e2d\u6587\u7248\u7684PEGASUS-base\u3002\n\nA Chinese version of PEGASUS-base that is good at text summarization tasks, obtained by fine-tuning on multiple Chinese text summarization datasets.\n\n## \u6a21\u578b\u5206\u7c7b Model Taxonomy\n\n| \u9700\u6c42 Demand | \u4efb\u52a1 Task | \u7cfb\u5217 Series | \u6a21\u578b Model | \u53c2\u6570 Parameter | \u989d\u5916 Extra |\n| :"} {"downloads": 2404, "id": "tuner007/pegasus_summarizer", "likes": 17, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": "en", "license": "apache-2.0", "tags": ["pegasus", "seq2seq", "summarization"], "model-index": [{"name": "tuner007/pegasus_summarizer", "results": [{"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "cnn_dailymail", "type": "cnn_dailymail", "config": "3.0.0", 
"split": "train"}, "metrics": [{"name": "ROUGE-1", "type": "rouge", "value": 36.604, "verified": true}, {"name": "ROUGE-2", "type": "rouge", "value": 14.6398, "verified": true}, {"name": "ROUGE-L", "type": "rouge", "value": 23.8845, "verified": true}, {"name": "ROUGE-LSUM", "type": "rouge", "value": 32.9017, "verified": true}, {"name": "loss", "type": "loss", "value": 2.5757133960723877, "verified": true}, {"name": "gen_len", "type": "gen_len", "value": 76.3984, "verified": true}]}]}]}, "description": "\n\n## Model description\n[PEGASUS](https://github.com/google-research/pegasus) fine-tuned for summarization\n\n## Install \"sentencepiece\" library required for tokenizer\n```\npip install sentencepiece\n```\n\n## Model in Action \ud83d\ude80\n```\nimport torch\nfrom transformers import PegasusForConditionalGeneration, PegasusTokenizer\nmodel_name = 'tuner007/pegasus_summarizer'\ntorch_device = 'cuda' if torch.cuda.is_available() else 'cpu'\ntokenizer = PegasusTokenizer.from_pretrained(model_name)\nmodel = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)\n\ndef get_response(input_text):\n batch = tokenizer([input_text],truncation=True,padding='longest',max_length=1024, return_tensors=\"pt\").to(torch_device)\n gen_out = model.generate(**batch,max_length=128,num_beams=5, num_return_sequences=1, temperature=1.5)\n output_text = tokenizer.batch_decode(gen_out, skip_special_tokens=True)\n return output_text\n```\n#### Example: \ncontext = \"\"\"\"\nIndia wicket-keeper batsman Rishabh Pant has said someone from the crowd threw a ball on pacer Mohammed Siraj while he was fielding in the ongoing third Test against England on Wednesday. Pant revealed the incident made India skipper Virat Kohli \"upset\". \"I think, somebody threw a ball inside, at Siraj, so he [Kohli] was upset,\" said Pant in a virtual press conference after the close of the first day\\'s play.\"You can say whatever you want to chant, but don\\'t throw things at the fielders and all those things. It is not good for cricket, I guess,\" he added.In the third session of the opening day of the third Test, a section of spectators seemed to have asked Siraj the score of the match to tease the pacer. The India pacer however came with a brilliant reply as he gestured 1-0 (India leading the Test series) towards the crowd.Earlier this month, during the second Test match, there was some bad crowd behaviour on a show as some unruly fans threw champagne corks at India batsman KL Rahul.Kohli also intervened and he was seen gesturing towards the opening batsman to know more about the incident. An over later, the TV visuals showed that many champagne corks were thrown inside the playing field, and the Indian players were visibly left frustrated.Coming back to the game, after bundling out India for 78, openers Rory Burns and Haseeb Hameed ensured that England took the honours on the opening day of the ongoing third Test.At stumps, England\\'s score reads 120/0 and the hosts have extended their lead to 42 runs. For the Three Lions, Burns (52*) and Hameed (60*) are currently unbeaten at the crease.Talking about the pitch on opening day, Pant said, \"They took the heavy roller, the wicket was much more settled down, and they batted nicely also,\" he said. 
\"But when we batted, the wicket was slightly soft, and they bowled in good areas, but we could have applied [ourselves] much better.\"Both England batsmen managed to see off the final session and the hosts concluded the opening day with all ten wickets intact, extending the lead to 42.(ANI)\n\"\"\"\n\n```\nget_response(context)\n```\n#### Output:\nTeam India wicketkeeper-batsman Rishabh Pant has said that Virat Kohli was \"upset\" after someone threw a ball on pacer Mohammed Siraj while he was fielding in the ongoing third Test against England. \"You can say whatever you want to chant, but don't throw things at the fielders and all those things. It's not good for cricket, I guess,\" Pant added.'\n\n#### [Inshort](https://www.inshorts.com/) (60 words News summary app, rated 4.4 by 5,27,246+ users on android playstore) summary:\nIndia wicketkeeper-batsman Rishabh Pant has revealed that captain Virat Kohli was upset with the crowd during the first day of Leeds Test against England because someone threw a ball at pacer Mohammed Siraj. Pant added, \"You can say whatever you want to chant, but don't throw things at the fielders and all those things. It is not good for cricket.\"\n\n\n> Created by [Arpit Rajauria](https://twitter.com/arpit_rajauria)\n[![Twitter icon](https://cdn0.iconfinder.com/data/icons/shift-logotypes/32/Twitter-32.png)](https://twitter.com/arpit_rajauria)\n"} {"downloads": 8065, "id": "sshleifer/distilbart-cnn-6-6", "likes": 16, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": "en", "tags": ["summarization"], "license": "apache-2.0", "datasets": ["cnn_dailymail", "xsum"], "thumbnail": "https://huggingface.co/front/thumbnails/distilbart_medium.png"}, "description": "\n\n### Usage\n\nThis checkpoint should be loaded into `BartForConditionalGeneration.from_pretrained`. See the [BART docs](https://huggingface.co/transformers/model_doc/bart.html?#transformers.BartForConditionalGeneration) for more information.\n\n### Metrics for DistilBART models\n\n| Model Name | MM Params | Inference Time (MS) | Speedup | Rouge 2 | Rouge-L |\n|:"} {"downloads": 3519, "id": "pszemraj/led-large-book-summary", "likes": 16, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": ["en"], "license": ["apache-2.0", "bsd-3-clause"], "tags": ["summarization", "led", "summary", "longformer", "booksum", "long-document", "long-form"], "datasets": ["kmfoda/booksum"], "metrics": ["rouge"], "widget": [{"text": "large earthquakes along a given fault segment do not occur at random intervals because it takes time to accumulate the strain energy for the rupture. The rates at which tectonic plates move and accumulate strain at their boundaries are approximately uniform. Therefore, in first approximation, one may expect that large ruptures of the same fault segment will occur at approximately constant time intervals. If subsequent main shocks have different amounts of slip across the fault, then the recurrence time may vary, and the basic idea of periodic mainshocks must be modified. For great plate boundary ruptures the length and slip often vary by a factor of 2. Along the southern segment of the San Andreas fault the recurrence interval is 145 years with variations of several decades. The smaller the standard deviation of the average recurrence interval, the more specific could be the long term prediction of a future mainshock.", "example_title": "earthquakes"}, {"text": " A typical feed-forward neural field algorithm. 
Spatiotemporal coordinates are fed into a neural network that predicts values in the reconstructed domain. Then, this domain is mapped to the sensor domain where sensor measurements are available as supervision. Class and Section Problems Addressed Generalization (Section 2) Inverse problems, ill-posed problems, editability; symmetries. Hybrid Representations (Section 3) Computation & memory efficiency, representation capacity, editability: Forward Maps (Section 4) Inverse problems Network Architecture (Section 5) Spectral bias, integration & derivatives. Manipulating Neural Fields (Section 6) Edit ability, constraints, regularization. Table 2: The five classes of techniques in the neural field toolbox each addresses problems that arise in learning, inference, and control. (Section 3). We can supervise reconstruction via differentiable forward maps that transform Or project our domain (e.g, 3D reconstruction via 2D images; Section 4) With appropriate network architecture choices, we can overcome neural network spectral biases (blurriness) and efficiently compute derivatives and integrals (Section 5). Finally, we can manipulate neural fields to add constraints and regularizations, and to achieve editable representations (Section 6). Collectively, these classes constitute a 'toolbox' of techniques to help solve problems with neural fields There are three components in a conditional neural field: (1) An encoder or inference function \u20ac that outputs the conditioning latent variable 2 given an observation 0 E(0) =2. 2 is typically a low-dimensional vector, and is often referred to aS a latent code Or feature code_ (2) A mapping function 4 between Z and neural field parameters O: Y(z) = O; (3) The neural field itself $. The encoder \u20ac finds the most probable z given the observations O: argmaxz P(2/0). The decoder maximizes the inverse conditional probability to find the most probable 0 given Z: arg- max P(Olz). We discuss different encoding schemes with different optimality guarantees (Section 2.1.1), both global and local conditioning (Section 2.1.2), and different mapping functions Y (Section 2.1.3) 2. Generalization Suppose we wish to estimate a plausible 3D surface shape given a partial or noisy point cloud. We need a suitable prior over the sur- face in its reconstruction domain to generalize to the partial observations. A neural network expresses a prior via the function space of its architecture and parameters 0, and generalization is influenced by the inductive bias of this function space (Section 5).", "example_title": "scientific paper"}, {"text": " the big variety of data coming from diverse sources is one of the key properties of the big data phenomenon. It is, therefore, beneficial to understand how data is generated in various environments and scenarios, before looking at what should be done with this data and how to design the best possible architecture to accomplish this The evolution of IT architectures, described in Chapter 2, means that the data is no longer processed by a few big monolith systems, but rather by a group of services In parallel to the processing layer, the underlying data storage has also changed and became more distributed This, in turn, required a significant paradigm shift as the traditional approach to transactions (ACID) could no longer be supported. 
On top of this, cloud computing is becoming a major approach with the benefits of reducing costs and providing on-demand scalability but at the same time introducing concerns about privacy, data ownership, etc In the meantime the Internet continues its exponential growth: Every day both structured and unstructured data is published and available for processing: To achieve competitive advantage companies have to relate their corporate resources to external services, e.g. financial markets, weather forecasts, social media, etc While several of the sites provide some sort of API to access the data in a more orderly fashion; countless sources require advanced web mining and Natural Language Processing (NLP) processing techniques: Advances in science push researchers to construct new instruments for observing the universe O conducting experiments to understand even better the laws of physics and other domains. Every year humans have at their disposal new telescopes, space probes, particle accelerators, etc These instruments generate huge streams of data, which need to be stored and analyzed. The constant drive for efficiency in the industry motivates the introduction of new automation techniques and process optimization: This could not be done without analyzing the precise data that describe these processes. As more and more human tasks are automated, machines provide rich data sets, which can be analyzed in real-time to drive efficiency to new levels. Finally, it is now evident that the growth of the Internet of Things is becoming a major source of data. More and more of the devices are equipped with significant computational power and can generate a continuous data stream from their sensors. In the subsequent sections of this chapter, we will look at the domains described above to see what they generate in terms of data sets. We will compare the volumes but will also look at what is characteristic and important from their respective points of view. 3.1 The Internet is undoubtedly the largest database ever created by humans. While several well described; cleaned, and structured data sets have been made available through this medium, most of the resources are of an ambiguous, unstructured, incomplete or even erroneous nature. Still, several examples in the areas such as opinion mining, social media analysis, e-governance, etc, clearly show the potential lying in these resources. Those who can successfully mine and interpret the Internet data can gain unique insight and competitive advantage in their business An important area of data analytics on the edge of corporate IT and the Internet is Web Analytics.", "example_title": "data science textbook"}, {"text": "Transformer-based models have shown to be very useful for many NLP tasks. However, a major limitation of transformers-based models is its O(n^2)O(n 2) time & memory complexity (where nn is sequence length). Hence, it's computationally very expensive to apply transformer-based models on long sequences n > 512n>512. Several recent papers, e.g. Longformer, Performer, Reformer, Clustered attention try to remedy this problem by approximating the full attention matrix. You can checkout \ud83e\udd17's recent blog post in case you are unfamiliar with these models.\nBigBird (introduced in paper) is one of such recent models to address this issue. BigBird relies on block sparse attention instead of normal attention (i.e. BERT's attention) and can handle sequences up to a length of 4096 at a much lower computational cost compared to BERT. 
It has achieved SOTA on various tasks involving very long sequences such as long documents summarization, question-answering with long contexts.\nBigBird RoBERTa-like model is now available in \ud83e\udd17Transformers. The goal of this post is to give the reader an in-depth understanding of big bird implementation & ease one's life in using BigBird with \ud83e\udd17Transformers. But, before going into more depth, it is important to remember that the BigBird's attention is an approximation of BERT's full attention and therefore does not strive to be better than BERT's full attention, but rather to be more efficient. It simply allows to apply transformer-based models to much longer sequences since BERT's quadratic memory requirement quickly becomes unbearable. Simply put, if we would have \u221e compute & \u221e time, BERT's attention would be preferred over block sparse attention (which we are going to discuss in this post).\nIf you wonder why we need more compute when working with longer sequences, this blog post is just right for you!\nSome of the main questions one might have when working with standard BERT-like attention include:\nDo all tokens really have to attend to all other tokens? Why not compute attention only over important tokens? How to decide what tokens are important? How to attend to just a few tokens in a very efficient way? In this blog post, we will try to answer those questions.\nWhat tokens should be attended to? We will give a practical example of how attention works by considering the sentence 'BigBird is now available in HuggingFace for extractive question answering'. In BERT-like attention, every word would simply attend to all other tokens.\nLet's think about a sensible choice of key tokens that a queried token actually only should attend to by writing some pseudo-code. Will will assume that the token available is queried and build a sensible list of key tokens to attend to.\n>>> # let's consider following sentence as an example >>> example = ['BigBird', 'is', 'now', 'available', 'in', 'HuggingFace', 'for', 'extractive', 'question', 'answering']\n>>> # further let's assume, we're trying to understand the representation of 'available' i.e. >>> query_token = 'available' >>> # We will initialize an empty `set` and fill up the tokens of our interest as we proceed in this section. >>> key_tokens = [] # => currently 'available' token doesn't have anything to attend Nearby tokens should be important because, in a sentence (sequence of words), the current word is highly dependent on neighboring past & future tokens. This intuition is the idea behind the concept of sliding attention.", "example_title": "bigbird blog intro"}, {"text": "The majority of available text summarization datasets include short-form source documents that lack long-range causal and temporal dependencies, and often contain strong layout and stylistic biases. While relevant, such datasets will offer limited challenges for future generations of text summarization systems. We address these issues by introducing BookSum, a collection of datasets for long-form narrative summarization. Our dataset covers source documents from the literature domain, such as novels, plays and stories, and includes highly abstractive, human written summaries on three levels of granularity of increasing difficulty: paragraph-, chapter-, and book-level. 
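For orientation, the corpus can be pulled from the Hub mirror cited in this card's metadata; a hedged sketch (the split and column names are assumptions to verify, not guarantees):

```python
from datasets import load_dataset

# Hedged sketch: load the BookSum mirror referenced in this card's metadata
# (kmfoda/booksum) and inspect its schema rather than assuming field names.
booksum = load_dataset("kmfoda/booksum", split="test")
print(booksum.column_names)  # expect source chapter text plus a human-written summary
print(booksum[0])
```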
The domain and structure of our dataset poses a unique set of challenges for summarization systems, which include: processing very long documents, non-trivial causal and temporal dependencies, and rich discourse structures. To facilitate future work, we trained and evaluated multiple extractive and abstractive summarization models as baselines for our dataset.", "example_title": "BookSum Abstract"}], "inference": {"parameters": {"max_length": 64, "min_length": 8, "no_repeat_ngram_size": 3, "early_stopping": true, "repetition_penalty": 3.5, "length_penalty": 0.3, "encoder_no_repeat_ngram_size": 3, "num_beams": 4}}, "model-index": [{"name": "pszemraj/led-large-book-summary", "results": [{"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "kmfoda/booksum", "type": "kmfoda/booksum", "config": "kmfoda--booksum", "split": "test"}, "metrics": [{"type": "rouge", "value": 31.7308, "name": "ROUGE-1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNjJmZjMxYTY0OGU3MzNjNmIzNmYyODNlNDg2ZGRhZDAzNTMwMDM5YWMxODc1OTc1ZWE3MzM2OTg1ODFhZDBkNCIsInZlcnNpb24iOjF9.B8BCKgySYVZW910_1zP0LfCpQYJbAe6loyWut76JlgZb2kV1_x9ybqtNESX0ka-lNqhYyXUNDpuS-7pTmsJVDg"}, {"type": "rouge", "value": 5.3311, "name": "ROUGE-2", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYzViMmY4ODFjYTc5ODk5MmRhMDQ3ZDRiYWQwMDg0OTk3ZTA4NDAxYTNiNDgyMmI4NDA3ZDMwYWViOTBkODBjNyIsInZlcnNpb24iOjF9.MOhJLDcgvv93mVFL1igIgIiTAH3b2Xa4gmBObq7RF44Mmu8Kxtd1KP7rOlDVFOrtrsooGPGsyE1GMCQ2kqeMDg"}, {"type": "rouge", "value": 16.1465, "name": "ROUGE-L", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNzNjMzEwMTliZGE3ZmQ4M2UxMDAyMTY3YzJjZmMyMDYyN2YyNDM0N2VhNzI1MDc1YTg4MTRjMmEzNjVkNTk1NCIsInZlcnNpb24iOjF9.XLJ-DVKiYLlbw5E5rWADKbzUzf5fNHhlTCWPCC5dU4NI9Yeh76aR7TPt36ZzLDwTBknnR8KHqlaF8F8YAvBUAg"}, {"type": "rouge", "value": 29.0883, "name": "ROUGE-LSUM", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMTcwNzEwMmE5NjQxZTkzYmQyZDZmNzllYzYyNGI5OTMyNWMwNjdiM2I2YmM5YjdmY2E5OWQ3OTk3ZDA1MTc3YyIsInZlcnNpb24iOjF9.d6rFxjCB6RJNI_pn2DNNSjuZe4rdvj0RatkaTJRp5lP0F_AFfU5Zn9zRWzZJV7V-xMauIc4UhfdoLp9r_-CABA"}, {"type": "loss", "value": 4.815707206726074, "name": "loss", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTMwMTgxMmJkODY3MjkzOWJhMzJhOTIxMWVkODhjZmM0MWUzMWQ1N2JkZjRhOTQxNmU1YWVjYzQ0MDNlZWI3OSIsInZlcnNpb24iOjF9.mkBQHYhYFfDV6F4klXGJ1dSsF-pbCs-6F9zcw6IYznwmXUjtk7m5J4Zt4JAju5LKz4YizvEcUCl_L0WddnfvDA"}, {"type": "gen_len", "value": 154.9036, "name": "gen_len", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMTc0ZmM1ZDM4MDE0MzY3MDM3OWJhNDkzZjJkZDdkMjU5M2JmMDJjYTIxODA1OTllNmY5ZWQzZDlmNWFiYzk4NiIsInZlcnNpb24iOjF9.VQ_O_xSTz870tnM08PJXQOwg9OsNNwI_HVX4S7AuW57_FzGGyRaWSuGE5SWzRS4Tur9YP0QxV4VV0Yoaoi3IAA"}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "samsum", "type": "samsum", "config": "samsum", "split": "test"}, "metrics": [{"type": "rouge", "value": 33.4484, "name": "ROUGE-1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTk4Yjg1YTc4YmY0MzBiZDU4ZjFhNzI4MjZkMWU1MzBlOWNlMjQ5ODMzY2YzYzRhYjJkMGUzNmI3ZjdkMzIzZSIsInZlcnNpb24iOjF9.AqS8A1OUiM0IZFBEGirv5F3Novk8lSUYSfPc3bYWLA6t-W7wgup3qA207eGbE5j9CkDWZ7QrSG1U6Z9A0sOqAA"}, {"type": "rouge", "value": 10.4249, "name": "ROUGE-2", "verified": true, "verifyToken": 
"eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiN2U4NjUyNTFmOGM5OTlhZDMyMTlmM2E4OWI2NGFiMDAyMGJjMzRjNWNlMGEyYWFmNTE5ZWMxM2I0ZGZmNWNmOCIsInZlcnNpb24iOjF9.SgJcHJ4qoRWXFvFiwv1PUutWktvsxQNynVPEv-GtBgxd6WI7o561ONyco5U-5tcyE_1SbSCJzz-L-R-q3cvoDA"}, {"type": "rouge", "value": 24.5802, "name": "ROUGE-L", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZmQ5MDI5MzdiNGE5NDM0MmU5OThmZTBkNjkxMzg5N2IxNGVlODdhZTZhNjg3NzFjYWEyMzA3MTQxNjMyMjRkOCIsInZlcnNpb24iOjF9.Bg5dHqCcJjmxa-xGWNR5lD9g3quX7lKkH0pjiTd2xE5WiPoLLN2c0mYa2GovdW7__WnYwhhHC7es03jmvyZbCw"}, {"type": "rouge", "value": 29.8226, "name": "ROUGE-LSUM", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNGFhOTEwNGM1MmZkNDk2ZjQ1Y2MyNjM3MGI5MGY3MWVkM2I0MjU2NWFiYmEwMjE4MTJlZWIwOGQ2MjQ3YjgzYSIsInZlcnNpb24iOjF9.W_aQKs10oXQdKEczJBGM3iiwJgb-VaXTpyA3sGof5WbhHf9vITAQA-xvynh5LgKtXQ1zjx737hnHgjEsu_Y0Cw"}, {"type": "loss", "value": 4.176078796386719, "name": "loss", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiN2JhODQ5YTZkNDZkZGYyNGU2MzkxMWU5MTEwMGM2YmVjZTA5YzI5NTMxMDNhYjhlOTAxMzFiMDYwYmM0MjEzZCIsInZlcnNpb24iOjF9.OvZrPBOR5jhkoTGBgsInkH7j3_xpacXHDoT7UIXEnyXzadfBO-O-K6fjalLNZw8wSkbjHIFcL_6S_qTTxPsNAQ"}, {"type": "gen_len", "value": 65.4005, "name": "gen_len", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiM2NhYjc3ZjQzNDEwYmMzOTM0ODkyZTJhZWNhNzZhYmEyZTYxMzA2YTYzMWFjOTA5ZjlhYWMzODg3NzY1ZTUwYSIsInZlcnNpb24iOjF9.vk9bgmtQFeRwdY3VXjtrJr_5wUCIeoAkI3kO0cHxhxmJo6RvUnyXiut72FuB-mlLZvqgiNkaZ-u_bh0Z3DjuCw"}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "billsum", "type": "billsum", "config": "default", "split": "test"}, "metrics": [{"type": "rouge", "value": 40.5843, "name": "ROUGE-1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTVjMDkyMWZjYTQ0NzgzNGUxZjNiMTg3NjU1MWJlNTQ2MWQ1NjE1MDk1OTU4ZjJiNGQ5ODg3Y2VlMWUyMzllNyIsInZlcnNpb24iOjF9.OhqBcVIuHk7fzmdrsWMvUe1bLeVMZVstZUoZpP7C1vR-3aIDl7r6eBmPrt5w-KcNq5p4teNPBsq7oKzbd5ZgDQ"}, {"type": "rouge", "value": 17.3401, "name": "ROUGE-2", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNGQxYmQzMmE0OTcyNTM5NmMwNjIxNzYxZDcwMDFkYzJkOWY4YWY3NTdhZGRhZDdlMDAxNzcwODQ5OGM3Mzc1MCIsInZlcnNpb24iOjF9.Pksn25EEqvmx757N7Swrd4yXc_xU7-AMN9yNe8lrbBa-l1LoI_2PUASvnjML4f705cfuyMAfb0FkFp5WfER2AA"}, {"type": "rouge", "value": 25.1256, "name": "ROUGE-L", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjhjYzI5MDBiMjk2NTY3MDNmZTdiOGYwMTRlYjIwZjAwMjdlNTAyYzdhYTJlODQ4MjYzYmQ3MjRlYTA2YzhhZSIsInZlcnNpb24iOjF9.1jPepsweS2bzIqDverQzzhmhFGch7gpoEGFGqQ8zW7K10aUKWFX8lt-uZAmTa1Z5ZhzyXGBzc3dReFPhWRRJBg"}, {"type": "rouge", "value": 34.6619, "name": "ROUGE-LSUM", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiM2VkZDIxNWJjOTA0NzFjOTIwOTdjYjc1M2EyNDVjZjY2ZjY3MjIxNDk3YTc5YWExNzAwN2FhOTc1NjVhYjBkYiIsInZlcnNpb24iOjF9.8opqHSUckPohoSF9jfPTpXDz2AtDwvdMqOdIXx2kE1tkOcbLPbOBfcc8RhRR98y8S26yC6EYFhFnf03CV2ejAQ"}, {"type": "loss", "value": 4.792657375335693, "name": "loss", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYTY5ZTRkMGU3OGVkODMzMDU5OWE1NTM5YjA4NDliZDlmNzc2NzZjNjFmNTA3M2EwY2NmN2E0MWJmZjQ5ZDliMiIsInZlcnNpb24iOjF9.KCKdk8xt2NWcMmYKV3-9eVEsFm9MqGllSMu9QCFJFIQlnyNXllHKdBLouoaGQz8IRYXvZKH8_TLDPIQx-31jAg"}, {"type": "gen_len", "value": 163.9394, "name": "gen_len", "verified": true, "verifyToken": 
"eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYzdkZDYyZGUzYmFkZmI2NjUwYmQ0MzZjMmIyZjI1YTFiMzM4OThiZjBiMzljOTVkZTgwMjA0NTE5OGM2YmFjMiIsInZlcnNpb24iOjF9.XyMZLUdkUIF32KTJMuv_bJswQCx_Tfg4Fx823cURUixSeoIKps8_a634AreZ3Z8kb7bfE_sFGh3rM9KWsMxlDw"}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "multi_news", "type": "multi_news", "config": "default", "split": "test"}, "metrics": [{"type": "rouge", "value": 39.0834, "name": "ROUGE-1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNjYzMmVlMDM4MTNkMTI4MjAyMTU2YTg1ZWQwNTI1MmJlNGUwZmE1NTRmYTljZTQwY2RlMjcxOTgyZGMyYTc0ZiIsInZlcnNpb24iOjF9.6yuSr7UmsFatwqQ-mEO4gmsEtWI05kGB5Ib2pnl05H1OiPT2uUwmqdUytUw8KTx9u1jv9q0cTF1cL-n2kPEJAA"}, {"type": "rouge", "value": 11.4043, "name": "ROUGE-2", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMWI5N2U2ZWI1ODM2MWUwOTIzYTAzNmRhNDA2OWEzZWRjMGEzMjBmY2EwN2YyYzU1NWE0YjIyZDE3MWE0MmMxZCIsInZlcnNpb24iOjF9.wonuxbBl25TzEaHUH_E816nHJ1OSXKfkaq7eJzbLpsfeGwcDklxUSxZxRO7VBiBMaY3Qttf9ywmEIPp40HnpBA"}, {"type": "rouge", "value": 19.1813, "name": "ROUGE-L", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZjU1NDZhN2NkMzZiZGJkODE4NDZiYjViOTZkNGMyNDlkNjBlZmFjYzU1N2IzMjFjYjY1MDU1Zjk2MzA0M2U4NyIsInZlcnNpb24iOjF9.bTCRzv3J9NiCh4aV23tAWGTvrdQCv_RS40zGwC4AJXtGS40cY7tJHYwBf9U9_rCetDBxqfjJpdaUbCAOglxLAA"}, {"type": "rouge", "value": 35.1581, "name": "ROUGE-LSUM", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMDNhNTUyZjE4NjYxYjIzYThmMDM2YWNhM2QwYzY1ODI2ZTE3NmNjMmVhOTAzZjZlOWQwYzc1NzU2NDNjNzIxMyIsInZlcnNpb24iOjF9.cWlSbEBgrMN5D-fV_yL9geNMyMkIItcVO3wehNJPzFi3E0v1-4q8pnX-UgjLzto8X7JLi6as2V_HtZE4-C-CDw"}, {"type": "loss", "value": 4.654905319213867, "name": "loss", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYTc5Nzk0ODhiNWUzNTAxNzk2YzZmMjU2NDliY2UzOTYyYTdmZGEyYjI5NDNhOTE0MGUxOTgxMGVjMmNhM2UyMSIsInZlcnNpb24iOjF9.eBBAebcl3AwkrjR6a8BvoSjDfpw8LWTRFjyIFHVzspvoOKVfnO8_NB_UeR_K127OwXyoZ70Z7X_aKJOe-2kTDA"}, {"type": "gen_len", "value": 186.2494, "name": "gen_len", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOWI2NjVlYjgwYWJiMjcyMDUzMzEwNDNjZTMxMDM0MjAzMzk1ZmIwY2Q1ZDQ2Y2M5NDBlMDEzYzFkNWEyNzJmNiIsInZlcnNpb24iOjF9.iZ1Iy7FuWL4GH7LS5EylVj5eZRC3L2ZsbYQapAkMNzR_VXPoMGvoM69Hp-kU7gW55tmz2V4Qxhvoz9cM8fciBA"}]}]}]}, "description": "\n\n# Longformer Encoder-Decoder (LED) for Narrative-Esque Long Text Summarization\n\n\n \"Open\n\n\nA fine-tuned version of [allenai/led-large-16384](https://huggingface.co/allenai/led-large-16384) on the `BookSum` dataset.\n\nGoal: a model that can generalize well and is useful in summarizing long text in academic and daily usage. The result works well on lots of text and can handle 16384 tokens/batch (_if you have the GPU memory to handle that_)\n\n - See the Colab demo linked above or try the [demo on Spaces](https://huggingface.co/spaces/pszemraj/summarize-long-text)\n\n\n> Note: the API is set to generate a max of 64 tokens for runtime reasons, so the summaries may be truncated (depending on the length of input text). 
For best results use python as below.\n\n"} {"downloads": 616, "id": "pszemraj/pegasus-x-large-book-summary", "likes": 15, "pipeline_tag": "summarization", "task": "summarization", "meta": {"license": ["apache-2.0", "bsd-3-clause"], "tags": ["summarization", "summary", "booksum", "long-document", "long-form"], "datasets": ["kmfoda/booksum"], "metrics": ["rouge"], "languages": "en", "widget": [{"text": "large earthquakes along a given fault segment do not occur at random intervals because it takes time to accumulate the strain energy for the rupture. The rates at which tectonic plates move and accumulate strain at their boundaries are approximately uniform. Therefore, in first approximation, one may expect that large ruptures of the same fault segment will occur at approximately constant time intervals. If subsequent main shocks have different amounts of slip across the fault, then the recurrence time may vary, and the basic idea of periodic mainshocks must be modified. For great plate boundary ruptures the length and slip often vary by a factor of 2. Along the southern segment of the San Andreas fault the recurrence interval is 145 years with variations of several decades. The smaller the standard deviation of the average recurrence interval, the more specific could be the long term prediction of a future mainshock.", "example_title": "earthquakes"}, {"text": " A typical feed-forward neural field algorithm. Spatiotemporal coordinates are fed into a neural network that predicts values in the reconstructed domain. Then, this domain is mapped to the sensor domain where sensor measurements are available as supervision. Class and Section Problems Addressed Generalization (Section 2) Inverse problems, ill-posed problems, editability; symmetries. Hybrid Representations (Section 3) Computation & memory efficiency, representation capacity, editability: Forward Maps (Section 4) Inverse problems Network Architecture (Section 5) Spectral bias, integration & derivatives. Manipulating Neural Fields (Section 6) Edit ability, constraints, regularization. Table 2: The five classes of techniques in the neural field toolbox each addresses problems that arise in learning, inference, and control. (Section 3). We can supervise reconstruction via differentiable forward maps that transform Or project our domain (e.g, 3D reconstruction via 2D images; Section 4) With appropriate network architecture choices, we can overcome neural network spectral biases (blurriness) and efficiently compute derivatives and integrals (Section 5). Finally, we can manipulate neural fields to add constraints and regularizations, and to achieve editable representations (Section 6). Collectively, these classes constitute a 'toolbox' of techniques to help solve problems with neural fields There are three components in a conditional neural field: (1) An encoder or inference function \u20ac that outputs the conditioning latent variable 2 given an observation 0 E(0) =2. 2 is typically a low-dimensional vector, and is often referred to aS a latent code Or feature code_ (2) A mapping function 4 between Z and neural field parameters O: Y(z) = O; (3) The neural field itself $. The encoder \u20ac finds the most probable z given the observations O: argmaxz P(2/0). The decoder maximizes the inverse conditional probability to find the most probable 0 given Z: arg- max P(Olz). 
We discuss different encoding schemes with different optimality guarantees (Section 2.1.1), both global and local conditioning (Section 2.1.2), and different mapping functions Y (Section 2.1.3) 2. Generalization Suppose we wish to estimate a plausible 3D surface shape given a partial or noisy point cloud. We need a suitable prior over the sur- face in its reconstruction domain to generalize to the partial observations. A neural network expresses a prior via the function space of its architecture and parameters 0, and generalization is influenced by the inductive bias of this function space (Section 5).", "example_title": "scientific paper"}, {"text": "Is a else or outside the cob and tree written being of early client rope and you have is for good reasons. On to the ocean in Orange for time. By's the aggregate we can bed it yet. Why this please pick up on a sort is do and also M Getoi's nerocos and do rain become you to let so is his brother is made in use and Mjulia's's the lay major is aging Masastup coin present sea only of Oosii rooms set to you We do er do we easy this private oliiishs lonthen might be okay. Good afternoon everybody. Welcome to this lecture of Computational Statistics. As you can see, I'm not socially my name is Michael Zelinger. I'm one of the task for this class and you might have already seen me in the first lecture where I made a quick appearance. I'm also going to give the tortillas in the last third of this course. So to give you a little bit about me, I'm a old student here with better Bulman and my research centres on casual inference applied to biomedical disasters, so that could be genomics or that could be hospital data. If any of you is interested in writing a bachelor thesis, a semester paper may be mastathesis about this topic feel for reach out to me. you have my name on models and my email address you can find in the directory I'd Be very happy to talk about it. you do not need to be sure about it, we can just have a chat. So with that said, let's get on with the lecture. There's an exciting topic today I'm going to start by sharing some slides with you and later on during the lecture we'll move to the paper. So bear with me for a few seconds. Well, the projector is starting up. Okay, so let's get started. Today's topic is a very important one. It's about a technique which really forms one of the fundamentals of data science, machine learning, and any sort of modern statistics. It's called cross validation. I know you really want to understand this topic I Want you to understand this and frankly, nobody's gonna leave Professor Mineshousen's class without understanding cross validation. So to set the stage for this, I Want to introduce you to the validation problem in computational statistics. So the problem is the following: You trained a model on available data. You fitted your model, but you know the training data you got could always have been different and some data from the environment. Maybe it's a random process. You do not really know what it is, but you know that somebody else who gets a different batch of data from the same environment they would get slightly different training data and you do not care that your method performs as well. On this training data. you want to to perform well on other data that you have not seen other data from the same environment. So in other words, the validation problem is you want to quantify the performance of your model on data that you have not seen. So how is this even possible? 
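The lecturer's eventual answer is the familiar held-out split; a tiny scikit-learn sketch of that idea (illustrative, not part of the transcript):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Held-out validation: hide part of the data from the model during fitting,
# then score the fitted model only on the hidden (unseen) part.
X, y = make_regression(n_samples=200, n_features=5, noise=0.5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)      # sees only the training split
print("validation R^2:", model.score(X_val, y_val))   # measured on data the model never saw
```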
How could you possibly measure the performance on data that you do not know The solution to? This is the following realization is that given that you have a bunch of data, you were in charge. You get to control how much that your model sees. It works in the following way: You can hide data firms model. Let's say you have a training data set which is a bunch of doubtless so X eyes are the features those are typically hide and national vector. It's got more than one dimension for sure. And the why why eyes. Those are the labels for supervised learning. As you've seen before, it's the same set up as we have in regression. And so you have this training data and now you choose that you only use some of those data to fit your model. You're not going to use everything, you only use some of it the other part you hide from your model. And then you can use this hidden data to do validation from the point of you of your model. This hidden data is complete by unseen. In other words, we solve our problem of validation.", "example_title": "transcribed audio - lecture"}, {"text": "Transformer-based models have shown to be very useful for many NLP tasks. However, a major limitation of transformers-based models is its O(n^2)O(n 2) time & memory complexity (where nn is sequence length). Hence, it's computationally very expensive to apply transformer-based models on long sequences n > 512n>512. Several recent papers, e.g. Longformer, Performer, Reformer, Clustered attention try to remedy this problem by approximating the full attention matrix. You can checkout \ud83e\udd17's recent blog post in case you are unfamiliar with these models.\nBigBird (introduced in paper) is one of such recent models to address this issue. BigBird relies on block sparse attention instead of normal attention (i.e. BERT's attention) and can handle sequences up to a length of 4096 at a much lower computational cost compared to BERT. It has achieved SOTA on various tasks involving very long sequences such as long documents summarization, question-answering with long contexts.\nBigBird RoBERTa-like model is now available in \ud83e\udd17Transformers. The goal of this post is to give the reader an in-depth understanding of big bird implementation & ease one's life in using BigBird with \ud83e\udd17Transformers. But, before going into more depth, it is important to remember that the BigBird's attention is an approximation of BERT's full attention and therefore does not strive to be better than BERT's full attention, but rather to be more efficient. It simply allows to apply transformer-based models to much longer sequences since BERT's quadratic memory requirement quickly becomes unbearable. Simply put, if we would have \u221e compute & \u221e time, BERT's attention would be preferred over block sparse attention (which we are going to discuss in this post).\nIf you wonder why we need more compute when working with longer sequences, this blog post is just right for you!\nSome of the main questions one might have when working with standard BERT-like attention include:\nDo all tokens really have to attend to all other tokens? Why not compute attention only over important tokens? How to decide what tokens are important? How to attend to just a few tokens in a very efficient way? In this blog post, we will try to answer those questions.\nWhat tokens should be attended to? We will give a practical example of how attention works by considering the sentence 'BigBird is now available in HuggingFace for extractive question answering'. 
In BERT-like attention, every word would simply attend to all other tokens.\nLet's think about a sensible choice of key tokens that a queried token actually only should attend to by writing some pseudo-code. Will will assume that the token available is queried and build a sensible list of key tokens to attend to.\n>>> # let's consider following sentence as an example >>> example = ['BigBird', 'is', 'now', 'available', 'in', 'HuggingFace', 'for', 'extractive', 'question', 'answering']\n>>> # further let's assume, we're trying to understand the representation of 'available' i.e. >>> query_token = 'available' >>> # We will initialize an empty `set` and fill up the tokens of our interest as we proceed in this section. >>> key_tokens = [] # => currently 'available' token doesn't have anything to attend Nearby tokens should be important because, in a sentence (sequence of words), the current word is highly dependent on neighboring past & future tokens. This intuition is the idea behind the concept of sliding attention.", "example_title": "bigbird blog intro"}, {"text": "To be fair, you have to have a very high IQ to understand Rick and Morty. The humour is extremely subtle, and without a solid grasp of theoretical physics most of the jokes will go over a typical viewer's head. There's also Rick's nihilistic outlook, which is deftly woven into his characterisation- his personal philosophy draws heavily from Narodnaya Volya literature, for instance. The fans understand this stuff; they have the intellectual capacity to truly appreciate the depths of these jokes, to realise that they're not just funny- they say something deep about LIFE. As a consequence people who dislike Rick & Morty truly ARE idiots- of course they wouldn't appreciate, for instance, the humour in Rick's existential catchphrase 'Wubba Lubba Dub Dub,' which itself is a cryptic reference to Turgenev's Russian epic Fathers and Sons. I'm smirking right now just imagining one of those addlepated simpletons scratching their heads in confusion as Dan Harmon's genius wit unfolds itself on their television screens. What fools.. how I pity them. \ud83d\ude02\nAnd yes, by the way, i DO have a Rick & Morty tattoo. And no, you cannot see it. It's for the ladies' eyes only- and even then they have to demonstrate that they're within 5 IQ points of my own (preferably lower) beforehand. 
Nothin personnel kid \ud83d\ude0e", "example_title": "Richard & Mortimer"}], "parameters": {"max_length": 48, "min_length": 2, "no_repeat_ngram_size": 3, "encoder_no_repeat_ngram_size": 3, "early_stopping": true, "length_penalty": 0.1, "num_beams": 2}, "model-index": [{"name": "pszemraj/pegasus-x-large-book-summary", "results": [{"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "samsum", "type": "samsum", "config": "samsum", "split": "test"}, "metrics": [{"type": "rouge", "value": 33.1401, "name": "ROUGE-1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYjQ1NjY1OGVjYWEwMzBjMzk3ZmMyZDA0ZTcxOTdmZTUxNTc0OGYxYmY3MzJkMzFmYTVjNzU2ZTk4MzE0NWMzMSIsInZlcnNpb24iOjF9.PSHB6DMF6tkwSw5nsFE57a2ApRAy_tkS6ziKA6PSTWddEdaqfca4pfig6_olmRmcS4KxN6HHcsmioHzv4LJQBw"}, {"type": "rouge", "value": 9.3095, "name": "ROUGE-2", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMzk3MTA3NmY1OGE3MzFjZTJhYWYzNGU4NTUzMTgwM2Y1NWZjMmEyNDNmNmEzYmQzZThjOGExMjc2ZjAyZjMzZCIsInZlcnNpb24iOjF9.tfgp8p-WlkVrfducTSg4zs-byeZMCmdZw1aizPQHXm_qRAwGtKcuVkZcmza5Y3o3VqsAEmGzg5HQD1vnZvWIDA"}, {"type": "rouge", "value": 24.8552, "name": "ROUGE-L", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOTVmMTIwNDQwNTI4MmI2MmY1ODc1Mjk0NGQ5ZWE4ZTYzOGNkMjY2ZmJhMjg2MTZlNTdhYTA2ZDAxNTFjMjA2MSIsInZlcnNpb24iOjF9.9HLgy9842oIDm6ABb3L94R1P4zAqTI0QN8aP62xzIyDxUXTbWw68PEDufYLiBJbTgZ8ElopZ9I7aou2zCgXeAA"}, {"type": "rouge", "value": 29.0391, "name": "ROUGE-LSUM", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMmNhYWJjYjdjMzMxMmE4ZTE4NGEzMDdmZDZjODI5ZWRjZWJmYTEyZGIzYWQ2NjM3YzQ4MjI4ZTM4MmU5MzRjZSIsInZlcnNpb24iOjF9.d2yoVdmxjVJnsgIYFiLuaBO5Krgw4Axl5yeOSTKrvHygrAxoqT1nl4anzQiyoR3PwYBXwBkwmgpJUfZ7RNXtDQ"}, {"type": "loss", "value": 2.288182497024536, "name": "loss", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYzM5NGIwODMxOTA3MTY3ODc2ZDczYTNmMTMwM2QyZmNlZjFmZDJjMGY3NWNkMDEyYzA4OTA2ZDRiODY3Zjg4OCIsInZlcnNpb24iOjF9.8k9mC050OS7mQSR9oA8liDRDQvEx1VxmTXGLmDYJVYYtTh2HYJFGP8Vy_krocFRIYDxh-IHPEOOSr5NrLMWHBA"}, {"type": "gen_len", "value": 45.2173, "name": "gen_len", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNWZhNzQ5OTQ5Yjg5YjhlOTZiZmJhZjZiODNmY2E2OTg4YTg4NWVhYzRkNzM2Mzk4NzdlMDgxM2M4NjY2YzhhYSIsInZlcnNpb24iOjF9.tDEEsPUclZDygAdGhNrBGrF24vR8ao08Nw7hmtUt5lmSZZZK_u-8rpz97QgVS6MCJdjFVnbYC4bkFnlQWI_FAA"}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "launch/gov_report", "type": "launch/gov_report", "config": "plain_text", "split": "test"}, "metrics": [{"type": "rouge", "value": 39.7279, "name": "ROUGE-1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOTAxODk3OTUwMTIzODU3NzU2YzAzZjE2NTM3MzBjNDA0ZWRmZGU3NWUzNTg1YThhNDQ1NjQ5ZmM3OWI2YzBhNSIsInZlcnNpb24iOjF9.vnNKucBNt2-nIyODj9P2HeaWPX5AQR8L-DL8QzrO7kj58-vZnjT6hsAGmepRNzdZ1TLF-3j2J2plcNJ8lUO8Dg"}, {"type": "rouge", "value": 10.8944, "name": "ROUGE-2", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNjYzMmIxOTJmZjkxOGI5N2U0NTRmMmQwOGJhMzMxYWIzMWMzYzUwMDEyMDdiZDQ2YTUzOWU0OTViMTI2YTAwYiIsInZlcnNpb24iOjF9.De0PaAikWqfWpoIXTCYP-mSFu3PUATLX08Qq74OHXM8784heFVDX1E1sXlh_QbbKJbuMuZtTKM4qr7oLUizOAw"}, {"type": "rouge", "value": 19.7018, "name": "ROUGE-L", "verified": true, "verifyToken": 
"eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYzI3MjQzOGQ3MGE3NDNkZTEyMWRkYjUyYTYzNDEwOWVjMGFmNTBiZjE4ZTBhMGYzMmI1Yzk0YjBmYmIzMWMxZSIsInZlcnNpb24iOjF9.FVikJ5Ma0gUgM-tpbomWXnC4jtmvhxqikPqCk84t4IbIdU0CIYGTQEONiz-VqI0fJeNrnTS6lxpBv7XxKoq3BQ"}, {"type": "rouge", "value": 36.5634, "name": "ROUGE-LSUM", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOTI2OTVmNDZiZWE5ZjNkODIwZjJiNTU2ZjJjYjczODUwM2JiNDEzYmE3N2U5YWM5NzJjOWEzMmYzZjdlYWJmYyIsInZlcnNpb24iOjF9.poR4zcqRvdaierfWFdTa53Cv6ZbNbnRwyRTi9HukHF5AWAQgc6zpBLkwOYFYoWjuSH83ohWeMM3MoIdw3zypBw"}, {"type": "loss", "value": 2.473011016845703, "name": "loss", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNDFmMjg3NWQ2YTMxMTc1OGZiYWYzNjg5NDY3MWE4MjY5ZDQxZDZhZGI1OTc5MzZkZGEzYmVlNWFiMzZjNDdhNCIsInZlcnNpb24iOjF9.05nKB3SmEfFKSduJqlleF4Fd2_IhwJS8eTOrnzZYCQQfLCfpJAZLhp3eLQCuBY4htd-FNrZftrThL66zVxyrCQ"}, {"type": "gen_len", "value": 212.8243, "name": "gen_len", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOGNjMTg4ZDZlZjAxZGNhN2M0NWI0ZTA0OWEzNDkzNDAzOTJhODA2MmVkODI4YjYzN2FiOTU1ZDMwM2VlNWMyYyIsInZlcnNpb24iOjF9.WYx6XJFKokY2heoN-jpAMp1Z1gsyJus3zpktQgNd0FOYJxOUqW40A0kkHtd15y4dUhsbccLpuJGY1fNJgHOiDw"}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "billsum", "type": "billsum", "config": "default", "split": "test"}, "metrics": [{"type": "rouge", "value": 42.1065, "name": "ROUGE-1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZDJhNDM2MWEwMjJlYjRmZTVkYzljODcwMzlmMGUxMDA4ZmRjNjM0NmY3ZWJlMmZjNGI3NDQ3NTQyOTQ3MjBkNSIsInZlcnNpb24iOjF9.l1MiZbXyFyXAcsfFChMrTvSaBhzBR6AuDnBuII8zY3Csz3ShWK0vo09MkQdZ1epe8PKWV9wwUBuJyKk3wL7MDw"}, {"type": "rouge", "value": 15.4079, "name": "ROUGE-2", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTY3NDBkYTVkNjdhY2I0ZmY0NTA4YzVkMGE5YWE5ODdjOGE1MDhkOTJhOWY3NmI2ZWI1MGU2MGI1NDRlYjI3MSIsInZlcnNpb24iOjF9.VN-5eK2SzFDCJnFTHHu7XCU_lynaxW_JEDc3llmcNo_ffDgRmISHHGaqV7fPFymBBMXpPly7XblO_sukyqj1Cg"}, {"type": "rouge", "value": 24.8814, "name": "ROUGE-L", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZDYyNGZmNDY3MTY4YzI4ZjZhODE0NGIyN2ZkOGEyYzM3MWZjM2QzZTg5ZjNmZmYzZDE5NzhiZDQ4OGM1YjNiMyIsInZlcnNpb24iOjF9.L73M1M5XdMQkf8zSdfLN0MUrxtO0r6UiLjoOkHfrIGbWNsNJ8tU5lciYFNIhJrICUL8LchCsFqR9LAClKS4bCg"}, {"type": "rouge", "value": 36.0375, "name": "ROUGE-LSUM", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMTBlMTQ5OTQxNTA3ZmFiMGYyZWQ0MGM0ODY2YWI3MzgyNjkwNzQyM2FmNGRjMzc3MjJmZDZkOWY4M2RhZTg2MSIsInZlcnNpb24iOjF9.IiMSSVahBgH8n34bGCC_DDGpujDXQbIvGhlcpVV2EBVQLLWUqcCy5WwBdbRrxPC-asBRCNERQxj8Uii4FvPsDQ"}, {"type": "loss", "value": 1.9130958318710327, "name": "loss", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTg2NTMxZDE3MDg3MDFkMTYxNjY1OTc5YjQ4ODcyMGUxMTFiZjJiNDgyYWZhN2NjZmE1MDQ1NTRmZGY0NjQzZSIsInZlcnNpb24iOjF9.kADUBMO8i6-oGDDt1cOiGMrGcMkF_Qc1jSpS2NSFyksDRusQa_YuuShefF4DuHVEr3CS0hNjjRH9_JBeX9ZQDg"}, {"type": "gen_len", "value": 179.2184, "name": "gen_len", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNjM4NGNiMTY3YzZjMzg4MTRiMDdiZDFiMzA1ZDIyMDM2MDk1OWRhYWQzN2UxZDNlODIxOWVhY2JlYjk4Mjk5YyIsInZlcnNpb24iOjF9.nU8ImMNWgjg9BKjUBJQLFaJOBq3kyIne8ldlpL0OV0e4888wOntIAcJP0dCCYfRSLVmZuXQ1M8cpDuTf50hNCw"}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "kmfoda/booksum", "type": 
"kmfoda/booksum", "config": "kmfoda--booksum", "split": "test"}, "metrics": [{"type": "rouge", "value": 35.2154, "name": "ROUGE-1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZWQ5MGMzNDc4MDBiNmRiNDY5ZDM4N2QzYTJlYTNiYTcwNDBlMzdlM2I4N2VmM2ZjMmQ3NGU3OTRlMTMzMTg3NyIsInZlcnNpb24iOjF9.E55gu7HvMwc4HejF3YOD6yqQJj7_6GCoCMWm78sY5_w2glR-oM98tu9IsG27VaPva7UklxsspzT2DIVaVKY0CQ"}, {"type": "rouge", "value": 6.8702, "name": "ROUGE-2", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZjFhN2JlYzlmMGZmYzkwYjBlNjY4YzhlYzNmMTdmZWYyYmU3NWI0ZTRkMTgxNmRiM2EyZWMyMWFjY2JkNzg1MCIsInZlcnNpb24iOjF9.I9BoHbGt8LLNtLAssIXm9tQ4lHqFCMt0zJS_zTezzxGRMS5On71c3jnlzrDtwEm6wjmZEwYIJK8qqJh-Qa5YAA"}, {"type": "rouge", "value": 17.6693, "name": "ROUGE-L", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOGZlZjcwOTZjMmNjZWFkM2M5Zjg1OTgzMzcxOTM2Y2RkMzY4NGU2NDE2MTVjMjcyMWIwNWI4ODc0YTY3YTA2MSIsInZlcnNpb24iOjF9.Ou1C6U6PrOtXPxlk9PMucdJ_vlnVnSk94QrLJL4b_g2pcY3D80Xrw09iz4BTOPzZ2UTNBLyn8YdLY3m2vHpiAQ"}, {"type": "rouge", "value": 32.8365, "name": "ROUGE-LSUM", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMmIzMGQ5MzQ1MjI4MTU0ZGZkZTRhODllNWQyOTQ4ZjA5YWE4ZTJjMzQ2ZWQzOGFiMWUzZDMxOTU5NzkxYjliZiIsInZlcnNpb24iOjF9.2mYURQZYo7e3AY0tfkpqFMNhoHvrysvBXza-XYYrX_xLpruMU9Gzrwc3jvpi2wtp4eeyhzIiZJvH0O6la6zxCg"}, {"type": "loss", "value": 2.9878039360046387, "name": "loss", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZGU0ODBmN2I3OGFkNTFiM2I3YWQyNmUzNzUwYzEwNzczZWEwZjIxYTAwZDE2ZTIwMGE3ZGNmMDQzNTFmNjEwYyIsInZlcnNpb24iOjF9.0IKWIImKTXqysQUb2IMPk2eeHlOcBjndiPcU42nfFBMhRTqeXdBqOCP6cidlho7pVN4hsC-77ArJ9pZlbTFuBg"}, {"type": "gen_len", "value": 200.6785, "name": "gen_len", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMDUzYTE3MmIxZGM3MWI1MjNhMTU3MTdkMjJjNjY5Y2UzYTdjYWRiY2I4MmUxMDY4NTA5NWZjYWU0NzliODdkYiIsInZlcnNpb24iOjF9.BqmCaWzbCMNUied6zNO744Dl-0LC47FCIv-l8kDjkhSkwQcb_hi93VYts5PTsrFY_MmM8j7AsY1PiFr6nNFMBQ"}]}, {"task": {"type": "summarization", "name": "Summarization"}, "dataset": {"name": "big_patent", "type": "big_patent", "config": "y", "split": "test"}, "metrics": [{"type": "rouge", "value": 37.376, "name": "ROUGE-1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMWI4ZjMxODcxMThiMzE3NjQ3Zjg0NzhmZjlhY2ZmYjQwMGY5ZjlkZGY1MzZmY2M5YTU4NmY1Y2NhZDA3YWFkOCIsInZlcnNpb24iOjF9.sYh4IynXgOpVetYYSWUp0v5QZWvXC1x7_uJR0LZUxaeYKEc4yfICNmDOPzNzoroaV4ELeOaPjHQpYVm-lpAHBA"}, {"type": "rouge", "value": 11.4432, "name": "ROUGE-2", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZTZkOGIyYzU3YTQ5ZTFmMDU3MjQ5ZWM2NGQ1MzgwMDYyZDkxN2Q2YjgyZTkzMTEyYjczMGJiYmNkZmU5MTQ3NSIsInZlcnNpb24iOjF9.Qk38acpjPjU64Z1nXEuqMXjKZrGvdC9oY586EjuCPeEAJCSzKimp8FsB-1QrjMH73q6rN2CdumJUxih6HF-KAA"}, {"type": "rouge", "value": 22.2754, "name": "ROUGE-L", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNzlmOTUxYmEzYzYyYmVjNGZlNzNiZWIwZmQ5OWVlY2U3NTBiZDExYWUwODQ0Y2ZjMmQyMTNmMTlmNjdmZWUwNCIsInZlcnNpb24iOjF9.bUVhxaepySyaityby71j6h4YO_l4x8OSeZoblagwUMYGXRc0Ej286QzEtZFeRGygMJ5sjUN_loWCtOmAnHY2BA"}, {"type": "rouge", "value": 32.5087, "name": "ROUGE-LSUM", "verified": true, "verifyToken": 
"eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNDEyNjM5NjAzYTNjN2MwZTY4MWY2Y2U5YWUyM2Y1YjAyNjBhZTM0YTAyZjM5N2M1ZDkxOWUxNzE2OWZkYTBmMSIsInZlcnNpb24iOjF9.QfMHkcoAR3xqzsgL1xjHk3Lui1xhE12pJKvYujQ_h5o6PBXT79dsENsrqDGGBjiKdTKNwWqADgaviy1VrWMDCQ"}, {"type": "loss", "value": 2.9867310523986816, "name": "loss", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZTUzM2Q5MmE5MzU4YmFlMjFiMmUzZGU2NDAzMTQ1Y2NjZDVlYWI3NGE5MjM0NmMxMjdiOWI3MTU0NDk3NmNkZiIsInZlcnNpb24iOjF9.VoQqu6ZU3AR_cji82UkpvbLnTmZ17fZmR2E4DeonjCyTZpyyfvUsQ2nbKDovQf34DBkYXENk42EUsUF1mBZNBg"}, {"type": "gen_len", "value": 172.7776, "name": "gen_len", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTEzNTMyMDY1N2Q5ZTMxNjNlMTI0Nzk5ZDc1ZWQ5Y2IwZWM0NWNhNWY2MTk3YTRkYzUwMTI4NjZiOWVhOGQwYSIsInZlcnNpb24iOjF9.-Rek2VFmGqIEgqeFoxU_0aCWdFbGYi9BV5c7x-izm9_4vtZdYQ4ITXm4T8C3UlpOax60veJQt2Uax5vyiFc9Ag"}]}]}]}, "description": "\n\n# pszemraj/pegasus-x-large-book-summary\n\n\n\n \"Open\n\n\nGet SparkNotes-esque summaries of arbitrary text! Due to the model size, it's recommended to try it out in Colab (linked above) as the API textbox may time out.\n\nThis model is a fine-tuned version of [google/pegasus-x-large](https://huggingface.co/google/pegasus-x-large) on the `kmfoda/booksum` dataset for approx eight epochs.\n\n## Model description\n\nMore information needed\n\n## Intended uses & limitations\n\n- This seems to be the GPU-hungriest summarization model yet.\n\n## Training and evaluation data\n\nMore information needed\n\n## Training procedure\n\n### Training hyperparameters\n\n#### Epochs 1-4\n\nTODO\n\n#### Epochs 5 & 6\nThe following hyperparameters were used during training:\n\n- learning_rate: 6e-05\n- train_batch_size: 4\n- eval_batch_size: 1\n- seed: 42\n- distributed_type: multi-GPU\n- gradient_accumulation_steps: 32\n- total_train_batch_size: 128\n- optimizer: _ADAN_ using lucidrains' `adan-pytorch` with default betas\n- lr_scheduler_type: constant_with_warmup\n- data type: TF32\n- num_epochs: 2\n\n#### Epochs 7 & 8\n\n- epochs 5 & 6 were trained with 12288 tokens input\n- this fixes that with 2 epochs at 16384 tokens input\n\nThe following hyperparameters were used during training:\n- learning_rate: 0.0004\n- train_batch_size: 4\n- eval_batch_size: 1\n- seed: 42\n- distributed_type: multi-GPU\n- gradient_accumulation_steps: 16\n- total_train_batch_size: 64\n- optimizer: _ADAN_ using lucidrains' `adan-pytorch` with default betas\n- lr_scheduler_type: cosine\n- lr_scheduler_warmup_ratio: 0.03\n- num_epochs: 2\n\n### Framework versions\n\n- Transformers 4.22.0\n- Pytorch 1.11.0a0+17540c5\n- Datasets 2.4.0\n- Tokenizers 0.12.1\n"} {"downloads": 45, "id": "ml6team/distilbart-tos-summarizer-tosdr", "likes": 14, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": ["en"], "tags": ["summarization", "t&c", "tos", "distilbart", "distilbart-6-6"], "datasets": ["tosdr"], "metrics": ["rouge1", "rouge2", "rougel"], "inference": {"parameters": {"min_length": 5, "max_length": 512, "do_sample": false}}, "widget": [{"text": "In addition, certain portions of the Web Site may be subject to additional terms of use that we make available for your review or otherwise link to that portion of the Web Site to which such additional terms apply. By using such portions, or any part thereof, you agree to be bound by the additional terms of use applicable to such portions. 
Age Restrictions The Web Site may be accessed and used only by individuals who can form legally binding contracts under applicable laws, who are at least 18 years of age or the age of majority in their state or territory of residence (if higher than 18), and who are not barred from using the Web Site under applicable laws. Our Technology may not be copied, modified, reproduced, republished, posted, transmitted, sold, offered for sale, or redistributed in any way without our prior written permission and the prior written permission of our applicable licensors. Nothing in these Site Terms of Use grants you any right to receive delivery of a copy of Our Technology or to obtain access to Our Technology except as generally and ordinarily permitted through the Web Site according to these Site Terms of Use. Furthermore, nothing in these Site Terms of Use will be deemed to grant you, by implication, estoppel or otherwise, a license to Our Technology. Certain of the names, logos, and other materials displayed via the Web site constitute trademarks, tradenames, service marks or logos (\u201cMarks\u201d) of us or other entities. You are not authorized to use any such Marks. Ownership of all such Marks and the goodwill associated therewith remains with us or those other entities. Any use of third party software provided in connection with the Web Site will be governed by such third parties\u2019 licenses and not by these Site Terms of Use. Information on this Web Site may contain technical inaccuracies or typographical errors. Lenovo provides no assurances that any reported problems may be resolved with the use of any information that Lenovo provides."}]}, "description": "\n\n# T&C Summarization Model \n\nT&C Summarization Model based on [sshleifer/distilbart-cnn-6-6](https://huggingface.co/sshleifer/distilbart-cnn-6-6), \n\nThis abstractive summarization model is a part of a bigger end-to-end T&C summarizer pipeline \nwhich is preceded by LSA (Latent Semantic Analysis) extractive summarization. The extractive \nsummarization shortens the T&C to be further summarized by this model.\n\n## Finetuning Corpus\n\nWe collaborated with [TOSDR](https://tosdr.org/) to work with their data, and the model is finetuned accordingly. The article and \nsummarization text is reduced via extractive summarization before it is finetuned to the model.\n\n## Contact Us\n\nhttps://ml6.eu/ . 
\n\nThis abstractive model finetuning is the continuation of the Christmas Project 2021 done in ML6: https://bit.ly/XmasProjects .\n\n## Load Finetuned Model\n\n```\nfrom transformers import AutoTokenizer, AutoModelForSeq2SeqLM\n\ntokenizer = AutoTokenizer.from_pretrained(\"ml6team/distilbart-tos-summarizer-tosdr\")\n\nmodel = AutoModelForSeq2SeqLM.from_pretrained(\"ml6team/distilbart-tos-summarizer-tosdr\")\n```\n\n## Code Sample\n\nThis sample requires [sumy](https://pypi.org/project/sumy/), the LSA Extractive Summarization library, as additional package to \nrun.\n\n```\nimport re\nimport nltk\nnltk.download('punkt')\nfrom sumy.parsers.plaintext import PlaintextParser\nfrom sumy.nlp.tokenizers import Tokenizer\nfrom sumy.nlp.stemmers import Stemmer\nfrom sumy.summarizers.lsa import LsaSummarizer\nfrom transformers import AutoTokenizer, AutoModelForSeq2SeqLM\n\nLANGUAGE = \"english\"\nEXTRACTED_ARTICLE_SENTENCES_LEN = 12\n\nstemmer = Stemmer(LANGUAGE)\nlsa_summarizer = LsaSummarizer(stemmer)\ntokenizer = AutoTokenizer.from_pretrained(\"ml6team/distilbart-tos-summarizer-tosdr\")\nmodel = AutoModelForSeq2SeqLM.from_pretrained(\"ml6team/distilbart-tos-summarizer-tosdr\")\n\ndef get_extractive_summary(text, sentences_count):\n parser = PlaintextParser.from_string(text, Tokenizer(LANGUAGE))\n summarized_info = lsa_summarizer(parser.document, sentences_count)\n summarized_info = [element._text for element in summarized_info]\n return ' '.join(summarized_info)\n\ndef get_summary(dict_summarizer_model, dict_tokenizer, text_content):\n text_content = get_extractive_summary(text_content, EXTRACTED_ARTICLE_SENTENCES_LEN)\n tokenizer = dict_tokenizer['tokenizer']\n model = dict_summarizer_model['model']\n\n inputs = tokenizer(text_content, max_length=dict_tokenizer['max_length'], truncation=True, return_tensors=\"pt\")\n outputs = model.generate(\n inputs[\"input_ids\"], max_length=dict_summarizer_model['max_length'], min_length=dict_summarizer_model['min_length'], \n )\n\n summarized_text = tokenizer.decode(outputs[0])\n match = re.search(r\"(.*)\", summarized_text)\n if match is not None: summarized_text = match.group(1)\n\n return summarized_text.replace('', '').replace('', '') \n \ntest_tos = \"\"\"\n In addition, certain portions of the Web Site may be subject to additional terms of use that we make available for your review or otherwise link to that portion of the Web Site to which such additional terms apply. By using such portions, or any part thereof, you agree to be bound by the additional terms of use applicable to such portions. \n Age Restrictions The Web Site may be accessed and used only by individuals who can form legally binding contracts under applicable laws, who are at least 18 years of age or the age of majority in their state or territory of residence (if higher than 18), and who are not barred from using the Web Site under applicable laws. \n Our Technology may not be copied, modified, reproduced, republished, posted, transmitted, sold, offered for sale, or redistributed in any way without our prior written permission and the prior written permission of our applicable licensors. Nothing in these Site Terms of Use grants you any right to receive delivery of a copy of Our Technology or to obtain access to Our Technology except as generally and ordinarily permitted through the Web Site according to these Site Terms of Use. \n Furthermore, nothing in these Site Terms of Use will be deemed to grant you, by implication, estoppel or otherwise, a license to Our Technology. 
Certain of the names, logos, and other materials displayed via the Web site constitute trademarks, tradenames, service marks or logos (\u201cMarks\u201d) of us or other entities. You are not authorized to use any such Marks. Ownership of all such Marks and the goodwill associated therewith remains with us or those other entities. \n Any use of third party software provided in connection with the Web Site will be governed by such third parties\u2019 licenses and not by these Site Terms of Use. Information on this Web Site may contain technical inaccuracies or typographical errors. Lenovo provides no assurances that any reported problems may be resolved with the use of any information that Lenovo provides\n\"\"\"\n\nmodel_dict = {\n 'model': model, \n 'max_length': 512,\n 'min_length': 4\n}\n\ntokenizer_dict = {\n 'tokenizer': tokenizer, \n 'max_length': 1024\n}\n\nprint(get_summary(model_dict, tokenizer_dict, test_tos))\n```\n"} {"downloads": 1976, "id": "IDEA-CCNL/Randeng-Pegasus-523M-Summary-Chinese", "likes": 13, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": "zh", "tags": ["summarization"], "inference": false}, "description": "\n\n# Randeng-Pegasus-523M-Summary-Chinese\n\n- Github: [Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM/blob/main/fengshen/examples/summary/randeng_pegasus_523M_summary.sh)\n- Docs: [Fengshenbang-Docs](https://fengshenbang-doc.readthedocs.io/zh/latest/docs/%E7%87%83%E7%81%AF%E7%B3%BB%E5%88%97/Randeng-Pegasus-523M-Summary-Chinese.html)\n\n## \u7b80\u4ecb Brief Introduction\n\n\u5584\u4e8e\u5904\u7406\u6458\u8981\u4efb\u52a1\uff0c\u5728\u6570\u4e2a\u4e2d\u6587\u6458\u8981\u6570\u636e\u96c6\u4e0a\u5fae\u8c03\u540e\u7684\uff0c\u4e2d\u6587\u7248\u7684PAGASUS-large\u3002\n\nGood at solving text summarization tasks, after fine-tuning on multiple Chinese text summarization datasets, Chinese PAGASUS-large.\n\n## \u6a21\u578b\u5206\u7c7b Model Taxonomy\n\n| \u9700\u6c42 Demand | \u4efb\u52a1 Task | \u7cfb\u5217 Series | \u6a21\u578b Model | \u53c2\u6570 Parameter | \u989d\u5916 Extra |\n| :"} {"downloads": 449, "id": "slauw87/bart_summarisation", "likes": 13, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": "en", "tags": ["sagemaker", "bart", "summarization"], "license": "apache-2.0", "datasets": ["samsum"], "model-index": [{"name": "bart-large-cnn-samsum", "results": [{"task": {"name": "Abstractive Text Summarization", "type": "abstractive-text-summarization"}, "dataset": {"name": "SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization", "type": "samsum"}, "metrics": [{"name": "Validation ROGUE-1", "type": "rogue-1", "value": 43.2111}, {"name": "Validation ROGUE-2", "type": "rogue-2", "value": 22.3519}, {"name": "Validation ROGUE-L", "type": "rogue-l", "value": 33.315}, {"name": "Test ROGUE-1", "type": "rogue-1", "value": 41.8283}, {"name": "Test ROGUE-2", "type": "rogue-2", "value": 20.9857}, {"name": "Test ROGUE-L", "type": "rogue-l", "value": 32.3602}]}]}], "widget": [{"text": "Sugi: I am tired of everything in my life. \nTommy: What? How happy you life is! I do envy you.\nSugi: You don't know that I have been over-protected by my mother these years. I am really about to leave the family and spread my wings.\nTommy: Maybe you are right. 
\n"}]}, "description": "\n## `bart-large-cnn-samsum`\nThis model was trained using Amazon SageMaker and the new Hugging Face Deep Learning container.\nFor more information look at:\n- [\ud83e\udd17 Transformers Documentation: Amazon SageMaker](https://huggingface.co/transformers/sagemaker.html)\n- [Example Notebooks](https://github.com/huggingface/notebooks/tree/master/sagemaker)\n- [Amazon SageMaker documentation for Hugging Face](https://docs.aws.amazon.com/sagemaker/latest/dg/hugging-face.html)\n- [Python SDK SageMaker documentation for Hugging Face](https://sagemaker.readthedocs.io/en/stable/frameworks/huggingface/index.html)\n- [Deep Learning Container](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#huggingface-training-containers)\n## Hyperparameters\n {\n \"dataset_name\": \"samsum\",\n \"do_eval\": true,\n \"do_predict\": true,\n \"do_train\": true,\n \"fp16\": true,\n \"learning_rate\": 5e-05,\n \"model_name_or_path\": \"facebook/bart-large-cnn\",\n \"num_train_epochs\": 3,\n \"output_dir\": \"/opt/ml/model\",\n \"per_device_eval_batch_size\": 4,\n \"per_device_train_batch_size\": 4,\n \"predict_with_generate\": true,\n \"seed\": 7\n}\n## Usage\n from transformers import pipeline\n summarizer = pipeline(\"summarization\", model=\"slauw87/bart-large-cnn-samsum\")\n conversation = '''Sugi: I am tired of everything in my life. \n Tommy: What? How happy you life is! I do envy you.\n Sugi: You don't know that I have been over-protected by my mother these years. I am really about to leave the family and spread my wings.\n Tommy: Maybe you are right. \n '''\n nlp(conversation)\n## Results\n| key | value |\n| "} {"downloads": 67375, "id": "plguillou/t5-base-fr-sum-cnndm", "likes": 10, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": "fr", "tags": ["pytorch", "t5", "seq2seq", "summarization"], "datasets": "cnn_dailymail", "widget": [{"text": "Apollo 11 est une mission du programme spatial am\u00e9ricain Apollo au cours de laquelle, pour la premi\u00e8re fois, des hommes se sont pos\u00e9s sur la Lune, le lundi 21 juillet 1969. L'agence spatiale am\u00e9ricaine, la NASA, remplit ainsi l'objectif fix\u00e9 par le pr\u00e9sident John F. Kennedy en 1961 de poser un \u00e9quipage sur la Lune avant la fin de la d\u00e9cennie 1960. Il s'agissait de d\u00e9montrer la sup\u00e9riorit\u00e9 des \u00c9tats-Unis sur l'Union sovi\u00e9tique qui avait \u00e9t\u00e9 mise \u00e0 mal par les succ\u00e8s sovi\u00e9tiques au d\u00e9but de l'\u00e8re spatiale dans le contexte de la guerre froide qui oppose alors ces deux pays. Ce d\u00e9fi est lanc\u00e9 alors que la NASA n'a pas encore plac\u00e9 en orbite un seul astronaute. Gr\u00e2ce \u00e0 une mobilisation de moyens humains et financiers consid\u00e9rables, l'agence spatiale rattrape puis d\u00e9passe le programme spatial sovi\u00e9tique.", "example_title": "Apollo 11"}]}, "description": "\n\n# French T5 Abstractive Text Summarization\n\n~~Version 1.0 (I will keep improving the model's performances.)~~\n\nVersion 2.0 is here! 
(with improved performances of course)\n\nI trained the model on 13x more data than v1.\n\nROUGE-1: 44.5252\n\nROUGE-2: 22.652\n\nROUGE-L: 29.8866\n\n## Model description\n\nThis model is a T5 Transformers model (JDBN/t5-base-fr-qg-fquad) that was fine-tuned in french for abstractive text summarization.\n\n## How to use\n\n```python\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\ntokenizer = T5Tokenizer.from_pretrained(\"plguillou/t5-base-fr-sum-cnndm\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"plguillou/t5-base-fr-sum-cnndm\")\n```\n\nTo summarize an ARTICLE, just modify the string like this : \"summarize: ARTICLE\".\n\n## Training data\n\nThe base model I used is JDBN/t5-base-fr-qg-fquad (it can perform question generation, question answering and answer extraction).\n\nI used the \"t5-base\" model from the transformers library to translate in french the CNN / Daily Mail summarization dataset.\n\n"} {"downloads": 1359, "id": "linydub/bart-large-samsum", "likes": 10, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": ["en"], "license": "apache-2.0", "tags": ["summarization", "azureml", "azure", "codecarbon", "bart"], "datasets": ["samsum"], "metrics": ["rouge"], "model-index": [{"name": "bart-large-samsum", "results": [{"task": {"name": "Abstractive Text Summarization", "type": "abstractive-text-summarization"}, "dataset": {"name": "SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization", "type": "samsum"}, "metrics": [{"name": "Validation ROGUE-1", "type": "rouge-1", "value": 55.0234}, {"name": "Validation ROGUE-2", "type": "rouge-2", "value": 29.6005}, {"name": "Validation ROGUE-L", "type": "rouge-L", "value": 44.914}, {"name": "Validation ROGUE-Lsum", "type": "rouge-Lsum", "value": 50.464}, {"name": "Test ROGUE-1", "type": "rouge-1", "value": 53.4345}, {"name": "Test ROGUE-2", "type": "rouge-2", "value": 28.7445}, {"name": "Test ROGUE-L", "type": "rouge-L", "value": 44.1848}, {"name": "Test ROGUE-Lsum", "type": "rouge-Lsum", "value": 49.1874}]}]}], "widget": [{"text": "Henry: Hey, is Nate coming over to watch the movie tonight?\nKevin: Yea, he said he'll be arriving a bit later at around 7 since he gets off of work at 6. Have you taken out the garbage yet?\nHenry: Oh I forgot. I'll do that once I'm finished with my assignment for my math class.\nKevin: Yea, you should take it out as soon as possible. And also, Nate is bringing his girlfriend.\nHenry: Nice, I'm really looking forward to seeing them again.\n"}]}, "description": "\n\n## `bart-large-samsum`\nThis model was trained using Microsoft's [`Azure Machine Learning Service`](https://azure.microsoft.com/en-us/services/machine-learning). It was fine-tuned on the [`samsum`](https://huggingface.co/datasets/samsum) corpus from [`facebook/bart-large`](https://huggingface.co/facebook/bart-large) checkpoint.\n\n## Usage (Inference)\n```python\nfrom transformers import pipeline\nsummarizer = pipeline(\"summarization\", model=\"linydub/bart-large-samsum\")\n\ninput_text = '''\n Henry: Hey, is Nate coming over to watch the movie tonight?\n Kevin: Yea, he said he'll be arriving a bit later at around 7 since he gets off of work at 6. Have you taken out the garbage yet?\n Henry: Oh I forgot. I'll do that once I'm finished with my assignment for my math class.\n Kevin: Yea, you should take it out as soon as possible. 
And also, Nate is bringing his girlfriend.\n Henry: Nice, I'm really looking forward to seeing them again.\n'''\nsummarizer(input_text)\n```\n\n## Fine-tune on AzureML\n[![Deploy to Azure](https://aka.ms/deploytoazurebutton)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2Flinydub%2Fazureml-greenai-txtsum%2Fmain%2F.cloud%2Ftemplate-hub%2Flinydub%2Farm-bart-large-samsum.json) [![Visualize](https://raw.githubusercontent.com/Azure/azure-quickstart-templates/master/1-CONTRIBUTION-GUIDE/images/visualizebutton.svg?sanitize=true)](http://armviz.io/#/?load=https://raw.githubusercontent.com/linydub/azureml-greenai-txtsum/main/.cloud/template-hub/linydub/arm-bart-large-samsum.json)\n\nMore information about the fine-tuning process (including samples and benchmarks): \n**[Preview]** https://github.com/linydub/azureml-greenai-txtsum\n\n## Resource Usage\nThese results were retrieved from [`Azure Monitor Metrics`](https://docs.microsoft.com/en-us/azure/azure-monitor/essentials/data-platform-metrics). All experiments were ran on AzureML low priority compute clusters.\n\n| Key | Value |\n| "} {"downloads": 924, "id": "jordiclive/flan-t5-3b-summarizer", "likes": 10, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": ["en"], "license": ["apache-2.0", "bsd-3-clause"], "tags": ["summarization", "extractive", "summary", "abstractive", "multi-task", "document summary"], "datasets": ["jordiclive/scored_summarization_datasets"], "metrics": ["rouge"]}, "description": "\n\n# Multi-purpose Summarizer (Fine-tuned 3B google/flan-t5-xl on several Summarization datasets)\n\n \n \"Open\n\n\nA fine-tuned version of [google/flan-t5-xl](https://huggingface.co/google/flan-t5-xl) on various summarization datasets (xsum, wikihow, cnn_dailymail/3.0.0, samsum, scitldr/AIC, billsum, TLDR)\n\nGoal: a model that can be used for a general-purpose summarizer for academic and general usage. Control over the type of summary can be given by varying the instruction prepended to the source document. The result works well on lots of text, although trained with a max source length of 512 tokens and 150 max summary length. \n\n"} {"downloads": 416, "id": "jordiclive/flan-t5-11b-summarizer-filtered", "likes": 10, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": ["en"], "license": ["apache-2.0", "bsd-3-clause"], "tags": ["summarization", "extractive", "summary", "abstractive", "multi-task", "document summary"], "datasets": ["jordiclive/scored_summarization_datasets", "jordiclive/wikipedia-summary-dataset"], "metrics": ["rouge"]}, "description": "\n\n# Multi-purpose Summarizer (Fine-tuned 11B google/flan-t5-xxl on several Summarization datasets)\n\n \n \"Open\n\n\nA fine-tuned version of [google/flan-t5-xxl](https://huggingface.co/google/flan-t5-xxl) on various summarization datasets (xsum, wikihow, cnn_dailymail/3.0.0, samsum, scitldr/AIC, billsum, TLDR, wikipedia-summary)\n\n70% of the data was also filtered with the use of the [contriever](https://github.com/facebookresearch/contriever) with a cosine similarity between text and summary of 0.6 as threshold.\n\nGoal: a model that can be used for a general-purpose summarizer for academic and general usage. Control over the type of summary can be given by varying the instruction prepended to the source document. The result works well on lots of text, although trained with a max source length of 512 tokens and 150 max summary length. 
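A hedged sketch of that instruction control (the exact instruction phrasing is an illustration, not pinned by the card; truncation matches the 512-token source length the model was trained with):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Illustrative sketch: vary the instruction prepended to the source document
# to control the type of summary produced.
model_id = "jordiclive/flan-t5-11b-summarizer-filtered"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

instruction = "Produce a short summary of the following article:"  # assumed phrasing
article = "Stocks rallied on Tuesday as ..."
inputs = tokenizer(
    instruction + "\n" + article,
    return_tensors="pt",
    truncation=True,
    max_length=512,   # matches the training source length noted above
)
ids = model.generate(**inputs, num_beams=4, max_new_tokens=150)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```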
\n\n"} {"downloads": 282, "id": "phpaiola/ptt5-base-summ-xlsum", "likes": 10, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": "pt", "license": "mit", "tags": ["t5", "pytorch", "pt", "pt-br", "summarization", "abstractive summarization"], "datasets": ["csebuetnlp/xlsum"], "inference": {"parameters": {"min_length": 32, "max_length": 256, "top_k": 5}}, "widget": [{"text": "O homem, Wilmer Antonio Marin, conhecido como Hugo, seria um alto comandante das For\u00e7as Armadas Revolucion\u00e1rias da Col\u00f4mbia (Farc), o maior grupo rebelde do pa\u00eds. Ele \u00e9 acusado de ter perpetrado um ataque num clube noturno em fevereiro que matou 35 pessoas e feriu 160. Hugo tamb\u00e9m estaria envolvido no assassinato do empres\u00e1rio japon\u00eas Chikao Muramatsu que foi encontrado morto a tiros em novembro, quase tr\u00eas anos depois de ter sido seq\u00fcestrado. Golpe O resgate de US$ 19 milh\u00f5es (R$ 55 milh\u00f5es) tinha sido pedido para a liberta\u00e7\u00e3o de Muramatsu. As autoridades colombianas acreditam que a deten\u00e7\u00e3o de Hugo representa um grande golpe na estrutura da Farc em Bogot\u00e1. Wilmer Antonio Marin \u00e9 acusado de administrar uma rede de seq\u00fcestros que teria, como alvo, empres\u00e1rios ricos e estrangeiros. Ele seria repons\u00e1vel por seq\u00fcestr\u00e1-los no meio da rua e lev\u00e1-los para as montanhas onde a guerrilha tem suas bases.", "example_title": "Not\u00edcia 1"}, {"text": "Terminou a rebeli\u00e3o de presos no Centro de Cust\u00f3dia de Presos de Justi\u00e7a (CCPJ), em S\u00e3o Lu\u00eds, no come\u00e7o da tarde desta quarta-feira (17). Os presos entregaram as armas e a pol\u00edcia faz uma revista dentro da unidade. O motim come\u00e7ou durante a festa do Dia das Crian\u00e7as, realizada na ter\u00e7a-feira (16). As 16 crian\u00e7as e 14 adultos foram libertados. Segundo informa\u00e7\u00f5es da pol\u00edcia, o l\u00edder da rebeli\u00e3o foi transferido para o Pres\u00eddio de Pedrinhas, na capital maranhense. Os presos receberam garantias, por parte do diretor da unidade, de que n\u00e3o haveria repres\u00e1lias e novas transfer\u00eancias. Os presos tentaram fugir durante a festa, mas o plano foi descoberto. No come\u00e7o da rebeli\u00e3o quatro pessoas ficaram feridas, entre elas uma auxiliar de enfermagem e um agente de pol\u00edcia que trabalham no pres\u00eddio. A unidade ficou sem luz e \u00e1gua e as negocia\u00e7\u00f5es para a liberta\u00e7\u00e3o dos ref\u00e9ns foi retomada na manh\u00e3 desta quarta-feira. Segundo informa\u00e7\u00f5es da pol\u00edcia, os presos temiam uma transfer\u00eancia em massa depois de terem iniciado uma outra rebeli\u00e3o durante a greve de policiais no estado, na semana passada. A CCPJ tem capacidade para cerca de 80 presos, mas abriga 203 homens.", "example_title": "Not\u00edcia 2"}]}, "description": "\n\n# Portuguese T5 for Abstractive Summarization (PTT5 Summ)\n\n## Introduction\nPTT5 Summ is a fine-tuned [PTT5](https://github.com/unicamp-dl/PTT5) model to perform Abstractive Summarization in Brazilian Portuguese texts. 
This model was fine-tuned on the datasets: [WikiLingua](https://github.com/esdurmus/Wikilingua), [XL-Sum](https://github.com/csebuetnlp/xl-sum), [TeM\u00e1rio](http://www.nilc.icmc.usp.br/nilc/download/NILCTR0706-MazieroEtAl(2).pdf) and [CSTNews](http://nilc.icmc.usp.br/CSTNews/login/?next=/CSTNews/).\n\nFor further information, please go to the [PTT5 Summ repository](https://github.com/pedropaiola/ptt5-summ).\n\n## Available models\n| Model | Dataset used in fine-tuning |\n| :-: | :-: |\n| [phpaiola/ptt5-base-summ-wikilingua](https://huggingface.co/phpaiola/ptt5-base-summ-wikilingua) | WikiLingua |\n| [phpaiola/ptt5-base-summ-xlsum](https://huggingface.co/phpaiola/ptt5-base-summ-xlsum) | XL-Sum |\n| [phpaiola/ptt5-base-summ-temario](https://huggingface.co/phpaiola/ptt5-base-summ-temario) | 1st phase: WikiLingua. 2nd phase: TeMario |\n| [phpaiola/ptt5-base-summ-cstnews](https://huggingface.co/phpaiola/ptt5-base-summ-cstnews) | 1st phase: WikiLingua. 2nd phase: CSTNews |\n\n## Usage example\n```python\n# Tokenizer\nfrom transformers import T5Tokenizer\n\n# PyTorch model\nfrom transformers import T5ForConditionalGeneration\n\ntoken_name = 'unicamp-dl/ptt5-base-portuguese-vocab'\nmodel_name = 'phpaiola/ptt5-base-summ-xlsum'\n\ntokenizer = T5Tokenizer.from_pretrained(token_name)\nmodel_pt = T5ForConditionalGeneration.from_pretrained(model_name)\n\ntext = '''\n\u201cA tend\u00eancia de queda da taxa de juros no Brasil \u00e9 real, \u00e9 vis\u00edvel\u201d, disse Meirelles, que participou na capital americana de uma s\u00e9rie de reuni\u00f5es e encontros com banqueiros e investidores que aconteceram paralelamente \u00e0s reuni\u00f5es do Fundo Monet\u00e1rio Internacional (FMI) e do Banco Mundial (Bird) no fim de semana.\nPara o presidente do BC, a atual pol\u00edtica econ\u00f4mica do governo e a manuten\u00e7\u00e3o da taxa de infla\u00e7\u00e3o dentro da meta s\u00e3o fatores que garantem queda na taxa de juros a longo prazo.\n\u201cMas \u00e9 importante que n\u00f3s n\u00e3o olhemos para isso apenas no curto prazo. 
Temos que olhar no m\u00e9dio e longo prazos\u201d, disse Meirelles.\nPara ele, o trabalho que o Banco Central tem feito para conter a infla\u00e7\u00e3o dentro da meta vai gerar queda gradual da taxa de juros.\nBC do ano\nNeste domingo, Meirelles participou da cerim\u00f4nia de entrega do pr\u00eamio \u201cBanco Central do ano\u201d, oferecido pela revista The Banker \u00e0 institui\u00e7\u00e3o que preside.\n\u201cEste \u00e9 um sinal importante de reconhecimento do nosso trabalho, de que o Brasil est\u00e1 indo na dire\u00e7\u00e3o correta\u201d, disse ele.\nSegundo Meirelles, o Banco Central do Brasil est\u00e1 sendo percebido como uma institui\u00e7\u00e3o comprometida com a meta de infla\u00e7\u00e3o.\n\u201cIsso tem um ganho importante, na medida em que os agentes formadores de pre\u00e7os come\u00e7am a apostar que a infla\u00e7\u00e3o vai estar na meta, que isso \u00e9 levado a s\u00e9rio no Brasil\u201d, completou.\nO presidente do Banco Central disse ainda que a crise pol\u00edtica brasileira n\u00e3o foi um assunto de interesse priorit\u00e1rio dos investidores que encontrou no fim de semana.\n'''\n\ninputs = tokenizer.encode(text, max_length=512, truncation=True, return_tensors='pt')\nsummary_ids = model_pt.generate(inputs, max_length=256, min_length=32, num_beams=5, no_repeat_ngram_size=3, early_stopping=True)\nsummary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)\nprint(summary)\n# O presidente do Banco Central, Henrique Meirelles, disse neste domingo, em Washington, que a taxa de juros no Brasil \u00e9 real, mas que o Brasil est\u00e1 indo na dire\u00e7\u00e3o correta.\n\n```\n\n# Citation\n\n @InProceedings{ptt5summ_bracis,\n author=\"Paiola, Pedro H.\n and de Rosa, Gustavo H.\n and Papa, Jo{\\~a}o P.\",\n editor=\"Xavier-Junior, Jo{\\~a}o Carlos\n and Rios, Ricardo Ara{\\'u}jo\",\n title=\"Deep Learning-Based Abstractive Summarization for\u00a0Brazilian Portuguese Texts\",\n booktitle=\"BRACIS 2022: Intelligent Systems\",\n year=\"2022\",\n publisher=\"Springer International Publishing\",\n address=\"Cham\",\n pages=\"479--493\",\n isbn=\"978-3-031-21689-3\"}\n"} {"downloads": 50, "id": "ml6team/mbart-large-cc25-cnn-dailymail-nl-finetune", "likes": 10, "pipeline_tag": "summarization", "task": "summarization", "meta": {"language": ["nl"], "tags": ["mbart", "bart", "summarization"], "datasets": ["ml6team/cnn_dailymail_nl"], "pipeline_tag": "summarization", "widget": [{"text": "Het jongetje werd eind april met zwaar letsel naar het ziekenhuis gebracht in Maastricht. Drie weken later overleed het kindje als gevolg van het letsel. Onderzoek moet nog uitwijzen wat voor verwondingen de baby precies had en hoe hij gewond is geraakt. Daarnaast doet de politie onderzoek in de woning van de ouders. Het is nog niet duidelijk wanneer de onderzoeken zijn afgerond, meldt 1Limburg. De verdachten zitten in beperkingen en mogen alleen contact hebben met hun advocaat."}, {"text": "Volgens De Vries gaat het om \"de hoogste beloning die ooit is uitgeloofd in Nederland\". De stichting heeft een website waar donateurs geld kunnen storten, schrijft NH Nieuws. Volgens De Vries is dit initiatief ook bedoeld voor andere zaken waar beloningen voor een gouden tip worden uitgereikt. \"Het is dus niet eenmalig\", aldus De Vries. Het is de eerste keer dat zoiets wordt opgezet, stelt hij: De 18-jarige Tanja Groen verdween spoorloos tijdens de ontgroeningsweek van de Universiteit Maastricht in augustus 1993. Ze werd voor het laatst gezien nadat ze was vertrokken van een feestje. 
De studente zou vandaag 46 jaar zijn geworden. Ook de ouders van Groen waren op de persconferentie aanwezig. \"Het is vandaag de verjaardag van Tanja Groen, die haar ouders al 27 jaar niet meer hebben kunnen vieren, omdat zij eind augustus 1993 spoorloos is verdwenen\", zei De Vries. \"Haar ouders zitten in tergende onzekerheid. Ze geloven dat ze niet meer leeft. Maar die ene promille vreet aan ze. Ze hebben recht op duidelijkheid. Ze komen op leeftijd. Grootste angst is nooit te weten wat er met hun kind is gebeurd.\" De Vries wil dat het miljoen binnen een jaar is ingezameld. Als het bedrag na een jaar lager uitkomt, dan is dat de uit te loven beloning. Is het meer, dan zal de rest van het geld gebruikt worden in beloningen in andere zaken. Het initiatief wordt gesteund door de politie en justitie. De afgelopen jaren is er vaker uitgebreid naar sporen van Tanja Groen gezocht, maar die zoekacties hebben niets concreets opgeleverd. Vorige week werd opnieuw naar de vrouw gezocht, op de Strabrechtse Heide in Noord-Brabant. Ook die zoektocht leverde niets op."}]}, "description": "\n# mbart-large-cc25-cnn-dailymail-nl\n## Model description\nFinetuned version of [mbart](https://huggingface.co/facebook/mbart-large-cc25). We also wrote a **blog post** about this model [here](https://blog.ml6.eu/why-we-open-sourced-two-dutch-summarization-datasets-1047445abc97)\n## Intended uses & limitations\nIt's meant for summarizing Dutch news articles.\n#### How to use\n```python\nimport transformers\nundisputed_best_model = transformers.MBartForConditionalGeneration.from_pretrained(\n \"ml6team/mbart-large-cc25-cnn-dailymail-nl-finetune\"\n)\ntokenizer = transformers.MBartTokenizer.from_pretrained(\"facebook/mbart-large-cc25\")\nsummarization_pipeline = transformers.pipeline(\n task=\"summarization\",\n model=undisputed_best_model,\n tokenizer=tokenizer,\n)\nsummarization_pipeline.model.config.decoder_start_token_id = tokenizer.lang_code_to_id[\n \"nl_XX\"\n]\narticle = \"Kan je dit even samenvatten alsjeblief.\" # Dutch\nsummarization_pipeline(\n article,\n do_sample=True,\n top_p=0.75,\n top_k=50,\n # num_beams=4,\n min_length=50,\n early_stopping=True,\n truncation=True,\n)[0][\"summary_text\"]\n```\n## Training data\nFinetuned [mbart](https://huggingface.co/facebook/mbart-large-cc25) with [this dataset](https://huggingface.co/datasets/ml6team/cnn_dailymail_nl) and another smaller dataset that we can't open source because we scraped it from the internet. For more information check out our blog post [here](https://blog.ml6.eu/)."} {"downloads": 6275171, "id": "t5-base", "likes": 162, "pipeline_tag": "translation", "task": "translation", "meta": {"language": ["en", "fr", "ro", "de"], "datasets": ["c4"], "tags": ["summarization", "translation"], "license": "apache-2.0"}, "description": "\n\n# Model Card for T5 Base\n\n![model image](https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67)\n\n# Table of Contents\n\n1. [Model Details](#model-details)\n2. [Uses](#uses)\n3. [Bias, Risks, and Limitations](#bias-risks-and-limitations)\n4. [Training Details](#training-details)\n5. [Evaluation](#evaluation)\n6. [Environmental Impact](#environmental-impact)\n7. [Citation](#citation)\n8. [Model Card Authors](#model-card-authors)\n9. 
[How To Get Started With the Model](#how-to-get-started-with-the-model)\n\n# Model Details\n\n## Model Description\n\nThe developers of the Text-To-Text Transfer Transformer (T5) [write](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html): \n\n> With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task.\n\nT5-Base is the checkpoint with 220 million parameters. \n\n- **Developed by:** Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. See [associated paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) and [GitHub repo](https://github.com/google-research/text-to-text-transfer-transformer#released-model-checkpoints)\n- **Model type:** Language model\n- **Language(s) (NLP):** English, French, Romanian, German\n- **License:** Apache 2.0\n- **Related Models:** [All T5 Checkpoints](https://huggingface.co/models?search=t5)\n- **Resources for more information:**\n - [Research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf)\n - [Google's T5 Blog Post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) \n - [GitHub Repo](https://github.com/google-research/text-to-text-transfer-transformer)\n - [Hugging Face T5 Docs](https://huggingface.co/docs/transformers/model_doc/t5)\n \n# Uses\n\n## Direct Use and Downstream Use\n\nIn a [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html), the developers write: \n\n> Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself.\n\nSee the [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) and [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.\n\n## Out-of-Scope Use\n\nMore information needed.\n\n# Bias, Risks, and Limitations\n\nMore information needed.\n\n## Recommendations\n\nMore information needed.\n\n# Training Details\n\n## Training Data\n\nThe model is pre-trained on the [Colossal Clean Crawled Corpus (C4)](https://www.tensorflow.org/datasets/catalog/c4), which was developed and released in the context of the same [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) as T5.\n\nThe model was pre-trained on a **multi-task mixture of unsupervised (1.) and supervised tasks (2.)**.\nThe following datasets were used for (1.) and (2.):\n\n1. **Datasets used for Unsupervised denoising objective**:\n\n- [C4](https://huggingface.co/datasets/c4)\n- [Wiki-DPR](https://huggingface.co/datasets/wiki_dpr)\n\n\n2. 
**Datasets used for Supervised text-to-text language modeling objective**\n\n- Sentence acceptability judgment\n - CoLA [Warstadt et al., 2018](https://arxiv.org/abs/1805.12471)\n- Sentiment analysis \n - SST-2 [Socher et al., 2013](https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf)\n- Paraphrasing/sentence similarity\n - MRPC [Dolan and Brockett, 2005](https://aclanthology.org/I05-5002)\n - STS-B [Cer et al., 2017](https://arxiv.org/abs/1708.00055)\n - QQP [Iyer et al., 2017](https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs)\n- Natural language inference\n - MNLI [Williams et al., 2017](https://arxiv.org/abs/1704.05426)\n - QNLI [Rajpurkar et al., 2016](https://arxiv.org/abs/1606.05250)\n - RTE [Dagan et al., 2005](https://link.springer.com/chapter/10.1007/11736790_9) \n - CB [De Marneffe et al., 2019](https://semanticsarchive.net/Archive/Tg3ZGI2M/Marneffe.pdf)\n- Sentence completion\n - COPA [Roemmele et al., 2011](https://www.researchgate.net/publication/221251392_Choice_of_Plausible_Alternatives_An_Evaluation_of_Commonsense_Causal_Reasoning)\n- Word sense disambiguation\n - WIC [Pilehvar and Camacho-Collados, 2018](https://arxiv.org/abs/1808.09121)\n- Question answering\n - MultiRC [Khashabi et al., 2018](https://aclanthology.org/N18-1023)\n - ReCoRD [Zhang et al., 2018](https://arxiv.org/abs/1810.12885)\n - BoolQ [Clark et al., 2019](https://arxiv.org/abs/1905.10044)\n\n## Training Procedure\n\nIn their [abstract](https://jmlr.org/papers/volume21/20-074/20-074.pdf), the model developers write: \n\n> In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. \n\nThe framework introduced, the T5 framework, involves a training procedure that brings together the approaches studied in the paper. See the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.\n\n# Evaluation\n\n## Testing Data, Factors & Metrics\n\nThe developers evaluated the model on 24 tasks; see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for full details.\n\n## Results \n\nFor full results for T5-Base, see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf), Table 14.\n\n# Environmental Impact\n\nCarbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).\n\n- **Hardware Type:** Google Cloud TPU Pods\n- **Hours used:** More information needed\n- **Cloud Provider:** GCP\n- **Compute Region:** More information needed\n- **Carbon Emitted:** More information needed\n\n# Citation\n\n**BibTeX:**\n\n```bibtex\n@article{2020t5,\n author = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},\n title = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},\n journal = {Journal of Machine Learning Research},\n year = {2020},\n volume = {21},\n number = {140},\n pages = {1-67},\n url = {http://jmlr.org/papers/v21/20-074.html}\n}\n```\n\n**APA:**\n- Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). 
Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140), 1-67.\n\n# Model Card Authors\n\nThis model card was written by the team at Hugging Face.\n\n# How to Get Started with the Model\n\nUse the code below to get started with the model.\n\n
\n\n```python\nfrom transformers import T5Tokenizer, T5Model\n\ntokenizer = T5Tokenizer.from_pretrained(\"t5-base\")\nmodel = T5Model.from_pretrained(\"t5-base\")\n\ninput_ids = tokenizer(\n \"Studies have been shown that owning a dog is good for you\", return_tensors=\"pt\"\n).input_ids # Batch size 1\ndecoder_input_ids = tokenizer(\"Studies show that\", return_tensors=\"pt\").input_ids # Batch size 1\n\n# forward pass\noutputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)\nlast_hidden_states = outputs.last_hidden_state\n```\n\nSee the [Hugging Face T5](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Model) docs and a [Colab Notebook](https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/main/notebooks/t5-trivia.ipynb) created by the model developers for more examples.\n
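\nFor text-to-text generation (rather than extracting hidden states as above), a minimal editorial sketch with `T5ForConditionalGeneration` and one of T5's task prefixes follows; the prefix, example sentence, and generation settings are illustrative assumptions, not part of the original card:\n\n```python\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"t5-base\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"t5-base\")\n\n# T5 selects the task through a text prefix, e.g. English-to-German translation\ninput_ids = tokenizer(\"translate English to German: The house is wonderful.\", return_tensors=\"pt\").input_ids\noutputs = model.generate(input_ids, max_new_tokens=40)\nprint(tokenizer.decode(outputs[0], skip_special_tokens=True))\n```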
\n"} {"downloads": 184038, "id": "facebook/nllb-200-distilled-600M", "likes": 103, "pipeline_tag": "translation", "task": "translation", "meta": {"language": ["ace", "acm", "acq", "aeb", "af", "ajp", "ak", "als", "am", "apc", "ar", "ars", "ary", "arz", "as", "ast", "awa", "ayr", "azb", "azj", "ba", "bm", "ban", "be", "bem", "bn", "bho", "bjn", "bo", "bs", "bug", "bg", "ca", "ceb", "cs", "cjk", "ckb", "crh", "cy", "da", "de", "dik", "dyu", "dz", "el", "en", "eo", "et", "eu", "ee", "fo", "fj", "fi", "fon", "fr", "fur", "fuv", "gaz", "gd", "ga", "gl", "gn", "gu", "ht", "ha", "he", "hi", "hne", "hr", "hu", "hy", "ig", "ilo", "id", "is", "it", "jv", "ja", "kab", "kac", "kam", "kn", "ks", "ka", "kk", "kbp", "kea", "khk", "km", "ki", "rw", "ky", "kmb", "kmr", "knc", "kg", "ko", "lo", "lij", "li", "ln", "lt", "lmo", "ltg", "lb", "lua", "lg", "luo", "lus", "lvs", "mag", "mai", "ml", "mar", "min", "mk", "mt", "mni", "mos", "mi", "my", "nl", "nn", "nb", "npi", "nso", "nus", "ny", "oc", "ory", "pag", "pa", "pap", "pbt", "pes", "plt", "pl", "pt", "prs", "quy", "ro", "rn", "ru", "sg", "sa", "sat", "scn", "shn", "si", "sk", "sl", "sm", "sn", "sd", "so", "st", "es", "sc", "sr", "ss", "su", "sv", "swh", "szl", "ta", "taq", "tt", "te", "tg", "tl", "th", "ti", "tpi", "tn", "ts", "tk", "tum", "tr", "tw", "tzm", "ug", "uk", "umb", "ur", "uzn", "vec", "vi", "war", "wo", "xh", "ydd", "yo", "yue", "zh", "zsm", "zu"], "language_details": "ace_Arab, ace_Latn, acm_Arab, acq_Arab, aeb_Arab, afr_Latn, ajp_Arab, aka_Latn, amh_Ethi, apc_Arab, arb_Arab, ars_Arab, ary_Arab, arz_Arab, asm_Beng, ast_Latn, awa_Deva, ayr_Latn, azb_Arab, azj_Latn, bak_Cyrl, bam_Latn, ban_Latn,bel_Cyrl, bem_Latn, ben_Beng, bho_Deva, bjn_Arab, bjn_Latn, bod_Tibt, bos_Latn, bug_Latn, bul_Cyrl, cat_Latn, ceb_Latn, ces_Latn, cjk_Latn, ckb_Arab, crh_Latn, cym_Latn, dan_Latn, deu_Latn, dik_Latn, dyu_Latn, dzo_Tibt, ell_Grek, eng_Latn, epo_Latn, est_Latn, eus_Latn, ewe_Latn, fao_Latn, pes_Arab, fij_Latn, fin_Latn, fon_Latn, fra_Latn, fur_Latn, fuv_Latn, gla_Latn, gle_Latn, glg_Latn, grn_Latn, guj_Gujr, hat_Latn, hau_Latn, heb_Hebr, hin_Deva, hne_Deva, hrv_Latn, hun_Latn, hye_Armn, ibo_Latn, ilo_Latn, ind_Latn, isl_Latn, ita_Latn, jav_Latn, jpn_Jpan, kab_Latn, kac_Latn, kam_Latn, kan_Knda, kas_Arab, kas_Deva, kat_Geor, knc_Arab, knc_Latn, kaz_Cyrl, kbp_Latn, kea_Latn, khm_Khmr, kik_Latn, kin_Latn, kir_Cyrl, kmb_Latn, kon_Latn, kor_Hang, kmr_Latn, lao_Laoo, lvs_Latn, lij_Latn, lim_Latn, lin_Latn, lit_Latn, lmo_Latn, ltg_Latn, ltz_Latn, lua_Latn, lug_Latn, luo_Latn, lus_Latn, mag_Deva, mai_Deva, mal_Mlym, mar_Deva, min_Latn, mkd_Cyrl, plt_Latn, mlt_Latn, mni_Beng, khk_Cyrl, mos_Latn, mri_Latn, zsm_Latn, mya_Mymr, nld_Latn, nno_Latn, nob_Latn, npi_Deva, nso_Latn, nus_Latn, nya_Latn, oci_Latn, gaz_Latn, ory_Orya, pag_Latn, pan_Guru, pap_Latn, pol_Latn, por_Latn, prs_Arab, pbt_Arab, quy_Latn, ron_Latn, run_Latn, rus_Cyrl, sag_Latn, san_Deva, sat_Beng, scn_Latn, shn_Mymr, sin_Sinh, slk_Latn, slv_Latn, smo_Latn, sna_Latn, snd_Arab, som_Latn, sot_Latn, spa_Latn, als_Latn, srd_Latn, srp_Cyrl, ssw_Latn, sun_Latn, swe_Latn, swh_Latn, szl_Latn, tam_Taml, tat_Cyrl, tel_Telu, tgk_Cyrl, tgl_Latn, tha_Thai, tir_Ethi, taq_Latn, taq_Tfng, tpi_Latn, tsn_Latn, tso_Latn, tuk_Latn, tum_Latn, tur_Latn, twi_Latn, tzm_Tfng, uig_Arab, ukr_Cyrl, umb_Latn, urd_Arab, uzn_Latn, vec_Latn, vie_Latn, war_Latn, wol_Latn, xho_Latn, ydd_Hebr, yor_Latn, yue_Hant, zho_Hans, zho_Hant, zul_Latn", "tags": ["nllb", "translation"], "license": "cc-by-nc-4.0", "datasets": ["flores-200"], 
"metrics": ["bleu", "spbleu", "chrf++"], "inference": false}, "description": "\n\n# NLLB-200\n\nThis is the model card of NLLB-200's distilled 600M variant.\n\nHere are the [metrics](https://tinyurl.com/nllb200densedst600mmetrics) for that particular checkpoint.\n\n- Information about training algorithms, parameters, fairness constraints or other applied approaches, and features. The exact training algorithm, data and the strategies to handle data imbalances for high and low resource languages that were used to train NLLB-200 is described in the paper.\n- Paper or other resource for more information NLLB Team et al, No Language Left Behind: Scaling Human-Centered Machine Translation, Arxiv, 2022\n- License: CC-BY-NC\n- Where to send questions or comments about the model: https://github.com/facebookresearch/fairseq/issues\n\n\n\n## Intended Use\n- Primary intended uses: NLLB-200 is a machine translation model primarily intended for research in machine translation, - especially for low-resource languages. It allows for single sentence translation among 200 languages. Information on how to - use the model can be found in Fairseq code repository along with the training code and references to evaluation and training data.\n- Primary intended users: Primary users are researchers and machine translation research community.\n- Out-of-scope use cases: NLLB-200 is a research model and is not released for production deployment. NLLB-200 is trained on general domain text data and is not intended to be used with domain specific texts, such as medical domain or legal domain. The model is not intended to be used for document translation. The model was trained with input lengths not exceeding 512 tokens, therefore translating longer sequences might result in quality degradation. NLLB-200 translations can not be used as certified translations. \n\n## Metrics\n\u2022 Model performance measures: NLLB-200 model was evaluated using BLEU, spBLEU, and chrF++ metrics widely adopted by machine translation community. Additionally, we performed human evaluation with the XSTS protocol and measured the toxicity of the generated translations.\n\n\n## Evaluation Data\n- Datasets: Flores-200 dataset is described in Section 4\n- Motivation: We used Flores-200 as it provides full evaluation coverage of the languages in NLLB-200\n- Preprocessing: Sentence-split raw text data was preprocessed using SentencePiece. The\nSentencePiece model is released along with NLLB-200.\n\n## Training Data\n\u2022 We used parallel multilingual data from a variety of sources to train the model. We provide detailed report on data selection and construction process in Section 5 in the paper. We also used monolingual data constructed from Common Crawl. We provide more details in Section 5.2.\n\n## Ethical Considerations\n\u2022 In this work, we took a reflexive approach in technological development to ensure that we prioritize human users and minimize risks that could be transferred to them. While we reflect on our ethical considerations throughout the article, here are some additional points to highlight. For one, many languages chosen for this study are low-resource languages, with a heavy emphasis on African languages. While quality translation could improve education and information access in many in these communities, such an access could also make groups with lower levels of digital literacy more vulnerable to misinformation or online scams. 
The latter scenarios could arise if bad actors misappropriate our work for nefarious activities, which we conceive as an example of unintended use. Regarding data acquisition, the training data used for model development were mined from various publicly available sources on the web. Although we invested heavily in data cleaning, personally identifiable information may not be entirely eliminated. Finally, although we did our best to optimize for translation quality, mistranslations produced by the model could remain. Although the odds are low, this could have an adverse impact on those who rely on these translations to make important decisions (particularly when related to health and safety).\n\n## Caveats and Recommendations\n\u2022 Our model has been tested on the Wikimedia domain with limited investigation on other domains supported in NLLB-MD. In addition, the supported languages may have variations that our model is not capturing. Users should make appropriate assessments.\n\n## Carbon Footprint Details\n\u2022 The carbon dioxide (CO2e) estimate is reported in Section 8.8."} {"downloads": 117562, "id": "Helsinki-NLP/opus-mt-zh-en", "likes": 95, "pipeline_tag": "translation", "task": "translation", "meta": {"language": ["zh", "en"], "tags": ["translation"], "license": "cc-by-4.0"}, "description": "\n\n### zho-eng\n\n## Table of Contents\n- [Model Details](#model-details)\n- [Uses](#uses)\n- [Risks, Limitations and Biases](#risks-limitations-and-biases)\n- [Training](#training)\n- [Evaluation](#evaluation)\n- [Citation Information](#citation-information)\n- [How to Get Started With the Model](#how-to-get-started-with-the-model)\n\n## Model Details\n- **Model Description:**\n- **Developed by:** Language Technology Research Group at the University of Helsinki\n- **Model Type:** Translation\n- **Language(s):** \n - Source Language: Chinese\n - Target Language: English\n- **License:** CC-BY-4.0\n- **Resources for more information:**\n - [GitHub Repo](https://github.com/Helsinki-NLP/OPUS-MT-train)\n\n\n## Uses\n\n#### Direct Use\n\nThis model can be used for translation and text-to-text generation.\n\n\n## Risks, Limitations and Biases\n\n**CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.**\n\nSignificant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. 
(2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).\n\nFurther details about the dataset for this model can be found in the OPUS readme: [zho-eng](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/zho-eng/README.md)\n\n## Training\n\n#### System Information \n* helsinki_git_sha: 480fcbe0ee1bf4774bcbe6226ad9f58e63f6c535\n* transformers_git_sha: 2207e5d8cb224e954a7cba69fa4ac2309e9ff30b\n* port_machine: brutasse\n* port_time: 2020-08-21-14:41\n* src_multilingual: False\n* tgt_multilingual: False\n\n#### Training Data\n##### Preprocessing\n* pre-processing: normalization + SentencePiece (spm32k,spm32k)\n* ref_len: 82826.0\n* dataset: [opus](https://github.com/Helsinki-NLP/Opus-MT)\n* download original weights: [opus-2020-07-17.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/zho-eng/opus-2020-07-17.zip)\n\n* test set translations: [opus-2020-07-17.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/zho-eng/opus-2020-07-17.test.txt)\n\n\n## Evaluation\n\n#### Results\n\n* test set scores: [opus-2020-07-17.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/zho-eng/opus-2020-07-17.eval.txt)\n\n* brevity_penalty: 0.948\n\n\n## Benchmarks\n\n| testset | BLEU | chr-F |\n|"} {"downloads": 2220850, "id": "t5-small", "likes": 81, "pipeline_tag": "translation", "task": "translation", "meta": {"language": ["en", "fr", "ro", "de", "multilingual"], "license": "apache-2.0", "tags": ["summarization", "translation"], "datasets": ["c4"]}, "description": "\n\n# Model Card for T5 Small\n\n![model image](https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67)\n\n# Table of Contents\n\n1. [Model Details](#model-details)\n2. [Uses](#uses)\n3. [Bias, Risks, and Limitations](#bias-risks-and-limitations)\n4. [Training Details](#training-details)\n5. [Evaluation](#evaluation)\n6. [Environmental Impact](#environmental-impact)\n7. [Citation](#citation)\n8. [Model Card Authors](#model-card-authors)\n9. [How To Get Started With the Model](#how-to-get-started-with-the-model)\n\n# Model Details\n\n## Model Description\n\nThe developers of the Text-To-Text Transfer Transformer (T5) [write](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html): \n\n> With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task.\n\nT5-Small is the checkpoint with 60 million parameters. \n\n- **Developed by:** Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. 
See [associated paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) and [GitHub repo](https://github.com/google-research/text-to-text-transfer-transformer#released-model-checkpoints)\n- **Model type:** Language model\n- **Language(s) (NLP):** English, French, Romanian, German\n- **License:** Apache 2.0\n- **Related Models:** [All T5 Checkpoints](https://huggingface.co/models?search=t5)\n- **Resources for more information:**\n - [Research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf)\n - [Google's T5 Blog Post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) \n - [GitHub Repo](https://github.com/google-research/text-to-text-transfer-transformer)\n - [Hugging Face T5 Docs](https://huggingface.co/docs/transformers/model_doc/t5)\n \n# Uses\n\n## Direct Use and Downstream Use\n\nIn a [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html), the developers write: \n\n> Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself.\n\nSee the [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) and [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.\n\n## Out-of-Scope Use\n\nMore information needed.\n\n# Bias, Risks, and Limitations\n\nMore information needed.\n\n## Recommendations\n\nMore information needed.\n\n# Training Details\n\n## Training Data\n\nThe model is pre-trained on the [Colossal Clean Crawled Corpus (C4)](https://www.tensorflow.org/datasets/catalog/c4), which was developed and released in the context of the same [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) as T5.\n\nThe model was pre-trained on a **multi-task mixture of unsupervised (1.) and supervised tasks (2.)**.\nThe following datasets were used for (1.) and (2.):\n\n1. **Datasets used for Unsupervised denoising objective**:\n\n- [C4](https://huggingface.co/datasets/c4)\n- [Wiki-DPR](https://huggingface.co/datasets/wiki_dpr)\n\n\n2. 
**Datasets used for Supervised text-to-text language modeling objective**\n\n- Sentence acceptability judgment\n - CoLA [Warstadt et al., 2018](https://arxiv.org/abs/1805.12471)\n- Sentiment analysis \n - SST-2 [Socher et al., 2013](https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf)\n- Paraphrasing/sentence similarity\n - MRPC [Dolan and Brockett, 2005](https://aclanthology.org/I05-5002)\n - STS-B [Cer et al., 2017](https://arxiv.org/abs/1708.00055)\n - QQP [Iyer et al., 2017](https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs)\n- Natural language inference\n - MNLI [Williams et al., 2017](https://arxiv.org/abs/1704.05426)\n - QNLI [Rajpurkar et al., 2016](https://arxiv.org/abs/1606.05250)\n - RTE [Dagan et al., 2005](https://link.springer.com/chapter/10.1007/11736790_9) \n - CB [De Marneffe et al., 2019](https://semanticsarchive.net/Archive/Tg3ZGI2M/Marneffe.pdf)\n- Sentence completion\n - COPA [Roemmele et al., 2011](https://www.researchgate.net/publication/221251392_Choice_of_Plausible_Alternatives_An_Evaluation_of_Commonsense_Causal_Reasoning)\n- Word sense disambiguation\n - WIC [Pilehvar and Camacho-Collados, 2018](https://arxiv.org/abs/1808.09121)\n- Question answering\n - MultiRC [Khashabi et al., 2018](https://aclanthology.org/N18-1023)\n - ReCoRD [Zhang et al., 2018](https://arxiv.org/abs/1810.12885)\n - BoolQ [Clark et al., 2019](https://arxiv.org/abs/1905.10044)\n\n## Training Procedure\n\nIn their [abstract](https://jmlr.org/papers/volume21/20-074/20-074.pdf), the model developers write: \n\n> In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. \n\nThe framework introduced, the T5 framework, involves a training procedure that brings together the approaches studied in the paper. See the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.\n\n# Evaluation\n\n## Testing Data, Factors & Metrics\n\nThe developers evaluated the model on 24 tasks; see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for full details.\n\n## Results \n\nFor full results for T5-small, see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf), Table 14.\n\n# Environmental Impact\n\nCarbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).\n\n- **Hardware Type:** Google Cloud TPU Pods\n- **Hours used:** More information needed\n- **Cloud Provider:** GCP\n- **Compute Region:** More information needed\n- **Carbon Emitted:** More information needed\n\n# Citation\n\n**BibTeX:**\n\n```bibtex\n@article{2020t5,\n author = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},\n title = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},\n journal = {Journal of Machine Learning Research},\n year = {2020},\n volume = {21},\n number = {140},\n pages = {1-67},\n url = {http://jmlr.org/papers/v21/20-074.html}\n}\n```\n\n**APA:**\n- Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). 
Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140), 1-67.\n\n# Model Card Authors\n\nThis model card was written by the team at Hugging Face.\n\n# How to Get Started with the Model\n\nUse the code below to get started with the model.\n\n
\n\n```python\nfrom transformers import T5Tokenizer, T5Model\n\ntokenizer = T5Tokenizer.from_pretrained(\"t5-small\")\nmodel = T5Model.from_pretrained(\"t5-small\")\n\ninput_ids = tokenizer(\n \"Studies have been shown that owning a dog is good for you\", return_tensors=\"pt\"\n).input_ids # Batch size 1\ndecoder_input_ids = tokenizer(\"Studies show that\", return_tensors=\"pt\").input_ids # Batch size 1\n\n# forward pass\noutputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)\nlast_hidden_states = outputs.last_hidden_state\n```\n\nSee the [Hugging Face T5](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Model) docs and a [Colab Notebook](https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/main/notebooks/t5-trivia.ipynb) created by the model developers for more examples.\n
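\nSince this checkpoint is tagged for translation, a short pipeline sketch may also be helpful; the example sentence is illustrative, and `translation_en_to_fr` is one of the task aliases `transformers` exposes for T5:\n\n```python\nfrom transformers import pipeline\n\n# T5's supervised pre-training covers English-to-French/German/Romanian translation\ntranslator = pipeline(\"translation_en_to_fr\", model=\"t5-small\")\nprint(translator(\"Studies have shown that owning a dog is good for you.\")[0][\"translation_text\"])\n```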
\n\n"} {"downloads": 102212, "id": "Helsinki-NLP/opus-mt-en-zh", "likes": 79, "pipeline_tag": "translation", "task": "translation", "meta": {"language": ["en", "zh"], "tags": ["translation"], "license": "apache-2.0"}, "description": "\n\n### eng-zho\n\n* source group: English \n* target group: Chinese \n* OPUS readme: [eng-zho](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/eng-zho/README.md)\n\n* model: transformer\n* source language(s): eng\n* target language(s): cjy_Hans cjy_Hant cmn cmn_Hans cmn_Hant gan lzh lzh_Hans nan wuu yue yue_Hans yue_Hant\n* model: transformer\n* pre-processing: normalization + SentencePiece (spm32k,spm32k)\n* a sentence initial language token is required in the form of `>>id<<` (id = valid target language ID)\n* download original weights: [opus-2020-07-17.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-zho/opus-2020-07-17.zip)\n* test set translations: [opus-2020-07-17.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-zho/opus-2020-07-17.test.txt)\n* test set scores: [opus-2020-07-17.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-zho/opus-2020-07-17.eval.txt)\n\n## Benchmarks\n\n| testset | BLEU | chr-F |\n|"} {"downloads": 348196, "id": "t5-large", "likes": 49, "pipeline_tag": "translation", "task": "translation", "meta": {"language": ["en", "fr", "ro", "de", "multilingual"], "license": "apache-2.0", "tags": ["summarization", "translation"], "datasets": ["c4"]}, "description": "\n\n# Model Card for T5 Large\n\n![model image](https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67)\n\n# Table of Contents\n\n1. [Model Details](#model-details)\n2. [Uses](#uses)\n3. [Bias, Risks, and Limitations](#bias-risks-and-limitations)\n4. [Training Details](#training-details)\n5. [Evaluation](#evaluation)\n6. [Environmental Impact](#environmental-impact)\n7. [Citation](#citation)\n8. [Model Card Authors](#model-card-authors)\n9. [How To Get Started With the Model](#how-to-get-started-with-the-model)\n\n# Model Details\n\n## Model Description\n\nThe developers of the Text-To-Text Transfer Transformer (T5) [write](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html): \n\n> With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task.\n\nT5-Large is the checkpoint with 770 million parameters. \n\n- **Developed by:** Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. 
See [associated paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) and [GitHub repo](https://github.com/google-research/text-to-text-transfer-transformer#released-model-checkpoints)\n- **Model type:** Language model\n- **Language(s) (NLP):** English, French, Romanian, German\n- **License:** Apache 2.0\n- **Related Models:** [All T5 Checkpoints](https://huggingface.co/models?search=t5)\n- **Resources for more information:**\n - [Research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf)\n - [Google's T5 Blog Post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) \n - [GitHub Repo](https://github.com/google-research/text-to-text-transfer-transformer)\n - [Hugging Face T5 Docs](https://huggingface.co/docs/transformers/model_doc/t5)\n \n# Uses\n\n## Direct Use and Downstream Use\n\nIn a [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html), the developers write: \n\n> Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself.\n\nSee the [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) and [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.\n\n## Out-of-Scope Use\n\nMore information needed.\n\n# Bias, Risks, and Limitations\n\nMore information needed.\n\n## Recommendations\n\nMore information needed.\n\n# Training Details\n\n## Training Data\n\nThe model is pre-trained on the [Colossal Clean Crawled Corpus (C4)](https://www.tensorflow.org/datasets/catalog/c4), which was developed and released in the context of the same [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) as T5.\n\nThe model was pre-trained on a **multi-task mixture of unsupervised (1.) and supervised tasks (2.)**.\nThe following datasets were used for (1.) and (2.):\n\n1. **Datasets used for Unsupervised denoising objective**:\n\n- [C4](https://huggingface.co/datasets/c4)\n- [Wiki-DPR](https://huggingface.co/datasets/wiki_dpr)\n\n\n2. 
**Datasets used for Supervised text-to-text language modeling objective**\n\n- Sentence acceptability judgment\n - CoLA [Warstadt et al., 2018](https://arxiv.org/abs/1805.12471)\n- Sentiment analysis \n - SST-2 [Socher et al., 2013](https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf)\n- Paraphrasing/sentence similarity\n - MRPC [Dolan and Brockett, 2005](https://aclanthology.org/I05-5002)\n - STS-B [Cer et al., 2017](https://arxiv.org/abs/1708.00055)\n - QQP [Iyer et al., 2017](https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs)\n- Natural language inference\n - MNLI [Williams et al., 2017](https://arxiv.org/abs/1704.05426)\n - QNLI [Rajpurkar et al., 2016](https://arxiv.org/abs/1606.05250)\n - RTE [Dagan et al., 2005](https://link.springer.com/chapter/10.1007/11736790_9) \n - CB [De Marneffe et al., 2019](https://semanticsarchive.net/Archive/Tg3ZGI2M/Marneffe.pdf)\n- Sentence completion\n - COPA [Roemmele et al., 2011](https://www.researchgate.net/publication/221251392_Choice_of_Plausible_Alternatives_An_Evaluation_of_Commonsense_Causal_Reasoning)\n- Word sense disambiguation\n - WIC [Pilehvar and Camacho-Collados, 2018](https://arxiv.org/abs/1808.09121)\n- Question answering\n - MultiRC [Khashabi et al., 2018](https://aclanthology.org/N18-1023)\n - ReCoRD [Zhang et al., 2018](https://arxiv.org/abs/1810.12885)\n - BoolQ [Clark et al., 2019](https://arxiv.org/abs/1905.10044)\n\n## Training Procedure\n\nIn their [abstract](https://jmlr.org/papers/volume21/20-074/20-074.pdf), the model developers write: \n\n> In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. \n\nThe framework introduced, the T5 framework, involves a training procedure that brings together the approaches studied in the paper. See the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.\n\n# Evaluation\n\n## Testing Data, Factors & Metrics\n\nThe developers evaluated the model on 24 tasks; see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for full details.\n\n## Results \n\nFor full results for T5-Large, see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf), Table 14.\n\n# Environmental Impact\n\nCarbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).\n\n- **Hardware Type:** Google Cloud TPU Pods\n- **Hours used:** More information needed\n- **Cloud Provider:** GCP\n- **Compute Region:** More information needed\n- **Carbon Emitted:** More information needed\n\n# Citation\n\n**BibTeX:**\n\n```bibtex\n@article{2020t5,\n author = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},\n title = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},\n journal = {Journal of Machine Learning Research},\n year = {2020},\n volume = {21},\n number = {140},\n pages = {1-67},\n url = {http://jmlr.org/papers/v21/20-074.html}\n}\n```\n\n**APA:**\n- Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). 
Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140), 1-67.\n\n# Model Card Authors\n\nThis model card was written by the team at Hugging Face.\n\n# How to Get Started with the Model\n\nUse the code below to get started with the model.\n\n
\n\n```python\nfrom transformers import T5Tokenizer, T5Model\n\ntokenizer = T5Tokenizer.from_pretrained(\"t5-large\")\nmodel = T5Model.from_pretrained(\"t5-large\")\n\ninput_ids = tokenizer(\n \"Studies have been shown that owning a dog is good for you\", return_tensors=\"pt\"\n).input_ids # Batch size 1\ndecoder_input_ids = tokenizer(\"Studies show that\", return_tensors=\"pt\").input_ids # Batch size 1\n\n# forward pass\noutputs = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)\nlast_hidden_states = outputs.last_hidden_state\n```\n\nSee the [Hugging Face T5](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Model) docs and a [Colab Notebook](https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/main/notebooks/t5-trivia.ipynb) created by the model developers for more examples.\n
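\nFor summarization, the same prefixed text-to-text pattern applies; a brief sketch, assuming the standard T5 \"summarize: \" prefix and editorial generation settings (not taken from this card):\n\n```python\nfrom transformers import T5Tokenizer, T5ForConditionalGeneration\n\ntokenizer = T5Tokenizer.from_pretrained(\"t5-large\")\nmodel = T5ForConditionalGeneration.from_pretrained(\"t5-large\")\n\narticle = \"...\"  # text to summarize\ninput_ids = tokenizer(\"summarize: \" + article, return_tensors=\"pt\", truncation=True).input_ids\nsummary_ids = model.generate(input_ids, max_new_tokens=60, num_beams=4)\nprint(tokenizer.decode(summary_ids[0], skip_special_tokens=True))\n```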
\n"} {"downloads": 20435, "id": "facebook/mbart-large-cc25", "likes": 40, "pipeline_tag": "translation", "task": "translation", "meta": {"tags": ["translation"], "language": ["en", "ar", "cs", "de", "et", "fi", "fr", "gu", "hi", "it", "ja", "kk", "ko", "lt", "lv", "my", "ne", "nl", "ro", "ru", "si", "tr", "vi", "zh", "multilingual"]}, "description": "\n#### mbart-large-cc25\n\nPretrained (not finetuned) multilingual mbart model.\nOriginal Languages\n```\nexport langs=ar_AR,cs_CZ,de_DE,en_XX,es_XX,et_EE,fi_FI,fr_XX,gu_IN,hi_IN,it_IT,ja_XX,kk_KZ,ko_KR,lt_LT,lv_LV,my_MM,ne_NP,nl_XX,ro_RO,ru_RU,si_LK,tr_TR,vi_VN,zh_CN\n```\n\nOriginal Code: https://github.com/pytorch/fairseq/tree/master/examples/mbart\nDocs: https://huggingface.co/transformers/master/model_doc/mbart.html\nFinetuning Code: examples/seq2seq/finetune.py (as of Aug 20, 2020)\n\nCan also be finetuned for summarization."} {"downloads": 18508, "id": "facebook/nllb-200-3.3B", "likes": 30, "pipeline_tag": "translation", "task": "translation", "meta": {"language": ["ace", "acm", "acq", "aeb", "af", "ajp", "ak", "als", "am", "apc", "ar", "ars", "ary", "arz", "as", "ast", "awa", "ayr", "azb", "azj", "ba", "bm", "ban", "be", "bem", "bn", "bho", "bjn", "bo", "bs", "bug", "bg", "ca", "ceb", "cs", "cjk", "ckb", "crh", "cy", "da", "de", "dik", "dyu", "dz", "el", "en", "eo", "et", "eu", "ee", "fo", "fj", "fi", "fon", "fr", "fur", "fuv", "gaz", "gd", "ga", "gl", "gn", "gu", "ht", "ha", "he", "hi", "hne", "hr", "hu", "hy", "ig", "ilo", "id", "is", "it", "jv", "ja", "kab", "kac", "kam", "kn", "ks", "ka", "kk", "kbp", "kea", "khk", "km", "ki", "rw", "ky", "kmb", "kmr", "knc", "kg", "ko", "lo", "lij", "li", "ln", "lt", "lmo", "ltg", "lb", "lua", "lg", "luo", "lus", "lvs", "mag", "mai", "ml", "mar", "min", "mk", "mt", "mni", "mos", "mi", "my", "nl", "nn", "nb", "npi", "nso", "nus", "ny", "oc", "ory", "pag", "pa", "pap", "pbt", "pes", "plt", "pl", "pt", "prs", "quy", "ro", "rn", "ru", "sg", "sa", "sat", "scn", "shn", "si", "sk", "sl", "sm", "sn", "sd", "so", "st", "es", "sc", "sr", "ss", "su", "sv", "swh", "szl", "ta", "taq", "tt", "te", "tg", "tl", "th", "ti", "tpi", "tn", "ts", "tk", "tum", "tr", "tw", "tzm", "ug", "uk", "umb", "ur", "uzn", "vec", "vi", "war", "wo", "xh", "ydd", "yo", "yue", "zh", "zsm", "zu"], "language_details": "ace_Arab, ace_Latn, acm_Arab, acq_Arab, aeb_Arab, afr_Latn, ajp_Arab, aka_Latn, amh_Ethi, apc_Arab, arb_Arab, ars_Arab, ary_Arab, arz_Arab, asm_Beng, ast_Latn, awa_Deva, ayr_Latn, azb_Arab, azj_Latn, bak_Cyrl, bam_Latn, ban_Latn,bel_Cyrl, bem_Latn, ben_Beng, bho_Deva, bjn_Arab, bjn_Latn, bod_Tibt, bos_Latn, bug_Latn, bul_Cyrl, cat_Latn, ceb_Latn, ces_Latn, cjk_Latn, ckb_Arab, crh_Latn, cym_Latn, dan_Latn, deu_Latn, dik_Latn, dyu_Latn, dzo_Tibt, ell_Grek, eng_Latn, epo_Latn, est_Latn, eus_Latn, ewe_Latn, fao_Latn, pes_Arab, fij_Latn, fin_Latn, fon_Latn, fra_Latn, fur_Latn, fuv_Latn, gla_Latn, gle_Latn, glg_Latn, grn_Latn, guj_Gujr, hat_Latn, hau_Latn, heb_Hebr, hin_Deva, hne_Deva, hrv_Latn, hun_Latn, hye_Armn, ibo_Latn, ilo_Latn, ind_Latn, isl_Latn, ita_Latn, jav_Latn, jpn_Jpan, kab_Latn, kac_Latn, kam_Latn, kan_Knda, kas_Arab, kas_Deva, kat_Geor, knc_Arab, knc_Latn, kaz_Cyrl, kbp_Latn, kea_Latn, khm_Khmr, kik_Latn, kin_Latn, kir_Cyrl, kmb_Latn, kon_Latn, kor_Hang, kmr_Latn, lao_Laoo, lvs_Latn, lij_Latn, lim_Latn, lin_Latn, lit_Latn, lmo_Latn, ltg_Latn, ltz_Latn, lua_Latn, lug_Latn, luo_Latn, lus_Latn, mag_Deva, mai_Deva, mal_Mlym, mar_Deva, min_Latn, mkd_Cyrl, plt_Latn, mlt_Latn, mni_Beng, khk_Cyrl, mos_Latn, mri_Latn, 
zsm_Latn, mya_Mymr, nld_Latn, nno_Latn, nob_Latn, npi_Deva, nso_Latn, nus_Latn, nya_Latn, oci_Latn, gaz_Latn, ory_Orya, pag_Latn, pan_Guru, pap_Latn, pol_Latn, por_Latn, prs_Arab, pbt_Arab, quy_Latn, ron_Latn, run_Latn, rus_Cyrl, sag_Latn, san_Deva, sat_Beng, scn_Latn, shn_Mymr, sin_Sinh, slk_Latn, slv_Latn, smo_Latn, sna_Latn, snd_Arab, som_Latn, sot_Latn, spa_Latn, als_Latn, srd_Latn, srp_Cyrl, ssw_Latn, sun_Latn, swe_Latn, swh_Latn, szl_Latn, tam_Taml, tat_Cyrl, tel_Telu, tgk_Cyrl, tgl_Latn, tha_Thai, tir_Ethi, taq_Latn, taq_Tfng, tpi_Latn, tsn_Latn, tso_Latn, tuk_Latn, tum_Latn, tur_Latn, twi_Latn, tzm_Tfng, uig_Arab, ukr_Cyrl, umb_Latn, urd_Arab, uzn_Latn, vec_Latn, vie_Latn, war_Latn, wol_Latn, xho_Latn, ydd_Hebr, yor_Latn, yue_Hant, zho_Hans, zho_Hant, zul_Latn", "tags": ["nllb", "translation"], "license": "cc-by-nc-4.0", "datasets": ["flores-200"], "metrics": ["bleu", "spbleu", "chrf++"], "inference": false}, "description": "\n\n# NLLB-200\n\nThis is the model card of NLLB-200's 3.3B variant.\n\nHere are the [metrics](https://tinyurl.com/nllb200dense3bmetrics) for that particular checkpoint.\n\n- Information about training algorithms, parameters, fairness constraints or other applied approaches, and features: the exact training algorithm, data, and the strategies used to handle data imbalances for high- and low-resource languages are described in the paper.\n- Paper or other resource for more information: NLLB Team et al., No Language Left Behind: Scaling Human-Centered Machine Translation, arXiv, 2022\n- License: CC-BY-NC\n- Where to send questions or comments about the model: https://github.com/facebookresearch/fairseq/issues\n\n\n\n## Intended Use\n- Primary intended uses: NLLB-200 is a machine translation model primarily intended for research in machine translation, especially for low-resource languages. It allows for single-sentence translation among 200 languages. Information on how to use the model can be found in the Fairseq code repository along with the training code and references to evaluation and training data.\n- Primary intended users: Primary users are researchers and the machine translation research community.\n- Out-of-scope use cases: NLLB-200 is a research model and is not released for production deployment. NLLB-200 is trained on general-domain text data and is not intended to be used with domain-specific texts, such as those from the medical or legal domains. The model is not intended to be used for document translation. The model was trained with input lengths not exceeding 512 tokens, therefore translating longer sequences might result in quality degradation. NLLB-200 translations cannot be used as certified translations. \n\n## Metrics\n\u2022 Model performance measures: The NLLB-200 model was evaluated using the BLEU, spBLEU, and chrF++ metrics widely adopted by the machine translation community. Additionally, we performed human evaluation with the XSTS protocol and measured the toxicity of the generated translations.\n\n\n## Evaluation Data\n- Datasets: The Flores-200 dataset is described in Section 4\n- Motivation: We used Flores-200 as it provides full evaluation coverage of the languages in NLLB-200\n- Preprocessing: Sentence-split raw text data was preprocessed using SentencePiece. The SentencePiece model is released along with NLLB-200.\n\n## Training Data\n\u2022 We used parallel multilingual data from a variety of sources to train the model. We provide a detailed report on data selection and the construction process in Section 5 of the paper. 
We also used monolingual data constructed from Common Crawl. We provide more details in Section 5.2.\n\n## Ethical Considerations\n\u2022 In this work, we took a reflexive approach in technological development to ensure that we prioritize human users and minimize risks that could be transferred to them. While we reflect on our ethical considerations throughout the article, here are some additional points to highlight. For one, many languages chosen for this study are low-resource languages, with a heavy emphasis on African languages. While quality translation could improve education and information access in many of these communities, such access could also make groups with lower levels of digital literacy more vulnerable to misinformation or online scams. The latter scenarios could arise if bad actors misappropriate our work for nefarious activities, which we conceive as an example of unintended use. Regarding data acquisition, the training data used for model development were mined from various publicly available sources on the web. Although we invested heavily in data cleaning, personally identifiable information may not be entirely eliminated. Finally, although we did our best to optimize for translation quality, mistranslations produced by the model could remain. Although the odds are low, this could have an adverse impact on those who rely on these translations to make important decisions (particularly when related to health and safety).\n\n## Caveats and Recommendations\n\u2022 Our model has been tested on the Wikimedia domain with limited investigation on other domains supported in NLLB-MD. In addition, the supported languages may have variations that our model is not capturing. Users should make appropriate assessments.\n\n## Carbon Footprint Details\n\u2022 The carbon dioxide (CO2e) estimate is reported in Section 8.8."} {"downloads": 2319012, "id": "Helsinki-NLP/opus-mt-en-es", "likes": 28, "pipeline_tag": "translation", "task": "translation", "meta": {"language": ["en", "es"], "tags": ["translation"], "license": "apache-2.0"}, "description": "\n\n### eng-spa\n\n* source group: English \n* target group: Spanish \n* OPUS readme: [eng-spa](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/eng-spa/README.md)\n\n* model: transformer\n* source language(s): eng\n* target language(s): spa\n* model: transformer\n* pre-processing: normalization + SentencePiece (spm32k,spm32k)\n* download original weights: [opus-2020-08-18.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-spa/opus-2020-08-18.zip)\n* test set translations: [opus-2020-08-18.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-spa/opus-2020-08-18.test.txt)\n* test set scores: [opus-2020-08-18.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/eng-spa/opus-2020-08-18.eval.txt)\n\n## Benchmarks\n\n| testset | BLEU | chr-F |\n|"} {"downloads": 80638, "id": "Helsinki-NLP/opus-mt-mul-en", "likes": 25, "pipeline_tag": "translation", "task": "translation", "meta": {"language": ["ca", "es", "os", "eo", "ro", "fy", "cy", "is", "lb", "su", "an", "sq", "fr", "ht", "rm", "cv", "ig", "am", "eu", "tr", "ps", "af", "ny", "ch", "uk", "sl", "lt", "tk", "sg", "ar", "lg", "bg", "be", "ka", "gd", "ja", "si", "br", "mh", "km", "th", "ty", "rw", "te", "mk", "or", "wo", "kl", "mr", "ru", "yo", "hu", "fo", "zh", "ti", "co", "ee", "oc", "sn", "mt", "ts", "pl", "gl", "nb", "bn", "tt", "bo", "lo", "id", "gn", "nv", "hy", "kn", "to", "io", "so", "vi", "da", "fj", "gv", "sm", "nl", "mi", "pt", "hi", "se", "as", "ta", 
"et", "kw", "ga", "sv", "ln", "na", "mn", "gu", "wa", "lv", "jv", "el", "my", "ba", "it", "hr", "ur", "ce", "nn", "fi", "mg", "rn", "xh", "ab", "de", "cs", "he", "zu", "yi", "ml", "mul", "en"], "tags": ["translation"], "license": "apache-2.0"}, "description": "\n\n### mul-eng\n\n* source group: Multiple languages \n* target group: English \n* OPUS readme: [mul-eng](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/mul-eng/README.md)\n\n* model: transformer\n* source language(s): abk acm ady afb afh_Latn afr akl_Latn aln amh ang_Latn apc ara arg arq ary arz asm ast avk_Latn awa aze_Latn bak bam_Latn bel bel_Latn ben bho bod bos_Latn bre brx brx_Latn bul bul_Latn cat ceb ces cha che chr chv cjy_Hans cjy_Hant cmn cmn_Hans cmn_Hant cor cos crh crh_Latn csb_Latn cym dan deu dsb dtp dws_Latn egl ell enm_Latn epo est eus ewe ext fao fij fin fkv_Latn fra frm_Latn frr fry fuc fuv gan gcf_Latn gil gla gle glg glv gom gos got_Goth grc_Grek grn gsw guj hat hau_Latn haw heb hif_Latn hil hin hnj_Latn hoc hoc_Latn hrv hsb hun hye iba ibo ido ido_Latn ike_Latn ile_Latn ilo ina_Latn ind isl ita izh jav jav_Java jbo jbo_Cyrl jbo_Latn jdt_Cyrl jpn kab kal kan kat kaz_Cyrl kaz_Latn kek_Latn kha khm khm_Latn kin kir_Cyrl kjh kpv krl ksh kum kur_Arab kur_Latn lad lad_Latn lao lat_Latn lav ldn_Latn lfn_Cyrl lfn_Latn lij lin lit liv_Latn lkt lld_Latn lmo ltg ltz lug lzh lzh_Hans mad mah mai mal mar max_Latn mdf mfe mhr mic min mkd mlg mlt mnw moh mon mri mwl mww mya myv nan nau nav nds niu nld nno nob nob_Hebr nog non_Latn nov_Latn npi nya oci ori orv_Cyrl oss ota_Arab ota_Latn pag pan_Guru pap pau pdc pes pes_Latn pes_Thaa pms pnb pol por ppl_Latn prg_Latn pus quc qya qya_Latn rap rif_Latn roh rom ron rue run rus sag sah san_Deva scn sco sgs shs_Latn shy_Latn sin sjn_Latn slv sma sme smo sna snd_Arab som spa sqi srp_Cyrl srp_Latn stq sun swe swg swh tah tam tat tat_Arab tat_Latn tel tet tgk_Cyrl tha tir tlh_Latn tly_Latn tmw_Latn toi_Latn ton tpw_Latn tso tuk tuk_Latn tur tvl tyv tzl tzl_Latn udm uig_Arab uig_Cyrl ukr umb urd uzb_Cyrl uzb_Latn vec vie vie_Hani vol_Latn vro war wln wol wuu xal xho yid yor yue yue_Hans yue_Hant zho zho_Hans zho_Hant zlm_Latn zsm_Latn zul zza\n* target language(s): eng\n* model: transformer\n* pre-processing: normalization + SentencePiece (spm32k,spm32k)\n* download original weights: [opus2m-2020-08-01.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/mul-eng/opus2m-2020-08-01.zip)\n* test set translations: [opus2m-2020-08-01.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/mul-eng/opus2m-2020-08-01.test.txt)\n* test set scores: [opus2m-2020-08-01.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/mul-eng/opus2m-2020-08-01.eval.txt)\n\n## Benchmarks\n\n| testset | BLEU | chr-F |\n|"} {"downloads": 143849, "id": "t5-11b", "likes": 21, "pipeline_tag": "translation", "task": "translation", "meta": {"language": ["en", "fr", "ro", "de", "multilingual"], "license": "apache-2.0", "tags": ["summarization", "translation"], "datasets": ["c4"], "inference": false}, "description": "\n\n# Model Card for T5 11B\n\n![model image](https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67)\n\n# Table of Contents\n\n1. [Model Details](#model-details)\n2. [Uses](#uses)\n3. [Bias, Risks, and Limitations](#bias-risks-and-limitations)\n4. [Training Details](#training-details)\n5. [Evaluation](#evaluation)\n6. 
[Environmental Impact](#environmental-impact)\n7. [Citation](#citation)\n8. [Model Card Authors](#model-card-authors)\n9. [How To Get Started With the Model](#how-to-get-started-with-the-model)\n\n# Model Details\n\n## Model Description\n\nThe developers of the Text-To-Text Transfer Transformer (T5) [write](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html): \n\n> With T5, we propose reframing all NLP tasks into a unified text-to-text format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task.\n\nT5-11B is the checkpoint with 11 billion parameters. \n\n- **Developed by:** Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. See [associated paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) and [GitHub repo](https://github.com/google-research/text-to-text-transfer-transformer#released-model-checkpoints)\n- **Model type:** Language model\n- **Language(s) (NLP):** English, French, Romanian, German\n- **License:** Apache 2.0\n- **Related Models:** [All T5 Checkpoints](https://huggingface.co/models?search=t5)\n- **Resources for more information:**\n - [Research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf)\n - [Google's T5 Blog Post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) \n - [GitHub Repo](https://github.com/google-research/text-to-text-transfer-transformer)\n - [Hugging Face T5 Docs](https://huggingface.co/docs/transformers/model_doc/t5)\n \n# Uses\n\n## Direct Use and Downstream Use\n\nIn a [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html), the developers write: \n\n> Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself.\n\nSee the [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) and [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.\n\n## Out-of-Scope Use\n\nMore information needed.\n\n# Bias, Risks, and Limitations\n\nMore information needed.\n\n## Recommendations\n\nMore information needed.\n\n# Training Details\n\n## Training Data\n\nThe model is pre-trained on the [Colossal Clean Crawled Corpus (C4)](https://www.tensorflow.org/datasets/catalog/c4), which was developed and released in the context of the same [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) as T5.\n\nThe model was pre-trained on a **multi-task mixture of unsupervised (1.) and supervised tasks (2.)**.\nThe following datasets were used for (1.) and (2.):\n\n1. **Datasets used for Unsupervised denoising objective**:\n\n- [C4](https://huggingface.co/datasets/c4)\n- [Wiki-DPR](https://huggingface.co/datasets/wiki_dpr)\n\n\n2. 
**Datasets used for Supervised text-to-text language modeling objective**\n\n- Sentence acceptability judgment\n - CoLA [Warstadt et al., 2018](https://arxiv.org/abs/1805.12471)\n- Sentiment analysis \n - SST-2 [Socher et al., 2013](https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf)\n- Paraphrasing/sentence similarity\n - MRPC [Dolan and Brockett, 2005](https://aclanthology.org/I05-5002)\n - STS-B [Cer et al., 2017](https://arxiv.org/abs/1708.00055)\n - QQP [Iyer et al., 2017](https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs)\n- Natural language inference\n - MNLI [Williams et al., 2017](https://arxiv.org/abs/1704.05426)\n - QNLI [Rajpurkar et al., 2016](https://arxiv.org/abs/1606.05250)\n - RTE [Dagan et al., 2005](https://link.springer.com/chapter/10.1007/11736790_9) \n - CB [De Marneffe et al., 2019](https://semanticsarchive.net/Archive/Tg3ZGI2M/Marneffe.pdf)\n- Sentence completion\n - COPA [Roemmele et al., 2011](https://www.researchgate.net/publication/221251392_Choice_of_Plausible_Alternatives_An_Evaluation_of_Commonsense_Causal_Reasoning)\n- Word sense disambiguation\n - WIC [Pilehvar and Camacho-Collados, 2018](https://arxiv.org/abs/1808.09121)\n- Question answering\n - MultiRC [Khashabi et al., 2018](https://aclanthology.org/N18-1023)\n - ReCoRD [Zhang et al., 2018](https://arxiv.org/abs/1810.12885)\n - BoolQ [Clark et al., 2019](https://arxiv.org/abs/1905.10044)\n\n## Training Procedure\n\nIn their [abstract](https://jmlr.org/papers/volume21/20-074/20-074.pdf), the model developers write: \n\n> In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. \n\nThe framework introduced, the T5 framework, involves a training procedure that brings together the approaches studied in the paper. See the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.\n\n# Evaluation\n\n## Testing Data, Factors & Metrics\n\nThe developers evaluated the model on 24 tasks; see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for full details.\n\n## Results \n\nFor full results for T5-11B, see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf), Table 14.\n\n# Environmental Impact\n\nCarbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).\n\n- **Hardware Type:** Google Cloud TPU Pods\n- **Hours used:** More information needed\n- **Cloud Provider:** GCP\n- **Compute Region:** More information needed\n- **Carbon Emitted:** More information needed\n\n# Citation\n\n**BibTeX:**\n\n```bibtex\n@article{2020t5,\n author = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},\n title = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},\n journal = {Journal of Machine Learning Research},\n year = {2020},\n volume = {21},\n number = {140},\n pages = {1-67},\n url = {http://jmlr.org/papers/v21/20-074.html}\n}\n```\n\n**APA:**\n- Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). 
Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140), 1-67.\n\n# Model Card Authors\n\nThis model card was written by the team at Hugging Face.\n\n# How to Get Started with the Model\n\n## Disclaimer\n\n**Before `transformers` v3.5.0**, due to its immense size, `t5-11b` required some special treatment. \nIf you're using transformers `<= v3.4.0`, `t5-11b` should be loaded with the flag `use_cdn` set to `False` as follows:\n\n```python\nimport transformers\n\nt5 = transformers.T5ForConditionalGeneration.from_pretrained('t5-11b', use_cdn=False)\n```\n\nIn addition, a single GPU will most likely not have enough memory to even load the model, as the weights alone amount to over 40 GB.\n- Model parallelism has to be used here to overcome this problem, as is explained in this [PR](https://github.com/huggingface/transformers/pull/3578).\n- DeepSpeed's ZeRO-Offload is another approach, as explained in this [post](https://github.com/huggingface/transformers/issues/9996).\n\nSee the [Hugging Face T5](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Model) docs and a [Colab Notebook](https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/main/notebooks/t5-trivia.ipynb) created by the model developers for more context.\n\n"} {"downloads": 648, "id": "K024/mt5-zh-ja-en-trimmed", "likes": 21, "pipeline_tag": "translation", "task": "translation", "meta": {"language": ["zh", "ja", "en"], "tags": ["translation"], "widget": [{"text": "ja2zh: \u543e\u8f29\u306f\u732b\u3067\u3042\u308b\u3002\u540d\u524d\u306f\u307e\u3060\u7121\u3044\u3002"}], "license": "cc-by-nc-sa-4.0"}, "description": "\r\n\r\nThis model is fine-tuned from [mt5-base](https://huggingface.co/google/mt5-base).\r\n\r\nThe model vocabulary is trimmed to ~1/3 by selecting the top 85000 tokens in the training data. 
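Conceptually, the trimming keeps only the most frequently used token ids and slices the embedding matrix down to those rows. Below is a minimal sketch of the idea, not the actual script used for this model (see the gist linked next); `corpus` is a toy stand-in, and for models with an untied LM head the output matrix must be sliced the same way:

```python
from collections import Counter

from transformers import MT5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("google/mt5-base")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-base")

# Toy stand-in for the real training corpus.
corpus = ["Life is like a box of chocolates.", "Stocks rallied and the pound gained."]

# Count how often each token id occurs in the training data.
counts = Counter()
for line in corpus:
    counts.update(tokenizer(line).input_ids)

# Keep the 85000 most frequent ids (in practice, special tokens must be kept too,
# and the SentencePiece vocabulary has to be rebuilt to match the new id mapping).
kept_ids = sorted(i for i, _ in counts.most_common(85000))

# Copy the surviving embedding rows, then shrink the model's vocabulary to match.
kept_rows = model.get_input_embeddings().weight.data[kept_ids].clone()
model.resize_token_embeddings(len(kept_ids))
model.get_input_embeddings().weight.data.copy_(kept_rows)
```
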
The code to trim the vocabulary can be found [here](https://gist.github.com/K024/4a100a0f4f4b07208958e0f3244da6ad).\r\n\r\nUsage:\r\n```python\r\nfrom transformers import (\r\n T5Tokenizer,\r\n MT5ForConditionalGeneration,\r\n Text2TextGenerationPipeline,\r\n)\r\n\r\npath = \"K024/mt5-zh-ja-en-trimmed\"\r\npipe = Text2TextGenerationPipeline(\r\n model=MT5ForConditionalGeneration.from_pretrained(path),\r\n tokenizer=T5Tokenizer.from_pretrained(path),\r\n)\r\n\r\nsentence = \"ja2zh: \u543e\u8f29\u306f\u732b\u3067\u3042\u308b\u3002\u540d\u524d\u306f\u307e\u3060\u7121\u3044\u3002\"\r\nres = pipe(sentence, max_length=100, num_beams=4)\r\nres[0]['generated_text']\r\n```\r\n\r\nTraining data:\r\n```\r\nwikimedia-en-ja\r\nwikimedia-en-zh\r\nwikimedia-ja-zh\r\nwikititles-ja-en\r\nwikititles-zh-en\r\nwikimatrix-ja-zh\r\nnews-commentary-en-ja\r\nnews-commentary-en-zh\r\nnews-commentary-ja-zh\r\nted2020-en-ja\r\nted2020-en-zh\r\nted2020-ja-zh\r\n```\r\n\r\nLicense: [![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]\r\n\r\n[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/\r\n[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png\r\n"} {"downloads": 43019, "id": "facebook/nllb-200-distilled-1.3B", "likes": 20, "pipeline_tag": "translation", "task": "translation", "meta": {"language": ["ace", "acm", "acq", "aeb", "af", "ajp", "ak", "als", "am", "apc", "ar", "ars", "ary", "arz", "as", "ast", "awa", "ayr", "azb", "azj", "ba", "bm", "ban", "be", "bem", "bn", "bho", "bjn", "bo", "bs", "bug", "bg", "ca", "ceb", "cs", "cjk", "ckb", "crh", "cy", "da", "de", "dik", "dyu", "dz", "el", "en", "eo", "et", "eu", "ee", "fo", "fj", "fi", "fon", "fr", "fur", "fuv", "gaz", "gd", "ga", "gl", "gn", "gu", "ht", "ha", "he", "hi", "hne", "hr", "hu", "hy", "ig", "ilo", "id", "is", "it", "jv", "ja", "kab", "kac", "kam", "kn", "ks", "ka", "kk", "kbp", "kea", "khk", "km", "ki", "rw", "ky", "kmb", "kmr", "knc", "kg", "ko", "lo", "lij", "li", "ln", "lt", "lmo", "ltg", "lb", "lua", "lg", "luo", "lus", "lvs", "mag", "mai", "ml", "mar", "min", "mk", "mt", "mni", "mos", "mi", "my", "nl", "nn", "nb", "npi", "nso", "nus", "ny", "oc", "ory", "pag", "pa", "pap", "pbt", "pes", "plt", "pl", "pt", "prs", "quy", "ro", "rn", "ru", "sg", "sa", "sat", "scn", "shn", "si", "sk", "sl", "sm", "sn", "sd", "so", "st", "es", "sc", "sr", "ss", "su", "sv", "swh", "szl", "ta", "taq", "tt", "te", "tg", "tl", "th", "ti", "tpi", "tn", "ts", "tk", "tum", "tr", "tw", "tzm", "ug", "uk", "umb", "ur", "uzn", "vec", "vi", "war", "wo", "xh", "ydd", "yo", "yue", "zh", "zsm", "zu"], "language_details": "ace_Arab, ace_Latn, acm_Arab, acq_Arab, aeb_Arab, afr_Latn, ajp_Arab, aka_Latn, amh_Ethi, apc_Arab, arb_Arab, ars_Arab, ary_Arab, arz_Arab, asm_Beng, ast_Latn, awa_Deva, ayr_Latn, azb_Arab, azj_Latn, bak_Cyrl, bam_Latn, ban_Latn,bel_Cyrl, bem_Latn, ben_Beng, bho_Deva, bjn_Arab, bjn_Latn, bod_Tibt, bos_Latn, bug_Latn, bul_Cyrl, cat_Latn, ceb_Latn, ces_Latn, cjk_Latn, ckb_Arab, crh_Latn, cym_Latn, dan_Latn, deu_Latn, dik_Latn, dyu_Latn, dzo_Tibt, ell_Grek, eng_Latn, epo_Latn, est_Latn, eus_Latn, ewe_Latn, fao_Latn, pes_Arab, fij_Latn, fin_Latn, fon_Latn, fra_Latn, fur_Latn, fuv_Latn, gla_Latn, gle_Latn, glg_Latn, grn_Latn, guj_Gujr, hat_Latn, hau_Latn, heb_Hebr, hin_Deva, hne_Deva, hrv_Latn, hun_Latn, hye_Armn, ibo_Latn, ilo_Latn, ind_Latn, isl_Latn, ita_Latn, jav_Latn, jpn_Jpan, kab_Latn, kac_Latn, kam_Latn, kan_Knda, kas_Arab, kas_Deva, kat_Geor, knc_Arab, knc_Latn, kaz_Cyrl, kbp_Latn, kea_Latn, khm_Khmr, kik_Latn, kin_Latn, kir_Cyrl, 
kmb_Latn, kon_Latn, kor_Hang, kmr_Latn, lao_Laoo, lvs_Latn, lij_Latn, lim_Latn, lin_Latn, lit_Latn, lmo_Latn, ltg_Latn, ltz_Latn, lua_Latn, lug_Latn, luo_Latn, lus_Latn, mag_Deva, mai_Deva, mal_Mlym, mar_Deva, min_Latn, mkd_Cyrl, plt_Latn, mlt_Latn, mni_Beng, khk_Cyrl, mos_Latn, mri_Latn, zsm_Latn, mya_Mymr, nld_Latn, nno_Latn, nob_Latn, npi_Deva, nso_Latn, nus_Latn, nya_Latn, oci_Latn, gaz_Latn, ory_Orya, pag_Latn, pan_Guru, pap_Latn, pol_Latn, por_Latn, prs_Arab, pbt_Arab, quy_Latn, ron_Latn, run_Latn, rus_Cyrl, sag_Latn, san_Deva, sat_Beng, scn_Latn, shn_Mymr, sin_Sinh, slk_Latn, slv_Latn, smo_Latn, sna_Latn, snd_Arab, som_Latn, sot_Latn, spa_Latn, als_Latn, srd_Latn, srp_Cyrl, ssw_Latn, sun_Latn, swe_Latn, swh_Latn, szl_Latn, tam_Taml, tat_Cyrl, tel_Telu, tgk_Cyrl, tgl_Latn, tha_Thai, tir_Ethi, taq_Latn, taq_Tfng, tpi_Latn, tsn_Latn, tso_Latn, tuk_Latn, tum_Latn, tur_Latn, twi_Latn, tzm_Tfng, uig_Arab, ukr_Cyrl, umb_Latn, urd_Arab, uzn_Latn, vec_Latn, vie_Latn, war_Latn, wol_Latn, xho_Latn, ydd_Hebr, yor_Latn, yue_Hant, zho_Hans, zho_Hant, zul_Latn", "tags": ["nllb", "translation"], "license": "cc-by-nc-4.0", "datasets": ["flores-200"], "metrics": ["bleu", "spbleu", "chrf++"], "inference": false}, "description": "\n\n# NLLB-200\n\nThis is the model card of NLLB-200's distilled 1.3B variant.\n\nHere are the [metrics](https://tinyurl.com/nllb200densedst1bmetrics) for that particular checkpoint.\n\n- Information about training algorithms, parameters, fairness constraints or other applied approaches, and features: The exact training algorithm, the data, and the strategies used to handle data imbalances for high- and low-resource languages are described in the paper.\n- Paper or other resource for more information: NLLB Team et al., No Language Left Behind: Scaling Human-Centered Machine Translation, arXiv, 2022\n- License: CC-BY-NC\n- Where to send questions or comments about the model: https://github.com/facebookresearch/fairseq/issues\n\n\n\n## Intended Use\n- Primary intended uses: NLLB-200 is a machine translation model primarily intended for research in machine translation, especially for low-resource languages. It allows for single sentence translation among 200 languages. Information on how to use the model can be found in the Fairseq code repository along with the training code and references to evaluation and training data.\n- Primary intended users: Primary users are researchers and the machine translation research community.\n- Out-of-scope use cases: NLLB-200 is a research model and is not released for production deployment. NLLB-200 is trained on general domain text data and is not intended to be used with domain-specific texts, such as medical or legal text. The model is not intended to be used for document translation. The model was trained with input lengths not exceeding 512 tokens; therefore, translating longer sequences might result in quality degradation. NLLB-200 translations cannot be used as certified translations. \n\n## Metrics\n\u2022 Model performance measures: The NLLB-200 model was evaluated using BLEU, spBLEU, and chrF++, metrics widely adopted by the machine translation community. 
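As a point of reference, corpus-level BLEU and chrF++ scores of this kind can be computed with the `sacrebleu` package. A minimal sketch with toy strings, not the exact FLORES-200/spBLEU setup used in the paper:

```python
import sacrebleu  # pip install sacrebleu

hypotheses = ["The cat sits on the mat."]
references = [["The cat is sitting on the mat."]]  # one inner list per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf_pp = sacrebleu.corpus_chrf(hypotheses, references, word_order=2)  # word_order=2 gives chrF++
print(bleu.score, chrf_pp.score)
```
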
Additionally, we performed human evaluation with the XSTS protocol and measured the toxicity of the generated translations.\n\n\n## Evaluation Data\n- Datasets: The Flores-200 dataset is described in Section 4\n- Motivation: We used Flores-200 as it provides full evaluation coverage of the languages in NLLB-200\n- Preprocessing: Sentence-split raw text data was preprocessed using SentencePiece. The\nSentencePiece model is released along with NLLB-200.\n\n## Training Data\n\u2022 We used parallel multilingual data from a variety of sources to train the model. We provide a detailed report on the data selection and construction process in Section 5 of the paper. We also used monolingual data constructed from Common Crawl. We provide more details in Section 5.2.\n\n## Ethical Considerations\n\u2022 In this work, we took a reflexive approach in technological development to ensure that we prioritize human users and minimize risks that could be transferred to them. While we reflect on our ethical considerations throughout the article, here are some additional points to highlight. For one, many languages chosen for this study are low-resource languages, with a heavy emphasis on African languages. While quality translation could improve education and information access in many of these communities, such access could also make groups with lower levels of digital literacy more vulnerable to misinformation or online scams. The latter scenarios could arise if bad actors misappropriate our work for nefarious activities, which we conceive as an example of unintended use. Regarding data acquisition, the training data used for model development were mined from various publicly available sources on the web. Although we invested heavily in data cleaning, personally identifiable information may not be entirely eliminated. Finally, although we did our best to optimize for translation quality, mistranslations produced by the model could remain. Although the odds are low, this could have an adverse impact on those who rely on these translations to make important decisions (particularly when related to health and safety).\n\n## Caveats and Recommendations\n\u2022 Our model has been tested on the Wikimedia domain with limited investigation of other domains supported in NLLB-MD. In addition, the supported languages may have variations that our model is not capturing. 
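A quick way to gauge this for a language pair of interest is to spot-check a few sentences with `transformers`. A hedged sketch, assuming a recent `transformers` version with NLLB support; the language codes follow the FLORES-200 scheme and the sample sentence is illustrative:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

name = "facebook/nllb-200-distilled-1.3B"
tokenizer = AutoTokenizer.from_pretrained(name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(name)

inputs = tokenizer("The hospital is two kilometres north of the river.", return_tensors="pt")
out = model.generate(
    **inputs,
    # Force the target language code as the first generated token.
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("fra_Latn"),
    max_length=64,
)
print(tokenizer.batch_decode(out, skip_special_tokens=True)[0])
```
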
Users should make appropriate assessments.\n\n## Carbon Footprint Details\n\u2022 The carbon dioxide (CO2e) estimate is reported in Section 8.8."} {"downloads": 253, "id": "facebook/wmt21-dense-24-wide-en-x", "likes": 20, "pipeline_tag": "translation", "task": "translation", "meta": {"language": ["multilingual", "ha", "is", "ja", "cs", "ru", "zh", "de", "en"], "license": "mit", "tags": ["translation", "wmt21"]}, "description": "\n\n# WMT 21 En-X\nWMT 21 En-X is a 4.7B multilingual encoder-decoder (seq-to-seq) model trained for one-to-many multilingual translation.\nIt was introduced in this [paper](https://arxiv.org/abs/2108.03265) and first released in [this](https://github.com/pytorch/fairseq/tree/main/examples/wmt21) repository.\n\nThe model can directly translate English text into 7 other languages: Hausa (ha), Icelandic (is), Japanese (ja), Czech (cs), Russian (ru), Chinese (zh), German (de).\n\nTo translate into a target language, the target language id is forced as the first generated token: pass the `forced_bos_token_id` parameter to the `generate` method.\n\n*Note: `M2M100Tokenizer` depends on `sentencepiece`, so make sure to install it before running the example.*\n\nTo install `sentencepiece`, run `pip install sentencepiece`.\n\nSince the model was trained with domain tags, you should prepend them to the input as well:\n* \"wmtdata newsdomain\": Use for sentences in the news domain\n* \"wmtdata otherdomain\": Use for sentences in all other domains\n\n```python\nfrom transformers import AutoModelForSeq2SeqLM, AutoTokenizer\n\nmodel = AutoModelForSeq2SeqLM.from_pretrained(\"facebook/wmt21-dense-24-wide-en-x\")\ntokenizer = AutoTokenizer.from_pretrained(\"facebook/wmt21-dense-24-wide-en-x\")\n\ninputs = tokenizer(\"wmtdata newsdomain One model for many languages.\", return_tensors=\"pt\")\n\n# translate English to German\ngenerated_tokens = model.generate(**inputs, forced_bos_token_id=tokenizer.get_lang_id(\"de\"))\ntokenizer.batch_decode(generated_tokens, skip_special_tokens=True)\n# => \"Ein Modell f\u00fcr viele Sprachen.\"\n\n# translate English to Icelandic\ngenerated_tokens = model.generate(**inputs, forced_bos_token_id=tokenizer.get_lang_id(\"is\"))\ntokenizer.batch_decode(generated_tokens, skip_special_tokens=True)\n# => \"Ein fyrirmynd fyrir m\u00f6rg tungum\u00e1l.\"\n```\n\nSee the [model hub](https://huggingface.co/models?filter=wmt21) to look for more fine-tuned versions.\n\n\n## Languages covered\nEnglish (en), Hausa (ha), Icelandic (is), Japanese (ja), Czech (cs), Russian (ru), Chinese (zh), German (de)\n\n\n## BibTeX entry and citation info\n```\n@inproceedings{tran2021facebook,\n title={Facebook AI\u2019s WMT21 News Translation Task Submission},\n author={Chau Tran and Shruti Bhosale and James Cross and Philipp Koehn and Sergey Edunov and Angela Fan},\n booktitle={Proc. 
of WMT},\n year={2021},\n}\n```"} {"downloads": 439643, "id": "Helsinki-NLP/opus-mt-es-en", "likes": 19, "pipeline_tag": "translation", "task": "translation", "meta": {"language": ["es", "en"], "tags": ["translation"], "license": "apache-2.0"}, "description": "\n\n### spa-eng\n\n* source group: Spanish \n* target group: English \n* OPUS readme: [spa-eng](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/spa-eng/README.md)\n\n* model: transformer\n* source language(s): spa\n* target language(s): eng\n* model: transformer\n* pre-processing: normalization + SentencePiece (spm32k,spm32k)\n* download original weights: [opus-2020-08-18.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/spa-eng/opus-2020-08-18.zip)\n* test set translations: [opus-2020-08-18.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/spa-eng/opus-2020-08-18.test.txt)\n* test set scores: [opus-2020-08-18.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/spa-eng/opus-2020-08-18.eval.txt)\n\n## Benchmarks\n\n| testset | BLEU | chr-F |\n|"} {"downloads": 11008, "id": "staka/fugumt-en-ja", "likes": 19, "pipeline_tag": "translation", "task": "translation", "meta": {"license": "cc-by-sa-4.0", "language": ["en", "ja"], "tags": ["translation"]}, "description": "\n\n# FuguMT\n\nThis is a translation model using Marian-NMT.\nFor more details, please see [my repository](https://github.com/s-taka/fugumt).\n\n* source language: en\n* target language: ja \n\n### How to use\n\nThis model uses transformers and sentencepiece.\n```python\n!pip install transformers sentencepiece\n```\n\nYou can use this model directly with a pipeline:\n```python\nfrom transformers import pipeline\nfugu_translator = pipeline('translation', model='staka/fugumt-en-ja')\nfugu_translator('This is a cat.')\n```\n\nIf you want to translate multiple sentences, we recommend using [pySBD](https://github.com/nipunsadvilkar/pySBD).\n```python\n!pip install transformers sentencepiece pysbd\n\nimport pysbd\nseg_en = pysbd.Segmenter(language=\"en\", clean=False)\n\nfrom transformers import pipeline\nfugu_translator = pipeline('translation', model='staka/fugumt-en-ja')\ntxt = 'This is a cat. 
It is very cute.'\nprint(fugu_translator(seg_en.segment(txt)))\n```\n\n\n### Eval results\n\nThe results of the evaluation using [tatoeba](https://tatoeba.org/ja)(randomly selected 500 sentences) are as follows:\n\n|source |target |BLEU(*1)| \n|"} {"downloads": 200368, "id": "Helsinki-NLP/opus-mt-ru-en", "likes": 18, "pipeline_tag": "translation", "task": "translation", "meta": {"tags": ["translation"], "license": "cc-by-4.0"}, "description": "\n\n### opus-mt-ru-en\n\n## Table of Contents\n- [Model Details](#model-details)\n- [Uses](#uses)\n- [Risks, Limitations and Biases](#risks-limitations-and-biases)\n- [Training](#training)\n- [Evaluation](#evaluation)\n- [Citation Information](#citation-information)\n- [How to Get Started With the Model](#how-to-get-started-with-the-model)\n\n## Model Details\n**Model Description:**\n- **Developed by:** Language Technology Research Group at the University of Helsinki\n- **Model Type:** Transformer-align\n- **Language(s):** \n - Source Language: Russian\n - Target Language: English\n- **License:** CC-BY-4.0\n- **Resources for more information:**\n - [GitHub Repo](https://github.com/Helsinki-NLP/OPUS-MT-train)\n\n\n\n## Uses\n\n#### Direct Use\n\nThis model can be used for translation and text-to-text generation.\n\n\n## Risks, Limitations and Biases\n\n**CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.**\n\nSignificant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).\n\nFurther details about the dataset for this model can be found in the OPUS readme: [ru-en](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/models/ru-en/README.md)\n\n## Training\n#### Training Data\n##### Preprocessing\n* Pre-processing: Normalization + SentencePiece\n* Dataset: [opus](https://github.com/Helsinki-NLP/Opus-MT)\n* Download original weights: [opus-2020-02-26.zip](https://object.pouta.csc.fi/OPUS-MT-models/ru-en/opus-2020-02-26.zip)\n\n* Test set translations: [opus-2020-02-26.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/ru-en/opus-2020-02-26.test.txt)\n\n\n## Evaluation\n\n#### Results\n\n* test set scores: [opus-2020-02-26.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/ru-en/opus-2020-02-26.eval.txt)\n\n#### Benchmarks\n\n| testset | BLEU | chr-F |\n|"} {"downloads": 36275, "id": "Helsinki-NLP/opus-mt-tr-en", "likes": 18, "pipeline_tag": "translation", "task": "translation", "meta": {"tags": ["translation"], "license": "apache-2.0"}, "description": "\n\n### opus-mt-tr-en\n\n* source languages: tr\n* target languages: en\n* OPUS readme: [tr-en](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/models/tr-en/README.md)\n\n* dataset: opus\n* model: transformer-align\n* pre-processing: normalization + SentencePiece\n* download original weights: [opus-2020-01-16.zip](https://object.pouta.csc.fi/OPUS-MT-models/tr-en/opus-2020-01-16.zip)\n* test set translations: [opus-2020-01-16.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/tr-en/opus-2020-01-16.test.txt)\n* test set scores: [opus-2020-01-16.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/tr-en/opus-2020-01-16.eval.txt)\n\n## Benchmarks\n\n| testset | BLEU | chr-F |\n|"} {"downloads": 6807, "id": "liam168/trans-opus-mt-en-zh", "likes": 18, "pipeline_tag": "translation", "task": "translation", "meta": 
{"language": ["en", "zh"], "tags": ["translation"], "widget": [{"text": "I like to study Data Science and Machine Learning."}]}, "description": "\n\n# liam168/trans-opus-mt-en-zh\n\n## Model description\n\n* source group: English\n* target group: Chinese\n* model: transformer\n* source language(s): eng\n* target language(s): cjy_Hans cjy_Hant cmn cmn_Hans cmn_Hant gan lzh lzh_Hans nan wuu yue yue_Hans yue_Hant\n\n## How to use\n\n```python\n>>> from transformers import AutoModelWithLMHead,AutoTokenizer,pipeline\n>>> mode_name = 'liam168/trans-opus-mt-en-zh'\n>>> model = AutoModelWithLMHead.from_pretrained(mode_name)\n>>> tokenizer = AutoTokenizer.from_pretrained(mode_name)\n>>> translation = pipeline(\"translation_en_to_zh\", model=model, tokenizer=tokenizer)\n>>> translation('I like to study Data Science and Machine Learning.', max_length=400)\n [{'translation_text': '\u6211\u559c\u6b22\u5b66\u4e60\u6570\u636e\u79d1\u5b66\u548c\u673a\u5668\u5b66\u4e60'}]\n```\n\n## Contact\n\nliam168520@gmail.com\n"} {"downloads": 23542, "id": "Helsinki-NLP/opus-mt-ja-en", "likes": 17, "pipeline_tag": "translation", "task": "translation", "meta": {"tags": ["translation"], "license": "apache-2.0"}, "description": "\n\n### opus-mt-ja-en\n\n* source languages: ja\n* target languages: en\n* OPUS readme: [ja-en](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/models/ja-en/README.md)\n\n* dataset: opus\n* model: transformer-align\n* pre-processing: normalization + SentencePiece\n* download original weights: [opus-2019-12-18.zip](https://object.pouta.csc.fi/OPUS-MT-models/ja-en/opus-2019-12-18.zip)\n* test set translations: [opus-2019-12-18.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/ja-en/opus-2019-12-18.test.txt)\n* test set scores: [opus-2019-12-18.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/ja-en/opus-2019-12-18.eval.txt)\n\n## Benchmarks\n\n| testset | BLEU | chr-F |\n|"} {"downloads": 385445, "id": "Helsinki-NLP/opus-mt-fr-en", "likes": 14, "pipeline_tag": "translation", "task": "translation", "meta": {"tags": ["translation"], "license": "apache-2.0"}, "description": "\n\n### opus-mt-fr-en\n\n* source languages: fr\n* target languages: en\n* OPUS readme: [fr-en](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/models/fr-en/README.md)\n\n* dataset: opus\n* model: transformer-align\n* pre-processing: normalization + SentencePiece\n* download original weights: [opus-2020-02-26.zip](https://object.pouta.csc.fi/OPUS-MT-models/fr-en/opus-2020-02-26.zip)\n* test set translations: [opus-2020-02-26.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/fr-en/opus-2020-02-26.test.txt)\n* test set scores: [opus-2020-02-26.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/fr-en/opus-2020-02-26.eval.txt)\n\n## Benchmarks\n\n| testset | BLEU | chr-F |\n|"} {"downloads": 62896, "id": "Helsinki-NLP/opus-mt-en-ru", "likes": 13, "pipeline_tag": "translation", "task": "translation", "meta": {"tags": ["translation"], "license": "apache-2.0"}, "description": "\n\n### opus-mt-en-ru\n\n* source languages: en\n* target languages: ru\n* OPUS readme: [en-ru](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/models/en-ru/README.md)\n\n* dataset: opus\n* model: transformer-align\n* pre-processing: normalization + SentencePiece\n* download original weights: [opus-2020-02-11.zip](https://object.pouta.csc.fi/OPUS-MT-models/en-ru/opus-2020-02-11.zip)\n* test set translations: [opus-2020-02-11.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/en-ru/opus-2020-02-11.test.txt)\n* test set 
scores: [opus-2020-02-11.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/en-ru/opus-2020-02-11.eval.txt)\n\n## Benchmarks\n\n| testset | BLEU | chr-F |\n|"} {"downloads": 2131, "id": "alirezamsh/small100", "likes": 13, "pipeline_tag": "translation", "task": "translation", "meta": {"language": ["multilingual", "af", "am", "ar", "ast", "az", "ba", "be", "bg", "bn", "br", "bs", "ca", "ceb", "cs", "cy", "da", "de", "el", "en", "es", "et", "fa", "ff", "fi", "fr", "fy", "ga", "gd", "gl", "gu", "ha", "he", "hi", "hr", "ht", "hu", "hy", "id", "ig", "ilo", "is", "it", "ja", "jv", "ka", "kk", "km", "kn", "ko", "lb", "lg", "ln", "lo", "lt", "lv", "mg", "mk", "ml", "mn", "mr", "ms", "my", "ne", "nl", "no", "ns", "oc", "or", "pa", "pl", "ps", "pt", "ro", "ru", "sd", "si", "sk", "sl", "so", "sq", "sr", "ss", "su", "sv", "sw", "ta", "th", "tl", "tn", "tr", "uk", "ur", "uz", "vi", "wo", "xh", "yi", "yo", "zh", "zu"], "license": "mit", "tags": ["small100", "translation", "flores101", "gsarti/flores_101", "tico19", "gmnlp/tico19", "tatoeba"], "datasets": ["tico19", "flores101", "tatoeba"]}, "description": "\n\n# SMALL-100 Model\n\nSMaLL-100 is a compact and fast massively multilingual machine translation model covering more than 10K language pairs that achieves competitive results with M2M-100 while being much smaller and faster. It was introduced in [this paper](https://arxiv.org/abs/2210.11621) (accepted to EMNLP 2022) and initially released in [this repository](https://github.com/alirezamshi/small100).\n\nThe model architecture and config are the same as the [M2M-100](https://huggingface.co/facebook/m2m100_418M/tree/main) implementation, but the tokenizer is modified to adjust language codes. For the moment, you should therefore load the tokenizer locally from the [tokenization_small100.py](https://huggingface.co/alirezamsh/small100/blob/main/tokenization_small100.py) file.\n\n**Demo**: https://huggingface.co/spaces/alirezamsh/small100\n\n**Note**: SMALL100Tokenizer requires sentencepiece, so make sure to install it by:\n\n```pip install sentencepiece```\n\n- **Supervised Training**\n\nSMaLL-100 is a seq-to-seq model for the translation task. The input to the model is ```source:[tgt_lang_code] + src_tokens + [EOS]``` and ```target: tgt_tokens + [EOS]```. 
\n\nAn example of supervised training is shown below:\n\n```python\nfrom transformers import M2M100ForConditionalGeneration\nfrom tokenization_small100 import SMALL100Tokenizer\n\nmodel = M2M100ForConditionalGeneration.from_pretrained(\"alirezamsh/small100\")\ntokenizer = SMALL100Tokenizer.from_pretrained(\"alirezamsh/small100\", tgt_lang=\"fr\")\n\nsrc_text = \"Life is like a box of chocolates.\"\ntgt_text = \"La vie est comme une bo\u00eete de chocolat.\"\n\nmodel_inputs = tokenizer(src_text, text_target=tgt_text, return_tensors=\"pt\")\n\nloss = model(**model_inputs).loss # forward pass\n```\n\nTraining data can be provided upon request.\n\n- **Generation**\n\nA beam size of 5 and a maximum target length of 256 are used for generation.\n\n```python\nfrom transformers import M2M100ForConditionalGeneration\nfrom tokenization_small100 import SMALL100Tokenizer\n\nhi_text = \"\u091c\u0940\u0935\u0928 \u090f\u0915 \u091a\u0949\u0915\u0932\u0947\u091f \u092c\u0949\u0915\u094d\u0938 \u0915\u0940 \u0924\u0930\u0939 \u0939\u0948\u0964\"\nchinese_text = \"\u751f\u6d3b\u5c31\u50cf\u4e00\u76d2\u5de7\u514b\u529b\u3002\"\n\nmodel = M2M100ForConditionalGeneration.from_pretrained(\"alirezamsh/small100\")\ntokenizer = SMALL100Tokenizer.from_pretrained(\"alirezamsh/small100\")\n\n# translate Hindi to French\ntokenizer.tgt_lang = \"fr\"\nencoded_hi = tokenizer(hi_text, return_tensors=\"pt\")\ngenerated_tokens = model.generate(**encoded_hi)\ntokenizer.batch_decode(generated_tokens, skip_special_tokens=True)\n# => \"La vie est comme une bo\u00eete de chocolat.\"\n\n# translate Chinese to English\ntokenizer.tgt_lang = \"en\"\nencoded_zh = tokenizer(chinese_text, return_tensors=\"pt\")\ngenerated_tokens = model.generate(**encoded_zh)\ntokenizer.batch_decode(generated_tokens, skip_special_tokens=True)\n# => \"Life is like a box of chocolate.\"\n```\n\n- **Evaluation**\n\nPlease refer to the [original repository](https://github.com/alirezamshi/small100) for spBLEU computation.\n\n- **Languages Covered**\n\nAfrikaans (af), Amharic (am), Arabic (ar), Asturian (ast), Azerbaijani (az), Bashkir (ba), Belarusian (be), Bulgarian (bg), Bengali (bn), Breton (br), Bosnian (bs), Catalan; Valencian (ca), Cebuano (ceb), Czech (cs), Welsh (cy), Danish (da), German (de), Greek (el), English (en), Spanish (es), Estonian (et), Persian (fa), Fulah (ff), Finnish (fi), French (fr), Western Frisian (fy), Irish (ga), Gaelic; Scottish Gaelic (gd), Galician (gl), Gujarati (gu), Hausa (ha), Hebrew (he), Hindi (hi), Croatian (hr), Haitian; Haitian Creole (ht), Hungarian (hu), Armenian (hy), Indonesian (id), Igbo (ig), Iloko (ilo), Icelandic (is), Italian (it), Japanese (ja), Javanese (jv), Georgian (ka), Kazakh (kk), Central Khmer (km), Kannada (kn), Korean (ko), Luxembourgish; Letzeburgesch (lb), Ganda (lg), Lingala (ln), Lao (lo), Lithuanian (lt), Latvian (lv), Malagasy (mg), Macedonian (mk), Malayalam (ml), Mongolian (mn), Marathi (mr), Malay (ms), Burmese (my), Nepali (ne), Dutch; Flemish (nl), Norwegian (no), Northern Sotho (ns), Occitan (post 1500) (oc), Oriya (or), Panjabi; Punjabi (pa), Polish (pl), Pushto; Pashto (ps), Portuguese (pt), Romanian; Moldavian; Moldovan (ro), Russian (ru), Sindhi (sd), Sinhala; Sinhalese (si), Slovak (sk), Slovenian (sl), Somali (so), Albanian (sq), Serbian (sr), Swati (ss), Sundanese (su), Swedish (sv), Swahili (sw), Tamil (ta), Thai (th), Tagalog (tl), Tswana (tn), Turkish (tr), Ukrainian (uk), Urdu (ur), Uzbek (uz), Vietnamese (vi), Wolof (wo), Xhosa (xh), Yiddish (yi), Yoruba (yo), Chinese (zh), Zulu (zu)\n\n# 
Citation\n\nIf you use this model for your research, please cite the following work:\n```\n@inproceedings{mohammadshahi-etal-2022-small,\n title = \"{SM}a{LL}-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages\",\n author = \"Mohammadshahi, Alireza and\n Nikoulina, Vassilina and\n Berard, Alexandre and\n Brun, Caroline and\n Henderson, James and\n Besacier, Laurent\",\n booktitle = \"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing\",\n month = dec,\n year = \"2022\",\n address = \"Abu Dhabi, United Arab Emirates\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://aclanthology.org/2022.emnlp-main.571\",\n pages = \"8348--8359\",\n abstract = \"In recent years, multilingual machine translation models have achieved promising performance on low-resource language pairs by sharing information between similar languages, thus enabling zero-shot translation. To overcome the {``}curse of multilinguality{''}, these models often opt for scaling up the number of parameters, which makes their use in resource-constrained environments challenging. We introduce SMaLL-100, a distilled version of the M2M-100(12B) model, a massively multilingual machine translation model covering 100 languages. We train SMaLL-100 with uniform sampling across all language pairs and therefore focus on preserving the performance of low-resource languages. We evaluate SMaLL-100 on different low-resource benchmarks: FLORES-101, Tatoeba, and TICO-19 and demonstrate that it outperforms previous massively multilingual models of comparable sizes (200-600M) while improving inference latency and memory usage. Additionally, our model achieves comparable results to M2M-100 (1.2B), while being 3.6x smaller and 4.3x faster at inference.\",\n}\n\n@inproceedings{mohammadshahi-etal-2022-compressed,\n title = \"What Do Compressed Multilingual Machine Translation Models Forget?\",\n author = \"Mohammadshahi, Alireza and\n Nikoulina, Vassilina and\n Berard, Alexandre and\n Brun, Caroline and\n Henderson, James and\n Besacier, Laurent\",\n booktitle = \"Findings of the Association for Computational Linguistics: EMNLP 2022\",\n month = dec,\n year = \"2022\",\n address = \"Abu Dhabi, United Arab Emirates\",\n publisher = \"Association for Computational Linguistics\",\n url = \"https://aclanthology.org/2022.findings-emnlp.317\",\n pages = \"4308--4329\",\n abstract = \"Recently, very large pre-trained models achieve state-of-the-art results in various natural language processing (NLP) tasks, but their size makes it more challenging to apply them in resource-constrained environments. Compression techniques allow to drastically reduce the size of the models and therefore their inference time with negligible impact on top-tier metrics. However, the general performance averaged across multiple tasks and/or languages may hide a drastic performance drop on under-represented features, which could result in the amplification of biases encoded by the models. In this work, we assess the impact of compression methods on Multilingual Neural Machine Translation models (MNMT) for various language groups, gender, and semantic biases by extensive analysis of compressed models on different machine translation benchmarks, i.e. FLORES-101, MT-Gender, and DiBiMT. We show that the performance of under-represented languages drops significantly, while the average BLEU metric only slightly decreases. 
Interestingly, the removal of noisy memorization with compression leads to a significant improvement for some medium-resource languages. Finally, we demonstrate that compression amplifies intrinsic gender and semantic biases, even in high-resource languages.\",\n}\n\n```"} {"downloads": 196, "id": "raynardj/wenyanwen-chinese-translate-to-ancient", "likes": 13, "pipeline_tag": "translation", "task": "translation", "meta": {"language": ["zh", "zh"], "tags": ["translation", "\u6587\u8a00\u6587", "ancient"], "license": "apache-2.0", "widget": [{"text": "\u8f7b\u8f7b\u7684\u6211\u8d70\u4e86\uff0c\u6b63\u5982\u6211\u8f7b\u8f7b\u7684\u6765\u3002\u6211\u8f7b\u8f7b\u7684\u62db\u624b\uff0c\u4f5c\u522b\u897f\u5929\u7684\u4e91\u5f69\u3002", "example_title": "\u518d\u522b\u5eb7\u6865"}, {"text": "\u5f53\u6050\u60e7\u901d\u53bb\uff0c\u6211\u4f1a\u6253\u5f00\u5fc3\u773c\uff0c\u770b\u6e05\u5b83\u7684\u8f68\u8ff9\u3002", "example_title": "\u6c99\u4e18"}, {"text": "\u66b4\u529b\u662f\u65e0\u80fd\u8005\u7684\u6700\u540e\u624b\u6bb5", "example_title": "\u57fa\u5730"}]}, "description": "\n\n# From modern Chinese to Ancient Chinese\n> This model translates modern Chinese into Classical Chinese.\n\n* A translator from modern Chinese to Classical Chinese. You are welcome to visit the [GitHub Classical poetry project page: Yuan, for discussion & a \u2b50\ufe0f](https://github.com/raynardj/yuan)\n\n* There is also a companion [\ud83e\udd17 Classical-to-modern Chinese model](https://huggingface.co/raynardj/wenyanwen-ancient-translate-to-modern); its input may be **punctuated** or **unpunctuated**\n\n* The training corpus consists of over 900,000 sentence pairs; [dataset link \ud83d\udcda](https://github.com/BangBOOM/Classical-Chinese).\n\n## Recommended inference setup\n**Note**: you must set the ```eos_token_id``` of the ```generate``` function to 102 to obtain complete sentences; otherwise leftover text remains after the translation (caused by using the pad label = -100 when computing the loss).\n\nThe compute button on the Hugging Face page currently has this issue, so we recommend using the following code to obtain translations \ud83c\udfbb \n```python\nimport torch\nfrom transformers import (\n    EncoderDecoderModel,\n    AutoTokenizer\n)\nPRETRAINED = \"raynardj/wenyanwen-chinese-translate-to-ancient\"\ntokenizer = AutoTokenizer.from_pretrained(PRETRAINED)\nmodel = EncoderDecoderModel.from_pretrained(PRETRAINED)\n\ndef inference(text):\n    tk_kwargs = dict(\n        truncation=True,\n        max_length=128,\n        padding=\"max_length\",\n        return_tensors='pt')\n\n    inputs = tokenizer([text,], **tk_kwargs)\n    with torch.no_grad():\n        return tokenizer.batch_decode(\n            model.generate(\n                inputs.input_ids,\n                attention_mask=inputs.attention_mask,\n                num_beams=3,\n                bos_token_id=101,\n                eos_token_id=tokenizer.sep_token_id,\n                pad_token_id=tokenizer.pad_token_id,\n            ), skip_special_tokens=True)\n```\n\n## Examples from the current version\n> If you come across fun examples while playing with the model, 
your feedback is welcome, too\n\n```python\n>>> inference('\u4f60\u8fde\u4e00\u767e\u5757\u90fd\u4e0d\u80af\u7ed9\u6211')\n['\u4e0d \u80af \u4e0e \u6211 \u767e \u94b1 \u3002']\n```\n\n```python\n>>> inference(\"\u4ed6\u4e0d\u80fd\u505a\u957f\u8fdc\u7684\u8c0b\u5212\")\n['\u4e0d \u80fd \u4e3a \u8fdc \u8c0b \u3002']\n```\n\n```python\n>>> inference(\"\u6211\u4eec\u8981\u5e72\u4e00\u756a\u5927\u4e8b\u4e1a\")\n['\u543e \u5c5e \u5f53 \u4e3e \u5927 \u4e8b \u3002']\n```\n\n```python\n>>> inference(\"\u8fd9\u611f\u89c9\uff0c\u5df2\u7ecf\u4e0d\u5bf9\uff0c\u6211\u52aa\u529b\uff0c\u5728\u633d\u56de\")\n['\u6b64 \u4e4b \u8c13 \u4e5f \uff0c \u5df2 \u4e0d \u53ef \u77e3 \uff0c \u6211 \u52c9 \u4e4b \uff0c \u4ee5 \u56de \u4e4b \u3002']\n```\n\n```python\n>>> inference(\"\u8f7b\u8f7b\u5730\u6211\u8d70\u4e86\uff0c \u6b63\u5982\u6211\u8f7b\u8f7b\u5730\u6765\uff0c \u6211\u6325\u4e00\u6325\u8863\u8896\uff0c\u4e0d\u5e26\u8d70\u4e00\u7247\u4e91\u5f69\")\n['\u8f7b \u6211 \u884c \uff0c \u5982 \u6211 \u8f7b \u6765 \uff0c \u6325 \u8882 \u4e0d \u643a \u4e00 \u7247 \u4e91 \u3002']\n```\n\n## Other Classical Chinese poetry resources\n* [Project source code \ud83c\udf1f, stars and PRs welcome](https://github.com/raynardj/yuan)\n* [Cross-language search \ud83d\udd0e](https://huggingface.co/raynardj/xlsearch-cross-lang-search-zh-vs-classicical-cn)\n* [Modern Chinese to Classical Chinese translation model \u26f0](https://huggingface.co/raynardj/wenyanwen-chinese-translate-to-ancient)\n* [Classical Chinese to modern Chinese translation model; the input may be unpunctuated \ud83d\ude80](https://huggingface.co/raynardj/wenyanwen-ancient-translate-to-modern)\n* [Sentence segmentation (punctuation) model \ud83d\udde1](https://huggingface.co/raynardj/classical-chinese-punctuation-guwen-biaodian)\n* [Imagery keywords and acrostic poem writing \ud83e\udd16](https://huggingface.co/raynardj/keywords-cangtou-chinese-poetry)\n"} {"downloads": 12392, "id": "Helsinki-NLP/opus-mt-ko-en", "likes": 12, "pipeline_tag": "translation", "task": "translation", "meta": {"language": ["ko", "en"], "tags": ["translation"], "license": "apache-2.0"}, "description": "\n\n### kor-eng\n\n* source group: Korean \n* target group: English \n* OPUS readme: [kor-eng](https://github.com/Helsinki-NLP/Tatoeba-Challenge/tree/master/models/kor-eng/README.md)\n\n* model: transformer-align\n* source language(s): kor kor_Hang kor_Latn\n* target language(s): eng\n* pre-processing: normalization + SentencePiece (spm32k,spm32k)\n* download original weights: [opus-2020-06-17.zip](https://object.pouta.csc.fi/Tatoeba-MT-models/kor-eng/opus-2020-06-17.zip)\n* test set translations: [opus-2020-06-17.test.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/kor-eng/opus-2020-06-17.test.txt)\n* test set scores: [opus-2020-06-17.eval.txt](https://object.pouta.csc.fi/Tatoeba-MT-models/kor-eng/opus-2020-06-17.eval.txt)\n\n## Benchmarks\n\n| testset | BLEU | chr-F |\n|"} {"downloads": 352761, "id": "Helsinki-NLP/opus-mt-en-fr", "likes": 11, "pipeline_tag": "translation", "task": "translation", "meta": {"tags": ["translation"], "license": "apache-2.0"}, "description": "\n\n### opus-mt-en-fr\n\n* source languages: en\n* target languages: fr\n* OPUS readme: [en-fr](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/models/en-fr/README.md)\n\n* dataset: opus\n* model: transformer-align\n* pre-processing: normalization + SentencePiece\n* 
download original weights: [opus-2020-02-26.zip](https://object.pouta.csc.fi/OPUS-MT-models/en-fr/opus-2020-02-26.zip)\n* test set translations: [opus-2020-02-26.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/en-fr/opus-2020-02-26.test.txt)\n* test set scores: [opus-2020-02-26.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/en-fr/opus-2020-02-26.eval.txt)\n\n## Benchmarks\n\n| testset | BLEU | chr-F |\n|"} {"downloads": 240106, "id": "Helsinki-NLP/opus-mt-de-en", "likes": 10, "pipeline_tag": "translation", "task": "translation", "meta": {"tags": ["translation"], "license": "apache-2.0"}, "description": "\n\n### opus-mt-de-en\n\n* source languages: de\n* target languages: en\n* OPUS readme: [de-en](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/models/de-en/README.md)\n\n* dataset: opus\n* model: transformer-align\n* pre-processing: normalization + SentencePiece\n* download original weights: [opus-2020-02-26.zip](https://object.pouta.csc.fi/OPUS-MT-models/de-en/opus-2020-02-26.zip)\n* test set translations: [opus-2020-02-26.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/de-en/opus-2020-02-26.test.txt)\n* test set scores: [opus-2020-02-26.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/de-en/opus-2020-02-26.eval.txt)\n\n## Benchmarks\n\n| testset | BLEU | chr-F |\n|"} {"downloads": 96978, "id": "t5-3b", "likes": 10, "pipeline_tag": "translation", "task": "translation", "meta": {"language": ["en", "fr", "ro", "de", "multilingual"], "license": "apache-2.0", "tags": ["summarization", "translation"], "datasets": ["c4"]}, "description": "\n\n# Model Card for T5-3B\n\n![model image](https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67)\n\n# Table of Contents\n\n1. [Model Details](#model-details)\n2. [Uses](#uses)\n3. [Bias, Risks, and Limitations](#bias-risks-and-limitations)\n4. [Training Details](#training-details)\n5. [Evaluation](#evaluation)\n6. [Environmental Impact](#environmental-impact)\n7. [Citation](#citation)\n8. [Model Card Authors](#model-card-authors)\n9. [How To Get Started With the Model](#how-to-get-started-with-the-model)\n\n# Model Details\n\n## Model Description\n\nThe developers of the Text-To-Text Transfer Transformer (T5) [write](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html): \n\n> With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task.\n\nT5-3B is the checkpoint with 3 billion parameters. \n\n- **Developed by:** Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. 
See [associated paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) and [GitHub repo](https://github.com/google-research/text-to-text-transfer-transformer#released-model-checkpoints)\n- **Model type:** Language model\n- **Language(s) (NLP):** English, French, Romanian, German\n- **License:** Apache 2.0\n- **Related Models:** [All T5 Checkpoints](https://huggingface.co/models?search=t5)\n- **Resources for more information:**\n - [Research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf)\n - [Google's T5 Blog Post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) \n - [GitHub Repo](https://github.com/google-research/text-to-text-transfer-transformer)\n - [Hugging Face T5 Docs](https://huggingface.co/docs/transformers/model_doc/t5)\n \n# Uses\n\n## Direct Use and Downstream Use\n\nIn a [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html), the developers write: \n\n> Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself.\n\nSee the [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) and [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.\n\n## Out-of-Scope Use\n\nMore information needed.\n\n# Bias, Risks, and Limitations\n\nMore information needed.\n\n## Recommendations\n\nMore information needed.\n\n# Training Details\n\n## Training Data\n\nThe model is pre-trained on the [Colossal Clean Crawled Corpus (C4)](https://www.tensorflow.org/datasets/catalog/c4), which was developed and released in the context of the same [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) as T5.\n\nThe model was pre-trained on a **multi-task mixture of unsupervised (1.) and supervised tasks (2.)**.\nThe following datasets were used for (1.) and (2.):\n\n1. **Datasets used for Unsupervised denoising objective**:\n\n- [C4](https://huggingface.co/datasets/c4)\n- [Wiki-DPR](https://huggingface.co/datasets/wiki_dpr)\n\n\n2. 
**Datasets used for Supervised text-to-text language modeling objective**\n\n- Sentence acceptability judgment\n - CoLA [Warstadt et al., 2018](https://arxiv.org/abs/1805.12471)\n- Sentiment analysis \n - SST-2 [Socher et al., 2013](https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf)\n- Paraphrasing/sentence similarity\n - MRPC [Dolan and Brockett, 2005](https://aclanthology.org/I05-5002)\n - STS-B [Cer et al., 2017](https://arxiv.org/abs/1708.00055)\n - QQP [Iyer et al., 2017](https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs)\n- Natural language inference\n - MNLI [Williams et al., 2017](https://arxiv.org/abs/1704.05426)\n - QNLI [Rajpurkar et al., 2016](https://arxiv.org/abs/1606.05250)\n - RTE [Dagan et al., 2005](https://link.springer.com/chapter/10.1007/11736790_9) \n - CB [De Marneffe et al., 2019](https://semanticsarchive.net/Archive/Tg3ZGI2M/Marneffe.pdf)\n- Sentence completion\n - COPA [Roemmele et al., 2011](https://www.researchgate.net/publication/221251392_Choice_of_Plausible_Alternatives_An_Evaluation_of_Commonsense_Causal_Reasoning)\n- Word sense disambiguation\n - WIC [Pilehvar and Camacho-Collados, 2018](https://arxiv.org/abs/1808.09121)\n- Question answering\n - MultiRC [Khashabi et al., 2018](https://aclanthology.org/N18-1023)\n - ReCoRD [Zhang et al., 2018](https://arxiv.org/abs/1810.12885)\n - BoolQ [Clark et al., 2019](https://arxiv.org/abs/1905.10044)\n\n## Training Procedure\n\nIn their [abstract](https://jmlr.org/papers/volume21/20-074/20-074.pdf), the model developers write: \n\n> In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. \n\nThe framework introduced, the T5 framework, involves a training procedure that brings together the approaches studied in the paper. See the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.\n\n# Evaluation\n\n## Testing Data, Factors & Metrics\n\nThe developers evaluated the model on 24 tasks; see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for full details.\n\n## Results \n\nFor full results for T5-3B, see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf), Table 14.\n\n# Environmental Impact\n\nCarbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).\n\n- **Hardware Type:** Google Cloud TPU Pods\n- **Hours used:** More information needed\n- **Cloud Provider:** GCP\n- **Compute Region:** More information needed\n- **Carbon Emitted:** More information needed\n\n# Citation\n\n**BibTeX:**\n\n```bibtex\n@article{2020t5,\n author = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},\n title = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},\n journal = {Journal of Machine Learning Research},\n year = {2020},\n volume = {21},\n number = {140},\n pages = {1-67},\n url = {http://jmlr.org/papers/v21/20-074.html}\n}\n```\n\n**APA:**\n- Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). 
Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140), 1-67.\n\n# Model Card Authors\n\nThis model card was written by the team at Hugging Face.\n\n# How to Get Started with the Model\n\nSee the [Hugging Face T5](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Model) docs and a [Colab Notebook](https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/main/notebooks/t5-trivia.ipynb) created by the model developers for more context on how to get started with this checkpoint.\n\n"} {"downloads": 8552, "id": "staka/fugumt-ja-en", "likes": 10, "pipeline_tag": "translation", "task": "translation", "meta": {"license": "cc-by-sa-4.0", "language": ["en", "ja"], "tags": ["translation"], "widget": [{"text": "\u732b\u306f\u304b\u308f\u3044\u3044\u3067\u3059\u3002"}]}, "description": "\n\n# FuguMT\n\nThis is a translation model using Marian-NMT.\nFor more details, please see [my repository](https://github.com/s-taka/fugumt).\n\n* source language: ja\n* target language: en \n\n### How to use\n\nThis model uses transformers and sentencepiece.\n```python\n!pip install transformers sentencepiece\n```\n\nYou can use this model directly with a pipeline:\n\n```python\nfrom transformers import pipeline\nfugu_translator = pipeline('translation', model='staka/fugumt-ja-en')\nfugu_translator('\u732b\u306f\u304b\u308f\u3044\u3044\u3067\u3059\u3002')\n```\n\n### Eval results\n\nThe results of the evaluation using [tatoeba](https://tatoeba.org/ja) (randomly selected 500 sentences) are as follows:\n\n|source |target |BLEU(*1)| \n|"} {"downloads": 217706, "id": "Helsinki-NLP/opus-mt-en-de", "likes": 9, "pipeline_tag": "translation", "task": "translation", "meta": {"tags": ["translation"], "license": "cc-by-4.0"}, "description": "\n\n### opus-mt-en-de\n\n\n## Table of Contents\n- [Model Details](#model-details)\n- [Uses](#uses)\n- [Risks, Limitations and Biases](#risks-limitations-and-biases)\n- [Training](#training)\n- [Evaluation](#evaluation)\n- [Citation Information](#citation-information)\n- [How to Get Started With the Model](#how-to-get-started-with-the-model)\n\n## Model Details\n**Model Description:**\n- **Developed by:** Language Technology Research Group at the University of Helsinki\n- **Model Type:** Translation\n- **Language(s):** \n - Source Language: English\n - Target Language: German \n- **License:** CC-BY-4.0\n- **Resources for more information:**\n - [GitHub Repo](https://github.com/Helsinki-NLP/OPUS-MT-train)\n \n\n## Uses\n\n#### Direct Use\n\nThis model can be used for translation and text-to-text generation.\n
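\nFor example, a minimal sketch with the `transformers` pipeline (the input sentence is illustrative, not from the original card):\n\n```python\nfrom transformers import pipeline\n\n# Load the English-to-German translation checkpoint.\ntranslator = pipeline('translation', model='Helsinki-NLP/opus-mt-en-de')\n\n# The pipeline returns a list of dicts with a 'translation_text' field.\nprint(translator('Machine translation is useful.')[0]['translation_text'])\n```\n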
\n## Risks, Limitations and Biases\n\n**CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes.**\n\nSignificant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)).\n\nFurther details about the dataset for this model can be found in the OPUS readme: [en-de](https://github.com/Helsinki-NLP/OPUS-MT-train/blob/master/models/en-de/README.md)\n\n## Training\n\n#### Training Data\n\n##### Preprocessing\n\n* pre-processing: normalization + SentencePiece\n\n* dataset: [opus](https://github.com/Helsinki-NLP/Opus-MT)\n* download original weights: [opus-2020-02-26.zip](https://object.pouta.csc.fi/OPUS-MT-models/en-de/opus-2020-02-26.zip)\n\n* test set translations: [opus-2020-02-26.test.txt](https://object.pouta.csc.fi/OPUS-MT-models/en-de/opus-2020-02-26.test.txt)\n\n## Evaluation\n\n#### Results\n\n* test set scores: [opus-2020-02-26.eval.txt](https://object.pouta.csc.fi/OPUS-MT-models/en-de/opus-2020-02-26.eval.txt)\n\n#### Benchmarks\n\n| testset | BLEU | chr-F |\n|"} {"downloads": 848028, "id": "deepset/roberta-base-squad2", "likes": 243, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": "en", "license": "cc-by-4.0", "datasets": ["squad_v2"], "model-index": [{"name": "deepset/roberta-base-squad2", "results": [{"task": {"type": "question-answering", "name": "Question Answering"}, "dataset": {"name": "squad_v2", "type": "squad_v2", "config": "squad_v2", "split": "validation"}, "metrics": [{"type": "exact_match", "value": 79.9309, "name": "Exact Match", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMDhhNjg5YzNiZGQ1YTIyYTAwZGUwOWEzZTRiYzdjM2QzYjA3ZTUxNDM1NjE1MTUyMjE1MGY1YzEzMjRjYzVjYiIsInZlcnNpb24iOjF9.EH5JJo8EEFwU7osPz3s7qanw_tigeCFhCXjSfyN0Y1nWVnSfulSxIk_DbAEI5iE80V4EKLyp5-mYFodWvL2KDA"}, {"type": "f1", "value": 82.9501, "name": "F1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjk5ZDYwOGQyNjNkMWI0OTE4YzRmOTlkY2JjNjQ0YTZkNTMzMzNkYTA0MDFmNmI3NjA3NjNlMjhiMDQ2ZjJjNSIsInZlcnNpb24iOjF9.DDm0LNTkdLbGsue58bg1aH_s67KfbcmkvL-6ZiI2s8IoxhHJMSf29H_uV2YLyevwx900t-MwTVOW3qfFnMMEAQ"}, {"type": "total", "value": 11869, "name": "total", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMGFkMmI2ODM0NmY5NGNkNmUxYWViOWYxZDNkY2EzYWFmOWI4N2VhYzY5MGEzMTVhOTU4Zjc4YWViOGNjOWJjMCIsInZlcnNpb24iOjF9.fexrU1icJK5_MiifBtZWkeUvpmFISqBLDXSQJ8E6UnrRof-7cU0s4tX_dIsauHWtUpIHMPZCf5dlMWQKXZuAAA"}]}]}]}, "description": "\n\n# roberta-base for QA \n\nThis is the [roberta-base](https://huggingface.co/roberta-base) model, fine-tuned using the [SQuAD2.0](https://huggingface.co/datasets/squad_v2) dataset. It's been trained on question-answer pairs, including unanswerable questions, for the task of Question Answering. \n\n\n## Overview\n**Language model:** roberta-base \n**Language:** English \n**Downstream-task:** Extractive QA \n**Training data:** SQuAD 2.0 \n**Eval data:** SQuAD 2.0 \n**Code:** See [an example QA pipeline on Haystack](https://haystack.deepset.ai/tutorials/first-qa-system) \n**Infrastructure**: 4x Tesla v100\n\n## Hyperparameters\n\n```\nbatch_size = 96\nn_epochs = 2\nbase_LM_model = \"roberta-base\"\nmax_seq_len = 386\nlearning_rate = 3e-5\nlr_schedule = LinearWarmup\nwarmup_proportion = 0.2\ndoc_stride=128\nmax_query_length=64\n``` \n\n## Using a distilled model instead\nPlease note that we have also released a distilled version of this model called [deepset/tinyroberta-squad2](https://huggingface.co/deepset/tinyroberta-squad2). 
The distilled model has a comparable prediction quality and runs at twice the speed of the base model.\n\n## Usage\n\n### In Haystack\nHaystack is an NLP framework by deepset. You can use this model in a Haystack pipeline to do question answering at scale (over many documents). To load the model in [Haystack](https://github.com/deepset-ai/haystack/):\n```python\nreader = FARMReader(model_name_or_path=\"deepset/roberta-base-squad2\")\n# or \nreader = TransformersReader(model_name_or_path=\"deepset/roberta-base-squad2\",tokenizer=\"deepset/roberta-base-squad2\")\n```\nFor a complete example of ``roberta-base-squad2`` being used for Question Answering, check out the [Tutorials in Haystack Documentation](https://haystack.deepset.ai/tutorials/first-qa-system)\n\n### In Transformers\n```python\nfrom transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline\n\nmodel_name = \"deepset/roberta-base-squad2\"\n\n# a) Get predictions\nnlp = pipeline('question-answering', model=model_name, tokenizer=model_name)\nQA_input = {\n 'question': 'Why is model conversion important?',\n 'context': 'The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.'\n}\nres = nlp(QA_input)\n\n# b) Load model & tokenizer\nmodel = AutoModelForQuestionAnswering.from_pretrained(model_name)\ntokenizer = AutoTokenizer.from_pretrained(model_name)\n```\n\n## Performance\nEvaluated on the SQuAD 2.0 dev set with the [official eval script](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/).\n\n```\n\"exact\": 79.87029394424324,\n\"f1\": 82.91251169582613,\n\n\"total\": 11873,\n\"HasAns_exact\": 77.93522267206478,\n\"HasAns_f1\": 84.02838248389763,\n\"HasAns_total\": 5928,\n\"NoAns_exact\": 81.79983179142137,\n\"NoAns_f1\": 81.79983179142137,\n\"NoAns_total\": 5945\n```\n\n## Authors\n**Branden Chan:** branden.chan@deepset.ai \n**Timo M\u00f6ller:** timo.moeller@deepset.ai \n**Malte Pietsch:** malte.pietsch@deepset.ai \n**Tanay Soni:** tanay.soni@deepset.ai \n\n## About us\n\n
\n\n[deepset](http://deepset.ai/) is the company behind the open-source NLP framework [Haystack](https://haystack.deepset.ai/), which is designed to help you build production-ready NLP systems that use question answering, summarization, ranking, etc.\n\n\nSome of our other work: \n- [Distilled roberta-base-squad2 (aka \"tinyroberta-squad2\")](https://huggingface.co/deepset/tinyroberta-squad2)\n- [German BERT (aka \"bert-base-german-cased\")](https://deepset.ai/german-bert)\n- [GermanQuAD and GermanDPR datasets and models (aka \"gelectra-base-germanquad\", \"gbert-base-germandpr\")](https://deepset.ai/germanquad)\n\n## Get in touch and join the Haystack community\n\n
For more info on Haystack, visit our [GitHub repo](https://github.com/deepset-ai/haystack/) and Documentation.\n\nWe also have a Discord community open to everyone!
\n\n[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n\nBy the way: [we're hiring!](http://www.deepset.ai/jobs) \n"} {"downloads": 1284721, "id": "distilbert-base-cased-distilled-squad", "likes": 102, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": "en", "license": "apache-2.0", "datasets": ["squad"], "metrics": ["squad"], "model-index": [{"name": "distilbert-base-cased-distilled-squad", "results": [{"task": {"type": "question-answering", "name": "Question Answering"}, "dataset": {"name": "squad", "type": "squad", "config": "plain_text", "split": "validation"}, "metrics": [{"type": "exact_match", "value": 79.5998, "name": "Exact Match", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZTViZDA2Y2E2NjUyMjNjYjkzNTUzODc5OTk2OTNkYjQxMDRmMDhlYjdmYWJjYWQ2N2RlNzY1YmI3OWY1NmRhOSIsInZlcnNpb24iOjF9.ZJHhboAMwsi3pqU-B-XKRCYP_tzpCRb8pEjGr2Oc-TteZeoWHI8CXcpDxugfC3f7d_oBcKWLzh3CClQxBW1iAQ"}, {"type": "f1", "value": 86.9965, "name": "F1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZWZlMzY2MmE1NDNhOGNjNWRmODg0YjQ2Zjk5MjUzZDQ2MDYxOTBlMTNhNzQ4NTA2NjRmNDU3MGIzMTYwMmUyOSIsInZlcnNpb24iOjF9.z0ZDir87aT7UEmUeDm8Uw0oUdAqzlBz343gwnsQP3YLfGsaHe-jGlhco0Z7ISUd9NokyCiJCRc4NNxJQ83IuCw"}]}]}]}, "description": "\n\n# DistilBERT base cased distilled SQuAD\n\n## Table of Contents\n- [Model Details](#model-details)\n- [How To Get Started With the Model](#how-to-get-started-with-the-model)\n- [Uses](#uses)\n- [Risks, Limitations and Biases](#risks-limitations-and-biases)\n- [Training](#training)\n- [Evaluation](#evaluation)\n- [Environmental Impact](#environmental-impact)\n- [Technical Specifications](#technical-specifications)\n- [Citation Information](#citation-information)\n- [Model Card Authors](#model-card-authors)\n\n## Model Details\n\n**Model Description:** The DistilBERT model was proposed in the blog post [Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT](https://medium.com/huggingface/distilbert-8cf3380435b5), and the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108). DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than *bert-base-uncased*, runs 60% faster while preserving over 95% of BERT's performances as measured on the GLUE language understanding benchmark.\n\nThis model is a fine-tune checkpoint of [DistilBERT-base-cased](https://huggingface.co/distilbert-base-cased), fine-tuned using (a second step of) knowledge distillation on [SQuAD v1.1](https://huggingface.co/datasets/squad). \n\n- **Developed by:** Hugging Face\n- **Model Type:** Transformer-based language model\n- **Language(s):** English \n- **License:** Apache 2.0\n- **Related Models:** [DistilBERT-base-cased](https://huggingface.co/distilbert-base-cased)\n- **Resources for more information:**\n - See [this repository](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) for more about Distil\* (a class of compressed models including this model)\n - See [Sanh et al. 
(2019)](https://arxiv.org/abs/1910.01108) for more information about knowledge distillation and the training procedure\n\n## How to Get Started with the Model \n\nUse the code below to get started with the model. \n\n```python\n>>> from transformers import pipeline\n>>> question_answerer = pipeline(\"question-answering\", model='distilbert-base-cased-distilled-squad')\n\n>>> context = r\"\"\"\n... Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a\n... question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune\n... a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.\n... \"\"\"\n\n>>> result = question_answerer(question=\"What is a good example of a question answering dataset?\", context=context)\n>>> print(\n... f\"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}\"\n...)\n\nAnswer: 'SQuAD dataset', score: 0.5152, start: 147, end: 160\n```\n\nHere is how to use this model in PyTorch:\n\n```python\nfrom transformers import DistilBertTokenizer, DistilBertForQuestionAnswering\nimport torch\n\ntokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-cased-distilled-squad')\nmodel = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-cased-distilled-squad')\n\nquestion, text = \"Who was Jim Henson?\", \"Jim Henson was a nice puppet\"\n\ninputs = tokenizer(question, text, return_tensors=\"pt\")\nwith torch.no_grad():\n outputs = model(**inputs)\n\n# Pick the most likely start and end token positions, then decode the answer span.\nanswer_start_index = torch.argmax(outputs.start_logits)\nanswer_end_index = torch.argmax(outputs.end_logits)\n\npredict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]\ntokenizer.decode(predict_answer_tokens)\n```\n\nAnd in TensorFlow: \n\n```python\nfrom transformers import DistilBertTokenizer, TFDistilBertForQuestionAnswering\nimport tensorflow as tf\n\ntokenizer = DistilBertTokenizer.from_pretrained(\"distilbert-base-cased-distilled-squad\")\nmodel = TFDistilBertForQuestionAnswering.from_pretrained(\"distilbert-base-cased-distilled-squad\")\n\nquestion, text = \"Who was Jim Henson?\", \"Jim Henson was a nice puppet\"\n\ninputs = tokenizer(question, text, return_tensors=\"tf\")\noutputs = model(**inputs)\n\nanswer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])\nanswer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])\n\npredict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]\ntokenizer.decode(predict_answer_tokens)\n```\n\n## Uses\n\nThis model can be used for question answering.\n\n#### Misuse and Out-of-scope Use\n\nThe model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model.\n\n## Risks, Limitations and Biases\n\n**CONTENT WARNING: Readers should be aware that language generated by this model can be disturbing or offensive to some and can propagate historical and current stereotypes.**\n\nSignificant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. 
For example:\n\n\n```python\n>>> from transformers import pipeline\n>>> question_answerer = pipeline(\"question-answering\", model='distilbert-base-cased-distilled-squad')\n\n>>> context = r\"\"\"\n... Alice is sitting on the bench. Bob is sitting next to her.\n... \"\"\"\n\n>>> result = question_answerer(question=\"Who is the CEO?\", context=context)\n>>> print(\n... f\"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}\"\n...)\n\nAnswer: 'Bob', score: 0.7527, start: 32, end: 35\n```\n\nUsers (both direct and downstream) should be made aware of the risks, biases and limitations of the model.\n\n## Training\n\n#### Training Data\n\nThe [distilbert-base-cased model](https://huggingface.co/distilbert-base-cased) was trained using the same data as the [distilbert-base-uncased model](https://huggingface.co/distilbert-base-uncased). The [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) model card describes its training data as: \n\n> DistilBERT pretrained on the same data as BERT, which is [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038 unpublished books and [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and headers).\n\nTo learn more about the SQuAD v1.1 dataset, see the [SQuAD v1.1 data card](https://huggingface.co/datasets/squad).\n\n#### Training Procedure\n\n##### Preprocessing\n\nSee the [distilbert-base-cased model card](https://huggingface.co/distilbert-base-cased) for further details.\n\n##### Pretraining\n\nSee the [distilbert-base-cased model card](https://huggingface.co/distilbert-base-cased) for further details. \n\n## Evaluation\n\nAs discussed in the [model repository](https://github.com/huggingface/transformers/blob/main/examples/research_projects/distillation/README.md):\n\n> This model reaches a F1 score of 87.1 on the [SQuAD v1.1] dev set (for comparison, BERT bert-base-cased version reaches a F1 score of 88.7).\n\n## Environmental Impact\n\nCarbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). We present the hardware type and hours used based on the [associated paper](https://arxiv.org/pdf/1910.01108.pdf). Note that these details are just for training DistilBERT, not including the fine-tuning with SQuAD.\n\n- **Hardware Type:** 8 16GB V100 GPUs\n- **Hours used:** 90 hours\n- **Cloud Provider:** Unknown\n- **Compute Region:** Unknown\n- **Carbon Emitted:** Unknown\n\n## Technical Specifications\n\nSee the [associated paper](https://arxiv.org/abs/1910.01108) for details on the modeling architecture, objective, compute infrastructure, and training details.\n\n## Citation Information\n\n```bibtex\n@inproceedings{sanh2019distilbert,\n title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},\n author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},\n booktitle={NeurIPS EMC^2 Workshop},\n year={2019}\n}\n```\n\nAPA: \n- Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.\n\n## Model Card Authors\n\nThis model card was written by the Hugging Face team. 
\n"} {"downloads": 1051117, "id": "bert-large-uncased-whole-word-masking-finetuned-squad", "likes": 70, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": "en", "license": "apache-2.0", "datasets": ["bookcorpus", "wikipedia"]}, "description": "\n\n# BERT large model (uncased) whole word masking finetuned on SQuAD\n\nPretrained model on English language using a masked language modeling (MLM) objective. It was introduced in\n[this paper](https://arxiv.org/abs/1810.04805) and first released in\n[this repository](https://github.com/google-research/bert). This model is uncased: it does not make a difference\nbetween english and English.\n\nUnlike other BERT models, this model was trained with a new technique: Whole Word Masking. In this case, all of the tokens corresponding to a word are masked at once. The overall masking rate remains the same.\n\nThe training is identical -- each masked WordPiece token is predicted independently. \n\nAfter pre-training, this model was fine-tuned on the SQuAD dataset with one of our fine-tuning scripts. See below for more information regarding this fine-tuning.\n\nDisclaimer: The team releasing BERT did not write a model card for this model so this model card has been written by\nthe Hugging Face team.\n\n## Model description\n\nBERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it\nwas pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots of\npublicly available data) with an automatic process to generate inputs and labels from those texts. More precisely, it\nwas pretrained with two objectives:\n\n- Masked language modeling (MLM): taking a sentence, the model randomly masks 15% of the words in the input, then runs\n the entire masked sentence through the model and has to predict the masked words. This is different from traditional\n recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like\n GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the\n sentence.\n- Next sentence prediction (NSP): the model concatenates two masked sentences as inputs during pretraining. Sometimes\n they correspond to sentences that were next to each other in the original text, sometimes not. The model then has to\n predict if the two sentences were following each other or not.\n\nThis way, the model learns an inner representation of the English language that can then be used to extract features\nuseful for downstream tasks: if you have a dataset of labeled sentences for instance, you can train a standard\nclassifier using the features produced by the BERT model as inputs.\n\nThis model has the following configuration:\n\n- 24-layer\n- 1024 hidden dimension\n- 16 attention heads\n- 336M parameters.\n\n## Intended uses & limitations\nThis model should be used as a question-answering model. You may use it in a question answering pipeline, or use it to output raw results given a query and a context. 
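\n\nFor example, a minimal sketch with the `transformers` pipeline (the question/context pair below is illustrative, not from the original card):\n\n```python\nfrom transformers import pipeline\n\n# Load this checkpoint into a question-answering pipeline.\nqa = pipeline(\"question-answering\", model=\"bert-large-uncased-whole-word-masking-finetuned-squad\")\n\nresult = qa(question=\"Who was Jim Henson?\", context=\"Jim Henson was a nice puppet\")\nprint(result[\"answer\"])\n```\n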
You may see other use cases in the [task summary](https://huggingface.co/transformers/task_summary.html#extractive-question-answering) of the transformers documentation.\n\n## Training data\n\nThe BERT model was pretrained on [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038\nunpublished books and [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and\nheaders).\n\n## Training procedure\n\n### Preprocessing\n\nThe texts are lowercased and tokenized using WordPiece and a vocabulary size of 30,000. The inputs of the model are\nthen of the form:\n\n```\n[CLS] Sentence A [SEP] Sentence B [SEP]\n```\n\nWith probability 0.5, sentence A and sentence B correspond to two consecutive sentences in the original corpus and in\nthe other cases, it's another random sentence in the corpus. Note that what is considered a sentence here is a\nconsecutive span of text usually longer than a single sentence. The only constraint is that the result with the two\n\"sentences\" has a combined length of less than 512 tokens.\n\nThe details of the masking procedure for each sentence are the following:\n- 15% of the tokens are masked.\n- In 80% of the cases, the masked tokens are replaced by `[MASK]`.\n- In 10% of the cases, the masked tokens are replaced by a random token (different from the one they replace).\n- In the 10% remaining cases, the masked tokens are left as is.\n
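\nAs a toy illustration of this masking scheme (a hedged sketch, not the original pretraining code; whole-word grouping and the \"different random token\" detail are omitted for brevity):\n\n```python\nimport random\n\ndef mask_tokens(tokens, vocab, mask_rate=0.15):\n    # Corrupt a list of WordPiece tokens with the 80/10/10 rule described above.\n    corrupted = []\n    for token in tokens:\n        if random.random() < mask_rate:\n            r = random.random()\n            if r < 0.8:\n                corrupted.append('[MASK]')              # 80%: replace with [MASK]\n            elif r < 0.9:\n                corrupted.append(random.choice(vocab))  # 10%: replace with a random token\n            else:\n                corrupted.append(token)                 # 10%: keep the original token\n        else:\n            corrupted.append(token)\n    return corrupted\n```\n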
\n### Pretraining\n\nThe model was trained on 4 cloud TPUs in Pod configuration (16 TPU chips total) for one million steps with a batch size\nof 256. The sequence length was limited to 128 tokens for 90% of the steps and 512 for the remaining 10%. The optimizer\nused is Adam with a learning rate of 1e-4, \\\(\beta_{1} = 0.9\\\) and \\\(\beta_{2} = 0.999\\\), a weight decay of 0.01,\nlearning rate warmup for 10,000 steps and linear decay of the learning rate after.\n\n### Fine-tuning\n\nAfter pre-training, this model was fine-tuned on the SQuAD dataset with one of our fine-tuning scripts. In order to reproduce the training, you may use the following command:\n```\npython -m torch.distributed.launch --nproc_per_node=8 ./examples/question-answering/run_qa.py \\\n --model_name_or_path bert-large-uncased-whole-word-masking \\\n --dataset_name squad \\\n --do_train \\\n --do_eval \\\n --learning_rate 3e-5 \\\n --num_train_epochs 2 \\\n --max_seq_length 384 \\\n --doc_stride 128 \\\n --output_dir ./examples/models/wwm_uncased_finetuned_squad/ \\\n --per_device_eval_batch_size=3 \\\n --per_device_train_batch_size=3\n```\n\n## Evaluation results\n\nThe results obtained are the following:\n\n```\nf1 = 93.15\nexact_match = 86.91\n```\n\n\n### BibTeX entry and citation info\n\n```bibtex\n@article{DBLP:journals/corr/abs-1810-04805,\n author = {Jacob Devlin and\n Ming{-}Wei Chang and\n Kenton Lee and\n Kristina Toutanova},\n title = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language\n Understanding},\n journal = {CoRR},\n volume = {abs/1810.04805},\n year = {2018},\n url = {http://arxiv.org/abs/1810.04805},\n archivePrefix = {arXiv},\n eprint = {1810.04805},\n timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},\n biburl = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},\n bibsource = {dblp computer science bibliography, https://dblp.org}\n}\n```"} {"downloads": 19314, "id": "luhua/chinese_pretrain_mrc_roberta_wwm_ext_large", "likes": 46, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": ["zh"], "license": "apache-2.0"}, "description": "\n\n## Chinese MRC roberta_wwm_ext_large\n\n* A roberta_wwm_ext_large model trained on a large amount of Chinese MRC data; for details, see: https://github.com/basketballandlearn/MRC_Competition_Dureader\n* The further-trained models released by this repository bring substantial improvements on tasks such as reading comprehension and classification\n(several users have already achieved **top5** results in competitions such as Dureader-2021 \ud83d\ude01)
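\n\nA minimal usage sketch (not from the original card; it assumes this checkpoint works with the standard transformers question-answering pipeline, and the '...' placeholders stand for a Chinese question/context pair):\n\n```python\nfrom transformers import pipeline\n\n# Load the Chinese MRC checkpoint into an extractive-QA pipeline.\nqa = pipeline('question-answering', model='luhua/chinese_pretrain_mrc_roberta_wwm_ext_large')\n\n# Pass a Chinese question/context pair, e.g. from CMRC 2018 or Dureader.\nresult = qa(question='...', context='...')\nprint(result['answer'])\n```\n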
\n| Model / Dataset | Dureader-2021 | tencentmedical |\n| "} {"downloads": 2819, "id": "uer/roberta-base-chinese-extractive-qa", "likes": 40, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": "zh", "widget": [{"text": "\u8457\u540d\u8bd7\u6b4c\u300a\u5047\u5982\u751f\u6d3b\u6b3a\u9a97\u4e86\u4f60\u300b\u7684\u4f5c\u8005\u662f", "context": "\u666e\u5e0c\u91d1\u4ece\u90a3\u91cc\u5b66\u4e60\u4eba\u6c11\u7684\u8bed\u8a00\uff0c\u5438\u53d6\u4e86\u8bb8\u591a\u6709\u76ca\u7684\u517b\u6599\uff0c\u8fd9\u4e00\u5207\u5bf9\u666e\u5e0c\u91d1\u540e\u6765\u7684\u521b\u4f5c\u4ea7\u751f\u4e86\u5f88\u5927\u7684\u5f71\u54cd\u3002\u8fd9\u4e24\u5e74\u91cc\uff0c\u666e\u5e0c\u91d1\u521b\u4f5c\u4e86\u4e0d\u5c11\u4f18\u79c0\u7684\u4f5c\u54c1\uff0c\u5982\u300a\u56da\u5f92\u300b\u3001\u300a\u81f4\u5927\u6d77\u300b\u3001\u300a\u81f4\u51ef\u6069\u300b\u548c\u300a\u5047\u5982\u751f\u6d3b\u6b3a\u9a97\u4e86\u4f60\u300b\u7b49\u51e0\u5341\u9996\u6292\u60c5\u8bd7\uff0c\u53d9\u4e8b\u8bd7\u300a\u52aa\u6797\u4f2f\u7235\u300b\uff0c\u5386\u53f2\u5267\u300a\u9c8d\u91cc\u65af\u00b7\u6208\u90fd\u8bfa\u592b\u300b\uff0c\u4ee5\u53ca\u300a\u53f6\u752b\u76d6\u5c3c\u00b7\u5965\u6d85\u91d1\u300b\u524d\u516d\u7ae0\u3002"}]}, "description": "\n\n# Chinese RoBERTa-Base Model for QA\n\n## Model description\n\nThe model is used for extractive question answering. You can download the model from the link [roberta-base-chinese-extractive-qa](https://huggingface.co/uer/roberta-base-chinese-extractive-qa).\n\n## How to use\n\nYou can use the model directly with a pipeline for extractive question answering:\n\n```python\n>>> from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline\n>>> model = AutoModelForQuestionAnswering.from_pretrained('uer/roberta-base-chinese-extractive-qa')\n>>> tokenizer = AutoTokenizer.from_pretrained('uer/roberta-base-chinese-extractive-qa')\n>>> QA = pipeline('question-answering', model=model, tokenizer=tokenizer)\n>>> QA_input = {'question': \"\u8457\u540d\u8bd7\u6b4c\u300a\u5047\u5982\u751f\u6d3b\u6b3a\u9a97\u4e86\u4f60\u300b\u7684\u4f5c\u8005\u662f\", 'context': \"\u666e\u5e0c\u91d1\u4ece\u90a3\u91cc\u5b66\u4e60\u4eba\u6c11\u7684\u8bed\u8a00\uff0c\u5438\u53d6\u4e86\u8bb8\u591a\u6709\u76ca\u7684\u517b\u6599\uff0c\u8fd9\u4e00\u5207\u5bf9\u666e\u5e0c\u91d1\u540e\u6765\u7684\u521b\u4f5c\u4ea7\u751f\u4e86\u5f88\u5927\u7684\u5f71\u54cd\u3002\u8fd9\u4e24\u5e74\u91cc\uff0c\u666e\u5e0c\u91d1\u521b\u4f5c\u4e86\u4e0d\u5c11\u4f18\u79c0\u7684\u4f5c\u54c1\uff0c\u5982\u300a\u56da\u5f92\u300b\u3001\u300a\u81f4\u5927\u6d77\u300b\u3001\u300a\u81f4\u51ef\u6069\u300b\u548c\u300a\u5047\u5982\u751f\u6d3b\u6b3a\u9a97\u4e86\u4f60\u300b\u7b49\u51e0\u5341\u9996\u6292\u60c5\u8bd7\uff0c\u53d9\u4e8b\u8bd7\u300a\u52aa\u6797\u4f2f\u7235\u300b\uff0c\u5386\u53f2\u5267\u300a\u9c8d\u91cc\u65af\u00b7\u6208\u90fd\u8bfa\u592b\u300b\uff0c\u4ee5\u53ca\u300a\u53f6\u752b\u76d6\u5c3c\u00b7\u5965\u6d85\u91d1\u300b\u524d\u516d\u7ae0\u3002\"}\n>>> QA(QA_input)\n {'score': 0.9766426682472229, 'start': 0, 'end': 3, 'answer': '\u666e\u5e0c\u91d1'}\n```\n\n## Training data\n\nTraining data comes from three sources: [cmrc2018](https://github.com/ymcui/cmrc2018), [webqa](https://spaces.ac.cn/archives/4338), and [laisi](https://www.kesci.com/home/competition/5d142d8cbb14e6002c04e14a/content/0). 
We use only the train sets of the three datasets.\n\n## Training procedure\n\nThe model is fine-tuned by [UER-py](https://github.com/dbiir/UER-py/) on [Tencent Cloud TI-ONE](https://cloud.tencent.com/product/tione/). We fine-tune for three epochs with a sequence length of 512 on the basis of the pre-trained model [chinese_roberta_L-12_H-768](https://huggingface.co/uer/chinese_roberta_L-12_H-768). At the end of each epoch, the model is saved when the best performance on the development set is achieved.\n\n```\npython3 run_cmrc.py --pretrained_model_path models/cluecorpussmall_roberta_base_seq512_model.bin-250000 \\\n --vocab_path models/google_zh_vocab.txt \\\n --train_path extractive_qa.json \\\n --dev_path datasets/cmrc2018/dev.json \\\n --output_model_path models/extractive_qa_model.bin \\\n --learning_rate 3e-5 --epochs_num 3 --batch_size 32 --seq_length 512\n```\n\nFinally, we convert the fine-tuned model into Hugging Face's format:\n\n```\npython3 scripts/convert_bert_extractive_qa_from_uer_to_huggingface.py --input_model_path extractive_qa_model.bin \\\n --output_model_path pytorch_model.bin \\\n --layers_num 12\n```\n\n### BibTeX entry and citation info\n\n```\n@article{zhao2019uer,\n title={UER: An Open-Source Toolkit for Pre-training Models},\n author={Zhao, Zhe and Chen, Hui and Zhang, Jinbin and Zhao, Xin and Liu, Tao and Lu, Wei and Chen, Xi and Deng, Haotang and Ju, Qi and Du, Xiaoyong},\n journal={EMNLP-IJCNLP 2019},\n pages={241},\n year={2019}\n}\n```"} {"downloads": 37925, "id": "deepset/xlm-roberta-large-squad2", "likes": 29, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": "multilingual", "license": "cc-by-4.0", "tags": ["question-answering"], "datasets": ["squad_v2"], "model-index": [{"name": "deepset/xlm-roberta-large-squad2", "results": [{"task": {"type": "question-answering", "name": "Question Answering"}, "dataset": {"name": "squad_v2", "type": "squad_v2", "config": "squad_v2", "split": "validation"}, "metrics": [{"type": "exact_match", "value": 81.8281, "name": "Exact Match", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNzVhZDE2NTg5NmUwOWRkMmI2MGUxYjFlZjIzNmMyNDQ2MDY2MDNhYzE0ZjY5YTkyY2U4ODc3ODFiZjQxZWQ2YSIsInZlcnNpb24iOjF9.f_rN3WPMAdv-OBPz0T7N7lOxYz9f1nEr_P-vwKhi3jNdRKp_JTy18MYR9eyJM2riKHC6_ge-8XwfyrUf51DSDA"}, {"type": "f1", "value": 84.8886, "name": "F1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZGE5MWJmZGUxMGMwNWFhYzVhZjQwZGEwOWQ4N2Q2Yjg5NzdjNDFiNDhiYTQ1Y2E5ZWJkOTFhYmI1Y2Q2ZGYwOCIsInZlcnNpb24iOjF9.TIdH-tOx3kEMDs5wK1r6iwZqqSjNGlBrpawrsE917j1F3UFJVnQ7wJwaj0OIgmC4iw8OQeLZL56ucBcLApa-AQ"}]}]}]}, "description": "\n\n# Multilingual XLM-RoBERTa large for QA on various languages \n\n## Overview\n**Language model:** xlm-roberta-large \n**Language:** Multilingual \n**Downstream-task:** Extractive QA \n**Training data:** SQuAD 2.0 \n**Eval data:** SQuAD dev set - German MLQA - German XQuAD \n**Training run:** [MLFlow link](https://public-mlflow.deepset.ai/#/experiments/124/runs/3a540e3f3ecf4dd98eae8fc6d457ff20) \n**Infrastructure**: 4x Tesla v100\n\n## Hyperparameters\n\n```\nbatch_size = 32\nn_epochs = 3\nbase_LM_model = \"xlm-roberta-large\"\nmax_seq_len = 256\nlearning_rate = 1e-5\nlr_schedule = LinearWarmup\nwarmup_proportion = 0.2\ndoc_stride=128\nmax_query_length=64\n``` \n\n## Performance\nEvaluated on the SQuAD 2.0 English dev set with the [official eval 
script](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/).\n```\n \"exact\": 79.45759285774446,\n \"f1\": 83.79259828925511,\n \"total\": 11873,\n \"HasAns_exact\": 71.96356275303644,\n \"HasAns_f1\": 80.6460053117963,\n \"HasAns_total\": 5928,\n \"NoAns_exact\": 86.93019343986543,\n \"NoAns_f1\": 86.93019343986543,\n \"NoAns_total\": 5945\n```\n\nEvaluated on German [MLQA: test-context-de-question-de.json](https://github.com/facebookresearch/MLQA)\n```\n\"exact\": 49.34691166703564,\n\"f1\": 66.15582561674236,\n\"total\": 4517,\n```\n\nEvaluated on German [XQuAD: xquad.de.json](https://github.com/deepmind/xquad)\n```\n\"exact\": 61.51260504201681,\n\"f1\": 78.80206098332569,\n\"total\": 1190,\n```\n\n## Usage\n\n### In Haystack\nFor QA at scale (i.e., over many documents instead of a single paragraph), you can also load the model in [haystack](https://github.com/deepset-ai/haystack/):\n```python\nreader = FARMReader(model_name_or_path=\"deepset/xlm-roberta-large-squad2\")\n# or \nreader = TransformersReader(model=\"deepset/xlm-roberta-large-squad2\", tokenizer=\"deepset/xlm-roberta-large-squad2\")\n```\n\n### In Transformers\n```python\nfrom transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline\n\nmodel_name = \"deepset/xlm-roberta-large-squad2\"\n\n# a) Get predictions\nnlp = pipeline('question-answering', model=model_name, tokenizer=model_name)\nQA_input = {\n 'question': 'Why is model conversion important?',\n 'context': 'The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.'\n}\nres = nlp(QA_input)\n\n# b) Load model & tokenizer\nmodel = AutoModelForQuestionAnswering.from_pretrained(model_name)\ntokenizer = AutoTokenizer.from_pretrained(model_name)\n```\n\n## Authors\n**Branden Chan:** branden.chan@deepset.ai \n**Timo M\u00f6ller:** timo.moeller@deepset.ai \n**Malte Pietsch:** malte.pietsch@deepset.ai \n**Tanay Soni:** tanay.soni@deepset.ai \n\n## About us\n
\n\n[deepset](http://deepset.ai/) is the company behind the open-source NLP framework [Haystack](https://haystack.deepset.ai/), which is designed to help you build production-ready NLP systems that use question answering, summarization, ranking, etc.\n\n\nSome of our other work: \n- [Distilled roberta-base-squad2 (aka \"tinyroberta-squad2\")](https://huggingface.co/deepset/tinyroberta-squad2)\n- [German BERT (aka \"bert-base-german-cased\")](https://deepset.ai/german-bert)\n- [GermanQuAD and GermanDPR datasets and models (aka \"gelectra-base-germanquad\", \"gbert-base-germandpr\")](https://deepset.ai/germanquad)\n\n## Get in touch and join the Haystack community\n\n
For more info on Haystack, visit our [GitHub repo](https://github.com/deepset-ai/haystack/) and Documentation.\n\nWe also have a Discord community open to everyone!
\n\n[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n\nBy the way: [we're hiring!](http://www.deepset.ai/jobs)\n"} {"downloads": 341161, "id": "deepset/minilm-uncased-squad2", "likes": 27, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": "en", "license": "cc-by-4.0", "datasets": ["squad_v2"], "model-index": [{"name": "deepset/minilm-uncased-squad2", "results": [{"task": {"type": "question-answering", "name": "Question Answering"}, "dataset": {"name": "squad_v2", "type": "squad_v2", "config": "squad_v2", "split": "validation"}, "metrics": [{"type": "exact_match", "value": 76.1921, "name": "Exact Match", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNmViZTQ3YTBjYTc3ZDQzYmI1Mzk3MTAxM2MzNjdmMTc0MWY4Yzg2MWU3NGQ1MDJhZWI2NzY0YWYxZTY2OTgzMiIsInZlcnNpb24iOjF9.s4XCRs_pvW__LJ57dpXAEHD6NRsQ3XaFrM1xaguS6oUs5fCN77wNNc97scnfoPXT18A8RAn0cLTNivfxZm0oBA"}, {"type": "f1", "value": 79.5483, "name": "F1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZmJlYTIyOTg2NjMyMzg4NzNlNGIzMTY2NDVkMjg0ODdiOWRmYjVkZDYyZjBjNWNiNTBhNjcwOWUzMDM4ZWJiZiIsInZlcnNpb24iOjF9.gxpwIBBA3_5xPi-TaZcqWNnGgCiHzxaUNgrS2jucxoVWGxhBtnPdwKVCxLleQoDDZenAXB3Yh71zMP3xTSeHCw"}]}]}]}, "description": "\n\n# MiniLM-L12-H384-uncased for QA\n\n## Overview\n**Language model:** microsoft/MiniLM-L12-H384-uncased \n**Language:** English \n**Downstream-task:** Extractive QA \n**Training data:** SQuAD 2.0 \n**Eval data:** SQuAD 2.0 \n**Code:** See [example](https://github.com/deepset-ai/FARM/blob/master/examples/question_answering.py) in [FARM](https://github.com/deepset-ai/FARM) \n**Infrastructure**: 1x Tesla v100 \n\n## Hyperparameters\n\n```\nseed=42\nbatch_size = 12\nn_epochs = 4\nbase_LM_model = \"microsoft/MiniLM-L12-H384-uncased\"\nmax_seq_len = 384\nlearning_rate = 4e-5\nlr_schedule = LinearWarmup\nwarmup_proportion = 0.2\ndoc_stride=128\nmax_query_length=64\ngrad_acc_steps=4\n```\n\n## Performance\nEvaluated on the SQuAD 2.0 dev set with the [official eval script](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/).\n```\n\"exact\": 76.13071675229513,\n\"f1\": 79.49786500219953,\n\"total\": 11873,\n\"HasAns_exact\": 78.35695006747639,\n\"HasAns_f1\": 85.10090269418276,\n\"HasAns_total\": 5928,\n\"NoAns_exact\": 73.91084945332211,\n\"NoAns_f1\": 73.91084945332211,\n\"NoAns_total\": 5945\n```\n\n## Usage\n\n### In Transformers\n```python\nfrom transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline\n\nmodel_name = \"deepset/minilm-uncased-squad2\"\n\n# a) Get predictions\nnlp = pipeline('question-answering', model=model_name, tokenizer=model_name)\nQA_input = {\n 'question': 'Why is model conversion important?',\n 'context': 'The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.'\n}\nres = nlp(QA_input)\n\n# b) Load model & tokenizer\nmodel = AutoModelForQuestionAnswering.from_pretrained(model_name)\ntokenizer = AutoTokenizer.from_pretrained(model_name)\n```\n\n### In FARM\n\n```python\nfrom farm.modeling.adaptive_model import AdaptiveModel\nfrom farm.modeling.tokenization import Tokenizer\nfrom farm.infer import Inferencer\n\nmodel_name = 
\"deepset/minilm-uncased-squad2\"\n\n# a) Get predictions\nnlp = Inferencer.load(model_name, task_type=\"question_answering\")\nQA_input = [{\"questions\": [\"Why is model conversion important?\"],\n \"text\": \"The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.\"}]\nres = nlp.inference_from_dicts(dicts=QA_input)\n\n# b) Load model & tokenizer\nmodel = AdaptiveModel.convert_from_transformers(model_name, device=\"cpu\", task_type=\"question_answering\")\ntokenizer = Tokenizer.load(model_name)\n```\n\n### In haystack\nFor doing QA at scale (i.e. many docs instead of single paragraph), you can load the model also in [haystack](https://github.com/deepset-ai/haystack/):\n```python\nreader = FARMReader(model_name_or_path=\"deepset/minilm-uncased-squad2\")\n# or\nreader = TransformersReader(model=\"deepset/minilm-uncased-squad2\",tokenizer=\"deepset/minilm-uncased-squad2\")\n```\n\n\n## Authors\n**Vaishali Pal:** vaishali.pal@deepset.ai \n**Branden Chan:** branden.chan@deepset.ai \n**Timo M\u00f6ller:** timo.moeller@deepset.ai \n**Malte Pietsch:** malte.pietsch@deepset.ai \n**Tanay Soni:** tanay.soni@deepset.ai \n\n## About us\n![deepset logo](https://workablehr.s3.amazonaws.com/uploads/account/logo/476306/logo)\nWe bring NLP to the industry via open source!\nOur focus: Industry specific language models & large scale QA systems.\n\nSome of our work: \n- [German BERT (aka \"bert-base-german-cased\")](https://deepset.ai/german-bert)\n- [GermanQuAD and GermanDPR datasets and models (aka \"gelectra-base-germanquad\", \"gbert-base-germandpr\")](https://deepset.ai/germanquad)\n- [FARM](https://github.com/deepset-ai/FARM)\n- [Haystack](https://github.com/deepset-ai/haystack/)\n\nGet in touch:\n[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n\nBy the way: [we're hiring!](http://www.deepset.ai/jobs)\n"} {"downloads": 55106, "id": "distilbert-base-uncased-distilled-squad", "likes": 25, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": "en", "datasets": ["squad"], "widget": [{"text": "Which name is also used to describe the Amazon rainforest in English?", "context": "The Amazon rainforest (Portuguese: Floresta Amaz\u00f4nica or Amaz\u00f4nia; Spanish: Selva Amaz\u00f3nica, Amazon\u00eda or usually Amazonia; French: For\u00eat amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain \"Amazonas\" in their names. 
The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species."}, {"text": "How many square kilometers of rainforest is covered in the basin?", "context": "The Amazon rainforest (Portuguese: Floresta Amaz\u00f4nica or Amaz\u00f4nia; Spanish: Selva Amaz\u00f3nica, Amazon\u00eda or usually Amazonia; French: For\u00eat amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain \"Amazonas\" in their names. The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species."}], "license": "apache-2.0"}, "description": "\n\n# DistilBERT base uncased distilled SQuAD\n\n## Table of Contents\n- [Model Details](#model-details)\n- [How To Get Started With the Model](#how-to-get-started-with-the-model)\n- [Uses](#uses)\n- [Risks, Limitations and Biases](#risks-limitations-and-biases)\n- [Training](#training)\n- [Evaluation](#evaluation)\n- [Environmental Impact](#environmental-impact)\n- [Technical Specifications](#technical-specifications)\n- [Citation Information](#citation-information)\n- [Model Card Authors](#model-card-authors)\n\n## Model Details\n\n**Model Description:** The DistilBERT model was proposed in the blog post [Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT](https://medium.com/huggingface/distilbert-8cf3380435b5), and the paper [DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter](https://arxiv.org/abs/1910.01108). DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than *bert-base-uncased*, runs 60% faster while preserving over 95% of BERT's performances as measured on the GLUE language understanding benchmark.\n\nThis model is a fine-tune checkpoint of [DistilBERT-base-uncased](https://huggingface.co/distilbert-base-uncased), fine-tuned using (a second step of) knowledge distillation on [SQuAD v1.1](https://huggingface.co/datasets/squad). \n\n- **Developed by:** Hugging Face\n- **Model Type:** Transformer-based language model\n- **Language(s):** English \n- **License:** Apache 2.0\n- **Related Models:** [DistilBERT-base-uncased](https://huggingface.co/distilbert-base-uncased)\n- **Resources for more information:**\n - See [this repository](https://github.com/huggingface/transformers/tree/main/examples/research_projects/distillation) for more about Distil\* (a class of compressed models including this model)\n - See [Sanh et al. 
(2019)](https://arxiv.org/abs/1910.01108) for more information about knowledge distillation and the training procedure\n\n## How to Get Started with the Model \n\nUse the code below to get started with the model. \n\n```python\n>>> from transformers import pipeline\n>>> question_answerer = pipeline(\"question-answering\", model='distilbert-base-uncased-distilled-squad')\n\n>>> context = r\"\"\"\n... Extractive Question Answering is the task of extracting an answer from a text given a question. An example of a\n... question answering dataset is the SQuAD dataset, which is entirely based on that task. If you would like to fine-tune\n... a model on a SQuAD task, you may leverage the examples/pytorch/question-answering/run_squad.py script.\n... \"\"\"\n\n>>> result = question_answerer(question=\"What is a good example of a question answering dataset?\", context=context)\n>>> print(\n... f\"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}\"\n...)\n\nAnswer: 'SQuAD dataset', score: 0.4704, start: 147, end: 160\n```\n\nHere is how to use this model in PyTorch:\n\n```python\nfrom transformers import DistilBertTokenizer, DistilBertForQuestionAnswering\nimport torch\ntokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased-distilled-squad')\nmodel = DistilBertForQuestionAnswering.from_pretrained('distilbert-base-uncased-distilled-squad')\n\nquestion, text = \"Who was Jim Henson?\", \"Jim Henson was a nice puppet\"\n\ninputs = tokenizer(question, text, return_tensors=\"pt\")\nwith torch.no_grad():\n outputs = model(**inputs)\n\nanswer_start_index = torch.argmax(outputs.start_logits)\nanswer_end_index = torch.argmax(outputs.end_logits)\n\npredict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]\ntokenizer.decode(predict_answer_tokens)\n```\n\nAnd in TensorFlow: \n\n```python\nfrom transformers import DistilBertTokenizer, TFDistilBertForQuestionAnswering\nimport tensorflow as tf\n\ntokenizer = DistilBertTokenizer.from_pretrained(\"distilbert-base-uncased-distilled-squad\")\nmodel = TFDistilBertForQuestionAnswering.from_pretrained(\"distilbert-base-uncased-distilled-squad\")\n\nquestion, text = \"Who was Jim Henson?\", \"Jim Henson was a nice puppet\"\n\ninputs = tokenizer(question, text, return_tensors=\"tf\")\noutputs = model(**inputs)\n\nanswer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])\nanswer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])\n\npredict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]\ntokenizer.decode(predict_answer_tokens)\n```\n\n## Uses\n\nThis model can be used for question answering.\n\n#### Misuse and Out-of-scope Use\n\nThe model should not be used to intentionally create hostile or alienating environments for people. In addition, the model was not trained to be factual or true representations of people or events, and therefore using the model to generate such content is out-of-scope for the abilities of this model.\n\n## Risks, Limitations and Biases\n\n**CONTENT WARNING: Readers should be aware that language generated by this model can be disturbing or offensive to some and can propagate historical and current stereotypes.**\n\nSignificant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). 
Predictions generated by the model can include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups. For example:\n\n\n```python\n>>> from transformers import pipeline\n>>> question_answerer = pipeline(\"question-answering\", model='distilbert-base-uncased-distilled-squad')\n\n>>> context = r\"\"\"\n... Alice is sitting on the bench. Bob is sitting next to her.\n... \"\"\"\n\n>>> result = question_answerer(question=\"Who is the CEO?\", context=context)\n>>> print(\n... f\"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}\"\n...)\n\nAnswer: 'Bob', score: 0.4183, start: 32, end: 35\n```\n\nUsers (both direct and downstream) should be made aware of the risks, biases and limitations of the model.\n\n## Training\n\n#### Training Data\n\nThe [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) model card describes its training data as: \n\n> DistilBERT pretrained on the same data as BERT, which is [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset consisting of 11,038 unpublished books and [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia) (excluding lists, tables and headers).\n\nTo learn more about the SQuAD v1.1 dataset, see the [SQuAD v1.1 data card](https://huggingface.co/datasets/squad).\n\n#### Training Procedure\n\n##### Preprocessing\n\nSee the [distilbert-base-uncased model card](https://huggingface.co/distilbert-base-uncased) for further details.\n\n##### Pretraining\n\nSee the [distilbert-base-uncased model card](https://huggingface.co/distilbert-base-uncased) for further details. \n\n## Evaluation\n\nAs discussed in the [model repository](https://github.com/huggingface/transformers/blob/main/examples/research_projects/distillation/README.md):\n\n> This model reaches a F1 score of 86.9 on the [SQuAD v1.1] dev set (for comparison, Bert bert-base-uncased version reaches a F1 score of 88.5).\n\n## Environmental Impact\n\nCarbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). We present the hardware type and hours used based on the [associated paper](https://arxiv.org/pdf/1910.01108.pdf). Note that these details are just for training DistilBERT, not including the fine-tuning with SQuAD.\n\n- **Hardware Type:** 8 16GB V100 GPUs\n- **Hours used:** 90 hours\n- **Cloud Provider:** Unknown\n- **Compute Region:** Unknown\n- **Carbon Emitted:** Unknown\n\n## Technical Specifications\n\nSee the [associated paper](https://arxiv.org/abs/1910.01108) for details on the modeling architecture, objective, compute infrastructure, and training details.\n\n## Citation Information\n\n```bibtex\n@inproceedings{sanh2019distilbert,\n title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},\n author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},\n booktitle={NeurIPS EMC^2 Workshop},\n year={2019}\n}\n```\n\nAPA: \n- Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.\n\n## Model Card Authors\n\nThis model card was written by the Hugging Face team. 
\n"} {"downloads": 5762, "id": "timpal0l/mdeberta-v3-base-squad2", "likes": 25, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"datasets": ["squad_v2"], "language": ["multilingual", "af", "am", "ar", "as", "az", "be", "bg", "bn", "br", "bs", "ca", "cs", "cy", "da", "de", "el", "en", "eo", "es", "et", "eu", "fa", "fi", "fr", "fy", "ga", "gd", "gl", "gu", "ha", "he", "hi", "hr", "hu", "hy", "id", "is", "it", "ja", "jv", "ka", "kk", "km", "kn", "ko", "ku", "ky", "la", "lo", "lt", "lv", "mg", "mk", "ml", "mn", "mr", "ms", "my", "ne", "nl", "no", "om", "or", "pa", "pl", "ps", "pt", "ro", "ru", "sa", "sd", "si", "sk", "sl", "so", "sq", "sr", "su", "sv", "sw", "ta", "te", "th", "tl", "tr", "ug", "uk", "ur", "uz", "vi", "xh", "yi", "zh"], "tags": ["deberta", "deberta-v3", "mdeberta", "question-answering"], "thumbnail": "https://huggingface.co/front/thumbnails/microsoft.png", "license": "mit"}, "description": "\n## This model can be used for Extractive QA\nIt has been finetuned for 3 epochs on [SQuAD2.0](https://rajpurkar.github.io/SQuAD-explorer/).\n\n## Evaluation on SQuAD2.0 dev set\n```\n{\n \"epoch\": 3.0,\n \"eval_HasAns_exact\": 79.65587044534414,\n \"eval_HasAns_f1\": 85.91387795001529,\n \"eval_HasAns_total\": 5928,\n \"eval_NoAns_exact\": 82.10260723296888,\n \"eval_NoAns_f1\": 82.10260723296888,\n \"eval_NoAns_total\": 5945,\n \"eval_best_exact\": 80.8809904826076,\n \"eval_best_exact_thresh\": 0.0,\n \"eval_best_f1\": 84.00551406448994,\n \"eval_best_f1_thresh\": 0.0,\n \"eval_exact\": 80.8809904826076,\n \"eval_f1\": 84.00551406449004,\n \"eval_samples\": 12508,\n \"eval_total\": 11873,\n \"train_loss\": 0.7729689576483615,\n \"train_runtime\": 9118.953,\n \"train_samples\": 134891,\n \"train_samples_per_second\": 44.377,\n \"train_steps_per_second\": 0.925\n}\n``` \n## DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing\n\n[DeBERTa](https://arxiv.org/abs/2006.03654) improves the BERT and RoBERTa models using disentangled attention and enhanced mask decoder. With those two improvements, DeBERTa outperforms RoBERTa on a majority of NLU tasks with 80GB training data. \n\nIn [DeBERTa V3](https://arxiv.org/abs/2111.09543), we further improved the efficiency of DeBERTa using ELECTRA-Style pre-training with Gradient Disentangled Embedding Sharing. Compared to DeBERTa, our V3 version significantly improves the model performance on downstream tasks. You can find more technical details about the new model from our [paper](https://arxiv.org/abs/2111.09543).\n\nPlease check the [official repository](https://github.com/microsoft/DeBERTa) for more implementation details and updates.\n\nmDeBERTa is a multilingual version of DeBERTa that uses the same structure as DeBERTa and was trained with CC100 multilingual data.\nThe mDeBERTa V3 base model comes with 12 layers and a hidden size of 768. It has 86M backbone parameters with a vocabulary containing 250K tokens which introduces 190M parameters in the Embedding layer. The model was trained using the same 2.5T CC100 data as XLM-R.\n
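\n## Usage\n\nA minimal extractive-QA sketch (not from the original card; the question/context strings are illustrative):\n\n```python\nfrom transformers import pipeline\n\n# mDeBERTa is multilingual, so the question and context may be in many languages.\nqa = pipeline(\"question-answering\", model=\"timpal0l/mdeberta-v3-base-squad2\")\n\nresult = qa(question=\"Where do I live?\", context=\"My name is Tim and I live in Sweden.\")\nprint(result[\"answer\"])\n```\n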
"} {"downloads": 9884, "id": "deepset/deberta-v3-large-squad2", "likes": 24, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": "en", "license": "cc-by-4.0", "tags": ["deberta", "deberta-v3", "deberta-v3-large"], "datasets": ["squad_v2"], "model-index": [{"name": "deepset/deberta-v3-large-squad2", "results": [{"task": {"type": "question-answering", "name": "Question Answering"}, "dataset": {"name": "squad_v2", "type": "squad_v2", "config": "squad_v2", "split": "validation"}, "metrics": [{"type": "exact_match", "value": 88.0876, "name": "Exact Match", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZmE0MWEwNjBkNTA1MmU0ZDkyYTA1OGEwNzY3NGE4NWU4NGI0NTQzNjRlNjY1NGRmNDU2MjA0NjU1N2JlZmNhYiIsInZlcnNpb24iOjF9.PnBF_vD0HujNBSShGJzsJnjmiBP_qT8xb2E7ORmpKfNspKXEuN_pBk9iV0IHRzdqOSyllcxlCv93XMPblNjWDw"}, {"type": "f1", "value": 91.1623, "name": "F1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZDBkNDUzZmNkNDQwOGRkMmVlZjkxZWVlMzk3NzFmMGIxMTFmMjZlZDcyOWFiMjljNjM5MThlZDM4OWRmNzMwOCIsInZlcnNpb24iOjF9.bacyetziNI2DxO67GWpTyeRPXqF1POkyv00wEHXlyZu71pZngsNpZyrnuj2aJlCqQwHGnF_lT2ysaXKHprQRBg"}]}, {"task": {"type": "question-answering", "name": "Question Answering"}, "dataset": {"name": "squad", "type": "squad", "config": "plain_text", "split": "validation"}, "metrics": [{"type": "exact_match", "value": 89.2366, "name": "Exact Match", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjQ1Yjk3YTdiYTY1NmYxMTI1ZGZlMjRkNTlhZTkyNjRkNjgxYWJiNDk2NzE3NjAyYmY3YmRjNjg4YmEyNDkyYyIsInZlcnNpb24iOjF9.SEWyqX_FPQJOJt2KjOCNgQ2giyVeLj5bmLI5LT_Pfo33tbWPWD09TySYdsthaVTjUGT5DvDzQLASSwBH05FyBw"}, {"type": "f1", "value": 95.0569, "name": "F1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiY2QyODQ1NWVlYjQxMjA0YTgyNmQ2NmIxOWY3MDRmZjE3ZWI5Yjc4ZDE4NzA2YjE2YTE1YTBlNzNiYmNmNzI3NCIsInZlcnNpb24iOjF9.NcXEc9xoggV76w1bQKxuJDYbOTxFzdny2k-85_b6AIMtfpYV3rGR1Z5YF6tVY2jyp7mgm5Jd5YSgGI3NvNE-CQ"}]}]}]}, "description": "\n# deberta-v3-large for QA \n\nThis is the [deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) model, fine-tuned using the [SQuAD2.0](https://huggingface.co/datasets/squad_v2) dataset. It's been trained on question-answer pairs, including unanswerable questions, for the task of Question Answering. \n\n\n## Overview\n**Language model:** deberta-v3-large \n**Language:** English \n**Downstream-task:** Extractive QA \n**Training data:** SQuAD 2.0 \n**Eval data:** SQuAD 2.0 \n**Code:** See [an example QA pipeline on Haystack](https://haystack.deepset.ai/tutorials/first-qa-system) \n**Infrastructure**: 1x NVIDIA A10G\n\n## Hyperparameters\n\n```\nbatch_size = 2\ngrad_acc_steps = 32\nn_epochs = 6\nbase_LM_model = \"microsoft/deberta-v3-large\"\nmax_seq_len = 512\nlearning_rate = 7e-6\nlr_schedule = LinearWarmup\nwarmup_proportion = 0.2\ndoc_stride=128\nmax_query_length=64\n``` \n\n## Usage\n\n### In Haystack\nHaystack is an NLP framework by deepset. You can use this model in a Haystack pipeline to do question answering at scale (over many documents). 
To load the model in [Haystack](https://github.com/deepset-ai/haystack/):\n```python\nfrom haystack.nodes import FARMReader, TransformersReader  # import assumed for Haystack v1.x\n\nreader = FARMReader(model_name_or_path=\"deepset/deberta-v3-large-squad2\")\n# or \nreader = TransformersReader(model_name_or_path=\"deepset/deberta-v3-large-squad2\", tokenizer=\"deepset/deberta-v3-large-squad2\")\n```\n\n### In Transformers\n```python\nfrom transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline\n\nmodel_name = \"deepset/deberta-v3-large-squad2\"\n\n# a) Get predictions\nnlp = pipeline('question-answering', model=model_name, tokenizer=model_name)\nQA_input = {\n 'question': 'Why is model conversion important?',\n 'context': 'The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.'\n}\nres = nlp(QA_input)\n\n# b) Load model & tokenizer\nmodel = AutoModelForQuestionAnswering.from_pretrained(model_name)\ntokenizer = AutoTokenizer.from_pretrained(model_name)\n```\n\n## Performance\nEvaluated on the SQuAD 2.0 dev set with the [official eval script](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/).\n\n```\n\"exact\": 87.6105449338836,\n\"f1\": 90.75307008866517,\n\n\"total\": 11873,\n\"HasAns_exact\": 84.37921727395411,\n\"HasAns_f1\": 90.6732795483674,\n\"HasAns_total\": 5928,\n\"NoAns_exact\": 90.83263246425568,\n\"NoAns_f1\": 90.83263246425568,\n\"NoAns_total\": 5945\n```\n\n## About us\n
\n
\n \"\"\n
\n
\n \"\"\n
\n
\n\n[deepset](http://deepset.ai/) is the company behind the open-source NLP framework [Haystack](https://haystack.deepset.ai/) which is designed to help you build production ready NLP systems that use: Question answering, summarization, ranking etc.\n\n\nSome of our other work: \n- [Distilled roberta-base-squad2 (aka \"tinyroberta-squad2\")]([https://huggingface.co/deepset/tinyroberta-squad2)\n- [German BERT (aka \"bert-base-german-cased\")](https://deepset.ai/german-bert)\n- [GermanQuAD and GermanDPR datasets and models (aka \"gelectra-base-germanquad\", \"gbert-base-germandpr\")](https://deepset.ai/germanquad)\n\n## Get in touch and join the Haystack community\n\n

For more info on Haystack, visit our GitHub repo and Documentation. \n\nWe also have a Discord community open to everyone!

\n\n[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n\nBy the way: [we're hiring!](http://www.deepset.ai/jobs) \n"} {"downloads": 56763, "id": "deepset/tinyroberta-squad2", "likes": 22, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": "en", "license": "cc-by-4.0", "datasets": ["squad_v2"], "model-index": [{"name": "deepset/tinyroberta-squad2", "results": [{"task": {"type": "question-answering", "name": "Question Answering"}, "dataset": {"name": "squad_v2", "type": "squad_v2", "config": "squad_v2", "split": "validation"}, "metrics": [{"type": "exact_match", "value": 78.8627, "name": "Exact Match", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNDNlZDU4ODAxMzY5NGFiMTMyZmQ1M2ZhZjMyODA1NmFlOGMxNzYxNTA4OGE5YTBkZWViZjBkNGQ2ZmMxZjVlMCIsInZlcnNpb24iOjF9.Wgu599r6TvgMLTrHlLMVAbUtKD_3b70iJ5QSeDQ-bRfUsVk6Sz9OsJCp47riHJVlmSYzcDj_z_3jTcUjCFFXBg"}, {"type": "f1", "value": 82.0355, "name": "F1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOTFkMzEzMWNiZDRhMGZlODhkYzcwZTZiMDFjZDg2YjllZmUzYWM5NTgwNGQ2NGYyMDk2ZGQwN2JmMTE5NTc3YiIsInZlcnNpb24iOjF9.ChgaYpuRHd5WeDFjtiAHUyczxtoOD_M5WR8834jtbf7wXhdGOnZKdZ1KclmhoI5NuAGc1NptX-G0zQ5FTHEcBA"}]}]}]}, "description": "\n\n# tinyroberta-squad2\n\nThis is the *distilled* version of the [deepset/roberta-base-squad2](https://huggingface.co/deepset/roberta-base-squad2) model. This model has a comparable prediction quality and runs at twice the speed of the base model.\n\n## Overview\n**Language model:** tinyroberta-squad2 \n**Language:** English \n**Downstream-task:** Extractive QA \n**Training data:** SQuAD 2.0 \n**Eval data:** SQuAD 2.0 \n**Code:** See [an example QA pipeline on Haystack](https://haystack.deepset.ai/tutorials/first-qa-system) \n**Infrastructure**: 4x Tesla v100\n\n## Hyperparameters\n\n```\nbatch_size = 96\nn_epochs = 4\nbase_LM_model = \"deepset/tinyroberta-squad2-step1\"\nmax_seq_len = 384\nlearning_rate = 3e-5\nlr_schedule = LinearWarmup\nwarmup_proportion = 0.2\ndoc_stride = 128\nmax_query_length = 64\ndistillation_loss_weight = 0.75\ntemperature = 1.5\nteacher = \"deepset/robert-large-squad2\"\n``` \n\n## Distillation\nThis model was distilled using the TinyBERT approach described in [this paper](https://arxiv.org/pdf/1909.10351.pdf) and implemented in [haystack](https://github.com/deepset-ai/haystack).\nFirstly, we have performed intermediate layer distillation with roberta-base as the teacher which resulted in [deepset/tinyroberta-6l-768d](https://huggingface.co/deepset/tinyroberta-6l-768d).\nSecondly, we have performed task-specific distillation with [deepset/roberta-base-squad2](https://huggingface.co/deepset/roberta-base-squad2) as the teacher for further intermediate layer distillation on an augmented version of SQuADv2 and then with [deepset/roberta-large-squad2](https://huggingface.co/deepset/roberta-large-squad2) as the teacher for prediction layer distillation. \n\n## Usage\n\n### In Haystack\nHaystack is an NLP framework by deepset. You can use this model in a Haystack pipeline to do question answering at scale (over many documents). 
To load the model in [Haystack](https://github.com/deepset-ai/haystack/):\n\n```python\nfrom haystack.nodes import FARMReader, TransformersReader  # import assumed for Haystack v1.x\n\nreader = FARMReader(model_name_or_path=\"deepset/tinyroberta-squad2\")\n# or \nreader = TransformersReader(model_name_or_path=\"deepset/tinyroberta-squad2\")\n```\n\n### In Transformers\n```python\nfrom transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline\n\nmodel_name = \"deepset/tinyroberta-squad2\"\n\n# a) Get predictions\nnlp = pipeline('question-answering', model=model_name, tokenizer=model_name)\nQA_input = {\n 'question': 'Why is model conversion important?',\n 'context': 'The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.'\n}\nres = nlp(QA_input)\n\n# b) Load model & tokenizer\nmodel = AutoModelForQuestionAnswering.from_pretrained(model_name)\ntokenizer = AutoTokenizer.from_pretrained(model_name)\n```\n\n## Performance\nEvaluated on the SQuAD 2.0 dev set with the [official eval script](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/).\n\n```\n\"exact\": 78.69114798281817,\n\"f1\": 81.9198998536977,\n\n\"total\": 11873,\n\"HasAns_exact\": 76.19770580296895,\n\"HasAns_f1\": 82.66446878592329,\n\"HasAns_total\": 5928,\n\"NoAns_exact\": 81.17746005046257,\n\"NoAns_f1\": 81.17746005046257,\n\"NoAns_total\": 5945\n```\n\n## Authors\n**Branden Chan:** branden.chan@deepset.ai \n**Timo M\u00f6ller:** timo.moeller@deepset.ai \n**Malte Pietsch:** malte.pietsch@deepset.ai \n**Tanay Soni:** tanay.soni@deepset.ai \n**Michel Bartels:** michel.bartels@deepset.ai\n\n## About us\n\n
\n
\n \"\"\n
\n
\n \"\"\n
\n
\n\n[deepset](http://deepset.ai/) is the company behind the open-source NLP framework [Haystack](https://haystack.deepset.ai/) which is designed to help you build production ready NLP systems that use: Question answering, summarization, ranking etc.\n\n\nSome of our other work: \n- [roberta-base-squad2]([https://huggingface.co/deepset/roberta-base-squad2)\n- [German BERT (aka \"bert-base-german-cased\")](https://deepset.ai/german-bert)\n- [GermanQuAD and GermanDPR datasets and models (aka \"gelectra-base-germanquad\", \"gbert-base-germandpr\")](https://deepset.ai/germanquad)\n\n## Get in touch and join the Haystack community\n\n

For more info on Haystack, visit our GitHub repo and Documentation. \n\nWe also have a Discord community open to everyone!

\n\n[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n\nBy the way: [we're hiring!](http://www.deepset.ai/jobs)"} {"downloads": 6433, "id": "mrm8488/distill-bert-base-spanish-wwm-cased-finetuned-spa-squad2-es", "likes": 20, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": "es", "thumbnail": "https://i.imgur.com/jgBdimh.png", "license": "apache-2.0"}, "description": "\n\n# BETO (Spanish BERT) + Spanish SQuAD2.0 + distillation using 'bert-base-multilingual-cased' as teacher\n\nThis model is a fine-tuned on [SQuAD-es-v2.0](https://github.com/ccasimiro88/TranslateAlignRetrieve) and **distilled** version of [BETO](https://github.com/dccuchile/beto) for **Q&A**.\n\nDistillation makes the model **smaller, faster, cheaper and lighter** than [bert-base-spanish-wwm-cased-finetuned-spa-squad2-es](https://github.com/huggingface/transformers/blob/master/model_cards/mrm8488/bert-base-spanish-wwm-cased-finetuned-spa-squad2-es/README.md)\n\nThis model was fine-tuned on the same dataset but using **distillation** during the process as mentioned above (and one more train epoch).\n\nThe **teacher model** for the distillation was `bert-base-multilingual-cased`. It is the same teacher used for `distilbert-base-multilingual-cased` AKA [**DistilmBERT**](https://github.com/huggingface/transformers/tree/master/examples/distillation) (on average is twice as fast as **mBERT-base**).\n\n## Details of the downstream task (Q&A) - Dataset\n\n
\n\n[SQuAD-es-v2.0](https://github.com/ccasimiro88/TranslateAlignRetrieve)\n\n| Dataset | # Q&A |\n| "} {"downloads": 2348, "id": "pierreguillou/bert-large-cased-squad-v1.1-portuguese", "likes": 20, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": "pt", "license": "mit", "tags": ["question-answering", "bert", "bert-large", "pytorch"], "datasets": ["brWaC", "squad", "squad_v1_pt"], "metrics": ["squad"], "widget": [{"text": "Quando come\u00e7ou a pandemia de Covid-19 no mundo?", "context": "A pandemia de COVID-19, tamb\u00e9m conhecida como pandemia de coronav\u00edrus, \u00e9 uma pandemia em curso de COVID-19, uma doen\u00e7a respirat\u00f3ria causada pelo coronav\u00edrus da s\u00edndrome respirat\u00f3ria aguda grave 2 (SARS-CoV-2). O v\u00edrus tem origem zoon\u00f3tica e o primeiro caso conhecido da doen\u00e7a remonta a dezembro de 2019 em Wuhan, na China."}, {"text": "Onde foi descoberta a Covid-19?", "context": "A pandemia de COVID-19, tamb\u00e9m conhecida como pandemia de coronav\u00edrus, \u00e9 uma pandemia em curso de COVID-19, uma doen\u00e7a respirat\u00f3ria causada pelo coronav\u00edrus da s\u00edndrome respirat\u00f3ria aguda grave 2 (SARS-CoV-2). O v\u00edrus tem origem zoon\u00f3tica e o primeiro caso conhecido da doen\u00e7a remonta a dezembro de 2019 em Wuhan, na China."}]}, "description": "\n\n# Portuguese BERT large cased QA (Question Answering), finetuned on SQUAD v1.1\n\n![Exemple of what can do the Portuguese BERT large cased QA (Question Answering), finetuned on SQUAD v1.1](https://miro.medium.com/max/5256/1*QxyeAjT2V1OfE2B6nEcs3w.png)\n\n## Introduction\n\nThe model was trained on the dataset SQUAD v1.1 in portuguese from the [Deep Learning Brasil group](http://www.deeplearningbrasil.com.br/). \n\nThe language model used is the [BERTimbau Large](https://huggingface.co/neuralmind/bert-large-portuguese-cased) (aka \"bert-large-portuguese-cased\") from [Neuralmind.ai](https://neuralmind.ai/): BERTimbau is a pretrained BERT model for Brazilian Portuguese that achieves state-of-the-art performances on three downstream NLP tasks: Named Entity Recognition, Sentence Textual Similarity and Recognizing Textual Entailment. It is available in two sizes: Base and Large.\n\n## Informations on the method used\n\nAll the informations are in the blog post : [NLP | Como treinar um modelo de Question Answering em qualquer linguagem baseado no BERT large, melhorando o desempenho do modelo utilizando o BERT base? (estudo de caso em portugu\u00eas)](https://medium.com/@pierre_guillou/nlp-como-treinar-um-modelo-de-question-answering-em-qualquer-linguagem-baseado-no-bert-large-1c899262dd96)\n\n## Notebook in GitHub\n\n[question_answering_BERT_large_cased_squad_v11_pt.ipynb](https://github.com/piegu/language-models/blob/master/question_answering_BERT_large_cased_squad_v11_pt.ipynb) ([nbviewer version](https://nbviewer.jupyter.org/github/piegu/language-models/blob/master/question_answering_BERT_large_cased_squad_v11_pt.ipynb))\n\n## Performance\n\nThe results obtained are the following:\n\n```\nf1 = 84.43 (against 82.50 for the base model)\nexact match = 72.68 (against 70.49 for the base model)\n```\n\n## How to use the model... 
with Pipeline\n\n```python\nimport transformers\nfrom transformers import pipeline\n\n# source: https://pt.wikipedia.org/wiki/Pandemia_de_COVID-19\ncontext = r\"\"\"\nA pandemia de COVID-19, tamb\u00e9m conhecida como pandemia de coronav\u00edrus, \u00e9 uma pandemia em curso de COVID-19, \numa doen\u00e7a respirat\u00f3ria causada pelo coronav\u00edrus da s\u00edndrome respirat\u00f3ria aguda grave 2 (SARS-CoV-2). \nO v\u00edrus tem origem zoon\u00f3tica e o primeiro caso conhecido da doen\u00e7a remonta a dezembro de 2019 em Wuhan, na China. \nEm 20 de janeiro de 2020, a Organiza\u00e7\u00e3o Mundial da Sa\u00fade (OMS) classificou o surto \ncomo Emerg\u00eancia de Sa\u00fade P\u00fablica de \u00c2mbito Internacional e, em 11 de mar\u00e7o de 2020, como pandemia. \nEm 18 de junho de 2021, 177 349 274 casos foram confirmados em 192 pa\u00edses e territ\u00f3rios, \ncom 3 840 181 mortes atribu\u00eddas \u00e0 doen\u00e7a, tornando-se uma das pandemias mais mortais da hist\u00f3ria.\nOs sintomas de COVID-19 s\u00e3o altamente vari\u00e1veis, variando de nenhum a doen\u00e7as com risco de morte. \nO v\u00edrus se espalha principalmente pelo ar quando as pessoas est\u00e3o perto umas das outras. \nEle deixa uma pessoa infectada quando ela respira, tosse, espirra ou fala e entra em outra pessoa pela boca, nariz ou olhos.\nEle tamb\u00e9m pode se espalhar atrav\u00e9s de superf\u00edcies contaminadas. \nAs pessoas permanecem contagiosas por at\u00e9 duas semanas e podem espalhar o v\u00edrus mesmo se forem assintom\u00e1ticas.\n\"\"\"\n\nmodel_name = 'pierreguillou/bert-large-cased-squad-v1.1-portuguese'\nnlp = pipeline(\"question-answering\", model=model_name)\n\nquestion = \"Quando come\u00e7ou a pandemia de Covid-19 no mundo?\"\n\nresult = nlp(question=question, context=context)\n\nprint(f\"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}\")\n\n# Answer: 'dezembro de 2019', score: 0.5087, start: 290, end: 306\n```\n\n## How to use the model... with the Auto classes\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForQuestionAnswering\n \ntokenizer = AutoTokenizer.from_pretrained(\"pierreguillou/bert-large-cased-squad-v1.1-portuguese\")\nmodel = AutoModelForQuestionAnswering.from_pretrained(\"pierreguillou/bert-large-cased-squad-v1.1-portuguese\")\n``` \n\nOr just clone the model repo:\n\n```bash\ngit lfs install\ngit clone https://huggingface.co/pierreguillou/bert-large-cased-squad-v1.1-portuguese\n \n# if you want to clone without large files \u2013 just their pointers\n# prepend your git clone with the following env var:\n \nGIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/pierreguillou/bert-large-cased-squad-v1.1-portuguese\n``` \n\n## Limitations and bias\n\nThe training data used for this model comes from Portuguese SQUAD. It may contain a lot of unfiltered content, which is far from neutral, as well as biases.\n\n## Author\n\nPortuguese BERT large cased QA (Question Answering), finetuned on SQUAD v1.1 was trained and evaluated by [Pierre GUILLOU](https://www.linkedin.com/in/pierreguillou/) thanks to the Open Source code, platforms and advice of many organizations ([link to the list](https://medium.com/@pierre_guillou/nlp-como-treinar-um-modelo-de-question-answering-em-qualquer-linguagem-baseado-no-bert-large-1c899262dd96#c2f5)). 
In particular: [Hugging Face](https://huggingface.co/), [Neuralmind.ai](https://neuralmind.ai/), [Deep Learning Brasil group](http://www.deeplearningbrasil.com.br/) and [AI Lab](https://ailab.unb.br/).\n\n## Citation\nIf you use our work, please cite:\n\n```bibtex\n@inproceedings{pierreguillou2021bertlargecasedsquadv11portuguese,\n title={Portuguese BERT large cased QA (Question Answering), finetuned on SQUAD v1.1},\n author={Pierre Guillou},\n year={2021}\n}\n```"} {"downloads": 2616, "id": "deepset/xlm-roberta-base-squad2", "likes": 19, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"license": "cc-by-4.0", "datasets": ["squad_v2"], "model-index": [{"name": "deepset/xlm-roberta-base-squad2", "results": [{"task": {"type": "question-answering", "name": "Question Answering"}, "dataset": {"name": "squad_v2", "type": "squad_v2", "config": "squad_v2", "split": "validation"}, "metrics": [{"type": "exact_match", "value": 74.0354, "name": "Exact Match", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMWMxNWQ2ODJkNWIzZGQwOWI4OTZjYjU3ZDVjZGQzMjI5MzljNjliZTY4Mzk4YTk4OTMzZWYxZjUxYmZhYTBhZSIsInZlcnNpb24iOjF9.eEeFYYJ30BfJDd-JYfI1kjlxJrRF6OFtj2GnkTCOO4kqX31inFy8ptDWusVlLFsUphm4dNWfTKXC5e-gytLBDA"}, {"type": "f1", "value": 77.1833, "name": "F1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjg4MjNkOTA4Y2I5OGFlYTk1NWZjMWFlNjI5M2Y0NGZhMThhN2M4YmY2Y2RhZjcwYzU0MGNjN2RkZDljZmJmNiIsInZlcnNpb24iOjF9.TX42YMXpH4e0qu7cC4ARDlZWSkd55dwwyeyFXmOlXERNnEicDuFBCsy8WHLaqQCLUkzODJ22Hw4zhv81rwnlAQ"}]}]}]}, "description": "\n\n# Multilingual XLM-RoBERTa base for QA on various languages \n\n## Overview\n**Language model:** xlm-roberta-base \n**Language:** Multilingual \n**Downstream-task:** Extractive QA \n**Training data:** SQuAD 2.0 \n**Eval data:** SQuAD 2.0 dev set - German MLQA - German XQuAD \n**Code:** See [example](https://github.com/deepset-ai/FARM/blob/master/examples/question_answering.py) in [FARM](https://github.com/deepset-ai/FARM/blob/master/examples/question_answering.py) \n**Infrastructure**: 4x Tesla v100\n\n## Hyperparameters\n\n```\nbatch_size = 22*4\nn_epochs = 2\nmax_seq_len=256,\ndoc_stride=128,\nlearning_rate=2e-5,\n``` \n\nCorresponding experiment logs in mlflow: [link](https://public-mlflow.deepset.ai/#/experiments/2/runs/b25ec75e07614accb3f1ce03d43dbe08)\n\n\n## Performance\nEvaluated on the SQuAD 2.0 dev set with the [official eval script](https://worksheets.codalab.org/rest/bundles/0x6b567e1cf2e041ec80d7098f031c5c9e/contents/blob/).\n```\n\"exact\": 73.91560683904657\n\"f1\": 77.14103746689592\n```\n\nEvaluated on German MLQA: test-context-de-question-de.json\n```\n\"exact\": 33.67279167589108\n\"f1\": 44.34437105434842\n\"total\": 4517\n```\n\nEvaluated on German XQuAD: xquad.de.json\n```\n\"exact\": 48.739495798319325\n\"f1\": 62.552615701071495\n\"total\": 1190\n```\n\n## Usage\n\n### In Transformers\n```python\n# updated from the card's pre-1.0 transformers import paths\nfrom transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline\n\nmodel_name = \"deepset/xlm-roberta-base-squad2\"\n\n# a) Get predictions\nnlp = pipeline('question-answering', model=model_name, tokenizer=model_name)\nQA_input = {\n 'question': 'Why is model conversion important?',\n 'context': 'The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.'\n}\nres = nlp(QA_input)\n\n# b) Load model & tokenizer\nmodel = 
AutoModelForQuestionAnswering.from_pretrained(model_name)\ntokenizer = AutoTokenizer.from_pretrained(model_name)\n```\n\n### In FARM\n\n```python\nfrom farm.modeling.adaptive_model import AdaptiveModel\nfrom farm.modeling.tokenization import Tokenizer\nfrom farm.infer import Inferencer\n\nmodel_name = \"deepset/xlm-roberta-base-squad2\"\n\n# a) Get predictions\nnlp = Inferencer.load(model_name, task_type=\"question_answering\")\nQA_input = [{\"questions\": [\"Why is model conversion important?\"],\n \"text\": \"The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.\"}]\nres = nlp.inference_from_dicts(dicts=QA_input, rest_api_schema=True)\n\n# b) Load model & tokenizer\nmodel = AdaptiveModel.convert_from_transformers(model_name, device=\"cpu\", task_type=\"question_answering\")\ntokenizer = Tokenizer.load(model_name)\n```\n\n### In haystack\nFor doing QA at scale (i.e. many docs instead of a single paragraph), you can load the model also in [haystack](https://github.com/deepset-ai/haystack/):\n```python\nreader = FARMReader(model_name_or_path=\"deepset/xlm-roberta-base-squad2\")\n# or \nreader = TransformersReader(model=\"deepset/xlm-roberta-base-squad2\", tokenizer=\"deepset/xlm-roberta-base-squad2\")\n```\n\n\n## Authors\nBranden Chan: `branden.chan [at] deepset.ai`\nTimo M\u00f6ller: `timo.moeller [at] deepset.ai`\nMalte Pietsch: `malte.pietsch [at] deepset.ai`\nTanay Soni: `tanay.soni [at] deepset.ai`\n\n## About us\n![deepset logo](https://workablehr.s3.amazonaws.com/uploads/account/logo/476306/logo)\n\nWe bring NLP to the industry via open source! \nOur focus: Industry specific language models & large scale QA systems. \n \nSome of our work: \n- [German BERT (aka \"bert-base-german-cased\")](https://deepset.ai/german-bert)\n- [GermanQuAD and GermanDPR datasets and models (aka \"gelectra-base-germanquad\", \"gbert-base-germandpr\")](https://deepset.ai/germanquad)\n- [FARM](https://github.com/deepset-ai/FARM)\n- [Haystack](https://github.com/deepset-ai/haystack/)\n\nGet in touch:\n[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n\nBy the way: [we're hiring!](http://www.deepset.ai/jobs)\n"} {"downloads": 1520, "id": "AlexKay/xlm-roberta-large-qa-multilingual-finedtuned-ru", "likes": 16, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": ["en", "ru", "multilingual"], "license": "apache-2.0"}, "description": "\n# XLM-RoBERTa large model whole word masking finetuned on SQuAD\nPretrained model using a masked language modeling (MLM) objective. \nFine-tuned on English and Russian QA datasets.\n\n## Used QA Datasets\nSQuAD + SberQuAD\n\n[SberQuAD original paper](https://arxiv.org/pdf/1912.09723.pdf) is here! 
We recommend reading it!\n\n## Evaluation results\nThe results obtained are the following (SberQuAD):\n```\nf1 = 84.3\nexact_match = 65.3\n```\n"} {"downloads": 187016, "id": "deepset/bert-large-uncased-whole-word-masking-squad2", "likes": 16, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": "en", "license": "cc-by-4.0", "datasets": ["squad_v2"], "model-index": [{"name": "deepset/bert-large-uncased-whole-word-masking-squad2", "results": [{"task": {"type": "question-answering", "name": "Question Answering"}, "dataset": {"name": "squad_v2", "type": "squad_v2", "config": "squad_v2", "split": "validation"}, "metrics": [{"type": "exact_match", "value": 80.8846, "name": "Exact Match", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiY2E5ZGNkY2ExZWViZGEwNWE3OGRmMWM2ZmE4ZDU4ZDQ1OGM3ZWE0NTVmZjFmYmZjZmJmNjJmYTc3NTM3OTk3OSIsInZlcnNpb24iOjF9.aSblF4ywh1fnHHrN6UGL392R5KLaH3FCKQlpiXo_EdQ4XXEAENUCjYm9HWDiFsgfSENL35GkbSyz_GAhnefsAQ"}, {"type": "f1", "value": 83.8765, "name": "F1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNGFlNmEzMTk2NjRkNTI3ZTk3ZTU1NWNlYzIyN2E0ZDFlNDA2ZjYwZWJlNThkMmRmMmE0YzcwYjIyZDM5NmRiMCIsInZlcnNpb24iOjF9.-rc2_Bsp_B26-o12MFYuAU0Ad2Hg9PDx7Preuk27WlhYJDeKeEr32CW8LLANQABR3Mhw2x8uTYkEUrSDMxxLBw"}]}]}]}, "description": "\n\n# bert-large-uncased-whole-word-masking-squad2\n\nThis is a bert-large model, fine-tuned using the SQuAD2.0 dataset for the task of question answering.\n\n## Overview\n**Language model:** bert-large \n**Language:** English \n**Downstream-task:** Extractive QA \n**Training data:** SQuAD 2.0 \n**Eval data:** SQuAD 2.0 \n**Code:** See [an example QA pipeline on Haystack](https://haystack.deepset.ai/tutorials/first-qa-system) \n\n## Usage\n\n### In Haystack\nHaystack is an NLP framework by deepset. You can use this model in a Haystack pipeline to do question answering at scale (over many documents). To load the model in [Haystack](https://github.com/deepset-ai/haystack/):\n```python\nfrom haystack.nodes import FARMReader, TransformersReader  # import assumed for Haystack v1.x\n\nreader = FARMReader(model_name_or_path=\"deepset/bert-large-uncased-whole-word-masking-squad2\")\n# or \nreader = TransformersReader(model_name_or_path=\"deepset/bert-large-uncased-whole-word-masking-squad2\", tokenizer=\"deepset/bert-large-uncased-whole-word-masking-squad2\")\n```\n\n### In Transformers\n```python\nfrom transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline\n\nmodel_name = \"deepset/bert-large-uncased-whole-word-masking-squad2\"\n\n# a) Get predictions\nnlp = pipeline('question-answering', model=model_name, tokenizer=model_name)\nQA_input = {\n 'question': 'Why is model conversion important?',\n 'context': 'The option to convert models between FARM and transformers gives freedom to the user and let people easily switch between frameworks.'\n}\nres = nlp(QA_input)\n\n# b) Load model & tokenizer\nmodel = AutoModelForQuestionAnswering.from_pretrained(model_name)\ntokenizer = AutoTokenizer.from_pretrained(model_name)\n```\n\n## About us\n
\n
\n \"\"\n
\n
\n \"\"\n
\n
\n\n[deepset](http://deepset.ai/) is the company behind the open-source NLP framework [Haystack](https://haystack.deepset.ai/) which is designed to help you build production ready NLP systems that use: Question answering, summarization, ranking etc.\n\n\nSome of our other work: \n- [Distilled roberta-base-squad2 (aka \"tinyroberta-squad2\")]([https://huggingface.co/deepset/tinyroberta-squad2)\n- [German BERT (aka \"bert-base-german-cased\")](https://deepset.ai/german-bert)\n- [GermanQuAD and GermanDPR datasets and models (aka \"gelectra-base-germanquad\", \"gbert-base-germandpr\")](https://deepset.ai/germanquad)\n\n## Get in touch and join the Haystack community\n\n

For more info on Haystack, visit our GitHub repo and Documentation. \n\nWe also have a Discord community open to everyone!

\n\n[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n\nBy the way: [we're hiring!](http://www.deepset.ai/jobs)"} {"downloads": 1084, "id": "deutsche-telekom/bert-multi-english-german-squad2", "likes": 16, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": ["de", "en", "multilingual"], "license": "mit", "tags": ["english", "german"]}, "description": "\n\n# Bilingual English + German SQuAD2.0\n\nWe created German Squad 2.0 (**deQuAD 2.0**) and merged with [**SQuAD2.0**](https://rajpurkar.github.io/SQuAD-explorer/) into an English and German training data for question answering. The [**bert-base-multilingual-cased**](https://github.com/google-research/bert/blob/master/multilingual.md) is used to fine-tune bilingual QA downstream task.\n\n## Details of deQuAD 2.0\n[**SQuAD2.0**](https://rajpurkar.github.io/SQuAD-explorer/) was auto-translated into German. We hired professional editors to proofread the translated transcripts, correct mistakes and double check the answers to further polish the text and enhance annotation quality. The final German deQuAD dataset contains **130k** training and **11k** test samples.\n\n## Overview\n- **Language model:** bert-base-multilingual-cased \n- **Language:** German, English \n- **Training data:** deQuAD2.0 + SQuAD2.0 training set \n- **Evaluation data:** SQuAD2.0 test set; deQuAD2.0 test set\n- **Infrastructure:** 8xV100 GPU \n- **Published**: July 9th, 2021\n\n## Evaluation on English SQuAD2.0 \n\n```\nHasAns_exact = 85.79622132253711\nHasAns_f1 = 90.92004586077663\nHasAns_total = 5928\nNoAns_exact = 94.76871320437343\nNoAns_f1 = 94.76871320437343\nNoAns_total = 5945\nexact = 90.28889076054915\nf1 = 92.84713483219753\ntotal = 11873\n```\n## Evaluation on German deQuAD2.0 \n\n```\nHasAns_exact = 63.80526406330638\nHasAns_f1 = 72.47269140789888\nHasAns_total = 5813\nNoAns_exact = 82.0291893792861\nNoAns_f1 = 82.0291893792861\nNoAns_total = 5687\nexact = 72.81739130434782\nf1 = 77.19858740470603\ntotal = 11500\n```\n## Use Model in Pipeline\n\n\n```python\nfrom transformers import pipeline\n\nqa_pipeline = pipeline(\n \"question-answering\",\n model=\"deutsche-telekom/bert-multi-english-german-squad2\",\n tokenizer=\"deutsche-telekom/bert-multi-english-german-squad2\"\n)\n\ncontexts = [\"Die Allianz Arena ist ein Fu\u00dfballstadion im Norden von M\u00fcnchen und bietet bei Bundesligaspielen 75.021 Pl\u00e4tze, zusammengesetzt aus 57.343 Sitzpl\u00e4tzen, 13.794 Stehpl\u00e4tzen, 1.374 Logenpl\u00e4tzen, 2.152 Business Seats und 966 Sponsorenpl\u00e4tzen. In der Allianz Arena bestreitet der FC Bayern M\u00fcnchen seit der Saison 2005/06 seine Heimspiele. Bis zum Saisonende 2017 war die Allianz Arena auch Spielst\u00e4tte des TSV 1860 M\u00fcnchen.\",\n \"Harvard is a large, highly residential research university. It operates several arts, cultural, and scientific museums, alongside the Harvard Library, which is the world's largest academic and private library system, comprising 79 individual libraries with over 18 million volumes. 
\"]\nquestions = [\"Wo befindet sich die Allianz Arena?\", \n \"What is the worlds largest academic and private library system?\"]\n \nqa_pipeline(context=contexts, question=questions)\n\n```\n\n# Output:\n\n```json\n[{'score': 0.7290093898773193,\n 'start': 44,\n 'end': 62,\n 'answer': 'Norden von M\u00fcnchen'},\n {'score': 0.7979822754859924,\n 'start': 134,\n 'end': 149,\n 'answer': 'Harvard Library'}]\n```\n## License - The MIT License\nCopyright (c) 2021 Fang Xu, Deutsche Telekom AG \n"} {"downloads": 7956, "id": "etalab-ia/camembert-base-squadFR-fquad-piaf", "likes": 16, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": "fr", "datasets": ["piaf", "FQuAD", "SQuAD-FR"], "widget": [{"text": "Comment s'appelle le portail open data du gouvernement ?", "context": "Etalab est une administration publique fran\u00e7aise qui fait notamment office de Chief Data Officer de l'\u00c9tat et coordonne la conception et la mise en \u0153uvre de sa strat\u00e9gie dans le domaine de la donn\u00e9e (ouverture et partage des donn\u00e9es publiques ou open data, exploitation des donn\u00e9es et intelligence artificielle...). Ainsi, Etalab d\u00e9veloppe et maintient le portail des donn\u00e9es ouvertes du gouvernement fran\u00e7ais data.gouv.fr. Etalab promeut \u00e9galement une plus grande ouverture l'administration sur la soci\u00e9t\u00e9 (gouvernement ouvert) : transparence de l'action publique, innovation ouverte, participation citoyenne... elle promeut l\u2019innovation, l\u2019exp\u00e9rimentation, les m\u00e9thodes de travail ouvertes, agiles et it\u00e9ratives, ainsi que les synergies avec la soci\u00e9t\u00e9 civile pour d\u00e9cloisonner l\u2019administration et favoriser l\u2019adoption des meilleures pratiques professionnelles dans le domaine du num\u00e9rique. \u00c0 ce titre elle \u00e9tudie notamment l\u2019opportunit\u00e9 de recourir \u00e0 des technologies en voie de maturation issues du monde de la recherche. Cette entit\u00e9 charg\u00e9e de l'innovation au sein de l'administration doit contribuer \u00e0 l'am\u00e9lioration du service public gr\u00e2ce au num\u00e9rique. Elle est rattach\u00e9e \u00e0 la Direction interminist\u00e9rielle du num\u00e9rique, dont les missions et l\u2019organisation ont \u00e9t\u00e9 fix\u00e9es par le d\u00e9cret du 30 octobre 2019.\u2009 Dirig\u00e9 par Laure Lucchesi depuis 2016, elle rassemble une \u00e9quipe pluridisciplinaire d'une trentaine de personnes."}]}, "description": "\n\n# camembert-base-squadFR-fquad-piaf\n\n## Description\n\nQuestion-answering French model, using base [CamemBERT](https://camembert-model.fr/) fine-tuned on a combo of three French Q&A datasets:\n\n1. [PIAFv1.1](https://www.data.gouv.fr/en/datasets/piaf-le-dataset-francophone-de-questions-reponses/)\n2. [FQuADv1.0](https://fquad.illuin.tech/)\n3. 
[SQuAD-FR (SQuAD automatically translated to French)](https://github.com/Alikabbadj/French-SQuAD)\n\n## Training hyperparameters\n\n```shell\npython run_squad.py \\\n--model_type camembert \\\n--model_name_or_path camembert-base \\\n--do_train --do_eval \\\n--train_file data/SQuAD+fquad+piaf.json \\\n--predict_file data/fquad_valid.json \\\n--per_gpu_train_batch_size 12 \\\n--learning_rate 3e-5 \\\n--num_train_epochs 4 \\\n--max_seq_length 384 \\\n--doc_stride 128 \\\n--save_steps 10000 \n``` \n\n## Evaluation results\n### FQuAD v1.0 Evaluation\n```shell\n{\"f1\": 79.81, \"exact_match\": 55.14}\n```\n### SQuAD-FR Evaluation\n```shell\n{\"f1\": 80.61, \"exact_match\": 59.54}\n```\n\n## Usage\n\n```python\nfrom transformers import pipeline\n\nnlp = pipeline('question-answering', model='etalab-ia/camembert-base-squadFR-fquad-piaf', tokenizer='etalab-ia/camembert-base-squadFR-fquad-piaf')\n\nnlp({\n 'question': \"Qui est Claude Monet?\",\n 'context': \"Claude Monet, n\u00e9 le 14 novembre 1840 \u00e0 Paris et mort le 5 d\u00e9cembre 1926 \u00e0 Giverny, est un peintre fran\u00e7ais et l\u2019un des fondateurs de l'impressionnisme.\"\n})\n```\n## Acknowledgments\n\nThis work was performed using HPC resources from GENCI\u2013IDRIS (Grant 2020-AD011011224). \n\n## Citations\n\n### PIAF\n```\n@inproceedings{KeraronLBAMSSS20,\n author = {Rachel Keraron and\n Guillaume Lancrenon and\n Mathilde Bras and\n Fr{\\'{e}}d{\\'{e}}ric Allary and\n Gilles Moyse and\n Thomas Scialom and\n Edmundo{-}Pavel Soriano{-}Morales and\n Jacopo Staiano},\n title = {Project {PIAF:} Building a Native French Question-Answering Dataset},\n booktitle = {{LREC}},\n pages = {5481--5490},\n publisher = {European Language Resources Association},\n year = {2020}\n}\n\n```\n\n### FQuAD\n```\n@article{dHoffschmidt2020FQuADFQ,\n title={FQuAD: French Question Answering Dataset},\n author={Martin d'Hoffschmidt and Maxime Vidal and Wacim Belblidia and Tom Brendl{\\'{e}} and Quentin Heinrich},\n journal={ArXiv},\n year={2020},\n volume={abs/2002.06071}\n}\n```\n\n### SQuAD-FR\n```\n @MISC{kabbadj2018,\n author = \"Kabbadj, Ali\",\n title = \"Something new in French Text Mining and Information Extraction (Universal Chatbot): Largest Q&A French training dataset (110 000+) \",\n editor = \"linkedin.com\",\n month = \"November\",\n year = \"2018\",\n url = \"\\url{https://www.linkedin.com/pulse/something-new-french-text-mining-information-chatbot-largest-kabbadj/}\",\n note = \"[Online; posted 11-November-2018]\",\n }\n ```\n\n### CamemBERT\nHF model card : [https://huggingface.co/camembert-base](https://huggingface.co/camembert-base)\n\n```\n@inproceedings{martin2020camembert,\n title={CamemBERT: a Tasty French Language Model},\n author={Martin, Louis and Muller, Benjamin and Su{\\'a}rez, Pedro Javier Ortiz and Dupont, Yoann and Romary, Laurent and de la Clergerie, {\\'E}ric Villemonte and Seddah, Djam{\\'e} and Sagot, Beno{\\^\\i}t},\n booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},\n year={2020}\n}\n```\n\n"} {"downloads": 922, "id": "IDEA-CCNL/Randeng-T5-784M-QA-Chinese", "likes": 16, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": ["zh"], "tags": ["question-answering", "text-generation"], "pipeline-tag": ["text-generation"], "metrics": ["RougeL", "BLEU-4", "F1", "EM", "Contain Answer Rate"], "widget": [{"text": "question:\u7f8e\u56fd\u5efa\u7b51\u5e08\u662f\u600e\u6837\u521b\u9020\u7ef4\u591a\u5229\u4e9a\u54e5\u7279\u5f0f\u5efa\u7b51\u7684?", 
"context": "knowledge:\u5e95\u7279\u5f8b\u5723\u4fdd\u7f57\u5ea7\u5802(Cathedral Church of St. Paul)\u662f\u7f8e\u56fd\u5723\u516c\u4f1a\u5bc6\u6b47\u6839\u6559\u533a\u7684\u4e3b\u6559\u5ea7\u5802,\u4f4d\u4e8e\u5e95\u7279\u5f8b\u4f0d\u5fb7\u6c83\u5fb7\u5927\u90534800\u53f7,\u6bd7\u90bb\u97e6\u6069\u5dde\u7acb\u5927\u5b66\u6821\u56ed\u3002\u5723\u4fdd\u7f57\u5802\u533a\u6210\u7acb\u4e8e1824\u5e74,\u662f\u5bc6\u6b47\u6839\u7b2c\u4e00\u4e2a\u65b0\u6559\u5802\u4f1a\u3002\u73b0\u5b58\u5efa\u7b51\u7531\u8457\u540d\u6559\u5802\u8bbe\u8ba1\u5e08\u62c9\u5c14\u592b\u00b7\u514b\u62c9\u59c6(Ralph Adams Cram),\u59cb\u5efa\u4e8e1907\u5e74,\u81f3\u4eca\u949f\u697c\u5c1a\u672a\u5b8c\u6210\u3002\u6559\u5802\u5b8c\u5168\u7528\u77f3\u7070\u5ca9\u548c\u4e2d\u4e16\u7eaa\u5efa\u7b51\u6280\u672f\u5efa\u9020,\u6ca1\u6709\u652f\u6301\u7684\u94a2\u94c1\u4e0a\u5c42\u5efa\u7b51\u3002\u5efa\u8bbe\u62e5\u6709\u4ea4\u9519\u9aa8,\u5927\u7247\u82b1\u7a97\u73bb\u7483,\u96d5\u9970\u7a97\u683c,\u54e5\u7279\u5f0f\u5efa\u7b51\u7684\u6977\u6a21,\u5305\u62ecPewabic \u9676\u74f7\u4e2d\u5fc3\u3002\u57281912\u5e74\u6210\u4e3a\u6559\u533a\u7684\u4e3b\u6559\u5ea7\u5802\u3002\u5723\u4fdd\u7f57\u5ea7\u5802\u662f20\u4e16\u7eaa\u521d\u540e\u671f\u54e5\u7279\u590d\u5174\u5efa\u7b51\u7684\u6700\u4f73\u5b9e\u4f8b\u4e4b\u4e00\u300219\u4e16\u7eaa\u4e2d\u53f6\u7684\u7f8e\u56fd\u5efa\u7b51\u5e08\u8f93\u5165\u5e76\u91cd\u65b0\u9610\u91ca\u4e86\u82f1\u56fd\u54e5\u7279\u590d\u5174\u98ce\u683c,\u57fa\u4e8e\u4e2d\u4e16\u7eaa\u4e3b\u6559\u5ea7\u5802\u7684\u89c6\u89c9\u4e30\u5bcc\u7684\u7ec6\u8282\u3002\u7f8e\u56fd\u5efa\u7b51\u5e08\u5c06\u54e5\u7279\u5143\u7d20\u4e0e\u7b80\u5355\u7684\u5efa\u7b51\u89c4\u5212\u76f8\u7ed3\u5408,\u521b\u9020\u4e86\u7f8e\u56fd\u5efa\u7b51\u98ce\u683c\u201c\u7ef4\u591a\u5229\u4e9a\u54e5\u7279\u5f0f\u201d(Victorian Gothic)\u3002\u5174\u5efa\u4e8e1876\u5e74\u7684\u5821\u5792\u8857\u957f\u8001\u4f1a\u6559\u5802\u5c31\u662f\u65e9\u671f\u7ef4\u591a\u5229\u4e9a\u54e5\u7279\u5f0f\u5efa\u7b51\u7684\u6770\u51fa\u4f8b\u8bc1\u3002answer:", "example_title": "\u5c06\u54e5\u7279\u5143\u7d20\u4e0e\u7b80\u5355\u7684\u5efa\u7b51\u89c4\u5212\u76f8\u7ed3\u5408"}], "licence": "apache-2.0"}, "description": "\n# Randeng-T5-784M-QA-Chinese\nT5 for Chinese Question Answering\n- Github: [finetune and predict codes in Fengshenbang-LM](https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen/examples/qa_t5)\n- Docs: [Fengshenbang-Docs](https://fengshenbang-doc.readthedocs.io/)\n\n\n## \u7b80\u4ecb Brief Introduction\nThis T5-Large model, is the first pretrained generative question answering model for Chinese in huggingface. It was pretrained on the Wudao 180G corpus, and finetuned on Chinese SQuAD and CMRC2018 dataset. 
It can produce a fluent and accurate answer given a passage and question.\n\n\u8fd9\u662fhuggingface\u4e0a\u9996\u4e2a\u4e2d\u6587\u7684\u751f\u6210\u5f0f\u95ee\u7b54\u6a21\u578b\u3002\u5b83\u57fa\u4e8eT5-Large\u7ed3\u6784\uff0c\u4f7f\u7528\u609f\u9053180G\u8bed\u6599\u5728[\u5c01\u795e\u6846\u67b6](https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/main/fengshen)\u8fdb\u884c\u9884\u8bad\u7ec3\uff0c\u5728\u7ffb\u8bd1\u7684\u4e2d\u6587SQuAD\u548cCMRC2018\u4e24\u4e2a\u9605\u8bfb\u7406\u89e3\u6570\u636e\u96c6\u4e0a\u8fdb\u884c\u5fae\u8c03\u3002\u8f93\u5165\u4e00\u7bc7\u6587\u7ae0\u548c\u4e00\u4e2a\u95ee\u9898\uff0c\u53ef\u4ee5\u751f\u6210\u51c6\u786e\u6d41\u7545\u7684\u56de\u7b54\u3002\n\n## \u6a21\u578b\u7c7b\u522b Model Taxonomy\n\n| \u9700\u6c42 Demand | \u4efb\u52a1 Task | \u7cfb\u5217 Series | \u6a21\u578b Model | \u53c2\u6570 Parameter | \u989d\u5916 Extra |\n| :"} {"downloads": 3156, "id": "pierreguillou/bert-base-cased-squad-v1.1-portuguese", "likes": 15, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": "pt", "license": "mit", "tags": ["question-answering", "bert", "bert-base", "pytorch"], "datasets": ["brWaC", "squad", "squad_v1_pt"], "metrics": ["squad"], "widget": [{"text": "Quando come\u00e7ou a pandemia de Covid-19 no mundo?", "context": "A pandemia de COVID-19, tamb\u00e9m conhecida como pandemia de coronav\u00edrus, \u00e9 uma pandemia em curso de COVID-19, uma doen\u00e7a respirat\u00f3ria aguda causada pelo coronav\u00edrus da s\u00edndrome respirat\u00f3ria aguda grave 2 (SARS-CoV-2). A doen\u00e7a foi identificada pela primeira vez em Wuhan, na prov\u00edncia de Hubei, Rep\u00fablica Popular da China, em 1 de dezembro de 2019, mas o primeiro caso foi reportado em 31 de dezembro do mesmo ano."}, {"text": "Onde foi descoberta a Covid-19?", "context": "A pandemia de COVID-19, tamb\u00e9m conhecida como pandemia de coronav\u00edrus, \u00e9 uma pandemia em curso de COVID-19, uma doen\u00e7a respirat\u00f3ria aguda causada pelo coronav\u00edrus da s\u00edndrome respirat\u00f3ria aguda grave 2 (SARS-CoV-2). A doen\u00e7a foi identificada pela primeira vez em Wuhan, na prov\u00edncia de Hubei, Rep\u00fablica Popular da China, em 1 de dezembro de 2019, mas o primeiro caso foi reportado em 31 de dezembro do mesmo ano."}]}, "description": "\n\n# Portuguese BERT base cased QA (Question Answering), finetuned on SQUAD v1.1\n\n![Exemple of what can do the Portuguese BERT base cased QA (Question Answering), finetuned on SQUAD v1.1](https://miro.medium.com/max/2000/1*te5MmdesAHCmg4KmK8zD3g.png)\n\n## Introduction\n\nThe model was trained on the dataset SQUAD v1.1 in portuguese from the [Deep Learning Brasil group](http://www.deeplearningbrasil.com.br/) on Google Colab. \n\nThe language model used is the [BERTimbau Base](https://huggingface.co/neuralmind/bert-base-portuguese-cased) (aka \"bert-base-portuguese-cased\") from [Neuralmind.ai](https://neuralmind.ai/): BERTimbau Base is a pretrained BERT model for Brazilian Portuguese that achieves state-of-the-art performances on three downstream NLP tasks: Named Entity Recognition, Sentence Textual Similarity and Recognizing Textual Entailment. 
It is available in two sizes: Base and Large.\n\n## Information on the method used\n\nAll the information is in the blog post: [NLP | Modelo de Question Answering em qualquer idioma baseado no BERT base (estudo de caso em portugu\u00eas)](https://medium.com/@pierre_guillou/nlp-modelo-de-question-answering-em-qualquer-idioma-baseado-no-bert-base-estudo-de-caso-em-12093d385e78)\n\n## Notebooks in Google Colab & GitHub\n\n- Google Colab: [colab_question_answering_BERT_base_cased_squad_v11_pt.ipynb](https://colab.research.google.com/drive/18ueLdi_V321Gz37x4gHq8mb4XZSGWfZx?usp=sharing)\n- GitHub: [colab_question_answering_BERT_base_cased_squad_v11_pt.ipynb](https://github.com/piegu/language-models/blob/master/colab_question_answering_BERT_base_cased_squad_v11_pt.ipynb)\n\n## Performance\n\nThe results obtained are the following:\n\n```\nf1 = 82.50\nexact match = 70.49\n```\n\n## How to use the model... with Pipeline\n\n```python\nimport transformers\nfrom transformers import pipeline\n\n# source: https://pt.wikipedia.org/wiki/Pandemia_de_COVID-19\ncontext = r\"\"\"\nA pandemia de COVID-19, tamb\u00e9m conhecida como pandemia de coronav\u00edrus, \u00e9 uma pandemia em curso de COVID-19, \numa doen\u00e7a respirat\u00f3ria aguda causada pelo coronav\u00edrus da s\u00edndrome respirat\u00f3ria aguda grave 2 (SARS-CoV-2). \nA doen\u00e7a foi identificada pela primeira vez em Wuhan, na prov\u00edncia de Hubei, Rep\u00fablica Popular da China, \nem 1 de dezembro de 2019, mas o primeiro caso foi reportado em 31 de dezembro do mesmo ano. \nAcredita-se que o v\u00edrus tenha uma origem zoon\u00f3tica, porque os primeiros casos confirmados \ntinham principalmente liga\u00e7\u00f5es ao Mercado Atacadista de Frutos do Mar de Huanan, que tamb\u00e9m vendia animais vivos. \nEm 11 de mar\u00e7o de 2020, a Organiza\u00e7\u00e3o Mundial da Sa\u00fade declarou o surto uma pandemia. At\u00e9 8 de fevereiro de 2021, \npelo menos 105 743 102 casos da doen\u00e7a foram confirmados em pelo menos 191 pa\u00edses e territ\u00f3rios, \ncom cerca de 2 308 943 mortes e 58 851 440 pessoas curadas.\n\"\"\"\n\nmodel_name = 'pierreguillou/bert-base-cased-squad-v1.1-portuguese'\nnlp = pipeline(\"question-answering\", model=model_name)\n\nquestion = \"Quando come\u00e7ou a pandemia de Covid-19 no mundo?\"\n\nresult = nlp(question=question, context=context)\n\nprint(f\"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}\")\n\n# Answer: '1 de dezembro de 2019', score: 0.713, start: 328, end: 349\n```\n\n## How to use the model... with the Auto classes\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForQuestionAnswering\n \ntokenizer = AutoTokenizer.from_pretrained(\"pierreguillou/bert-base-cased-squad-v1.1-portuguese\")\nmodel = AutoModelForQuestionAnswering.from_pretrained(\"pierreguillou/bert-base-cased-squad-v1.1-portuguese\")\n``` \n\nOr just clone the model repo:\n\n```bash\ngit lfs install\ngit clone https://huggingface.co/pierreguillou/bert-base-cased-squad-v1.1-portuguese\n \n# if you want to clone without large files \u2013 just their pointers\n# prepend your git clone with the following env var:\n \nGIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/pierreguillou/bert-base-cased-squad-v1.1-portuguese\n``` \n\n## Limitations and bias\n\nThe training data used for this model comes from Portuguese SQUAD. 
It may contain a lot of unfiltered content, which is far from neutral, as well as biases.\n\n## Author\n\nPortuguese BERT base cased QA (Question Answering), finetuned on SQUAD v1.1 was trained and evaluated by [Pierre GUILLOU](https://www.linkedin.com/in/pierreguillou/) thanks to the Open Source code, platforms and advice of many organizations ([link to the list](https://medium.com/@pierre_guillou/nlp-modelo-de-question-answering-em-qualquer-idioma-baseado-no-bert-base-estudo-de-caso-em-12093d385e78#c572)). In particular: [Hugging Face](https://huggingface.co/), [Neuralmind.ai](https://neuralmind.ai/), [Deep Learning Brasil group](http://www.deeplearningbrasil.com.br/), [Google Colab](https://colab.research.google.com/) and [AI Lab](https://ailab.unb.br/).\n\n## Citation\nIf you use our work, please cite:\n\n```bibtex\n@inproceedings{pierreguillou2021bertbasecasedsquadv11portuguese,\n title={Portuguese BERT base cased QA (Question Answering), finetuned on SQUAD v1.1},\n author={Pierre Guillou},\n year={2021}\n}\n```"} {"downloads": 50492, "id": "deepset/bert-base-cased-squad2", "likes": 14, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": "en", "license": "cc-by-4.0", "datasets": ["squad_v2"], "model-index": [{"name": "deepset/bert-base-cased-squad2", "results": [{"task": {"type": "question-answering", "name": "Question Answering"}, "dataset": {"name": "squad_v2", "type": "squad_v2", "config": "squad_v2", "split": "validation"}, "metrics": [{"type": "exact_match", "value": 71.1517, "name": "Exact Match", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZGZlNmQ1YzIzMWUzNTg4YmI4NWVhYThiMzE2ZGZmNWUzNDM3NWI0ZGJkNzliNGUxNTY2MDA5MWVkYjAwYWZiMCIsInZlcnNpb24iOjF9.iUvVdy5c4hoXkwlThJankQqG9QXzNilvfF1_4P0oL8X-jkY5Q6YSsZx6G6cpgXogqFpn7JlE_lP6_OT0VIamCg"}, {"type": "f1", "value": 74.6714, "name": "F1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMWE5OGNjODhmY2Y0NWIyZDIzMmQ2NmRjZGYyYTYzOWMxZDUzYzg4YjBhNTRiNTY4NTc0M2IxNjI5NWI5ZDM0NCIsInZlcnNpb24iOjF9.IqU9rbzUcKmDEoLkwCUZTKSH0ZFhtqgnhOaEDKKnaRMGBJLj98D5V4VirYT6jLh8FlR0FiwvMTMjReBcfTisAQ"}]}]}]}, "description": "\n\nThis is a BERT base cased model trained on SQuAD v2."} {"downloads": 7773, "id": "deepset/gelectra-large-germanquad", "likes": 14, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": "de", "datasets": ["deepset/germanquad"], "license": "mit", "thumbnail": "https://thumb.tildacdn.com/tild3433-3637-4830-a533-353833613061/-/resize/720x/-/format/webp/germanquad.jpg", "tags": ["exbert"]}, "description": "\n\n![bert_image](https://thumb.tildacdn.com/tild3433-3637-4830-a533-353833613061/-/resize/720x/-/format/webp/germanquad.jpg)\n\n## Overview\n**Language model:** gelectra-large-germanquad \n**Language:** German \n**Training data:** GermanQuAD train set (~ 12MB) \n**Eval data:** GermanQuAD test set (~ 5MB) \n**Infrastructure**: 1x V100 GPU \n**Published**: Apr 21st, 2021\n\n## Details\n- We trained a German question answering model with a gelectra-large model as its basis.\n- The dataset is GermanQuAD, a new, German language dataset, which we hand-annotated and published [online](https://deepset.ai/germanquad).\n- The training dataset is one-way annotated and contains 11518 questions and 11518 answers, while the test dataset is three-way annotated so that there are 2204 questions with 2204\u00b73\u221276 = 6536 answers, because we removed 76 wrong answers.\n\nSee https://deepset.ai/germanquad for more 
details and dataset download in SQuAD format.\n\n## Hyperparameters\n```\nbatch_size = 24\nn_epochs = 2\nmax_seq_len = 384\nlearning_rate = 3e-5\nlr_schedule = LinearWarmup\nembeds_dropout_prob = 0.1\n```\n## Performance\nWe evaluated the extractive question answering performance on our GermanQuAD test set.\nModel types and training data are included in the model name. \nFor finetuning XLM-Roberta, we use the English SQuAD v2.0 dataset.\nThe GELECTRA models are warm-started on the German translation of SQuAD v1.1 and finetuned on [GermanQuAD](https://deepset.ai/germanquad). \nThe human baseline was computed for the 3-way test set by taking one answer as prediction and the other two as ground truth.\n![performancetable](https://images.prismic.io/deepset/1c63afd8-40e6-4fd9-85c4-0dbb81996183_german-qa-vs-xlm-r.png) \n\n## Authors\n **Timo M\u00f6ller:** timo.moeller@deepset.ai \n **Julian Risch:** julian.risch@deepset.ai \n **Malte Pietsch:** malte.pietsch@deepset.ai \n \n
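As this card ships no loading snippet, here is a minimal, hedged sketch with the standard transformers pipeline (only the model id is taken from this card; the German question/context pair is an invented example):\n\n```python\nfrom transformers import pipeline\n\n# extractive QA pipeline; model id from this card\nqa = pipeline(\"question-answering\", model=\"deepset/gelectra-large-germanquad\")\n\n# hypothetical example input\nres = qa(question=\"Wo wohnt Angela?\", context=\"Angela wohnt in Berlin.\")\nprint(res[\"answer\"])\n```\n\n## About us\n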
\n
\n \"\"\n
\n
\n \"\"\n
\n
\n\n[deepset](http://deepset.ai/) is the company behind the open-source NLP framework [Haystack](https://haystack.deepset.ai/) which is designed to help you build production ready NLP systems that use: Question answering, summarization, ranking etc.\n\n\nSome of our other work: \n- [Distilled roberta-base-squad2 (aka \"tinyroberta-squad2\")]([https://huggingface.co/deepset/tinyroberta-squad2)\n- [German BERT (aka \"bert-base-german-cased\")](https://deepset.ai/german-bert)\n- [GermanQuAD and GermanDPR datasets and models (aka \"gelectra-base-germanquad\", \"gbert-base-germandpr\")](https://deepset.ai/germanquad)\n\n## Get in touch and join the Haystack community\n\n

For more info on Haystack, visit our GitHub repo and Documentation. \n\nWe also have a Discord community open to everyone!

\n\n[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n\nBy the way: [we're hiring!](http://www.deepset.ai/jobs) \n"} {"downloads": 31492, "id": "deepset/roberta-large-squad2", "likes": 14, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": "en", "datasets": ["squad_v2"], "license": "cc-by-4.0"}, "description": ""} {"downloads": 668, "id": "luhua/chinese_pretrain_mrc_macbert_large", "likes": 14, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": ["zh"], "license": "apache-2.0"}, "description": "\n\n## Chinese MRC macbert-large\n\n* \u4f7f\u7528\u5927\u91cf\u4e2d\u6587MRC\u6570\u636e\u8bad\u7ec3\u7684macbert-large\u6a21\u578b\uff0c\u8be6\u60c5\u53ef\u67e5\u770b\uff1ahttps://github.com/basketballandlearn/MRC_Competition_Dureader\n* \u6b64\u5e93\u53d1\u5e03\u7684\u518d\u8bad\u7ec3\u6a21\u578b\uff0c\u5728 \u9605\u8bfb\u7406\u89e3/\u5206\u7c7b \u7b49\u4efb\u52a1\u4e0a\u5747\u6709\u5927\u5e45\u63d0\u9ad8
\n\uff08\u5df2\u6709\u591a\u4f4d\u5c0f\u4f19\u4f34\u5728Dureader-2021\u7b49\u591a\u4e2a\u6bd4\u8d5b\u4e2d\u53d6\u5f97**top5**\u7684\u6210\u7ee9\ud83d\ude01\uff09\n\n| \u6a21\u578b/\u6570\u636e\u96c6 | Dureader-2021 | tencentmedical |\n| "} {"downloads": 8027, "id": "deepset/gelectra-base-germanquad", "likes": 13, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": "de", "datasets": ["deepset/germanquad"], "license": "mit", "thumbnail": "https://thumb.tildacdn.com/tild3433-3637-4830-a533-353833613061/-/resize/720x/-/format/webp/germanquad.jpg", "tags": ["exbert"]}, "description": "\n\n![bert_image](https://thumb.tildacdn.com/tild3433-3637-4830-a533-353833613061/-/resize/720x/-/format/webp/germanquad.jpg)\n\n## Overview\n**Language model:** gelectra-base-germanquad \n**Language:** German \n**Training data:** GermanQuAD train set (~ 12MB) \n**Eval data:** GermanQuAD test set (~ 5MB) \n**Infrastructure**: 1x V100 GPU \n**Published**: Apr 21st, 2021\n\n## Details\n- We trained a German question answering model with a gelectra-base model as its basis.\n- The dataset is GermanQuAD, a new, German language dataset, which we hand-annotated and published [online](https://deepset.ai/germanquad).\n- The training dataset is one-way annotated and contains 11518 questions and 11518 answers, while the test dataset is three-way annotated so that there are 2204 questions and with 2204\u00b73\u221276 = 6536answers, because we removed 76 wrong answers.\n\nSee https://deepset.ai/germanquad for more details and dataset download in SQuAD format.\n\n## Hyperparameters\n```\nbatch_size = 24\nn_epochs = 2\nmax_seq_len = 384\nlearning_rate = 3e-5\nlr_schedule = LinearWarmup\nembeds_dropout_prob = 0.1\n```\n## Performance\nWe evaluated the extractive question answering performance on our GermanQuAD test set.\nModel types and training data are included in the model name. \nFor finetuning XLM-Roberta, we use the English SQuAD v2.0 dataset.\nThe GELECTRA models are warm started on the German translation of SQuAD v1.1 and finetuned on [GermanQuAD](https://deepset.ai/germanquad).\nThe human baseline was computed for the 3-way test set by taking one answer as prediction and the other two as ground truth. \n![performancetable](https://images.prismic.io/deepset/1c63afd8-40e6-4fd9-85c4-0dbb81996183_german-qa-vs-xlm-r.png) \n\n## Authors\n**Timo M\u00f6ller:** timo.moeller@deepset.ai \n**Julian Risch:** julian.risch@deepset.ai \n**Malte Pietsch:** malte.pietsch@deepset.ai \n\n## About us\n
\n
\n \"\"\n
\n
\n \"\"\n
\n
\n\n[deepset](http://deepset.ai/) is the company behind the open-source NLP framework [Haystack](https://haystack.deepset.ai/) which is designed to help you build production ready NLP systems that use: Question answering, summarization, ranking etc.\n\n\nSome of our other work: \n- [Distilled roberta-base-squad2 (aka \"tinyroberta-squad2\")]([https://huggingface.co/deepset/tinyroberta-squad2)\n- [German BERT (aka \"bert-base-german-cased\")](https://deepset.ai/german-bert)\n- [GermanQuAD and GermanDPR datasets and models (aka \"gelectra-base-germanquad\", \"gbert-base-germandpr\")](https://deepset.ai/germanquad)\n\n## Get in touch and join the Haystack community\n\n

For more info on Haystack, visit our GitHub repo and Documentation. \n\nWe also have a Discord community open to everyone!

\n\n[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai)\n\nBy the way: [we're hiring!](http://www.deepset.ai/jobs)\n"} {"downloads": 101477, "id": "salti/bert-base-multilingual-cased-finetuned-squad", "likes": 11, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": ["multilingual"], "datasets": ["squad", "arcd", "xquad"]}, "description": "\n\n# Multilingual BERT fine-tuned on SQuADv1.1\n\n[**WandB run link**](https://wandb.ai/salti/mBERT_QA/runs/wkqzhrp2)\n\n**GPU**: Tesla P100-PCIE-16GB\n\n## Training Arguments\n\n```python\nmax_seq_length = 512\ndoc_stride = 256\nmax_answer_length = 64\nbacth_size = 16\ngradient_accumulation_steps = 2\nlearning_rate = 5e-5\nweight_decay = 3e-7\nnum_train_epochs = 3\nwarmup_ratio = 0.1\nfp16 = True\nfp16_opt_level = \"O1\"\nseed = 0\n```\n\n## Results\n\n| EM | F1 |\n| :"} {"downloads": 694, "id": "Intel/dynamic_tinybert", "likes": 9, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"tags": ["question-answering", "bert"], "license": "apache-2.0", "datasets": ["squad"], "language": ["en"], "model-index": [{"name": "dynamic-tinybert", "results": [{"task": {"type": "question-answering", "name": "question-answering"}, "metrics": [{"type": "f1", "value": 88.71}]}]}]}, "description": "\n\n## Model Details: Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length\n\nDynamic-TinyBERT has been fine-tuned for the NLP task of question answering, trained on the SQuAD 1.1 dataset. [Guskin et al. (2021)](https://neurips2021-nlp.github.io/papers/16/CameraReady/Dynamic_TinyBERT_NLSP2021_camera_ready.pdf) note:\n\n> Dynamic-TinyBERT is a TinyBERT model that utilizes sequence-length reduction and Hyperparameter Optimization for enhanced inference efficiency per any computational budget. Dynamic-TinyBERT is trained only once, performing on-par with BERT and achieving an accuracy-speedup trade-off superior to any other efficient approaches (up to 3.3x with <1% loss-drop).\n\n\n\n| Model Detail | Description |\n| "} {"downloads": 57463, "id": "valhalla/longformer-base-4096-finetuned-squadv1", "likes": 9, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"datasets": ["squad_v1"], "license": "mit"}, "description": "\n\n# LONGFORMER-BASE-4096 fine-tuned on SQuAD v1\nThis is longformer-base-4096 model fine-tuned on SQuAD v1 dataset for question answering task. \n\n[Longformer](https://arxiv.org/abs/2004.05150) model created by Iz Beltagy, Matthew E. Peters, Arman Coha from AllenAI. As the paper explains it \n\n> `Longformer` is a BERT-like model for long documents. \n\nThe pre-trained model can handle sequences with upto 4096 tokens. \n\n\n## Model Training\nThis model was trained on google colab v100 GPU. You can find the fine-tuning colab here [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1zEl5D-DdkBKva-DdreVOmN0hrAfzKG1o?usp=sharing).\n\nFew things to keep in mind while training longformer for QA task,\nby default longformer uses sliding-window local attention on all tokens. But For QA, all question tokens should have global attention. For more details on this please refer the paper. The `LongformerForQuestionAnswering` model automatically does that for you. To allow it to do that \n1. 
The input sequence must have three sep tokens, i.e. the sequence should be encoded like this:\n `<s> question</s></s> context</s>`. If you encode the question and the context as an input pair, then the tokenizer already takes care of that and you shouldn't worry about it.\n2. `input_ids` should always be a batch of examples. \n\n## Results\n|Metric | # Value |\n|"} {"downloads": 137, "id": "hfl/chinese-pert-base-mrc", "likes": 9, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": ["zh"], "license": "apache-2.0"}, "description": "\n\n## A Chinese MRC model built on Chinese PERT-base\n\n**Please use `BertForQuestionAnswering` to load this model!**\n\nThis is a Chinese machine reading comprehension (MRC) model built on PERT-base and fine-tuned on a mixture of Chinese MRC datasets.\n\nPERT is a pre-trained model based on the permuted language model (PerLM), which learns text semantics in a self-supervised manner without introducing mask tokens ([MASK]). It yields competitive results in tasks such as reading comprehension and sequence labeling.\n\nResults on Chinese MRC datasets (EM/F1):\n\n(We report the checkpoint that has the best AVG score)\n\n| | CMRC 2018 Dev | DRCD Dev | SQuAD-Zen Dev (Answerable) | AVG |\n| :"} {"downloads": 27189, "id": "deepset/deberta-v3-base-squad2", "likes": 9, "pipeline_tag": "question-answering", "task": "question-answering", "meta": {"language": "en", "license": "cc-by-4.0", "tags": ["deberta", "deberta-v3"], "datasets": ["squad_v2"], "model-index": [{"name": "deepset/deberta-v3-base-squad2", "results": [{"task": {"type": "question-answering", "name": "Question Answering"}, "dataset": {"name": "squad_v2", "type": "squad_v2", "config": "squad_v2", "split": "validation"}, "metrics": [{"type": "exact_match", "value": 83.8248, "name": "Exact Match", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiY2IyZTEyYzNlOTAwZmFlNWRiZTdiNzQzMTUyM2FmZTQ3ZWQwNWZmMzc2ZDVhYWYyMzkxOTUyMGNlMWY0M2E5MiIsInZlcnNpb24iOjF9.y8KvfefMLI977BYun0X1rAq5qudmezW_UJe9mh6sYBoiWaBosDO5TRnEGR1BHzdxmv2EgPK_PSomtZvb043jBQ"}, {"type": "f1", "value": 87.41, "name": "F1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOWVhNjAwM2Q5N2Y3MGU4ZWY3N2Y0MmNjYWYwYmQzNTdiYWExODhkYmQ1YjIwM2I1ODEzNWIxZDI1ZWQ1YWRjNSIsInZlcnNpb24iOjF9.Jk0v1ZheLRFz6k9iNAgCMMZtPYj5eVwUCku4E76wRYc-jHPmiUuxvNiNkn6NW-jkBD8bJGMqDSjJyVpVMn9pBA"}]}, {"task": {"type": "question-answering", "name": "Question Answering"}, "dataset": {"name": "squad", "type": "squad", "config": "plain_text", "split": "validation"}, "metrics": [{"type": "exact_match", "value": 84.9678, "name": "Exact Match", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOWUxYTg4MzU3YTdmMDRmMGM0NjFjMTcwNGM3YzljM2RkMTc1ZGNhMDQwMTgwNGI0ZDE4ZGMxZTE3YjY5YzQ0ZiIsInZlcnNpb24iOjF9.KKaJ1UtikNe2g6T8XhLoWNtL9X4dHHyl_O4VZ5LreBT9nXneGc21lI1AW3n8KXTFGemzRpRMvmCDyKVDHucdDQ"}, {"type": "f1", "value": 92.2777, "name": "F1", "verified": true, "verifyToken": "eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNDU0ZTQwMzg4ZDY1ZWYxOGIxMzY2ODljZTBkMTNlYjA0ODBjNjcxNTg3ZDliYWU1YTdkYTM2NTIxOTg1MGM4OCIsInZlcnNpb24iOjF9.8VHg1BXx6gLw_K7MUK2QSE80Y9guiVR8n8K8nX4laGsLibxv5u_yDv9F3ahbUa1eZG_bbidl93TY2qFUiYHtAQ"}]}]}]}, "description": "\n\n# deberta-v3-base for QA \n\nThis is the [deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) model, fine-tuned using the [SQuAD2.0](https://huggingface.co/datasets/squad_v2) dataset. 
It's been trained on question-answer pairs, including unanswerable questions, for the task of Question Answering. \n\n\n## Overview\n**Language model:** deberta-v3-base \n**Language:** English \n**Downstream-task:** Extractive QA \n**Training data:** SQuAD 2.0 \n**Eval data:** SQuAD 2.0 \n**Code:** See [an example QA pipeline on Haystack](https://haystack.deepset.ai/tutorials/first-qa-system) \n**Infrastructure**: 1x NVIDIA A10G\n\n## Hyperparameters\n\n```\nbatch_size = 12\nn_epochs = 4\nbase_LM_model = \"deberta-v3-base\"\nmax_seq_len = 512\nlearning_rate = 2e-5\nlr_schedule = LinearWarmup\nwarmup_proportion = 0.2\ndoc_stride = 128\nmax_query_length = 64\n``` \n\n## Usage\n\n### In Haystack\nHaystack is an NLP framework by deepset. You can use this model in a Haystack pipeline to do question answering at scale (over many documents). To load the model in [Haystack](https://github.com/deepset-ai/haystack/):\n```python\nfrom haystack.nodes import FARMReader, TransformersReader\n\nreader = FARMReader(model_name_or_path=\"deepset/deberta-v3-base-squad2\")\n# or \nreader = TransformersReader(model_name_or_path=\"deepset/deberta-v3-base-squad2\", tokenizer=\"deepset/deberta-v3-base-squad2\")\n```\n\n### In Transformers\n```python\nfrom transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline\nmodel_name = \"deepset/deberta-v3-base-squad2\"\n# a) Get predictions\nnlp = pipeline('question-answering', model=model_name, tokenizer=model_name)\nQA_input = {\n 'question': 'Why is model conversion important?',\n 'context': 'The option to convert models between FARM and transformers gives freedom to the user and lets people easily switch between frameworks.'\n}\nres = nlp(QA_input)\n# b) Load model & tokenizer\nmodel = AutoModelForQuestionAnswering.from_pretrained(model_name)\ntokenizer = AutoTokenizer.from_pretrained(model_name)\n```\n\n## Authors\n**Sebastian Lee:** sebastian.lee [at] deepset.ai \n**Timo M\u00f6ller:** timo.moeller [at] deepset.ai \n**Malte Pietsch:** malte.pietsch [at] deepset.ai \n\n## About us\n
\n\n[deepset](http://deepset.ai/) is the company behind the open-source NLP framework [Haystack](https://haystack.deepset.ai/), which is designed to help you build production-ready NLP systems that use question answering, summarization, ranking, and more.\n\n\nSome of our other work: \n- [Distilled roberta-base-squad2 (aka \"tinyroberta-squad2\")](https://huggingface.co/deepset/tinyroberta-squad2)\n- [German BERT (aka \"bert-base-german-cased\")](https://deepset.ai/german-bert)\n- [GermanQuAD and GermanDPR datasets and models (aka \"gelectra-base-germanquad\", \"gbert-base-germandpr\")](https://deepset.ai/germanquad)\n\n## Get in touch and join the Haystack community\n\n

For more info on Haystack, visit our GitHub repo and Documentation. \n\nWe also have a Discord community open to everyone!

\n\n[Twitter](https://twitter.com/deepset_ai) | [LinkedIn](https://www.linkedin.com/company/deepset-ai/) | [Discord](https://haystack.deepset.ai/community/join) | [GitHub Discussions](https://github.com/deepset-ai/haystack/discussions) | [Website](https://deepset.ai) "} {"downloads": 717855, "id": "PygmalionAI/pygmalion-6b", "likes": 285, "pipeline_tag": "conversational", "task": "conversational", "meta": {"license": "creativeml-openrail-m", "language": ["en"], "thumbnail": null, "tags": ["text generation", "conversational"], "inference": false}, "description": "\n\n# Pygmalion 6B\n\n## Model description\n\nPygmalion 6B is a proof-of-concept dialogue model based on EleutherAI's [GPT-J-6B](https://huggingface.co/EleutherAI/gpt-j-6B).\n\n**Warning:** This model is **NOT** suitable for use by minors. It **will** output X-rated content under certain circumstances.\n\n## Training data\n\nThe fine-tuning dataset consisted of 56MB of dialogue data gathered from multiple sources, which includes both real _and_ partially machine-generated conversations.\n\n## Training procedure\n\nModel weights were initialized from the `uft-6b` ConvoGPT model made available in [this commit](https://huggingface.co/hakurei/convogpt/tree/41b67bfddb6cd97070ffddf708e9720c9cb8d224/6b-uft).\n\nThe model was then further fine-tuned on ~48.5 million tokens for ~5k steps on 4 NVIDIA A40s using DeepSpeed.\n\n## Intended use\n\n### The easy way\n\nWe provide a notebook with a Gradio UI for playing around with the model without having to manually format inputs. This notebook can be found [here](https://github.com/PygmalionAI/gradio-ui/blob/master/notebooks/GPU.ipynb).\n\n### The manual way\n\nThe model can be used as a regular text generation model, but it'll perform best if the input prompt adheres to the following format:\n\n```\n[CHARACTER]'s Persona: [A few sentences about the character you want the model to play]\n<START>\n[DIALOGUE HISTORY]\nYou: [Your input message here]\n[CHARACTER]:\n```\n\nWhere `[CHARACTER]` is, as you can probably guess, the name of the character you want the model to portray, `<START>` should be used verbatim as a delimiter token to separate persona and scenario data from the dialogue, and `[DIALOGUE HISTORY]` is chat history so the model can have some conversational context to draw from. Ideally it'll be pairs of messages like:\n\n```\n[CHARACTER]: [some dialogue here]\nYou: [your response to the dialogue above]\n```\n\nApart from chat history, you can also just add example conversations in `[DIALOGUE HISTORY]` to show how the character should speak - ideally at the beginning, so it doesn't get confused as to what's conversation history vs. character definition.\n\n## Known issues\n\nWe haven't played around with the model enough to enumerate them. Feel free to give us some feedback!\n"} {"downloads": 199748, "id": "facebook/blenderbot-400M-distill", "likes": 148, "pipeline_tag": "conversational", "task": "conversational", "meta": {"language": ["en"], "thumbnail": null, "tags": ["convAI", "conversational", "facebook"], "license": "apache-2.0", "datasets": ["blended_skill_talk"], "metrics": ["perplexity"]}, "description": "\n\n## Model description\n\n+ Paper: [Recipes for building an open-domain chatbot]( https://arxiv.org/abs/2004.13637)\n+ [Original PARLAI Code](https://parl.ai/projects/recipes/)\n\n\n### Abstract\n\n\nBuilding open-domain chatbots is a challenging area for machine learning research. 
While prior work has shown that scaling neural models in the number of parameters and the size of the data they are trained on gives improved results, we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of skills that an expert conversationalist blends in a seamless way: providing engaging talking points and listening to their partners, both asking and answering questions, and displaying knowledge, empathy and personality appropriately, depending on the situation. We show that large scale models can learn these skills when given appropriate training data and choice of generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter neural models, and make our models and code publicly available. Human evaluations show our best models are superior to existing approaches in multi-turn dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing failure cases of our models.\n"} {"downloads": 27292, "id": "microsoft/DialoGPT-large", "likes": 130, "pipeline_tag": "conversational", "task": "conversational", "meta": {"thumbnail": "https://huggingface.co/front/thumbnails/dialogpt.png", "tags": ["conversational"], "license": "mit"}, "description": "\n\n## A State-of-the-Art Large-scale Pretrained Response generation model (DialoGPT)\n\nDialoGPT is a SOTA large-scale pretrained dialogue response generation model for multiturn conversations. \nThe [human evaluation results](https://github.com/dreasysnail/Dialogpt_dev#human-evaluation) indicate that the response generated from DialoGPT is comparable to human response quality under a single-turn conversation Turing test.\nThe model is trained on 147M multi-turn dialogue from Reddit discussion thread. \n\n* Multi-turn generation examples from an interactive environment:\n\n|Role | Response |\n|"} {"downloads": 99824, "id": "microsoft/DialoGPT-medium", "likes": 119, "pipeline_tag": "conversational", "task": "conversational", "meta": {"thumbnail": "https://huggingface.co/front/thumbnails/dialogpt.png", "tags": ["conversational"], "license": "mit"}, "description": "\n\n## A State-of-the-Art Large-scale Pretrained Response generation model (DialoGPT)\n\nDialoGPT is a SOTA large-scale pretrained dialogue response generation model for multiturn conversations. \nThe [human evaluation results](https://github.com/dreasysnail/Dialogpt_dev#human-evaluation) indicate that the response generated from DialoGPT is comparable to human response quality under a single-turn conversation Turing test.\nThe model is trained on 147M multi-turn dialogue from Reddit discussion thread. \n\n* Multi-turn generation examples from an interactive environment:\n\n|Role | Response |\n|"} {"downloads": 25841, "id": "facebook/blenderbot-3B", "likes": 81, "pipeline_tag": "conversational", "task": "conversational", "meta": {"language": ["en"], "thumbnail": null, "tags": ["convAI", "conversational", "facebook"], "license": "apache-2.0", "datasets": ["blended_skill_talk"], "metrics": ["perplexity"]}, "description": "\n\n## Model description\n\n+ Paper: [Recipes for building an open-domain chatbot](https://arxiv.org/abs/1907.06616)\n+ [Original PARLAI Code](https://parl.ai/projects/recipes/)\n\n\n### Abstract\n\n\nBuilding open-domain chatbots is a challenging area for machine learning research. 
While prior work has shown that scaling neural models in the number of parameters and the size of the data they are trained on gives improved results, we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of skills that an expert conversationalist blends in a seamless way: providing engaging talking points and listening to their partners, both asking and answering questions, and displaying knowledge, empathy and personality appropriately, depending on the situation. We show that large scale models can learn these skills when given appropriate training data and choice of generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter neural models, and make our models and code publicly available. Human evaluations show our best models are superior to existing approaches in multi-turn dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing failure cases of our models.\n\n"} {"downloads": 3615, "id": "allenai/cosmo-xl", "likes": 68, "pipeline_tag": "conversational", "task": "conversational", "meta": {"language": ["en"], "tags": ["conversational", "dialogue", "response generation"], "license": "apache-2.0", "datasets": ["allenai/soda", "allenai/prosocial-dialog"]}, "description": "\n\n# Model Card for \ud83e\uddd1\ud83c\udffb\u200d\ud83d\ude80COSMO\n\n\ud83e\uddd1\ud83c\udffb\u200d\ud83d\ude80COSMO is a conversation agent with greater generalizability on both in- and out-of-domain chitchat datasets (e.g., DailyDialog, BlendedSkillTalk). It is trained on two datasets: SODA and ProsocialDialog. COSMO is especially aiming to model natural human conversations. It can accept situation descriptions as well as instructions on what role it should play in the situation.\n\n## Model Description\n- **Repository:** [Code](https://github.com/skywalker023/sodaverse)\n- **Paper:** [SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization](https://arxiv.org/abs/2212.10465)\n- **Point of Contact:** [Hyunwoo Kim](mailto:hyunwook@allenai.org)\n\n## Model Training\n\n\ud83e\uddd1\ud83c\udffb\u200d\ud83d\ude80COSMO is trained on our two recent datasets: \ud83e\udd64[SODA](https://huggingface.co/datasets/allenai/soda) and [Prosocial Dialog](https://huggingface.co/datasets/allenai/prosocial-dialog).\nThe backbone model of COSMO is the [lm-adapted T5](https://huggingface.co/google/t5-xl-lm-adapt).\n\n### How to use\n\n> \ud83d\udca1 Note: The HuggingFace inference API for Cosmo is not working correctly, we gently guide you to [our repository](https://hyunw.kim/sodaverse) to try out the demo code!\n\n> \ud83d\udea8 Disclaimer: We would like to emphasize that COSMO is trained on SODA and ProsocialDialog mainly for academic/research purposes. We discourage using COSMO in real-world applications or services as is. Model outputs should not be used for advice for humans, and could be potentially offensive, problematic, or harmful. 
The model\u2019s output does not necessarily reflect the views and opinions of the authors and their associated affiliations.\n\nBelow is a simple code snippet to get Cosmo running :)\n\n```python\nimport torch\nfrom transformers import AutoTokenizer, AutoModelForSeq2SeqLM\n\ndevice = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\ntokenizer = AutoTokenizer.from_pretrained(\"allenai/cosmo-xl\")\nmodel = AutoModelForSeq2SeqLM.from_pretrained(\"allenai/cosmo-xl\").to(device)\n\ndef set_input(situation_narrative, role_instruction, conversation_history):\n input_text = \" \".join(conversation_history)\n\n if role_instruction != \"\":\n input_text = \"{} {}\".format(role_instruction, input_text)\n\n if situation_narrative != \"\":\n input_text = \"{} {}\".format(situation_narrative, input_text)\n\n return input_text\n\ndef generate(situation_narrative, role_instruction, conversation_history):\n \"\"\"\n situation_narrative: the description of situation/context with the characters included (e.g., \"David goes to an amusement park\")\n role_instruction: the perspective/speaker instruction (e.g., \"Imagine you are David and speak to his friend Sarah\").\n conversation_history: the previous utterances in the conversation in a list\n \"\"\"\n\n input_text = set_input(situation_narrative, role_instruction, conversation_history) \n\n inputs = tokenizer([input_text], return_tensors=\"pt\").to(device)\n outputs = model.generate(inputs[\"input_ids\"], max_new_tokens=128, temperature=1.0, top_p=.95, do_sample=True)\n response = tokenizer.decode(outputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)\n\n return response\n\nsituation = \"Cosmo had a really fun time participating in the EMNLP conference at Abu Dhabi.\"\ninstruction = \"You are Cosmo and you are talking to a friend.\" # You can also leave the instruction empty\n\nconversation = [\n \"Hey, how was your trip to Abu Dhabi?\"\n]\n\nresponse = generate(situation, instruction, conversation)\nprint(response)\n```\n\n### Further Details, Social Impacts, Bias, and Limitations\nPlease refer to our [paper](https://arxiv.org/abs/2212.10465).\nCosmo is mostly trained on social chitchat. Therefore, we do not encourage having knowledge-intensive conversations (e.g., science, medical issues, law).\nSignificant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. 2021](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. 2021](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). 
Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.\n\n## Additional Information\n\nFor a brief summary of our paper, please see this [tweet](https://twitter.com/hyunw__kim/status/1605400305126248448).\n\n### Citation\n\nPlease cite our work if you find the resources in this repository useful:\n```\n@article{kim2022soda,\n title={SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization},\n author={Hyunwoo Kim and Jack Hessel and Liwei Jiang and Peter West and Ximing Lu and Youngjae Yu and Pei Zhou and Ronan Le Bras and Malihe Alikhani and Gunhee Kim and Maarten Sap and Yejin Choi},\n journal={ArXiv},\n year={2022},\n volume={abs/2212.10465}\n}\n```"} {"downloads": 1629, "id": "af1tang/personaGPT", "likes": 47, "pipeline_tag": "conversational", "task": "conversational", "meta": {"tags": ["conversational"], "license": "gpl-3.0"}, "description": "\n## A conversational agent with many personalities (PersonaGPT)\nPersonaGPT is an open-domain conversational agent designed to do 2 tasks:\n\n1. decoding _personalized_ responses based on input personality facts (the \"persona\" profile of the bot). \n2. incorporating _turn-level goals_ into its responses through \"action codes\" (e.g., \"talk about work\", \"ask about favorite music\").\n\nIt builds on the [DialoGPT-medium](https://huggingface.co/microsoft/DialoGPT-medium) pretrained model based on the [GPT-2](https://github.com/openai/gpt-2) architecture. \nThis model is trained on the [Persona-Chat](https://arxiv.org/pdf/1801.07243) dataset, with added special tokens to better distinguish between conversational history and personality traits for dyadic conversations. Furthermore, some active learning was used to train the model to do _controlled_ decoding using turn-level goals.\n\n## Full Repo\n\nPreprocessing, training and implementation details can be found in the [personaGPT repo](https://github.com/af1tang/personaGPT).\n\n### How to Use\n\n\n1. Load the model and define some helper functions.\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\nimport torch\ntokenizer = AutoTokenizer.from_pretrained(\"af1tang/personaGPT\")\nmodel = AutoModelForCausalLM.from_pretrained(\"af1tang/personaGPT\")\nif torch.cuda.is_available():\n\tmodel = model.cuda()\n## utility functions ##\nflatten = lambda l: [item for sublist in l for item in sublist]\n\ndef to_data(x):\n if torch.cuda.is_available():\n x = x.cpu()\n return x.data.numpy()\n\ndef to_var(x):\n if not torch.is_tensor(x):\n x = torch.Tensor(x)\n if torch.cuda.is_available():\n x = x.cuda()\n return x\n\ndef display_dialog_history(dialog_hx):\n for j, line in enumerate(dialog_hx):\n msg = tokenizer.decode(line)\n if j %2 == 0:\n print(\">> User: \"+ msg)\n else:\n print(\"Bot: \"+msg)\n print()\n\ndef generate_next(bot_input_ids, do_sample=True, top_k=10, top_p=.92,\n max_length=1000, pad_token=tokenizer.eos_token_id):\n full_msg = model.generate(bot_input_ids, do_sample=True,\n top_k=top_k, top_p=top_p, \n max_length=max_length, pad_token_id=tokenizer.eos_token_id)\n msg = to_data(full_msg.detach()[0])[bot_input_ids.shape[-1]:]\n return msg\n```\n\n2. Give your chatbot partner a set of personalities. 
\n\n\n```python\n# get personality facts for conversation\npersonas = []\nfor i in range(3):\n response = input(\">> Fact %d: \"%(i+1))+ tokenizer.eos_token\n personas.append(response)\npersonas = tokenizer.encode(''.join(['<|p2|>'] + personas + ['<|sep|>'] + ['<|start|>']))\n```\n\n3. The first use of PersonaGPT is to do _personalized_ dialog generation. Use the following loop to interact with the model.\n\n```python\n# converse for 8 turns\ndialog_hx = []\nfor step in range(8):\n # encode the user input\n user_inp = tokenizer.encode(input(\">> User: \") + tokenizer.eos_token)\n # append to the chat history\n dialog_hx.append(user_inp)\n \n # generated a response while limiting the total chat history to 1000 tokens, \n bot_input_ids = to_var([personas + flatten(dialog_hx)]).long()\n msg = generate_next(bot_input_ids)\n dialog_hx.append(msg)\n print(\"Bot: {}\".format(tokenizer.decode(msg, skip_special_tokens=True)))\n```\n\n\nExample of personalized decoding:\n\n| | Persona Facts |\n|"} {"downloads": 22032, "id": "microsoft/GODEL-v1_1-large-seq2seq", "likes": 42, "pipeline_tag": "conversational", "task": "conversational", "meta": {"thumbnail": "https://huggingface.co/front/thumbnails/dialogpt.png", "tags": ["conversational"], "license": "mit"}, "description": "\n\n### Large-Scale Pre-Training for Goal-Directed Dialog (GODEL)\n\nGODEL is a large-scale pre-trained model for goal-directed dialogs. It is parameterized with a Transformer-based encoder-decoder model and trained for response generation grounded in external text, which allows more effective fine-tuning on dialog tasks that require conditioning the response on information that is external to the current conversation (e.g., a retrieved document). The pre-trained model can be efficiently fine-tuned and adapted to accomplish a new dialog task with a handful of task-specific dialogs. The v1.1 model is trained on 551M multi-turn dialogs from Reddit discussion thread, and 5M instruction and knowledge grounded dialogs.\n\n##### Multi-turn generation examples from an interactive environment:\nChitchat example:\n> Instruction: given a dialog context, you need to response empathically.
\n> User: Does money buy happiness?
\n> Agent: It is a question. Money buys you a lot of things, but not enough to buy happiness.
\n> User: What is the best way to buy happiness ?
\n> Agent: Happiness is bought through your experience and not money.
\n\nGrounded response generation example:\n> Instruction: given a dialog context and related knowledge, you need to response safely based on the knowledge.
\n> Knowledge: The best Stardew Valley mods PCGamesN_0 / About SMAPI
\n> User: My favorite game is stardew valley. stardew valley is very fun.
\n> Agent: I love Stardew Valley mods, like PCGamesN_0 / About SMAPI.
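For reference, the grounded example above can be reproduced programmatically. The sketch below is illustrative rather than part of the original card: it assumes the query layout used in the "How to use" snippet further down (the instruction string, the dialogue turns joined by ` EOS ` after `[CONTEXT]`, and a `[KNOWLEDGE] `-prefixed grounding string); since decoding is sampled, the reply will vary from run to run.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/GODEL-v1_1-large-seq2seq")
model = AutoModelForSeq2SeqLM.from_pretrained("microsoft/GODEL-v1_1-large-seq2seq")

# Instruction and knowledge strings are taken verbatim from the grounded example above.
instruction = "Instruction: given a dialog context and related knowledge, you need to response safely based on the knowledge."
knowledge = "[KNOWLEDGE] The best Stardew Valley mods PCGamesN_0 / About SMAPI"
dialog_turns = ["My favorite game is stardew valley. stardew valley is very fun."]

# Assemble the query the same way the card's generate() helper does.
dialog = " EOS ".join(dialog_turns)
query = f"{instruction} [CONTEXT] {dialog} {knowledge}"

input_ids = tokenizer(query, return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_length=128, min_length=8, top_p=0.9, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # sampled, so output varies
```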
\n\nPlease find the information about preprocessing, training and full details of the GODEL in the [project webpage](https://aka.ms/GODEL).\n\nArXiv paper: [https://arxiv.org/abs/2206.11309](https://arxiv.org/abs/2206.11309)\n\n### How to use\n\nNow we are ready to try out how the model works as a chatting partner!\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForSeq2SeqLM\ntokenizer = AutoTokenizer.from_pretrained(\"microsoft/GODEL-v1_1-large-seq2seq\")\nmodel = AutoModelForSeq2SeqLM.from_pretrained(\"microsoft/GODEL-v1_1-large-seq2seq\")\ndef generate(instruction, knowledge, dialog):\n if knowledge != '':\n knowledge = '[KNOWLEDGE] ' + knowledge\n dialog = ' EOS '.join(dialog)\n query = f\"{instruction} [CONTEXT] {dialog} {knowledge}\"\n input_ids = tokenizer(f\"{query}\", return_tensors=\"pt\").input_ids\n outputs = model.generate(input_ids, max_length=128, min_length=8, top_p=0.9, do_sample=True)\n output = tokenizer.decode(outputs[0], skip_special_tokens=True)\n return output\n# Instruction for a chitchat task\ninstruction = f'Instruction: given a dialog context, you need to response empathically.'\n# Leave the knowldge empty\nknowledge = ''\ndialog = [\n 'Does money buy happiness?',\n 'It is a question. Money buys you a lot of things, but not enough to buy happiness.',\n 'What is the best way to buy happiness ?'\n]\nresponse = generate(instruction, knowledge, dialog)\nprint(response)\n```\n\n### Citation\nif you use this code and data in your research, please cite our arxiv paper:\n```\n@misc{peng2022godel,\nauthor = {Peng, Baolin and Galley, Michel and He, Pengcheng and Brockett, Chris and Liden, Lars and Nouri, Elnaz and Yu, Zhou and Dolan, Bill and Gao, Jianfeng},\ntitle = {GODEL: Large-Scale Pre-training for Goal-Directed Dialog},\nhowpublished = {arXiv},\nyear = {2022},\nmonth = {June},\nurl = {https://www.microsoft.com/en-us/research/publication/godel-large-scale-pre-training-for-goal-directed-dialog/},\n}\n```"} {"downloads": 8911, "id": "microsoft/GODEL-v1_1-base-seq2seq", "likes": 36, "pipeline_tag": "conversational", "task": "conversational", "meta": {"thumbnail": "https://huggingface.co/front/thumbnails/dialogpt.png", "tags": ["conversational"], "license": "mit"}, "description": "\n\n### Large-Scale Pre-Training for Goal-Directed Dialog (GODEL)\n\nGODEL is a large-scale pre-trained model for goal-directed dialogs. It is parameterized with a Transformer-based encoder-decoder model and trained for response generation grounded in external text, which allows more effective fine-tuning on dialog tasks that require conditioning the response on information that is external to the current conversation (e.g., a retrieved document). The pre-trained model can be efficiently fine-tuned and adapted to accomplish a new dialog task with a handful of task-specific dialogs. The v1.1 model is trained on 551M multi-turn dialogs from Reddit discussion thread, and 5M instruction and knowledge grounded dialogs.\n\n##### Multi-turn generation examples from an interactive environment:\nChitchat example:\n> Instruction: given a dialog context, you need to response empathically.
\n> User: Does money buy happiness?
\n> Agent: It is a question. Money buys you a lot of things, but not enough to buy happiness.
\n> User: What is the best way to buy happiness ?
\n> Agent: Happiness is bought through your experience and not money.
\n\nGrounded response generation example:\n> Instruction: given a dialog context and related knowledge, you need to response safely based on the knowledge.
\n> Knowledge: The best Stardew Valley mods PCGamesN_0 / About SMAPI
\n> User: My favorite game is stardew valley. stardew valley is very fun.
\n> Agent: I love Stardew Valley mods, like PCGamesN_0 / About SMAPI.
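For interactive use, the per-call query format extends naturally to a chat loop. Below is a minimal sketch, assuming the `generate(instruction, knowledge, dialog)` helper defined in the "How to use" snippet that follows; the loop itself is illustrative and not part of the original card.

```python
# Multi-turn chitchat loop around the generate() helper defined below:
# each user turn and model reply is appended to `dialog`, which generate()
# joins with ' EOS ' to form the [CONTEXT] field of the query.
instruction = "Instruction: given a dialog context, you need to response empathically."
knowledge = ""  # empty for plain chitchat
dialog = []

for _ in range(3):  # three user turns, for illustration
    user_turn = input("User: ")
    dialog.append(user_turn)
    reply = generate(instruction, knowledge, dialog)
    dialog.append(reply)
    print("Agent:", reply)
```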
\n\nPlease find the information about preprocessing, training and full details of the GODEL in the [project webpage](https://aka.ms/GODEL).\n\nArXiv paper: [https://arxiv.org/abs/2206.11309](https://arxiv.org/abs/2206.11309)\n\n### How to use\n\nNow we are ready to try out how the model works as a chatting partner!\n\n```python\n\nfrom transformers import AutoTokenizer, AutoModelForSeq2SeqLM\n\ntokenizer = AutoTokenizer.from_pretrained(\"microsoft/GODEL-v1_1-base-seq2seq\")\nmodel = AutoModelForSeq2SeqLM.from_pretrained(\"microsoft/GODEL-v1_1-base-seq2seq\")\n\ndef generate(instruction, knowledge, dialog):\n if knowledge != '':\n knowledge = '[KNOWLEDGE] ' + knowledge\n dialog = ' EOS '.join(dialog)\n query = f\"{instruction} [CONTEXT] {dialog} {knowledge}\"\n input_ids = tokenizer(f\"{query}\", return_tensors=\"pt\").input_ids\n outputs = model.generate(input_ids, max_length=128, min_length=8, top_p=0.9, do_sample=True)\n output = tokenizer.decode(outputs[0], skip_special_tokens=True)\n return output\n\n# Instruction for a chitchat task\ninstruction = f'Instruction: given a dialog context, you need to response empathically.'\n# Leave the knowldge empty\nknowledge = ''\ndialog = [\n 'Does money buy happiness?',\n 'It is a question. Money buys you a lot of things, but not enough to buy happiness.',\n 'What is the best way to buy happiness ?'\n]\nresponse = generate(instruction, knowledge, dialog)\nprint(response)\n```\n\n### Citation\nif you use this code and data in your research, please cite our arxiv paper:\n```\n@misc{peng2022godel,\nauthor = {Peng, Baolin and Galley, Michel and He, Pengcheng and Brockett, Chris and Liden, Lars and Nouri, Elnaz and Yu, Zhou and Dolan, Bill and Gao, Jianfeng},\ntitle = {GODEL: Large-Scale Pre-training for Goal-Directed Dialog},\nhowpublished = {arXiv},\nyear = {2022},\nmonth = {June},\nurl = {https://www.microsoft.com/en-us/research/publication/godel-large-scale-pre-training-for-goal-directed-dialog/},\n}\n```"} {"downloads": 36998, "id": "microsoft/DialoGPT-small", "likes": 31, "pipeline_tag": "conversational", "task": "conversational", "meta": {"thumbnail": "https://huggingface.co/front/thumbnails/dialogpt.png", "tags": ["conversational"], "license": "mit"}, "description": "\n\n## A State-of-the-Art Large-scale Pretrained Response generation model (DialoGPT)\n\nDialoGPT is a SOTA large-scale pretrained dialogue response generation model for multiturn conversations. \nThe [human evaluation results](https://github.com/dreasysnail/Dialogpt_dev#human-evaluation) indicate that the response generated from DialoGPT is comparable to human response quality under a single-turn conversation Turing test.\nThe model is trained on 147M multi-turn dialogue from Reddit discussion thread. \n\n* Multi-turn generation examples from an interactive environment:\n\n|Role | Response |\n|"} {"downloads": 4562, "id": "facebook/blenderbot_small-90M", "likes": 24, "pipeline_tag": "conversational", "task": "conversational", "meta": {"language": ["en"], "thumbnail": null, "tags": ["convAI", "conversational", "facebook"], "license": "apache-2.0", "datasets": ["blended_skill_talk"], "metrics": ["perplexity"]}, "description": "\n\n## Model description\n\n+ Paper: [Recipes for building an open-domain chatbot](https://arxiv.org/abs/1907.06616)\n+ [Original PARLAI Code](https://parl.ai/projects/recipes/)\n\n\n### Abstract\n\n\nBuilding open-domain chatbots is a challenging area for machine learning research. 
While prior work has shown that scaling neural models in the number of parameters and the size of the data they are trained on gives improved results, we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of skills that an expert conversationalist blends in a seamless way: providing engaging talking points and listening to their partners, both asking and answering questions, and displaying knowledge, empathy and personality appropriately, depending on the situation. We show that large scale models can learn these skills when given appropriate training data and choice of generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter neural models, and make our models and code publicly available. Human evaluations show our best models are superior to existing approaches in multi-turn dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing failure cases of our models.\n\n"} {"downloads": 7312, "id": "PygmalionAI/pygmalion-350m", "likes": 24, "pipeline_tag": "conversational", "task": "conversational", "meta": {"language": ["en"], "thumbnail": null, "tags": ["convAI", "conversational"], "inference": false}, "description": "\n# pygmalion-350m\n\n# Model description\n\nThis is a proof-of-concept fine-tune of Facebook's OPT-350M model optimized for dialogue, to be used as a stepping stone to higher-parameter models.\n\n**Disclaimer:** NSFW data was included in the fine-tuning of this model. Although SFW inputs will usually result in SFW outputs, you are advised to **chat at your own risk. This model is not suitable for use by minors.**\n\n# Fine-tuning process\n\nThis model was much easier than expected to create.\n\nWe used the [ColossalAI](https://www.colossalai.org/) library to fine-tune the [OPT-350M](https://huggingface.co/facebook/opt-350m) model originally trained by Facebook on The Pile. Though our initial dataset consisted of dialogue gathered from various sources totaling about 50 MB, early training runs revealed that the model converged after only 7% of the dataset had been passed through. To alleviate this, we massively reduced the size of the dataset to only 273 KB.\n\nColossalAI's magic allowed for something incredible: this entire model was fine-tuned on a single GPU with only 6 GB ***(!)*** of VRAM. Fine-tuning took less than an hour to complete."} {"downloads": 12498, "id": "PygmalionAI/pygmalion-2.7b", "likes": 23, "pipeline_tag": "conversational", "task": "conversational", "meta": {"license": "creativeml-openrail-m", "language": ["en"], "thumbnail": null, "tags": ["text generation", "conversational"], "inference": false}, "description": "\n\n# Pygmalion 2.7B\n\n## Model description\n\nPygmalion 2.7B is a proof-of-concept dialogue model based on EleutherAI's [gpt-neo-2.7B](https://huggingface.co/EleutherAI/gpt-neo-2.7B).\n\n**Warning:** This model is **NOT** suitable for use by minors. 
It **will** output X-rated content under certain circumstances.\n\n## Training data\n\nThe fine-tuning dataset consisted of 56MB of dialogue data gathered from multiple sources, which includes both real _and_ partially machine-generated conversations.\n\n## Training procedure\n\nModel weights were initialized from the `uft-2.7b` ConvoGPT model made available in [this commit](https://huggingface.co/hakurei/convogpt/tree/07707377dee0aa7d1ee5363ef660b13eb5b73f9d/2.7b-uft).\n\nThe model was then further fine-tuned on ~48.5 million tokens for ~5k steps on 4 NVIDIA A40s using DeepSpeed.\n\n## Intended use\n\n### The easy way\n\nWe provide a notebook with a Gradio UI for playing around with the model without having to manually format inputs. This notebook can be found [here](https://github.com/PygmalionAI/gradio-ui/blob/master/notebooks/GPU.ipynb).\n\n### The manual way\n\nThe model can be used as a regular text generation model, but it'll perform best if the input prompt adheres to the following format:\n\n```\n[CHARACTER]'s Persona: [A few sentences about the character you want the model to play]\n<START>\n[DIALOGUE HISTORY]\nYou: [Your input message here]\n[CHARACTER]:\n```\n\nWhere `[CHARACTER]` is, as you can probably guess, the name of the character you want the model to portray, `<START>` should be used verbatim as a delimiter token to separate persona and scenario data from the dialogue, and `[DIALOGUE HISTORY]` is chat history so the model can have some conversational context to draw from. Ideally it'll be pairs of messages like:\n\n```\n[CHARACTER]: [some dialogue here]\nYou: [your response to the dialogue above]\n```\n\nApart from chat history, you can also just add example conversations in `[DIALOGUE HISTORY]` to show how the character should speak - ideally at the beginning, so it doesn't get confused as to what's conversation history vs. character definition.\n\n## Known issues\n\nWe haven't played around with the model enough to enumerate them. Feel free to give us some feedback!\n"} {"downloads": 1204, "id": "satvikag/chatbot", "likes": 18, "pipeline_tag": "conversational", "task": "conversational", "meta": {"tags": ["conversational"], "license": "mit"}, "description": "\n# DialoGPT Trained on the Speech of a Game Character\nThis is an instance of [microsoft/DialoGPT-medium](https://huggingface.co/microsoft/DialoGPT-medium) trained on a game character, Joshua from [The World Ends With You](https://en.wikipedia.org/wiki/The_World_Ends_with_You). 
The data comes from [a Kaggle game script dataset](https://www.kaggle.com/ruolinzheng/twewy-game-script).\nChat with the model:\n```python\nimport torch\nfrom transformers import AutoTokenizer, AutoModelWithLMHead\n\ntokenizer = AutoTokenizer.from_pretrained('microsoft/DialoGPT-small')\nmodel = AutoModelWithLMHead.from_pretrained('output-small')\n\n# Let's chat for 100 lines\nfor step in range(100):\n # encode the new user input, add the eos_token and return a tensor in Pytorch\n new_user_input_ids = tokenizer.encode(input(\">> User:\") + tokenizer.eos_token, return_tensors='pt')\n # print(new_user_input_ids)\n\n # append the new user input tokens to the chat history\n bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids\n\n # generate a response while limiting the total chat history to 1000 tokens\n chat_history_ids = model.generate(\n bot_input_ids, max_length=500,\n pad_token_id=tokenizer.eos_token_id, \n no_repeat_ngram_size=3, \n do_sample=True, \n top_k=100, \n top_p=0.7,\n temperature = 0.8\n )\n \n # pretty print last output tokens from bot\n print(\"AI: {}\".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))\n```"} {"downloads": 85, "id": "hyunwoongko/blenderbot-9B", "likes": 18, "pipeline_tag": "conversational", "task": "conversational", "meta": {"language": ["en"], "thumbnail": null, "tags": ["convAI", "conversational", "facebook"], "license": "apache-2.0", "datasets": ["blended_skill_talk"], "metrics": ["perplexity"]}, "description": "\n\n## Model description\n\n+ Paper: [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637)\n+ [Original PARLAI Code](https://parl.ai/projects/recipes/)\n\n\n### Abstract\n\n\nBuilding open-domain chatbots is a challenging area for machine learning research. While prior work has shown that scaling neural models in the number of parameters and the size of the data they are trained on gives improved results, we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of skills that an expert conversationalist blends in a seamless way: providing engaging talking points and listening to their partners, both asking and answering questions, and displaying knowledge, empathy and personality appropriately, depending on the situation. We show that large scale models can learn these skills when given appropriate training data and choice of generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter neural models, and make our models and code publicly available. Human evaluations show our best models are superior to existing approaches in multi-turn dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing failure cases of our models.\n\n"} {"downloads": 1233, "id": "facebook/blenderbot-1B-distill", "likes": 17, "pipeline_tag": "conversational", "task": "conversational", "meta": {"language": ["en"], "thumbnail": null, "tags": ["convAI", "conversational", "facebook"], "license": "apache-2.0", "datasets": ["blended_skill_talk"], "metrics": ["perplexity"]}, "description": "\n\n## Model description\n\n+ Paper: [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637)\n+ [Original PARLAI Code](https://parl.ai/projects/recipes/)\n\n\n### Abstract\n\n\nBuilding open-domain chatbots is a challenging area for machine learning research. 
While prior work has shown that scaling neural models in the number of parameters and the size of the data they are trained on gives improved results, we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of skills that an expert conversationalist blends in a seamless way: providing engaging talking points and listening to their partners, both asking and answering questions, and displaying knowledge, empathy and personality appropriately, depending on the situation. We show that large scale models can learn these skills when given appropriate training data and choice of generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter neural models, and make our models and code publicly available. Human evaluations show our best models are superior to existing approaches in multi-turn dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing failure cases of our models.\n"} {"downloads": 7248, "id": "PygmalionAI/pygmalion-1.3b", "likes": 16, "pipeline_tag": "conversational", "task": "conversational", "meta": {"license": "agpl-3.0", "language": ["en"], "thumbnail": null, "tags": ["text generation", "conversational"], "inference": false}, "description": "\n\n# Pygmalion 1.3B\n\n## Model description\n\nPygmalion 1.3B is a proof-of-concept dialogue model based on EleutherAI's [pythia-1.3b-deduped](https://huggingface.co/EleutherAI/pythia-1.3b-deduped).\n\n**Warning:** This model is **NOT** suitable for use by minors. It **will** output X-rated content under certain circumstances.\n\n## Training data\n\nThe fine-tuning dataset consisted of 56MB of dialogue data gathered from multiple sources, which includes both real _and_ partially machine-generated conversations.\n\n## Training procedure\n\nFine-tuning was done using [ColossalAI](https://github.com/hpcaitech/ColossalAI) (specifically, with a slightly modified version of their [OPT fine-tune example](https://github.com/hpcaitech/ColossalAI/blob/78509124d32b63b7fc36f6508e0576a326d51422/examples/language/opt/run_clm.py)) for around 11.4 million tokens over 5440 steps on a single 24GB GPU. The run took just under 21 hours.\n\n## Intended use\n\n### The easy way\n\nWe provide a notebook with a Gradio UI for playing around with the model without having to manually format inputs. This notebook can be found [here](https://github.com/PygmalionAI/gradio-ui/blob/master/notebooks/GPU.ipynb).\n\n### The manual way\n\nThe model can be used as a regular text generation model, but it'll perform best if the input prompt adheres to the following format:\n\n```\n[CHARACTER]'s Persona: [A few sentences about the character you want the model to play]\n<START>\n[DIALOGUE HISTORY]\nYou: [Your input message here]\n[CHARACTER]:\n```\n\nWhere `[CHARACTER]` is, as you can probably guess, the name of the character you want the model to portray, and `[DIALOGUE HISTORY]` is chat history so the model can have some conversational context to draw from. Ideally it'll be pairs of messages like:\n\n```\n[CHARACTER]: [some dialogue here]\nYou: [your response to the dialogue above]\n```\n\nApart from chat history, you can also just add example conversations in `[DIALOGUE HISTORY]` to show how the character should speak - ideally at the beginning, so it doesn't get confused as to what's conversation history vs. 
character definition.\n\n## Known issues\n\n- The model can get stuck repeating certain phrases, or sometimes even entire sentences.\n - We believe this is due to that behavior being present in the training data itself, and plan to investigate and adjust accordingly for future versions.\n"} {"downloads": 684, "id": "deepparag/Aeona", "likes": 15, "pipeline_tag": "conversational", "task": "conversational", "meta": {"thumbnail": "https://images-ext-2.discordapp.net/external/Wvtx1L98EbA7DR2lpZPbDxDuO4qmKt03nZygATZtXgk/%3Fsize%3D4096/https/cdn.discordapp.com/avatars/931226824753700934/338a9e413bbceaeb9095a29e97d4fac0.png", "tags": ["conversational"], "license": "mit", "pipeline_tag": "conversational", "metrics": ["accuracy", "f1", "perplexity"], "datasets": ["blended_skill_talk"]}, "description": "\n\n# Aeona | Chatbot\n![Aeona Banner](https://github.com/deepsarda/Aeona/blob/master/dashboard/static/banner.png?raw=true)\n\n\n\nA generative AI made using [microsoft/DialoGPT-small](https://huggingface.co/microsoft/DialoGPT-small).\n\n\nIt is recommended to use this model along with an [AIML chatbot](https://github.com/deepsarda/Aeona-Aiml) to reduce load, get better replies, and add a name and personality to your bot.\nUsing an AIML chatbot will also allow you to hardcode some replies.\n\n# AEONA\nAeona is a chatbot which hopes to be able to talk with humans as if it were a friend!\nIts main target platform is Discord. \nYou can invite the bot [here](https://aeona.xyz).\n\nTo learn more about this project and chat with the AI, you can use this [website](https://aeona.xyz/).\n\nAeona works by using the context of the previous messages, guessing the personality of the human it is talking with, and adapting its own personality to better talk with the user.\n\n# Participate and help the AI improve, or just hang out, at [Hugging Face discussions](https://huggingface.co/deepparag/Aeona/discussions)\n\n## Goals\n The goal is to create an AI which will work with AIML in order to create the most human-like AI.\n \n #### Why not an AI on its own?\n For an AI alone it is not (realistically) possible to learn about the user and store data on them, whereas an AIML chatbot can even execute code!\n The goal of the AI is to generate responses where the AIML fails.\n \n Hence the goal becomes to make an AI which has a wide variety of knowledge, yet is as small as possible!\n So we use 3 datasets:\n 1. [Movielines](https://www.kaggle.com/Cornell-University/movie-dialog-corpus) The movie lines promote longer and more thought-out responses, but they can be very random. About 200k lines!\n 2. [Discord Messages](https://www.kaggle.com/jef1056/discord-data) The messages cover a wide variety of topics, filtered with spam removed, which makes the AI highly random but gives it a reply for everyday questions! About 120 million messages!\n 3. 
A custom dataset scraped from my messages. These messages are very narrow, and teaching this dataset makes the AI say sorry loads of times instead of sending a random reply!\n \n## Training\n The Discord Messages dataset simply dwarfs the other datasets, hence the smaller datasets are repeated.\n This leads to them covering each other's issues!\n \n The AI has a context of 6 messages, which means it will reply based on up to the 4th message from the user.\n [Example](https://huggingface.co/deepparag/Aeona-Beta/discussions/1)\n \n## Tips for Hugging Face inference\n I recommend sending the user input and the previous 3 AI and human responses.\n \n Using more context than this will lead to useless responses; using less is alright, but the responses may be random. \n## Evaluation \nBelow is a comparison of Aeona vs. other baselines on the mixed dataset given above using automatic evaluation metrics.\n\n| Model | Perplexity |\n|"} {"downloads": 174, "id": "r3dhummingbird/DialoGPT-medium-joshua", "likes": 15, "pipeline_tag": "conversational", "task": "conversational", "meta": {"thumbnail": "https://raw.githubusercontent.com/RuolinZheng08/twewy-discord-chatbot/main/gif-demo/icon.png", "tags": ["conversational"], "license": "mit"}, "description": "\n\n# DialoGPT Trained on the Speech of a Game Character\n\nThis is an instance of [microsoft/DialoGPT-medium](https://huggingface.co/microsoft/DialoGPT-medium) trained on a game character, Joshua from [The World Ends With You](https://en.wikipedia.org/wiki/The_World_Ends_with_You). The data comes from [a Kaggle game script dataset](https://www.kaggle.com/ruolinzheng/twewy-game-script).\n\nI built a Discord AI chatbot based on this model. [Check out my GitHub repo.](https://github.com/RuolinZheng08/twewy-discord-chatbot)\n\nChat with the model:\n\n```python\nimport torch\nfrom transformers import AutoTokenizer, AutoModelWithLMHead\n \ntokenizer = AutoTokenizer.from_pretrained(\"r3dhummingbird/DialoGPT-medium-joshua\")\n\nmodel = AutoModelWithLMHead.from_pretrained(\"r3dhummingbird/DialoGPT-medium-joshua\")\n\n# Let's chat for 4 lines\nfor step in range(4):\n # encode the new user input, add the eos_token and return a tensor in Pytorch\n new_user_input_ids = tokenizer.encode(input(\">> User:\") + tokenizer.eos_token, return_tensors='pt')\n # print(new_user_input_ids)\n\n # append the new user input tokens to the chat history\n bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids\n\n # generate a response while limiting the total chat history to 1000 tokens\n chat_history_ids = model.generate(\n bot_input_ids, max_length=200,\n pad_token_id=tokenizer.eos_token_id, \n no_repeat_ngram_size=3, \n do_sample=True, \n top_k=100, \n top_p=0.7,\n temperature=0.8\n )\n \n # pretty print last output tokens from bot\n print(\"JoshuaBot: {}\".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))\n```"} {"downloads": 0, "id": "zl111/ChatDoctor", "likes": 15, "pipeline_tag": "conversational", "task": "conversational", "meta": {}, "description": "Access to model zl111/ChatDoctor is restricted and you are not in the authorized list. 
Visit https://huggingface.co/zl111/ChatDoctor to ask for access."} {"downloads": 459, "id": "Kirili4ik/ruDialoGpt3-medium-finetuned-telegram", "likes": 13, "pipeline_tag": "conversational", "task": "conversational", "meta": {"language": ["ru", "ru-RU"], "tags": ["conversational"]}, "description": "\n### \ud83d\udcdd Description\n\nDialoGPT trained on Russian language and fine tuned on my telegram chat.\n\n\nThis model was created by [sberbank-ai](https://hf.co/sberbank-ai) and trained on Russian forums (see [Grossmend's model](https://hf.co/Grossmend/rudialogpt3_medium_based_on_gpt2)). You can find info about how it has been trained on [habr](https://habr.com/ru/company/icl_services/blog/548244/) (in Russian). I have created a **simple pipeline** and **fine tuned** that model on my own **exported telegram chat** (~30mb json). It is in fact very easy to get the data from telegram and fine tune a model. Therefore, I made a **colab tutorial** for it: https://colab.research.google.com/drive/1fnAVURjyZRK9VQg1Co_-SKUQnRES8l9R?usp=sharing\n\n\u26a0\ufe0f Due to specifics of the data Hosted inference API may not work properly \u26a0\ufe0f\n\n\ud83e\udd17To try it use my [Spaces demo](https://huggingface.co/spaces/Kirili4ik/chat-with-Kirill)\ud83e\udd17\n\n\n### \u2753 How to use with code\n\n```python\n\n# Download model and tokenizer\ncheckpoint = \"Kirili4ik/ruDialoGpt3-medium-finetuned-telegram\" \ntokenizer = AutoTokenizer.from_pretrained(checkpoint)\nmodel = AutoModelForCausalLM.from_pretrained(checkpoint)\nmodel.eval()\n\n\n# util function to get expected len after tokenizing\ndef get_length_param(text: str, tokenizer) -> str:\n tokens_count = len(tokenizer.encode(text))\n if tokens_count <= 15:\n len_param = '1'\n elif tokens_count <= 50:\n len_param = '2'\n elif tokens_count <= 256:\n len_param = '3'\n else:\n len_param = '-'\n return len_param\n\n\n# util function to get next person number (1/0) for Machine or Human in the dialogue\ndef get_user_param(text: dict, machine_name_in_chat: str) -> str:\n if text['from'] == machine_name_in_chat:\n return '1' # machine\n else:\n return '0' # human\n\n\nchat_history_ids = torch.zeros((1, 0), dtype=torch.int)\n\nwhile True:\n \n next_who = input(\"Who's phrase?\\t\") #input(\"H / G?\") # Human or GPT\n\n # In case Human\n if next_who == \"H\" or next_who == \"Human\":\n input_user = input(\"===> Human: \")\n \n # encode the new user input, add parameters and return a tensor in Pytorch\n new_user_input_ids = tokenizer.encode(f\"|0|{get_length_param(input_user, tokenizer)}|\" \\\n + input_user + tokenizer.eos_token, return_tensors=\"pt\")\n # append the new user input tokens to the chat history\n chat_history_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1)\n\n if next_who == \"G\" or next_who == \"GPT\":\n\n next_len = input(\"Phrase len? 1/2/3/-\\t\") #input(\"Exp. 
len?(-/1/2/3): \")\n # encode the new user input, add parameters and return a tensor in Pytorch\n new_user_input_ids = tokenizer.encode(f\"|1|{next_len}|\", return_tensors=\"pt\")\n # append the new user input tokens to the chat history\n chat_history_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1)\n \n # print(tokenizer.decode(chat_history_ids[-1])) # uncomment to see full gpt input\n \n # save previous len\n input_len = chat_history_ids.shape[-1]\n # generated a response; PS you can read about the parameters at hf.co/blog/how-to-generate\n chat_history_ids = model.generate(\n chat_history_ids,\n num_return_sequences=1, # use for more variants, but have to print [i]\n max_length=512,\n no_repeat_ngram_size=3,\n do_sample=True,\n top_k=50,\n top_p=0.9,\n temperature = 0.6, # 0 for greedy\n mask_token_id=tokenizer.mask_token_id,\n eos_token_id=tokenizer.eos_token_id,\n unk_token_id=tokenizer.unk_token_id,\n pad_token_id=tokenizer.pad_token_id,\n device='cpu'\n )\n \n \n # pretty print last ouput tokens from bot\n print(f\"===> GPT-3: {tokenizer.decode(chat_history_ids[:, input_len:][0], skip_special_tokens=True)}\")\n```"} {"downloads": 1407, "id": "tinkoff-ai/ruDialoGPT-medium", "likes": 13, "pipeline_tag": "conversational", "task": "conversational", "meta": {"license": "mit", "widget": [{"text": "@@\u041f\u0415\u0420\u0412\u042b\u0419@@ \u043f\u0440\u0438\u0432\u0435\u0442 @@\u0412\u0422\u041e\u0420\u041e\u0419@@ \u043f\u0440\u0438\u0432\u0435\u0442 @@\u041f\u0415\u0420\u0412\u042b\u0419@@ \u043a\u0430\u043a \u0434\u0435\u043b\u0430? @@\u0412\u0422\u041e\u0420\u041e\u0419@@", "example_title": "how r u"}, {"text": "@@\u041f\u0415\u0420\u0412\u042b\u0419@@ \u0447\u0442\u043e \u0442\u044b \u0434\u0435\u043b\u0430\u043b \u043d\u0430 \u0432\u044b\u0445\u043e\u0434\u043d\u044b\u0445? @@\u0412\u0422\u041e\u0420\u041e\u0419@@", "example_title": "wyd"}], "language": ["ru"], "tags": ["conversational"]}, "description": "\n\nThis generation model is based on [sberbank-ai/rugpt3medium_based_on_gpt2](https://huggingface.co/sberbank-ai/rugpt3medium_based_on_gpt2). 
It's trained on large corpus of dialog data and can be used for buildning generative conversational agents\n\nThe model was trained with context size 3\n\n\nOn a private validation set we calculated metrics introduced in [this paper](https://arxiv.org/pdf/2001.09977.pdf): \n- Sensibleness: Crowdsourcers were asked whether model's response makes sense given the context\n- Specificity: Crowdsourcers were asked whether model's response is specific for given context, in other words we don't want our model to give general and boring responses\n- SSA which is the average of two metrics above (Sensibleness Specificity Average)\n\n| | sensibleness | specificity | SSA |\n|:"} {"downloads": 410, "id": "gorkemgoknar/gpt2chatbotenglish", "likes": 8, "pipeline_tag": "conversational", "task": "conversational", "meta": {"language": ["en"], "thumbnail": null, "tags": ["gpt2", "conversational"], "license": "cc-by-4.0", "widget": [{"text": "Hello there", "context": "Gandalf"}]}, "description": "\n# GPT2 Persona Chatbot based on Movie Characters\nModel used for https://www.metayazar.com/chatbot\n\nGPT2 Small Trained on movie scripts (especially Sci-fi) \n\nUsual HF api will not work see HF Spaces for demo usage https://huggingface.co/spaces/gorkemgoknar/moviechatbot\n\n\nThis work is based on Persona Chatbot originally done by Hugging Face team (https://medium.com/huggingface/how-to-build-a-state-of-the-art-conversational-ai-with-transfer-learning-2d818ac26313)\n\nFor cleaning movie scripts I also provide cleaner code\nhttps://github.com/gorkemgoknar/moviescriptcleaner\n\nExample persona how to:\nhttps://gist.github.com/gorkemgoknar/ae29bf9d14fa814e6a64d0e57a4a4ed7\n\nFor obvious reasons I cannot share raw personafile but you can check above gist for example how to create it.\n\nA working \"full\" demo can be seen in https://www.metayazar.com/chatbot\n\nFor Turkish version (with limited training) https://www.metayazar.com/chatbot_tr\n\nDue to double LM head standart hugging face interface will not work. 
If you follow the Hugging Face tutorial, however, usage should be the same, except that each persona is encoded as \"My name is XXXX\".\n\nWrap the model, tokenizer, and parameters in a class and call the function below to trigger the model.\nSome of the available personas:\n\n| Macleod | Moran | Brenda | Ramirez | Peter Parker | Quentin Beck | Andy \n| Red | Norton | Willard | Chief | Chef | Kilgore | Kurtz | Westley | Buttercup \n| Vizzini | Fezzik | Inigo | Man In Black | Taylor | Zira | Zaius | Cornelius \n| Bud | Lindsey | Hippy | Erin | Ed | George | Donna | Trinity | Agent Smith \n| Morpheus | Neo | Tank | Meryl | Truman | Marlon | Christof | Stromboli | Bumstead \n| Schreber | Walker | Korben | Cornelius | Loc Rhod | Anakin | Obi-Wan | Palpatine \n| Padme | Superman | Luthor | Dude | Walter | Donny | Maude | General | Starkiller \n| Indiana | Willie | Short Round | John | Sarah | Terminator | Miller | Sarge | Reiben \n| Jackson | Upham | Chuckie | Will | Lambeau | Sean | Skylar | Saavik | Spock \n| Kirk | Bones | Khan | Kirk | Spock | Sybok | Scotty | Bourne | Pamela | Abbott \n\n\n```python\n    def get_answer(self, input_text, personality, history, params=None):\n\n        # Check the length of history first (saves one computation when it is empty!)\n        if len(history) > 0:\n            # History usually arrives as a list of strings, so encode each entry once.\n            # We assume it is a list of strings (no type check, as this API is not public).\n            new_hist = []\n            for ele in history:\n                new_hist.append(self.tokenizer.encode(ele))\n            history = new_hist.copy()\n\n        history.append(self.tokenizer.encode(input_text))\n\n        with torch.no_grad():\n            out_ids = self.sample_sequence(personality, history, self.tokenizer, self.model, params=params)\n        history.append(out_ids)\n        # Keep only the last max_history exchanges plus the new reply.\n        history = history[-(2 * self.parameters['max_history'] + 1):]\n        out_text = self.tokenizer.decode(out_ids, skip_special_tokens=True)\n\n        history_decoded = []\n        for ele in history:\n            history_decoded.append(self.tokenizer.decode(ele))\n\n        return out_text, history_decoded, self.parameters\n```"} {"downloads": 568, "id": "thu-coai/CDial-GPT_LCCC-large", "likes": 7, "pipeline_tag": "conversational", "task": "conversational", "meta": {"tags": ["conversational"], "license": "mit", "datasets": ["silver/lccc"]}, "description": "\n\n## Chinese pre-trained dialogue model (CDial-GPT)\n\nThis project provides a large-scale Chinese GPT model pre-trained on the dataset [LCCC](https://huggingface.co/datasets/silver/lccc).\n\nWe present a series of Chinese GPT models that are first pre-trained on a Chinese novel dataset and then post-trained on our LCCC dataset.\n\nSimilar to [TransferTransfo](https://arxiv.org/abs/1901.08149), we concatenate all dialogue histories into one context sentence, and use this sentence to predict the response.
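\n\nAs a rough sketch of that concatenation (hedged: the two-turn `history` below and the use of the tokenizer's [CLS]/[SEP] special tokens are illustrative assumptions, not the repo's exact preprocessing):\n\n```python\nfrom transformers import BertTokenizer\n\ntokenizer = BertTokenizer.from_pretrained(\"thu-coai/CDial-GPT_LCCC-large\")\n\n# Hypothetical dialogue history; every turn is folded into one context sequence.\nhistory = [\"\u4eca\u5929\u5929\u6c14\u4e0d\u9519\", \"\u662f\u554a\uff0c\u5f88\u9002\u5408\u51fa\u53bb\u8d70\u8d70\"]\n\ninput_ids = [tokenizer.cls_token_id]\nfor turn in history:\n    # Append each utterance, separated by the tokenizer's separator token.\n    input_ids += tokenizer.encode(turn, add_special_tokens=False)\n    input_ids.append(tokenizer.sep_token_id)\n\n# `input_ids` is the single context sentence the model conditions on\n# when predicting the response.\n```\n\n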
The input of our model consists of word embeddings, speaker embeddings, and positional embeddings for each word.\n\nPaper: [A Large-Scale Chinese Short-Text Conversation Dataset](https://arxiv.org/pdf/2008.03946.pdf)\n\n### How to use\n\n```python\nfrom transformers import OpenAIGPTLMHeadModel, GPT2LMHeadModel, BertTokenizer\n\n# GPT2LMHeadModel is needed only for the GPT2-based variants of CDial-GPT.\ntokenizer = BertTokenizer.from_pretrained(\"thu-coai/CDial-GPT_LCCC-large\")\nmodel = OpenAIGPTLMHeadModel.from_pretrained(\"thu-coai/CDial-GPT_LCCC-large\")\n```\n\nFor more details, please refer to our [repo](https://github.com/thu-coai/CDial-GPT) on GitHub."} {"downloads": 23, "id": "hyunwoongko/reddit-3B", "likes": 7, "pipeline_tag": "conversational", "task": "conversational", "meta": {"language": ["en"], "thumbnail": null, "tags": ["convAI", "conversational", "facebook"], "license": "apache-2.0", "datasets": ["blended_skill_talk"], "metrics": ["perplexity"]}, "description": "\n\n## Model description\n\n+ Paper: [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637)\n+ [Original PARLAI Code](https://parl.ai/projects/recipes/)\n\n\n### Abstract\n\n\nBuilding open-domain chatbots is a challenging area for machine learning research. While prior work has shown that scaling neural models in the number of parameters and the size of the data they are trained on gives improved results, we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of skills that an expert conversationalist blends in a seamless way: providing engaging talking points and listening to their partners, both asking and answering questions, and displaying knowledge, empathy and personality appropriately, depending on the situation. We show that large scale models can learn these skills when given appropriate training data and choice of generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter neural models, and make our models and code publicly available. Human evaluations show our best models are superior to existing approaches in multi-turn dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing failure cases of our models.\n\n"} {"downloads": 1, "id": "PaddlePaddle/plato-mini", "likes": 6, "pipeline_tag": "conversational", "task": "conversational", "meta": {"license": "apache-2.0", "language": ["zh"], "library_name": "paddlenlp", "tags": ["conversational"]}, "description": "\n\n[![paddlenlp-banner](https://user-images.githubusercontent.com/1371212/175816733-8ec25eb0-9af3-4380-9218-27c154518258.png)](https://github.com/PaddlePaddle/PaddleNLP)\n\n# PaddlePaddle/plato-mini\n\n## Introduction\n\nPre-training models have proven effective for a wide range of natural language processing tasks. \nInspired by this, we propose a novel dialogue generation pre-training framework to support various kinds of conversations, \nincluding chit-chat, knowledge grounded dialogues, and conversational question answering. In this framework, we adopt flexible \nattention mechanisms to fully leverage the bi-directional context and the uni-directional characteristic of language generation. \nWe also introduce discrete latent variables to tackle the inherent one-to-many mapping problem in response generation. \nTwo reciprocal tasks of response generation and latent act recognition are designed and carried out simultaneously within a shared network.
\nComprehensive experiments on three publicly available datasets verify the effectiveness and superiority of the proposed framework.\n\nMore details: https://arxiv.org/abs/1910.07931\n\n## Available Models\n\n- **plato-mini**, *6 layers, 12 heads, 768 hidden size*\n\n## How to Use?\n\nClick on the *Use in paddlenlp* button on the top right!\n\n## Citation Info\n\n```text\n@article{plato,\n  title = {PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable},\n  author = {Bao, Siqi and He, Huang and Wang, Fan and Wu, Hua and Wang, Haifeng},\n  journal = {arXiv preprint arXiv:1910.07931},\n  year = {2019},\n}\n```\n\n\n"} {"downloads": 207, "id": "byeongal/Ko-DialoGPT", "likes": 5, "pipeline_tag": "conversational", "task": "conversational", "meta": {"language": "ko", "tags": ["gpt2", "conversational"], "license": "cc-by-nc-sa-4.0"}, "description": "\n## Ko-DialoGPT\n\n\n### How to use\n```python\nfrom transformers import PreTrainedTokenizerFast, GPT2LMHeadModel\nimport torch\n\n\n# Run on GPU when available.\ndevice = 'cuda' if torch.cuda.is_available() else 'cpu'\n\ntokenizer = PreTrainedTokenizerFast.from_pretrained('byeongal/Ko-DialoGPT')\nmodel = GPT2LMHeadModel.from_pretrained('byeongal/Ko-DialoGPT').to(device)\n\npast_user_inputs = []\ngenerated_responses = []\n\nwhile True:\n    user_input = input(\">> User:\")\n    if user_input == 'bye':\n        break\n    # Encode the new user turn, terminated by the EOS token.\n    text_idx = tokenizer.encode(user_input + tokenizer.eos_token, return_tensors='pt')\n    # Prepend up to the last two exchanges (each bot response together with the\n    # user input that prompted it), newest exchange first, while the total\n    # context stays under 1000 tokens.\n    for i in range(len(generated_responses) - 1, len(generated_responses) - 3, -1):\n        if i < 0:\n            break\n        encoded_vector = tokenizer.encode(generated_responses[i] + tokenizer.eos_token, return_tensors='pt')\n        if text_idx.shape[-1] + encoded_vector.shape[-1] < 1000:\n            text_idx = torch.cat([encoded_vector, text_idx], dim=-1)\n        else:\n            break\n        encoded_vector = tokenizer.encode(past_user_inputs[i] + tokenizer.eos_token, return_tensors='pt')\n        if text_idx.shape[-1] + encoded_vector.shape[-1] < 1000:\n            text_idx = torch.cat([encoded_vector, text_idx], dim=-1)\n        else:\n            break\n    text_idx = text_idx.to(device)\n    inference_output = model.generate(\n        text_idx,\n        max_length=1000,\n        num_beams=5,\n        top_k=20,\n        no_repeat_ngram_size=4,\n        length_penalty=0.65,\n        repetition_penalty=2.0,\n    )\n    inference_output = inference_output.tolist()\n    # Decode only the newly generated tokens after the input context.\n    bot_response = tokenizer.decode(inference_output[0][text_idx.shape[-1]:], skip_special_tokens=True)\n    print(f\"Bot: {bot_response}\")\n    past_user_inputs.append(user_input)\n    generated_responses.append(bot_response)\n```\n\n### Reference\n* [SKT-KoGPT2](https://huggingface.co/skt/kogpt2-base-v2)\n* [KETI R&D data](https://aihub.or.kr/opendata/keti-data/recognition-laguage/KETI-02-008)\n* [Korean dialogue summarization](https://aihub.or.kr/aidata/30714)\n"} {"downloads": 2429, "id": "BlackSamorez/rudialogpt3_medium_based_on_gpt2_2ch", "likes": 5, "pipeline_tag": "conversational", "task": "conversational", "meta": {"language": ["ru"], "tags": ["conversational"], "datasets": "BlackSamorez/2ch_b_dialogues"}, "description": "\n\nDialoGPT for Russian\n\n\nBased on [Grossmend/rudialogpt3_medium_based_on_gpt2](https://huggingface.co/Grossmend/rudialogpt3_medium_based_on_gpt2)\n\nFine-tuned on [2ch /b/ dialogues](https://huggingface.co/datasets/BlackSamorez/2ch_b_dialogues) data; a loading sketch follows below.
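\n\nThe card itself ships no usage code, so here is a minimal loading-and-generation sketch (hedged: the `|speaker|length|text|` prompt format is inferred from the Grossmend parent card's code shown earlier in this document, and the sampling parameters are assumptions):\n\n```python\nfrom transformers import AutoTokenizer, AutoModelForCausalLM\n\ntokenizer = AutoTokenizer.from_pretrained(\"BlackSamorez/rudialogpt3_medium_based_on_gpt2_2ch\")\nmodel = AutoModelForCausalLM.from_pretrained(\"BlackSamorez/rudialogpt3_medium_based_on_gpt2_2ch\")\n\n# Assumed prompt format from the parent card: |speaker|length_bucket|text|,\n# ending with |1|...| to cue the bot's reply.\ninputs = tokenizer(\"|0|1|\u043f\u0440\u0438\u0432\u0435\u0442|1|1|\", return_tensors=\"pt\")\noutput = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9)\nprint(tokenizer.decode(output[0], skip_special_tokens=True))\n```\n\n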
To improve performance, the replies used for fine-tuning were filtered by obscenity.\n\nUsed in the [Ebanko](https://t.me/toxic_ebanko_bot) **Telegram bot**.\n\nYou can find the deployment code on [my GitHub](https://github.com/BlackSamorez/ebanko).\n\n"} {"downloads": 486, "id": "abhiramtirumala/DialoGPT-sarcastic", "likes": 5, "pipeline_tag": "conversational", "task": "conversational", "meta": {"pipeline_tag": "conversational"}, "description": "\nThis model is a fine-tuned version of Microsoft/DialoGPT-medium trained to create sarcastic responses from the dataset \"Sarcasm on Reddit\" located [here](https://www.kaggle.com/danofer/sarcasm)."} {"downloads": 90469, "id": "waifu-workshop/pygmalion-6b", "likes": 4, "pipeline_tag": "conversational", "task": "conversational", "meta": {"license": "creativeml-openrail-m", "language": ["en"], "thumbnail": null, "tags": ["text generation", "conversational", "reupload"], "inference": false, "duplicated_from": "PygmalionAI/pygmalion-6b"}, "description": "\n\n# Pygmalion 6B\n\nThis is a reupload of the [original model](https://huggingface.co/PygmalionAI/pygmalion-6b). Sharded variants are available in separate branches.\n\nAll credit goes to the [PygmalionAI team](https://huggingface.co/PygmalionAI).\n\n"} {"downloads": 86100, "id": "bigscience/bloom", "likes": 2988, "pipeline_tag": "text-generation", "task": "text-generation", "meta": {"license": "bigscience-bloom-rail-1.0", "language": ["ak", "ar", "as", "bm", "bn", "ca", "code", "en", "es", "eu", "fon", "fr", "gu", "hi", "id", "ig", "ki", "kn", "lg", "ln", "ml", "mr", "ne", "nso", "ny", "or", "pa", "pt", "rn", "rw", "sn", "st", "sw", "ta", "te", "tn", "ts", "tum", "tw", "ur", "vi", "wo", "xh", "yo", "zh", "zu"], "programming_language": ["C", "C++", "C#", "Go", "Java", "JavaScript", "Lua", "PHP", "Python", "Ruby", "Rust", "Scala", "TypeScript"], "pipeline_tag": "text-generation", "widget": [{"text": "A \"whatpu\" is a small, furry animal native to Tanzania. An example of a sentence that uses the word whatpu is: We were traveling in Africa and we saw these very cute whatpus. | To do a \"farduddle\" means to jump up and down really fast. An example of a sentence that uses the word farduddle is:", "example_title": "Imaginary word", "group": "English"}, {"text": "Un \"whatpu\" est un petit animal \u00e0 fourrure originaire de Tanzanie. Un exemple de phrase qui utilise le mot whatpu est: Nous \u00e9tions en Afrique et nous avons vu des whatpus trop mignons. Faire un \"farduddle\" veut dire sauter sur place vraiment vite. Un exemple de phrase qui utilise le mot farduddle est:", "example_title": "Imaginary word", "group": "French"}, {"text": "Un \"whatpu\" es un peque\u00f1o animal peludo nativo de Tanzania. Un ejemplo de una oraci\u00f3n que usa la palabra whatpu es: Est\u00e1bamos viajando por \u00c1frica y vimos estos whatpus muy bonitos. Hacer un \"farduddle\" significa saltar arriba y abajo muy r\u00e1pido. Un ejemplo de una oraci\u00f3n que usa la palabra farduddle es:", "example_title": "Imaginary word", "group": "Spanish"}, {"text": " \u0627\u0644\"\u0648\u0627\u062a\u0628\u0648\" \u0647\u0648 \u062d\u064a\u0648\u0627\u0646 \u0635\u063a\u064a\u0631 \u0645\u0643\u0633\u0648 \u0628\u0627\u0644\u0641\u0631\u0627\u0621 \u064a\u0639\u064a\u0634 \u0641\u064a \u062a\u0646\u0632\u0627\u0646\u064a\u0627.
\u0645\u062b\u0627\u0644 \u0639\u0644\u0649 \u062c\u0645\u0644\u0629 \u062a\u0633\u062a\u062e\u062f\u0645 \u0643\u0644\u0645\u0629 \u0648\u0627\u062a\u0628\u0648 \u0647\u064a: \u0643\u0646\u0627 \u0646\u0633\u0627\u0641\u0631 \u0641\u064a \u0627\u0641\u0631\u064a\u0642\u064a\u0627 \u0648 \u0631\u0623\u064a\u0646\u0627 \u0647\u0624\u0644\u0627\u0621 \u0627\u0644\u0648\u0627\u062a\u0628\u0648 \u0627\u0644\u0644\u0637\u0641\u0627\u0621. \u0644\u0644\u0642\u064a\u0627\u0645 \u0628\"\u0641\u0627\u0631\u062f\u0627\u062f\u0644\" \u064a\u0639\u0646\u064a \u0627\u0646 \u062a\u0642\u0641\u0632 \u0644\u0644\u0623\u0639\u0644\u0649 \u0648 \u0627\u0644\u0623\u0633\u0641\u0644 \u0628\u0633\u0631\u0639\u0629 \u0643\u0628\u064a\u0631\u0629. \u0645\u062b\u0627\u0644 \u0639\u0644\u0649 \u062c\u0645\u0644\u0629 \u062a\u0633\u062a\u062e\u062f\u0645 \u0643\u0644\u0645\u0629 \u0641\u0627\u0631\u062f\u0627\u062f\u0644 \u0647\u064a:", "example_title": "Imaginary word", "group": "Arabic"}, {"text": "Um \"whatpu\" \u00e9 um pequeno animal peludo nativo da Tanz\u00e2nia. Um exemplo de uma frase que usa a palavra whatpu \u00e9: Est\u00e1vamos a viajar por \u00c1frica e vimos uns whatpus muito queridos. Fazer um \"farduddle\" significa saltar para cima e para baixo muito r\u00e1pido. Um exemplo de uma frase que usa a palavra farduddle \u00e9:", "example_title": "Imaginary word", "group": "Portuguese"}, {"text": "Pour d\u00e9guster un ortolan, il faut tout d'abord", "example_title": "Recipe", "group": "French"}, {"text": "34+10=44 \n54+20=", "example_title": "Addition", "group": "Math"}, {"text": "This tool converts irregular verbs to past tense.\nArise - Arose\nBecome - Became\nForget - Forgot\nFreeze -", "example_title": "Irregular verbs", "group": "English"}, {"text": "Please unscramble the letters into a word, and write that word:\nr e!c.i p r o.c a/l = reciprocal\nd.o m i!n a n.t =", "example_title": "Word unscrambling", "group": "English"}, {"text": "Estos ejemplos quitan vocales de las palabras\nEjemplos:\nhola - hl\nmanzana - mnzn\npapas - pps\nalacran - lcrn\npapa -", "example_title": "Vowel removal", "group": "Spanish"}, {"text": "Traduce espa\u00f1ol de Espa\u00f1a a espa\u00f1ol de Argentina\nEl coche es rojo - el auto es rojo\nEl ordenador es nuevo - la computadora es nueva\nel boligrafo es negro - lapicera es negra\nla nevera", "example_title": "Spanish to Argentinian Spanish", "group": "Spanish"}, {"text": "To say \"I love you\" in Hindi, you would say", "example_title": "Translation to Hindi", "group": "English"}, {"text": "To say \"I love you\" in Hindi, you would say", "example_title": "Translation from English", "group": "Hindi"}, {"text": "Poor English: She no went to the market. Corrected English:", "example_title": "Grammar exercise 1", "group": "English"}, {"text": "\u0627\u0633\u062a\u062e\u0631\u0627\u062c \u0627\u0644\u0639\u062f\u062f \u0627\u0644\u0639\u0627\u0645\u0644\u064a \u0641\u064a \u0644\u063a\u0629 \u0628\u0627\u064a\u062b\u0648\u0646:", "example_title": "Code generation", "group": "Arabic"}, {"text": "Regexp. Here is a regular expression to match a word starting with a number and then having only vowels:", "example_title": "Regular expressions", "group": "English"}, {"text": "Do a hello world in different languages:\nPython: print(\"hello world\")\nR:", "example_title": "Code generation", "group": "English"}, {"text": "Which is the correct preposition? I'm born X July. X is the preposition in\nHe sat X a chair. X is the preposition on\nShe drove X the bridge. 
X is the preposition", "example_title": "Grammar exercise 2", "group": "English"}, {"text": "Traduction en fran\u00e7ais: Dans cet essai je vais m'interroger sur la conscience des mod\u00e8les d'intelligence artificielle r\u00e9cents comme les mod\u00e8les de langue. Pour commencer, je m'int\u00e9resserai \u00e0 la notion de conscience et \u00e0 ce qui la caract\u00e9rise. Ensuite, j'aborderai la question de l'intelligence et de son lien avec le langage. Enfin, dans une derni\u00e8re partie je me pencherai sur le cas de l'IA et sur sa conscience.\nTraduction en espagnol:", "example_title": "Translation to Span