system HF staff committed on
Commit 567aa82
1 Parent(s): 0b5aa12

Update README.md

Files changed (1)
  1. README.md +4 -88
README.md CHANGED
@@ -1,4 +1,3 @@
-
  ---
  language:
  - en
@@ -14,97 +13,14 @@ metrics:
  - perplexity
  ---

- # Blenderbot-3B
-
  ## Model description

- + [Paper](https://arxiv.org/abs/1907.06616).
- + [Original PARLAI Code]
-
- The abbreviation FSMT stands for FairSeqMachineTranslation
-
- All four models are available:
-
- * [wmt19-en-ru](https://huggingface.co/facebook/wmt19-en-ru)
- * [wmt19-ru-en](https://huggingface.co/facebook/wmt19-ru-en)
- * [wmt19-en-de](https://huggingface.co/facebook/wmt19-en-de)
- * [wmt19-de-en](https://huggingface.co/facebook/wmt19-de-en)
-
- ## Intended uses & limitations
-
- #### How to use
-
- ```python
- from transformers.tokenization_fsmt import FSMTTokenizer
- from transformers.modeling_fsmt import FSMTForConditionalGeneration
- mname = "facebook/wmt19-en-ru"
- tokenizer = FSMTTokenizer.from_pretrained(mname)
- model = FSMTForConditionalGeneration.from_pretrained(mname)
-
- input = "Machine learning is great, isn't it?"
- input_ids = tokenizer.encode(input, return_tensors="pt")
- outputs = model.generate(input_ids)
- decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
- print(decoded) # Машинное обучение - это здорово, не так ли?
- ```
-
- #### Limitations and bias
-
- - The original (and this ported model) doesn't seem to handle inputs with repeated sub-phrases well; [content gets truncated](https://discuss.huggingface.co/t/issues-with-translating-inputs-containing-repeated-phrases/981)
-
- ## Training data
-
- Pretrained weights were left identical to the original model released by fairseq. For more details, please see the [paper](https://arxiv.org/abs/1907.06616).
-
- ## Eval results
-
- pair | fairseq | transformers
- -------|---------|----------
- en-ru | [36.4](http://matrix.statmt.org/matrix/output/1914?run_id=6724) | 33.47
-
- The score is slightly below the score reported by `fairseq`, since `transformers` currently doesn't support:
- - model ensemble, therefore the best performing checkpoint was ported (`model4.pt`).
- - re-ranking
-
- The score was calculated using this code:
-
- ```bash
- git clone https://github.com/huggingface/transformers
- cd transformers
- export PAIR=en-ru
- export DATA_DIR=data/$PAIR
- export SAVE_DIR=data/$PAIR
- export BS=8
- export NUM_BEAMS=15
- mkdir -p $DATA_DIR
- sacrebleu -t wmt19 -l $PAIR --echo src > $DATA_DIR/val.source
- sacrebleu -t wmt19 -l $PAIR --echo ref > $DATA_DIR/val.target
- echo $PAIR
- PYTHONPATH="src:examples/seq2seq" python examples/seq2seq/run_eval.py facebook/wmt19-$PAIR $DATA_DIR/val.source $SAVE_DIR/test_translations.txt --reference_path $DATA_DIR/val.target --score_path $SAVE_DIR/test_bleu.json --bs $BS --task translation --num_beams $NUM_BEAMS
- ```
- note: fairseq reports using a beam of 50, so you should get a slightly higher score if re-run with `--num_beams 50`.
-
- ## Data Sources
-
- - [training, etc.](http://www.statmt.org/wmt19/)
- - [test set](http://matrix.statmt.org/test_sets/newstest2019.tgz?1556572561)
-
- ### BibTeX entry and citation info
-
- ```bibtex
- @inproceedings{...,
-   year={2020},
-   title={Facebook FAIR's WMT19 News Translation Task Submission},
-   author={Ng, Nathan and Yee, Kyra and Baevski, Alexei and Ott, Myle and Auli, Michael and Edunov, Sergey},
-   booktitle={Proc. of WMT},
- }
- ```
-
- ## TODO
-
- - port model ensemble (fairseq uses 4 model checkpoints)
+ + Paper: [Recipes for building an open-domain chatbot](https://arxiv.org/abs/2004.13637)
+ + [Original PARLAI Code](https://parl.ai/projects/recipes/)
+
+ ### Abstract
+
+ Building open-domain chatbots is a challenging area for machine learning research. While prior work has shown that scaling neural models in the number of parameters and the size of the data they are trained on gives improved results, we show that other ingredients are important for a high-performing chatbot. Good conversation requires a number of skills that an expert conversationalist blends in a seamless way: providing engaging talking points and listening to their partners, both asking and answering questions, and displaying knowledge, empathy and personality appropriately, depending on the situation. We show that large scale models can learn these skills when given appropriate training data and choice of generation strategy. We build variants of these recipes with 90M, 2.7B and 9.4B parameter neural models, and make our models and code publicly available. Human evaluations show our best models are superior to existing approaches in multi-turn dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing failure cases of our models.
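For reference, a minimal sketch of how the model described by the updated card might be loaded and queried with `transformers`. The `facebook/blenderbot-3B` checkpoint id and the generic `Auto*` classes are assumptions, not something stated in this commit; adjust them to whatever this repository actually hosts.

```python
# Minimal usage sketch (assumptions: the "facebook/blenderbot-3B" checkpoint id
# and the generic Auto* classes; neither is taken from the card itself).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

mname = "facebook/blenderbot-3B"
tokenizer = AutoTokenizer.from_pretrained(mname)
model = AutoModelForSeq2SeqLM.from_pretrained(mname)

utterance = "Hello, how are you doing today?"
inputs = tokenizer(utterance, return_tensors="pt")

# Blenderbot is a seq2seq model, so generate() produces the reply directly.
reply_ids = model.generate(**inputs)
print(tokenizer.decode(reply_ids[0], skip_special_tokens=True))
```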