julien-c (HF staff) committed
Commit 329a94f
1 parent: 8d16f50

Migrate model card from transformers-repo


Read announcement at https://discuss.huggingface.co/t/announcement-all-model-cards-will-be-migrated-to-hf-co-model-repos/2755
Original file history: https://github.com/huggingface/transformers/commits/master/model_cards/google/roberta2roberta_L-24_wikisplit/README.md

Files changed (1):
  README.md (added, +36 -0)
---
language: en
license: apache-2.0
---

# Roberta2Roberta_L-24_wikisplit EncoderDecoder model

The model was introduced in
[this paper](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn and first released in [this repository](https://tfhub.dev/google/bertseq2seq/roberta24_cnndm/1).

The model is an encoder-decoder model, initialized from the `roberta-large` checkpoint for both the encoder and the decoder, and fine-tuned on sentence splitting on the [WikiSplit](https://github.com/google-research-datasets/wiki-split) dataset.

Disclaimer: The model card has been written by the Hugging Face team.

## How to use

You can use this model for sentence splitting, *e.g.*

**IMPORTANT**: The model was not trained on the `"` (double quotation mark) character, so before tokenizing the text, it is advised to replace each `"` (double quotation mark) with two `'` (single quotation marks).

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_wikisplit")
model = AutoModelForSeq2SeqLM.from_pretrained("google/roberta2roberta_L-24_wikisplit")

long_sentence = """Due to the hurricane, Lobsterfest has been canceled, making Bob very happy about it and he decides to open Bob 's Burgers for customers who were planning on going to Lobsterfest."""

input_ids = tokenizer(tokenizer.bos_token + long_sentence + tokenizer.eos_token, return_tensors="pt").input_ids
output_ids = model.generate(input_ids)[0]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
# should output
# Due to the hurricane, Lobsterfest has been canceled, making Bob very happy about it. He decides to open Bob's Burgers for customers who were planning on going to Lobsterfest.
```
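Since the model was not trained on the `"` character, the quote replacement described above can be done as a small preprocessing step before tokenization. A minimal sketch (the helper name `normalize_quotes` is our own, not part of the model's API):

```python
def normalize_quotes(text: str) -> str:
    # Replace each double quotation mark with two single quotation marks,
    # as the model never saw `"` during training.
    return text.replace('"', "''")

example = 'He said "hello" to Bob.'
print(normalize_quotes(example))
# He said ''hello'' to Bob.
```

You would then pass `normalize_quotes(long_sentence)` to the tokenizer instead of the raw string.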