royleibov committed
Commit fd295cb
1 Parent(s): 62be76c

Add zipnn text

Files changed (1):
  1. README.md +71 -7

README.md CHANGED
@@ -6,8 +6,56 @@ license: mit
  datasets:
  - bookcorpus
  - wikipedia
+ base_model:
+ - FacebookAI/roberta-base
  ---

+ # Disclaimer and Requirements
+
+ This model is a clone of [**FacebookAI/roberta-base**](https://huggingface.co/FacebookAI/roberta-base) compressed using ZipNN. Compressed losslessly to 54% of its original size, ZipNN saved ~0.25GB in storage and potentially ~5PB in data transfer **monthly**.
+
+ ### Requirement
+
+ In order to use the model, ZipNN is necessary:
+ ```bash
+ pip install zipnn
+ ```
+ ### Use This Model
+ ```python
+ # Use a pipeline as a high-level helper
+ from transformers import pipeline
+ from zipnn import zipnn_hf
+
+ zipnn_hf()
+
+ pipe = pipeline("fill-mask", model="royleibov/roberta-base-ZipNN-Compressed")
+ ```
+ ```python
+ # Load model directly
+ import torch
+ from transformers import AutoTokenizer, AutoModelForMaskedLM
+ from zipnn import zipnn_hf
+
+ zipnn_hf()
+
+ tokenizer = AutoTokenizer.from_pretrained("royleibov/roberta-base-ZipNN-Compressed")
+ model = AutoModelForMaskedLM.from_pretrained("royleibov/roberta-base-ZipNN-Compressed")
+ ```
+ ### ZipNN
+ ZipNN also allows you to seamlessly save local disk space in your cache after the model is downloaded.
+
+ To compress the cached model, simply run:
+ ```bash
+ python zipnn_compress_path.py safetensors --model royleibov/roberta-base-ZipNN-Compressed --hf_cache
+ ```
+
+ The model will be decompressed automatically and safely as long as `zipnn_hf()` is added at the top of the file like in the [example above](#use-this-model).
+
+ To decompress manually, simply run:
+ ```bash
+ python zipnn_decompress_path.py --model royleibov/roberta-base-ZipNN-Compressed --hf_cache
+ ```
+
  # RoBERTa base model

  Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in
@@ -50,7 +98,11 @@ You can use this model directly with a pipeline for masked language modeling:

  ```python
  >>> from transformers import pipeline
- >>> unmasker = pipeline('fill-mask', model='roberta-base')
+ >>> from zipnn import zipnn_hf
+
+ >>> zipnn_hf()
+
+ >>> unmasker = pipeline('fill-mask', model='royleibov/roberta-base-ZipNN-Compressed')
  >>> unmasker("Hello I'm a <mask> model.")

  [{'sequence': "<s>Hello I'm a male model.</s>",
@@ -79,8 +131,12 @@ Here is how to use this model to get the features of a given text in PyTorch:

  ```python
  from transformers import RobertaTokenizer, RobertaModel
- tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
- model = RobertaModel.from_pretrained('roberta-base')
+ from zipnn import zipnn_hf
+
+ zipnn_hf()
+
+ tokenizer = RobertaTokenizer.from_pretrained('royleibov/roberta-base-ZipNN-Compressed')
+ model = RobertaModel.from_pretrained('royleibov/roberta-base-ZipNN-Compressed')
  text = "Replace me by any text you'd like."
  encoded_input = tokenizer(text, return_tensors='pt')
  output = model(**encoded_input)
@@ -90,8 +146,12 @@ and in TensorFlow:

  ```python
  from transformers import RobertaTokenizer, TFRobertaModel
- tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
- model = TFRobertaModel.from_pretrained('roberta-base')
+ from zipnn import zipnn_hf
+
+ zipnn_hf()
+
+ tokenizer = RobertaTokenizer.from_pretrained('royleibov/roberta-base-ZipNN-Compressed')
+ model = TFRobertaModel.from_pretrained('royleibov/roberta-base-ZipNN-Compressed')
  text = "Replace me by any text you'd like."
  encoded_input = tokenizer(text, return_tensors='tf')
  output = model(encoded_input)
@@ -104,7 +164,11 @@ neutral. Therefore, the model can have biased predictions:

  ```python
  >>> from transformers import pipeline
- >>> unmasker = pipeline('fill-mask', model='roberta-base')
+ >>> from zipnn import zipnn_hf
+
+ >>> zipnn_hf()
+
+ >>> unmasker = pipeline('fill-mask', model='royleibov/roberta-base-ZipNN-Compressed')
  >>> unmasker("The man worked as a <mask>.")

  [{'sequence': '<s>The man worked as a mechanic.</s>',
@@ -231,4 +295,4 @@ Glue test results:

  <a href="https://huggingface.co/exbert/?model=roberta-base">
  <img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
- </a>
+ </a>
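
For convenience, the pieces added in this diff can be combined into one short script. The sketch below only reuses what the card itself shows (`pip install zipnn`, `zipnn_hf()`, the `fill-mask` pipeline, the compressed model ID, and the example prompt); the exact completions and scores will vary.

```python
# Minimal end-to-end check of the compressed model, combining the snippets
# from the README diff above. Requires: pip install zipnn
from transformers import pipeline
from zipnn import zipnn_hf

# Patch Hugging Face loading so ZipNN-compressed weights are decompressed transparently.
zipnn_hf()

unmasker = pipeline("fill-mask", model="royleibov/roberta-base-ZipNN-Compressed")

# Same example prompt as in the card; expect completions like "male", "fashion", ...
for prediction in unmasker("Hello I'm a <mask> model."):
    print(prediction["token_str"], round(prediction["score"], 3))
```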
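The storage figure quoted in the disclaimer can also be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes roughly 125M parameters stored as fp32 safetensors (an assumption, not stated in the diff); only the 54% ratio and the ~0.25GB figure come from the text above.

```python
# Rough check of the "~0.25GB saved" claim.
params = 125_000_000                        # assumed parameter count for roberta-base
original_bytes = params * 4                 # assumed fp32 weights -> ~0.5 GB
compressed_bytes = original_bytes * 0.54    # "54% of its original size" from the card
saved_gb = (original_bytes - compressed_bytes) / 1e9
print(f"~{saved_gb:.2f} GB saved per copy") # ~0.23 GB, in line with the quoted ~0.25GB
```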