---
language:
- en
license: apache-2.0
widget:
- text: The nodes of a computer network may include [MASK].
library_name: transformers
---

# NetBERT 📶

<img align="left" src="illustration.jpg" width="150"/>
<br><br><br>

&nbsp;&nbsp;&nbsp;NetBERT is a [BERT-base](https://huggingface.co/bert-base-cased) model further pre-trained on a large corpus of computer networking text (~23GB).

<br><br>

## Usage

You can use the raw model for masked language modeling (MLM), but it is mostly intended to be fine-tuned on a downstream task, especially one that uses the whole sentence to make a decision, such as text classification, extractive question answering, or semantic search.
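For fine-tuning, the checkpoint can be loaded with a task-specific head. The sketch below is a hypothetical starting point for text classification (the `num_labels=2` and the label semantics are placeholders, not part of the NetBERT release):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the NetBERT encoder with a freshly initialized classification head.
# num_labels is a placeholder; set it to match your own dataset.
tokenizer = AutoTokenizer.from_pretrained('antoinelouis/netbert')
model = AutoModelForSequenceClassification.from_pretrained(
    'antoinelouis/netbert', num_labels=2
)
```

From there you would train the head (and optionally the encoder) on labeled data, e.g. with the `transformers` `Trainer` API.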

You can use this model directly with a pipeline for [masked language modeling](https://huggingface.co/tasks/fill-mask):

```python
from transformers import pipeline

unmasker = pipeline('fill-mask', model='antoinelouis/netbert')
unmasker("The nodes of a computer network may include [MASK].")
```

You can also use this model to [extract the features](https://huggingface.co/tasks/feature-extraction) of a given text:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('antoinelouis/netbert')
model = AutoModel.from_pretrained('antoinelouis/netbert')

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
```
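For the semantic-search use case mentioned above, you typically need one fixed-size vector per sentence rather than per-token features. A common recipe (a sketch, not part of the official NetBERT documentation) is to mean-pool the token embeddings, ignoring padding via the attention mask:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('antoinelouis/netbert')
model = AutoModel.from_pretrained('antoinelouis/netbert')

sentences = [
    "BGP is a path-vector routing protocol.",
    "OSPF is a link-state routing protocol.",
]
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

with torch.no_grad():
    output = model(**encoded)

# Mean-pool the token embeddings, masking out padding positions.
mask = encoded['attention_mask'].unsqueeze(-1).float()
embeddings = (output.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

# Cosine similarity between the two sentence vectors.
similarity = torch.nn.functional.cosine_similarity(
    embeddings[0], embeddings[1], dim=0
)
```

The resulting `embeddings` tensor has one 768-dimensional vector per input sentence, which can be indexed and compared for retrieval.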

## Documentation

Detailed documentation on the pre-trained model, its implementation, and the data can be found on [Github](https://github.com/antoiloui/netbert/blob/master/docs/index.md).

## Citation

For attribution in academic contexts, please cite this work as:

```
@mastersthesis{louis2020netbert,
    title={NetBERT: A Pre-trained Language Representation Model for Computer Networking},
    author={Louis, Antoine},
    year={2020},
    school={University of Liege}
}
```