<!-- livebook:{"app_settings":{"access_type":"public","slug":"examples"}} -->

# examples

```elixir
Mix.install(
  [
    {:kino_bumblebee, "~> 0.2.1"},
    {:exla, "~> 0.5.1"}
  ],
  config: [nx: [default_backend: EXLA.Backend]]
)
```

## Introduction

In this notebook we go through a number of examples to get a quick overview of what Bumblebee brings to the table.

<!-- livebook:{"branch_parent_index":0} -->

## Image classification

Let's start with image classification. First, we load a pre-trained [ResNet-50](https://huggingface.co/microsoft/resnet-50) model from a Hugging Face repository. We also load the corresponding featurizer for preprocessing input images.

```elixir
{:ok, resnet} = Bumblebee.load_model({:hf, "microsoft/resnet-50"})
{:ok, featurizer} = Bumblebee.load_featurizer({:hf, "microsoft/resnet-50"})

:ok
```

Next, we use the high-level API to build an end-to-end task definition on top of the model we just loaded. We will also need an image to work with, so let's show an image input.

```elixir
serving = Bumblebee.Vision.image_classification(resnet, featurizer)

image_input = Kino.Input.image("Image", size: {224, 224})
```

Bumblebee implements end-to-end tasks using `Nx.Serving`. With a serving we can either do a one-off run or start a supervised process that automatically batches multiple inference requests. Thanks to this abstraction we can experiment quickly and then plug the task into a production app with minimal effort.
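
As a sketch of the supervised setup (the `MyApp.Serving` name and the batch options here are illustrative, not part of this notebook):

```elixir
# A minimal sketch of running the serving under a supervision tree instead.
# The process name and batch options are illustrative assumptions.
children = [
  {Nx.Serving,
   serving: Bumblebee.Vision.image_classification(resnet, featurizer),
   name: MyApp.Serving,
   batch_size: 8,
   batch_timeout: 100}
]

# Once the supervisor starts this child, requests from anywhere in the
# app go through the process and are batched automatically:
# Nx.Serving.batched_run(MyApp.Serving, image)
```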

In this case we will do the one-off run for the selected image:

```elixir
image = Kino.Input.read(image_input)

# Build a tensor from the raw pixel data
image =
  image.data
  |> Nx.from_binary(:u8)
  |> Nx.reshape({image.height, image.width, 3})

Nx.Serving.run(serving, image)
```

### Manual inference

<!-- livebook:{"break_markdown":true} -->

Note that we are dealing with regular `Axon` models and the high-level API is just a convenience. If you need full control over the inference flow, you can do it manually. In this case, we would pass the image through the featurizer to get normalized model inputs, then we would run the model and finally extract the most probable label.

```elixir
inputs = Bumblebee.apply_featurizer(featurizer, image)
outputs = Axon.predict(resnet.model, resnet.params, inputs)

id = outputs.logits |> Nx.argmax() |> Nx.to_number()
resnet.spec.id_to_label[id]
```
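
The same manual flow extends beyond the single best label. Here is a sketch of the top 5 predictions using `Nx.top_k` and `Axon.Activations.softmax` (plain Nx/Axon calls, not specific to Bumblebee):

```elixir
# Sketch: top-5 labels with probabilities from the logits above
probabilities = Axon.Activations.softmax(outputs.logits)
{top_probs, top_ids} = Nx.top_k(probabilities, k: 5)

Enum.zip(Nx.to_flat_list(top_ids), Nx.to_flat_list(top_probs))
|> Enum.map(fn {id, prob} -> {resnet.spec.id_to_label[id], Float.round(prob, 3)} end)
```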

You can try a number of other models; just replace the repository id with one of these (one swap is sketched below the list):

* [**facebook/convnext-tiny-224**](https://huggingface.co/facebook/convnext-tiny-224) (ConvNeXT)

* [**google/vit-base-patch16-224**](https://huggingface.co/google/vit-base-patch16-224) (ViT)

* [**facebook/deit-base-distilled-patch16-224**](https://huggingface.co/facebook/deit-base-distilled-patch16-224) (DeiT)
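
For instance, swapping in ViT only changes the repository id; everything else stays the same:

```elixir
# Same pipeline as before, only the repository id differs
{:ok, vit} = Bumblebee.load_model({:hf, "google/vit-base-patch16-224"})
{:ok, vit_featurizer} = Bumblebee.load_featurizer({:hf, "google/vit-base-patch16-224"})

serving = Bumblebee.Vision.image_classification(vit, vit_featurizer)
Nx.Serving.run(serving, image)
```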

<!-- livebook:{"branch_parent_index":0} -->

## Fill-mask

Now it's time for some text processing. Specifically, we want to fill in the missing word in a sentence. This time we load the [BERT](https://huggingface.co/bert-base-uncased) model together with a matching tokenizer. We will use the tokenizer to preprocess our text input.

```elixir
{:ok, bert} = Bumblebee.load_model({:hf, "bert-base-uncased"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "bert-base-uncased"})

serving = Bumblebee.Text.fill_mask(bert, tokenizer)

text_input = Kino.Input.text("Sentence with mask", default: "The capital of [MASK] is Paris.")
```

```elixir
text = Kino.Input.read(text_input)

Nx.Serving.run(serving, text)
```

Again, you can try other models, such as [**albert-base-v2**](https://huggingface.co/albert-base-v2) or [**roberta-base**](https://huggingface.co/roberta-base).

<!-- livebook:{"branch_parent_index":0} -->

## Text classification

In this example we will analyze text sentiment.

We will use the [BERTweet](https://huggingface.co/finiteautomata/bertweet-base-sentiment-analysis) model, trained to classify text into one of three categories: positive (POS), negative (NEG) or neutral (NEU).

```elixir
{:ok, bertweet} = Bumblebee.load_model({:hf, "finiteautomata/bertweet-base-sentiment-analysis"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "vinai/bertweet-base"})

serving = Bumblebee.Text.text_classification(bertweet, tokenizer)

text_input = Kino.Input.text("Text", default: "This cat is so cute.")
```

*Note: this time we need to load a matching tokenizer from a different repository.*

```elixir
text = Kino.Input.read(text_input)
Nx.Serving.run(serving, text)
```

<!-- livebook:{"branch_parent_index":0} -->

## Named-entity recognition

In this section we look at token classification, more specifically named-entity recognition (NER), where the objective is to identify and categorize entities in text. We will once again use a fine-tuned [BERT](https://huggingface.co/dslim/bert-base-NER) model.

```elixir
{:ok, bert} = Bumblebee.load_model({:hf, "dslim/bert-base-NER"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "bert-base-cased"})

# :same groups consecutive tokens with the same label into a single entity
serving = Bumblebee.Text.token_classification(bert, tokenizer, aggregation: :same)

text_input =
  Kino.Input.text("Text",
    default: "Rachel Green works at Ralph Lauren in New York City in the sitcom Friends"
  )
```

```elixir
text = Kino.Input.read(text_input)
Nx.Serving.run(serving, text)
```

<!-- livebook:{"branch_parent_index":0} -->

## Text generation

Generation is where things get even more exciting. In this section we will use the extremely popular [GPT-2](https://huggingface.co/gpt2) model to generate a text continuation.

Generation is generally an iterative process, where the model predicts the output token by token, adhering to some constraints. Again, we will make use of a higher-level API based on `Nx.Serving`.

```elixir
{:ok, gpt2} = Bumblebee.load_model({:hf, "gpt2"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "gpt2"})
{:ok, generation_config} = Bumblebee.load_generation_config({:hf, "gpt2"})

serving = Bumblebee.Text.generation(gpt2, tokenizer, generation_config)

text_input = Kino.Input.text("Text", default: "Yesterday, I was reading a book and")
```

```elixir
text = Kino.Input.read(text_input)
Nx.Serving.run(serving, text)
```
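
Generation options live on the generation config. As a sketch, capping the output length might look like this (`Bumblebee.configure/2` and the `:max_new_tokens` option are taken from the Bumblebee docs; exact availability depends on the version pinned above):

```elixir
# Assumed API: Bumblebee.configure/2 with :max_new_tokens, per Bumblebee docs
generation_config = Bumblebee.configure(generation_config, max_new_tokens: 20)

serving = Bumblebee.Text.generation(gpt2, tokenizer, generation_config)
Nx.Serving.run(serving, text)
```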

There are also [gpt2-medium](https://huggingface.co/gpt2-medium) and [gpt2-large](https://huggingface.co/gpt2-large), heavier versions of the model with many more parameters.

<!-- livebook:{"branch_parent_index":0} -->

## Question answering

Another text-related task is question answering, where the objective is to retrieve the answer to a question based on a given text. We will work with a [RoBERTa](https://huggingface.co/deepset/roberta-base-squad2) model trained to do just that.

```elixir
{:ok, roberta} = Bumblebee.load_model({:hf, "deepset/roberta-base-squad2"})
{:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "roberta-base"})

serving = Bumblebee.Text.question_answering(roberta, tokenizer)

question_input =
  Kino.Input.text("Question",
    default: "Which name is also used to describe the Amazon rainforest in English?"
  )

context_input =
  Kino.Input.textarea("Context",
    default:
      ~s/The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain "Amazonas" in their names. The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species./
  )

Kino.Layout.grid([question_input, context_input])
```

```elixir
question = Kino.Input.read(question_input)
context = Kino.Input.read(context_input)
Nx.Serving.run(serving, %{question: question, context: context})
```

## Final notes

The examples we covered should give you a good idea of what Bumblebee is about. We are excited about enabling easy access to powerful pre-trained deep learning models in Elixir. We are actively working on adding more models and high-level APIs, so stay tuned 🚀