File size: 2,486 Bytes
7fdb2b0
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
29d47a7
edfd351
d95fa1b
7fdb2b0
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
3b8448c
7fdb2b0
 
 
3b8448c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
7fdb2b0
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
---
language: en
datasets:
- Dizex/InstaFoodSet
widget:
- text: "Today's meal: Fresh olive poké bowl topped with chia seeds. Very delicious!"
  example_title: "Food example 1"
- text: "Tartufo Pasta with garlic flavoured butter and olive oil, egg yolk, parmigiano and pasta water."
  example_title: "Food example 2"  
tags:
- Instagram
- NER
- Named Entity Recognition
- Food Entity Extraction
- Social Media
- Informal text
- RoBERTa
license: mit
---
# InstaFoodRoBERTa-NER

## Model description

**InstaFoodRoBERTa-NER** is a fine-tuned BERT model that is ready to use for **Named Entity Recognition** of Food entities on social media like informal text (e.g. Instagram, X, Reddit). It has been trained to recognize a single entity: food (FOOD).

Specifically, this model is a [*roberta-base*](https://huggingface.co/roberta-base) model that was fine-tuned on a dataset consisting of 400 English Instagram posts related to food. The [dataset](https://huggingface.co/datasets/Dizex/InstaFoodSet) is open source.


## Intended uses

#### How to use

You can use this model with Transformers *pipeline* for NER.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("Dizex/InstaFoodRoBERTa-NER")
model = AutoModelForTokenClassification.from_pretrained("Dizex/InstaFoodRoBERTa-NER")

pipe = pipeline("ner", model=model, tokenizer=tokenizer)
example = "Today's meal: Fresh olive poké bowl topped with chia seeds. Very delicious!"

ner_entity_results = pipe(example, aggregation_strategy="simple")
print(ner_entity_results)
```

To get the extracted food entities as strings you can use the following code:

```python
def convert_entities_to_list(text, entities: list[dict]) -> list[str]:
        ents = []
        for ent in entities:
            e = {"start": ent["start"], "end": ent["end"], "label": ent["entity_group"]}
            if ents and -1 <= ent["start"] - ents[-1]["end"] <= 1 and ents[-1]["label"] == e["label"]:
                ents[-1]["end"] = e["end"]
                continue
            ents.append(e)

        return [text[e["start"]:e["end"]] for e in ents]

print(convert_entities_to_list(example, ner_entity_results))
```

This will result in the following output:

```python
['olive poké bowl', 'chia seeds']
```



## Performance on [InstaFoodSet](https://huggingface.co/datasets/Dizex/InstaFoodSet)
metric|val
-|-
f1 |0.91
precision |0.89
recall |0.93