---
language: en
datasets:
- Dizex/InstaFoodSet
widget:
- text: "Today's meal: Fresh olive poké bowl topped with chia seeds. Very delicious!"
  example_title: "Food example 1"
- text: "Tartufo Pasta with garlic flavoured butter and olive oil, egg yolk, parmigiano and pasta water."
  example_title: "Food example 2"
tags:
- Instagram
- NER
- Named Entity Recognition
- Food Entity Extraction
- Social Media
- Informal text
- RoBERTa
license: mit
---
# InstaFoodRoBERTa-NER
## Model description
**InstaFoodRoBERTa-NER** is a fine-tuned RoBERTa model that is ready to use for **Named Entity Recognition** of food entities in informal, social-media-style text (e.g. Instagram, X, Reddit). It has been trained to recognize a single entity type: food (FOOD).
Specifically, this model is a [*roberta-base*](https://huggingface.co/roberta-base) model that was fine-tuned on a dataset consisting of 400 English Instagram posts related to food. The [dataset](https://huggingface.co/datasets/Dizex/InstaFoodSet) is open source.
## Intended uses
#### How to use
You can use this model with the Transformers *pipeline* for NER.
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
tokenizer = AutoTokenizer.from_pretrained("Dizex/InstaFoodRoBERTa-NER")
model = AutoModelForTokenClassification.from_pretrained("Dizex/InstaFoodRoBERTa-NER")
pipe = pipeline("ner", model=model, tokenizer=tokenizer)
example = "Today's meal: Fresh olive poké bowl topped with chia seeds. Very delicious!"
ner_entity_results = pipe(example, aggregation_strategy="simple")
print(ner_entity_results)
```
To get the extracted food entities as strings you can use the following code:
```python
def convert_entities_to_list(text, entities: list[dict]) -> list[str]:
    ents = []
    for ent in entities:
        e = {"start": ent["start"], "end": ent["end"], "label": ent["entity_group"]}
        # Merge with the previous span when both share a label and are at most one character apart.
        if ents and -1 <= ent["start"] - ents[-1]["end"] <= 1 and ents[-1]["label"] == e["label"]:
            ents[-1]["end"] = e["end"]
            continue
        ents.append(e)
    # Slice the original text so the returned strings keep their original casing and accents.
    return [text[e["start"]:e["end"]] for e in ents]

print(convert_entities_to_list(example, ner_entity_results))
```
This will result in the following output:
```python
['olive poké bowl', 'chia seeds']
```
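To see how the merging rule behaves without downloading the model, here is a minimal sketch that runs the same adjacency logic on hand-written spans. The `span` helper, the `mock_entities` list, and the `merge_adjacent` name are hypothetical stand-ins introduced for illustration; the spans are not real model predictions, and only the `entity_group`, `start`, and `end` keys read by the function above are assumed.

```python
text = "Today's meal: Fresh olive poké bowl topped with chia seeds. Very delicious!"


def span(text: str, phrase: str) -> dict:
    """Hypothetical helper: build a FOOD span for the first occurrence of `phrase`."""
    start = text.index(phrase)
    return {"entity_group": "FOOD", "start": start, "end": start + len(phrase)}


# Hand-written stand-ins for pipeline output (not real model predictions).
mock_entities = [span(text, "olive"), span(text, "poké bowl"), span(text, "chia seeds")]


def merge_adjacent(text: str, entities: list[dict]) -> list[str]:
    """Join same-label spans that are at most one character apart."""
    merged = []
    for ent in entities:
        cur = {"start": ent["start"], "end": ent["end"], "label": ent["entity_group"]}
        gap = cur["start"] - merged[-1]["end"] if merged else None
        if merged and -1 <= gap <= 1 and merged[-1]["label"] == cur["label"]:
            merged[-1]["end"] = cur["end"]  # extend the previous span
            continue
        merged.append(cur)
    return [text[m["start"]:m["end"]] for m in merged]


foods = merge_adjacent(text, mock_entities)
print(foods)
```

In this sketch, "olive" and "poké bowl" are separated by a single space, so they are joined into one entity, while "chia seeds" starts much further along and stays separate.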
## Performance on [InstaFoodSet](https://huggingface.co/datasets/Dizex/InstaFoodSet)
Metric | Value
-|-
F1 | 0.91
Precision | 0.89
Recall | 0.93
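
As a quick consistency check on the table, the sketch below recomputes F1 from the reported precision and recall. It assumes the reported F1 is the usual harmonic mean of those two values; the card does not state how the metrics were averaged, and the inputs are already rounded to two decimals, so this is only an approximate check.

```python
# Values copied from the table above (already rounded to two decimals).
precision = 0.89
recall = 0.93

# Assumption: F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))
```

Under that assumption the recomputed value rounds to the 0.91 shown in the table.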