
Named-Entity Recognition

Related spaces: https://huggingface.co/spaces/rajistics/biobert_ner_demo, https://huggingface.co/spaces/abidlabs/ner, https://huggingface.co/spaces/rajistics/Financial_Analyst_AI

Tags: NER, TEXT, HIGHLIGHT

Introduction

Named-entity recognition (NER), also known as token classification or text tagging, is the task of taking a sentence and classifying every word (or "token") into different categories, such as names of people or names of locations, or different parts of speech.

For example, given the sentence:

Does Chicago have any Pakistani restaurants?

A named-entity recognition algorithm may identify:

  • "Chicago" as a location
  • "Pakistani" as an ethnicity

and so on.

Using Gradio (specifically the HighlightedText component), you can easily build a web demo of your NER model and share it with the rest of your team.

Here is an example of a demo that you'll be able to build:

$demo_ner_pipeline

This tutorial shows how to take a pretrained NER model and deploy it with a Gradio interface. We will show two different ways to use the HighlightedText component. Depending on the output format of your NER model, one of the two may be easier to use.

Prerequisites

Make sure you have the gradio Python package installed. You will also need a pretrained named-entity recognition model. You can use your own; in this tutorial, we will use one from the transformers library.

Approach 1: List of Entity Dictionaries

Many named-entity recognition models output a list of dictionaries. Each dictionary consists of an entity, a "start" index, and an "end" index. This is, for example, how NER models in the transformers library operate:

from transformers import pipeline
ner_pipeline = pipeline("ner")
ner_pipeline("Does Chicago have any Pakistani restaurants")

Output:

[{'entity': 'I-LOC',
  'score': 0.9988978,
  'index': 2,
  'word': 'Chicago',
  'start': 5,
  'end': 12},
 {'entity': 'I-MISC',
  'score': 0.9958592,
  'index': 5,
  'word': 'Pakistani',
  'start': 22,
  'end': 31}]
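
Judging from the output above, "start" and "end" are character offsets into the input string, with "end" exclusive, so slicing the text with them recovers each entity. A quick check using the values shown (the variable names `text`, `entities`, and `spans` are only for illustration):

```python
# The sentence and pipeline output shown above, copied as literals.
text = "Does Chicago have any Pakistani restaurants"
entities = [
    {"entity": "I-LOC", "score": 0.9988978, "index": 2,
     "word": "Chicago", "start": 5, "end": 12},
    {"entity": "I-MISC", "score": 0.9958592, "index": 5,
     "word": "Pakistani", "start": 22, "end": 31},
]

# Slice the text with each entity's offsets to recover the tagged span.
spans = [(text[e["start"]:e["end"]], e["entity"]) for e in entities]
print(spans)  # [('Chicago', 'I-LOC'), ('Pakistani', 'I-MISC')]
```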

If you have such a model, it is very easy to hook it up to Gradio's HighlightedText component. All you need to do is pass this list of entities, along with the original text that was given to the model, together as a dictionary whose keys are "entities" and "text", respectively.

Here is a complete example:

$code_ner_pipeline $demo_ner_pipeline
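
As a rough illustration of the wiring described in this section, the sketch below packages the text and entity list into the dictionary shape and hands it to a HighlightedText output. It is a minimal sketch, not necessarily the demo's own code. The helper names `ner_to_highlighted` and `build_ner_demo` are hypothetical, and the default "ner" pipeline is assumed to be available for download.

```python
def ner_to_highlighted(text, entities):
    """Package the text and its entity dicts under "text" and "entities"."""
    return {"text": text, "entities": entities}


def build_ner_demo():
    """Wire a transformers NER pipeline into a Gradio interface.

    Imports are done lazily so the helper above can be used
    without gradio or transformers installed.
    """
    import gradio as gr
    from transformers import pipeline

    ner_pipeline = pipeline("ner")

    def ner(text):
        # Run the model, then return the dictionary HighlightedText accepts.
        return ner_to_highlighted(text, ner_pipeline(text))

    return gr.Interface(
        fn=ner,
        inputs=gr.Textbox(placeholder="Enter sentence here..."),
        outputs=gr.HighlightedText(),
    )


# To try it locally (downloads the default NER model on first use):
# build_ner_demo().launch()
```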

Approach 2: List of Tuples

An alternative way to pass data into the HighlightedText component is a list of tuples. The first element of each tuple should be the word or words that are being classified into a particular entity. The second element should be the entity label (or None if they should be unlabeled). The HighlightedText component automatically strings together the words and labels to display the entities.
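
To make the tuple format concrete, here is a small sketch that converts the entity dictionaries from Approach 1 into a list of tuples. The function name `entities_to_tuples` is hypothetical, and the sketch assumes the entities do not overlap and carry character offsets in "start" and "end".

```python
def entities_to_tuples(text, entities):
    """Convert entity dicts with "start"/"end" offsets into (span, label) tuples."""
    tuples = []
    cursor = 0  # position in text up to which spans have been emitted
    for ent in sorted(entities, key=lambda e: e["start"]):
        if ent["start"] > cursor:
            # Unlabeled text between the previous entity and this one.
            tuples.append((text[cursor:ent["start"]], None))
        tuples.append((text[ent["start"]:ent["end"]], ent["entity"]))
        cursor = ent["end"]
    if cursor < len(text):
        # Trailing unlabeled text after the last entity.
        tuples.append((text[cursor:], None))
    return tuples


text = "Does Chicago have any Pakistani restaurants"
entities = [
    {"entity": "I-LOC", "start": 5, "end": 12},
    {"entity": "I-MISC", "start": 22, "end": 31},
]
print(entities_to_tuples(text, entities))
# [('Does ', None), ('Chicago', 'I-LOC'), (' have any ', None),
#  ('Pakistani', 'I-MISC'), (' restaurants', None)]
```

Joining the first elements of the tuples reproduces the original sentence, which is a handy sanity check when building such a list by hand.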

In some cases, this can be easier than the first approach. Here is a demo showing this approach using spaCy's part-of-speech tagger:

$code_text_analysis $demo_text_analysis
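
As a hedged sketch of the tuple approach with spaCy, the code below turns each token into a (text, part-of-speech) tuple and keeps the whitespace between tokens as unlabeled tuples. The names `tokens_to_tuples` and `build_pos_demo` are hypothetical, and the sketch assumes the `en_core_web_sm` model has already been downloaded. It may differ from the demo's own code.

```python
def tokens_to_tuples(tokens):
    """Turn spaCy-style tokens into (text, label) tuples.

    Each token needs `text`, `pos_`, and `whitespace_` attributes,
    which spaCy tokens provide.
    """
    pairs = []
    for token in tokens:
        pairs.append((token.text, token.pos_))
        if token.whitespace_:
            # Keep the original spacing as an unlabeled span.
            pairs.append((token.whitespace_, None))
    return pairs


def build_pos_demo():
    """Build a Gradio interface that highlights parts of speech."""
    import gradio as gr
    import spacy

    # Assumption: installed via `python -m spacy download en_core_web_sm`.
    nlp = spacy.load("en_core_web_sm")

    def text_analysis(text):
        return tokens_to_tuples(nlp(text))

    return gr.Interface(
        fn=text_analysis,
        inputs=gr.Textbox(placeholder="Enter sentence here..."),
        outputs=gr.HighlightedText(),
    )


# To try it locally:
# build_pos_demo().launch()
```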


And you're done! That's all you need to know to build a web-based GUI for your NER model.

Fun tip: you can share your NER demo instantly with others simply by setting share=True in launch().