Token Classification

Token classification is a task in which a label is assigned to some tokens in a text. Some popular token classification subtasks are Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging.

For more details about the token-classification task, check out its dedicated page, which includes examples and related materials.

Recommended models

dslim/bert-base-NER: A robust model for identifying people, locations, organizations, and miscellaneous named entities.
FacebookAI/xlm-roberta-large-finetuned-conll03-english: A strong model for identifying people, locations, organizations, and names in multiple languages.
blaze999/Medical-NER: A token classification model specialized in medical entity recognition.

Explore all available token classification models on the Hugging Face Hub to find the one that suits you best.

Using the API

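import os

from huggingface_hub import InferenceClient

# Assumes HF_TOKEN holds a user access token with "Inference Providers" permission.
client = InferenceClient(
    provider="hf-inference",
    api_key=os.environ["HF_TOKEN"],
)

# The input text is passed as the first positional argument; model is a Hub model ID.
result = client.token_classification(
    "My name is Sarah Jessica Parker but you can call me Jessica",
    model="dslim/bert-base-NER",
)
print(result)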
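If you would rather not depend on `huggingface_hub`, the request described in the API specification can be assembled by hand. The sketch below uses only the Python standard library and builds, without sending, a POST request with an `inputs`/`parameters` JSON payload and a Bearer `Authorization` header. The router URL pattern and the `build_request` helper are assumptions made for illustration; check the provider documentation for the exact endpoint before relying on it.

```python
import json
import os
import urllib.request

# Assumption: the hf-inference provider is reachable through this router URL pattern.
ROUTER_URL = "https://router.huggingface.co/hf-inference/models/{model}"


def build_request(text, model="dslim/bert-base-NER", token=None, **parameters):
    """Build (but do not send) a token-classification POST request (hypothetical helper)."""
    token = token or os.environ.get("HF_TOKEN", "")
    payload = {"inputs": text}
    if parameters:
        # Optional fields such as aggregation_strategy, ignore_labels, stride.
        payload["parameters"] = parameters
    return urllib.request.Request(
        ROUTER_URL.format(model=model),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


TEXT = "My name is Sarah Jessica Parker but you can call me Jessica"
req = build_request(TEXT, token="hf_xxxxxxxxxxxxxxxxxxxxxxxx", aggregation_strategy="simple")

# To actually send it (requires network access and a valid token):
# with urllib.request.urlopen(req) as resp:
#     entities = json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.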

API specification

Request

Headers
authorization	string	Authentication header in the form `'Bearer hf_**'`, where `hf_**` is a personal user access token with the “Inference Providers” permission. You can generate one from your settings page.

Payload
inputs*	string	The input text data (* marks a required field)
parameters	object	Additional inference parameters:
  ignore_labels	string[]	A list of labels to ignore
  stride	integer	The number of overlapping tokens between chunks when splitting the input text.
  aggregation_strategy	string	One of the following:
    "none"	Do not aggregate tokens.
    "simple"	Group consecutive tokens with the same label into a single entity.
    "first"	Similar to "simple", but also preserves word integrity (uses the label predicted for the first token in a word).
    "average"	Similar to "simple", but also preserves word integrity (uses the label with the highest score, averaged across the word's tokens).
    "max"	Similar to "simple", but also preserves word integrity (uses the label with the highest score across the word's tokens).
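To make the "simple" strategy concrete, here is a simplified sketch of grouping consecutive BIO-tagged tokens into entities. It is not the library's implementation: the `Token` class, the `split_tag` and `simple_aggregate` helpers, and the per-token labels and scores are hypothetical. It averages token scores within each group and takes `word` from the input text via character offsets instead of decoding the tokens.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Token:
    # Hypothetical per-token prediction, similar to aggregation_strategy="none".
    word: str
    label: str   # BIO label such as "B-PER", "I-PER", or "O"
    score: float
    start: int   # character offset where the token begins
    end: int     # character offset where the token ends (exclusive)


def split_tag(label: str) -> tuple[str, str]:
    """Split "B-PER" into ("B", "PER"); labels without a prefix count as "I"."""
    if label.startswith(("B-", "I-")):
        return label[0], label[2:]
    return "I", label


def simple_aggregate(text: str, tokens: list[Token], ignore_labels=("O",)) -> list[dict]:
    """Group consecutive tokens that share an entity type into one entity."""
    groups: list[list[Token]] = []
    for tok in tokens:
        prefix, tag = split_tag(tok.label)
        # Continue the current group unless the type changes or "B-" starts a new entity.
        if groups and prefix != "B" and split_tag(groups[-1][-1].label)[1] == tag:
            groups[-1].append(tok)
        else:
            groups.append([tok])

    entities = []
    for group in groups:
        tag = split_tag(group[0].label)[1]
        if tag in ignore_labels:
            continue
        start, end = group[0].start, group[-1].end
        entities.append({
            "entity_group": tag,
            "score": mean(t.score for t in group),  # average of the token scores
            "word": text[start:end],                # recovered from the input via offsets
            "start": start,
            "end": end,
        })
    return entities


# Illustrative tokens and scores, not real model output.
TEXT = "My name is Sarah Jessica Parker but you can call me Jessica"
TOKENS = [
    Token("Sarah", "B-PER", 0.9, 11, 16),
    Token("Jessica", "I-PER", 0.8, 17, 24),
    Token("Parker", "I-PER", 0.7, 25, 31),
    Token("but", "O", 0.99, 32, 35),
    Token("Jessica", "B-PER", 0.95, 52, 59),
]

for entity in simple_aggregate(TEXT, TOKENS):
    print(entity["entity_group"], entity["word"], entity["start"], entity["end"])
```

The "B-" prefix is what keeps the second "Jessica" separate from the full name; the "first", "average", and "max" strategies additionally resolve label disagreements between sub-word tokens of one word, which this sketch does not attempt.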

Response

Body
(array)	object[]	Output is an array of objects, each with the following fields:
  entity_group	string	The predicted label for a group of one or more tokens (returned when tokens are aggregated)
  entity	string	The predicted label for a single token (returned when aggregation_strategy is "none")
  score	number	The associated score / probability
  word	string	The corresponding text
  start	integer	The character position in the input where this group begins.
  end	integer	The character position in the input where this group ends.
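A minimal sketch of consuming the response: it reads `entity_group` when present and falls back to `entity`, filters by score, and recovers each span from the input using `start` and `end`. It assumes `end` is exclusive (Python slicing semantics). The `extract_entities` helper and the sample response values are made up for illustration, not real model output.

```python
# Illustrative response for aggregation_strategy="simple"; values are made up.
TEXT = "My name is Sarah Jessica Parker but you can call me Jessica"
SAMPLE_RESPONSE = [
    {"entity_group": "PER", "score": 0.99, "word": "Sarah Jessica Parker", "start": 11, "end": 31},
    {"entity_group": "PER", "score": 0.98, "word": "Jessica", "start": 52, "end": 59},
    {"entity_group": "MISC", "score": 0.3, "word": "call", "start": 44, "end": 48},
]


def extract_entities(text, response, min_score=0.0):
    """Return (label, span text, score) tuples for predictions at or above min_score."""
    entities = []
    for item in response:
        if item["score"] < min_score:
            continue
        # Aggregated results carry "entity_group"; aggregation_strategy="none"
        # returns a per-token "entity" label instead.
        label = item.get("entity_group") or item.get("entity")
        # Assumes end is exclusive, so text[start:end] is the predicted span.
        entities.append((label, text[item["start"]:item["end"]], item["score"]))
    return entities


print(extract_entities(TEXT, SAMPLE_RESPONSE, min_score=0.5))
# [('PER', 'Sarah Jessica Parker', 0.99), ('PER', 'Jessica', 0.98)]
```

Slicing the original input rather than reading `word` avoids sub-word artifacts (such as "##" prefixes) that can appear in per-token output.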