Inference Providers documentation
Token Classification
Token classification is a task in which a label is assigned to some tokens in a text. Some popular token classification subtasks are Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging.
For more details about the token-classification task, check out its dedicated page! You will find examples and related materials.
Recommended models
- dslim/bert-base-NER: A robust model that identifies people, locations, organizations, and miscellaneous named entities.
- FacebookAI/xlm-roberta-large-finetuned-conll03-english: A strong model to identify people, locations, organizations and names in multiple languages.
- blaze999/Medical-NER: A token classification model specialized in medical entity recognition.
Explore all available models and find the one that suits you best here.
Using the API
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="hf-inference",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",  # replace with your own access token
)

# The input text is passed positionally as the first argument.
result = client.token_classification(
    "My name is Sarah Jessica Parker but you can call me Jessica",
    model="dslim/bert-base-NER",
)
API specification
Request
Headers | Type | Description |
---|---|---|
authorization | string | Authentication header in the form 'Bearer hf_****', where hf_**** is a personal user access token with “Inference Providers” permission. You can generate one from your settings page. |
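As a rough illustration of the header described above, the sketch below builds (but does not send) a raw HTTP request using only the Python standard library. The URL pattern in `API_URL` and the helper name `build_request` are assumptions made for this example and are not taken from this page; check the current routing documentation before relying on them.

```python
import json
import urllib.request

# Assumption: router URL pattern for the hf-inference provider (not stated on this page).
API_URL = "https://router.huggingface.co/hf-inference/models/dslim/bert-base-NER"


def build_request(text, token, parameters=None):
    """Hypothetical helper: assemble a POST request with the Bearer header."""
    payload = {"inputs": text}
    if parameters:
        # Optional settings travel under the "parameters" key of the JSON body.
        payload["parameters"] = parameters
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request(
    "My name is Sarah Jessica Parker but you can call me Jessica",
    "hf_xxxxxxxxxxxxxxxxxxxxxxxx",  # placeholder token
    parameters={"aggregation_strategy": "simple"},
)
```

To actually send it, something like `urllib.request.urlopen(request)` followed by `json.load(...)` on the response should work, provided the token is valid and the URL assumption holds.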
Payload | Type | Description |
---|---|---|
inputs* | string | The input text data |
parameters | object | Optional settings; the fields below are nested inside this object. |
parameters.ignore_labels | string[] | A list of labels to ignore |
parameters.stride | integer | The number of overlapping tokens between chunks when splitting the input text. |
parameters.aggregation_strategy | string | One of the following: |
(#1) | 'none' | Do not aggregate tokens |
(#2) | 'simple' | Group consecutive tokens with the same label in a single entity. |
(#3) | 'first' | Similar to “simple”, also preserves word integrity (uses the label predicted for the first token in a word). |
(#4) | 'average' | Similar to “simple”, also preserves word integrity (uses the label with the highest score, averaged across the word’s tokens). |
(#5) | 'max' | Similar to “simple”, also preserves word integrity (uses the label with the highest score across the word’s tokens). |
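To make the aggregation strategies more concrete, here is a simplified sketch of how they could behave. It is not the server's implementation: the function names `resolve_word_label` and `group_simple` are hypothetical, the scores are invented, and details such as B-/I- tag prefixes and group scores are deliberately left out.

```python
from collections import defaultdict


def resolve_word_label(token_scores, strategy):
    """Pick one (label, score) for a word from its sub-token score dicts."""
    if strategy == "first":
        # Use the best label of the first sub-token only.
        scores = token_scores[0]
        label = max(scores, key=scores.get)
        return label, scores[label]
    if strategy == "average":
        # Average each label's score over the sub-tokens, then take the best.
        totals = defaultdict(float)
        for scores in token_scores:
            for label, score in scores.items():
                totals[label] += score
        label = max(totals, key=totals.get)
        return label, totals[label] / len(token_scores)
    if strategy == "max":
        # Use the sub-token holding the single highest score.
        best = max(token_scores, key=lambda scores: max(scores.values()))
        label = max(best, key=best.get)
        return label, best[label]
    raise ValueError(f"unsupported strategy: {strategy}")


def group_simple(text, words, ignore_labels=("O",)):
    """Merge consecutive words that share a label, then drop ignored labels."""
    groups = []
    for word in words:
        if groups and groups[-1]["entity_group"] == word["label"]:
            groups[-1]["end"] = word["end"]  # extend the current group
        else:
            groups.append({"entity_group": word["label"],
                           "start": word["start"], "end": word["end"]})
    return [{**g, "word": text[g["start"]:g["end"]]}
            for g in groups if g["entity_group"] not in ignore_labels]


# One word split into two sub-tokens, with made-up per-label scores.
sub_tokens = [{"PER": 0.75, "O": 0.25}, {"PER": 0.125, "O": 0.875}]
first = resolve_word_label(sub_tokens, "first")      # ("PER", 0.75)
average = resolve_word_label(sub_tokens, "average")  # ("O", 0.5625)
maximum = resolve_word_label(sub_tokens, "max")      # ("O", 0.875)

text = "Sarah Jessica Parker lives in Paris"
words = [
    {"label": "PER", "start": 0, "end": 5},
    {"label": "PER", "start": 6, "end": 13},
    {"label": "PER", "start": 14, "end": 20},
    {"label": "O", "start": 21, "end": 26},
    {"label": "O", "start": 27, "end": 29},
    {"label": "LOC", "start": 30, "end": 35},
]
groups = group_simple(text, words)
```

In this toy input the three word-level strategies disagree on the same word, which is the practical reason to choose one deliberately; `groups` ends up with one `PER` span and one `LOC` span once the `O` words are ignored.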
Response
Body | Type | Description |
---|---|---|
(array) | object[] | Output is an array of objects. |
entity_group | string | The predicted label for a group of one or more tokens |
entity | string | The predicted label for a single token |
score | number | The associated score / probability |
word | string | The corresponding text |
start | integer | The character position in the input where this group begins. |
end | integer | The character position in the input where this group ends. |
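The response fields above might be consumed along these lines. The sample response is hand-written for illustration (labels and scores are invented, not real model output), the helper name `entities_by_label` is hypothetical, and the sketch assumes `end` is exclusive, as in a Python slice.

```python
from collections import defaultdict

text = "My name is Sarah Jessica Parker but you can call me Jessica"

# Hand-written sample shaped like the response body; values are illustrative only.
sample_response = [
    {"entity_group": "PER", "score": 0.99, "word": "Sarah Jessica Parker",
     "start": 11, "end": 31},
    {"entity_group": "PER", "score": 0.98, "word": "Jessica",
     "start": 52, "end": 59},
]


def entities_by_label(text, entities, min_score=0.0):
    """Hypothetical helper: map each label to the text spans it covers."""
    by_label = defaultdict(list)
    for entity in entities:
        if entity["score"] < min_score:
            continue  # skip low-confidence predictions
        # Assumption: `end` is exclusive, so a plain slice recovers the span.
        span = text[entity["start"]:entity["end"]]
        # Fall back to `entity` for responses that are not aggregated.
        label = entity.get("entity_group") or entity.get("entity")
        by_label[label].append(span)
    return dict(by_label)


by_label = entities_by_label(text, sample_response)
```

Slicing the original input by `start` and `end` rather than trusting `word` can be useful because the returned text may be normalized by the tokenizer.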