---
license: apache-2.0
language:
- en
metrics:
- f1
library_name: transformers
pipeline_tag: token-classification
tags:
- token classification
- information extraction
- NER
- relation extraction
- text cleaning
---

# UTC-DeBERTa-large - universal token classifier

🚀 Meet the second version of our prompt-tuned universal token classification model 🚀

This line of models can perform various information extraction tasks by analysing input prompts and recognizing the parts of a text that satisfy them. Compared with the first version, the second is more general: it can recognize not only entities but also whole sentences and even paragraphs.

To use the model, just specify a prompt, for example ***"Identify all positive aspects of the product mentioned by John:"***, and append your target text.

This model is based on `DeBERTaV3-large` and was trained on multiple token classification tasks, or tasks that can be represented in this way. Such *multi-task fine-tuning* enables better generalization: even small models can be used for zero-shot named entity recognition and demonstrate good performance on reading comprehension tasks.

The model can be used for the following tasks:
* Named entity recognition (NER);
* Open information extraction;
* Question answering;
* Relation extraction;
* Coreference resolution;
* Text cleaning;
* Summarization.

#### How to use

There are several ways to use this model; one of them is the `token-classification` pipeline from transformers:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline


def process(text, prompt, threshold=0.5):
    """
    Processes text by preparing the prompt and adjusting indices.
    Args:
        text (str): The text to process
        prompt (str): The prompt to prepend to the text
        threshold (float): Minimum score a prediction must reach to be kept

    Returns:
        list: A list of dicts with adjusted spans and scores
    """
    # Concatenate the prompt and the text into a single input
    input_ = f"{prompt}\n{text}"
    results = nlp(input_)  # run the token-classification pipeline on the full input

    processed_results = []
    prompt_length = len(prompt)

    for result in results:
        # Keep only predictions whose score exceeds the threshold
        if result['score'] < threshold:
            continue
        # Shift the span back to its position in the original text
        # (the extra 1 accounts for the "\n" separating prompt and text)
        start = result['start'] - prompt_length - 1
        end = result['end'] - prompt_length - 1
        processed_results.append(
            {'span': text[start:end], 'start': start, 'end': end, 'score': result['score']}
        )
    return processed_results
```
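The index arithmetic used above (subtracting the prompt length plus one for the newline separator) can be checked in isolation, without loading the model. The prompt, text, and reported span below are invented purely for illustration:

```python
prompt = "Identify person names:"
text = "Kyle visited Paris."

# The model sees the prompt and the text joined by a newline
input_ = f"{prompt}\n{text}"

# Suppose the pipeline reports a span starting here in the joined input
start_in_input = input_.find("Kyle")

# Mapping it back into the original text: subtract the prompt length
# plus one character for the "\n" separator
offset = len(prompt) + 1
start_in_text = start_in_input - offset

print(text[start_in_text:start_in_text + 4])  # prints: Kyle
```

Because spans returned by the pipeline are character offsets into the whole input, this shift is all that is needed to recover positions in the original text.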