DrugRecon-1.0
Introduction
This model was introduced in the article: What’s new on the market? Combining internet traces and pretrained language models to recognize emerging drug names. DrugRecon-1.0 is a RoBERTa-large model fine-tuned to recognize drug names mentioned in posts and comments from drug-related online forums.
Its intended use is to help researchers and public health agencies to monitor trends in online discussions of psychoactive substances, and to uncover new names for potentially emerging novel psychoactive substances (NPS). The model is meant to be used alongside human expert validation to interpret ambiguous terms, investigate findings, and manage model errors.
Training Data
The model was fine-tuned on a private, manually annotated corpus collected from specific drug-related sections of two online forums: Drugs-Forum and Dread.
- The training and validation sets comprise a total of 12,731 messages, containing 20,613 manually annotated drug names.
- The standard BIO2 scheme was employed to label tokens.
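As an illustration of the BIO2 scheme, the sketch below tags whitespace-separated tokens from character-level entity spans. The example sentence, the label names `B-DRUG` and `I-DRUG`, the whitespace tokenization, and the helper `bio2_tags` are assumptions made for illustration; the actual corpus was processed with the model's subword tokenizer, and its exact label set is not described here.

```python
import re

def bio2_tags(text, spans, label="DRUG"):
    """Return (token, tag) pairs for whitespace tokens under the BIO2 scheme.

    spans: list of (start, end) character offsets of annotated entities.
    """
    tagged = []
    for match in re.finditer(r"\S+", text):
        tag = "O"  # default: token is outside any entity
        for start, end in spans:
            # A token belongs to an entity if it overlaps the entity span.
            if match.start() < end and match.end() > start:
                # The first token of an entity gets B-, later tokens get I-.
                tag = f"B-{label}" if match.start() <= start else f"I-{label}"
                break
        tagged.append((match.group(), tag))
    return tagged

# Made-up example message with two annotated names.
text = "sold as crystal meth but it was 3-MMC"
spans = [(8, 20), (32, 37)]  # "crystal meth", "3-MMC"
for token, tag in bio2_tags(text, spans):
    print(f"{token}\t{tag}")
```

Under BIO2, every entity starts with a `B-` tag, even when it directly follows another entity, which keeps adjacent names separable.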
Evaluation Results
The model was evaluated on its ability to recognize both 'Known' and 'Unseen' drug names. The 'Unseen' scenario was simulated by replacing existing entities in Evaluation Set 1 with drug names not encountered during fine-tuning.
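A minimal sketch of how such an 'Unseen' variant of a message could be built, assuming annotations are stored as character offsets. The function name `replace_entities`, the annotation format, the example message, and the placeholder replacement names are all hypothetical; the substitution procedure used in the article may differ.

```python
import random

def replace_entities(text, spans, unseen_names, seed=0):
    """Swap each annotated span for a name drawn from unseen_names.

    Returns the rewritten text and the shifted (start, end) offsets.
    """
    rng = random.Random(seed)  # fixed seed keeps the substitution reproducible
    pieces, new_spans = [], []
    cursor = 0  # position in the original text
    length = 0  # length of the rebuilt text so far
    for start, end in sorted(spans):
        before = text[cursor:start]
        name = rng.choice(unseen_names)
        pieces += [before, name]
        length += len(before)
        # Record where the replacement lands in the new text.
        new_spans.append((length, length + len(name)))
        length += len(name)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), new_spans

text = "Anyone tried mixing ketamine with MDMA?"
spans = [(20, 28), (34, 38)]  # "ketamine", "MDMA"
new_text, new_spans = replace_entities(text, spans, ["placeholder-A", "placeholder-B"])
print(new_text)
print([new_text[s:e] for s, e in new_spans])
```

Because the surrounding context is left untouched, a model that still finds the replaced names is relying on context rather than on memorized vocabulary.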
The model was benchmarked on two private, manually annotated test sets:
- Evaluation Set 1: This set contains 2,249 messages (3,576 manual annotations) sourced from the same platforms as training (Drugs-Forum and Dread).
- Evaluation Set 2: This set contains 2,000 messages (5,492 manual annotations) sourced from Reddit.
| Evaluation Set | Scenario | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| Evaluation Set 1 | Known | 85.7 | 89.6 | 87.6 |
| Evaluation Set 1 | Unseen | 86.1 | 89.6 | 87.8 |
| Evaluation Set 2 | Known | 93.8 | 94.2 | 94.0 |
| Evaluation Set 2 | Unseen | 92.9 | 92.9 | 92.9 |
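The F1-scores in the table are consistent with the harmonic mean of the reported precision and recall, which can be checked directly. The helper `f1_score` below is an illustrative assumption, not code from the article, and it says nothing about the matching criterion used to count correct predictions.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from the table above.
reported = {
    ("Evaluation Set 1", "Known"): (85.7, 89.6, 87.6),
    ("Evaluation Set 1", "Unseen"): (86.1, 89.6, 87.8),
    ("Evaluation Set 2", "Known"): (93.8, 94.2, 94.0),
    ("Evaluation Set 2", "Unseen"): (92.9, 92.9, 92.9),
}

for (dataset, scenario), (p, r, f1) in reported.items():
    # Print the recomputed value next to the reported one.
    print(dataset, scenario, round(f1_score(p, r), 1), f1)
```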
Limitations and Biases
- Language Constraint: The model was fine-tuned exclusively on English-language messages. Applying it to non-English content or heavily multilingual environments may result in significant performance degradation.
- Contextual Shifts: The model may struggle on platforms where the context of messages differs significantly from the online forums on which it was fine-tuned (e.g., SMS-style messages or ad titles).
- False Positives: Because the model relies heavily on contextual cues, terms that are not specific drug names but are used in very similar contexts can be incorrectly tagged. For example, the model may mistakenly tag general drug classes (e.g., benzos), chemical precursors, or non-targeted compounds (e.g., dietary supplements).
- Entity Boundary Errors: The model occasionally struggles to identify the exact boundaries of certain drug names. This can result in truncated or fragmented predictions.
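One possible way to lighten expert review in the presence of false positives is to pre-filter predictions. The sketch below assumes predictions shaped like the aggregated pipeline output (dicts with `word` and `score` keys). The prediction list is hand-written rather than real model output, and the threshold, the exclusion list, and the function name `filter_predictions` are hypothetical values that would need tuning on real data.

```python
def filter_predictions(predictions, min_score=0.9, excluded_terms=("benzos",)):
    """Keep predictions above a score threshold that are not excluded terms."""
    kept = []
    for pred in predictions:
        # The pipeline may return words with a leading space; normalize first.
        term = pred["word"].strip().lower()
        if pred["score"] < min_score or term in excluded_terms:
            continue
        kept.append(pred)
    return kept

# Hand-written predictions for illustration only.
predictions = [
    {"entity_group": "DRUG", "score": 0.99, "word": " ketamine"},
    {"entity_group": "DRUG", "score": 0.97, "word": " benzos"},
    {"entity_group": "DRUG", "score": 0.55, "word": " kratom"},
]
print([p["word"].strip() for p in filter_predictions(predictions)])
```

In a monitoring workflow, low-scoring or excluded items might be routed to manual review instead of being discarded, since emerging names can also receive low scores.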
Getting Started
```python
from transformers import pipeline

classifier = pipeline("ner", model="traceo/DrugRecon-1.0", aggregation_strategy="simple")
results = classifier("Be careful, it was advertised as ketamine but tested positive for 2-FDCK and deschloroketamine.")
print(results)
```

```
[{'entity_group': 'DRUG',
  'score': np.float32(0.99362385),
  'word': ' ketamine',
  'start': 33,
  'end': 41},
 {'entity_group': 'DRUG',
  'score': np.float32(0.9998994),
  'word': ' 2-FDCK',
  'start': 66,
  'end': 72},
 {'entity_group': 'DRUG',
  'score': np.float32(0.9995101),
  'word': ' deschloroketamine',
  'start': 77,
  'end': 94}]
```
Note
aggregation_strategy="simple" usually works well in the pipeline. However, the model may occasionally struggle with some entity boundaries. For important monitoring workflows, the raw model outputs (aggregation_strategy="none") could be combined with a custom post-processing function to ensure entity boundaries are better reconstructed.
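A minimal sketch of such a post-processing function, assuming token-level dicts with the `entity`, `score`, `start`, and `end` keys that the pipeline returns with aggregation_strategy="none". It widens each tagged token to the whitespace-delimited word around it and merges fragments of the same entity. The function name `merge_raw_tokens`, the `B-DRUG`/`I-DRUG` label names, and the hand-written token list are assumptions for illustration, not real model output.

```python
PUNCT = ".,;:!?()[]\"'"

def merge_raw_tokens(text, tokens):
    """Merge token-level predictions into word-aligned entity spans."""
    spans = []
    for tok in sorted(tokens, key=lambda t: t["start"]):
        if tok["entity"] == "O":  # guard; the pipeline drops 'O' by default
            continue
        # Widen the token to the full whitespace-delimited word around it.
        start, end = tok["start"], tok["end"]
        while start > 0 and not text[start - 1].isspace():
            start -= 1
        while end < len(text) and not text[end].isspace():
            end += 1
        # Trim punctuation picked up at either edge of the word.
        while end > start and text[end - 1] in PUNCT:
            end -= 1
        while start < end and text[start] in PUNCT:
            start += 1
        prev = spans[-1] if spans else None
        same_word = prev is not None and start < prev["end"]
        continues = (
            prev is not None
            and tok["entity"].startswith("I-")
            and text[prev["end"]:start].isspace()
        )
        if same_word or continues:
            prev["end"] = max(prev["end"], end)
            # Keep the lowest token score as a conservative span score.
            prev["score"] = min(prev["score"], tok["score"])
        else:
            spans.append({"start": start, "end": end, "score": tok["score"]})
    for span in spans:
        span["word"] = text[span["start"]:span["end"]]
    return spans

text = "tested positive for 2-FDCK and deschloroketamine."
raw = [  # fragmented predictions: hyphen and word tail were missed
    {"entity": "B-DRUG", "score": 0.98, "start": 20, "end": 21},
    {"entity": "I-DRUG", "score": 0.91, "start": 22, "end": 26},
    {"entity": "B-DRUG", "score": 0.95, "start": 31, "end": 37},
]
print([s["word"] for s in merge_raw_tokens(text, raw)])
```

Widening to word boundaries trades some precision for completeness: a truncated name is recovered in full, but a correctly tagged fragment inside a longer unrelated word would be over-extended, so the rule should be validated on annotated data before use.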
Citation
If you found this model useful for your work, please cite the original article:
@article{GRENIER2026112958,
title = {What’s new on the market? Combining internet traces and pretrained language models to recognize emerging drug names},
author = {Guillaume Grenier and Marina Charest and Pierre Esseiva and Quentin Rossy},
journal = {Forensic Science International},
volume = {385},
pages = {112958},
year = {2026},
issn = {0379-0738},
doi = {https://doi.org/10.1016/j.forsciint.2026.112958},
url = {https://www.sciencedirect.com/science/article/pii/S0379073826001453}
}
Model tree for traceo/DrugRecon-1.0
Base model
FacebookAI/roberta-large