DrugRecon-1.0

Introduction

This model was introduced in the article "What’s new on the market? Combining internet traces and pretrained language models to recognize emerging drug names". DrugRecon-1.0 is a RoBERTa-large model fine-tuned to recognize drug names mentioned in posts and comments from drug-related online forums.

Its intended use is to help researchers and public health agencies to monitor trends in online discussions of psychoactive substances, and to uncover new names for potentially emerging novel psychoactive substances (NPS). The model is meant to be used alongside human expert validation to interpret ambiguous terms, investigate findings, and manage model errors.

Training Data

The model was fine-tuned on a private, manually annotated corpus collected from specific drug-related sections of two online forums: Drugs-Forum and Dread.

The training and validation sets comprise a total of 12,731 messages, containing 20,613 manually annotated drug names.
The standard BIO2 scheme was employed to label tokens.
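
To make the labeling scheme concrete, here is a minimal sketch of how BIO2 tags map onto tokens: the first token of each annotated name gets a B- tag, following tokens of the same name get I- tags, and everything else is O. The label name DRUG is taken from the pipeline output shown under Getting Started; the example sentence, the word-level tokenization, and the helper name bio2_tags are illustrative assumptions, not taken from the private corpus.

```python
def bio2_tags(tokens, entity_spans, label="DRUG"):
    """Return one BIO2 tag per token.

    entity_spans holds (start, end) token indices, end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end in entity_spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


# Illustrative sentence with two annotated names: "crystal meth" and "ketamine"
tokens = ["tested", "positive", "for", "crystal", "meth", "and", "ketamine"]
spans = [(3, 5), (6, 7)]

tags = bio2_tags(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

In practice the model operates on sub-word tokens rather than whole words, so the same scheme is applied at a finer granularity.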

Evaluation Results

The model's performance was evaluated on its ability to recognize both 'Known' and 'Unseen' drug names. The 'Unseen' scenario was simulated by replacing the annotated entities in Evaluation Set 1 with drug names not encountered during fine-tuning.
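
The entity replacement used for the 'Unseen' scenario can be pictured with the sketch below. It assumes annotations are stored as character offsets and that a replacement name is drawn at random for each annotated span; the exact procedure used in the article is not described here, and the function name, the example sentence, and the placeholder names are all hypothetical.

```python
import random


def substitute_entities(text, spans, replacements, seed=0):
    """Replace each annotated (start, end) character span with a name
    drawn from replacements; return the new text and the shifted spans."""
    rng = random.Random(seed)
    pieces, new_spans = [], []
    cursor = 0    # position in the original text
    out_len = 0   # length of the rewritten text so far
    for start, end in sorted(spans):
        context = text[cursor:start]        # unchanged text before the entity
        name = rng.choice(replacements)
        pieces += [context, name]
        new_start = out_len + len(context)
        new_spans.append((new_start, new_start + len(name)))
        out_len = new_start + len(name)
        cursor = end
    pieces.append(text[cursor:])            # unchanged text after the last entity
    return "".join(pieces), new_spans


text = "Sold as ketamine but it was MXE."
spans = [(text.index(n), text.index(n) + len(n)) for n in ("ketamine", "MXE")]
unseen_names = ["placeholder-a", "placeholder-b"]  # stand-ins for names absent from fine-tuning

new_text, new_spans = substitute_entities(text, spans, unseen_names)
print(new_text)
print([new_text[s:e] for s, e in new_spans])
```

Keeping the surrounding context fixed while swapping only the names is what lets this kind of test isolate how much the model depends on having memorized specific drug names.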

The model was benchmarked on two private, manually annotated test sets:

Evaluation Set 1: This set contains 2,249 messages (3,576 manual annotations) sourced from the same platforms as training (Drugs-Forum and Dread).
Evaluation Set 2: This set contains 2,000 messages (5,492 manual annotations) sourced from Reddit.

Evaluation set	Scenario	Precision (%)	Recall (%)	F1-Score (%)
Evaluation Set 1	Known	85.7	89.6	87.6
Evaluation Set 1	Unseen	86.1	89.6	87.8
Evaluation Set 2	Known	93.8	94.2	94.0
Evaluation Set 2	Unseen	92.9	92.9	92.9
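
As a quick consistency check on the table, the sketch below recomputes each F1-score from the reported precision and recall, assuming the usual definition of F1 as their harmonic mean. The numbers are copied from the table; the helper name f1_score is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (evaluation set, scenario) -> (precision, recall, reported F1), all in percent
reported = {
    ("Evaluation Set 1", "Known"): (85.7, 89.6, 87.6),
    ("Evaluation Set 1", "Unseen"): (86.1, 89.6, 87.8),
    ("Evaluation Set 2", "Known"): (93.8, 94.2, 94.0),
    ("Evaluation Set 2", "Unseen"): (92.9, 92.9, 92.9),
}

for (dataset, scenario), (p, r, f1) in reported.items():
    print(f"{dataset} / {scenario}: recomputed {f1_score(p, r):.1f}, reported {f1}")
```

Under that assumption, the recomputed values agree with the reported F1-scores to one decimal place.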

Limitations and Biases

Language Constraint: The model was fine-tuned exclusively on English-language messages. Applying it to non-English content or heavily multilingual environments may result in significant performance degradation.
Contextual Shifts: The model may struggle on platforms where the context of messages differs significantly from the online forums on which it was fine-tuned (e.g., SMS-style messages or ad titles).
False Positives: Because the model relies heavily on contextual cues, terms that are not specific drug names but are used in very similar contexts can be incorrectly tagged. For example, the model may mistakenly tag general drug classes (e.g., benzos), chemical precursors, or non-targeted compounds (e.g., dietary supplements).
Entity Boundary Errors: The model occasionally struggles to identify the exact boundaries of certain drug names. This can result in truncated or fragmented predictions.
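
One possible way to account for these limitations in a monitoring workflow is to route doubtful predictions to a human reviewer instead of accepting them automatically. The sketch below is only an illustration: the list of generic terms, the score threshold of 0.9, the function name triage, and the sample predictions are all hypothetical, and the dictionaries only mimic the fields returned by the pipeline shown under Getting Started.

```python
# Hypothetical list of general drug classes that should not count as specific names
GENERIC_TERMS = {"benzos", "opioids", "stimulants"}


def triage(predictions, generic_terms=GENERIC_TERMS, min_score=0.9):
    """Split predictions into auto-accepted ones and ones needing expert review."""
    accepted, review = [], []
    for pred in predictions:
        term = pred["word"].strip().lower()   # pipeline words may carry a leading space
        if term in generic_terms or pred["score"] < min_score:
            review.append(pred)
        else:
            accepted.append(pred)
    return accepted, review


# Made-up predictions in the shape of the aggregated pipeline output
predictions = [
    {"entity_group": "DRUG", "score": 0.99, "word": " ketamine"},
    {"entity_group": "DRUG", "score": 0.97, "word": " benzos"},
    {"entity_group": "DRUG", "score": 0.62, "word": " magnesium"},
]

accepted, review = triage(predictions)
print("accepted:", [p["word"].strip() for p in accepted])
print("review:  ", [p["word"].strip() for p in review])
```

A filter like this does not replace expert validation; it only helps prioritize which terms a reviewer looks at first.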

Getting Started

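from transformers import pipeline

# Token-classification pipeline; aggregation_strategy="simple" groups sub-word tokens into entity spans
classifier = pipeline("ner", model="traceo/DrugRecon-1.0", aggregation_strategy="simple")

results = classifier("Be careful, it was advertised as ketamine but tested positive for 2-FDCK and deschloroketamine.")

# Each result holds the entity group, confidence score, matched text, and character offsets
print(results)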

[{'entity_group': 'DRUG',
  'score': np.float32(0.99362385),
  'word': ' ketamine',
  'start': 33,
  'end': 41},
 {'entity_group': 'DRUG',
  'score': np.float32(0.9998994),
  'word': ' 2-FDCK',
  'start': 66,
  'end': 72},
 {'entity_group': 'DRUG',
  'score': np.float32(0.9995101),
  'word': ' deschloroketamine',
  'start': 77,
  'end': 94}]

Note

aggregation_strategy="simple" usually works well in the pipeline. However, the model may occasionally struggle with some entity boundaries. For important monitoring workflows, the raw model outputs (aggregation_strategy="none") could be combined with a custom post-processing function to ensure entity boundaries are better reconstructed.
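
A minimal sketch of such a post-processing function is given below. It assumes the raw outputs are dictionaries with entity, score, start, and end fields (as the pipeline returns with aggregation_strategy="none"), that the labels are B-DRUG and I-DRUG, and that tokens tagged O are not included. The merging rule is one possible heuristic, not the authors' method: a token is attached to the previous span if it touches it with no gap (same word), or if it is tagged I- and separated only by whitespace. The function name and the sample predictions are hypothetical.

```python
def merge_raw_tokens(text, tokens):
    """Rebuild entity spans from raw token-level predictions using character offsets."""
    spans = []
    for tok in sorted(tokens, key=lambda t: t["start"]):
        if spans:
            last = spans[-1]
            gap = text[last["end"]:tok["start"]]
            same_word = gap == ""                                    # sub-word piece of the same word
            continuation = tok["entity"].startswith("I-") and gap.isspace()
            if same_word or continuation:
                last["end"] = tok["end"]
                last["score"] = min(last["score"], tok["score"])     # keep the weakest score
                continue
        spans.append({"start": tok["start"], "end": tok["end"], "score": tok["score"]})
    for span in spans:
        span["word"] = text[span["start"]:span["end"]]               # recover the surface form
    return spans


text = "tested positive for deschloroketamine and crystal meth"

# Made-up raw predictions; the third token carries a spurious B- tag mid-word
raw = [
    {"entity": "B-DRUG", "score": 0.99, "start": 20, "end": 23},  # "des"
    {"entity": "I-DRUG", "score": 0.98, "start": 23, "end": 29},  # "chloro"
    {"entity": "B-DRUG", "score": 0.91, "start": 29, "end": 37},  # "ketamine"
    {"entity": "B-DRUG", "score": 0.97, "start": 42, "end": 49},  # "crystal"
    {"entity": "I-DRUG", "score": 0.95, "start": 50, "end": 54},  # "meth"
]

for span in merge_raw_tokens(text, raw):
    print(span)
```

Merging on character adjacency rather than on tags alone means a word is never split into fragments, at the cost of occasionally joining two names written without any separator.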

Citation

If you found this model useful for your work, please cite the original article:

@article{GRENIER2026112958,
  title = {What’s new on the market? Combining internet traces and pretrained language models to recognize emerging drug names},
  author = {Guillaume Grenier and Marina Charest and Pierre Esseiva and Quentin Rossy},
  journal = {Forensic Science International},
  volume = {385},
  pages = {112958},
  year = {2026},
  issn = {0379-0738},
  doi = {https://doi.org/10.1016/j.forsciint.2026.112958},
  url = {https://www.sciencedirect.com/science/article/pii/S0379073826001453}
}
