Med-Extract Pipeline

Fine-tune Qwen2.5 for medication extraction from clinical text into strict JSON.

Project Structure

schema.py                          # Pydantic models, normalization maps, JSON schema
data/
  __init__.py
  prepare.py                       # MACCROBAT loader, entity grouping, doc-level partition
notebooks/
  01_explore_dataset.ipynb         # EDA + audit gate + yield gate
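
The "doc-level partition" noted for data/prepare.py usually means that every example derived from one case report lands in the same split, so no document leaks between train and evaluation. The sketch below is one way to do that, not the code in prepare.py: the function name assign_split, the split names, and the 10%/10% fractions are all assumptions made for illustration.

```python
import hashlib


def assign_split(doc_id: str, val_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Map a document ID to a split deterministically (hypothetical helper).

    Hashing the ID, rather than shuffling rows, keeps every example
    from the same document in the same split across runs.
    """
    digest = hashlib.sha256(doc_id.encode("utf-8")).hexdigest()
    # Turn the first 32 bits of the hash into a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "validation"
    return "train"


# Illustrative records: two examples from one document, one from another.
records = [
    {"doc_id": "doc-A", "text": "first sentence"},
    {"doc_id": "doc-A", "text": "second sentence"},
    {"doc_id": "doc-B", "text": "another report"},
]
for record in records:
    record["split"] = assign_split(record["doc_id"])
```

Because the split depends only on the document ID, the assignment is stable when the dataset is reloaded or reordered. The fractions are approximate on a small corpus.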

Quick Start

git clone https://huggingface.co/Jagan666/med-extract-pipeline
cd med-extract-pipeline
pip install pydantic pandas pyarrow datasets

# Run the data pipeline
python data/prepare.py

# Or open the notebook in Colab/Jupyter
jupyter notebook notebooks/01_explore_dataset.ipynb

Output Schema

{
  "medications": [
    {
      "drug": "aspirin",
      "dose": "81 mg",
      "route": "oral",
      "frequency": "daily",
      "duration": "for 6 weeks"
    }
  ]
}
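
A minimal sketch of how model output in this shape could be validated with Pydantic. The class names Medication and MedicationList, the helper parse_model_output, and the choice to make every field except drug optional are assumptions for illustration; the actual models live in schema.py and may differ.

```python
import json
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class Medication(BaseModel):
    # Assumption: only the drug name is required; the rest may be absent.
    drug: str
    dose: Optional[str] = None
    route: Optional[str] = None
    frequency: Optional[str] = None
    duration: Optional[str] = None


class MedicationList(BaseModel):
    medications: List[Medication]


def parse_model_output(raw: str) -> MedicationList:
    """Parse a JSON string and validate it against the sketch schema."""
    return MedicationList(**json.loads(raw))


example = (
    '{"medications": [{"drug": "aspirin", "dose": "81 mg", "route": "oral", '
    '"frequency": "daily", "duration": "for 6 weeks"}]}'
)
parsed = parse_model_output(example)

# An entry without a drug name fails validation.
try:
    parse_model_output('{"medications": [{"dose": "81 mg"}]}')
    rejected = False
except ValidationError:
    rejected = True
```

Constructing the model from the parsed dictionary works in both Pydantic v1 and v2, which avoids tying the example to one major version.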

Datasets

  • Primary: MACCROBAT, 200 PMC case reports (MIT)
  • Supplementary: BioLeaflets, 1,336 EMA leaflets (Apache-2.0)