Med-Extract Pipeline
Fine-tune Qwen2.5 for medication extraction from clinical text into strict JSON.
Project Structure
schema.py                      # Pydantic models, normalization maps, JSON schema
data/
    __init__.py
    prepare.py                 # MACCROBAT loader, entity grouping, doc-level partition
notebooks/
    01_explore_dataset.ipynb   # EDA + audit gate + yield gate
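The "doc-level partition" step in data/prepare.py presumably keeps every record from one case report in the same split, so that text from a single document does not leak between train and evaluation data. The sketch below shows one common way to do that, by hashing the document ID. It is not taken from the repo: the function names `assign_split` and `partition`, the `doc_id` field, and the 80/10/10 fractions are all assumptions made for illustration.

```python
import hashlib


def assign_split(doc_id: str, val_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Map a document ID to 'train', 'val', or 'test' deterministically.

    Hypothetical helper: the fractions and the hashing scheme are assumptions,
    not the repo's actual configuration.
    """
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it into [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "val"
    return "train"


def partition(records):
    """Group records by split; records sharing a doc_id always land together."""
    splits = {"train": [], "val": [], "test": []}
    for rec in records:
        # Assumes each record is a dict carrying a 'doc_id' key.
        splits[assign_split(rec["doc_id"])].append(rec)
    return splits


if __name__ == "__main__":
    demo = [{"doc_id": f"doc{i % 5}", "text": f"sentence {i}"} for i in range(20)]
    for name, recs in partition(demo).items():
        print(name, sorted({r["doc_id"] for r in recs}))
```

Because the split depends only on the document ID, rerunning the pipeline or adding new documents never moves an existing document to a different split. With only a few hundred documents the realized split sizes can drift from the nominal fractions, so a seeded shuffle over the list of document IDs is a reasonable alternative.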
Quick Start
git clone https://huggingface.co/Jagan666/med-extract-pipeline
cd med-extract-pipeline
pip install pydantic pandas pyarrow datasets
# Run the data pipeline
python data/prepare.py
# Or open the notebook in Colab/Jupyter
jupyter notebook notebooks/01_explore_dataset.ipynb
Output Schema
{
  "medications": [
    {
      "drug": "aspirin",
      "dose": "81 mg",
      "route": "oral",
      "frequency": "daily",
      "duration": "for 6 weeks"
    }
  ]
}
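Model output can be checked against this shape before it is used downstream. The repo's schema.py defines Pydantic models for this. The sketch below deliberately uses only the standard library (dataclasses and json) so it runs without dependencies, and it is not the repo's implementation. The `Medication` class and `parse_output` function are hypothetical names, and the rules that `drug` is required, the other four fields are optional strings, and unknown keys are rejected are assumptions about what "strict JSON" means here.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# Field names taken from the example output; "drug" is assumed to be required.
FIELDS = ("drug", "dose", "route", "frequency", "duration")


@dataclass
class Medication:
    drug: str
    dose: Optional[str] = None
    route: Optional[str] = None
    frequency: Optional[str] = None
    duration: Optional[str] = None


def parse_output(raw: str) -> List[Medication]:
    """Parse a model response and raise ValueError if it breaks the schema."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or set(data) != {"medications"}:
        raise ValueError("expected exactly one top-level key: 'medications'")
    if not isinstance(data["medications"], list):
        raise ValueError("'medications' must be a list")

    meds = []
    for item in data["medications"]:
        if not isinstance(item, dict):
            raise ValueError("each medication must be an object")
        unknown = set(item) - set(FIELDS)
        if unknown:
            raise ValueError(f"unexpected keys: {sorted(unknown)}")
        drug = item.get("drug")
        if not isinstance(drug, str) or not drug.strip():
            raise ValueError("'drug' must be a non-empty string")
        for key in FIELDS[1:]:
            value = item.get(key)
            if value is not None and not isinstance(value, str):
                raise ValueError(f"'{key}' must be a string or null")
        meds.append(Medication(**item))
    return meds


if __name__ == "__main__":
    raw = '{"medications": [{"drug": "aspirin", "dose": "81 mg", "route": "oral"}]}'
    print(parse_output(raw))
```

Rejecting unknown keys keeps a fine-tuned model from drifting into extra fields unnoticed. If the real schema permits additional attributes, that check can be relaxed.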
Datasets
- Primary: MACCROBAT, 200 PMC case reports (MIT)
- Supplementary: BioLeaflets, 1,336 EMA leaflets (Apache-2.0)