DoctoBERT-fr: Training Checkpoints

🤗 Blog | 📄 Paper | 💻 Code | 🌐 FineMed | 🩺 DoctoBERT

📚 Introduction

Per-stage MosaicML Composer training-state checkpoints for the DoctoBERT-fr encoder family.

Each .pt file holds the full training state (model, optimizer, scheduler, and RNG), so it can be used to resume training or for analysis. DoctoBERT-fr keeps the final checkpoint of each stage; DoctoModernBERT-fr also keeps intermediate checkpoints every ~5,000 batches.
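As a rough illustration of the analysis use case, the sketch below loads a checkpoint on CPU and prints its nested key layout. It does not assume any particular Composer key names; `summarize`, `load_checkpoint`, and the file name `checkpoint.pt` are hypothetical and not part of this repository.

```python
from collections.abc import Mapping


def summarize(obj, depth=2, prefix=""):
    """Return one line per key of a nested mapping, down to `depth` levels."""
    lines = []
    if isinstance(obj, Mapping) and depth > 0:
        for key, value in obj.items():
            lines.append(f"{prefix}{key}: {type(value).__name__}")
            lines.extend(summarize(value, depth - 1, prefix + "  "))
    return lines


def load_checkpoint(path):
    """Load a .pt training-state file onto the CPU (hypothetical helper)."""
    import torch  # imported lazily so summarize() works without torch installed

    # The file holds optimizer, scheduler, and RNG state in addition to tensors,
    # so weights_only=False is assumed to be required. This unpickles arbitrary
    # objects: only load files from a source you trust.
    return torch.load(path, map_location="cpu", weights_only=False)


if __name__ == "__main__":
    import sys

    # Usage: python inspect_ckpt.py checkpoint.pt   (file name is a placeholder)
    if len(sys.argv) > 1:
        print("\n".join(summarize(load_checkpoint(sys.argv[1]))))
```

Loading with `map_location="cpu"` avoids needing a GPU just to inspect a file; the printed layout shows where the model, optimizer, scheduler, and RNG entries sit before you decide what to extract.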

For ready-to-use inference models, use DoctoBERT-fr-base and DoctoModernBERT-fr-base.

βš–οΈ License

Released under Apache-2.0. Trained on FineMed-fr and FineMed-rephrased-fr, which derive from FineWeb-2 / FinePDFs (ODC-BY 1.0) and FineWiki (CC BY-SA 4.0); please attribute those upstream sources.

πŸ›οΈ Acknowledgments

This work was granted access to the HPC resources of IDRIS (Jean Zay) under the allocations 2025-AD011016291 and 2026-A0200617487 made by GENCI.
