ClinFly / README.md
Kévin Yauy
Update README.md
a850505

A newer version of the Streamlit SDK is available: 1.40.0

Upgrade
metadata
title: ClinFly
emoji: ✈️
sdk_version: 1.25.0
sdk: streamlit
app_file: clinfly_app_st.py
pinned: true

ClinFly

Contact : kevin.yauy@chu-montpellier.fr

Introduction

ClinFly is an automated framework designed to facilitate precision medicine (PM) for rare diseases. It addresses the challenge of precision phenotyping and data sharing across different languages.

ClinFly can anonymize, translate, and summarize clinical reports using Human Phenotype Ontology (HPO) terms, ensuring compliance with medical data privacy standards. The output includes a de-identified translated clinical report and a summary report in HPO format.

By streamlining the translation and anonymization of clinical reports, ClinFly aims to enhance inter-hospital data sharing, expedite medical discoveries, and pave the way for an international patient file accessible to non-English speakers.

Pipeline

Installation

To install ClinFly on your local machine, you need the poetry package manager. Navigate to the project folder and run:

poetry install

You'll need poppler and tesseract-ocr installed in your os to use OCR on pdf file.

If you need to generate a requirements.txt file, use the following command:

poetry export --without-hashes --format=requirements.txt > requirements.txt

Usage

Graphical User Interface

For single report usage with interactive analysis, ClinFly provides a web application accessible at https://huggingface.co/spaces/kyauy/ClinFly.

To run the Streamlit application on your local computer, activate the poetry shell and run the clinfly_app_st.py file:

poetry shell
streamlit run clinfly_app_st.py

Command Line Interface

For processing multiple reports with offline options, use the command line interface provided by clinfly_app_cli.py.

The input should be a TSV .tsv file structured as follows (see data/test.tsv for an example):

Report_id_1   Doe  John  Report text 
...
Report_id_X   Doe  John  Report text

You can also put a symlink to a pdf file (see data/test_pdf.tsv for an example)

Report_id_X   Doe  John  Example.pdf

Outputs will be placed in the results folder according to the file extension, using first three columns in filename.

  • The deidentify report will be generated and placed in the results/Reports folder.
  • Three HPO extraction outputs will be generated in TSV, TXT and JSON folders.

To run the CLI application on your local computer :

poetry shell
<python running version> clinfly_app_cli.py --file <input txt file with the reports> --language <language of the file> --model_dir <The output directory of the model (OPTIONAL)> --result_dir <The output directory of the generated result (OPTIONAL)>

Acknoledgments and reference

ClinFly is a tool developed by University Hospital of Montpellier and Hospices Civils de Lyon. Thanks to the developers, especially Enody Gernet for his contribution.

If you use ClinFly, please cite:

Gauthier et al., Assessing feasibility and risk to translate, de-identify and summarize medical letters using deep learning. medrXiv (2023). https://doi.org/10.1101/2023.07.27.23293234