Commit 9034591 by anicolson (parent: 9724cf9)

Update README.md

Files changed (1): README.md (+33, -0)
@@ -29,6 +29,39 @@ The abstract from the paper:
 
  "This study investigates the integration of diverse patient data sources into multimodal language models for automated chest X-ray (CXR) report generation. Traditionally, CXR report generation relies solely on CXR images and limited radiology data, overlooking valuable information from patient health records, particularly from emergency departments. Utilising the MIMIC-CXR and MIMIC-IV-ED datasets, we incorporate detailed patient information such as aperiodic vital signs, medications, and clinical history to enhance diagnostic accuracy. We introduce a novel approach to transform these heterogeneous data sources into embeddings that prompt a multimodal language model, significantly enhancing the diagnostic accuracy of generated radiology reports. Our comprehensive evaluation demonstrates the benefits of using a broader set of patient data, underscoring the potential for enhanced diagnostic capabilities and better patient outcomes through the integration of multimodal data in CXR report generation."
 
+ ## MIMIC-CXR & MIMIC-IV-ED datasets:
+
+ MIMIC-CXR, MIMIC-CXR-JPG, and MIMIC-IV-ED must be downloaded into the same PhysioNet directory, for example:
+
+ ```shell
+ user@cluster:~$ ls /home/user/physionet.org/files
+ mimic-cxr mimic-cxr-jpg mimic-iv-ed
+ ```
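Before moving on, it can help to confirm that all three datasets sit side by side. This is a minimal sketch: `missing_datasets` is a hypothetical helper, and the root path is simply the example path from the listing above, so substitute your own location.

```python
from pathlib import Path
from typing import List

# Directory names taken from the example listing above.
EXPECTED_DIRS = ("mimic-cxr", "mimic-cxr-jpg", "mimic-iv-ed")


def missing_datasets(physionet_files: str) -> List[str]:
    """Return the expected dataset directories that do not exist under physionet_files."""
    root = Path(physionet_files)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Example path from the listing above; replace it with your own.
    missing = missing_datasets("/home/user/physionet.org/files")
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All three datasets found.")
```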
+
+ ### Download MIMIC-CXR-JPG:
+ Download the MIMIC-CXR-JPG dataset from https://physionet.org/content/mimic-cxr-jpg, for example:
+ ```shell
+ wget -r -N -c -np --user <username> --ask-password https://physionet.org/files/mimic-cxr-jpg/2.1.0/
+ ```
+ Note that you must be a credentialed PhysioNet user to access this dataset.
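The download commands in this README share the same wget flags. If you prefer to script the downloads from Python, the sketch below builds the same argument list without running it; `wget_command` is a hypothetical helper, and the flag comments paraphrase the GNU Wget manual.

```python
import shlex
from typing import List, Optional


def wget_command(slug: str, version: str, username: str,
                 reject: Optional[str] = None) -> List[str]:
    """Build (but do not run) a wget argument list for a PhysioNet dataset.

    The URL follows the https://physionet.org/files/<slug>/<version>/ pattern
    used by the commands in this README.
    """
    cmd = [
        "wget",
        "-r",   # recursive download
        "-N",   # time-stamping: skip files that are not newer than the local copy
        "-c",   # continue partially downloaded files
        "-np",  # no parent: never ascend above the dataset directory
    ]
    if reject:
        cmd += ["--reject", reject]  # skip files with this suffix, e.g. "dcm"
    cmd += [
        "--user", username,
        "--ask-password",  # prompt for the PhysioNet password
        f"https://physionet.org/files/{slug}/{version}/",
    ]
    return cmd


if __name__ == "__main__":
    print(shlex.join(wget_command("mimic-cxr-jpg", "2.1.0", "<username>")))
    print(shlex.join(wget_command("mimic-cxr", "2.0.0", "<username>", reject="dcm")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; `--ask-password` will still prompt for the password interactively.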
+
+ ### Download the reports from MIMIC-CXR:
+ MIMIC-CXR-JPG does not include the radiology reports; these are distributed with MIMIC-CXR (the DICOM version of the dataset). To download the reports without the DICOM files (which are very large), add `--reject dcm` to the wget command from https://physionet.org/content/mimic-cxr, for example:
+ ```shell
+ wget -r -N -c -np --reject dcm --user <username> --ask-password https://physionet.org/files/mimic-cxr/2.0.0/
+ ```
+ Note that you must be a credentialed PhysioNet user to access this dataset.
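To confirm that `--reject dcm` took effect, you can count report and DICOM files after the download finishes. A minimal sketch, assuming the reports sit under `mimic-cxr/2.0.0/files` inside the PhysioNet directory shown earlier; `summarise_download` is a hypothetical helper.

```python
import os
from typing import Dict


def summarise_download(root: str) -> Dict[str, int]:
    """Count report (.txt) and DICOM (.dcm) files anywhere below root."""
    counts = {"txt": 0, "dcm": 0}
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower().lstrip(".")
            if ext in counts:
                counts[ext] += 1
    return counts


if __name__ == "__main__":
    # Assumed location of the reports; adjust to your PhysioNet directory.
    counts = summarise_download("/home/user/physionet.org/files/mimic-cxr/2.0.0/files")
    print(f"{counts['txt']} report files, {counts['dcm']} DICOM files")
    if counts["dcm"]:
        print("DICOM files were downloaded; check that --reject dcm was passed to wget.")
```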
+
+ ### Download MIMIC-IV-ED:
+ Download the MIMIC-IV-ED dataset from https://physionet.org/content/mimic-iv-ed, for example:
+ ```shell
+ wget -r -N -c -np --user <username> --ask-password https://physionet.org/files/mimic-iv-ed/2.2/
+ ```
+ Note that you must be a credentialed PhysioNet user to access this dataset.
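A similar check can be run for MIMIC-IV-ED. The table names and the `ed/` subdirectory below are assumptions about the v2.2 release layout, not something this README specifies, so compare them with the file listing at https://physionet.org/content/mimic-iv-ed before relying on them. `missing_ed_tables` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Assumed MIMIC-IV-ED v2.2 layout: <version_dir>/ed/<table>.csv.gz.
# Check these names against the PhysioNet file listing.
ED_TABLES = ("edstays", "diagnosis", "medrecon", "pyxis", "triage", "vitalsign")


def missing_ed_tables(version_dir: str) -> List[str]:
    """Return the assumed MIMIC-IV-ED tables that are not present."""
    ed_dir = Path(version_dir) / "ed"
    return [t for t in ED_TABLES if not (ed_dir / f"{t}.csv.gz").is_file()]


if __name__ == "__main__":
    # Example path based on the directory layout shown earlier.
    print(missing_ed_tables("/home/user/physionet.org/files/mimic-iv-ed/2.2"))
```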
+
+ ### Prepare the dataset:
+ Run the [prepare_dataset.ipynb](https://github.com/aehrc/anon/blob/main/prepare_dataset.ipynb) notebook from https://github.com/aehrc/anon, changing the paths to point to your PhysioNet directory. It should take roughly an hour; the most time-consuming steps are extracting sections from the radiology reports and matching CXR studies to ED stays.
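The notebook defines how CXR studies are actually matched to ED stays; the rule below is only an assumed illustration of the idea. A study is linked to an ED stay of the same patient when the study's date and time fall between the stay's `intime` and `outtime`. The `Study` and `EDStay` records and `match_studies_to_stays` are hypothetical; in practice the study time would presumably come from the MIMIC-CXR-JPG metadata and the stay window from the MIMIC-IV-ED `edstays` table.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Study:
    subject_id: int
    study_id: int
    study_time: datetime


@dataclass
class EDStay:
    subject_id: int
    stay_id: int
    intime: datetime
    outtime: datetime


def match_studies_to_stays(studies: List[Study],
                           stays: List[EDStay]) -> Dict[int, Optional[int]]:
    """Map each study_id to the stay_id whose [intime, outtime] window contains it, else None."""
    # Index stays by patient so each study is only compared with that patient's stays.
    stays_by_subject: Dict[int, List[EDStay]] = {}
    for stay in stays:
        stays_by_subject.setdefault(stay.subject_id, []).append(stay)

    matches: Dict[int, Optional[int]] = {}
    for study in studies:
        matches[study.study_id] = None
        for stay in stays_by_subject.get(study.subject_id, []):
            if stay.intime <= study.study_time <= stay.outtime:
                matches[study.study_id] = stay.stay_id
                break  # take the first stay whose window contains the study
    return matches
```

Grouping stays by `subject_id` avoids comparing every study against every stay; if a patient's stay windows overlapped, this sketch would simply keep the first match.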
+
  ## Example
 
  ```python