anicolson committed on
Commit 6f7f115
1 Parent(s): a1f73d2

Upload model

Files changed (9)
  1. README.md +199 -0
  2. config.json +237 -0
  3. dataset.py +382 -0
  4. generation_config.json +7 -0
  5. model.safetensors +3 -0
  6. modelling_cxrmate_ed.py +1129 -0
  7. modelling_uniformer.py +412 -0
  8. records.py +369 -0
  9. tables.py +159 -0
README.md ADDED
@@ -0,0 +1,199 @@
1
+ ---
2
+ library_name: transformers
3
+ tags: []
4
+ ---
5
+
6
+ # Model Card for Model ID
7
+
8
+ <!-- Provide a quick summary of what the model is/does. -->
9
+
10
+
11
+
12
+ ## Model Details
13
+
14
+ ### Model Description
15
+
16
+ <!-- Provide a longer summary of what this model is. -->
17
+
18
+ This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
19
+
20
+ - **Developed by:** [More Information Needed]
21
+ - **Funded by [optional]:** [More Information Needed]
22
+ - **Shared by [optional]:** [More Information Needed]
23
+ - **Model type:** [More Information Needed]
24
+ - **Language(s) (NLP):** [More Information Needed]
25
+ - **License:** [More Information Needed]
26
+ - **Finetuned from model [optional]:** [More Information Needed]
27
+
28
+ ### Model Sources [optional]
29
+
30
+ <!-- Provide the basic links for the model. -->
31
+
32
+ - **Repository:** [More Information Needed]
33
+ - **Paper [optional]:** [More Information Needed]
34
+ - **Demo [optional]:** [More Information Needed]
35
+
36
+ ## Uses
37
+
38
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
+
40
+ ### Direct Use
41
+
42
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
+
44
+ [More Information Needed]
45
+
46
+ ### Downstream Use [optional]
47
+
48
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
+
50
+ [More Information Needed]
51
+
52
+ ### Out-of-Scope Use
53
+
54
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
+
56
+ [More Information Needed]
57
+
58
+ ## Bias, Risks, and Limitations
59
+
60
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
+
62
+ [More Information Needed]
63
+
64
+ ### Recommendations
65
+
66
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
+
68
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
+
70
+ ## How to Get Started with the Model
71
+
72
+ Use the code below to get started with the model.
73
+
74
+ [More Information Needed]
75
+
76
+ ## Training Details
77
+
78
+ ### Training Data
79
+
80
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
+
82
+ [More Information Needed]
83
+
84
+ ### Training Procedure
85
+
86
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
+
88
+ #### Preprocessing [optional]
89
+
90
+ [More Information Needed]
91
+
92
+
93
+ #### Training Hyperparameters
94
+
95
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
+
97
+ #### Speeds, Sizes, Times [optional]
98
+
99
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
+
101
+ [More Information Needed]
102
+
103
+ ## Evaluation
104
+
105
+ <!-- This section describes the evaluation protocols and provides the results. -->
106
+
107
+ ### Testing Data, Factors & Metrics
108
+
109
+ #### Testing Data
110
+
111
+ <!-- This should link to a Dataset Card if possible. -->
112
+
113
+ [More Information Needed]
114
+
115
+ #### Factors
116
+
117
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
+
119
+ [More Information Needed]
120
+
121
+ #### Metrics
122
+
123
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
+
125
+ [More Information Needed]
126
+
127
+ ### Results
128
+
129
+ [More Information Needed]
130
+
131
+ #### Summary
132
+
133
+
134
+
135
+ ## Model Examination [optional]
136
+
137
+ <!-- Relevant interpretability work for the model goes here -->
138
+
139
+ [More Information Needed]
140
+
141
+ ## Environmental Impact
142
+
143
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
+
145
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
+
147
+ - **Hardware Type:** [More Information Needed]
148
+ - **Hours used:** [More Information Needed]
149
+ - **Cloud Provider:** [More Information Needed]
150
+ - **Compute Region:** [More Information Needed]
151
+ - **Carbon Emitted:** [More Information Needed]
152
+
153
+ ## Technical Specifications [optional]
154
+
155
+ ### Model Architecture and Objective
156
+
157
+ [More Information Needed]
158
+
159
+ ### Compute Infrastructure
160
+
161
+ [More Information Needed]
162
+
163
+ #### Hardware
164
+
165
+ [More Information Needed]
166
+
167
+ #### Software
168
+
169
+ [More Information Needed]
170
+
171
+ ## Citation [optional]
172
+
173
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
+
175
+ **BibTeX:**
176
+
177
+ [More Information Needed]
178
+
179
+ **APA:**
180
+
181
+ [More Information Needed]
182
+
183
+ ## Glossary [optional]
184
+
185
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
+
187
+ [More Information Needed]
188
+
189
+ ## More Information [optional]
190
+
191
+ [More Information Needed]
192
+
193
+ ## Model Card Authors [optional]
194
+
195
+ [More Information Needed]
196
+
197
+ ## Model Card Contact
198
+
199
+ [More Information Needed]
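The model card's "How to Get Started with the Model" section above is still a placeholder. As a minimal sketch, assuming the files in this commit are served from a Hugging Face Hub repository (the repository id below is a placeholder) and that running the repository's custom code is acceptable, the model can be loaded through the `auto_map` entry in `config.json`:

```python
from transformers import AutoModel

# Placeholder repository id; substitute the Hub repo this commit was pushed to.
repo_id = "<user-or-org>/<repo-name>"

# config.json maps AutoModel to modelling_cxrmate_ed.MIMICIVEDCXRMultimodalModel,
# so trust_remote_code=True is required to import that module from the repository.
# Note: the custom module also imports duckdb, streamlit and the local dataset.py,
# whose own imports (lmdb, data.mimic_cxr, tools.*) must be importable as well.
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
model.eval()
```

No tokenizer files are part of this commit, so report generation additionally needs the matching tokenizer and the prompt-preparation methods defined in `modelling_cxrmate_ed.py`.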
config.json ADDED
@@ -0,0 +1,237 @@
1
+ {
2
+ "architectures": [
3
+ "MIMICIVEDCXRMultimodalModel"
4
+ ],
5
+ "auto_map": {
6
+ "AutoModel": "modelling_cxrmate_ed.MIMICIVEDCXRMultimodalModel"
7
+ },
8
+ "decoder": {
9
+ "_name_or_path": "",
10
+ "add_cross_attention": false,
11
+ "add_time_deltas": true,
12
+ "architectures": null,
13
+ "attention_bias": false,
14
+ "attention_dropout": 0.0,
15
+ "bad_words_ids": null,
16
+ "begin_suppress_tokens": null,
17
+ "bos_token_id": 1,
18
+ "chunk_size_feed_forward": 0,
19
+ "cross_attention_hidden_size": null,
20
+ "decoder_start_token_id": null,
21
+ "diversity_penalty": 0.0,
22
+ "do_sample": false,
23
+ "early_stopping": false,
24
+ "ed_module_columns": [
25
+ "triage_chiefcomplaint",
26
+ "triage_pain",
27
+ "vitalsign_pain"
28
+ ],
29
+ "encoder_no_repeat_ngram_size": 0,
30
+ "eos_token_id": 2,
31
+ "exponential_decay_length_penalty": null,
32
+ "finetuning_task": null,
33
+ "forced_bos_token_id": null,
34
+ "forced_eos_token_id": null,
35
+ "hidden_act": "silu",
36
+ "hidden_size": 768,
37
+ "id2label": {
38
+ "0": "LABEL_0",
39
+ "1": "LABEL_1"
40
+ },
41
+ "include_time_delta": true,
42
+ "index_value_encoder_config": {
43
+ "edstays": 40,
44
+ "triage": 7,
45
+ "vitalsign": 1177
46
+ },
47
+ "index_value_encoder_intermediate_size": 2048,
48
+ "initializer_range": 0.02,
49
+ "intermediate_size": 3072,
50
+ "is_decoder": true,
51
+ "is_encoder_decoder": false,
52
+ "label2id": {
53
+ "LABEL_0": 0,
54
+ "LABEL_1": 1
55
+ },
56
+ "length_penalty": 1.0,
57
+ "max_length": 20,
58
+ "max_position_embeddings": 2048,
59
+ "mimic_cxr_columns": [
60
+ "indication",
61
+ "history"
62
+ ],
63
+ "min_length": 0,
64
+ "model_type": "llama",
65
+ "no_repeat_ngram_size": 0,
66
+ "num_attention_heads": 12,
67
+ "num_beam_groups": 1,
68
+ "num_beams": 1,
69
+ "num_hidden_layers": 6,
70
+ "num_key_value_heads": 12,
71
+ "num_return_sequences": 1,
72
+ "num_token_types": 19,
73
+ "output_attentions": false,
74
+ "output_hidden_states": false,
75
+ "output_scores": false,
76
+ "pad_token_id": 4,
77
+ "prefix": null,
78
+ "pretraining_tp": 1,
79
+ "problem_type": null,
80
+ "pruned_heads": {},
81
+ "remove_invalid_values": false,
82
+ "repetition_penalty": 1.0,
83
+ "return_dict": true,
84
+ "return_dict_in_generate": false,
85
+ "rms_norm_eps": 1e-06,
86
+ "rope_scaling": null,
87
+ "rope_theta": 10000.0,
88
+ "sep_token_id": null,
89
+ "suppress_tokens": null,
90
+ "task_specific_params": null,
91
+ "temperature": 1.0,
92
+ "tf_legacy_loss": false,
93
+ "tie_encoder_decoder": false,
94
+ "tie_word_embeddings": false,
95
+ "time_delta_monotonic_inversion": true,
96
+ "token_type_to_token_type_id": {
97
+ "comparison": 15,
98
+ "edstays": 1,
99
+ "findings": 12,
100
+ "history": 11,
101
+ "image": 14,
102
+ "impression": 13,
103
+ "indication": 10,
104
+ "medrecon": 0,
105
+ "medrecon_name": 6,
106
+ "mimic_cxr_2_0_0_metadata": 5,
107
+ "previous_findings": 16,
108
+ "previous_image": 18,
109
+ "previous_impression": 17,
110
+ "pyxis": 4,
111
+ "triage": 2,
112
+ "triage_chiefcomplaint": 7,
113
+ "triage_pain": 8,
114
+ "vitalsign": 3,
115
+ "vitalsign_pain": 9
116
+ },
117
+ "tokenizer_class": null,
118
+ "top_k": 50,
119
+ "top_p": 1.0,
120
+ "torch_dtype": null,
121
+ "torchscript": false,
122
+ "typical_p": 1.0,
123
+ "use_bfloat16": false,
124
+ "use_cache": true,
125
+ "vocab_size": 30000,
126
+ "zero_time_delta_value": 1.0
127
+ },
128
+ "encoder": {
129
+ "_name_or_path": "",
130
+ "add_cross_attention": false,
131
+ "architectures": null,
132
+ "attention_probs_dropout_prob": 0.0,
133
+ "attn_drop_rate": 0.0,
134
+ "bad_words_ids": null,
135
+ "begin_suppress_tokens": null,
136
+ "bos_token_id": null,
137
+ "chunk_size_feed_forward": 0,
138
+ "conv_stem": false,
139
+ "cross_attention_hidden_size": null,
140
+ "decoder_start_token_id": null,
141
+ "depth": [
142
+ 5,
143
+ 8,
144
+ 20,
145
+ 7
146
+ ],
147
+ "diversity_penalty": 0.0,
148
+ "do_sample": false,
149
+ "drop_path_rate": 0.3,
150
+ "drop_rate": 0.0,
151
+ "early_stopping": false,
152
+ "embed_dim": [
153
+ 64,
154
+ 128,
155
+ 320,
156
+ 512
157
+ ],
158
+ "encoder_no_repeat_ngram_size": 0,
159
+ "encoder_stride": 16,
160
+ "eos_token_id": null,
161
+ "exponential_decay_length_penalty": null,
162
+ "finetuning_task": null,
163
+ "forced_bos_token_id": null,
164
+ "forced_eos_token_id": null,
165
+ "head_dim": 64,
166
+ "hidden_act": "gelu",
167
+ "hidden_dropout_prob": 0.0,
168
+ "hidden_size": 768,
169
+ "id2label": {
170
+ "0": "LABEL_0",
171
+ "1": "LABEL_1"
172
+ },
173
+ "image_size": 384,
174
+ "in_chans": 3,
175
+ "initializer_range": 0.02,
176
+ "intermediate_size": 3072,
177
+ "is_decoder": false,
178
+ "is_encoder_decoder": false,
179
+ "label2id": {
180
+ "LABEL_0": 0,
181
+ "LABEL_1": 1
182
+ },
183
+ "layer_norm_eps": 1e-06,
184
+ "length_penalty": 1.0,
185
+ "max_length": 20,
186
+ "min_length": 0,
187
+ "mlp_ratio": 4,
188
+ "model_type": "vit",
189
+ "no_repeat_ngram_size": 0,
190
+ "num_attention_heads": 12,
191
+ "num_beam_groups": 1,
192
+ "num_beams": 1,
193
+ "num_channels": 3,
194
+ "num_classes": 1000,
195
+ "num_hidden_layers": 12,
196
+ "num_return_sequences": 1,
197
+ "output_attentions": false,
198
+ "output_hidden_states": false,
199
+ "output_scores": false,
200
+ "pad_token_id": null,
201
+ "patch_size": [
202
+ 4,
203
+ 2,
204
+ 2,
205
+ 2
206
+ ],
207
+ "prefix": null,
208
+ "problem_type": null,
209
+ "projection_size": 768,
210
+ "pruned_heads": {},
211
+ "qk_scale": null,
212
+ "qkv_bias": true,
213
+ "remove_invalid_values": false,
214
+ "repetition_penalty": 1.0,
215
+ "representation_size": null,
216
+ "return_dict": true,
217
+ "return_dict_in_generate": false,
218
+ "sep_token_id": null,
219
+ "suppress_tokens": null,
220
+ "task_specific_params": null,
221
+ "temperature": 1.0,
222
+ "tf_legacy_loss": false,
223
+ "tie_encoder_decoder": false,
224
+ "tie_word_embeddings": true,
225
+ "tokenizer_class": null,
226
+ "top_k": 50,
227
+ "top_p": 1.0,
228
+ "torch_dtype": null,
229
+ "torchscript": false,
230
+ "typical_p": 1.0,
231
+ "use_bfloat16": false
232
+ },
233
+ "model_type": "vision-encoder-decoder",
234
+ "tie_word_embeddings": false,
235
+ "torch_dtype": "float32",
236
+ "transformers_version": "4.39.0"
237
+ }
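`config.json` above is a composite vision-encoder-decoder configuration: the `encoder` block describes the 384×384 UniFormer-style image encoder (staged `embed_dim` of [64, 128, 320, 512]) and the `decoder` block a 6-layer Llama-style text decoder carrying custom fields such as `token_type_to_token_type_id` and `index_value_encoder_config`. A small sketch for inspecting the nested configuration without loading any weights (repository id is again a placeholder):

```python
from transformers import AutoConfig

repo_id = "<user-or-org>/<repo-name>"  # placeholder, as above

# The top-level model_type "vision-encoder-decoder" is native to transformers, so the
# configuration loads without custom code; fields that are not part of the standard
# ViT/Llama configs are kept as extra attributes on the nested config objects.
config = AutoConfig.from_pretrained(repo_id)

print(config.model_type)                                            # vision-encoder-decoder
print(config.encoder.image_size, config.encoder.embed_dim)          # 384 [64, 128, 320, 512]
print(config.decoder.num_hidden_layers, config.decoder.vocab_size)  # 6 30000
print(config.decoder.token_type_to_token_type_id["findings"])       # 12
```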
dataset.py ADDED
@@ -0,0 +1,382 @@
1
+ import os
2
+ import struct
3
+
4
+ import lmdb
5
+ import numpy as np
6
+ import pandas as pd
7
+ import torch
8
+ from torch.utils.data import Dataset
9
+ from torchvision.io import decode_image, read_image
10
+
11
+ from data.mimic_cxr.dcm_processing import load_and_preprocess_dcm_uint16
12
+ from tools.mimic_iv.ed_cxr.records import EDCXRSubjectRecords
13
+ from tools.utils import mimic_cxr_image_path
14
+
15
+ # Ordered by oblique, lateral, AP, and then PA views so that PA views are closest in position to the generated tokens (and oblique is furthest).
16
+ VIEW_ORDER = ['LPO', 'RAO', 'LAO', 'SWIMMERS', 'XTABLE LATERAL', 'LL', 'LATERAL', 'AP AXIAL', 'AP RLD', 'AP LLD', 'AP', 'PA RLD', 'PA LLD', 'PA']
17
+
18
+
19
+ class StudyIDEDStayIDSubset(Dataset):
20
+ """
21
+ Study ID & ED stay ID subset. Examples are indexed by the study identifier.
22
+ Information from the ED module is added by finding the stay_id for the subject_id
23
+ whose timespan contains the study_datetime of the study_id. The history and indication
24
+ sections are also included.
25
+ """
26
+ def __init__(
27
+ self,
28
+ mimic_iv_duckdb_path,
29
+ split,
30
+ dataset_dir=None,
31
+ max_images_per_study=None,
32
+ transforms=None,
33
+ images=True,
34
+ columns='study_id, dicom_id, subject_id, findings, impression',
35
+ and_condition='',
36
+ records=None,
37
+ study_id_inclusion_list=None,
38
+ return_images=True,
39
+ ed_module=True,
40
+ extension='jpg',
41
+ images_rocksdb_path=None,
42
+ jpg_lmdb_path=None,
43
+ jpg_rocksdb_path=None,
44
+ ):
45
+ """
46
+ Argument/s:
47
+ mimic_iv_duckdb_path - Path to MIMIC-IV DuckDB database.
48
+ split - 'train', 'validate', or 'test'.
49
+ dataset_dir - Dataset directory.
50
+ max_images_per_study - the maximum number of images per study.
51
+ transforms - torchvision transformations.
52
+ colour_space - PIL target colour space.
53
+ images - flag to return processed images.
54
+ columns - which columns to query on.
55
+ and_condition - AND condition to add to the SQL query.
56
+ records - MIMIC-IV records class instance.
57
+ study_id_inclusion_list - studies not in this list are excluded.
58
+ return_images - return CXR images for the study as tensors.
59
+ ed_module - use the ED module.
60
+ extension - 'jpg' or 'dcm'.
61
+ images_rocksdb_path - path to image RocksDB database.
62
+ jpg_lmdb_path - path to LMDB .jpg database.
63
+ jpg_rocksdb_path - path to RocksDB .jpg database.
64
+ """
65
+ super(StudyIDEDStayIDSubset, self).__init__()
66
+ self.split = split
67
+ self.dataset_dir = dataset_dir
68
+ self.max_images_per_study = max_images_per_study
69
+ self.transforms = transforms
70
+ self.images = images
71
+ self.columns = columns
72
+ self.and_condition = and_condition
73
+ self.return_images = return_images
74
+ self.ed_module = ed_module
75
+ self.extension = extension
76
+ self.images_rocksdb_path = images_rocksdb_path
77
+ self.jpg_lmdb_path = jpg_lmdb_path
78
+ self.jpg_rocksdb_path = jpg_rocksdb_path
79
+
80
+ # If max images per study is not set:
81
+ self.max_images_per_study = float('inf') if self.max_images_per_study is None else self.max_images_per_study
82
+
83
+ assert self.extension == 'jpg' or self.extension == 'dcm'
84
+
85
+ if self.dataset_dir is not None and self.images_rocksdb_path is None:
86
+ if self.extension == 'jpg':
87
+ if 'physionet.org/files/mimic-cxr-jpg/2.0.0/files' not in self.dataset_dir:
88
+ self.dataset_dir = os.path.join(self.dataset_dir, 'physionet.org/files/mimic-cxr-jpg/2.0.0/files')
89
+ elif self.extension == 'dcm':
90
+ if 'physionet.org/files/mimic-cxr/2.0.0/files' not in self.dataset_dir:
91
+ self.dataset_dir = os.path.join(self.dataset_dir, 'physionet.org/files/mimic-cxr/2.0.0/files')
92
+
93
+ # Open the RocksDB images database:
94
+ if self.images_rocksdb_path is not None:
95
+ import rocksdb
96
+
97
+ # Define the column families:
98
+ column_families = {
99
+ b'shape': rocksdb.ColumnFamilyOptions(),
100
+ b'image': rocksdb.ColumnFamilyOptions(),
101
+ }
102
+
103
+ opts = rocksdb.Options()
104
+ opts.max_open_files = int(1e+5)
105
+ self.images_db = rocksdb.DB(self.images_rocksdb_path, opts, column_families=column_families, read_only=True)
106
+
107
+ self.shape_handle = self.images_db.get_column_family(b'shape')
108
+ self.image_handle = self.images_db.get_column_family(b'image')
109
+
110
+ self.shape_dtype = np.int32
111
+ self.image_dtype = np.uint16
112
+
113
+ # Prepare the RocksDB .jpg database:
114
+ if self.jpg_rocksdb_path is not None:
115
+ import rocksdb
116
+
117
+ opts = rocksdb.Options()
118
+ opts.max_open_files = int(1e+5)
119
+
120
+ self.images_db = rocksdb.DB(self.jpg_rocksdb_path, opts, read_only=True)
121
+
122
+ # Prepare the LMDB .jpg database:
123
+ if self.jpg_lmdb_path is not None:
124
+
125
+ print('Loading images using LMDB.')
126
+
127
+ # Map size:
128
+ map_size = int(0.65 * (1024 ** 4))
129
+ assert isinstance(map_size, int)
130
+
131
+ self.env = lmdb.open(self.jpg_lmdb_path, map_size=map_size, lock=False, readonly=True)
132
+ self.txn = self.env.begin(write=False)
133
+
134
+ self.records = EDCXRSubjectRecords(database_path=mimic_iv_duckdb_path) if records is None else records
135
+
136
+ query = f"""
137
+ SELECT {columns}
138
+ FROM mimic_cxr
139
+ WHERE split = '{split}'
140
+ {and_condition}
141
+ ORDER BY study_id
142
+ """
143
+
144
+ # For multi-image, the study identifiers make up the training examples:
145
+ df = self.records.connect.sql(query).df()
146
+
147
+ # Drop studies that don't have a findings or impression section:
148
+ df = df.dropna(subset=['findings', 'impression'], how='any')
149
+
150
+ # This study has two rows in edstays (removed as it causes issues):
151
+ if self.ed_module:
152
+ df = df[df['study_id'] != 59128861]
153
+
154
+ # Exclude studies not in list:
155
+ if study_id_inclusion_list is not None:
156
+ df = df[df['study_id'].isin(study_id_inclusion_list)]
157
+
158
+ # Example study identifiers for the subset:
159
+ self.examples = df['study_id'].unique().tolist()
160
+
161
+ # Record statistics:
162
+ self.num_study_ids = len(self.examples)
163
+ self.num_dicom_ids = len(df['dicom_id'].unique().tolist())
164
+ self.num_subject_ids = len(df['subject_id'].unique().tolist())
165
+
166
+ def __len__(self):
167
+ return self.num_study_ids
168
+
169
+ def __getitem__(self, index):
170
+
171
+ study_id = self.examples[index]
172
+
173
+ # Get the study:
174
+ study = self.records.connect.sql(
175
+ f"""
176
+ SELECT dicom_id, study_id, subject_id, study_datetime, ViewPosition
177
+ FROM mimic_cxr
178
+ WHERE (study_id = {study_id});
179
+ """
180
+ ).df()
181
+ subject_id = study.iloc[0, study.columns.get_loc('subject_id')]
182
+ study_id = study.iloc[0, study.columns.get_loc('study_id')]
183
+ study_datetime = study['study_datetime'].max()
184
+
185
+ example_dict = {
186
+ 'study_ids': study_id,
187
+ 'subject_id': subject_id,
188
+ 'index': index,
189
+ }
190
+
191
+ example_dict.update(self.records.return_mimic_cxr_features(study_id))
192
+
193
+ if self.ed_module:
194
+ edstays = self.records.connect.sql(
195
+ f"""
196
+ SELECT stay_id, intime, outtime
197
+ FROM edstays
198
+ WHERE (subject_id = {subject_id})
199
+ AND intime < '{study_datetime}'
200
+ AND outtime > '{study_datetime}';
201
+ """
202
+ ).df()
203
+
204
+ assert len(edstays) <= 1
205
+ stay_id = edstays.iloc[0, edstays.columns.get_loc('stay_id')] if not edstays.empty else None
206
+ self.records.clear_start_end_times()
207
+ example_dict.update(self.records.return_ed_module_features(stay_id, study_datetime))
208
+
209
+ example_dict['stay_ids'] = stay_id
210
+
211
+ if self.return_images:
212
+ example_dict['images'], example_dict['image_time_deltas'] = self.get_images(study, study_datetime)
213
+
214
+ return example_dict
215
+
216
+ def get_images(self, example, reference_time):
217
+ """
218
+ Get the image/s for a given example.
219
+
220
+ Argument/s:
221
+ example - dataframe for the example.
222
+ reference_time - reference_time for time delta.
223
+
224
+ Returns:
225
+ The image/s for the example
226
+ """
227
+
228
+ # Sample if over max_images_per_study. Only allowed during training:
229
+ if len(example) > self.max_images_per_study:
230
+ assert self.split == 'train'
231
+ example = example.sample(n=self.max_images_per_study, axis=0)
232
+
233
+ # Order by ViewPosition:
234
+ example['ViewPosition'] = example['ViewPosition'].astype(pd.CategoricalDtype(categories=VIEW_ORDER, ordered=True))
235
+
236
+ # Sort the DataFrame based on the categorical column
237
+ example = example.sort_values(by=['study_datetime', 'ViewPosition'])
238
+
239
+ # Load and pre-process each CXR:
240
+ images, time_deltas = [], []
241
+ for _, row in example.iterrows():
242
+ images.append(
243
+ self.load_and_preprocess_image(
244
+ row['subject_id'],
245
+ row['study_id'],
246
+ row['dicom_id'],
247
+ ),
248
+ )
249
+ time_deltas.append(self.records.compute_time_delta(row['study_datetime'], reference_time, to_tensor=False))
250
+
251
+ if self.transforms is not None:
252
+ images = torch.stack(images, 0)
253
+ return images, time_deltas
254
+
255
+ def load_and_preprocess_image(self, subject_id, study_id, dicom_id):
256
+ """
257
+ Load and preprocess an image using torchvision.transforms.v2:
258
+ https://pytorch.org/vision/stable/auto_examples/transforms/plot_transforms_getting_started.html#sphx-glr-auto-examples-transforms-plot-transforms-getting-started-py
259
+
260
+ Argument/s:
261
+ subject_id - subject identifier.
262
+ study_id - study identifier.
263
+ dicom_id - DICOM identifier.
264
+
265
+ Returns:
266
+ image - Tensor of the CXR.
267
+ """
268
+
269
+ if self.extension == 'jpg':
270
+
271
+ if self.jpg_rocksdb_path is not None:
272
+
273
+ # Convert to bytes:
274
+ key = bytes(dicom_id, 'utf-8')
275
+
276
+ # Retrieve image:
277
+ image = bytearray(self.images_db.get(key))
278
+ image = torch.frombuffer(image, dtype=torch.uint8)
279
+ image = decode_image(image)
280
+
281
+ elif self.jpg_lmdb_path is not None:
282
+
283
+ # Convert to bytes:
284
+ key = bytes(dicom_id, 'utf-8')
285
+
286
+ # Retrieve image:
287
+ image = bytearray(self.txn.get(key))
288
+ image = torch.frombuffer(image, dtype=torch.uint8)
289
+ image = decode_image(image)
290
+
291
+ else:
292
+ image_file_path = mimic_cxr_image_path(self.dataset_dir, subject_id, study_id, dicom_id, self.extension)
293
+ image = read_image(image_file_path)
294
+
295
+ elif self.extension == 'dcm':
296
+ if self.images_rocksdb_path is not None:
297
+
298
+ key = dicom_id.encode('utf-8')
299
+
300
+ # Retrieve the serialized image shape associated with the key:
301
+ shape_bytes = self.images_db.get((self.shape_handle, key), key)
302
+ shape = struct.unpack('iii', shape_bytes)
303
+
304
+ np.frombuffer(shape_bytes, dtype=self.shape_dtype).reshape(3)
305
+
306
+ # Retrieve the serialized image data associated with the key:
307
+ image_bytes = self.images_db.get((self.image_handle, key), key)
308
+ image = np.frombuffer(image_bytes, dtype=self.image_dtype).reshape(*shape)
309
+
310
+ else:
311
+ image_file_path = mimic_cxr_image_path(self.dataset_dir, subject_id, study_id, dicom_id, self.extension)
312
+ image = load_and_preprocess_dcm_uint16(image_file_path)
313
+
314
+ # Convert to a torch tensor:
315
+ image = torch.from_numpy(image)
316
+
317
+ if self.transforms is not None:
318
+ image = self.transforms(image)
319
+
320
+ return image
321
+
322
+
323
+ if __name__ == '__main__':
324
+ import time
325
+
326
+ from tqdm import tqdm
327
+
328
+ num_samples = 20
329
+
330
+ datasets = []
331
+ datasets.append(
332
+ StudyIDEDStayIDSubset(
333
+ dataset_dir='/datasets/work/hb-mlaifsp-mm/work/archive',
334
+ mimic_iv_duckdb_path='/scratch3/nic261/database/mimic_iv_duckdb_rev_b.db',
335
+ split='train',
336
+ extension='jpg',
337
+ ed_module=False,
338
+ ),
339
+ )
340
+
341
+ datasets.append(
342
+ StudyIDEDStayIDSubset(
343
+ dataset_dir='/scratch3/nic261/datasets',
344
+ mimic_iv_duckdb_path='/scratch3/nic261/database/mimic_iv_duckdb_rev_b.db',
345
+ split='train',
346
+ extension='jpg',
347
+ ed_module=False,
348
+ ),
349
+ )
350
+
351
+ datasets.append(
352
+ StudyIDEDStayIDSubset(
353
+ jpg_lmdb_path='/scratch3/nic261/database/mimic_cxr_jpg_lmdb_rev_a.db',
354
+ mimic_iv_duckdb_path='/scratch3/nic261/database/mimic_iv_duckdb_rev_b.db',
355
+ split='train',
356
+ extension='jpg',
357
+ ed_module=False,
358
+ ),
359
+ )
360
+
361
+ datasets.append(
362
+ StudyIDEDStayIDSubset(
363
+ jpg_rocksdb_path='/scratch3/nic261/database/mimic_cxr_jpg_rocksdb.db',
364
+ mimic_iv_duckdb_path='/scratch3/nic261/database/mimic_iv_duckdb_rev_b.db',
365
+ split='train',
366
+ extension='jpg',
367
+ ed_module=False,
368
+ )
369
+ )
370
+
371
+ assert (datasets[1][0]['images'][0] == datasets[2][0]['images'][0]).all().item()
372
+ assert (datasets[1][5]['images'][0] == datasets[2][5]['images'][0]).all().item()
373
+
374
+ for d in datasets:
375
+ start_time = time.time()
376
+ indices = torch.randperm(len(d))[:num_samples] # Get random indices.
377
+ for i in tqdm(indices):
378
+ _ = d[i]
379
+ end_time = time.time()
380
+ elapsed_time = end_time - start_time
381
+ print(f"Elapsed time: {elapsed_time} seconds")
382
+
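`dataset.py` builds study-level examples by joining MIMIC-CXR studies with MIMIC-IV ED stays through a DuckDB database. A minimal usage sketch, assuming credentialed access to MIMIC-CXR-JPG, a prepared MIMIC-IV DuckDB file, and that the helper packages imported at the top of the file (`data.mimic_cxr`, `tools.mimic_iv`, `tools.utils`) are importable from the accompanying training repository; all paths below are placeholders:

```python
import torch
from torchvision.transforms import v2

from dataset import StudyIDEDStayIDSubset  # the file above

# 384 matches the encoder input size declared in config.json.
transforms = v2.Compose([
    v2.ToDtype(torch.float32, scale=True),  # uint8 [0, 255] -> float32 [0, 1]
    v2.Resize(size=384, antialias=True),
    v2.CenterCrop(384),
])

dataset = StudyIDEDStayIDSubset(
    mimic_iv_duckdb_path='/path/to/mimic_iv_duckdb.db',  # placeholder
    dataset_dir='/path/to/physionet.org',                # placeholder
    split='train',
    extension='jpg',
    ed_module=True,
    max_images_per_study=5,
    transforms=transforms,
)

# Each example is a dict with study/stay identifiers, ED and MIMIC-CXR text features,
# the stacked image tensor, and a time delta per image.
example = dataset[0]
print(example['study_ids'], example['images'].shape, example['image_time_deltas'])
```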
generation_config.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "eos_token_id": 2,
5
+ "pad_token_id": 4,
6
+ "transformers_version": "4.39.0"
7
+ }
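`generation_config.json` only pins the special-token ids that `generate()` falls back to when they are not passed explicitly (BOS 1, EOS 2, PAD 4). A quick check, using the same placeholder repository id as above:

```python
from transformers import GenerationConfig

gen_cfg = GenerationConfig.from_pretrained("<user-or-org>/<repo-name>")  # placeholder
print(gen_cfg.bos_token_id, gen_cfg.eos_token_id, gen_cfg.pad_token_id)  # 1 2 4
```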
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e4b1ed2a5298bb8999cb91a9b905ace6733e5c66ebdef9702baa4d421428fad3
3
+ size 644854104
modelling_cxrmate_ed.py ADDED
@@ -0,0 +1,1129 @@
1
+ import csv
2
+ import functools
3
+ import math
4
+ import os
5
+ import re
6
+ from collections import OrderedDict
7
+ from glob import glob
8
+ from pathlib import Path
9
+ from typing import Dict, List, Optional, Tuple, Union
10
+
11
+ import duckdb
12
+ import pandas as pd
13
+ import streamlit as st
14
+ import torch
15
+ import transformers
16
+ from torch.nn import CrossEntropyLoss
17
+ from tqdm import tqdm
18
+ from transformers import PreTrainedTokenizerFast, VisionEncoderDecoderModel
19
+ from transformers.configuration_utils import PretrainedConfig
20
+ from transformers.modeling_outputs import Seq2SeqLMOutput
21
+ from transformers.modeling_utils import PreTrainedModel
22
+ from transformers.models.vision_encoder_decoder.configuration_vision_encoder_decoder import (
23
+ VisionEncoderDecoderConfig,
24
+ )
25
+ from transformers.utils import logging
26
+
27
+ from .dataset import StudyIDEDStayIDSubset
28
+ from .modelling_uniformer import MultiUniFormerWithProjectionHead
29
+ from .records import EDCXRSubjectRecords
30
+ from .tables import ed_module_tables
31
+
32
+ logger = logging.get_logger(__name__)
33
+
34
+
35
+ def create_lookup_table(df, columns, start_idx):
36
+ df = df.groupby(columns).head(1)[columns].sort_values(by=columns)
37
+ indices = range(start_idx, start_idx + len(df))
38
+ df['index'] = indices
39
+ return df, indices[-1]
40
+
41
+
42
+ class FNNEncoder(torch.nn.Module):
43
+ def __init__(self, num_features, intermediate_size, decoder_hidden_size):
44
+ super().__init__()
45
+ self.up_proj = torch.nn.Linear(num_features, intermediate_size, bias=False)
46
+ self.down_proj = torch.nn.Linear(intermediate_size, decoder_hidden_size, bias=False)
47
+ self.act_fn = torch.nn.SiLU()
48
+
49
+ def forward(self, x):
50
+ return self.down_proj(self.act_fn(self.up_proj(x)))
51
+
52
+
53
+ class MIMICIVEDCXRMultimodalModel(VisionEncoderDecoderModel):
54
+
55
+ config_class = VisionEncoderDecoderConfig
56
+ base_model_prefix = "vision_encoder_decoder"
57
+ main_input_name = "input_ids"
58
+ supports_gradient_checkpointing = True
59
+
60
+ def __init__(
61
+ self,
62
+ config: Optional[PretrainedConfig] = None,
63
+ encoder: Optional[PreTrainedModel] = None,
64
+ decoder: Optional[PreTrainedModel] = None,
65
+ DefaultEncoderClass = MultiUniFormerWithProjectionHead,
66
+ DefaultDecoderClass = transformers.LlamaForCausalLM,
67
+ ):
68
+
69
+ if decoder:
70
+ assert not decoder.config.add_cross_attention, '"add_cross_attention" must be False for the given decoder'
71
+ assert decoder.config.is_decoder, '"is_decoder" must be True for the given decoder'
72
+
73
+ if config is None and (encoder is None or decoder is None):
74
+ raise ValueError("Either a configuration or an encoder and a decoder has to be provided.")
75
+ if config is None:
76
+ config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(encoder.config, decoder.config)
77
+ else:
78
+ if not isinstance(config, self.config_class):
79
+ raise ValueError(f"Config: {config} has to be of type {self.config_class}")
80
+
81
+ config.tie_word_embeddings = False
82
+
83
+ # Initialize with config:
84
+ PreTrainedModel.__init__(self, config)
85
+
86
+ # Encoder:
87
+ if encoder is None:
88
+ encoder = DefaultEncoderClass(config=config.encoder)
89
+
90
+ # Decoder:
91
+ if decoder is None:
92
+ assert not config.decoder.add_cross_attention
93
+ decoder = DefaultDecoderClass(config=config.decoder)
94
+
95
+ self.encoder = encoder
96
+ self.decoder = decoder
97
+
98
+ if self.encoder.config.to_dict() != self.config.encoder.to_dict():
99
+ logger.warning(
100
+ f"Config of the encoder: {self.encoder.__class__} is overwritten by shared encoder config:"
101
+ f" {self.config.encoder}"
102
+ )
103
+ if self.decoder.config.to_dict() != self.config.decoder.to_dict():
104
+ logger.warning(
105
+ f"Config of the decoder: {self.decoder.__class__} is overwritten by shared decoder config:"
106
+ f" {self.config.decoder}"
107
+ )
108
+
109
+ self.encoder.config = self.config.encoder
110
+ self.decoder.config = self.config.decoder
111
+
112
+ assert config.decoder.is_decoder
113
+ assert not config.decoder.is_encoder_decoder
114
+ assert 'pad_token_id' in self.decoder.config.__dict__
115
+ assert 'time_delta_monotonic_inversion' in self.decoder.config.__dict__
116
+ assert 'zero_time_delta_value' in self.decoder.config.__dict__
117
+ assert 'add_time_deltas' in self.decoder.config.__dict__
118
+
119
+ assert isinstance(self.decoder.config.time_delta_monotonic_inversion, bool)
120
+ assert isinstance(self.decoder.config.zero_time_delta_value, float)
121
+
122
+ for k, v in self.decoder.config.index_value_encoder_config.items():
123
+ setattr(
124
+ self,
125
+ f'{k}_index_value_encoder',
126
+ FNNEncoder(
127
+ num_features=v,
128
+ intermediate_size=self.decoder.config.index_value_encoder_intermediate_size,
129
+ decoder_hidden_size=self.decoder.config.hidden_size,
130
+ ),
131
+ )
132
+ if self.decoder.config.add_time_deltas:
133
+ self.time_delta_encoder = FNNEncoder(
134
+ num_features=1,
135
+ intermediate_size=self.decoder.config.index_value_encoder_intermediate_size,
136
+ decoder_hidden_size=self.decoder.config.hidden_size,
137
+ )
138
+ self.token_type_embeddings = torch.nn.Embedding(self.decoder.config.num_token_types, self.decoder.config.hidden_size)
139
+
140
+ @classmethod
141
+ def from_encoder_decoder_pretrained(
142
+ cls,
143
+ encoder_pretrained_model_name_or_path: str = None,
144
+ decoder_pretrained_model_name_or_path: str = None,
145
+ *model_args,
146
+ **kwargs,
147
+ ) -> PreTrainedModel:
148
+ r"""
149
+ Instantiate an encoder and a decoder from one or two base classes of the library from pretrained model
150
+ checkpoints.
151
+
152
+
153
+ The model is set in evaluation mode by default using `model.eval()` (Dropout modules are deactivated). To train
154
+ the model, you need to first set it back in training mode with `model.train()`.
155
+
156
+ Params:
157
+ encoder_pretrained_model_name_or_path (`str`, *optional*):
158
+ Information necessary to initiate the image encoder. Can be either:
159
+
160
+ - A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co. An
161
+ example is `google/vit-base-patch16-224-in21k`.
162
+ - A path to a *directory* containing model weights saved using
163
+ [`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
164
+ - A path or url to a *tensorflow index checkpoint file* (e.g, `./tf_model/model.ckpt.index`). In
165
+ this case, `from_tf` should be set to `True` and a configuration object should be provided as
166
+ `config` argument. This loading path is slower than converting the TensorFlow checkpoint in a
167
+ PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
168
+
169
+ decoder_pretrained_model_name_or_path (`str`, *optional*, defaults to `None`):
170
+ Information necessary to initiate the text decoder. Can be either:
171
+
172
+ - A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co.
173
+ - A path to a *directory* containing model weights saved using
174
+ [`~PreTrainedModel.save_pretrained`], e.g., `./my_model_directory/`.
175
+ - A path or url to a *tensorflow index checkpoint file* (e.g, `./tf_model/model.ckpt.index`). In
176
+ this case, `from_tf` should be set to `True` and a configuration object should be provided as
177
+ `config` argument. This loading path is slower than converting the TensorFlow checkpoint in a
178
+ PyTorch model using the provided conversion scripts and loading the PyTorch model afterwards.
179
+
180
+ model_args (remaining positional arguments, *optional*):
181
+ All remaining positional arguments will be passed to the underlying model's `__init__` method.
182
+
183
+ kwargs (remaining dictionary of keyword arguments, *optional*):
184
+ Can be used to update the configuration object (after it being loaded) and initiate the model (e.g.,
185
+ `output_attentions=True`).
186
+
187
+ - To update the encoder configuration, use the prefix *encoder_* for each configuration parameter.
188
+ - To update the decoder configuration, use the prefix *decoder_* for each configuration parameter.
189
+ - To update the parent model configuration, do not use a prefix for each configuration parameter.
190
+
191
+ Behaves differently depending on whether a `config` is provided or automatically loaded.
192
+
193
+ Example:
194
+
195
+ ```python
196
+ >>> from transformers import VisionEncoderDecoderModel
197
+
198
+ >>> # initialize a vit-bert from a pretrained ViT and a pretrained BERT model. Note that the cross-attention layers will be randomly initialized
199
+ >>> model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
200
+ ... "google/vit-base-patch16-224-in21k", "google-bert/bert-base-uncased"
201
+ ... )
202
+ >>> # saving model after fine-tuning
203
+ >>> model.save_pretrained("./vit-bert")
204
+ >>> # load fine-tuned model
205
+ >>> model = VisionEncoderDecoderModel.from_pretrained("./vit-bert")
206
+ ```"""
207
+
208
+ kwargs_encoder = {
209
+ argument[len("encoder_") :]: value for argument, value in kwargs.items() if argument.startswith("encoder_")
210
+ }
211
+
212
+ kwargs_decoder = {
213
+ argument[len("decoder_") :]: value for argument, value in kwargs.items() if argument.startswith("decoder_")
214
+ }
215
+
216
+ # remove encoder, decoder kwargs from kwargs
217
+ for key in kwargs_encoder.keys():
218
+ del kwargs["encoder_" + key]
219
+ for key in kwargs_decoder.keys():
220
+ del kwargs["decoder_" + key]
221
+
222
+ # Load and initialize the encoder and decoder
223
+ # The distinction between encoder and decoder at the model level is made
224
+ # by the value of the flag `is_decoder` that we need to set correctly.
225
+ encoder = kwargs_encoder.pop("model", None)
226
+ if encoder is None:
227
+ if encoder_pretrained_model_name_or_path is None:
228
+ raise ValueError(
229
+ "If `encoder_model` is not defined as an argument, a `encoder_pretrained_model_name_or_path` has "
230
+ "to be defined."
231
+ )
232
+
233
+ if "config" not in kwargs_encoder:
234
+ encoder_config, kwargs_encoder = transformers.AutoConfig.from_pretrained(
235
+ encoder_pretrained_model_name_or_path, **kwargs_encoder, return_unused_kwargs=True
236
+ )
237
+
238
+ if encoder_config.is_decoder is True or encoder_config.add_cross_attention is True:
239
+ logger.info(
240
+ f"Initializing {encoder_pretrained_model_name_or_path} as an encoder model "
241
+ "from a decoder model. Cross-attention and causal mask are disabled."
242
+ )
243
+ encoder_config.is_decoder = False
244
+ encoder_config.add_cross_attention = False
245
+
246
+ kwargs_encoder["config"] = encoder_config
247
+
248
+ encoder = transformers.AutoModel.from_pretrained(encoder_pretrained_model_name_or_path, *model_args, **kwargs_encoder)
249
+
250
+ decoder = kwargs_decoder.pop("model", None)
251
+ if decoder is None:
252
+ if decoder_pretrained_model_name_or_path is None:
253
+ raise ValueError(
254
+ "If `decoder_model` is not defined as an argument, a `decoder_pretrained_model_name_or_path` has "
255
+ "to be defined."
256
+ )
257
+
258
+ if "config" not in kwargs_decoder:
259
+ decoder_config, kwargs_decoder = transformers.AutoConfig.from_pretrained(
260
+ decoder_pretrained_model_name_or_path, **kwargs_decoder, return_unused_kwargs=True
261
+ )
262
+
263
+ if decoder_config.is_decoder is False or decoder_config.add_cross_attention is False:
264
+ logger.info(
265
+ f"Initializing {decoder_pretrained_model_name_or_path} as a decoder model. Cross attention"
266
+ f" layers are added to {decoder_pretrained_model_name_or_path} and randomly initialized if"
267
+ f" {decoder_pretrained_model_name_or_path}'s architecture allows for cross attention layers."
268
+ )
269
+ decoder_config.is_decoder = True
270
+ decoder_config.add_cross_attention = False
271
+
272
+ kwargs_decoder["config"] = decoder_config
273
+
274
+ if kwargs_decoder["config"].is_decoder is False or kwargs_decoder["config"].add_cross_attention is False:
275
+ logger.warning(
276
+ f"Decoder model {decoder_pretrained_model_name_or_path} is not initialized as a decoder. "
277
+ f"In order to initialize {decoder_pretrained_model_name_or_path} as a decoder, "
278
+ "make sure that the attributes `is_decoder` and `add_cross_attention` of `decoder_config` "
279
+ "passed to `.from_encoder_decoder_pretrained(...)` are set to `True` or do not pass a "
280
+ "`decoder_config` to `.from_encoder_decoder_pretrained(...)`"
281
+ )
282
+
283
+ decoder = transformers.AutoModelForCausalLM.from_pretrained(decoder_pretrained_model_name_or_path, **kwargs_decoder)
284
+
285
+ # instantiate config with corresponding kwargs
286
+ config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(encoder.config, decoder.config, **kwargs)
287
+
288
+ # make sure input & output embeddings is not tied
289
+ config.tie_word_embeddings = False
290
+ return cls(encoder=encoder, decoder=decoder, config=config)
291
+
292
+ def forward(
293
+ self,
294
+ decoder_input_ids: Optional[torch.LongTensor] = None,
295
+ decoder_attention_mask: Optional[torch.FloatTensor] = None,
296
+ decoder_token_type_ids: Optional[torch.LongTensor] = None,
297
+ encoder_outputs: Optional[Tuple[torch.FloatTensor]] = None,
298
+ past_key_values: Optional[Tuple[Tuple[torch.FloatTensor]]] = None,
299
+ decoder_inputs_embeds: Optional[torch.FloatTensor] = None,
300
+ decoder_position_ids: Optional[torch.LongTensor] = None,
301
+ labels: Optional[torch.LongTensor] = None,
302
+ use_cache: Optional[bool] = None,
303
+ output_attentions: Optional[bool] = None,
304
+ output_hidden_states: Optional[bool] = None,
305
+ return_dict: Optional[bool] = None,
306
+ **kwargs,
307
+ ) -> Union[Tuple[torch.FloatTensor], Seq2SeqLMOutput]:
308
+
309
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
310
+
311
+ kwargs_decoder = {
312
+ argument[len("decoder_") :]: value for argument, value in kwargs.items() if argument.startswith("decoder_")
313
+ }
314
+
315
+ assert decoder_position_ids is not None
316
+ assert decoder_attention_mask is not None
317
+ assert decoder_attention_mask.dtype == torch.long, f'The dtype for {decoder_attention_mask} was {decoder_attention_mask.dtype}. It should be torch.long'
318
+ assert decoder_token_type_ids is not None
319
+
320
+ if decoder_inputs_embeds is None:
321
+ decoder_inputs_embeds = self.decoder.get_input_embeddings()(decoder_input_ids)
322
+ decoder_inputs_embeds += self.token_type_embeddings(decoder_token_type_ids)
323
+
324
+ # Generation:
325
+ decoder_outputs = self.decoder(
326
+ inputs_embeds=decoder_inputs_embeds,
327
+ attention_mask=decoder_attention_mask,
328
+ position_ids=decoder_position_ids,
329
+ output_attentions=output_attentions,
330
+ output_hidden_states=output_hidden_states,
331
+ use_cache=use_cache,
332
+ past_key_values=past_key_values,
333
+ return_dict=return_dict,
334
+ **kwargs_decoder,
335
+ )
336
+
337
+ # Loss:
338
+ loss = None
339
+ if labels is not None:
340
+ logits = decoder_outputs.logits if return_dict else decoder_outputs[0]
341
+ loss_fct = CrossEntropyLoss()
342
+ loss = loss_fct(logits.reshape(-1, self.decoder.config.vocab_size), labels.reshape(-1))
343
+
344
+ if not return_dict:
345
+ if loss is not None:
346
+ return (loss,) + decoder_outputs + encoder_outputs
347
+ else:
348
+ return decoder_outputs + encoder_outputs
349
+
350
+ return Seq2SeqLMOutput(
351
+ loss=loss,
352
+ logits=decoder_outputs.logits,
353
+ past_key_values=decoder_outputs.past_key_values,
354
+ decoder_hidden_states=decoder_outputs.hidden_states,
355
+ decoder_attentions=decoder_outputs.attentions,
356
+ )
357
+
358
+ def prepare_inputs_for_generation(
359
+ self,
360
+ input_ids,
361
+ special_token_ids,
362
+ prompt_attention_mask,
363
+ prompt_position_ids,
364
+ token_type_id_sections=None,
365
+ past_key_values=None,
366
+ use_cache=None,
367
+ **kwargs,
368
+ ):
369
+ """
370
+ Modification of:
371
+ https://github.com/huggingface/transformers/blob/main/src/transformers/models/encoder_decoder/modeling_encoder_decoder.py#L660
372
+ """
373
+
374
+ report_attention_mask = (input_ids != self.decoder.config.pad_token_id).long()
375
+
376
+ if past_key_values is None:
377
+
378
+ # 4D attention mask:
379
+ decoder_attention_mask = self.create_4d_attention_mask_mixed_causality(prompt_attention_mask, report_attention_mask)
380
+
381
+ # Position identifiers accounting for padding:
382
+ report_position_ids = report_attention_mask.cumsum(-1) + prompt_position_ids.max(dim=1).values[:, None]
383
+ report_position_ids.masked_fill_(report_attention_mask == 0, 1)
384
+ decoder_position_ids = torch.cat([prompt_position_ids, report_position_ids], dim=1)
385
+
386
+ # `inputs_embeds` are only to be used in the 1st generation step:
387
+ inputs_embeds = torch.cat([kwargs['decoder_inputs_embeds'], self.decoder.get_input_embeddings()(input_ids)], dim=1)
388
+
389
+ decoder_token_type_ids = self.token_ids_to_token_type_ids(input_ids, special_token_ids, token_type_id_sections)
390
+ decoder_token_type_ids = torch.cat(
391
+ [
392
+ kwargs['decoder_token_type_ids'],
393
+ decoder_token_type_ids,
394
+ ],
395
+ dim=1,
396
+ ) # Add image token type identifiers.
397
+
398
+ input_dict = {
399
+ 'decoder_input_ids': input_ids,
400
+ 'decoder_inputs_embeds': inputs_embeds,
401
+ 'decoder_token_type_ids': decoder_token_type_ids,
402
+ }
403
+ else:
404
+
405
+ # 4D attention mask:
406
+ decoder_attention_mask = self.create_4d_attention_mask_mixed_causality_past_key_values(prompt_attention_mask, report_attention_mask)
407
+
408
+ # Position identifiers accounting for padding:
409
+ decoder_position_ids = report_attention_mask.cumsum(-1) + prompt_position_ids.max(dim=1).values[:, None]
410
+ decoder_position_ids.masked_fill_(report_attention_mask == 0, 1)
411
+
412
+ # Always place token_ids_to_token_type_ids_past_key_values before input_ids = input_ids[:, remove_prefix_length:]:
413
+ decoder_token_type_ids = self.token_ids_to_token_type_ids_past_key_values(input_ids, special_token_ids, token_type_id_sections)
414
+ decoder_position_ids = decoder_position_ids[:, -1:]
415
+
416
+ past_length = past_key_values[0][0].shape[2]
417
+
418
+ # Some generation methods only pass the last input ID:
419
+ if input_ids.shape[1] > past_length:
420
+ remove_prefix_length = past_length
421
+ else:
422
+ # Keep only the final ID:
423
+ remove_prefix_length = input_ids.shape[1] - 1
424
+
425
+ input_ids = input_ids[:, remove_prefix_length:]
426
+
427
+ input_dict = {'decoder_input_ids': input_ids, 'decoder_token_type_ids': decoder_token_type_ids}
428
+
429
+ input_dict.update(
430
+ {
431
+ 'decoder_attention_mask': decoder_attention_mask,
432
+ 'decoder_position_ids': decoder_position_ids,
433
+ 'past_key_values': past_key_values,
434
+ 'use_cache': use_cache,
435
+ }
436
+ )
437
+ return input_dict
438
+
439
+ def token_ids_to_token_type_ids(self, token_ids, special_token_ids, token_type_id_sections=None):
440
+ """
441
+ Extract token type identifiers from the token identifiers.
442
+
443
+ Argument/s:
444
+ token_ids - token identifiers.
445
+ special_token_ids - special token identifiers that indicate the separation between sections.
446
+ token_type_id_section - token type identifier for each section.
447
+
448
+ Returns:
449
+ token_type_ids - token type identifiers.
450
+ """
451
+
452
+ token_type_id_sections = token_type_id_sections if token_type_id_sections is not None else list(range(len(special_token_ids) + 1))
453
+
454
+ mbatch_size, seq_len = token_ids.shape
455
+ token_type_ids = torch.full_like(token_ids, token_type_id_sections[0], dtype=torch.long, device=token_ids.device)
456
+
457
+ for i, j in enumerate(special_token_ids):
458
+ # Find first occurrence of special tokens that indicate the boundary between sections:
459
+ cols = (token_ids == j).int().argmax(dim=1)
460
+ rows = torch.arange(mbatch_size, device=token_ids.device)
461
+
462
+ # https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertTokenizer.create_token_type_ids_from_sequences.example
463
+ cols += 1
464
+
465
+ # Ensure that the column index is not out of bounds. If 0, then token_id not present.
466
+ # This is safe as index 0 is always a special token (now equal to 1 due to +1):
467
+ rows = rows[torch.logical_and(cols != 1, cols < seq_len)]
468
+ cols = cols[torch.logical_and(cols != 1, cols < seq_len)]
469
+
470
+ # Indices that correspond to the second sequence:
471
+ if rows.nelement() != 0:
472
+ ids = torch.stack([
473
+ torch.stack([x, z]) for (x, y) in zip(rows, cols) for z in torch.arange(
474
+ y, seq_len, device=token_ids.device,
475
+ )
476
+ ])
477
+
478
+ token_type_ids[ids[:, 0], ids[:, 1]] = token_type_id_sections[i + 1]
479
+
480
+ return token_type_ids
481
+
482
+ def token_ids_to_token_type_ids_past_key_values(self, token_ids, special_token_ids, token_type_id_sections=None):
483
+ """
484
+ Extract token type identifiers from the token identifiers if past != None. Make sure to input all the
485
+ token_ids (e.g., do not input input_ids = input_ids[:, remove_prefix_length:] from prepare_inputs_for_generation).
486
+
487
+ Argument/s:
488
+ token_ids - token identifiers.
489
+ special_token_ids - special token identifiers that indicate the separation between sections.
490
+
491
+ Returns:
492
+ token_type_ids - token type identifiers.
493
+ """
494
+
495
+ token_type_id_sections = token_type_id_sections if token_type_id_sections is not None else list(range(len(special_token_ids) + 1))
496
+ token_type_ids = torch.full([token_ids.shape[0], 1], token_type_id_sections[0], dtype=torch.long, device=token_ids.device)
497
+
498
+ # https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertTokenizer.create_token_type_ids_from_sequences.example
499
+ token_ids = token_ids[:, :-1]
500
+
501
+ for i, j in enumerate(special_token_ids):
502
+
503
+ # Find first occurrence of special token, which indicates the boundary between sections:
504
+ exists = torch.any(token_ids == j, dim=1, keepdim=True)
505
+ token_type_ids[exists] = token_type_id_sections[i + 1]
506
+
507
+ return token_type_ids
508
+
509
+ def tokenize_report_teacher_forcing(self, findings: str, impression: str, tokenizer: PreTrainedTokenizerFast, max_len: int):
510
+ """
511
+ Tokenize the reports and create the inputs and targets for teacher forcing.
512
+
513
+ Argument/s:
514
+ findings - findings sections.
515
+ impression - impression sections.
516
+ return_token_type_ids - return the token type identifiers.
517
+ tokenizer - Hugging Face tokenizer.
518
+ max_len - maximum number of tokens.
519
+
520
+ Returns:
521
+ decoder_input_ids - the token identifiers for the input of the decoder.
522
+ decoder_attention_mask - the attention mask for the decoder_input_ids.
523
+ label_ids - the label token identifiers for the decoder.
524
+ """
525
+
526
+ # Prepare the sections for the tokenizer by placing special tokens between each section:
527
+ reports = [f'{tokenizer.bos_token}{i}{tokenizer.sep_token}{j}{tokenizer.eos_token}' for i, j in
528
+ zip(findings, impression)]
529
+
530
+ # Tokenize the report:
531
+ tokenized = tokenizer(
532
+ reports,
533
+ padding='longest',
534
+ truncation=True,
535
+ max_length=max_len + 1, # +1 to account for the bias between input and target.
536
+ return_tensors='pt',
537
+ return_token_type_ids=False,
538
+ add_special_tokens=False,
539
+ ).to(self.device)
540
+
541
+ # Modify for language modelling:
542
+ batch_dict = {
543
+
544
+ # Labels for the decoder (shifted right by one for autoregression):
545
+ 'label_ids': tokenized['input_ids'][:, 1:].detach().clone(),
546
+
547
+ # Remove last token identifier to match the sequence length of the labels:
548
+ 'decoder_input_ids': tokenized['input_ids'][:, :-1],
549
+
550
+ # Attention mask for the decoder_input_ids (remove first token so that the eos_token_id is not considered):
551
+ 'decoder_attention_mask': tokenized['attention_mask'][:, 1:],
552
+ }
553
+
554
+ return batch_dict
555
+
556
+ def tokenize_report_teacher_forcing_rev_a(self, tokenizer: PreTrainedTokenizerFast, max_len: int, findings: Optional[str] = None, impression: Optional[str] = None, reports: Optional[str] = None):
557
+ """
558
+ Tokenize the reports and create the inputs and targets for teacher forcing.
559
+
560
+ Argument/s:
561
+ tokenizer - Hugging Face tokenizer.
562
+ max_len - maximum number of tokens.
563
+ findings - findings sections.
564
+ impression - impression sections.
565
+ reports - prepared reports, with special tokens and report sections.
566
+
567
+ Returns:
568
+ decoder_input_ids - the token identifiers for the input of the decoder.
569
+ decoder_attention_mask - the attention mask for the decoder_input_ids.
570
+ label_ids - the label token identifiers for the decoder.
571
+ """
572
+
573
+ # Prepare the sections for the tokenizer by placing special tokens between each section:
574
+ if reports is None:
575
+ assert findings and impression, "If 'reports' is not defined, 'findings' and 'impression' need to be defined."
576
+ reports = [f'{tokenizer.bos_token}{i}{tokenizer.sep_token}{j}{tokenizer.eos_token}' for i, j in
577
+ zip(findings, impression)]
578
+
579
+ # Tokenize the report:
580
+ tokenized = tokenizer(
581
+ reports,
582
+ padding='longest',
583
+ truncation=True,
584
+ max_length=max_len + 1, # +1 to account for the bias between input and target.
585
+ return_tensors='pt',
586
+ return_token_type_ids=False,
587
+ add_special_tokens=False,
588
+ ).to(self.device)
589
+
590
+ # Modify for language modelling:
591
+ batch_dict = {
592
+
593
+ # Labels for the decoder (shifted right by one for autoregression):
594
+ 'label_ids': tokenized['input_ids'][:, 1:].detach().clone(),
595
+
596
+ # Remove last token identifier to match the sequence length of the labels:
597
+ 'decoder_input_ids': tokenized['input_ids'][:, :-1],
598
+
599
+ # Attention mask for the decoder_input_ids (remove first token so that the eos_token_id is not considered):
600
+ 'decoder_attention_mask': tokenized['attention_mask'][:, 1:],
601
+ }
602
+
603
+ return batch_dict
604
+
605
+ def split_and_decode_sections(self, token_ids, special_token_ids, tokenizer: PreTrainedTokenizerFast):
606
+ """
607
+ Split the token identifiers into sections, then convert the token identifiers into strings.
608
+
609
+ Argument/s:
610
+ token_ids - token identifiers.
611
+ special_token_ids - special token identifiers that indicate the end of each section.
612
+ tokenizer - Hugging Face tokenizer.
613
+
614
+ Returns:
615
+ sections - a tuple containing the decoded strings for each section.
616
+ """
617
+
618
+ _, seq_len = token_ids.shape
619
+
620
+ # The number of sections is the same as the number of special_token_ids:
621
+ num_sections = len(special_token_ids)
622
+
623
+ sections = {k: [] for k in range(num_sections)}
624
+
625
+ for i in token_ids:
626
+ prev_col = 0
627
+ for j, k in enumerate(special_token_ids):
628
+
629
+ # The maximum sequence length was exceeded, thus no more tokens:
630
+ if prev_col >= seq_len:
631
+ sections[j].append('')
632
+ continue
633
+
634
+ # Find first occurrence of special tokens that indicate the boundary between sections:
635
+ col = (i == k).int().argmax().item()
636
+
637
+ # If col equals 0, the special token was not found; set the column to the sequence length (as the decoder exceeded
638
+ # the maximum sequence length):
639
+ if col == 0:
640
+ col = seq_len
641
+
642
+ # Extract section token identifiers:
643
+ section_token_ids = i[prev_col:col]
644
+ prev_col = col
645
+ section_string = tokenizer.decode(section_token_ids, skip_special_tokens=True)
646
+
647
+ sections[j].append(section_string)
648
+
649
+ return tuple(sections.values())
650
+
651
+ def tokenize_text_columns(self, tokenizer: PreTrainedTokenizerFast, **kwargs):
652
+ """
653
+ Tokenize the text columns from MIMIC-IV ED and MIMIC-CXR (excluding the findings and impression sections).
654
+ Time deltas for the input_ids are also prepared here.
655
+
656
+ Argument/s:
657
+ tokenizer - Hugging Face tokenizer.
658
+
659
+ Returns:
660
+ ed - dictionary containing the input_ids, token_type_ids, attention_mask and time_deltas for the ED module columns.
661
+ cxr - dictionary containing the input_ids, token_type_ids, and attention_mask for MIMIC-CXR columns.
662
+ """
663
+
664
+ batch_size = len(kwargs['index'])
665
+
666
+ tokenized = {
667
+ 'input_ids': {i: [] for i in range(batch_size)},
668
+ 'token_type_ids': {i: [] for i in range(batch_size)},
669
+ 'time_delta': {i: [] for i in range(batch_size)},
670
+ 'attention_mask': torch.empty(batch_size, 0, 1, device=self.device),
671
+ }
672
+
673
+ for i in self.decoder.config.ed_module_columns + self.decoder.config.mimic_cxr_columns + ['previous_findings', 'previous_impression']:
674
+ if i in kwargs:
675
+ if f'{i}_time_delta' not in kwargs:
676
+ kwargs[f'{i}_time_delta'] = [[self.decoder.config.zero_time_delta_value for _ in j] if j is not None else None for j in kwargs[i]]
677
+ for x, (y, z) in enumerate(zip(kwargs[i], kwargs[f'{i}_time_delta'])):
678
+ if y is not None:
679
+ assert isinstance(y, list)
680
+ assert isinstance(z, list)
681
+ for text, time_delta in zip(y, z):
682
+ tokenized['input_ids'][x].append(
683
+ tokenizer(text, add_special_tokens=False, return_tensors='pt')['input_ids'].to(device=self.device)
684
+ )
685
+ tokenized['token_type_ids'][x].append(
686
+ torch.full(
687
+ (1, tokenized['input_ids'][x][-1].shape[-1]),
688
+ self.decoder.config.token_type_to_token_type_id[i],
689
+ dtype=torch.long,
690
+ device=self.device,
691
+ )
692
+ )
693
+ tokenized['time_delta'][x].append(
694
+ torch.full(
695
+ (1, tokenized['input_ids'][x][-1].shape[-1]),
696
+ time_delta,
697
+ dtype=torch.float32,
698
+ device=self.device,
699
+ )
700
+ )
701
+
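+ # Concatenate each example's token sequences into a single (length, 1) tensor, then pad the examples to
+ # the longest sequence in the mini-batch; the trailing singleton dimension is dropped with [:, :, 0].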
702
+ tokenized['input_ids'] = [torch.cat(j, dim=1).T if j else torch.empty(0, 1, dtype=torch.long, device=self.device) for j in tokenized['input_ids'].values()]
703
+ tokenized['token_type_ids'] = [torch.cat(j, dim=1).T if j else torch.empty(0, 1, dtype=torch.long, device=self.device) for j in tokenized['token_type_ids'].values()]
704
+ tokenized['time_delta'] = [torch.cat(j, dim=1).T if j else torch.empty(0, 1, device=self.device) for j in tokenized['time_delta'].values()]
705
+
706
+ tokenized['input_ids'] = torch.nn.utils.rnn.pad_sequence(
707
+ tokenized['input_ids'], batch_first=True, padding_value=tokenizer.pad_token_id
708
+ )[:, :, 0]
709
+ tokenized['token_type_ids'] = torch.nn.utils.rnn.pad_sequence(
710
+ tokenized['token_type_ids'], batch_first=True, padding_value=0,
711
+ )[:, :, 0]
712
+
713
+ tokenized['attention_mask'] = (tokenized['input_ids'] != tokenizer.pad_token_id).int()
714
+
715
+ tokenized['time_delta'] = torch.nn.utils.rnn.pad_sequence(
716
+ tokenized['time_delta'], batch_first=True, padding_value=0,
717
+ )
718
+
719
+ return tokenized
720
+
721
+ def prepare_inputs(
722
+ self,
723
+ images,
724
+ tokenizer: PreTrainedTokenizerFast,
725
+ tokenized_report=None,
726
+ sep_token_id=None,
727
+ section_ids=None,
728
+ **batch,
729
+ ):
730
+ """
731
+ Prepare the inputs for the decoder, i.e., the prompt embeddings, attention mask, token type identifiers, position identifiers, and BOS token identifiers.
732
+
733
+ Argument/s:
734
+ images - images.
735
+ tokenizer - Hugging Face tokenizer.
736
+ tokenized_report - during training/teacher forcing, the tokenized report dict to include in the prepared inputs.
737
+ sep_token_id - separator token identifier.
738
+ section_ids - section identifiers for the findings and impression sections.
739
+
740
+ Returns:
741
+ inputs_embeds - input embeddings.
742
+ attention_mask - attention mask.
743
+ token_type_ids - token type identifiers.
744
+ position_ids - position identifiers.
745
+ bos_token_ids - bos_token_ids for generation.
746
+ """
747
+
748
+ input_ids = []
749
+ inputs_embeds = []
750
+ token_type_ids = []
751
+ attention_mask = []
752
+ time_delta = []
753
+ position_ids = None
754
+ bos_token_ids = None
755
+
756
+ # Index and value columns:
757
+ batch_size = len(batch['index'])
758
+ for k in self.decoder.config.index_value_encoder_config.keys():
759
+ if f'{k}_index_value_feats' not in batch:
760
+ batch[f'{k}_index_value_feats'] = torch.empty(batch_size, 0, self.decoder.config.index_value_encoder_config[k], device=self.device)
761
+ inputs_embeds.append(
762
+ getattr(self, f'{k}_index_value_encoder')(batch[f'{k}_index_value_feats'])
763
+ )
764
+ token_type_ids.append(batch[f'{k}_index_value_token_type_ids'] if f'{k}_index_value_token_type_ids' in batch else torch.empty(batch_size, 0, dtype=torch.long, device=self.device))
765
+ attention_mask.append(batch[f'{k}_index_value_mask'] if f'{k}_index_value_mask' in batch else torch.empty(batch_size, 0, dtype=torch.long, device=self.device))
766
+ if f'{k}_time_delta' in batch:
767
+ time_delta.append(batch[f'{k}_time_delta'])
768
+ else:
769
+ time_delta_index_value = torch.zeros(*batch[f'{k}_index_value_mask'].shape, 1, device=self.device) if f'{k}_index_value_mask' in batch else torch.empty(batch_size, 0, 1, device=self.device)
770
+ time_delta.append(time_delta_index_value)
771
+
772
+ # Tokenize text columns for prompt:
773
+ tokenized = self.tokenize_text_columns(tokenizer, **batch)
774
+ input_ids.append(tokenized['input_ids'])
775
+ token_type_ids.append(tokenized['token_type_ids'])
776
+ attention_mask.append(tokenized['attention_mask'])
777
+ time_delta.append(tokenized['time_delta'])
778
+
779
+ # Image encoder:
780
+ encoder_outputs = self.encoder(images)
781
+ inputs_embeds.append(encoder_outputs[0])
782
+ inputs_per_image = encoder_outputs[0].shape[-2] // images.shape[1]
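+ # Each image contributes inputs_per_image feature positions, so its time delta is repeated across them;
+ # positions with the zero time delta are treated as the current image and the rest as previous images
+ # when assigning the token type identifiers below.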
783
+ padded_image_time_deltas = [i + [self.decoder.config.zero_time_delta_value] * (images.shape[1] - len(i)) for i in batch['image_time_deltas']]
784
+ time_delta_image_features = torch.tensor(padded_image_time_deltas, device=self.device).repeat_interleave(inputs_per_image, dim=1)
785
+ token_type_ids.append(
786
+ torch.where(
787
+ time_delta_image_features == self.decoder.config.zero_time_delta_value,
788
+ self.decoder.config.token_type_to_token_type_id['image'],
789
+ self.decoder.config.token_type_to_token_type_id['previous_image'],
790
+ ),
791
+ )
792
+ attention_mask.append(encoder_outputs[1])
793
+ time_delta.append(time_delta_image_features[:, :, None])
794
+
795
+ # Compute embeddings from token identifiers:
796
+ input_ids = torch.cat(input_ids, dim=1)
797
+ inputs_embeds.append(self.decoder.get_input_embeddings()(input_ids))
798
+
799
+ # Concatenate time deltas and input embeddings before adding the time delta embeddings to the prompt:
800
+ time_delta = torch.cat(time_delta, dim=1)
801
+ inputs_embeds = torch.cat(inputs_embeds, dim=1)
802
+
803
+ # Add time delta embeddings to prompt:
804
+ if time_delta.shape[1] > 0 and self.decoder.config.add_time_deltas:
805
+ time_delta = time_delta.to(dtype=inputs_embeds.dtype)
806
+ inputs_embeds += self.time_delta_encoder(time_delta)
807
+
808
+ # Concatenate the attention mask:
809
+ attention_mask = torch.cat(attention_mask, dim=1)
810
+
811
+ # Position identifiers:
812
+ position_ids = self.position_ids_from_time_deltas_and_attention_mask(time_delta, attention_mask)
813
+
814
+ # Tokenize report:
815
+ if tokenized_report is not None:
816
+ inputs_embeds = torch.cat([inputs_embeds, self.decoder.get_input_embeddings()(tokenized_report['decoder_input_ids'])], dim=1)
817
+
818
+ report_token_type_ids = self.token_ids_to_token_type_ids(
819
+ token_ids=tokenized_report['decoder_input_ids'],
820
+ special_token_ids=[sep_token_id],
821
+ token_type_id_sections=section_ids,
822
+ )
823
+ token_type_ids.append(report_token_type_ids)
824
+
825
+ # Position identifiers accounting for padding:
826
+ report_position_ids = tokenized_report['decoder_attention_mask'].cumsum(-1) + position_ids.max(dim=1).values[:, None]
827
+ report_position_ids.masked_fill_(tokenized_report['decoder_attention_mask'] == 0, 1)
828
+ position_ids = torch.cat([position_ids, report_position_ids], dim=1)
829
+
830
+ # 4D attention mask:
831
+ attention_mask = self.create_4d_attention_mask_mixed_causality(attention_mask, tokenized_report['decoder_attention_mask'])
832
+ # attention_mask_diagonal = torch.diagonal(attention_mask[:, 0], dim1=1, dim2=2)
833
+
834
+ else:
835
+
836
+ # BOS token identifiers for inference/generation:
837
+ bos_token_ids = torch.full((encoder_outputs[0].shape[0], 1), tokenizer.bos_token_id, dtype=torch.long, device=self.device)
838
+
839
+ # Concatenate the token type identifiers:
840
+ token_type_ids = torch.cat(token_type_ids, dim=1)
841
+
842
+ assert inputs_embeds.shape[1] == attention_mask.shape[-1]
843
+ assert inputs_embeds.shape[1] == token_type_ids.shape[1]
844
+
845
+ return inputs_embeds, attention_mask, token_type_ids, position_ids, bos_token_ids
846
+
847
+ @staticmethod
848
+ def create_4d_attention_mask_mixed_causality(non_causal_2d_attention_mask, causal_2d_attention_mask):
849
+
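+ # The returned mask (batch, 1, total_len, total_len) has four blocks: the prompt attends to the prompt
+ # bidirectionally (upper left), the report attends to the prompt (lower left) and to itself causally
+ # (lower right), and the prompt cannot attend to the report (upper right, all zeros).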
850
+ prompt_seq_len = non_causal_2d_attention_mask.shape[-1]
851
+ report_seq_len = causal_2d_attention_mask.shape[-1]
852
+
853
+ non_causal_2d_attention_mask = non_causal_2d_attention_mask[:, None, None, :]
854
+ causal_2d_attention_mask = causal_2d_attention_mask[:, None, None, :]
855
+
856
+ # Upper left of attention matrix:
857
+ upper_left = non_causal_2d_attention_mask.expand(-1, -1, prompt_seq_len, -1)
858
+ upper_left = upper_left * non_causal_2d_attention_mask
859
+ upper_left = upper_left * non_causal_2d_attention_mask.permute(0, 1, 3, 2)
860
+
861
+ causal_mask = torch.tril(
862
+ torch.ones(
863
+ (
864
+ report_seq_len,
865
+ report_seq_len,
866
+ ),
867
+ dtype=torch.long,
868
+ device=causal_2d_attention_mask.device,
869
+ ),
870
+ )
871
+
872
+ # Lower right of attention matrix:
873
+ lower_right = causal_2d_attention_mask.expand(-1, -1, report_seq_len, -1)
874
+ lower_right = lower_right * causal_2d_attention_mask.permute(0, 1, 3, 2)
875
+ lower_right = lower_right * causal_mask
876
+
877
+ # Upper right of attention matrix:
878
+ upper_right = torch.zeros(
879
+ causal_2d_attention_mask.shape[0],
880
+ 1,
881
+ prompt_seq_len,
882
+ report_seq_len,
883
+ dtype=torch.long,
884
+ device=causal_2d_attention_mask.device,
885
+ )
886
+
887
+ # Lower left of attention matrix:
888
+ lower_left = non_causal_2d_attention_mask.expand(-1, -1, report_seq_len, -1)
889
+ lower_left = lower_left * causal_2d_attention_mask.permute(0, 1, 3, 2)
890
+
891
+ left = torch.cat((upper_left, lower_left), dim=2)
892
+ right = torch.cat((upper_right, lower_right), dim=2)
893
+
894
+ mixed_causality_4d_attention_mask = torch.cat((left, right), dim=-1)
895
+ return mixed_causality_4d_attention_mask
896
+
897
+ @staticmethod
898
+ def create_4d_attention_mask_mixed_causality_past_key_values(non_causal_2d_attention_mask, causal_2d_attention_mask):
899
+
900
+ non_causal_2d_attention_mask = non_causal_2d_attention_mask[:, None, None, :]
901
+ causal_2d_attention_mask = causal_2d_attention_mask[:, None, None, :]
902
+
903
+ mixed_causality_4d_attention_mask = torch.cat((non_causal_2d_attention_mask, causal_2d_attention_mask), dim=-1)
904
+ return mixed_causality_4d_attention_mask
905
+
906
+ def position_ids_from_time_deltas_and_attention_mask(self, time_deltas, attention_mask):
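+ # Assign position identifiers by ranking the prompt elements by their time deltas (direction controlled
+ # by time_delta_monotonic_inversion); padded positions are replaced with the minimum representable value
+ # before sorting and are then set to 1.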
907
+ _, col_indices = torch.sort(torch.where(attention_mask == 1, time_deltas[:, :, 0], torch.finfo(time_deltas.dtype).min), descending=not self.decoder.config.time_delta_monotonic_inversion)
908
+
909
+ num_rows, num_cols, _ = time_deltas.shape
910
+
911
+ row_indices = torch.arange(num_rows, device=time_deltas.device).view(-1, 1).repeat(1, num_cols).view(-1)
912
+ position_ids = torch.zeros_like(col_indices, device=time_deltas.device)
913
+ position_ids[row_indices, col_indices.flatten()] = torch.arange(num_cols, device=time_deltas.device)[None, :].expand(num_rows, -1).flatten()
914
+ position_ids.masked_fill_(attention_mask == 0, 1) # Following: https://github.com/huggingface/transformers/blob/c5f0288bc7d76f65996586f79f69fba8867a0e67/src/transformers/models/llama/modeling_llama.py#L1285
915
+
916
+ return position_ids
917
+
918
+ @staticmethod
919
+ def prepare_data(physionet_dir, database_path, dataset_dir=None):
920
+
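+ # Build a DuckDB database from the MIMIC-CXR, MIMIC-CXR-JPG, and MIMIC-IV-ED CSV files: the sectioned
+ # MIMIC-CXR reports, the lookup tables for the categorical (index) columns of the ED tables, and the
+ # study_id-to-stay_id mappings are all created here.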
921
+ dataset_dir = physionet_dir if dataset_dir is None else dataset_dir
922
+
923
+ sectioned_dir = os.path.join(dataset_dir, 'mimic_cxr_sectioned')
924
+
925
+ mimic_cxr_sectioned_path = os.path.join(sectioned_dir, 'mimic_cxr_sectioned.csv')
926
+ if not os.path.exists(mimic_cxr_sectioned_path):
927
+ print(f'{mimic_cxr_sectioned_path} does not exist, creating...')
928
+
929
+ # Check if the reports exist. Only the reports for the first and last patients are checked; this compromises comprehensiveness for speed:
930
+ report_paths = [
931
+ os.path.join(physionet_dir, 'mimic-cxr/2.0.0/files/p10/p10000032/s50414267.txt'),
932
+ os.path.join(physionet_dir, 'mimic-cxr/2.0.0/files/p10/p10000032/s53189527.txt'),
933
+ os.path.join(physionet_dir, 'mimic-cxr/2.0.0/files/p10/p10000032/s53911762.txt'),
934
+ os.path.join(physionet_dir, 'mimic-cxr/2.0.0/files/p10/p10000032/s56699142.txt'),
935
+ os.path.join(physionet_dir, 'mimic-cxr/2.0.0/files/p19/p19999987/s55368167.txt'),
936
+ os.path.join(physionet_dir, 'mimic-cxr/2.0.0/files/p19/p19999987/s58621812.txt'),
937
+ os.path.join(physionet_dir, 'mimic-cxr/2.0.0/files/p19/p19999987/s58971208.txt'),
938
+ ]
939
+ assert all([os.path.isfile(i) for i in report_paths]), f"""The reports do not exist for the following glob pattern: {os.path.join(physionet_dir, 'mimic-cxr/2.0.0/files/p1*/p1*/s*.txt')}.
940
+ Please download them using wget -r -N -c -np --reject dcm --user <username> --ask-password https://physionet.org/files/mimic-cxr/2.0.0/"""
941
+
942
+ print('Extracting sections from reports...')
943
+ create_sectioned_files(
944
+ reports_path=os.path.join(physionet_dir, 'mimic-cxr', '2.0.0', 'files'),
945
+ output_path=sectioned_dir,
946
+ no_split=True,
947
+ )
948
+
949
+ if not os.path.exists(database_path):
950
+
951
+ connect = duckdb.connect(database_path)
952
+
953
+ csv_paths = []
954
+ csv_paths.append(glob(os.path.join(physionet_dir, 'mimic-iv-ed', '*', 'ed', 'edstays.csv.gz'))[0])
955
+ csv_paths.append(glob(os.path.join(physionet_dir, 'mimic-iv-ed', '*', 'ed', 'medrecon.csv.gz'))[0])
956
+ csv_paths.append(glob(os.path.join(physionet_dir, 'mimic-iv-ed', '*', 'ed', 'pyxis.csv.gz'))[0])
957
+ csv_paths.append(glob(os.path.join(physionet_dir, 'mimic-iv-ed', '*', 'ed', 'triage.csv.gz'))[0])
958
+ csv_paths.append(glob(os.path.join(physionet_dir, 'mimic-iv-ed', '*', 'ed', 'vitalsign.csv.gz'))[0])
959
+
960
+ base_names = [os.path.basename(i) for i in csv_paths]
961
+
962
+ for i in ['edstays.csv.gz', 'medrecon.csv.gz', 'pyxis.csv.gz', 'triage.csv.gz', 'vitalsign.csv.gz']:
963
+ assert i in base_names, f"""Table {i} is missing from MIMIC-IV-ED.
964
+ Please download the tables from https://physionet.org/content/mimic-iv-ed. Do not decompress them."""
965
+
966
+ csv_paths.append(glob(os.path.join(physionet_dir, 'mimic-cxr-jpg', '*', 'mimic-cxr-2.0.0-metadata.csv.gz'))[0])
967
+ csv_paths.append(glob(os.path.join(physionet_dir, 'mimic-cxr-jpg', '*', 'mimic-cxr-2.0.0-chexpert.csv.gz'))[0])
968
+ csv_paths.append(glob(os.path.join(physionet_dir, 'mimic-cxr-jpg', '*', 'mimic-cxr-2.0.0-split.csv.gz'))[0])
969
+
970
+ base_names = [os.path.basename(i) for i in csv_paths[-3:]]
971
+
972
+ for i in ['mimic-cxr-2.0.0-metadata.csv.gz', 'mimic-cxr-2.0.0-chexpert.csv.gz', 'mimic-cxr-2.0.0-split.csv.gz']:
973
+ assert i in base_names, f"""CSV file {i} is missing from MIMIC-IV-ED.
974
+ Please download the tables from https://physionet.org/content/mimic-cxr-jpg. Do not decompress them."""
975
+
976
+ for i in csv_paths:
977
+ name = Path(i).stem.replace('.csv', '').replace('.gz', '').replace('-', '_').replace('.', '_')
978
+ print(f'Copying {name} into database...')
979
+ connect.sql(f"CREATE OR REPLACE TABLE {name} AS FROM '{i}';")
980
+
981
+ # MIMIC-CXR report sections:
982
+ print('Copying mimic_cxr_sectioned into database...')
983
+ connect.sql(f"CREATE OR REPLACE TABLE mimic_cxr_sectioned AS FROM '{mimic_cxr_sectioned_path}';")
984
+ connect.sql("ALTER TABLE mimic_cxr_sectioned RENAME COLUMN column0 TO study;")
985
+ connect.sql("ALTER TABLE mimic_cxr_sectioned RENAME COLUMN column1 TO impression;")
986
+ connect.sql("ALTER TABLE mimic_cxr_sectioned RENAME COLUMN column2 TO findings;")
987
+ connect.sql("ALTER TABLE mimic_cxr_sectioned RENAME COLUMN column3 TO indication;")
988
+ connect.sql("ALTER TABLE mimic_cxr_sectioned RENAME COLUMN column4 TO history;")
989
+ connect.sql("ALTER TABLE mimic_cxr_sectioned RENAME COLUMN column5 TO last_paragraph;")
990
+ connect.sql("ALTER TABLE mimic_cxr_sectioned RENAME COLUMN column6 TO comparison;")
991
+ connect.sql("DELETE FROM mimic_cxr_sectioned WHERE study='study';")
992
+
993
+ splits = connect.sql("FROM mimic_cxr_2_0_0_split").df()
994
+ reports = connect.sql("FROM mimic_cxr_sectioned").df()
995
+ metadata = connect.sql("FROM mimic_cxr_2_0_0_metadata").df()
996
+ chexpert = connect.sql("FROM mimic_cxr_2_0_0_chexpert").df()
997
+
998
+ # Create datetime column:
999
+ metadata['StudyTime'] = metadata['StudyTime'].astype(int)
1000
+ metadata['study_datetime'] = pd.to_datetime(
1001
+ metadata.apply(lambda x: f'{x["StudyDate"]} {x["StudyTime"]:06}', axis=1),
1002
+ format='%Y%m%d %H%M%S',
1003
+ )
1004
+ reports.rename(columns={'study': 'study_id'}, inplace=True)
1005
+ reports.study_id = reports.study_id.str[1:].astype('int32')
1006
+ df = pd.merge(splits, reports, on='study_id')
1007
+ df = pd.merge(df, metadata, on=['dicom_id', 'study_id', 'subject_id'])
1008
+ df = pd.merge(df, chexpert, on=['study_id', 'subject_id'])
1009
+
1010
+ connect.sql(f"CREATE OR REPLACE TABLE mimic_cxr AS SELECT * FROM df")
1011
+
1012
+ # Create lookup tables (do this only for ED tables, as the MIMIC-CXR metadata table is not useful):
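+ # The lookup tables map the values of each index column to integer indices; lut_info records the end
+ # index for each table, which EDCXRSubjectRecords uses to size the index/value feature tensors
+ # (total_indices) in records.py.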
1013
+ for k, v in ed_module_tables.items():
1014
+ if v.load and v.index_columns:
1015
+ start_idx = 0
1016
+ for i in v.index_columns_source:
1017
+ lut_name = f'{k}_{i}_lut'
1018
+ table = k
1019
+ lut, end_idx = create_lookup_table(connect.sql(f"SELECT {i} FROM {table}").df(), [i], start_idx)
1020
+ start_idx = end_idx + 1
1021
+ lut = lut.rename(columns={'index': f'{i}_index'})
1022
+
1023
+ print(f'Creating {lut_name}...')
1024
+
1025
+ connect.sql(f"CREATE OR REPLACE TABLE {lut_name} AS SELECT * FROM lut")
1026
+
1027
+ if f'{i}_index' in connect.sql(f"FROM {k} LIMIT 0").df().columns:
1028
+ connect.sql(
1029
+ f"""
1030
+ ALTER TABLE {k}
1031
+ DROP COLUMN {i}_index;
1032
+ """
1033
+ )
1034
+
1035
+ connect.sql(
1036
+ f"""
1037
+ CREATE OR REPLACE TABLE {k} AS
1038
+ SELECT {k}.*, {lut_name}.{i}_index
1039
+ FROM {k} LEFT JOIN {lut_name}
1040
+ ON {k}.{i} = {lut_name}.{i}
1041
+ """
1042
+ )
1043
+
1044
+ connect.sql(
1045
+ f"""
1046
+ CREATE TABLE IF NOT EXISTS lut_info (table_name VARCHAR PRIMARY KEY, start_index INT, end_index INT);
1047
+ INSERT OR REPLACE INTO lut_info VALUES ('{k}', {0}, {end_idx});
1048
+ """
1049
+ )
1050
+
1051
+ table_studies = {
1052
+ 'edstays': [],
1053
+ 'triage': [],
1054
+ 'medrecon': [],
1055
+ 'vitalsign': [],
1056
+ 'pyxis': [],
1057
+ }
1058
+ stay_id_tables = ['triage']
1059
+ stay_id_charttime_tables = ['medrecon', 'vitalsign', 'pyxis']
1060
+
1061
+ df = connect.sql(f"FROM mimic_cxr").df()
1062
+
1063
+ # DICOM identifiers can have different datetimes, so use the most recent datetime for the study:
1064
+ df = df.sort_values(by='study_datetime', ascending=False)
1065
+ df = df.groupby('study_id').first().reset_index()
1066
+
1067
+ for _, row in tqdm(df.iterrows(), total=df.shape[0]):
1068
+ edstays = connect.sql(
1069
+ f"""
1070
+ SELECT stay_id, intime, outtime
1071
+ FROM edstays
1072
+ WHERE (subject_id = {row['subject_id']})
1073
+ AND intime < '{row['study_datetime']}'
1074
+ AND outtime > '{row['study_datetime']}';
1075
+ """
1076
+ ).df()
1077
+
1078
+ if len(edstays) > 0:
1079
+
1080
+ for i in edstays['stay_id'].to_list():
1081
+ table_studies['edstays'].append({'study_id': row['study_id'], 'stay_id': i})
1082
+ for j in stay_id_tables:
1083
+ table = connect.sql(
1084
+ f"""
1085
+ SELECT stay_id
1086
+ FROM {j}
1087
+ WHERE (stay_id = {i});
1088
+ """
1089
+ ).df()
1090
+
1091
+ for k in table['stay_id'].to_list():
1092
+ table_studies[j].append({'study_id': row['study_id'], 'stay_id': k})
1093
+
1094
+ for j in stay_id_charttime_tables:
1095
+ table = connect.sql(
1096
+ f"""
1097
+ SELECT stay_id
1098
+ FROM {j}
1099
+ WHERE (stay_id = {i})
1100
+ AND charttime < '{row['study_datetime']}';
1101
+ """
1102
+ ).df()
1103
+
1104
+ for k in table['stay_id'].to_list():
1105
+ table_studies[j].append({'study_id': row['study_id'], 'stay_id': k})
1106
+
1107
+ for k, v in table_studies.items():
1108
+ df = pd.DataFrame(v)
1109
+ df = df.drop_duplicates(subset=['study_id', 'stay_id'])
1110
+ connect.sql(f"CREATE TABLE {k}_study_ids AS SELECT * FROM df")
1111
+
1112
+ @staticmethod
1113
+ def get_dataset(split, transforms, database_path, mimic_cxr_jpg_dir, max_images_per_study=5):
1114
+
1115
+ records = EDCXRSubjectRecords(database_path=database_path, time_delta_map=lambda x: 1 / math.sqrt(x + 1))
1116
+
1117
+ dataset = StudyIDEDStayIDSubset(
1118
+ mimic_iv_duckdb_path=database_path,
1119
+ dataset_dir=mimic_cxr_jpg_dir,
1120
+ transforms=transforms,
1121
+ split=split,
1122
+ max_images_per_study=max_images_per_study,
1123
+ records=records,
1124
+ )
1125
+ print(f'No. of examples: {len(dataset)}.')
1126
+ print(
1127
+ f'No. of training dicom_ids, study_ids, & subject_ids: {dataset.num_dicom_ids},',
1128
+ f'{dataset.num_study_ids}, & {dataset.num_subject_ids}.',
1129
+ )
+
+ return dataset
modelling_uniformer.py ADDED
@@ -0,0 +1,412 @@
1
+ from collections import OrderedDict
2
+ from functools import partial
3
+ from typing import Optional, Tuple, Union
4
+ from math import isqrt
5
+
6
+ import torch
7
+ import torch.nn as nn
8
+ from timm.models.layers import DropPath, to_2tuple, trunc_normal_
9
+ from transformers import ViTConfig
10
+ from transformers.modeling_outputs import ModelOutput
11
+ from transformers.modeling_utils import PreTrainedModel
12
+ from transformers.utils import logging
13
+
14
+ logger = logging.get_logger(__name__)
15
+
16
+
17
+ layer_scale = False
18
+ init_value = 1e-6
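+ # Module-level toggle for LayerScale in the self-attention blocks (SABlock); when enabled, init_value
+ # initialises the learnable per-channel scaling factors gamma_1 and gamma_2.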
19
+
20
+
21
+ class Mlp(nn.Module):
22
+ def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):
23
+ super().__init__()
24
+ out_features = out_features or in_features
25
+ hidden_features = hidden_features or in_features
26
+ self.fc1 = nn.Linear(in_features, hidden_features)
27
+ self.act = act_layer()
28
+ self.fc2 = nn.Linear(hidden_features, out_features)
29
+ self.drop = nn.Dropout(drop)
30
+
31
+ def forward(self, x):
32
+ x = self.fc1(x)
33
+ x = self.act(x)
34
+ x = self.drop(x)
35
+ x = self.fc2(x)
36
+ x = self.drop(x)
37
+ return x
38
+
39
+
40
+ class CMlp(nn.Module):
41
+ def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):
42
+ super().__init__()
43
+ out_features = out_features or in_features
44
+ hidden_features = hidden_features or in_features
45
+ self.fc1 = nn.Conv2d(in_features, hidden_features, 1)
46
+ self.act = act_layer()
47
+ self.fc2 = nn.Conv2d(hidden_features, out_features, 1)
48
+ self.drop = nn.Dropout(drop)
49
+
50
+ def forward(self, x):
51
+ x = self.fc1(x)
52
+ x = self.act(x)
53
+ x = self.drop(x)
54
+ x = self.fc2(x)
55
+ x = self.drop(x)
56
+ return x
57
+
58
+
59
+ class Attention(nn.Module):
60
+ def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0., proj_drop=0.):
61
+ super().__init__()
62
+ self.num_heads = num_heads
63
+ head_dim = dim // num_heads
64
+ self.scale = qk_scale or head_dim ** -0.5
65
+
66
+ self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
67
+ self.attn_drop = nn.Dropout(attn_drop)
68
+ self.proj = nn.Linear(dim, dim)
69
+ self.proj_drop = nn.Dropout(proj_drop)
70
+
71
+ def forward(self, x):
72
+ B, N, C = x.shape
73
+ qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)
74
+ q, k, v = qkv[0], qkv[1], qkv[2]
75
+
76
+ attn = (q @ k.transpose(-2, -1)) * self.scale
77
+ attn = attn.softmax(dim=-1)
78
+ attn = self.attn_drop(attn)
79
+
80
+ x = (attn @ v).transpose(1, 2).reshape(B, N, C)
81
+ x = self.proj(x)
82
+ x = self.proj_drop(x)
83
+ return x
84
+
85
+
86
+ class CBlock(nn.Module):
87
+ def __init__(self, dim, mlp_ratio=4., drop=0., drop_path=0., act_layer=nn.GELU):
88
+ super().__init__()
89
+ self.pos_embed = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
90
+ self.norm1 = nn.BatchNorm2d(dim)
91
+ self.conv1 = nn.Conv2d(dim, dim, 1)
92
+ self.conv2 = nn.Conv2d(dim, dim, 1)
93
+ self.attn = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
94
+ self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
95
+ self.norm2 = nn.BatchNorm2d(dim)
96
+ mlp_hidden_dim = int(dim * mlp_ratio)
97
+ self.mlp = CMlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)
98
+
99
+ def forward(self, x):
100
+ x = x + self.pos_embed(x)
101
+ x = x + self.module_1(x)
102
+ x = x + self.module_2(x)
103
+ return x
104
+
105
+ def module_1(self, x):
106
+ x = self.norm1(x.to(dtype=self.norm1.weight.dtype)) # Cast explicitly, as autocast won't cast the input to the dtype of the nn.BatchNorm2d parameters.
107
+ x = self.conv1(x)
108
+ x = self.attn(x)
109
+ x = self.conv2(x)
110
+ x = self.drop_path(x)
111
+ return x
112
+
113
+ def module_2(self, x):
114
+ x = self.norm2(x.to(dtype=self.norm2.weight.dtype)) # Cast explicitly, as autocast won't cast the input to the dtype of the nn.BatchNorm2d parameters.
115
+ x = self.mlp(x)
116
+ x = self.drop_path(x)
117
+ return x
118
+
119
+ class SABlock(nn.Module):
120
+ def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop=0., attn_drop=0.,
121
+ drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm):
122
+ super().__init__()
123
+ self.pos_embed = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
124
+ self.norm1 = norm_layer(dim)
125
+ self.attn = Attention(
126
+ dim,
127
+ num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale,
128
+ attn_drop=attn_drop, proj_drop=drop)
129
+ self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
130
+ self.norm2 = norm_layer(dim)
131
+ mlp_hidden_dim = int(dim * mlp_ratio)
132
+ self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)
133
+ global layer_scale
134
+ self.ls = layer_scale
135
+ if self.ls:
136
+ global init_value
137
+ print(f"Use layer_scale: {layer_scale}, init_values: {init_value}")
138
+ self.gamma_1 = nn.Parameter(init_value * torch.ones((dim)),requires_grad=True)
139
+ self.gamma_2 = nn.Parameter(init_value * torch.ones((dim)),requires_grad=True)
140
+
141
+ def forward(self, x):
142
+ x = x + self.pos_embed(x)
143
+ B, N, H, W = x.shape
144
+ x = x.flatten(2).transpose(1, 2)
145
+ if self.ls:
146
+ x = x + self.drop_path(self.gamma_1 * self.attn(self.norm1(x)))
147
+ x = x + self.drop_path(self.gamma_2 * self.mlp(self.norm2(x)))
148
+ else:
149
+ x = x + self.drop_path(self.attn(self.norm1(x)))
150
+ x = x + self.drop_path(self.mlp(self.norm2(x)))
151
+ x = x.transpose(1, 2).reshape(B, N, H, W)
152
+ return x
153
+
154
+
155
+ class HeadEmbedding(nn.Module):
156
+ def __init__(self, in_channels, out_channels):
157
+ super(HeadEmbedding, self).__init__()
158
+
159
+ self.proj = nn.Sequential(
160
+ nn.Conv2d(in_channels, out_channels // 2, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)),
161
+ nn.BatchNorm2d(out_channels // 2),
162
+ nn.GELU(),
163
+ nn.Conv2d(out_channels // 2, out_channels, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)),
164
+ nn.BatchNorm2d(out_channels),
165
+ )
166
+
167
+ def forward(self, x):
168
+ x = self.proj(x)
169
+ return x
170
+
171
+
172
+ class MiddleEmbedding(nn.Module):
173
+ def __init__(self, in_channels, out_channels):
174
+ super(MiddleEmbedding, self).__init__()
175
+
176
+ self.proj = nn.Sequential(
177
+ nn.Conv2d(in_channels, out_channels, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)),
178
+ nn.BatchNorm2d(out_channels),
179
+ )
180
+
181
+ def forward(self, x):
182
+ x = self.proj(x)
183
+ return x
184
+
185
+
186
+ class PatchEmbed(nn.Module):
187
+ def __init__(self, image_size=224, patch_size=16, in_chans=3, embed_dim=768):
188
+ super().__init__()
189
+ image_size = to_2tuple(image_size)
190
+ patch_size = to_2tuple(patch_size)
191
+ num_patches_height = image_size[0] // patch_size[0]
192
+ num_patches_width = image_size[1] // patch_size[1]
193
+ num_patches = num_patches_height * num_patches_width
194
+ self.image_size = image_size
195
+ self.patch_size = patch_size
196
+ self.num_patches = num_patches
197
+
198
+ self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)
199
+ self.norm = nn.LayerNorm(embed_dim)
200
+
201
+ def forward(self, x):
202
+ _, _, H, W = x.shape
203
+ assert H == self.image_size[0] and W == self.image_size[1], \
204
+ f"Input image size ({H}*{W}) doesn't match model ({self.image_size[0]}*{self.image_size[1]})."
205
+ x = self.proj(x)
206
+ B, _, H, W = x.shape
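+ # Flatten the spatial dimensions so that LayerNorm is applied over the channel dimension, then restore
+ # the (B, C, H, W) layout for the following convolutional stage.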
207
+ x = x.flatten(2).transpose(1, 2)
208
+ x = self.norm(x)
209
+ x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2).contiguous()
210
+ return x
211
+
212
+
213
+ class UniFormer(nn.Module):
214
+ def __init__(self, depth=[3, 4, 8, 3], image_size=224, in_chans=3, num_classes=1000, embed_dim=[64, 128, 320, 512],
215
+ head_dim=64, mlp_ratio=4., qkv_bias=True, qk_scale=None, representation_size=None, patch_size=[4, 2, 2, 2],
216
+ drop_rate=0., attn_drop_rate=0., drop_path_rate=0., conv_stem=False, layer_norm_eps=1e-6, **kwargs):
217
+ super().__init__()
218
+ self.num_classes = num_classes
219
+ self.num_features = self.embed_dim = embed_dim # num_features for consistency with other models
220
+ norm_layer = partial(nn.LayerNorm, eps=layer_norm_eps)
221
+ if conv_stem:
222
+ self.patch_embed1 = HeadEmbedding(in_channels=in_chans, out_channels=embed_dim[0])
223
+ self.patch_embed2 = MiddleEmbedding(in_channels=embed_dim[0], out_channels=embed_dim[1])
224
+ self.patch_embed3 = MiddleEmbedding(in_channels=embed_dim[1], out_channels=embed_dim[2])
225
+ self.patch_embed4 = MiddleEmbedding(in_channels=embed_dim[2], out_channels=embed_dim[3])
226
+ else:
227
+ self.patch_embed1 = PatchEmbed(
228
+ image_size=image_size, patch_size=patch_size[0], in_chans=in_chans, embed_dim=embed_dim[0])
229
+ self.patch_embed2 = PatchEmbed(
230
+ image_size=image_size // patch_size[0], patch_size=patch_size[1], in_chans=embed_dim[0], embed_dim=embed_dim[1])
231
+ self.patch_embed3 = PatchEmbed(
232
+ image_size=image_size // (patch_size[0]*patch_size[1]), patch_size=patch_size[2], in_chans=embed_dim[1], embed_dim=embed_dim[2])
233
+ self.patch_embed4 = PatchEmbed(
234
+ image_size=image_size // (patch_size[0]*patch_size[1]*patch_size[2]), patch_size=patch_size[3], in_chans=embed_dim[2], embed_dim=embed_dim[3])
235
+
236
+ self.pos_drop = nn.Dropout(p=drop_rate)
237
+ dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depth))] # stochastic depth decay rule
238
+ num_heads = [dim // head_dim for dim in embed_dim]
239
+ self.blocks1 = nn.ModuleList([
240
+ CBlock(dim=embed_dim[0], mlp_ratio=mlp_ratio, drop=drop_rate, drop_path=dpr[i])
241
+ for i in range(depth[0])])
242
+ self.blocks2 = nn.ModuleList([
243
+ CBlock(dim=embed_dim[1], mlp_ratio=mlp_ratio, drop=drop_rate, drop_path=dpr[i+depth[0]])
244
+ for i in range(depth[1])])
245
+ self.blocks3 = nn.ModuleList([
246
+ SABlock(
247
+ dim=embed_dim[2], num_heads=num_heads[2], mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale,
248
+ drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i+depth[0]+depth[1]], norm_layer=norm_layer)
249
+ for i in range(depth[2])])
250
+ self.blocks4 = nn.ModuleList([
251
+ SABlock(
252
+ dim=embed_dim[3], num_heads=num_heads[3], mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale,
253
+ drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i+depth[0]+depth[1]+depth[2]], norm_layer=norm_layer)
254
+ for i in range(depth[3])])
255
+ self.norm = nn.BatchNorm2d(embed_dim[-1])
256
+
257
+ # Representation layer
258
+ if representation_size:
259
+ self.num_features = representation_size
260
+ self.pre_logits = nn.Sequential(OrderedDict([
261
+ ('fc', nn.Linear(embed_dim, representation_size)),
262
+ ('act', nn.Tanh())
263
+ ]))
264
+ else:
265
+ self.pre_logits = nn.Identity()
266
+
267
+ def forward_features(self, x):
268
+ x = self.patch_embed1(x)
269
+ x = self.pos_drop(x)
270
+ for blk in self.blocks1:
271
+ x = blk(x)
272
+ x = self.patch_embed2(x)
273
+ for blk in self.blocks2:
274
+ x = blk(x)
275
+ x = self.patch_embed3(x)
276
+ for blk in self.blocks3:
277
+ x = blk(x)
278
+ x = self.patch_embed4(x)
279
+ for blk in self.blocks4:
280
+ x = blk(x)
281
+ x = self.norm(x.to(dtype=self.norm.weight.dtype)) # Cast explicitly, as autocast won't cast the input to the dtype of the nn.BatchNorm2d parameters.
282
+ x = self.pre_logits(x)
283
+ return x
284
+
285
+ def forward(self, x):
286
+ x = self.forward_features(x)
287
+ return x
288
+
289
+
290
+ class UniFormerPreTrainedModel(PreTrainedModel):
291
+ """
292
+ An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
293
+ models.
294
+ """
295
+
296
+ config_class = ViTConfig
297
+ base_model_prefix = "vit"
298
+ main_input_name = "pixel_values"
299
+
300
+ def _init_weights(self, m):
301
+ if isinstance(m, nn.Linear):
302
+ trunc_normal_(m.weight, std=.02)
303
+ if isinstance(m, nn.Linear) and m.bias is not None:
304
+ nn.init.constant_(m.bias, 0)
305
+ elif isinstance(m, nn.LayerNorm):
306
+ nn.init.constant_(m.bias, 0)
307
+ nn.init.constant_(m.weight, 1.0)
308
+
309
+
310
+ class UniFormerProjectionHead(torch.nn.Module):
311
+
312
+ def __init__(self, config) -> None:
313
+ super().__init__()
314
+
315
+ # Layer normalisation before projection:
316
+ self.layer_norm = torch.nn.LayerNorm(config.embed_dim[-1], eps=config.layer_norm_eps)
317
+
318
+ # No bias as following layer normalisation with bias:
319
+ self.projection = torch.nn.Linear(config.embed_dim[-1], config.projection_size, bias=False)
320
+
321
+
322
+ def forward(self, x: torch.Tensor) -> torch.Tensor:
323
+ x = self.layer_norm(x)
324
+ x = self.projection(x)
325
+ return x
326
+
327
+
328
+ class UniFormerModel(UniFormerPreTrainedModel):
329
+ def __init__(self, config):
330
+ super().__init__(config)
331
+
332
+ self.uniformer = UniFormer(**vars(config))
333
+
334
+ # Initialize weights and apply final processing:
335
+ self.post_init()
336
+
337
+ def forward(
338
+ self,
339
+ pixel_values: Optional[torch.Tensor] = None,
340
+ output_hidden_states: Optional[bool] = None,
341
+ return_dict: Optional[bool] = None,
342
+ ) -> Union[Tuple, ModelOutput]:
343
+
344
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
345
+
346
+ last_hidden_state = self.uniformer(pixel_values)
347
+
348
+ # Flatten h x w:
349
+ last_hidden_state = torch.flatten(last_hidden_state, 2)
350
+
351
+ # Permute last hidden state:
352
+ last_hidden_state = torch.permute(last_hidden_state, [0, 2, 1])
353
+
354
+ # return last_hidden_state
355
+ if not return_dict:
356
+ return last_hidden_state
357
+
358
+ return ModelOutput(last_hidden_state=last_hidden_state)
359
+
360
+
361
+ class MultiUniFormerWithProjectionHead(UniFormerPreTrainedModel):
362
+ def __init__(self, config):
363
+ super().__init__(config)
364
+
365
+ self.uniformer = UniFormer(**vars(config))
366
+ self.projection_head = UniFormerProjectionHead(config)
367
+
368
+ # Initialize weights and apply final processing:
369
+ self.post_init()
370
+
371
+ def forward(
372
+ self,
373
+ pixel_values: Optional[torch.Tensor] = None,
374
+ output_hidden_states: Optional[bool] = None,
375
+ return_dict: Optional[bool] = None,
376
+ ) -> Union[Tuple, ModelOutput]:
377
+
378
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
379
+
380
+ # Flatten the batch and study_id dimensions:
381
+ assert len(pixel_values.shape) == 5, 'pixel_values must be B, S, C, H, W, where S is the max number of images for a study in the batch.'
382
+ last_hidden_state = self.uniformer(pixel_values.view(-1, *pixel_values.shape[2:]))
383
+ # last_hidden_state = self.uniformer(pixel_values.flatten(start_dim=0, end_dim=1))
384
+
385
+ # Flatten h x w:
386
+ last_hidden_state = torch.flatten(last_hidden_state, 2)
387
+
388
+ # Project the features for each spatial position to the decoder's hidden size:
389
+ projection = self.projection_head(torch.permute(last_hidden_state, [0, 2, 1]))
390
+
391
+ # Concatenate the features for each chest X-ray:
392
+ projection = projection.view(pixel_values.shape[0], -1, projection.shape[-1])
393
+
394
+ # Derive the attention mask from the pixel values:
395
+ mask = (pixel_values[:, :, 0, 0, 0] != 0.0)[:, :, None]
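+ # Images used to pad a study to the maximum number of images are assumed to be all zeros, so checking a
+ # single pixel per image is sufficient to detect them and mask out their spatial positions.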
396
+ attention_mask = torch.ones(
397
+ [projection.shape[0], pixel_values.shape[1], projection.shape[1] // pixel_values.shape[1]],
398
+ dtype=torch.long,
399
+ device=mask.device,
400
+ )
401
+ attention_mask = attention_mask * mask
402
+ attention_mask = attention_mask.view(attention_mask.shape[0], -1)
403
+
404
+ if not return_dict:
405
+ return projection
406
+
407
+ return ModelOutput(last_hidden_state=projection, attention_mask=attention_mask)
408
+
409
+
410
+ if __name__ == '__main__':
411
+ y = PatchEmbed()
412
+ y(torch.randn(2, 3, 224, 224))
records.py ADDED
@@ -0,0 +1,369 @@
1
+ import functools
2
+ import os
3
+ import re
4
+ from collections import OrderedDict
5
+ from typing import Dict, List, Optional
6
+
7
+ import duckdb
8
+ import pandas as pd
9
+ import torch
10
+
11
+ from .tables import ed_cxr_token_type_ids, ed_module_tables, mimic_cxr_tables
12
+
13
+
14
+ def mimic_cxr_text_path(dir, subject_id, study_id, ext='txt'):
15
+ return os.path.join(dir, 'p' + str(subject_id)[:2], 'p' + str(subject_id),
16
+ 's' + str(study_id) + '.' + ext)
17
+
18
+ def format(text):
19
+ # Remove newline, tab, repeated whitespaces, and leading and trailing whitespaces:
20
+ text = re.sub(r'\n|\t', ' ', text)
21
+ text = re.sub(r'\s+', ' ', text)
22
+ text = text.strip()
23
+ return text
24
+
25
+
26
+ def rgetattr(obj, attr, *args):
27
+ def _getattr(obj, attr):
28
+ return getattr(obj, attr, *args)
29
+ return functools.reduce(_getattr, [obj] + attr.split('.'))
30
+
31
+
32
+ def df_to_tensor_index_columns(
33
+ df: pd.DataFrame,
34
+ tensor: torch.Tensor,
35
+ group_idx_to_y_idx: Dict,
36
+ groupby: str,
37
+ index_columns: List[str],
38
+ ):
39
+ """
40
+ Converts a dataframe with index columns to a tensor, where each index of the y-axis is determined by the
41
+ 'groupby' column.
42
+ """
43
+ assert len(group_idx_to_y_idx) == tensor.shape[0]
44
+ all_columns = index_columns + [groupby]
45
+ y_indices = [group_idx_to_y_idx[row[groupby]] for _, row in df[all_columns].iterrows() for i in index_columns if row[i] == row[i]]
46
+ x_indices = [row[i] for _, row in df[all_columns].iterrows() for i in index_columns if row[i] == row[i]]
47
+ tensor[y_indices, x_indices] = 1.0
48
+ return tensor
49
+
50
+
51
+ def df_to_tensor_value_columns(
52
+ df: pd.DataFrame,
53
+ tensor: torch.Tensor,
54
+ group_idx_to_y_idx: Dict,
55
+ groupby: str,
56
+ value_columns: List[str],
57
+ value_column_to_idx: Dict,
58
+ ):
59
+ """
60
+ Converts a dataframe with value columns to a tensor, where each index of the y-axis is determined by the
61
+ 'groupby' column. The x-index is determined by a dictionary using the column name.
62
+ """
63
+ assert len(group_idx_to_y_idx) == tensor.shape[0]
64
+ all_columns = value_columns + [groupby]
65
+ y_indices = [group_idx_to_y_idx[row[groupby]] for _, row in df[all_columns].iterrows() for i in value_columns if row[i] == row[i]]
66
+ x_indices = [value_column_to_idx[i] for _, row in df[all_columns].iterrows() for i in value_columns if row[i] == row[i]]
67
+ element_value = [row[i] for _, row in df[all_columns].iterrows() for i in value_columns if row[i] == row[i]]
68
+ tensor[y_indices, x_indices] = torch.tensor(element_value, dtype=tensor.dtype)
69
+ return tensor
70
+
71
+
72
+ class EDCXRSubjectRecords:
73
+ def __init__(
74
+ self,
75
+ database_path: str,
76
+ dataset_dir: Optional[str] = None,
77
+ reports_dir: Optional[str] = None,
78
+ token_type_ids_starting_idx: Optional[int] = None,
79
+ time_delta_map = lambda x: x,
80
+ debug: bool = False
81
+ ):
82
+
83
+ self.database_path = database_path
84
+ self.dataset_dir = dataset_dir
85
+ self.reports_dir = reports_dir
86
+ self.time_delta_map = time_delta_map
87
+ self.debug = debug
88
+
89
+ self.connect = duckdb.connect(self.database_path, read_only=True)
90
+
91
+ self.streamlit_flag = False
92
+
93
+ self.clear_start_end_times()
94
+
95
+ # Module configurations:
96
+ self.ed_module_tables = ed_module_tables
97
+ self.mimic_cxr_tables = mimic_cxr_tables
98
+
99
+ lut_info = self.connect.sql("FROM lut_info").df()
100
+
101
+ for k, v in (self.ed_module_tables | self.mimic_cxr_tables).items():
102
+ if v.load and (v.value_columns or v.index_columns):
103
+ v.value_column_to_idx = {}
104
+ if v.index_columns:
105
+ v.total_indices = lut_info[lut_info['table_name'] == k]['end_index'].item() + 1
106
+ else:
107
+ v.total_indices = 0
108
+ for i in v.value_columns:
109
+ v.value_column_to_idx[i] = v.total_indices
110
+ v.total_indices += 1
111
+
112
+ # Token type identifiers:
113
+ self.token_type_to_token_type_id = ed_cxr_token_type_ids
114
+ if token_type_ids_starting_idx is not None:
115
+ self.token_type_to_token_type_id = {k: v + token_type_ids_starting_idx for k, v in self.token_type_to_token_type_id.items()}
116
+
117
+ def __len__(self):
118
+ return len(self.subject_ids)
119
+
120
+ def clear_start_end_times(self):
121
+ self.start_time, self.end_time = None, None
122
+
123
+ def admission_ed_stay_ids(self, hadm_id):
124
+ if hadm_id:
125
+ return self.connect.sql(f'SELECT stay_id FROM edstays WHERE subject_id = {self.subject_id} AND hadm_id = {hadm_id}').df()['stay_id'].tolist()
126
+ else:
127
+ return self.connect.sql(f'SELECT stay_id FROM edstays WHERE subject_id = {self.subject_id}').df()['stay_id'].tolist()
128
+
129
+ def subject_study_ids(self):
130
+ mimic_cxr = self.connect.sql(
131
+ f'SELECT study_id, study_datetime FROM mimic_cxr WHERE subject_id = {self.subject_id}',
132
+ ).df()
133
+ if self.start_time and self.end_time:
134
+ mimic_cxr = self.filter_admissions_by_time_span(mimic_cxr, 'study_datetime')
135
+ mimic_cxr = mimic_cxr.drop_duplicates(subset=['study_id']).sort_values(by='study_datetime')
136
+ return dict(zip(mimic_cxr['study_id'], mimic_cxr['study_datetime']))
137
+
138
+ def load_ed_module(self, hadm_id=None, stay_id=None, reference_time=None):
139
+ if not self.start_time and stay_id is not None:
140
+ edstay = self.connect.sql(
141
+ f"""
142
+ SELECT intime, outtime
143
+ FROM edstays
144
+ WHERE stay_id = {stay_id}
145
+ """
146
+ ).df()
147
+ self.start_time = edstay['intime'].item()
148
+ self.end_time = edstay['outtime'].item()
149
+ self.load_module(self.ed_module_tables, hadm_id=hadm_id, stay_id=stay_id, reference_time=reference_time)
150
+
151
+ def load_mimic_cxr(self, study_id, reference_time=None):
152
+ self.load_module(self.mimic_cxr_tables, study_id=study_id, reference_time=reference_time)
153
+ if self.streamlit_flag:
154
+ self.report_path = mimic_cxr_text_path(self.reports_dir, self.subject_id, study_id, 'txt')
155
+
156
+ def load_module(self, module_dict, hadm_id=None, stay_id=None, study_id=None, reference_time=None):
157
+ for k, v in module_dict.items():
158
+
159
+ if self.streamlit_flag or v.load:
160
+
161
+ query = f"FROM {k}"
162
+
163
+ conditions = []
164
+ if hasattr(self, 'subject_id') and v.subject_id_filter:
165
+ conditions.append(f"subject_id={self.subject_id}")
166
+ if v.hadm_id_filter:
167
+ assert hadm_id is not None
168
+ conditions.append(f"hadm_id={hadm_id}")
169
+ if v.stay_id_filter:
170
+ assert stay_id is not None
171
+ conditions.append(f"stay_id={stay_id}")
172
+ if v.study_id_filter:
173
+ assert study_id is not None
174
+ conditions.append(f"study_id={study_id}")
175
+ if v.mimic_cxr_sectioned:
176
+ assert study_id is not None
177
+ conditions.append(f"study='s{study_id}'")
178
+ ands = ['AND'] * (len(conditions) * 2 - 1)
179
+ ands[0::2] = conditions
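+ # Interleave the conditions with 'AND', e.g., ['subject_id=1', 'AND', 'stay_id=2'].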
180
+
181
+ if conditions:
182
+ query += " WHERE ("
183
+ query += ' '.join(ands)
184
+ query += ")"
185
+
186
+ df = self.connect.sql(query).df()
187
+
188
+ if v.load:
189
+
190
+ columns = [v.groupby] + v.time_columns + v.index_columns + v.text_columns + v.value_columns + v.target_sections
191
+
192
+ # Use the starting time of the stay/admission as the time:
193
+ if v.use_start_time:
194
+ df['start_time'] = self.start_time
195
+ columns += ['start_time']
196
+
197
+ if reference_time is not None:
198
+ time_column = v.time_columns[-1] if not v.use_start_time else 'start_time'
199
+
200
+ # Remove rows that are after the reference time to maintain causality:
201
+ df = df[df[time_column] < reference_time]
202
+
203
+ if self.streamlit_flag:
204
+ setattr(self, k, df)
205
+
206
+ if v.load:
207
+ columns = list(dict.fromkeys(columns)) # remove repetitions.
208
+ df = df.drop(columns=df.columns.difference(columns), axis=1)
209
+ setattr(self, f'{k}_feats', df)
210
+
211
+ def return_ed_module_features(self, stay_id, reference_time=None):
212
+ example_dict = {}
213
+ if stay_id is not None:
214
+ self.load_ed_module(stay_id=stay_id, reference_time=reference_time)
215
+ for k, v in self.ed_module_tables.items():
216
+ if v.load:
217
+
218
+ df = getattr(self, f'{k}_feats')
219
+
220
+ if self.debug:
221
+ example_dict.setdefault('ed_tables', []).append(k)
222
+
223
+ if not df.empty:
224
+
225
+ assert f'{k}_index_value_feats' not in example_dict
226
+
227
+ # The y-index and the time for each group:
228
+ time_column = v.time_columns[-1] if not v.use_start_time else 'start_time'
229
+ group_idx_to_y_idx, group_idx_to_datetime = OrderedDict(), OrderedDict()
230
+ groups = df.dropna(subset=v.index_columns + v.value_columns + v.text_columns, axis=0, how='all')
231
+ groups = groups.drop_duplicates(subset=[v.groupby])[list(dict.fromkeys([v.groupby, time_column]))]
232
+ groups = groups.reset_index(drop=True)
233
+ for i, row in groups.iterrows():
234
+ group_idx_to_y_idx[row[v.groupby]] = i
235
+ group_idx_to_datetime[row[v.groupby]] = row[time_column]
236
+
237
+ if (v.index_columns or v.value_columns) and group_idx_to_y_idx:
238
+ example_dict[f'{k}_index_value_feats'] = torch.zeros(len(group_idx_to_y_idx), v.total_indices)
239
+ if v.index_columns:
240
+ example_dict[f'{k}_index_value_feats'] = df_to_tensor_index_columns(
241
+ df=df,
242
+ tensor=example_dict[f'{k}_index_value_feats'],
243
+ group_idx_to_y_idx=group_idx_to_y_idx,
244
+ groupby=v.groupby,
245
+ index_columns=v.index_columns,
246
+ )
247
+ if v.value_columns:
248
+ example_dict[f'{k}_index_value_feats'] = df_to_tensor_value_columns(
249
+ df=df,
250
+ tensor=example_dict[f'{k}_index_value_feats'],
251
+ group_idx_to_y_idx=group_idx_to_y_idx,
252
+ groupby=v.groupby,
253
+ value_columns=v.value_columns,
254
+ value_column_to_idx=v.value_column_to_idx
255
+ )
256
+
257
+ example_dict[f'{k}_index_value_token_type_ids'] = torch.full(
258
+ [example_dict[f'{k}_index_value_feats'].shape[0]],
259
+ self.token_type_to_token_type_id[k],
260
+ dtype=torch.long,
261
+ )
262
+
263
+ event_times = list(group_idx_to_datetime.values())
264
+ assert all([i == i for i in event_times])
265
+ time_delta = [self.compute_time_delta(i, reference_time) for i in event_times]
266
+ example_dict[f'{k}_index_value_time_delta'] = torch.tensor(time_delta)[:, None]
267
+
268
+ assert example_dict[f'{k}_index_value_feats'].shape[0] == example_dict[f'{k}_index_value_time_delta'].shape[0]
269
+
270
+ if v.text_columns:
271
+ for j in group_idx_to_y_idx.keys():
272
+ group_text = df[df[v.groupby] == j]
273
+ for i in v.text_columns:
274
+
275
+ column_text = [format(k) for k in list(dict.fromkeys(group_text[i].tolist())) if k is not None]
276
+
277
+ if column_text:
278
+
279
+ example_dict.setdefault(f'{k}_{i}', []).append(f"{', '.join(column_text)}.")
280
+
281
+ event_times = group_text[time_column].iloc[0]
282
+ time_delta = self.compute_time_delta(event_times, reference_time, to_tensor=False)
283
+ example_dict.setdefault(f'{k}_{i}_time_delta', []).append(time_delta)
284
+
285
+ return example_dict
286
+
287
+ def return_mimic_cxr_features(self, study_id, reference_time=None):
288
+ example_dict = {}
289
+ if study_id is not None:
290
+ self.load_mimic_cxr(study_id=study_id, reference_time=reference_time)
291
+ for k, v in self.mimic_cxr_tables.items():
292
+ if v.load:
293
+
294
+ df = getattr(self, f'{k}_feats')
295
+
296
+ if self.debug:
297
+ example_dict.setdefault('mimic_cxr_inputs', []).append(k)
298
+
299
+ if not df.empty:
300
+
301
+ # The y-index for each group:
302
+ group_idx_to_y_idx = OrderedDict()
303
+ groups = df.dropna(
304
+ subset=v.index_columns + v.value_columns + v.text_columns + v.target_sections,
305
+ axis=0,
306
+ how='all'
307
+ )
308
+ groups = groups.drop_duplicates(subset=[v.groupby])[[v.groupby]]
309
+ groups = groups.reset_index(drop=True)
310
+ for i, row in groups.iterrows():
311
+ group_idx_to_y_idx[row[v.groupby]] = i
312
+
313
+ if v.index_columns and group_idx_to_y_idx:
314
+
315
+ example_dict[f'{k}_index_value_feats'] = torch.zeros(len(group_idx_to_y_idx), v.total_indices)
316
+ if v.index_columns:
317
+ example_dict[f'{k}_index_value_feats'] = df_to_tensor_index_columns(
318
+ df=df,
319
+ tensor=example_dict[f'{k}_index_value_feats'],
320
+ group_idx_to_y_idx=group_idx_to_y_idx,
321
+ groupby=v.groupby,
322
+ index_columns=v.index_columns,
323
+ )
324
+
325
+ example_dict[f'{k}_index_value_token_type_ids'] = torch.full(
326
+ [example_dict[f'{k}_index_value_feats'].shape[0]],
327
+ self.token_type_to_token_type_id[k],
328
+ dtype=torch.long,
329
+ )
330
+
331
+ if v.text_columns:
332
+ for j in group_idx_to_y_idx.keys():
333
+ group_text = df[df[v.groupby] == j]
334
+ for i in v.text_columns:
335
+ column_text = [format(k) for k in list(dict.fromkeys(group_text[i].tolist())) if k is not None]
336
+ if column_text:
337
+ example_dict.setdefault(f'{i}', []).append(f"{', '.join(column_text)}.")
338
+
339
+ if v.target_sections:
340
+ for j in group_idx_to_y_idx.keys():
341
+ group_text = df[df[v.groupby] == j]
342
+ for i in v.target_sections:
343
+ column_text = [format(k) for k in list(dict.fromkeys(group_text[i].tolist())) if k is not None]
344
+ assert len(column_text) == 1
345
+ example_dict[i] = column_text[-1]
346
+
347
+ return example_dict
348
+
349
+ def compute_time_delta(self, event_time, reference_time, denominator = 3600, to_tensor=True):
350
+ """
351
+ How to we transform time-delta inputs? It appears that minutes are used as the input to
352
+ a weight matrix in "Self-Supervised Transformer for Sparse and Irregularly Sampled Multivariate
353
+ Clinical Time-Series". This is almost confirmed by the CVE class defined here:
354
+ https://github.com/sindhura97/STraTS/blob/main/strats_notebook.ipynb, where the input has
355
+ a size of one.
356
+ """
357
+ time_delta = reference_time - event_time
358
+ time_delta = time_delta.total_seconds() / (denominator)
359
+ assert isinstance(time_delta, float), f'time_delta should be float, not {type(time_delta)}.'
360
+ if time_delta < 0:
361
+ raise ValueError(f'time_delta should be greater than or equal to zero, not {time_delta}.')
362
+ time_delta = self.time_delta_map(time_delta)
363
+ if to_tensor:
364
+ time_delta = torch.tensor(time_delta)
365
+ return time_delta
366
+
367
+ def filter_admissions_by_time_span(self, df, time_column):
368
+ return df[(df[time_column] > self.start_time) & (df[time_column] <= self.end_time)]
369
+
tables.py ADDED
@@ -0,0 +1,159 @@
1
+ from collections import OrderedDict
2
+ from typing import Optional
3
+
4
+ ed_cxr_token_type_ids = {
5
+ 'medrecon': 0,
6
+ 'edstays': 1,
7
+ 'triage': 2,
8
+ 'vitalsign': 3,
9
+ 'pyxis': 4,
10
+ 'mimic_cxr_2_0_0_metadata': 5,
11
+ 'medrecon_name': 6,
12
+ 'triage_chiefcomplaint': 7,
13
+ 'triage_pain': 8,
14
+ 'vitalsign_pain': 9,
15
+ 'indication': 10,
16
+ 'history': 11,
17
+ 'findings': 12,
18
+ 'impression': 13,
19
+ 'image': 14,
20
+ 'comparison': 15,
21
+ 'previous_findings': 16,
22
+ 'previous_impression': 17,
23
+ 'previous_image': 18,
24
+ }
25
+
26
+ NUM_ED_CXR_TOKEN_TYPE_IDS = max(ed_cxr_token_type_ids.values()) + 1
27
+
28
+
+ class TableConfig:
+     def __init__(
+         self,
+         name: str,
+         hadm_id_filter: bool = False,
+         stay_id_filter: bool = False,
+         study_id_filter: bool = False,
+         subject_id_filter: bool = True,
+         load: Optional[bool] = None,
+         groupby: Optional[str] = None,
+         index_columns: list = [],
+         text_columns: list = [],
+         value_columns: list = [],
+         time_columns: list = [],
+         target_sections: list = [],
+         use_start_time: bool = False,
+         mimic_cxr_sectioned: bool = False,
+     ):
+         self.name = name
+         self.hadm_id_filter = hadm_id_filter
+         self.stay_id_filter = stay_id_filter
+         self.study_id_filter = study_id_filter
+         self.subject_id_filter = subject_id_filter
+         self.load = load
+         self.groupby = groupby
+         self.index_columns_source = [index_columns] if isinstance(index_columns, str) else index_columns
+         self.index_columns = [f'{i}_index' for i in self.index_columns_source]
+         self.text_columns = [text_columns] if isinstance(text_columns, str) else text_columns
+         self.value_columns = [value_columns] if isinstance(value_columns, str) else value_columns
+         self.time_columns = [time_columns] if isinstance(time_columns, str) else time_columns
+         self.target_sections = [target_sections] if isinstance(target_sections, str) else target_sections
+         self.use_start_time = use_start_time
+         self.mimic_cxr_sectioned = mimic_cxr_sectioned
+
+         assert self.time_columns is None or isinstance(self.time_columns, list)
+
+         self.value_column_to_idx = {}
+         self.total_indices = None
+
+
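`TableConfig.__init__` promotes scalar column arguments to lists and derives the `*_index` column names. A short illustration of that normalisation, using arbitrary example values:

```python
# Illustration of the argument normalisation performed in TableConfig.__init__
# (the column names here are arbitrary example values).
from tables import TableConfig

cfg = TableConfig(
    'Triage',
    stay_id_filter=True,
    index_columns='acuity',                      # a single string is accepted
    text_columns=['chiefcomplaint', 'pain'],
    groupby='stay_id',
)

assert cfg.index_columns_source == ['acuity']    # str promoted to list
assert cfg.index_columns == ['acuity_index']     # '_index' suffix added
assert cfg.text_columns == ['chiefcomplaint', 'pain']
assert cfg.time_columns == []                    # unset columns stay as empty lists
```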
+ # ed module:
+ """
+ Order the tables for position_ids based on their order of occurrence (for cases where their time deltas match).
+ The order given here is the order in which they will be provided as input.
+
+ 1. medrecon - the medications which the patient was taking prior to their ED stay.
+ 2. edstays - patient stays are tracked in the edstays table.
+ 3. triage - information collected from the patient at the time of triage.
+ 4. vitalsign - aperiodic vital signs documented for patients during their stay.
+ 5. pyxis - dispensation information for medications provided by the BD Pyxis MedStation (its position is interchangeable with 4).
+ """
+ ed_module_tables = OrderedDict(
+     {
+         'medrecon': TableConfig(
+             'Medicine reconciliation',
+             stay_id_filter=True,
+             load=True,
+             index_columns=['gsn', 'ndc', 'etc_rn', 'etccode'],
+             text_columns='name',
+             groupby='stay_id',
+             use_start_time=True,
+         ),
+         'edstays': TableConfig(
+             'ED admissions',
+             stay_id_filter=True,
+             load=True,
+             index_columns=['gender', 'race', 'arrival_transport'],
+             groupby='stay_id',
+             time_columns='intime',
+         ),
+         'triage': TableConfig(
+             'Triage',
+             stay_id_filter=True,
+             load=True,
+             text_columns=['chiefcomplaint', 'pain'],
+             value_columns=['temperature', 'heartrate', 'resprate', 'o2sat', 'sbp', 'dbp', 'acuity'],
+             groupby='stay_id',
+             use_start_time=True,
+         ),
+         'vitalsign': TableConfig(
+             'Aperiodic vital signs',
+             stay_id_filter=True,
+             load=True,
+             index_columns=['rhythm'],
+             text_columns=['pain'],
+             value_columns=['temperature', 'heartrate', 'resprate', 'o2sat', 'sbp', 'dbp'],
+             groupby='charttime',
+             time_columns='charttime',
+         ),
+         'pyxis': TableConfig(
+             'Dispensation information for medications provided by the BD Pyxis MedStation',
+             stay_id_filter=True,
+             load=True,
+             index_columns=['med_rn', 'name', 'gsn_rn', 'gsn'],
+             groupby='charttime',
+             time_columns='charttime',
+         ),
+         'diagnosis': TableConfig('Diagnosis', stay_id_filter=True, hadm_id_filter=False),
+     }
+ )
+
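A hedged sketch of how these per-table configurations might drive loading and filtering. The file layout and filter columns assumed below follow the usual MIMIC-IV-ED CSV naming (subject_id/stay_id columns, one gzipped CSV per table); the repository's own loading logic is in dataset.py and records.py.

```python
# Sketch only: iterate the ED table configs and apply the declared filters.
# Paths and column names are assumptions about the MIMIC-IV-ED layout.
import pandas as pd

from tables import ed_module_tables


def load_ed_tables(ed_dir, subject_id, stay_id):
    loaded = {}
    for table_name, config in ed_module_tables.items():
        if not config.load:
            continue  # e.g. 'diagnosis' is declared but not loaded
        df = pd.read_csv(f'{ed_dir}/{table_name}.csv.gz')
        if config.subject_id_filter:
            df = df[df['subject_id'] == subject_id]
        if config.stay_id_filter:
            df = df[df['stay_id'] == stay_id]
        loaded[table_name] = df
    return loaded
```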
130
+ # MIMIC-CXR module:
131
+ mimic_cxr_tables = OrderedDict(
132
+ {
133
+ 'mimic_cxr_2_0_0_metadata': TableConfig(
134
+ 'Metadata',
135
+ study_id_filter=True,
136
+ load=True,
137
+ index_columns=[
138
+ 'PerformedProcedureStepDescription',
139
+ 'ViewPosition',
140
+ 'ProcedureCodeSequence_CodeMeaning',
141
+ 'ViewCodeSequence_CodeMeaning',
142
+ 'PatientOrientationCodeSequence_CodeMeaning',
143
+ ],
144
+ groupby='study_id',
145
+ ),
146
+ 'mimic_cxr_sectioned': TableConfig(
147
+ 'Report sections',
148
+ mimic_cxr_sectioned=True,
149
+ subject_id_filter=False,
150
+ load=True,
151
+ groupby='study',
152
+ text_columns=['indication', 'history', 'comparison'],
153
+ target_sections=['findings', 'impression'],
154
+ ),
155
+ 'mimic_cxr_2_0_0_chexpert': TableConfig('CheXpert', study_id_filter=True),
156
+ 'mimic_cxr_2_0_0_split': TableConfig('Split', study_id_filter=True),
157
+ }
158
+ )
159
+
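In these MIMIC-CXR configs, `text_columns` mark report sections used as prompt context while `target_sections` mark the generation targets. A small sketch of reading those roles back out of the configuration (assuming tables.py is importable as a module):

```python
# Collect prompt and target report sections from the MIMIC-CXR table configs.
from tables import mimic_cxr_tables

prompt_sections, target_sections = [], []
for name, config in mimic_cxr_tables.items():
    prompt_sections.extend(config.text_columns)
    target_sections.extend(config.target_sections)

print(prompt_sections)  # ['indication', 'history', 'comparison']
print(target_sections)  # ['findings', 'impression']
```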