NDStein committed
Commit 58d3955 · verified · 1 Parent(s): 6bc022c

Upload 10 files
Files changed (10)
  1. README.md +218 -3
  2. config.py +71 -0
  3. dam3.1.ckpt +3 -0
  4. featex.py +119 -0
  5. model.py +185 -0
  6. pipeline.py +43 -0
  7. requirements.txt +5 -0
  8. tuning/__init__.py +0 -0
  9. tuning/indet_roc.py +416 -0
  10. tuning/optimal_ordinal.py +510 -0
README.md CHANGED
@@ -1,3 +1,218 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ base_model:
+ - openai/whisper-small.en
+ pipeline_tag: audio-classification
+ ---
+
+
+ # Background
+
+ In the United States nearly 21M adults suffer from depression each year [1], with depression serving as the nation’s leading cause of disability [2].
+ Despite this, less than 4% of Americans receive mental health screenings from their primary care physicians during annual wellness visits.
+ The pandemic and recent public campaigns have increased awareness of mental health struggles, but a persistent stigma around depression and other mental health conditions remains.
+ The influence of this stigma is especially marked in older adults: people aged 65 and older are less likely than any other age group to seek mental health support.
+ Older adults – for whom depression significantly increases the risk of disability and morbidity – also tend to underreport mental health symptoms [3].
+
+ In the US, this outlook becomes even more troubling when coupled with the rate at which the country’s population is aging: 1 out of every 6 people will be 60 years or over by 2030 [4].
+ As widespread as depression is, identifying and treating it and other mental health conditions remains challenging, and the screening process offers limited objectivity.
+
+
+ # Depression–Anxiety Model (DAM)
+
+ ## Model Overview
+
+ DAM is a clinical-grade, speech-based model designed to screen for signs of depression and anxiety using voice biomarkers.
+ To the best of our knowledge, it is the first model developed explicitly for clinical-grade mental health assessment from speech without reliance on linguistic content or transcription.
+ The model operates exclusively on the acoustic properties of the speech signal, extracting depression- and anxiety-specific voice biomarkers rather than semantic or lexical information.
+ Numerous studies [5–7] have demonstrated that paralinguistic features – such as spectral entropy, pitch variability, fundamental frequency, and related acoustic measures – exhibit strong correlations with depression and anxiety.
+ Building on this body of evidence, DAM extends prior approaches by leveraging deep learning to learn fine-grained vocal biomarkers directly from the raw speech signal, yielding representations with greater predictive power than hand-engineered paralinguistic features.
+ DAM analyzes spoken audio to estimate depression and anxiety severity scores, which can subsequently be mapped to standardized clinical scales, such as **PHQ-9** (Patient Health Questionnaire-9) for depression and **GAD-7** (Generalized Anxiety Disorder-7) for anxiety.
+
+
+ ## Data
+
+ The model was trained and evaluated on a large-scale speech dataset collected from approximately 35,000 individuals via phone, tablet, or web app, corresponding to ~863 hours of speech.
+ Ground-truth labels were derived from both clinician-administered and self-reported PHQ-9 and GAD-7 questionnaires, ensuring strong alignment with established clinical assessment standards.
+ The data consists predominantly of American English speech; however, a broad range of accents is represented, providing robustness across diverse speaking styles.
+
+ The audio data itself cannot be shared for privacy reasons. Demographic statistics, model scores, and associated metadata for each audio stream are available for threshold tuning at https://huggingface.co/datasets/KintsugiHealth/dam-dataset.
+
+ ## Model Architecture
+
+ **Foundation model:** OpenAI Whisper-Small EN
+
+ **Training approach:** Fine-tuning + multi-task learning
+
+ **Downstream tasks:** Depression and anxiety severity estimation
+
+ Whisper serves as the backbone for extracting voice biomarkers, while a multi-task head is fine-tuned jointly on the depression and anxiety prediction tasks to leverage shared representations across mental health conditions. A minimal sketch of how these pieces compose is shown below.
+
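+ The following sketch assumes the repository files below are on the import path; only the pretrained Whisper weights are loaded here, so the task heads are randomly initialized unless the checkpoint is loaded as in `pipeline.py`. It illustrates the forward pass from log-mel features to one raw score per task:
+
+ ```python
+ import torch
+
+ from config import default_config
+ from model import Classifier
+
+ model = Classifier(**default_config).eval()
+ # One 30-second chunk of log-mel features: [num_chunks, 80 mel bands, 3000 time frames]
+ features = torch.randn(1, 80, 3000)
+ with torch.no_grad():
+     scores, _ = model(features, torch.tensor([1]))  # lengths = chunks per recording
+ print(scores)  # dict with one raw score tensor per task: {'depression': ..., 'anxiety': ...}
+ ```
+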
+ ## Input Requirements
+
+ **Preferred minimum audio length:** 30 seconds of speech after voice activity detection (VAD)
+
+ **Input modality:** Audio only
+
+ Shorter audio samples may lead to reduced prediction accuracy. A sketch of the chunking behavior for longer inputs follows.
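+
+ For longer recordings, the preprocessor splits the audio into 30-second chunks whose encoder embeddings are averaged downstream (see `model.py`). As a sketch using `featex.Preprocessor` with its defaults and placeholder audio, a 95-second recording becomes four chunks, the last zero-padded:
+
+ ```python
+ import torch
+
+ from featex import Preprocessor
+
+ pre = Preprocessor()  # defaults: 30 s chunks, no overlap, pad last chunk to full
+ audio = torch.randn(1, 95 * 16000)  # 95 s of placeholder 16 kHz mono audio
+ features = pre.preprocess_with_audio_normalization(audio)
+ print(features.shape)  # torch.Size([4, 80, 3000])
+ ```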
+
+ ## Output
+
+ The model outputs a dictionary of the form `{"depression": score, "anxiety": score}`.
+
+ If `quantize=False` (see the Usage section below), the scores are returned as raw float values which correlate monotonically with PHQ-9 and GAD-7.
+
+ If `quantize=True`, the scores are converted into integers representing the severity of depression and anxiety, using the levels listed below (a sketch of the mapping follows the lists).
+
+ **Quantization levels for depression task:**
+
+ 0 – no depression (PHQ-9 <= 9)
+
+ 1 – mild to moderate depression (10 <= PHQ-9 <= 14)
+
+ 2 – severe depression (PHQ-9 >= 15)
+
+
+ **Quantization levels for anxiety task:**
+
+ 0 – no anxiety (GAD-7 <= 4)
+
+ 1 – mild anxiety (5 <= GAD-7 <= 9)
+
+ 2 – moderate anxiety (10 <= GAD-7 <= 14)
+
+ 3 – severe anxiety (GAD-7 >= 15)
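+
+ The mapping from raw score to level is a bucketing against the per-task thresholds in `config.py`; a minimal sketch of the logic in `model.Classifier.quantize_scores`:
+
+ ```python
+ import torch
+
+ from config import default_config
+
+ thresholds = default_config['inference_thresholds']['anxiety']  # [-0.7939, -0.2173, 0.1521]
+ raw_score = torch.tensor([0.3])  # hypothetical raw anxiety score
+ level = torch.searchsorted(torch.tensor(thresholds), raw_score)
+ print(int(level))  # 3 -> severe anxiety (GAD-7 >= 15)
+ ```
+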
+ ## Intended Use
+ * Mental health research
+ * Clinical decision support
+ * Continuous monitoring of depression and anxiety
+
+ ## Limitations
+ * Not intended for diagnosis/self-diagnosis without clinical oversight
+ * Performance may degrade on speech recorded outside controlled environments or in the presence of noise
+ * Intended only for audio containing a single voice speaking English
+ * Biases related to language, accent, or demographic representation may be present
+
+
+
+ # Usage
+ 1. Check out the repo:
+
+ ```
+ git clone https://huggingface.co/KintsugiHealth/dam
+ ```
+
+ 2. Install requirements:
+ ```
+ pip install -r requirements.txt
+ ```
+
+ 3. Load and run the pipeline:
+ ```python
+ from pipeline import Pipeline
+
+ pipeline = Pipeline()
+ result = pipeline.run_on_file("sample.wav", quantize=True)
+ print(result)
+ ```
+ The output is a dictionary such as `{'depression': 2, 'anxiety': 3}`, indicating that the analyzed audio sample exhibits voice biomarkers consistent with severe depression and severe anxiety.
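+
+ Continuing the example above, passing `quantize=False` returns the underlying raw scores instead (one score tensor per task; shapes follow `model.py`):
+
+ ```python
+ raw = pipeline.run_on_file("sample.wav", quantize=False)
+ print(raw)  # {'depression': tensor([[...]]), 'anxiety': tensor([[...]])}
+ ```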
+
+ ## Tuning Thresholds
+ As mentioned in the Data section above, the raw audio data cannot be shared, but validation and test sets of model scores with associated ground truth and demographic metadata are available for threshold tuning. Thresholds can thus be tuned for traditional binary classification, ternary classification with an indeterminate output, and multi-class classification of severity. Two modules are provided for this in the model code's `tuning` package, as illustrated below.
+
+ ### Tuning Sensitivity, Specificity, and Indeterminate Fraction
+ This module implements a generalization of ROC curve analysis wherein ground truth is binary, but model output can be negative (score below the lower threshold), positive (score above the upper threshold), or indeterminate (score between the thresholds). For the purpose of metric calculations such as sensitivity and specificity, examples marked indeterminate count towards neither the numerator nor the denominator. The budget for the fraction of examples to be marked indeterminate is configurable, as shown below.
+ ```python
+ import numpy as np
+
+ from datasets import load_dataset
+ from tuning.indet_roc import BinaryLabeledScores
+
+ val = load_dataset("KintsugiHealth/dam-dataset", split="validation")
+ val.set_format("numpy")
+ test = load_dataset("KintsugiHealth/dam-dataset", split="test")
+ test.set_format("numpy")
+
+ data = dict(val=val, test=test)
+
+ # Associate depression model scores with binarized labels based on whether the PHQ-9 sum is >= 10
+ scores_labeled = {
+     k: BinaryLabeledScores(
+         y_score=v['scores_depression'],  # Change to 'scores_anxiety' to calibrate anxiety thresholds
+         y_true=(v['phq'] >= 10).astype(int),  # Change to 'gad' to calibrate anxiety thresholds; optionally change cutoff
+     )
+     for k, v in data.items()
+ }
+
+ issa = scores_labeled['val'].indet_sn_sp_array()  # Metrics at all possible lower, upper threshold pairs
+
+ # Compute ROC curve with 20% indeterminate budget and select a point near the diagonal
+ roc_at_20 = issa.roc_curve(0.2)  # Pareto frontier of (sensitivity, specificity) pairs with at most 20% indeterminate fraction
+ print(f"Area under the ROC curve with 20% indeterminate budget: {roc_at_20.auc()=:.1%}")
+ sn_eq_sp_at_20 = roc_at_20.sn_eq_sp()  # Find where ROC comes closest to sensitivity = specificity diagonal
+ print(f"Thresholds to balance sensitivity and specificity on val set with 20% indeterminate budget: "
+       f"{sn_eq_sp_at_20.lower_thresh=:.3}, {sn_eq_sp_at_20.upper_thresh=:.3}")
+ print(f"Performance on val set with these thresholds: {sn_eq_sp_at_20.sn=:.1%}, {sn_eq_sp_at_20.sp=:.1%}")
+ test_metrics = sn_eq_sp_at_20.eval(**scores_labeled['test']._asdict())  # Thresholds evaluated on test set
+ print(f"Performance on test set with these thresholds: {test_metrics.sn=:.1%}, {test_metrics.sp=:.1%}")
+
+ # Find best specificity given sensitivity and indeterminate budget constraints
+ constrained = issa[(issa.sn >= 0.8) & (issa.indet_frac <= 0.35)]
+ optimal = constrained[np.argmax(constrained.sp)]
+ print(f"Highest specificity achievable with sensitivity >= 80% and 35% indeterminate budget is "
+       f"{optimal.sp=:.1%}, achieved at thresholds {optimal.lower_thresh=:.3}, {optimal.upper_thresh=:.3}"
+ )
+
+ # Collect optimal ways of achieving balanced sensitivity and specificity as a function of indeterminate fraction
+ sn_eq_sp = issa.sn_eq_sp_graph()
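+ # Hypothetical follow-up: read off the best balanced operating point within a 10% indeterminate budget
+ best_at_10 = sn_eq_sp.at_budget(0.1)
+ print(f"Balanced val performance at 10% budget: {best_at_10.sn=:.1%}, {best_at_10.sp=:.1%}, {best_at_10.indet_frac=:.1%}")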
+ ```
+
+ ### Optimal Tuning for Multiclass Tasks
+ The depression and anxiety models were each trained with ordinal regression to predict a scalar score monotonically correlated with the underlying PHQ-9 and GAD-7 questionnaire ground-truth sums. As such, there are efficient dynamic programming algorithms to select optimal thresholds for multi-class numeric labels under a variety of decision criteria; an example follows, with a coarser variant sketched after it.
+
+ ```python
+ from datasets import load_dataset
+ from tuning.optimal_ordinal import MinAbsoluteErrorOrdinalThresholding
+
+ val = load_dataset("KintsugiHealth/dam-dataset", split="validation")
+ val.set_format("torch")
+ test = load_dataset("KintsugiHealth/dam-dataset", split="test")
+ test.set_format("torch")
+
+ data = dict(val=val, test=test)
+
+ scores = val['scores_anxiety']  # Change to 'scores_depression' for depression threshold tuning
+ labels = val['gad']  # Change to 'phq' for depression threshold tuning; optionally change to quantized version for coarser prediction tuning
+
+ # Can change to any of
+ #   `MaxAccuracyOrdinalThresholding`
+ #   `MaxMacroRecallOrdinalThresholding`
+ #   `MaxMacroPrecisionOrdinalThresholding`
+ #   `MaxMacroF1OrdinalThresholding`
+ optimal_thresh = MinAbsoluteErrorOrdinalThresholding(num_classes=int(labels.max()) + 1)
+ best_constant_cost, best_constant = optimal_thresh.best_constant_output_classifier(labels)
+ print(f"Always predicting GAD sum = {best_constant} on val set independent of model score gives mean absolute error {best_constant_cost:.3}.")
+ mean_error = optimal_thresh.tune_thresholds(labels=labels, scores=scores)
+ print(f"Thresholds optimized on val set to predict GAD sum from anxiety score: {optimal_thresh.thresholds}")
+ print(f"Mean absolute error predicting GAD sum on val set based on thresholds optimized on val set: {mean_error:.3}")
+ test_preds = optimal_thresh(test['scores_anxiety'])
+ mean_error_test = optimal_thresh.mean_cost(labels=test['gad'], preds=test_preds)
+ print(f"Mean absolute error predicting GAD sum on test set based on thresholds optimized on val set: {mean_error_test:.3}")
+ ```
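+
+ As a sketch of the coarser variant mentioned in the comments above (same dataset columns assumed), the GAD-7 sums can first be quantized into the four severity levels from the Output section and then tuned directly, e.g. for macro F1:
+
+ ```python
+ import torch
+
+ from tuning.optimal_ordinal import MaxMacroF1OrdinalThresholding
+
+ # Quantize GAD-7 sums into the four severity levels (cutoffs 5, 10, 15)
+ severity = torch.searchsorted(torch.tensor([5, 10, 15]), val['gad'], right=True)
+ tuner = MaxMacroF1OrdinalThresholding(num_classes=4)
+ macro_f1 = tuner.tune_thresholds(labels=severity, scores=val['scores_anxiety'])
+ print(f"Macro F1 on val set: {macro_f1:.3}; thresholds: {tuner.thresholds}")
+ ```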
+
+ # Acknowledgments
+
+ This model was created through equal contributions by Oleksii Abramenko, Noah Stein, and Colin Vaz while at Kintsugi Health. For a full list of contributors to earlier modeling projects, data collection, clinical, and business matters, see the organization card at https://huggingface.co/KintsugiHealth.
+
+ # References
+
+ 1. https://www.nimh.nih.gov/health/statistics/major-depression
+ 2. https://www.hopefordepression.org/depression-facts/
+ 3. https://nndc.org/facts/
+ 4. https://www.psychiatry.org/patients-families/stigma-and-discrimination
+ 5. https://www.sciencedirect.com/science/article/pii/S1746809423004536
+ 6. https://pmc.ncbi.nlm.nih.gov/articles/PMC3409931/
+ 7. https://pmc.ncbi.nlm.nih.gov/articles/PMC11559157
config.py ADDED
@@ -0,0 +1,71 @@
+ """Configuration for running Kintsugi Depression and Anxiety model."""
+
+ import torch
+
+ EXPECTED_SAMPLE_RATE = 16000  # Audio sample rate in hertz
+
+ # Configuration for running Kintsugi Depression and Anxiety model as intended
+ default_config = {
+     # See featex.py for preprocessor config details
+     'preprocessor_config': {
+         'normalize_features': True,
+         'chunk_seconds': 30,
+         'max_overlap_frac': 0.0,
+         'pad_last_chunk_to_full': True,
+     },
+
+     # See model.py for backbone config details
+     'backbone_configs': {
+         'audio': {
+             'model': 'openai/whisper-small.en',
+             'hf_config': {
+                 'encoder_layerdrop': 0.0,
+                 'dropout': 0.0,
+                 'activation_dropout': 0.0,
+             },
+             'lora_params': {
+                 'r': 32,
+                 'lora_alpha': 64.0,
+                 'target_modules': 'all-linear',
+                 'lora_dropout': 0.4,
+                 'modules_to_save': ['conv1', 'conv2'],
+                 'bias': 'all',
+             },
+         },
+         'llma': {
+             'model': 'openai/whisper-small.en',
+             'hf_config': {
+                 'encoder_layerdrop': 0.0,
+                 'dropout': 0.0,
+                 'activation_dropout': 0.0,
+             },
+         },
+     },
+
+     # See model.py for classifier config details
+     'classifier_config': {
+         'shared_projection_dim': [256, 64],
+         'tasks': {
+             'depression': {'proj_dim': 128, 'dropout': 0.4},
+             'anxiety': {'proj_dim': 128, 'dropout': 0.4},
+         },
+     },
+
+     # Score thresholds chosen to optimize macro average F1 score on validation set
+     'inference_thresholds': {
+         # Three-level depression severity model:
+         #   depression score <= -0.6699            --> no depression (PHQ-9 <= 9)
+         #   -0.6699 < depression score <= -0.2908  --> mild to moderate depression (10 <= PHQ-9 <= 14)
+         #   -0.2908 < depression score             --> severe depression (PHQ-9 >= 15)
+         'depression': [-0.6699, -0.2908],
+         # Four-level anxiety severity model:
+         #   anxiety score <= -0.7939           --> no anxiety (GAD-7 <= 4)
+         #   -0.7939 < anxiety score <= -0.2173 --> mild anxiety (5 <= GAD-7 <= 9)
+         #   -0.2173 < anxiety score <= 0.1521  --> moderate anxiety (10 <= GAD-7 <= 14)
+         #   0.1521 < anxiety score             --> severe anxiety (GAD-7 >= 15)
+         'anxiety': [-0.7939, -0.2173, 0.1521],
+     },
+ }
+
+ # Average filter bank energies used for feature normalization
+ logmel_energies = torch.tensor([
+     0.34912264, 0.58558977, 0.7912451, 0.92767584, 0.98273695,
+     0.98439455, 0.9603633, 0.93906444, 0.9366281, 0.93200225,
+     0.916437, 0.8928787, 0.8637211, 0.83265126, 0.79977655,
+     0.7778334, 0.7561299, 0.72997606, 0.70391226, 0.6800474,
+     0.65755, 0.63536274, 0.61355984, 0.5923383, 0.5720056,
+     0.55244887, 0.53684795, 0.5221597, 0.5098636, 0.49923953,
+     0.48908615, 0.47840047, 0.46758702, 0.47343993, 0.46268672,
+     0.4475126, 0.46747103, 0.45131385, 0.4635319, 0.44889897,
+     0.45491976, 0.4373785, 0.43154317, 0.42194438, 0.41158468,
+     0.40096927, 0.3933149, 0.38795966, 0.38441542, 0.38454026,
+     0.3815766, 0.3768835, 0.3719921, 0.3654539, 0.35399568,
+     0.3425986, 0.32823247, 0.31404305, 0.30564603, 0.29617435,
+     0.29273877, 0.28560263, 0.27459458, 0.26876706, 0.25825337,
+     0.24759005, 0.24090728, 0.2344712, 0.22529823, 0.20880115,
+     0.193578, 0.18290243, 0.17621627, 0.17087021, 0.16641389,
+     0.15932252, 0.14312662, 0.11790597, 0.08030523, 0.03747071,
+ ])
dam3.1.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cfa897e1b990de9377b2fb805b526a36fe6f31de01bbc8c3d288d317df2c4b0c
+ size 736180146
featex.py ADDED
@@ -0,0 +1,119 @@
+ """Preprocessing and normalization to prepare audio for Kintsugi Depression and Anxiety model."""
+ import os
+ from typing import Union, BinaryIO
+
+ import numpy as np
+ import torch
+ import torchaudio
+ from transformers import AutoFeatureExtractor
+
+ from config import EXPECTED_SAMPLE_RATE, logmel_energies
+
+
+ def load_audio(source: Union[BinaryIO, str, os.PathLike]) -> torch.Tensor:
+     """Load audio file, verify mono channel count, and resample if necessary.
+
+     Parameters
+     ----------
+     source: open file or path to file
+
+     Returns
+     -------
+     Time domain audio samples as a 1 x num_samples float tensor sampled at 16 kHz.
+
+     """
+     audio, fs = torchaudio.load(source)
+     if audio.shape[0] != 1:
+         raise ValueError(f"Provided audio has {audio.shape[0]} != 1 channels.")
+     if fs != EXPECTED_SAMPLE_RATE:
+         audio = torchaudio.functional.resample(audio, fs, EXPECTED_SAMPLE_RATE)
+     return audio
+
+
+ class Preprocessor:
+     def __init__(self,
+                  normalize_features: bool = True,
+                  chunk_seconds: int = 30,
+                  max_overlap_frac: float = 0.0,
+                  pad_last_chunk_to_full: bool = True,
+                  ):
+         """Create preprocessor object.
+
+         Parameters
+         ----------
+         normalize_features: Whether the Whisper preprocessor should normalize features
+         chunk_seconds: Size of model's receptive field in seconds
+         max_overlap_frac: Fraction of each chunk allowed to overlap the previous chunk for inputs longer than chunk_seconds
+         pad_last_chunk_to_full: Whether to pad audio to an integer multiple of chunk_seconds
+
+         """
+         self.preprocessor = AutoFeatureExtractor.from_pretrained("openai/whisper-small.en")
+         self.normalize_features = normalize_features
+         self.chunk_seconds = chunk_seconds
+         self.max_overlap_frac = max_overlap_frac
+         self.pad_last_chunk_to_full = pad_last_chunk_to_full
+
+     def preprocess_with_audio_normalization(
+         self,
+         audio: torch.Tensor,
+     ) -> torch.Tensor:
+         """Run Whisper preprocessor and normalization expected by the model.
+
+         Note: some normalization steps could be avoided, but are included to match
+         the feature extraction used during training.
+
+         Parameters
+         ----------
+         audio: Raw audio samples as a 1 x num_samples float tensor sampled at 16 kHz
+
+         Returns
+         -------
+         Normalized mel filter bank features as a float tensor of shape
+         num_chunks x 80 mel filter bands x 3000 time frames
+
+         """
+         # Remove DC offset and scale amplitude to [-1, 1]
+         audio = torch.squeeze(audio, 0)
+         audio = audio - torch.mean(audio)
+         audio = audio / torch.max(torch.abs(audio))
+
+         chunk_samples = EXPECTED_SAMPLE_RATE * self.chunk_seconds
+
+         if self.pad_last_chunk_to_full:
+             # Pad audio so that the last chunk is not dropped
+             if self.max_overlap_frac > 0:
+                 raise ValueError(
+                     "pad_last_chunk_to_full is only supported for non-overlapping windows"
+                 )
+             num_chunks = np.ceil(len(audio) / chunk_samples)
+             pad_size = int(num_chunks * chunk_samples - len(audio))
+             audio = torch.nn.functional.pad(audio, (0, pad_size))
+
+         overflow_len = len(audio) - chunk_samples
+
+         min_hop_samples = int(
+             (1 - self.max_overlap_frac) * chunk_samples
+         )
+
+         n_windows = 1 + overflow_len // min_hop_samples
+         window_starts = np.linspace(0, overflow_len, max(n_windows, 1)).astype(int)
+
+         features = self.preprocessor(
+             [
+                 audio[start : start + chunk_samples].numpy(force=True)
+                 for start in window_starts
+             ],
+             return_tensors="pt",
+             sampling_rate=EXPECTED_SAMPLE_RATE,
+             do_normalize=self.normalize_features,
+         )
+         for key in ("input_features", "input_values"):
+             if hasattr(features, key):
+                 features = getattr(features, key)
+                 break
+
+         mean_features = torch.mean(features, dim=-1)
+         # features are [batch, n_logmel_bins, n_frames]
+         rescale_factor = logmel_energies.unsqueeze(0) - mean_features
+         rescale_factor = rescale_factor.unsqueeze(2)
+         features += rescale_factor
+         return features
model.py ADDED
@@ -0,0 +1,185 @@
+ from typing import Any, Mapping, Optional
+
+ import torch
+ from peft import LoraConfig, get_peft_model
+ from transformers import WhisperConfig, WhisperModel
+
+
+ class WhisperEncoderBackbone(torch.nn.Module):
+     def __init__(
+         self,
+         model: str = "openai/whisper-small.en",
+         hf_config: Optional[Mapping[str, Any]] = None,
+         lora_params: Optional[Mapping[str, Any]] = None,
+     ):
+         """Whisper encoder model with optional Low-Rank Adaptation.
+
+         Parameters
+         ----------
+         model: Name of WhisperModel whose encoder to load from HuggingFace
+         hf_config: Optional config for HuggingFace model
+         lora_params: Parameters for Low-Rank Adaptation
+
+         """
+         super().__init__()
+         hf_config = hf_config if hf_config is not None else dict()
+         backbone_config = WhisperConfig.from_pretrained(model, **hf_config)
+         self.backbone = (
+             WhisperModel.from_pretrained(
+                 model,
+                 config=backbone_config,
+             )
+             .get_encoder()
+             .train()
+         )
+         if lora_params is not None and len(lora_params) > 0:
+             lora_config = LoraConfig(**lora_params)
+             self.backbone = get_peft_model(self.backbone, lora_config)
+         self.backbone_dim = backbone_config.hidden_size
+
+     def forward(self, whisper_feature_batch):
+         return self.backbone(whisper_feature_batch).last_hidden_state.mean(dim=1)
+
+
+ class SharedLayers(torch.nn.Module):
+     def __init__(self, input_dim: int, proj_dims: list[int]):
+         """Fully connected network with Mish nonlinearities between linear layers. No nonlinearity at input or output.
+
+         Parameters
+         ----------
+         input_dim: Dimension of input features
+         proj_dims: Dimensions of layers to create
+
+         """
+         super().__init__()
+         modules = []
+         for output_dim in proj_dims[:-1]:
+             modules.extend([torch.nn.Linear(input_dim, output_dim), torch.nn.Mish()])
+             input_dim = output_dim
+         modules.append(torch.nn.Linear(input_dim, proj_dims[-1]))
+         self.shared_layers = torch.nn.Sequential(*modules)
+
+     def forward(self, x):
+         return self.shared_layers(x)
+
+
+ class TaskHead(torch.nn.Module):
+     def __init__(self, input_dim: int, proj_dim: int, dropout: float = 0.0):
+         """Fully connected network with one hidden layer, dropout, and a scalar output."""
+         super().__init__()
+
+         self.linear = torch.nn.Linear(input_dim, proj_dim)
+         self.activation = torch.nn.Mish()
+         self.dropout = torch.nn.Dropout(dropout)
+         self.final_layer = torch.nn.Linear(proj_dim, 1, bias=False)
+
+     def forward(self, x):
+         x = self.linear(x)
+         x = self.activation(x)
+         x = self.dropout(x)
+         x = self.final_layer(x)
+         return x
+
+
+ class MultitaskHead(torch.nn.Module):
+     def __init__(
+         self,
+         backbone_dim: int,
+         shared_projection_dim: list[int],
+         tasks: Mapping[str, Mapping[str, Any]],
+     ):
+         """Fully connected network with multiple named scalar outputs."""
+         super().__init__()
+
+         # Initialize the shared network and task-specific networks
+         self.shared_layers = SharedLayers(backbone_dim, shared_projection_dim)
+         self.classifier_head = torch.nn.ModuleDict(
+             {
+                 task: TaskHead(shared_projection_dim[-1], **task_config)
+                 for task, task_config in tasks.items()
+             }
+         )
+
+     def forward(self, x):
+         x = self.shared_layers(x)
+         return {task: head(x) for task, head in self.classifier_head.items()}
+
+
+ def average_tensor_in_segments(tensor: torch.Tensor, lengths: list[int] | torch.Tensor):
+     """Average segments of a `tensor` along dimension 0 based on a list of `lengths`.
+
+     For example, with input tensor `t` and `lengths` [1, 3, 2], the output would be
+     [t[0], (t[1] + t[2] + t[3]) / 3, (t[4] + t[5]) / 2]
+
+     Parameters
+     ----------
+     tensor : torch.Tensor
+         The tensor to average
+     lengths : list of ints
+         The lengths of each segment to average in the tensor, in order
+
+     Returns
+     -------
+     torch.Tensor
+         The tensor with relevant segments averaged
+     """
+     if not torch.is_tensor(lengths):
+         lengths = torch.tensor(lengths, device=tensor.device)
+     index = torch.repeat_interleave(
+         torch.arange(len(lengths), device=tensor.device), lengths
+     )
+     out = torch.zeros(
+         lengths.shape + tensor.shape[1:], device=tensor.device, dtype=tensor.dtype
+     )
+     out.index_add_(0, index, tensor)
+     broadcastable_lengths = lengths.view((-1,) + (1,) * (len(out.shape) - 1))
+     return out / broadcastable_lengths
+
+
+ class Classifier(torch.nn.Module):
+     def __init__(
+         self,
+         backbone_configs: Mapping[str, Mapping[str, Any]],
+         classifier_config: Mapping[str, Any],
+         inference_thresholds: Mapping[str, Any],
+         preprocessor_config: Mapping[str, Any],
+     ):
+         """Full Kintsugi Depression and Anxiety model.
+
+         Whisper encoder -> Mean pooling over time -> Layers shared across tasks -> Per-task heads
+
+         Parameters
+         ----------
+         backbone_configs: Per-backbone kwargs for `WhisperEncoderBackbone`
+         classifier_config: Kwargs for `MultitaskHead`
+         inference_thresholds: Per-task score thresholds used by `quantize_scores`
+         preprocessor_config: Kwargs for `featex.Preprocessor`, stored for use by the pipeline
+
+         """
+         super().__init__()
+
+         self.backbone = torch.nn.ModuleDict(
+             {
+                 key: WhisperEncoderBackbone(**backbone_configs[key])
+                 for key in sorted(backbone_configs.keys())
+             }
+         )
+
+         backbone_dim = sum(layer.backbone_dim for layer in self.backbone.values())
+         self.head = MultitaskHead(backbone_dim, **classifier_config)
+         self.inference_thresholds = inference_thresholds
+         self.preprocessor_config = preprocessor_config
+
+     def forward(self, x, lengths):
+         backbone_outputs = {
+             key: average_tensor_in_segments(layer(x), lengths)
+             for key, layer in self.backbone.items()
+         }
+         backbone_output = torch.cat(list(backbone_outputs.values()), dim=1)
+         return self.head(backbone_output), torch.ones_like(lengths)
+
+     def quantize_scores(self, scores: Mapping[str, torch.Tensor]) -> Mapping[str, torch.Tensor]:
+         """Map per-task scores to discrete predictions per `inference_thresholds` config."""
+         return {
+             key: torch.searchsorted(
+                 torch.tensor(self.inference_thresholds[key], device=value.device),
+                 value.mean(),
+                 out_int32=True,
+             )
+             for key, value in scores.items()
+         }
pipeline.py ADDED
@@ -0,0 +1,43 @@
+ import os
+ from pathlib import Path
+ from typing import Any, BinaryIO, Mapping, Optional, Union
+
+ import torch
+
+ from config import default_config
+ from featex import load_audio, Preprocessor
+ from model import Classifier
+
+
+ class Pipeline:
+     def __init__(
+         self,
+         checkpoint: Optional[str | Path] = None,
+         config: Optional[Mapping[str, Any]] = None,
+         device: Optional[torch.device] = None,
+     ):
+         if checkpoint is None:
+             file_dir = Path(__file__).parent.resolve()
+             checkpoint = file_dir / "dam3.1.ckpt"
+         if config is None:
+             config = default_config
+         if device is None:
+             if torch.cuda.is_available():
+                 device = torch.device("cuda:0")
+             else:
+                 device = torch.device("cpu")
+         self.device = device
+         self.model = Classifier(**config)
+         self.preprocessor = Preprocessor(**self.model.preprocessor_config)
+         state_dict = torch.load(checkpoint, map_location=device)
+         self.model.load_state_dict(state_dict)
+         self.model.to(self.device)
+         self.model.eval()
+
+     def run_on_features(self, features: torch.Tensor, quantize: bool = True):
+         scores = self.model(features, torch.tensor([features.shape[0]], device=self.device))[0]
+         if quantize:
+             return {k: int(v.item()) for k, v in self.model.quantize_scores(scores).items()}
+         else:
+             return scores
+
+     def run_on_audio(self, audio: torch.Tensor, quantize: bool = True):
+         features = self.preprocessor.preprocess_with_audio_normalization(audio)
+         return self.run_on_features(features.to(self.device), quantize=quantize)
+
+     def run_on_file(self, source: Union[BinaryIO, str, os.PathLike], quantize: bool = True):
+         audio = load_audio(source)
+         return self.run_on_audio(audio, quantize=quantize)
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ torch~=2.7.0
+ soundfile~=0.13.1
+ torchaudio~=2.7.0
+ transformers~=4.52.3
+ peft~=0.15.2
tuning/__init__.py ADDED
File without changes
tuning/indet_roc.py ADDED
@@ -0,0 +1,416 @@
+ """Tools for tuning pairs of scalar thresholds to trade off sensitivity, specificity, and indeterminate rate.
+
+ See `IndetSnSpArray` and subclass docstrings for details.
+
+ """
+ from dataclasses import asdict, dataclass
+ from typing import NamedTuple, Optional
+ from typing_extensions import Self  # in typing in python3.11
+
+ import numpy as np
+
+
+ def running_argmax_indices(a):
+     """Return indices of a where the value is larger than all previous values.
+
+     >>> running_argmax_indices([1, 0, 3, 4, 4, 2, 5, 7, 1])
+     array([0, 2, 3, 6, 7])
+
+     """
+     m = np.maximum.accumulate(a)
+     return np.flatnonzero(np.r_[True, m[:-1] < m[1:]])
+
+
+ def pareto_2d_indices(x, y):
+     """Compute indices of the Pareto frontier maximizing x and y, sorted in increasing x and decreasing y.
+
+     e.g. the Pareto frontier of the point set below is [A, G]
+
+         B A
+          C
+         E D
+         F  G
+          H
+
+     >>> u = [2, 0, 1, 2, 0, 0, 3, 1]
+     >>> v = [4, 4, 3, 2, 2, 1, 1, 0]
+     >>> pareto_2d_indices(np.array(u), np.array(v))
+     array([0, 6])
+
+     """
+     sort_indices = np.lexsort((-x, -y))  # last element is primary sort key
+     return sort_indices[running_argmax_indices(x[sort_indices])]
+
+
+ def midpoints_with_infs(x):
+     """Return the midpoints between the sorted unique elements of x, along with +/-inf."""
+     unique_scores = np.unique(np.r_[-np.inf, x, np.inf])
+     return (unique_scores[1:] + unique_scores[:-1]) / 2
+
+
+ def kde_disc_mass(
+     data: np.ndarray,
+     edges: np.ndarray,
+     bandwidth: float,
+     weights: Optional[np.ndarray] = None,
+ ):
+     """Perform Kernel Density Estimation (KDE) on data & weights, then compute the probability mass between edges."""
+     import scipy.stats
+
+     z_score = (edges[:, None] - data[None, :]) / bandwidth
+     component_cdfs = scipy.stats.norm.cdf(z_score)
+     if weights is None:
+         weights = np.ones_like(data)
+     cdf = np.dot(component_cdfs, weights / weights.sum())
+     return np.diff(cdf)
+
+
+ class BinaryLabeledScores(NamedTuple):
+     """An array of numeric scores along with associated 0/1 ground truth and optional numeric weights."""
+
+     y_score: np.ndarray
+     y_true: np.ndarray
+     weights: Optional[np.ndarray] = None
+
+     def smooth(
+         self, num_points: int, bandwidth: float, padding_bandwidths: float = 5.0
+     ) -> "BinaryLabeledScores":
+         """KDE-smooth positive and negative scores separately and discretize each to equally spaced weighted points.
+
+         Args:
+             num_points: number of points to use each for positive and negative score discretizations
+             bandwidth: bandwidth of kernel density estimation, i.e. standard deviation of noise to be added
+             padding_bandwidths: number of bandwidths to extend past lowest and highest scores when selecting
+                 discretization endpoints
+
+         Returns:
+             `BinaryLabeledScores` object representing the smoothed and re-discretized weighted labeled scores
+
+         """
+         pos = self.y_true == 1
+         neg = self.y_true == 0
+         if self.weights is not None:
+             pos_weights = self.weights[pos]
+             neg_weights = self.weights[neg]
+         else:
+             pos_weights = None
+             neg_weights = None
+         padding = padding_bandwidths * bandwidth
+         all_points = np.linspace(
+             self.y_score.min() - padding,
+             self.y_score.max() + padding,
+             2 * num_points + 1,
+         )
+         edges = all_points[::2]
+         centers = all_points[1::2]
+         pos_kde_weights = kde_disc_mass(
+             self.y_score[pos], edges, bandwidth, pos_weights
+         )
+         neg_kde_weights = kde_disc_mass(
+             self.y_score[neg], edges, bandwidth, neg_weights
+         )
+         return BinaryLabeledScores(
+             y_true=np.r_[np.zeros_like(centers), np.ones_like(centers)],
+             y_score=np.r_[centers, centers],
+             weights=np.r_[neg_kde_weights, pos_kde_weights],
+         )
+
+     def indet_sn_sp_array(self) -> "IndetSnSpArray":
+         """Build `IndetSnSpArray`."""
+         return IndetSnSpArray.build(**self._asdict())
+
+
+ def fake_vectorized_binom_ci(
+     k: np.ndarray, n: np.ndarray, p: float | np.ndarray = 0.95
+ ) -> tuple[np.ndarray, np.ndarray]:
+     """Compute binomial confidence intervals on arrays of parameters inefficiently."""
+     import scipy.stats
+
+     k, n, p = np.broadcast_arrays(k, n, p)
+     # If speed is needed this can be rewritten with the statsmodels package, which is vectorized.
+     flat_out = [
+         scipy.stats.binomtest(k_, n_).proportion_ci(p_)
+         for k_, n_, p_ in zip(k.flatten(), n.flatten(), p.flatten())
+     ]
+     low = np.array([ci.low for ci in flat_out]).reshape(k.shape)
+     high = np.array([ci.high for ci in flat_out]).reshape(k.shape)
+     return low, high
+
+
+ @dataclass
+ class IndetSnSpArray:
+     """An array of metrics at different lower and upper threshold values.
+
+     This class and subclasses are for selecting pairs of model thresholds based on sensitivity,
+     specificity, and indeterminate rate. Throughout we assume scores are scalars and ground truth is binary.
+
+     Each member `lower_thresh`, `upper_thresh`, `sn`, `sp`, and `indet_frac` must be a numpy array, and they all must
+     have the same shape. Corresponding entries of these arrays specify a pair of thresholds and the metrics when a
+     common dataset is evaluated using those thresholds. The thresholding logic is that scores less than the lower
+     threshold count as negative outputs, scores greater than or equal to the upper threshold count as positive outputs,
+     and scores in between are indeterminate outputs.
+
+     Indeterminate fraction is defined as the proportion of scores in between the two thresholds. All other
+     metrics are interpreted as conditioned on the scores not being indeterminate. For example, sensitivity
+     is defined as usual as (true positives) / (total positives) *except that examples with indeterminate
+     scores do not count towards the numerator or the denominator*.
+
+     """
+
+     lower_thresh: np.ndarray
+     upper_thresh: np.ndarray
+     tp: np.ndarray
+     fp: np.ndarray
+     tn: np.ndarray
+     fn: np.ndarray
+     indet: np.ndarray
+     weighted: bool = False
+     min_weight: float = 1.0
+     eps: float = 1e-8
+
+     def __post_init__(self):
+         self.sn = self.tp / np.maximum(self.tp + self.fn, self.min_weight)
+         self.sp = self.tn / np.maximum(self.tn + self.fp, self.min_weight)
+         self.ppv = self.tp / np.maximum(self.tp + self.fp, self.min_weight)
+         self.npv = self.tn / np.maximum(self.tn + self.fn, self.min_weight)
+         total = self.indet + self.fn + self.fp + self.tn + self.tp
+         self.indet_frac = self.indet / np.maximum(total, self.min_weight)
+         for attr in ("sn", "sp", "ppv", "npv", "indet_frac"):
+             value = getattr(self, attr)
+             if value.size:
+                 if value.max() > 1 + self.eps:
+                     raise ValueError(
+                         f"Numerical precision issues produced invalid value {attr} = {value.max()}."
+                     )
+                 if value.min() < -self.eps:
+                     raise ValueError(
+                         f"Numerical precision issues produced invalid value {attr} = {value.min()}."
+                     )
+             setattr(self, attr, np.clip(value, 0.0, 1.0))
+
+     @property
+     def min_sn_sp(self):
+         return np.minimum(self.sn, self.sp)
+
+     @classmethod
+     def build(
+         cls,
+         lower_thresh: Optional[np.ndarray] = None,
+         upper_thresh: Optional[np.ndarray] = None,
+         *,
+         y_true: np.ndarray,
+         y_score: np.ndarray,
+         weights: Optional[np.ndarray] = None,
+         eps: float = 1e-8,
+     ) -> Self:
+         """Find `IndetSnSpArray` values for given truth and scores as thresholds vary (à la sklearn.metrics.roc_curve).
+
+         The output object contains arrays for `sn`, `sp`, `indet_frac`, `lower_thresh`, and `upper_thresh`, all with
+         the same shape. What these arrays contain and what their common shape is depends on the input as follows.
+
+         If both lower_thresh and upper_thresh are provided, they must have the same shape and this method computes
+         metrics at the pairs given by corresponding entries in these arrays. The common output shape will be the same
+         as this common input shape.
+
+         If only one set of thresholds is provided, this method computes metrics at all sorted pairs of these
+         thresholds (along with +/- inf). If neither is provided, sort scores and allow thresholds between each pair
+         (along with +/- inf). In both of these cases, the common output shape is a 1-d vector of length equal to the
+         number of such pairs.
+
+         """
+         weights = weights if weights is not None else np.ones_like(y_true)
+         y_true = y_true[weights > 0]
+         y_score = y_score[weights > 0]
+         weights = weights[weights > 0]
+
+         # Find all threshes and include +/- inf so np.histogram does the right thing
+         if lower_thresh is not None and upper_thresh is not None:
+             threshes = np.unique(np.r_[-np.inf, lower_thresh, upper_thresh, np.inf])
+             lower_indices = np.searchsorted(threshes, lower_thresh)
+             upper_indices = np.searchsorted(threshes, upper_thresh)
+         else:
+             if lower_thresh is not None:
+                 threshes = np.unique(np.r_[-np.inf, lower_thresh, np.inf])
+             elif upper_thresh is not None:
+                 threshes = np.unique(np.r_[-np.inf, upper_thresh, np.inf])
+             else:
+                 unique_scores = np.unique(np.r_[-np.inf, y_score, np.inf])
+                 threshes = (unique_scores[1:] + unique_scores[:-1]) / 2
+                 threshes = np.unique(threshes)
+             lower_indices, upper_indices = np.triu_indices(len(threshes))
+
+         count_by_bin = np.histogram(y_score, bins=threshes, weights=weights)[0]
+         pos_by_bin = np.histogram(y_score, bins=threshes, weights=y_true * weights)[0]
+         count_by_thresh = np.pad(np.cumsum(count_by_bin), (1, 0))
+         pos_by_thresh = np.pad(np.cumsum(pos_by_bin), (1, 0))
+         tn_plus_fn = count_by_thresh[lower_indices]
+         total_minus_tp_minus_fp = count_by_thresh[upper_indices]
+         tp_plus_fp = count_by_thresh[-1] - total_minus_tp_minus_fp
+         fn = pos_by_thresh[lower_indices]
+         total_pos = pos_by_thresh[-1]  # last thresh is +inf
+         tp = total_pos - pos_by_thresh[upper_indices]
+         fp = tp_plus_fp - tp
+         tn = tn_plus_fn - fn
+         min_weight = weights.min()
+         indet = total_minus_tp_minus_fp - tn_plus_fn
+         return cls(
+             lower_thresh=threshes[lower_indices],
+             upper_thresh=threshes[upper_indices],
+             tp=tp,
+             fp=fp,
+             tn=tn,
+             fn=fn,
+             indet=indet,
+             weighted=not all(weights == 1.0),
+             min_weight=min_weight,
+             eps=eps,
+         )
+
+     def eval(
+         self,
+         *,
+         y_true,
+         y_score,
+         weights: Optional[np.ndarray] = None,
+     ) -> "IndetSnSpArray":
+         """Evaluate the given data on the thresholds of `self`."""
+         return IndetSnSpArray.build(
+             lower_thresh=self.lower_thresh,
+             upper_thresh=self.upper_thresh,
+             y_true=y_true,
+             y_score=y_score,
+             weights=weights,
+         )
+
+     def __getitem__(self, item) -> "IndetSnSpArray":
+         """Extract a subarray with numpy-style indexing."""
+         return IndetSnSpArray(
+             lower_thresh=self.lower_thresh[item],
+             upper_thresh=self.upper_thresh[item],
+             tp=self.tp[item],
+             fp=self.fp[item],
+             fn=self.fn[item],
+             tn=self.tn[item],
+             indet=self.indet[item],
+             weighted=self.weighted,
+             min_weight=self.min_weight,
+             eps=self.eps,
+         )
+
+     def __add__(self, other: "IndetSnSpArray") -> "IndetSnSpArray":
+         if not isinstance(other, IndetSnSpArray):
+             raise TypeError(f"Cannot add {type(other)} to IndetSnSpArray.")
+         tp = self.tp + other.tp
+         if np.array_equal(self.lower_thresh, other.lower_thresh):
+             lower_thresh = self.lower_thresh
+         else:
+             lower_thresh = np.nan * np.ones_like(tp)
+         if np.array_equal(self.upper_thresh, other.upper_thresh):
+             upper_thresh = self.upper_thresh
+         else:
+             upper_thresh = np.nan * np.ones_like(tp)
+         return IndetSnSpArray(
+             lower_thresh=lower_thresh,
+             upper_thresh=upper_thresh,
+             tp=tp,
+             fp=self.fp + other.fp,
+             fn=self.fn + other.fn,
+             tn=self.tn + other.tn,
+             indet=self.indet + other.indet,
+             weighted=self.weighted or other.weighted,
+             min_weight=min(self.min_weight, other.min_weight),
+             eps=self.eps,
+         )
+
+     def confidence_interval_bound(self, p: float = 0.95) -> "IndetSnSpArray":
+         """Compute two-sided confidence interval bounds: upper for indet_frac and lower for other metrics."""
+         if self.weighted:
+             raise NotImplementedError(
+                 "Confidence intervals only implemented for unweighted confusion matrices."
+             )
+         copy = IndetSnSpArray(**asdict(self))
+         copy.sn, _ = fake_vectorized_binom_ci(self.tp, self.tp + self.fn, p=p)
+         copy.sp, _ = fake_vectorized_binom_ci(self.tn, self.tn + self.fp, p=p)
+         copy.ppv, _ = fake_vectorized_binom_ci(self.tp, self.fp + self.tp, p=p)
+         copy.npv, _ = fake_vectorized_binom_ci(self.tn, self.tn + self.fn, p=p)
+         _, copy.indet_frac = fake_vectorized_binom_ci(
+             self.indet, self.tp + self.fn + self.fp + self.tn + self.indet, p=p
+         )
+         return copy
+
+     def roc_curve(self, indet_budget=0.0) -> "IndetRocCurve":
+         """Compute ROC curve with indeterminate budget, sorted by increasing sn and decreasing sp.
+
+         Restrict `self` to Pareto-optimal pairs (sn, sp) for which `indet_frac <= indet_budget`. Other points are
+         worse than the points on the curve in the sense of having worse Sn, worse Sp, or not meeting the
+         indeterminate budget.
+
+         """
+         within_budget = self[self.indet_frac <= indet_budget]
+         frontier = pareto_2d_indices(within_budget.sn, within_budget.sp)
+         return IndetRocCurve(**asdict(within_budget[frontier]))
+
+     def sn_eq_sp_graph(self) -> "IndetSnEqSpGraph":
+         """Compute sn=sp as a function of indet_frac, returning both sorted in increasing order.
+
+         Method: restrict to Pareto-optimal pairs (s, indet_frac) where s = min(sn, sp).
+
+         Pareto-optimality means that if (s, indet_frac) is in the output, there is no point (sn', sp', indet_frac')
+         in the input with indet_frac' <= indet_frac and sn', sp' > s. In other words, s is the maximum value such
+         that the quadrant {sn, sp >= s} intersects `self.roc_curve(indet_frac)`. This maximum occurs where the ROC
+         curve intersects the diagonal, up to an error bounded by the distance between points on the ROC curve.
+
+         """
+         frontier = pareto_2d_indices(self.min_sn_sp, -self.indet_frac)
+         return IndetSnEqSpGraph(**asdict(self[frontier]))
+
+
+ class IndetRocCurve(IndetSnSpArray):
+     """Sn, Sp achievable within some indeterminate budget and associated lower and upper thresholds.
+
+     `sn` is assumed to be sorted in increasing order and `sp` decreasing.
+
+     """
+
+     def sn_eq_sp(self) -> IndetSnSpArray:
+         """Locate the point on the ROC curve closest to the diagonal."""
+         return self[np.argmax(self.min_sn_sp)]
+
+     def auc(self) -> float:
+         """Compute the area under the ROC curve."""
+         # `auc` does not automatically include the trivial points (0, 1) and (1, 0)
+         # and will underestimate the AUC if these are not explicitly added
+         from sklearn.metrics import auc
+
+         return auc(1 - np.r_[1.0, self.sp, 0.0], np.r_[0.0, self.sn, 1.0])
+
+     @classmethod
+     def build(
+         cls,
+         thresh=None,
+         *,
+         y_true: np.ndarray,
+         y_score: np.ndarray,
+         weights: Optional[np.ndarray] = None,
+     ) -> Self:
+         """Build an indeterminate=0 ROC curve in n log n time (vs n**2 for IndetSnSpArray.build().roc_curve())."""
+         if thresh is None:
+             thresh = midpoints_with_infs(y_score)[::-1]  # reverse for proper output sorting
+         issa = IndetSnSpArray.build(
+             lower_thresh=thresh,
+             upper_thresh=thresh,
+             y_true=y_true,
+             y_score=y_score,
+             weights=weights,
+         )
+         return cls(**asdict(issa))
+
+
+ class IndetSnEqSpGraph(IndetSnSpArray):
+     """Sn=Sp achievable as a function of indeterminate budget and associated lower and upper thresholds.
+
+     Both min(self.sn, self.sp) and self.indet_frac are assumed to be sorted in non-decreasing order.
+
+     """
+
+     def at_budget(self, indet_budget: float = 0.0) -> IndetSnSpArray:
+         """Locate the best point on the graph within the given budget."""
+         return self[np.searchsorted(self.indet_frac, indet_budget, side="right") - 1]
tuning/optimal_ordinal.py ADDED
@@ -0,0 +1,510 @@
1
+ """Tools for choosing multiple thresholds optimally under various decision criteria via dynamic programming.
2
+
3
+ The abstract machinery is contained in the classes:
4
+ - `OrdinalThresholding`
5
+ - `OptimalOrdinalThresholdingViaDynamicProgramming`
6
+ - `OptimalCostPerSampleOrdinalThresholding`
7
+ - `ClassWeightedOptimalCostPerSampleOrdinalThresholding`
8
+ - `OptimalCostPerClassOrdinalThresholding`
9
+ These can be subclassed to efficiently implement new decision criteria, depending on their structure.
10
+
11
+ The main intended user-facing classes are the subclasses implementing different decision criteria:
12
+ - `MaxAccuracyOrdinalThresholding`
13
+ - `MaxMacroRecallOrdinalThresholding`
14
+ - `MinAbsoluteErrorOrdinalThresholding`
15
+ - `MaxMacroPrecisionOrdinalThresholding`
16
+ - `MaxMacroF1OrdinalThresholding`
17
+
18
+ """
19
+
20
+ from abc import ABC, abstractmethod
21
+ from typing import Literal, Optional, Union
22
+
23
+ import torch
24
+
25
+
26
+ class OrdinalThresholding(torch.nn.Module):
27
+ """Basic 1d thresholding logic."""
28
+
29
+ def __init__(self, num_classes: int):
30
+ """Init thresholding module with the specified number of classes (one more than the number of thresholds)."""
31
+ super().__init__()
32
+ self.num_classes = num_classes
33
+ self.register_buffer("thresholds", torch.zeros(num_classes - 1))
34
+ self.thresholds: torch.Tensor
35
+
36
+ def is_valid(self) -> bool:
37
+ """Check whether the thresholds are monotone non-decreasing."""
38
+ return all(torch.greater_equal(self.thresholds[1:], self.thresholds[:-1]))
39
+
40
+ def forward(self, scores) -> torch.Tensor:
41
+ """Find which thresholds each score lies between."""
42
+ return torch.searchsorted(self.thresholds, scores)
43
+
44
+ def tune_thresholds(
45
+ self,
46
+ *,
47
+ scores: torch.Tensor,
48
+ labels: torch.Tensor,
49
+ available_thresholds: Optional[torch.Tensor] = None,
50
+ ) -> torch.Tensor:
51
+ """Adapt the thresholds to the given data.
52
+
53
+ This is essentially an abstract method, but for testing purposes it's helpful to be able to instantiate the
54
+ class with a no-op version.
55
+
56
+ Parameters
57
+ ----------
58
+ scores : a vector of `float` scores for each example in the validation set
59
+ labels : a vector of `int` labels having the same shape as `scores` containing the corresponding labels
60
+ available_thresholds : a vector of `float` score values over which to optimize choice of thresholds;
61
+ `None`, then thresholds between every score in the validation set are allowed. +/- inf are always allowed.
62
+
63
+ Returns
64
+ -------
65
+ scalar `float` mean cost on the validation set using optimal thresholds
66
+
67
+ """
68
+
69
+
70
+ class OptimalOrdinalThresholdingViaDynamicProgramming(OrdinalThresholding, ABC):
71
+ """Super-class for general dynamic programming implementations of ordinal threshold tuning.
72
+
73
+ Subclasses implement different ways of computing the mean cost and corresponding DP step.
74
+
75
+ """
76
+
77
+ direction: Literal["min", "max"] # provided by subclasses
78
+
79
+ def __init__(self, num_classes: int):
80
+ super().__init__(num_classes=num_classes)
81
+ if self.direction not in ("min", "max"):
82
+ raise ValueError(
83
+ f"Got direction {self.direction!r}, expected 'min' or 'max'."
84
+ )
85
+
86
+ @abstractmethod
87
+ def mean_cost(
88
+ self, *, labels: torch.Tensor, preds: Union[int, torch.Tensor]
89
+ ) -> torch.Tensor:
90
+ """Compute the mean cost of assigning label(s) `preds` when the ground truth is `labels`."""
91
+
92
+ def best_constant_output_classifier(self, labels: torch.Tensor):
93
+ """Find the optimal mean cost of a constant-output classifier for given `labels` and the associated constant."""
94
+ if self.direction == "min":
95
+ optimize = torch.min
96
+ else:
97
+ optimize = torch.max
98
+ optimum = optimize(
99
+ torch.tensor(
100
+ [
101
+ self.mean_cost(labels=labels, preds=c)
102
+ for c in range(self.num_classes)
103
+ ],
104
+ device=labels.device,
105
+ ),
106
+ 0,
107
+ )
108
+ return optimum.values, optimum.indices
109
+
110
+ @abstractmethod
111
+ def dp_step(
112
+ self,
113
+ c_idx: int,
114
+ *,
115
+ scores: torch.Tensor,
116
+ labels: torch.Tensor,
117
+ available_thresholds: torch.Tensor,
118
+ prev_cost: Optional[torch.Tensor] = None,
119
+ ) -> (torch.Tensor, Optional[torch.Tensor]):
120
+ """Given optimal cost `prev_cost` of classes < `c_idx`, optimize cost of `c_idx` as a function of threshold.
121
+
122
+ Arguments
123
+ ---------
124
+ c_idx : current class index
125
+ scores, labels, available_thresholds : see `tune_thresholds`
126
+ prev_cost (optional float tensor) : optimal cost of classes < `c_idx` as a function of upper threshold
127
+ for class `c_idx - 1`; ignored if `c_idx == 0`
128
+
129
+ Returns
130
+ -------
131
+ cost: `cost[i]` is for choosing upper threshold of class `c_idx` equal to `available_thresholds[i]`
132
+ when thresholds for lower classes are chosen optimally
133
+ indices : to achieve `cost[i]`, optimal upper threshold for class `c_idx - 1` is
134
+ `available_thresholds[indices[i]]`; `None` if `c_idx == 0`
135
+
136
+ """
137
+
138
+ def tune_thresholds(
139
+ self,
140
+ *,
141
+ scores: torch.Tensor,
142
+ labels: torch.Tensor,
143
+ available_thresholds: Optional[torch.Tensor] = None,
144
+ ) -> torch.Tensor:
145
+ """Set `self.thresholds` to optimize mean cost of given `scores` and `labels`.
146
+
147
+ Arguments
148
+ ---------
149
+ scores (1d float tensor) : scores of examples on tuning dataset
150
+ labels (1d int tensor) : labels in {0, ..., self.num_classes - 1} of same shape as scores
151
+ available_thresholds (optional 1d float tensor) : thresholds which will be considered when tuning.
152
+ +/-inf will be added automatically to ensure all examples are classified. If omitted, will
153
+ insert thresholds between each element of sorted(unique(scores)).
154
+
155
+ Returns
156
+ -------
157
+ float tensor : optimal mean cost achieved on the provided dataset at the tuned `self.thresholds`
158
+
159
+ """
+        inf = torch.tensor([torch.inf], device=scores.device)
+        if available_thresholds is None:  # use all possible thresholds
+            unique_scores = torch.unique(scores)
+            available_thresholds = (unique_scores[:-1] + unique_scores[1:]) / 2.0
+        # Always allow some classes to be omitted entirely by setting thresholds to +/- inf.
+        # This simplifies the algorithm and also guarantees that the baseline constant-output
+        # classifiers are feasible choices for tuning, which ensures that the optimum is at
+        # least as good as a constant-output classifier.
+        available_thresholds = torch.concatenate(
+            [
+                -inf,
+                available_thresholds,
+                inf,
+            ]
+        )
+        indices = torch.empty(
+            (self.num_classes - 2, len(available_thresholds)),
+            dtype=torch.int,
+            device=scores.device,
+        )
+
+        # cost[j] = optimal total cost of items assigned pred <= c when the threshold between
+        # class c and c+1 is available_thresholds[j] (optimizing over the lower thresholds).
+        cost, _ = self.dp_step(
+            c_idx=0,
+            scores=scores,
+            labels=labels,
+            available_thresholds=available_thresholds,
+        )
+        for c in range(1, self.num_classes - 1):
+            cost, indices[c - 1, :] = self.dp_step(
+                c_idx=c,
+                scores=scores,
+                labels=labels,
+                available_thresholds=available_thresholds,
+                prev_cost=cost,
+            )
+        cost, best_index = self.dp_step(
+            c_idx=self.num_classes - 1,
+            scores=scores,
+            labels=labels,
+            available_thresholds=available_thresholds,
+            prev_cost=cost,
+        )
+        if self.direction == "min":
+            cost *= -1
+
+        # Follow the DP path backwards to find the thresholds which optimized the cost
+        self.thresholds[self.num_classes - 2] = available_thresholds[
+            best_index
+        ]  # final threshold
+        for c in range(self.num_classes - 3, -1, -1):  # counting down to zero
+            best_index = indices[c, best_index.long()]
+            self.thresholds[c] = available_thresholds[best_index.long()]
+
+        return cost
+
+
+def cumsum_with_0(t: torch.Tensor):
+    """Cumulative sum of `t` with a leading zero, so output[k] = sum of the first k items."""
+    return torch.nn.functional.pad(torch.cumsum(t, dim=0), (1, 0))
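+
+
+# For example, cumsum_with_0(torch.tensor([1.0, 2.0, 3.0])) returns tensor([0., 1., 3., 6.]);
+# the leading zero lets prefix differences express bin sums such as t[i:j].sum() directly.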
+
+
+class OptimalCostPerSampleOrdinalThresholding(
+    OptimalOrdinalThresholdingViaDynamicProgramming, ABC
+):
+    """Optimal 1d thresholding based on tuning thresholds to optimize the mean of a sample-wise cost function."""
+
+    @abstractmethod
+    def cost(self, *, labels: torch.Tensor, preds: Union[int, torch.Tensor]):
+        """Compute the sample-wise cost of assigning label(s) `preds` when the ground truth is `labels`."""
+
+    def mean_cost(
+        self, *, labels: torch.Tensor, preds: Union[int, torch.Tensor]
+    ) -> torch.Tensor:
+        """Compute the mean cost of assigning label(s) `preds` when the ground truth is `labels`."""
+        return torch.mean(self.cost(labels=labels, preds=preds))
+
+    def dp_step(
+        self,
+        c_idx: int,
+        *,
+        scores: torch.Tensor,
+        labels: torch.Tensor,
+        available_thresholds: torch.Tensor,
+        prev_cost: Optional[torch.Tensor] = None,
+    ) -> tuple[torch.Tensor, Optional[torch.Tensor]]:
+        """O(len(scores)) implementation for per-sample cost."""
+        # Compute running_cost[i] = sum of costs of elements with score less than
+        # available_thresholds[i] if assigned label c_idx
+        item_costs = self.cost(labels=labels, preds=c_idx) / len(scores)
+        if self.direction == "min":
+            item_costs *= -1
+        # move tensors to and from CPU because histogram has no CUDA implementation
+        cost_new_class_by_thresh, _ = torch.histogram(
+            scores.cpu().float(),
+            weight=item_costs.cpu().float(),
+            bins=available_thresholds.cpu().float(),
+        )
+        running_cost = cumsum_with_0(cost_new_class_by_thresh.to(labels.device))
+
+        # Combine running_cost with prev_cost
+        if c_idx == 0:
+            return running_cost, None
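+        # If class c_idx covers scores in [available_thresholds[i], available_thresholds[j]),
+        # its marginal cost is running_cost[j] - running_cost[i], so
+        #     cost[j] = max_{i <= j} (prev_cost[i] + running_cost[j] - running_cost[i])
+        #             = running_cost[j] + max_{i <= j} (prev_cost[i] - running_cost[i]),
+        # which torch.cummax evaluates for all j in one linear pass.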
+        diff = prev_cost - running_cost
+        cummax = torch.cummax(diff, dim=0)
+        cost = running_cost + cummax.values
+        if c_idx == self.num_classes - 1:
+            # [-1] always sets the *upper* threshold of class `num_classes - 1` to +inf
+            # so that it includes the rest of the data
+            return cost[-1], cummax.indices[-1]
+        return cost, cummax.indices
+
+
+class MaxAccuracyOrdinalThresholding(OptimalCostPerSampleOrdinalThresholding):
+    """Threshold to maximize accuracy."""
+
+    direction = "max"
+
+    def cost(self, *, labels: torch.Tensor, preds: Union[int, torch.Tensor]):
+        return torch.eq(labels, preds).float()
+
+
+class MaxMacroRecallOrdinalThresholding(OptimalCostPerSampleOrdinalThresholding):
+    """Threshold to maximize macro-averaged recall."""
+
+    direction = "max"
+
+    def cost(self, *, labels: torch.Tensor, preds: Union[int, torch.Tensor]):
+        counts = torch.bincount(labels, minlength=self.num_classes).float()
+        ratios = counts.sum() / (self.num_classes * counts)
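+        # e.g. with counts = [10., 30., 60.] and num_classes = 3, ratios = [10/3, 10/9, 5/9]:
+        # correct predictions on rarer classes weigh more, so the sample mean of this
+        # cost equals macro-averaged recall.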
+        return torch.eq(labels, preds).float() * torch.gather(
+            ratios, 0, labels.type(torch.int64)
+        )
+
+
+class MinAbsoluteErrorOrdinalThresholding(OptimalCostPerSampleOrdinalThresholding):
+    """Threshold to minimize mean absolute error."""
+
+    direction = "min"
+
+    def cost(self, *, labels: torch.Tensor, preds: Union[int, torch.Tensor]):
+        return torch.abs(preds - labels).float()
300
+
301
+
302
+ class ClassWeightedOptimalCostPerSampleOrdinalThresholding(
303
+ OptimalCostPerSampleOrdinalThresholding
304
+ ):
305
+ """Compute cost weighted equally over classes instead of equally over samples.
306
+
307
+ This class takes another instance of OptimalCostPerSampleOrdinalThresholding
308
+ which computes its cost independently for each sample and reweights the cost
309
+ based on label frequencies.
310
+
311
+ Note: this class depends on an implementation detail of its superclass:
312
+ namely calling `self.cost` with the full tuning or eval set of labels,
313
+ rather than a single label. This is required to do the re-weighting properly.
314
+
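+    Example (sketch, assuming the base constructor takes `num_classes`)::
+
+        tuner = ClassWeightedOptimalCostPerSampleOrdinalThresholding(
+            MinAbsoluteErrorOrdinalThresholding(4)
+        )
+        # tunes thresholds for a class-balanced mean absolute error
+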
+    """
+
+    def __init__(self, unweighted_instance: OptimalCostPerSampleOrdinalThresholding):
+        self.direction = unweighted_instance.direction
+        super().__init__(unweighted_instance.num_classes)
+        self.unweighted_instance = unweighted_instance
+
+    def cost(self, *, labels: torch.Tensor, preds: Union[int, torch.Tensor]):
+        counts = torch.bincount(labels, minlength=self.num_classes)
+        (indices,) = torch.where(counts == 0)
+        if len(indices) > 0:
+            raise ValueError(
+                f"Cannot compute class-weighted cost because classes {set(indices.tolist())} are missing."
+            )
+        unweighted_cost = self.unweighted_instance.cost(labels=labels, preds=preds)
+        weights = len(labels) / (self.num_classes * counts[labels].float())
+        return weights * unweighted_cost
+
+
+class OptimalCostPerClassOrdinalThresholding(
+    OptimalOrdinalThresholdingViaDynamicProgramming, ABC
+):
+    """General DP case for when the linear algorithm for per-sample costs is not applicable.
+
+    Complexity depends on the implementation of `cost_matrix`.
+
+    """
+
+    @abstractmethod
+    def cost_matrix(
+        self,
+        c_idx: int,
+        *,
+        scores: torch.Tensor,
+        labels: torch.Tensor,
+        available_thresholds: torch.Tensor,
+        start: bool,
+        end: bool,
+    ) -> torch.Tensor:
+        """Each output[i, j] = cost for when scores in range `available_thresholds[i:j]` are assigned label `c_idx`."""
+
+    def mean_cost(
+        self, *, labels: torch.Tensor, preds: Union[int, torch.Tensor]
+    ) -> torch.Tensor:
+        """Compute the mean (macro-averaged over classes) cost of assigning label(s) `preds`."""
+
+        if isinstance(preds, int) or preds.numel() == 1:
+            preds = preds * torch.ones_like(labels, dtype=torch.int)
+
+        total_cost = torch.tensor(0.0, device=labels.device)
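+        # Reuse cost_matrix by treating `preds` as the "scores": the single bin
+        # [c_idx - 0.5, c_idx + 0.5] selects exactly the samples with pred == c_idx.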
+        for c_idx in range(self.num_classes):
+            thresholds = torch.tensor([c_idx - 0.5, c_idx + 0.5], device=labels.device)
+            total_cost += self.cost_matrix(
+                c_idx,
+                scores=preds.float(),
+                labels=labels,
+                available_thresholds=thresholds,
+                start=True,
+                end=True,
+            )[0, 0]
+        return total_cost / self.num_classes
+
+    def dp_step(
+        self,
+        c_idx: int,
+        *,
+        scores: torch.Tensor,
+        labels: torch.Tensor,
+        available_thresholds: torch.Tensor,
+        prev_cost: Optional[torch.Tensor] = None,
+    ) -> tuple[torch.Tensor, Optional[torch.Tensor]]:
+        cost_matrix = (
+            self.cost_matrix(
+                c_idx,
+                scores=scores,
+                labels=labels,
+                available_thresholds=available_thresholds,
+                start=c_idx == 0,
+                end=c_idx == self.num_classes - 1,
+            )
+            / self.num_classes
+        )
+        if self.direction == "min":
+            cost_matrix *= -1
+        if prev_cost is not None:
+            cost_matrix += prev_cost[:, None]
+        max_ = torch.max(cost_matrix, dim=0)
+        return max_.values, max_.indices
+
+
+def _compute_metrics_matrices(
+    scores: torch.Tensor,
+    binary_labels: torch.Tensor,
+    thresholds: torch.Tensor,
+    start: bool = False,
+    end: bool = False,
+) -> tuple[torch.Tensor, torch.Tensor]:
+    """Each output[i, j] = stats for when scores between thresholds[i] and thresholds[j] are assigned `True`.
+
+    Helper function for `MaxMacroPrecisionOrdinalThresholding` and `MaxMacroF1OrdinalThresholding`.
+
+    Computed in O(len(thresholds)**2 + len(scores)*log(len(thresholds))) operations instead of the naive
+    O(len(scores)*len(thresholds)**2) operations needed to compute each element of the output independently.
+
+    Arguments
+    ---------
+    scores (float Tensor) : scores of labeled examples for which to compute metrics
+    binary_labels (bool Tensor) : corresponding binary labels of the same shape as `scores`
+    thresholds (float Tensor) : thresholds between which to compute metrics
+    start : compute only the first row of the output (lower threshold at its minimum value)
+    end : compute only the last column of the output (upper threshold at its maximum value)
+
+    Returns
+    -------
+    tp : tp[i, j] = number of true positives if scores between thresholds[i:j] are classified as positive
+    tp_plus_fp : tp_plus_fp[i, j] = number of scores between thresholds[i:j]
+
+    """
+    # move tensors to and from CPU because histogram has no CUDA implementation
+    scores = scores.float().cpu()
+    thresholds = thresholds.float().cpu()
+    labeled_true_by_thresh, _ = torch.histogram(
+        scores,
+        weight=binary_labels.float().cpu(),
+        bins=thresholds,
+    )
+    count_by_thresh, _ = torch.histogram(
+        scores,
+        bins=thresholds,
+    )
+    running_labeled_true_by_thresh = cumsum_with_0(
+        labeled_true_by_thresh.to(binary_labels.device)
+    )
+    running_count_by_thresh = cumsum_with_0(
+        count_by_thresh.to(binary_labels.device).float()
+    )
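+
+    # Each entry is a difference of prefix sums: tp[i, j] = prefix[j] - prefix[i],
+    # computed for all (i, j) at once by broadcasting; `start`/`end` restrict the
+    # output to its first row / last column when only those are needed.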
+
+    def start_slice(t):
+        return t[: (1 if start else None), None]
+
+    def end_slice(t):
+        return t[None, (-1 if end else None) :]
+
+    tp = end_slice(running_labeled_true_by_thresh) - start_slice(
+        running_labeled_true_by_thresh
+    )
+    tp_plus_fp = end_slice(running_count_by_thresh) - start_slice(
+        running_count_by_thresh
+    )
+    return tp, tp_plus_fp
+
+
+class MaxMacroPrecisionOrdinalThresholding(OptimalCostPerClassOrdinalThresholding):
+    """Threshold to maximize macro-averaged precision."""
+
+    direction = "max"
+
+    def cost_matrix(
+        self,
+        c_idx: int,
+        *,
+        scores: torch.Tensor,
+        labels: torch.Tensor,
+        available_thresholds: torch.Tensor,
+        start: bool,
+        end: bool,
+    ) -> torch.Tensor:
+        tp, tp_plus_fp = _compute_metrics_matrices(
+            scores, torch.eq(labels, c_idx), available_thresholds, start=start, end=end
+        )
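+        # tp_plus_fp is a count, so the torch.ge condition below always holds; an empty
+        # prediction range therefore scores 0 (tp = 0 over the clamped denominator),
+        # which keeps constant-output baselines feasible rather than -inf.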
+        safe_tp_plus_fp = torch.maximum(
+            tp_plus_fp, torch.ones(1, device=tp_plus_fp.device)
+        )
+        return torch.where(torch.ge(tp_plus_fp, 0.0), tp / safe_tp_plus_fp, -torch.inf)
483
+
484
+
485
+ class MaxMacroF1OrdinalThresholding(OptimalCostPerClassOrdinalThresholding):
486
+ """Threshold to maximize macro-averaged F1 score."""
487
+
488
+ direction = "max"
489
+
490
+ def cost_matrix(
491
+ self,
492
+ c_idx: int,
493
+ scores: torch.Tensor,
494
+ labels: torch.Tensor,
495
+ available_thresholds: torch.Tensor,
496
+ start: bool,
497
+ end: bool,
498
+ ) -> torch.Tensor:
499
+ tp, tp_plus_fp = _compute_metrics_matrices(
500
+ scores, torch.eq(labels, c_idx), available_thresholds, start=start, end=end
501
+ )
502
+ tp_plus_fn = torch.eq(labels, c_idx).float().sum() # scalar
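+        # when tp_plus_fp >= 1 the clamp below is a no-op; when tp_plus_fp == 0, tp is
+        # also 0, so clamping only avoids 0/0 for a class absent from labels and preds.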
+        safe_tp_plus_fp = torch.maximum(
+            tp_plus_fp, torch.ones(1, device=tp_plus_fp.device)
+        )
+        return torch.where(
+            torch.ge(tp_plus_fp, 0.0),
+            2 * tp / (safe_tp_plus_fp + tp_plus_fn),
+            -torch.inf,
+        )
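+
+
+# Minimal end-to-end usage sketch (illustrative only; assumes the base constructor
+# takes `num_classes`, consistent with the __init__ call above):
+#
+#     scores = torch.randn(1000)
+#     labels = torch.randint(0, 5, (1000,))
+#     tuner = MaxMacroF1OrdinalThresholding(5)
+#     best = tuner.tune_thresholds(scores=scores, labels=labels)
+#     print(best, tuner.thresholds)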