loki2.mil.models.src.utils

Utility functions for MIL model evaluation and visualization.

This module provides functions for calculating metrics, evaluating models, generating attention heatmaps, plotting training curves, and computing cross-validation results.

Module Contents

loki2.mil.models.src.utils.calculate_metrics(y_prob: torch.Tensor, labels: torch.Tensor, criterion: torch.nn.Module) → Dict[str, float | int | numpy.ndarray]

Calculate unified metrics for model evaluation.

Computes loss, accuracy, AUROC, and predictions from model outputs and ground truth labels. Accepts inputs of shape (batch_size, 1) or (batch_size,).

Parameters:
  • y_prob – Model output probabilities of shape (batch_size, 1) or (batch_size,).

  • labels – Ground truth labels of shape (batch_size, 1) or (batch_size,).

  • criterion – Loss function (e.g., nn.BCELoss).

Returns:

Dict containing:

  • loss: Computed loss value (float).

  • accuracy: Accuracy score (float).

  • auroc: AUROC score (float; 0.5 if only one class is present).

  • correct: Number of correct predictions (int).

  • total: Total number of samples (int).

  • predictions: Binary predictions as NumPy array.

  • probabilities: Probability values as NumPy array.

Return type:

Dict[str, float | int | numpy.ndarray]
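The computation described above can be sketched with NumPy alone. This is a minimal illustration, not the module's implementation: the function name sketch_metrics, the 0.5 decision threshold, and the hand-written binary cross-entropy (standing in for criterion) are all assumptions.

```python
import numpy as np

def sketch_metrics(y_prob, labels, threshold=0.5, eps=1e-7):
    """Hypothetical NumPy stand-in for calculate_metrics (illustration only)."""
    # Flatten (batch_size, 1) and (batch_size,) inputs to the same 1-D shape.
    p = np.asarray(y_prob, dtype=float).reshape(-1)
    y = np.asarray(labels, dtype=float).reshape(-1)

    # Binary cross-entropy, clipped to avoid log(0); assumed stand-in for `criterion`.
    p_clipped = np.clip(p, eps, 1.0 - eps)
    loss = float(-np.mean(y * np.log(p_clipped) + (1 - y) * np.log(1 - p_clipped)))

    # Threshold at 0.5 (an assumption) to obtain binary predictions.
    preds = (p >= threshold).astype(int)
    correct = int(np.sum(preds == y))
    total = int(y.size)

    # AUROC as the probability that a random positive outranks a random
    # negative (ties count half). Falls back to 0.5 when one class is missing.
    pos, neg = p[y == 1], p[y == 0]
    if pos.size == 0 or neg.size == 0:
        auroc = 0.5
    else:
        diff = pos[:, None] - neg[None, :]
        auroc = float(np.mean((diff > 0) + 0.5 * (diff == 0)))

    return {
        "loss": loss,
        "accuracy": correct / total if total else 0.0,
        "auroc": auroc,
        "correct": correct,
        "total": total,
        "predictions": preds,
        "probabilities": p,
    }

# Column-vector probabilities and flat labels are both accepted.
metrics = sketch_metrics(np.array([[0.9], [0.2], [0.7], [0.4]]), np.array([1, 0, 1, 0]))
single_class = sketch_metrics([0.9, 0.8], [1, 1])
```

The pairwise AUROC above is quadratic in batch size, which is acceptable for a sketch but not for large evaluations.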

loki2.mil.models.src.utils.evaluate_model(model: torch.nn.Module, data_loader: torch.utils.data.DataLoader, device: torch.device, description: str = 'Evaluation') → Dict[str, float | int | List[str] | List[int] | List[float]]

Evaluate model on a dataset.

Computes metrics including loss, accuracy, AUROC, and collects predictions for all samples. Handles both logits and probability outputs automatically.

Parameters:
  • model – PyTorch model to evaluate.

  • data_loader – DataLoader for the evaluation dataset.

  • device – PyTorch device (CPU or CUDA).

  • description – Description string for logging. Defaults to 'Evaluation'.

Returns:

Dict containing:

  • patient_ids: List of patient ID strings.

  • true_labels: List of true labels (integers).

  • predicted_probs: List of predicted probabilities (floats).

  • predicted_labels: List of predicted labels (integers).

  • auroc: AUROC score (float).

  • accuracy: Accuracy score (float).

  • loss: Average loss (float).

  • correct: Number of correct predictions (int).

  • total: Total number of samples (int).

Return type:

Dict[str, float | int | List[str] | List[int] | List[float]]
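Because the per-patient lists in the returned dictionary are parallel, they can be zipped to inspect individual predictions. The sketch below uses a hand-written dictionary with the documented keys; the values and the helper name misclassified are hypothetical and do not come from a real run.

```python
# Hypothetical result, shaped like the documented return value of evaluate_model.
results = {
    "patient_ids": ["P01", "P02", "P03", "P04"],
    "true_labels": [1, 0, 1, 0],
    "predicted_probs": [0.91, 0.35, 0.42, 0.08],
    "predicted_labels": [1, 0, 0, 0],
    "auroc": 1.0,
    "accuracy": 0.75,
    "loss": 0.37,
    "correct": 3,
    "total": 4,
}

def misclassified(results):
    """Return (patient_id, true_label, predicted_prob) for every wrong prediction."""
    rows = zip(
        results["patient_ids"],
        results["true_labels"],
        results["predicted_probs"],
        results["predicted_labels"],
    )
    return [(pid, y, p) for pid, y, p, y_hat in rows if y != y_hat]

errors = misclassified(results)
for pid, y, p in errors:
    print(f"{pid}: true label {y}, predicted probability {p:.2f}")
```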

loki2.mil.models.src.utils.generate_heatmaps(attention_data: pandas.DataFrame, pdf_filename: str | pathlib.Path, title_prefix: str, quantile_20: float, quantile_80: float, point_size: float) → None

Generate attention heatmap PDF for all patients.

Creates scatter plots showing attention weights overlaid on cell positions for each patient, saved as a multi-page PDF.

Parameters:
  • attention_data – DataFrame with columns: Patient_ID, X, Y, log10_Attention.

  • pdf_filename – Path to save the PDF file.

  • title_prefix – Prefix for plot titles.

  • quantile_20 – 20th percentile for normalization (lower bound).

  • quantile_80 – 80th percentile for normalization (upper bound).

  • point_size – Size of scatter plot points.
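One plausible reading of the quantile_20 and quantile_80 parameters is a clipped linear rescaling of log10_Attention to the range [0, 1] before colouring the points. The sketch below illustrates that reading only; the helper name normalize_attention and the clipping behaviour are assumptions, not the module's confirmed logic.

```python
import numpy as np

def normalize_attention(log10_attention, quantile_20, quantile_80):
    """Hypothetical helper: map values to [0, 1] between two quantile bounds."""
    values = np.asarray(log10_attention, dtype=float)
    span = quantile_80 - quantile_20
    if span <= 0:
        # Degenerate bounds: return a constant mid-scale value.
        return np.full_like(values, 0.5)
    # Values below the lower bound clip to 0, above the upper bound to 1.
    return np.clip((values - quantile_20) / span, 0.0, 1.0)

# Illustrative log10 attention values for one patient.
log_att = np.array([-4.0, -3.0, -2.5, -2.0, -1.0])
q20, q80 = np.quantile(log_att, [0.2, 0.8])
scaled = normalize_attention(log_att, q20, q80)
```

Computing the two bounds once over the whole dataset, rather than per patient, would keep colours comparable across the pages of the PDF.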

loki2.mil.models.src.utils.generate_epoch_attention_analysis(model: torch.nn.Module, data_loader: torch.utils.data.DataLoader, train_patients: numpy.ndarray, model_folder: str | pathlib.Path, epoch_num: int, point_size: float, embedding_dim: int, M_dim: int, L_dim: int, attention_branches: int, dropout_rate: float) → None

Generate attention analysis for a specific epoch.

Extracts attention weights from the model, processes them, and generates heatmap visualizations for train and test sets.

Parameters:
  • model – Trained PyTorch MIL model.

  • data_loader – DataLoader for the full dataset.

  • train_patients – Array of training patient IDs.

  • model_folder – Directory to save attention analysis files.

  • epoch_num – Epoch number for file naming.

  • point_size – Size of scatter plot points.

  • embedding_dim – Embedding dimension (unused, kept for compatibility).

  • M_dim – M dimension (unused, kept for compatibility).

  • L_dim – L dimension (unused, kept for compatibility).

  • attention_branches – Number of attention branches (unused, kept for compatibility).

  • dropout_rate – Dropout rate (unused, kept for compatibility).
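Since data_loader covers the full dataset and train_patients lists only the training IDs, the train/test separation presumably happens by membership in that array. A minimal sketch of such a split follows; the helper name split_by_patient is hypothetical, and treating every patient outside train_patients as test is an assumption.

```python
import numpy as np

def split_by_patient(patient_ids, train_patients):
    """Hypothetical helper: boolean masks for rows of train and test patients."""
    patient_ids = np.asarray(patient_ids)
    train_mask = np.isin(patient_ids, train_patients)
    # Assumption: any patient not listed in train_patients belongs to the test set.
    return train_mask, ~train_mask

# One entry per cell; several cells can share a patient.
cell_patient_ids = np.array(["P01", "P01", "P02", "P03", "P03", "P03"])
train_patients = np.array(["P01", "P03"])
train_mask, test_mask = split_by_patient(cell_patient_ids, train_patients)
```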

loki2.mil.models.src.utils.downsample_data(dataframe: pandas.DataFrame, negative_ratio: float = 1.0, positive_negative_ratio: float = 1.0, max_patients: int | None = None, embedding_ratio: float = 1.0, random_seed: int = 27) → pandas.DataFrame

Downsample data by patient and embedding counts.

Reduces the dataset size by sampling patients and their embeddings according to specified ratios, while maintaining class balance.

Parameters:
  • dataframe – Input DataFrame with columns: Patient_ID, Patient_Label, and embedding features.

  • negative_ratio – Ratio of negative patients to keep. Defaults to 1.0.

  • positive_negative_ratio – Target ratio of positive to negative patients. Defaults to 1.0.

  • max_patients – Maximum number of patients to keep. If None, no limit. Defaults to None.

  • embedding_ratio – Ratio of embeddings to keep per patient. Defaults to 1.0.

  • random_seed – Random seed for reproducibility. Defaults to 27.

Returns:

Downsampled DataFrame with the same structure as input.

Return type:

pandas.DataFrame
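The patient-level part of this sampling can be sketched as follows. This is one interpretation of how negative_ratio and positive_negative_ratio might interact, not the module's code: the function name sketch_patient_downsample is hypothetical, the input is a plain dict rather than a DataFrame, and the max_patients and embedding_ratio steps are omitted.

```python
import numpy as np

def sketch_patient_downsample(patient_labels, negative_ratio=1.0,
                              positive_negative_ratio=1.0, random_seed=27):
    """Hypothetical sketch: choose which patient IDs to keep.

    patient_labels maps patient ID -> 0 (negative) or 1 (positive).
    """
    rng = np.random.default_rng(random_seed)
    # Sort first so the result depends only on the seed, not on dict order.
    negatives = sorted(pid for pid, y in patient_labels.items() if y == 0)
    positives = sorted(pid for pid, y in patient_labels.items() if y == 1)

    # Keep a fraction of the negatives, then size the positives relative to them.
    n_neg = min(len(negatives), int(round(len(negatives) * negative_ratio)))
    n_pos = min(len(positives), int(round(n_neg * positive_negative_ratio)))

    kept_neg = rng.choice(negatives, size=n_neg, replace=False).tolist()
    kept_pos = rng.choice(positives, size=n_pos, replace=False).tolist()
    return sorted(kept_neg + kept_pos)

# Ten negative and ten positive patients with made-up IDs.
labels = {f"N{i:02d}": 0 for i in range(10)}
labels.update({f"P{i:02d}": 1 for i in range(10)})
kept = sketch_patient_downsample(labels, negative_ratio=0.5)
```

With a fixed random_seed the same patients are selected on every call, which matches the reproducibility purpose the parameter is documented for.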

loki2.mil.models.src.utils.compute_and_plot_overall_metrics(all_test_predictions: List[Dict[str, Any]], all_fold_results: List[Dict[str, Any]], patient_ids: numpy.ndarray, args: Any, model_folder: str | pathlib.Path, title_prefix: str | None = None) → Dict[str, Any]

Compute overall 5-fold test metrics and plot ROC curve.

Aggregates predictions from all folds, computes overall metrics, generates ROC curve plot, and saves comprehensive cross-validation results.

Parameters:
  • all_test_predictions – List of test prediction dictionaries from each fold, each containing: patient_ids, true_labels, predicted_probs, predicted_labels, fold.

  • all_fold_results – List of fold result dictionaries, each containing: fold, best_epoch, best_val_auroc, final_* metrics.

  • patient_ids – Array of all patient IDs in the dataset.

  • args – Arguments object containing n_folds, seed, etc.

  • model_folder – Directory path to save outputs.

  • title_prefix – Prefix for plot title (e.g., “cancer_type - signature”). Defaults to None.

Returns:

Dict containing:

  • cross_validation_settings: Dictionary with CV configuration.

  • fold_results: List of fold result dictionaries.

  • summary: Dictionary with mean and std of validation metrics.

  • 5fold_overall_test_metrics: Dictionary with overall test metrics including AUROC, accuracy, precision, recall, F1, sensitivity, specificity, confusion matrix, and file paths.

Return type:

Dict[str, Any]
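The aggregation step pools the test predictions of every fold before computing metrics, rather than averaging per-fold scores. A minimal sketch of that pooling for the threshold-based metrics is shown below; the function name pooled_test_metrics and the output key names are hypothetical, and AUROC, plotting, and file output are left out.

```python
def pooled_test_metrics(all_test_predictions):
    """Hypothetical sketch: pool fold predictions and derive confusion-matrix metrics."""
    y_true, y_pred = [], []
    for fold in all_test_predictions:
        y_true.extend(fold["true_labels"])
        y_pred.extend(fold["predicted_labels"])

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def safe_div(num, den):
        # Return 0.0 instead of raising when a class is absent.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # identical to sensitivity
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }

# Two folds with made-up predictions, using the documented per-fold keys.
folds = [
    {"fold": 1, "true_labels": [1, 0, 1], "predicted_labels": [1, 0, 0]},
    {"fold": 2, "true_labels": [0, 1, 0], "predicted_labels": [1, 1, 0]},
]
overall = pooled_test_metrics(folds)
```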

loki2.mil.models.src.utils.plot_training_curves(training_history: List[Dict[str, Any]], fold_folder: str | pathlib.Path, fold_num: int) → None

Plot training curves for a single fold.

Generates three sets of plots: AUROC comparison, accuracy comparison, and loss comparison, each with individual and combined views.

Parameters:
  • training_history – List of dictionaries, each containing epoch metrics: epoch, train_loss, train_accuracy, train_auroc, val_loss, val_accuracy, val_auroc, test_loss (optional), test_accuracy (optional), test_auroc (optional).

  • fold_folder – Directory to save the plot files.

  • fold_num – Fold number for plot titles.
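Before plotting, the list of per-epoch dictionaries has to be turned into one series per split, and the optional test_* keys may be missing. The sketch below shows one way to do that; the helper name history_to_series and the sample values are hypothetical, and the plotting itself is omitted.

```python
def history_to_series(training_history, metric):
    """Hypothetical helper: collect (epoch, value) points per split for one metric.

    metric is one of "loss", "accuracy", or "auroc". Splits whose key is
    absent from every epoch (for example the optional test_* keys) are skipped.
    """
    series = {}
    for split in ("train", "val", "test"):
        key = f"{split}_{metric}"
        points = [(row["epoch"], row[key]) for row in training_history if key in row]
        if points:
            series[split] = points
    return series

# Made-up history without the optional test metrics.
history = [
    {"epoch": 1, "train_loss": 0.70, "val_loss": 0.72},
    {"epoch": 2, "train_loss": 0.55, "val_loss": 0.61},
    {"epoch": 3, "train_loss": 0.43, "val_loss": 0.58},
]
loss_series = history_to_series(history, "loss")
```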