loki2.cl.model_cl
Minimal projection-only alignment module.
Module Contents
- class loki2.cl.model_cl.ProjectionCL(embed_dim: int = 512, modality_dims: Tuple[int, int] = (1280, 768), *, bias: bool = False, logit_scale_init: float = 1.0, num_layers: int = 1, hidden_dim: int | None = None, dropout: float = 0.0, max_logit_scale: float = 10.0, min_logit_scale: float | None = None)
Bases: torch.nn.Module
CLIP-style symmetric contrastive projector for paired embeddings.
- Parameters:
embed_dim – Shared embedding dimensionality after projection. Defaults to 512.
modality_dims – Tuple containing input dimensions for the two modalities. Defaults to (1280, 768).
bias – Whether to enable bias terms in the projection layers. Defaults to False.
logit_scale_init – Initial value (not log-space) of the logit scale multiplier s. Defaults to 1.0.
num_layers – Number of layers in each projection head. Defaults to 1.
hidden_dim – Hidden dimension for intermediate projection layers. Defaults to None, in which case embed_dim is used.
dropout – Dropout rate for hidden layers. Defaults to 0.0.
max_logit_scale – Upper bound for s (enforces a minimum temperature 1 / s). Defaults to 10.0.
min_logit_scale – Optional lower bound for s. Defaults to None.
- Raises:
ValueError – If modality_dims does not contain exactly two dimensions, num_layers is less than 1, dropout is not in [0, 1], logit_scale_init is not positive, max_logit_scale is not positive, or min_logit_scale is not in (0, max_logit_scale].
RuntimeError – If a projection head does not contain at least one linear layer.
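The ValueError conditions listed above can be read as a set of argument checks. The following is only a sketch of such checks, written from the documented conditions; `validate_projection_args` is a hypothetical helper and is not part of `loki2`, and the error messages are illustrative.

```python
from typing import Optional, Sequence


def validate_projection_args(
    modality_dims: Sequence[int] = (1280, 768),
    num_layers: int = 1,
    dropout: float = 0.0,
    logit_scale_init: float = 1.0,
    max_logit_scale: float = 10.0,
    min_logit_scale: Optional[float] = None,
) -> None:
    """Hypothetical helper mirroring the documented ValueError conditions."""
    if len(modality_dims) != 2:
        raise ValueError("modality_dims must contain exactly two dimensions")
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if not 0.0 <= dropout <= 1.0:
        raise ValueError("dropout must be in [0, 1]")
    if logit_scale_init <= 0.0:
        raise ValueError("logit_scale_init must be positive")
    if max_logit_scale <= 0.0:
        raise ValueError("max_logit_scale must be positive")
    # The lower bound is optional; when given it must lie in (0, max_logit_scale].
    if min_logit_scale is not None and not 0.0 < min_logit_scale <= max_logit_scale:
        raise ValueError("min_logit_scale must be in (0, max_logit_scale]")


# The documented defaults pass every check.
validate_projection_args()
```

Under these assumptions, a call such as `validate_projection_args(min_logit_scale=20.0)` raises ValueError because the lower bound exceeds the default upper bound of 10.0.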
- proj_a
- proj_b
- init_value
- logit_scale
- encode_a(features: torch.Tensor) -> torch.Tensor
Encode features from modality A through its projection head.
- Parameters:
features – Input features for modality A.
- Returns:
Projected features for modality A.
- Return type:
torch.Tensor
- encode_b(features: torch.Tensor) -> torch.Tensor
Encode features from modality B through its projection head.
- Parameters:
features – Input features for modality B.
- Returns:
Projected features for modality B.
- Return type:
torch.Tensor
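To make the role of num_layers, hidden_dim, dropout and bias more concrete, here is a rough sketch of how a projection head of this kind could be assembled. It is not the module's actual implementation: `build_head` is a hypothetical helper, and the GELU activation and the placement of dropout after each hidden layer are assumptions, since the reference does not state them.

```python
from typing import Optional

import torch
from torch import nn


def build_head(
    in_dim: int,
    embed_dim: int = 512,
    *,
    num_layers: int = 1,
    hidden_dim: Optional[int] = None,
    dropout: float = 0.0,
    bias: bool = False,
) -> nn.Sequential:
    """Hypothetical projection head: (num_layers - 1) hidden blocks plus an output layer."""
    hidden = embed_dim if hidden_dim is None else hidden_dim
    layers = []
    dim = in_dim
    for _ in range(num_layers - 1):
        # Assumed hidden block: Linear -> GELU -> Dropout.
        layers += [nn.Linear(dim, hidden, bias=bias), nn.GELU(), nn.Dropout(dropout)]
        dim = hidden
    # The final linear layer maps into the shared embedding space.
    layers.append(nn.Linear(dim, embed_dim, bias=bias))
    return nn.Sequential(*layers)


# One head per modality, using the documented default input dimensions.
proj_a = build_head(1280)
proj_b = build_head(768, num_layers=2, dropout=0.1)

features_a = torch.randn(4, 1280)
features_b = torch.randn(4, 768)
out_a = proj_a(features_a)  # shape (4, 512)
out_b = proj_b(features_b)  # shape (4, 512)
```

With num_layers=1 the head reduces to a single linear map, which matches the "projection-only" description of the module.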
- forward(features_a: torch.Tensor, features_b: torch.Tensor, *, return_loss: bool = False) -> Tuple[torch.Tensor, torch.Tensor] | Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
Project both modalities, compute scaled cosine similarities, and optionally return the CLIP-style contrastive loss.
Let \(f_i\) and \(g_i\) denote the raw embeddings for sample \(i\) from modalities a and b. After the projection heads and L2 normalisation we obtain unit vectors \(\hat f_i\) and \(\hat g_i\). The learnable logit-scale parameter \(s = \exp(\text{logit\_scale})\) plays the role of the inverse temperature \(1/\tau\). The logits matrix we feed to the cross-entropy loss is
\[L_{ij} = \min(s, s_{\text{max}})\; \hat f_i^\top \hat g_j,\]
where \(s_{\text{max}}\) is the configured maximum scale. When return_loss is True we minimise the symmetric InfoNCE objective
\[\mathcal{L} = \tfrac{1}{2} \bigl[ \operatorname{CE}(L, I) + \operatorname{CE}(L^\top, I) \bigr],\]
where \(\operatorname{CE}\) is the cross-entropy and \(I\) indexes the matching pairs along the diagonal.
- Parameters:
features_a – Input features for modality A.
features_b – Input features for modality B.
return_loss – If True, also return the contrastive loss. Defaults to False.
- Returns:
Tuple[torch.Tensor, torch.Tensor]: Logits for A-B and B-A similarity if return_loss=False.
Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: Logits and loss if return_loss=True.
- Return type:
Union[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor, torch.Tensor, torch.Tensor]]
- Raises:
ValueError – If return_loss=True and batch sizes for both modalities are not equal.
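The logits and loss described for forward can be sketched in a few lines of PyTorch. This is an illustration of the documented formulas rather than the module's source: `clip_style_logits_and_loss` is a hypothetical function, it takes already-projected features, and it applies only the upper bound on the scale, as in the formula for the logits.

```python
import torch
import torch.nn.functional as F


def clip_style_logits_and_loss(proj_a, proj_b, log_scale, max_logit_scale=10.0):
    """Hypothetical sketch: proj_a and proj_b are (N, D) projected features."""
    if proj_a.shape[0] != proj_b.shape[0]:
        raise ValueError("batch sizes must match to compute the loss")
    # L2-normalise so that dot products are cosine similarities.
    f_hat = F.normalize(proj_a, dim=-1)
    g_hat = F.normalize(proj_b, dim=-1)
    # s = exp(logit_scale), capped at the configured maximum.
    scale = log_scale.exp().clamp(max=max_logit_scale)
    logits_ab = scale * (f_hat @ g_hat.t())
    logits_ba = logits_ab.t()
    # Matching pairs sit on the diagonal, so the target for row i is class i.
    targets = torch.arange(proj_a.shape[0], device=proj_a.device)
    loss = 0.5 * (
        F.cross_entropy(logits_ab, targets) + F.cross_entropy(logits_ba, targets)
    )
    return logits_ab, logits_ba, loss


torch.manual_seed(0)
a = torch.randn(8, 512)
b = torch.randn(8, 512)
logits_ab, logits_ba, loss = clip_style_logits_and_loss(a, b, torch.tensor(0.0))
```

Because the inputs are unit-normalised, every logit is bounded in magnitude by the effective scale, which is why capping s also bounds how sharp the softmax can become.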
- current_logit_scale() -> float
Return the effective logit scale as a Python float for logging.
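As a small worked example of what "effective logit scale" could mean here, the sketch below exponentiates a log-space value and applies the configured bounds. It assumes the parameter is stored in log-space (consistent with s = exp(logit_scale)) and that both bounds are applied by clamping; `effective_logit_scale` is a hypothetical stand-alone function, not the method itself.

```python
import math
from typing import Optional


def effective_logit_scale(
    log_scale: float,
    max_logit_scale: float = 10.0,
    min_logit_scale: Optional[float] = None,
) -> float:
    """Hypothetical sketch: exponentiate the log-space value, then clamp it."""
    scale = math.exp(log_scale)
    scale = min(scale, max_logit_scale)
    if min_logit_scale is not None:
        scale = max(scale, min_logit_scale)
    return scale


# logit_scale_init=1.0 corresponds to a stored log-space value of log(1.0) = 0.0.
initial = effective_logit_scale(math.log(1.0))
# A large learned value is capped at max_logit_scale.
capped = effective_logit_scale(5.0)
# The implied temperature is the reciprocal of the effective scale.
temperature = 1.0 / capped
```

Under these assumptions the default configuration starts at a scale of 1.0 and can never drop below a temperature of 0.1.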