loki2.cl.model_cl
=================

.. py:module:: loki2.cl.model_cl

.. autoapi-nested-parse::

   Minimal projection-only alignment module.


Module Contents
---------------

.. py:class:: ProjectionCL(embed_dim: int = 512, modality_dims: Tuple[int, int] = (1280, 768), *, bias: bool = False, logit_scale_init: float = 1.0, num_layers: int = 1, hidden_dim: Optional[int] = None, dropout: float = 0.0, max_logit_scale: float = 10.0, min_logit_scale: Optional[float] = None)

   Bases: :py:obj:`torch.nn.Module`

   CLIP-style symmetric contrastive projector for paired embeddings.

   :param embed_dim: Shared embedding dimensionality after projection. Defaults to 512.
   :param modality_dims: Tuple containing the input dimensions of the two modalities. Defaults to (1280, 768).
   :param bias: Whether to enable bias terms in the projection layers. Defaults to False.
   :param logit_scale_init: Initial value (not log-space) of the logit scale multiplier ``s``. Defaults to 1.0.
   :param num_layers: Number of layers in each projection head. Defaults to 1.
   :param hidden_dim: Hidden dimension for intermediate projection layers (defaults to ``embed_dim``). Defaults to None.
   :param dropout: Dropout rate for hidden layers. Defaults to 0.0.
   :param max_logit_scale: Upper bound for ``s`` (enforces a minimum temperature ``1 / s``). Defaults to 10.0.
   :param min_logit_scale: Optional lower bound for ``s``. Defaults to None.

   :raises ValueError: If ``modality_dims`` does not contain exactly two dimensions, ``num_layers`` is less than 1, ``dropout`` is not in [0, 1], ``logit_scale_init`` is not positive, ``max_logit_scale`` is not positive, or ``min_logit_scale`` is not in (0, ``max_logit_scale``].
   :raises RuntimeError: If a projection head does not contain at least one linear layer.

   .. py:attribute:: hidden_dim

   .. py:attribute:: proj_a

   .. py:attribute:: proj_b

   .. py:attribute:: init_value

   .. py:attribute:: logit_scale

   .. py:method:: encode_a(features: torch.Tensor) -> torch.Tensor

      Encode features from modality A through its projection head.

      :param features: Input features for modality A.

      :returns: Projected features for modality A.
      :rtype: torch.Tensor

   .. py:method:: encode_b(features: torch.Tensor) -> torch.Tensor

      Encode features from modality B through its projection head.

      :param features: Input features for modality B.

      :returns: Projected features for modality B.
      :rtype: torch.Tensor

   .. py:method:: forward(features_a: torch.Tensor, features_b: torch.Tensor, *, return_loss: bool = False) -> Union[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor, torch.Tensor, torch.Tensor]]

      Project both modalities, compute scaled cosine similarities, and optionally return the CLIP-style contrastive loss.

      Let ``f_i`` and ``g_i`` denote the raw embeddings for sample ``i`` from modalities ``a`` and ``b``. After the projection heads and L2 normalisation we obtain unit vectors ``\hat f_i`` and ``\hat g_i``. The learnable logit-scale parameter ``s = \exp(\text{logit\_scale})`` plays the role of the inverse temperature ``1/\tau``. The logits matrix fed to the cross-entropy loss is

      .. math::

         L_{ij} = \min(s, s_{\text{max}})\; \hat f_i^\top \hat g_j,

      where ``s_{max}`` is the configured maximum scale. When ``return_loss`` is ``True`` we minimise the symmetric InfoNCE objective

      .. math::

         \mathcal{L} = \tfrac{1}{2} \bigl[ \operatorname{CE}(L, I) + \operatorname{CE}(L^\top, I) \bigr],

      where ``CE`` is the cross-entropy and ``I`` indexes the matching pairs along the diagonal.

      :param features_a: Input features for modality A.
      :param features_b: Input features for modality B.
      :param return_loss: If True, also return the contrastive loss. Defaults to False.

      :returns: * Tuple[torch.Tensor, torch.Tensor]: Logits for A-B and B-A similarity if ``return_loss=False``.
                * Tuple[torch.Tensor, torch.Tensor, torch.Tensor]: Logits and loss if ``return_loss=True``.
      :rtype: Union[Tuple[torch.Tensor, torch.Tensor], Tuple[torch.Tensor, torch.Tensor, torch.Tensor]]

      :raises ValueError: If ``return_loss=True`` and the batch sizes of the two modalities are not equal.

   .. py:method:: current_logit_scale() -> float

      Return the effective logit scale as a Python float for logging.
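
   A minimal usage sketch (the batch size below is illustrative; the constructor and ``forward`` signatures are the ones documented above):

   .. code-block:: python

      import torch

      from loki2.cl.model_cl import ProjectionCL

      # Paired embeddings, e.g. 1280-d features for modality A and 768-d for modality B.
      model = ProjectionCL(embed_dim=512, modality_dims=(1280, 768))

      feats_a = torch.randn(32, 1280)  # batch of modality-A embeddings
      feats_b = torch.randn(32, 768)   # matching modality-B embeddings

      # Logits only (e.g. for retrieval or evaluation).
      logits_ab, logits_ba = model(feats_a, feats_b)

      # Logits plus the symmetric contrastive loss (training).
      logits_ab, logits_ba, loss = model(feats_a, feats_b, return_loss=True)
      loss.backward()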
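
   The symmetric objective above can also be reproduced outside the module from the returned logit matrices; a sketch, assuming ``logits_ba`` corresponds to the transposed direction ``L^\top`` in the formula:

   .. code-block:: python

      import torch
      import torch.nn.functional as F

      def symmetric_infonce(logits_ab: torch.Tensor, logits_ba: torch.Tensor) -> torch.Tensor:
          """CLIP-style loss: matching pairs lie on the diagonal of the logits matrix."""
          targets = torch.arange(logits_ab.size(0), device=logits_ab.device)
          return 0.5 * (F.cross_entropy(logits_ab, targets) + F.cross_entropy(logits_ba, targets))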