loki2.models.base.vision_transformer

Vision Transformer implementation for Loki2.

This module provides a Vision Transformer (ViT) backbone architecture based on the implementation from CellViT++.

The truncated normal initialization (trunc_normal_) follows the method described at https://people.sc.fsu.edu/~jburkardt/presentations/truncated_normal.pdf.

Module Contents

loki2.models.base.vision_transformer.trunc_normal_(tensor: torch.Tensor, mean: float = 0.0, std: float = 1.0, a: float = -2.0, b: float = 2.0) torch.Tensor

Initialize tensor with truncated normal distribution.

Parameters:
  • tensor – Tensor to initialize.

  • mean – Mean of the normal distribution. Defaults to 0.0.

  • std – Standard deviation of the normal distribution. Defaults to 1.0.

  • a – Lower bound of truncation. Defaults to -2.0.

  • b – Upper bound of truncation. Defaults to 2.0.

Returns:

The initialized tensor (modified in-place).

Return type:

torch.Tensor
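A minimal usage sketch; the tensor shape and the std value are illustrative choices, not taken from the Loki2 code:

import torch
from loki2.models.base.vision_transformer import trunc_normal_

# Fill a parameter tensor in-place; std=0.02 is a common choice for ViT weights.
weight = torch.empty(196, 768)
trunc_normal_(weight, mean=0.0, std=0.02)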

loki2.models.base.vision_transformer.drop_path(x: torch.Tensor, drop_prob: float = 0.0, training: bool = False) torch.Tensor

Drop paths (Stochastic Depth) per sample.

Parameters:
  • x – Input tensor.

  • drop_prob – Drop probability. Defaults to 0.0.

  • training – Whether in training mode. Defaults to False.

Returns:

Output tensor with dropped paths applied.

Return type:

torch.Tensor
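A minimal sketch of the functional form, assuming the standard stochastic-depth behavior the docstring describes; shapes are illustrative:

import torch
from loki2.models.base.vision_transformer import drop_path

x = torch.randn(8, 197, 768)                     # (batch, tokens, embed_dim)

# With training=False the input is returned unchanged.
y_eval = drop_path(x, drop_prob=0.1, training=False)

# With training=True, stochastic depth zeroes whole samples with probability
# drop_prob and rescales the surviving ones by 1 / (1 - drop_prob).
y_train = drop_path(x, drop_prob=0.1, training=True)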

class loki2.models.base.vision_transformer.PatchEmbed(img_size: int = 224, patch_size: int = 16, in_chans: int = 3, embed_dim: int = 768)

Bases: torch.nn.Module

Image to Patch Embedding (without positional embedding).

Converts input images into patch embeddings using a convolutional layer.

Parameters:
  • img_size – Input image size. Defaults to 224.

  • patch_size – Patch token size (one dimension only, tokens are squared). Defaults to 16.

  • in_chans – Number of input channels. Defaults to 3.

  • embed_dim – Embedding dimension. Defaults to 768.

num_patches
img_size
patch_size
proj
forward(x)
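A usage sketch with the documented defaults; a 224×224 image split into 16×16 patches yields (224 / 16)² = 196 patch tokens, so the expected output shape below is an assumption based on that arithmetic:

import torch
from loki2.models.base.vision_transformer import PatchEmbed

embed = PatchEmbed(img_size=224, patch_size=16, in_chans=3, embed_dim=768)
images = torch.randn(2, 3, 224, 224)    # (batch, channels, height, width)
tokens = embed(images)                  # expected shape: (2, 196, 768)
print(embed.num_patches)                # (224 // 16) ** 2 == 196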
class loki2.models.base.vision_transformer.Attention(dim: int, num_heads: int = 8, qkv_bias: bool = False, qk_scale: float = None, attn_drop: float = 0.0, proj_drop: float = 0.0)

Bases: torch.nn.Module

Attention Module (Multi-Head Attention, MHA)

Parameters:
  • dim (int) – Embedding dimension

  • num_heads (int, optional) – Number of attention heads. Defaults to 8.

  • qkv_bias (bool, optional) – If bias should be used for query (q), key (k), and value (v). Defaults to False.

  • qk_scale (float, optional) – Scaling parameter. Defaults to None.

  • attn_drop (float, optional) – Dropout for attention layer. Defaults to 0.0.

  • proj_drop (float, optional) – Dropout for projection layers. Defaults to 0.0.

num_heads
head_dim
scale
qkv
attn_drop
proj
proj_drop
forward(x)
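A standalone construction sketch; the forward return is not documented here, so the comment only notes the DINO-style convention this backbone appears to follow:

import torch
from loki2.models.base.vision_transformer import Attention

attn = Attention(dim=768, num_heads=8, qkv_bias=True)
x = torch.randn(2, 197, 768)            # (batch, tokens, embed_dim)
result = attn(x)                        # DINO-style attention modules return the projected
                                        # tokens (and often the attention weights as well)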
class loki2.models.base.vision_transformer.Mlp(in_features: int, hidden_features: int = None, out_features: int = None, act_layer: Callable = nn.GELU, drop: float = 0.0)

Bases: torch.nn.Module

MLP module (feed-forward network) used inside the Transformer blocks.

Applies two fully connected layers with an activation in between and dropout.

Parameters:
  • in_features – Number of input features.

  • hidden_features – Number of hidden features. Defaults to None.

  • out_features – Number of output features. Defaults to None.

  • act_layer – Activation layer. Defaults to nn.GELU.

  • drop – Dropout rate. Defaults to 0.0.

out_features
hidden_features
fc1
act
fc2
drop
forward(x)
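A standalone usage sketch; the arguments are illustrative and sized to match a ViT-Base block (hidden_features = 4 × embed_dim):

import torch
from loki2.models.base.vision_transformer import Mlp

mlp = Mlp(in_features=768, hidden_features=3072, out_features=768, drop=0.0)
x = torch.randn(2, 197, 768)
y = mlp(x)                              # expected shape: (2, 197, 768)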
class loki2.models.base.vision_transformer.DropPath(drop_prob=None)

Bases: torch.nn.Module

Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).

drop_prob
forward(x)
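A short sketch, assuming the module wraps the drop_path function above using its own training flag (the standard stochastic-depth wrapper):

import torch
from loki2.models.base.vision_transformer import DropPath

dp = DropPath(drop_prob=0.1)
x = torch.randn(4, 197, 768)

dp.train()
y = dp(x)        # stochastic depth active: whole samples may be zeroed

dp.eval()
y = dp(x)        # evaluation mode: the input passes through unchanged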
class loki2.models.base.vision_transformer.Block(dim: int, num_heads: int, mlp_ratio: float = 4.0, qkv_bias: bool = False, qk_scale: float = None, drop: float = 0.0, attn_drop: float = 0.0, drop_path: float = 0.0, act_layer: Callable = nn.GELU, norm_layer: Callable = nn.LayerNorm)

Bases: torch.nn.Module

Transformer encoder block.

Multi-head self-attention followed by an MLP, each preceded by layer normalization and connected through residual paths with optional stochastic depth (DropPath).

Parameters:
  • dim – Embedding dimension.

  • num_heads – Number of attention heads.

  • mlp_ratio – Ratio of the MLP hidden dimension to the embedding dimension. Defaults to 4.0.

  • qkv_bias – If bias should be used for query, key, and value projections. Defaults to False.

  • qk_scale – Scaling parameter. Defaults to None.

  • drop – Dropout rate for the MLP and projection layers. Defaults to 0.0.

  • attn_drop – Dropout rate for the attention layer. Defaults to 0.0.

  • drop_path – Stochastic depth rate. Defaults to 0.0.

  • act_layer – Activation layer. Defaults to nn.GELU.

  • norm_layer – Normalization layer. Defaults to nn.LayerNorm.

norm1
attn
drop_path
norm2
mlp_hidden_dim
mlp
forward(x, return_attention=False)
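A usage sketch assuming the DINO-style convention that return_attention=True yields the block's attention map instead of the token output:

import torch
from loki2.models.base.vision_transformer import Block

block = Block(dim=768, num_heads=12, mlp_ratio=4.0, qkv_bias=True, drop_path=0.1)
x = torch.randn(2, 197, 768)

y = block(x)                             # expected: same shape as the input
attn = block(x, return_attention=True)   # DINO-style: the attention map of this block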
class loki2.models.base.vision_transformer.VisionTransformer(img_size: List[int] = [224], patch_size: int = 16, in_chans: int = 3, num_classes: int = 0, embed_dim: int = 768, depth: int = 12, num_heads: int = 12, mlp_ratio: float = 4.0, qkv_bias: bool = False, qk_scale: float = None, drop_rate: float = 0.0, attn_drop_rate: float = 0.0, drop_path_rate: float = 0.0, norm_layer: Callable = nn.LayerNorm, **kwargs)

Bases: torch.nn.Module

Vision Transformer

patch_embed
num_patches
cls_token
pos_embed
pos_drop
dpr
blocks
norm
head
interpolate_pos_encoding(x, w, h)
prepare_tokens(x)
forward(x: torch.Tensor) Tuple[torch.Tensor, torch.Tensor]

Forward pass

Parameters:

x (torch.Tensor) – Input batch

Returns:

Class token (raw)

Return type:

Tuple[torch.Tensor, torch.Tensor]

get_last_selfattention(x)
get_intermediate_layers(x, n=1)
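A construction sketch mirroring the documented defaults (a ViT-Base-like configuration); the comments on the helper methods follow the usual DINO-style conventions and should be checked against the implementation:

import torch
from loki2.models.base.vision_transformer import VisionTransformer

vit = VisionTransformer(
    img_size=[224],
    patch_size=16,
    embed_dim=768,
    depth=12,
    num_heads=12,
    mlp_ratio=4.0,
    qkv_bias=True,
)
images = torch.randn(2, 3, 224, 224)

out = vit(images)                                  # per the signature, a tuple whose first element is the raw class token
attn = vit.get_last_selfattention(images)          # DINO-style: self-attention of the final block
feats = vit.get_intermediate_layers(images, n=4)   # DINO-style: outputs of the last n blocks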