loki2.models.base.vision_transformer
Vision Transformer implementation for Loki2.
This module provides a Vision Transformer (ViT) backbone architecture based on the implementation from CellViT++.
The truncated normal initialization used by trunc_normal_ follows the method described at https://people.sc.fsu.edu/~jburkardt/presentations/truncated_normal.pdf.
Module Contents
- loki2.models.base.vision_transformer.trunc_normal_(tensor: torch.Tensor, mean: float = 0.0, std: float = 1.0, a: float = -2.0, b: float = 2.0) → torch.Tensor
Initialize tensor with truncated normal distribution.
- Parameters:
tensor – Tensor to initialize.
mean – Mean of the normal distribution. Defaults to 0.0.
std – Standard deviation of the normal distribution. Defaults to 1.0.
a – Lower bound of truncation. Defaults to -2.0.
b – Upper bound of truncation. Defaults to 2.0.
- Returns:
The initialized tensor (modified in-place).
- Return type:
torch.Tensor
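A minimal usage sketch (illustrative, not from the Loki2 source; the truncation behavior assumes the standard timm-style implementation this module derives from):

```python
import torch
from loki2.models.base.vision_transformer import trunc_normal_

# Common ViT weight init: small std, in-place modification.
w = torch.empty(768, 768)
trunc_normal_(w, mean=0.0, std=0.02)

# All values lie within the truncation bounds [a, b] = [-2.0, 2.0].
assert w.min() >= -2.0 and w.max() <= 2.0
```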
- loki2.models.base.vision_transformer.drop_path(x: torch.Tensor, drop_prob: float = 0.0, training: bool = False) → torch.Tensor
Drop paths (Stochastic Depth) per sample.
- Parameters:
x – Input tensor.
drop_prob – Drop probability. Defaults to 0.0.
training – Whether in training mode. Defaults to False.
- Returns:
Output tensor with dropped paths applied.
- Return type:
torch.Tensor
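A short sketch of the expected behavior (an assumption based on the standard stochastic-depth semantics of the timm implementation this code derives from: dropped samples are zeroed and survivors rescaled by 1 / (1 - drop_prob)):

```python
import torch
from loki2.models.base.vision_transformer import drop_path

x = torch.randn(8, 197, 768)  # (batch, tokens, dim)

# In training mode, each sample in the batch is dropped with probability drop_prob.
y = drop_path(x, drop_prob=0.1, training=True)

# Outside training (or with drop_prob=0.0), the input passes through unchanged.
assert torch.equal(drop_path(x, drop_prob=0.1, training=False), x)
```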
- class loki2.models.base.vision_transformer.PatchEmbed(img_size: int = 224, patch_size: int = 16, in_chans: int = 3, embed_dim: int = 768)
Bases:
torch.nn.Module
Image to Patch Embedding (without positional embedding).
Converts input images into patch embeddings using a convolutional layer.
- Parameters:
img_size – Input image size. Defaults to 224.
patch_size – Patch token size (one dimension only, tokens are squared). Defaults to 16.
in_chans – Number of input channels. Defaults to 3.
embed_dim – Embedding dimension. Defaults to 768.
- num_patches
- img_size
- patch_size
- proj
- forward(x)
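A usage sketch (the (B, num_patches, embed_dim) output shape is an assumption based on the standard ViT patch-embedding layout):

```python
import torch
from loki2.models.base.vision_transformer import PatchEmbed

embed = PatchEmbed(img_size=224, patch_size=16, in_chans=3, embed_dim=768)
imgs = torch.randn(2, 3, 224, 224)  # (B, C, H, W)

# A stride-16 convolution cuts each image into 14 x 14 = 196 patch tokens.
tokens = embed(imgs)
print(tokens.shape)       # expected: torch.Size([2, 196, 768])
print(embed.num_patches)  # expected: 196
```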
- class loki2.models.base.vision_transformer.Attention(dim: int, num_heads: int = 8, qkv_bias: bool = False, qk_scale: float = None, attn_drop: float = 0.0, proj_drop: float = 0.0)
Bases:
torch.nn.Module
Attention Module (Multi-Head Attention, MHA).
- Parameters:
dim (int) – Embedding dimension
num_heads (int, optional) – Number of attention heads. Defaults to 8.
qkv_bias (bool, optional) – If bias should be used for query (q), key (k), and value (v). Defaults to False.
qk_scale (float, optional) – Scaling parameter. Defaults to None.
attn_drop (float, optional) – Dropout for attention layer. Defaults to 0.0.
proj_drop (float, optional) – Dropout for projection layers. Defaults to 0.0.
- num_heads
- head_dim
- scale
- qkv
- attn_drop
- proj
- proj_drop
- forward(x)
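A construction sketch. Note that in the DINO-style implementation this module is based on, forward may return the attention weights alongside the output tokens, so treat the exact return value as an assumption:

```python
import torch
from loki2.models.base.vision_transformer import Attention

attn = Attention(dim=768, num_heads=12, qkv_bias=True)
print(attn.head_dim)  # 64, i.e. dim // num_heads
print(attn.scale)     # 0.125, i.e. head_dim ** -0.5 when qk_scale is None

x = torch.randn(2, 197, 768)  # 196 patch tokens + 1 class token
out = attn(x)                 # multi-head self-attention over the token sequence
```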
- class loki2.models.base.vision_transformer.Mlp(in_features: int, hidden_features: int = None, out_features: int = None, act_layer: Callable = nn.GELU, drop: float = 0.0)
Bases:
torch.nn.Module
MLP module (feed-forward network) used inside each transformer block: two linear layers with an activation in between, followed by dropout.
- out_features
- fc1
- act
- fc2
- drop
- forward(x)
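A usage sketch (the layer ordering and the fallback defaults, where hidden_features and out_features fall back to in_features, follow the timm convention and are assumptions):

```python
import torch
from loki2.models.base.vision_transformer import Mlp

# ViT convention: hidden dimension = 4 x embedding dimension.
mlp = Mlp(in_features=768, hidden_features=3072, drop=0.1)

x = torch.randn(2, 197, 768)
y = mlp(x)      # fc1 -> GELU -> dropout -> fc2 -> dropout
print(y.shape)  # expected: torch.Size([2, 197, 768])
```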
- class loki2.models.base.vision_transformer.DropPath(drop_prob=None)
Bases:
torch.nn.Module
Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
- drop_prob
- forward(x)
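A usage sketch; DropPath wraps drop_path() above as a module, so it is active only in training mode (assumed behavior):

```python
import torch
from loki2.models.base.vision_transformer import DropPath

dp = DropPath(drop_prob=0.1)
x = torch.randn(8, 197, 768)

dp.train()
y = dp(x)  # whole samples are zeroed at random during training

dp.eval()
assert torch.equal(dp(x), x)  # identity in eval mode
```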
- class loki2.models.base.vision_transformer.Block(dim: int, num_heads: int, mlp_ratio: float = 4.0, qkv_bias: bool = False, qk_scale: float = None, drop: float = 0.0, attn_drop: float = 0.0, drop_path: float = 0.0, act_layer: Callable = nn.GELU, norm_layer: Callable = nn.LayerNorm)
Bases:
torch.nn.Module
Transformer encoder block: multi-head self-attention (attn) and an MLP (mlp), each preceded by layer normalization (norm1, norm2) and wrapped in a residual connection, with optional stochastic depth (drop_path).
- norm1
- attn
- drop_path
- norm2
- mlp
- forward(x, return_attention=False)
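A usage sketch (the pre-norm residual ordering and the return_attention behavior follow the DINO implementation this code is based on and should be treated as assumptions):

```python
import torch
from loki2.models.base.vision_transformer import Block

block = Block(dim=768, num_heads=12, mlp_ratio=4.0, qkv_bias=True, drop_path=0.1)
x = torch.randn(2, 197, 768)

y = block(x)    # x + attn(norm1(x)), then + mlp(norm2(x)), with DropPath on both
print(y.shape)  # expected: torch.Size([2, 197, 768])

# With return_attention=True, the block returns the attention map instead.
attn_map = block(x, return_attention=True)
```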
- class loki2.models.base.vision_transformer.VisionTransformer(img_size: List[int] = [224], patch_size: int = 16, in_chans: int = 3, num_classes: int = 0, embed_dim: int = 768, depth: int = 12, num_heads: int = 12, mlp_ratio: float = 4.0, qkv_bias: bool = False, qk_scale: float = None, drop_rate: float = 0.0, attn_drop_rate: float = 0.0, drop_path_rate: float = 0.0, norm_layer: Callable = nn.LayerNorm, **kwargs)
Bases:
torch.nn.Module
Vision Transformer.
- patch_embed
- num_patches
- cls_token
- pos_embed
- pos_drop
- dpr
- blocks
- norm
- head
- interpolate_pos_encoding(x, w, h)
- prepare_tokens(x)
- forward(x: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor]
Forward pass.
- Parameters:
x (torch.Tensor) – Input batch
- Returns:
Class token (raw)
- Return type:
Tuple[torch.Tensor, torch.Tensor]
- get_last_selfattention(x)
- get_intermediate_layers(x, n=1)
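An end-to-end usage sketch (parameter choices mirror ViT-Base; the exact contents of the returned tuple and of the helper methods' outputs are assumptions based on the DINO-style API):

```python
import torch
from loki2.models.base.vision_transformer import VisionTransformer

vit = VisionTransformer(
    img_size=[224], patch_size=16, embed_dim=768,
    depth=12, num_heads=12, qkv_bias=True,
)

imgs = torch.randn(2, 3, 224, 224)
out = vit(imgs)                                 # forward pass; see forward() above

attn = vit.get_last_selfattention(imgs)         # attention map of the final block
feats = vit.get_intermediate_layers(imgs, n=4)  # outputs of the last 4 blocks
```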