Vision Transformer (ViT) from Scratch in PyTorch

Built a Vision Transformer from scratch in PyTorch.

Key Features:

  • Multi-head self-attention mechanism.
  • Detailed implementation with step-by-step outputs.
Tags: PyTorch, Vision Transformers, Scratch
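
To illustrate the multi-head self-attention feature listed above, here is a minimal sketch, not the project's actual code. The class name `MultiHeadSelfAttention`, the fused `qkv` projection, and the example sizes are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class MultiHeadSelfAttention(nn.Module):
    """Illustrative multi-head self-attention over a sequence of patch embeddings."""

    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        assert embed_dim % num_heads == 0, "embed_dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)  # fused Q, K, V projection (assumed design)
        self.out = nn.Linear(embed_dim, embed_dim)      # mixes the heads back together

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape  # (batch, tokens, embed_dim)
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (batch, heads, tokens, head_dim)
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5  # scaled dot product
        weights = scores.softmax(dim=-1)  # each token's attention over all tokens
        context = (weights @ v).transpose(1, 2).reshape(b, n, d)  # merge heads
        return self.out(context)


# Example: 2 images, 16 patches plus one class token, 64-dim embeddings, 8 heads.
tokens = torch.randn(2, 17, 64)
output = MultiHeadSelfAttention(embed_dim=64, num_heads=8)(tokens)
print(output.shape)
```

The output keeps the input shape, so the layer can be stacked inside a standard Transformer encoder block with residual connections.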

2D Positional Encodings for Vision Transformer

Implemented various positional encodings and adapted them to 2D for Vision Transformer.

Implemented Positional Encodings:

  • Learnable
  • Sinusoidal (Absolute)
  • Relative
  • Rotary Position Embedding (RoPE)
  • No Position
Tags: PyTorch, 2D Positional Encoding, Vision Transformer
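
One common way to adapt a sinusoidal encoding to 2D is to spend half of the channels on the patch row and half on the patch column. The sketch below assumes that scheme; the function names `sincos_1d` and `sincos_2d` are hypothetical, and the project may use a different layout (for example, interleaving sine and cosine channels instead of concatenating them).

```python
import math
import torch


def sincos_1d(positions: torch.Tensor, dim: int) -> torch.Tensor:
    """Standard 1D sinusoidal encoding: (n,) positions -> (n, dim), dim even."""
    freqs = torch.exp(-math.log(10000.0) * torch.arange(0, dim, 2) / dim)
    angles = positions[:, None].float() * freqs[None, :]
    # Sine and cosine halves are concatenated here rather than interleaved.
    return torch.cat([angles.sin(), angles.cos()], dim=-1)


def sincos_2d(grid_h: int, grid_w: int, dim: int) -> torch.Tensor:
    """2D encoding for a grid of patches: returns (grid_h * grid_w, dim)."""
    assert dim % 4 == 0, "dim must be divisible by 4 (sin/cos for rows and columns)"
    rows, cols = torch.meshgrid(
        torch.arange(grid_h), torch.arange(grid_w), indexing="ij"
    )
    row_enc = sincos_1d(rows.flatten(), dim // 2)  # first half: row index
    col_enc = sincos_1d(cols.flatten(), dim // 2)  # second half: column index
    return torch.cat([row_enc, col_enc], dim=-1)


# Example: a 4x4 patch grid with 64-dim embeddings, added to the patch tokens.
pos = sincos_2d(4, 4, 64)
print(pos.shape)
```

With this layout, patches in the same row share the first half of their encoding and patches in the same column share the second half.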

Large Language Model (LLM) from Scratch in PyTorch

Developed LLMs from scratch in PyTorch with detailed implementation steps and advanced functionalities.

Key Features:

  • Byte-Pair Encoding (BPE) tokenizer
  • Rotary Position Embeddings (RoPE)
  • SwiGLU activation
  • RMSNorm
  • Mixture of Experts (MoE)
  • Key-Value Cache
  • Temperature, Top-p and Top-k sampling
Tags: PyTorch, Large Language Model, LLM, Transformers, Scratch
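
As an illustration of how temperature, top-k, and top-p sampling can be combined when picking the next token, here is a minimal sketch. The function name `sample_next_token` and its parameters are assumptions for this example, not the project's API.

```python
import torch


def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, generator=None):
    """Sample one token id from 1D logits of shape (vocab_size,)."""
    # Temperature below 1 sharpens the distribution, above 1 flattens it.
    logits = logits / max(temperature, 1e-8)

    if top_k is not None:
        # Keep only the k highest logits (ties with the k-th value are kept).
        kth = torch.topk(logits, min(top_k, logits.numel())).values[-1]
        logits = logits.masked_fill(logits < kth, float("-inf"))

    if top_p is not None:
        # Nucleus sampling: keep the smallest set of top tokens whose
        # cumulative probability reaches top_p.
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        probs = sorted_logits.softmax(dim=-1)
        mass_before = probs.cumsum(dim=-1) - probs
        sorted_logits = sorted_logits.masked_fill(mass_before >= top_p, float("-inf"))
        # Undo the sort so each logit returns to its vocabulary position.
        logits = torch.full_like(logits, float("-inf")).scatter(0, sorted_idx, sorted_logits)

    probs = logits.softmax(dim=-1)
    return int(torch.multinomial(probs, num_samples=1, generator=generator))


# Example: a toy 4-token vocabulary.
toy_logits = torch.tensor([2.0, 1.0, 0.5, -1.0])
token_id = sample_next_token(toy_logits, temperature=0.8, top_k=3, top_p=0.9)
print(token_id)
```

The top-scoring token always survives both filters, so the final distribution is never empty.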

Various Generative Adversarial Networks (GANs)

Implemented several GAN variants from scratch for image generation and image translation with easy-to-understand code.

Implemented Models:

Tags: PyTorch, Generative Adversarial Networks, GANs, Image Generation, Image Translation

Duplicate Photo and Video Finder

Created a high-speed Python tool that recursively scans directories to detect and delete duplicate photos and videos.

Features:

  • Fast pixel-wise comparison for photos
  • Option to keep largest/smallest file among duplicates
  • Useful for cleaning shared media libraries and backups
Tags: Python, OpenCV, NumPy
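
A pixel-wise duplicate check can be sketched with NumPy alone, as below. This is an assumed approach, not the tool's actual implementation: the function name `find_duplicate_groups` is hypothetical, and the input is taken to be already-decoded pixel arrays (for example, what `cv2.imread(path)` returns).

```python
from collections import defaultdict

import numpy as np


def find_duplicate_groups(images):
    """Group names whose pixel arrays are exactly identical.

    `images` maps a name (e.g. a file path) to a decoded pixel array.
    Returns a list of groups, each holding two or more duplicate names.
    """
    # Cheap pre-filter: only arrays with the same shape and dtype can match,
    # so full pixel comparisons happen only within a bucket.
    buckets = defaultdict(list)
    for name, pixels in images.items():
        buckets[(pixels.shape, pixels.dtype.str)].append(name)

    groups = []
    for names in buckets.values():
        remaining = list(names)
        while remaining:
            ref = remaining.pop(0)
            same = [n for n in remaining if np.array_equal(images[ref], images[n])]
            if same:
                groups.append([ref] + same)
                remaining = [n for n in remaining if n not in same]
    return groups


# Example with synthetic "images" standing in for decoded files.
base = np.zeros((4, 4, 3), dtype=np.uint8)
edited = base.copy()
edited[0, 0, 0] = 1  # differs from base by a single pixel value
print(find_duplicate_groups({"a.png": base, "b.png": base.copy(), "c.png": edited}))
```

To keep the largest or smallest file in each group, one option is to pick `max(group, key=os.path.getsize)` or `min(group, key=os.path.getsize)` and delete the rest.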