SHASHWAT // SYSTEM ARCHIVE

Transformers

transformers (3)
llms (2)
inference-optimization (2)
hardware (2)
multi-agent-systems (1)
orchestration (1)
distributed-ai (1)
agent-swarms (1)
agents (1)
autonomous-systems (1)
career (1)
architecture (1)
kv-cache (1)
quantization (1)
model-optimization (1)
inference (1)

May 15, 2026
Transformers, Inference-Optimization, Hardware, LLMs
Flash Attention 4 Explained: From Quadratic to 1,605 TFLOPs/s

May 10, 2026
Inference-Optimization, KV-Cache, Transformers, Hardware
KV Cache Optimization: Why TurboQuant Changes the Game

May 5, 2026
Quantization, Transformers, Model-Optimization, Inference
Quantization for Transformers: From Full INT8 to Selective Head Quantization

↳ shashwork19@gmail.com
Based in Delhi, India.

Delhi (IST, GMT +05:30)

© 2026 Shashwat Sharma

Shashwat「 I do what I do best. I build systems. 」