10.3 Warp Execution

Visualize how GPU warps of 32 threads execute instructions in lockstep (SIMT), handle branch divergence with active masks, and get scheduled by the warp scheduler.
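To make the lockstep model concrete, here is a minimal Python sketch (purely illustrative, not tied to any real GPU API): one warp shares a single program counter, and each instruction applies only to the lanes enabled in the active mask.

```python
# Minimal SIMT model: one warp = 32 lanes sharing one program counter.
# An instruction applies to every lane whose entry in the active mask is set.
WARP_SIZE = 32

def execute(regs, mask, op):
    """Apply op to each lane's register value, but only for active lanes."""
    return [op(v) if mask[i] else v for i, v in enumerate(regs)]

x = list(range(WARP_SIZE))       # per-lane register: x = tid
full_mask = [True] * WARP_SIZE   # all 32 lanes active (no divergence)
x = execute(x, full_mask, lambda v: v * 2.0)   # one MUL, issued once, for all lanes
print(x[0], x[31])               # 0.0 62.0
```

One instruction issue drives all 32 lanes; that single-issue, many-lane structure is what "lockstep" means here.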

[Interactive visualization: Warp 0 lane grid — 32 lanes (0–31), 32/32 active; active mask 11111111111111111111111111111111; legend: Active / Masked / Idle]
Instruction Stream (Warp 0) — active lanes per instruction:

  0  LOAD  x[tid]    32/32
  1  MUL   x, 2.0    32/32
  2  ADD   x, bias   32/32
  3  STORE y[tid]    32/32
  4  LOAD  z[tid]    32/32
  5  ADD   y, z      32/32
  6  STORE out[tid]  32/32
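The seven-instruction stream above computes y = 2*x + bias followed by out = y + z, one lane per thread. A per-lane Python sketch (the arrays and the bias value are illustrative stand-ins for device memory):

```python
# Per-lane data for one warp; x, z, and bias are illustrative values.
WARP_SIZE = 32
bias = 1.5
x = [float(tid) for tid in range(WARP_SIZE)]   # 0: LOAD x[tid]
x = [v * 2.0 for v in x]                       # 1: MUL x, 2.0
x = [v + bias for v in x]                      # 2: ADD x, bias
y = list(x)                                    # 3: STORE y[tid]
z = [10.0] * WARP_SIZE                         # 4: LOAD z[tid]
y = [a + b for a, b in zip(y, z)]              # 5: ADD y, z
out = list(y)                                  # 6: STORE out[tid]
print(out[3])   # 2*3 + 1.5 + 10 = 17.5
```

Each list comprehension stands in for one instruction issued once and executed by all 32 lanes.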
[Interactive visualization: execution timeline and warp scheduler at cycle 0 — Warp 0 active, 0/7 instructions executed, 0 warp switches, 0 divergences; SM occupancy: 1/4 warps loaded, 25% (low — stalls cannot be hidden); 32/32 lanes active, 100% lane utilization]
Key Concepts
SIMT Execution
  • 32 threads in a warp execute in lockstep
  • All active lanes run the same instruction
  • Maximum efficiency when all 32 lanes are active
Branch Divergence
  • If/else causes some lanes to be masked off
  • Both paths executed serially, not in parallel
  • Reconverge after the branch completes
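A simplified model of that serialization (this sketch ignores the hardware's actual reconvergence-stack mechanics): the branch condition splits the active mask, the two sides run one after the other, and the full mask is restored afterward.

```python
# Simplified divergence model: an if/else over tid splits the active mask.
WARP_SIZE = 32
x = [0] * WARP_SIZE
cond = [tid < 16 for tid in range(WARP_SIZE)]   # if (tid < 16)

then_mask = cond
else_mask = [not c for c in cond]

# The "then" path issues first, with half the lanes masked off...
x = [1 if then_mask[i] else v for i, v in enumerate(x)]
# ...then the "else" path issues, also at half utilization.
x = [2 if else_mask[i] else v for i, v in enumerate(x)]
# Reconverged: all 32 lanes are active again from here on.
print(x[:2], x[-2:])   # lanes 0-15 hold 1, lanes 16-31 hold 2
```

Both paths consume issue slots even though each runs at 16/32 lane utilization, which is why divergent branches cost roughly the sum of both sides.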
Warp Scheduling
  • SM schedules warps to hide memory latency
  • Stalled warps yield to eligible ones
  • Higher occupancy = better latency hiding
No Divergence

All 32 lanes execute the same path. Maximum SIMT efficiency with full lane utilization.
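Lane utilization in this scenario can be computed directly: active-lane slots divided by total issued slots. The helper below is a hypothetical illustration, not part of any profiler API.

```python
# Lane utilization = sum of active lanes per issued instruction,
# divided by total lane slots (instructions * 32).
WARP_SIZE = 32

def utilization(active_counts):
    """active_counts: number of active lanes for each issued instruction."""
    return sum(active_counts) / (len(active_counts) * WARP_SIZE)

print(utilization([32] * 7))   # no divergence: 1.0 (100%)
print(utilization([16, 16]))   # 16/16 if-else split: 0.5 (50%)
```

The no-divergence stream above scores 100%; a fully divergent 16/16 branch, which must issue both paths, drops to 50%.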