2.10 CPU Architecture

SIMD / Vector Processing Visualizer

Compare scalar (one element at a time) vs SIMD (multiple elements in one instruction). See the throughput advantage of SSE, AVX, and AVX-512 vector widths.

[Interactive visualizer: controls select the operation, mode (scalar vs. SIMD), vector width, and animation speed; readouts track cycles used, elements done (out of 16), and speedup. The "SIMD Processing (4-wide)" panel shows two 16-element input arrays being combined four lanes at a time by ADD operations into a result array.]

Throughput Comparison

Scalar: 16 cycles
SSE (4-wide): 4 cycles (4x speedup)
AVX (8-wide): 2 cycles (8x speedup)
AVX-512 (16-wide): 1 cycle (16x speedup)

Scalar: Processes one element per clock cycle. Simple but slow for data-parallel operations.

SIMD: Single Instruction, Multiple Data — processes 4/8/16 elements simultaneously using wide vector registers.

Real-world: SSE (128-bit, 4 floats), AVX2 (256-bit, 8 floats), AVX-512 (512-bit, 16 floats). Used in image processing, scientific computing, and ML inference.