12.1
Compiler & Language Internals

Lexical Analysis

A lexer scans source code character by character, recognizing tokens with a deterministic finite automaton (DFA).

Example source (15 characters):
let x = 42 + y;
Scanner DFA (diagram)
States: START, IN_ID, IN_NUM, IN_STR, STR_END, IN_OP
Edge labels, in diagram order: [a-z], [0-9], "/', op, [a-z0-9], [0-9.], any, "/'
Token Types
KEYWORD
IDENTIFIER
NUMBER
OPERATOR
PUNCTUATION
STRING
ERROR
How It Works

A lexer (lexical analyzer) is the first phase of a compiler. It reads raw source code as a stream of characters and groups them into meaningful sequences called tokens.
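To make the grouping concrete, here is a minimal sketch that turns the example line into tokens. It is a shortcut built on Python's `re` module rather than a hand-written state machine, and the keyword set, the operator and punctuation character classes, and the `tokenize` name are assumptions made for illustration.

```python
import re

# Assumption: "let" is the only keyword in this toy language.
KEYWORDS = {"let"}

# Order matters: earlier alternatives win when several could match.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR", r"[=+\-*/<>!]+"),
    ("PUNCTUATION", r"[;,(){}]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (type, value, position) tuples for `source`."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "IDENTIFIER" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text, match.start()))
    return tokens


for token in tokenize("let x = 42 + y;"):
    print(token)
```

Under these assumptions the 15 characters of `let x = 42 + y;` collapse into 7 tokens: one keyword, two identifiers, two operators, one number, and one punctuation mark.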

Scanner States:

  • START - Waiting for next token
  • IN_IDENTIFIER - Reading a name/keyword
  • IN_NUMBER - Reading a numeric literal
  • IN_STRING - Reading a string literal
  • IN_OPERATOR - Reading operator chars
  • DONE - All characters processed
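
The states listed above could be modelled as an enumeration, with a helper that picks which state the scanner enters from START. This is only a sketch: the `State` class, the `state_for_first_char` helper, and the operator character set are hypothetical names and choices, not taken from the visualizer's source.

```python
from enum import Enum, auto


class State(Enum):
    START = auto()
    IN_IDENTIFIER = auto()
    IN_NUMBER = auto()
    IN_STRING = auto()
    IN_OPERATOR = auto()
    DONE = auto()


# Assumption: these characters can begin an operator.
OPERATOR_CHARS = set("=+-*/<>!")


def state_for_first_char(ch):
    """Return the state entered from START on `ch` (None means end of input)."""
    if ch is None:
        return State.DONE
    if ch.isalpha() or ch == "_":
        return State.IN_IDENTIFIER
    if ch.isdigit():
        return State.IN_NUMBER
    if ch in "\"'":
        return State.IN_STRING
    if ch in OPERATOR_CHARS:
        return State.IN_OPERATOR
    # Whitespace and single-character punctuation are handled without leaving START.
    return State.START


print(state_for_first_char("l"))
print(state_for_first_char("4"))
```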

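The scanner uses a Deterministic Finite Automaton (DFA) to decide state transitions: the current state and the next input character together determine the next state. When no transition continues the current token, a token boundary has been reached; the buffered characters are emitted as a token and the scanner returns to START.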
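
The transition-and-emit loop described here might look like the sketch below. It is not the visualizer's implementation: string literals are left out to keep it short, and the character sets, the `next_state` and `scan` names, and the rule that a boundary character is re-examined from START are assumptions.

```python
KEYWORDS = {"let"}                 # assumption: toy keyword set
OPERATOR_CHARS = set("=+-*/<>!")   # assumption
PUNCTUATION = set(";,(){}")        # assumption
TOKEN_TYPE = {"IN_IDENTIFIER": "IDENTIFIER", "IN_NUMBER": "NUMBER", "IN_OPERATOR": "OPERATOR"}


def next_state(state, ch):
    """Return the state reached from `state` on `ch`, or None if no transition exists."""
    if state == "START":
        if ch.isalpha() or ch == "_":
            return "IN_IDENTIFIER"
        if ch.isdigit():
            return "IN_NUMBER"
        if ch in OPERATOR_CHARS:
            return "IN_OPERATOR"
        return None
    if state == "IN_IDENTIFIER":
        return state if (ch.isalnum() or ch == "_") else None
    if state == "IN_NUMBER":
        return state if (ch.isdigit() or ch == ".") else None
    if state == "IN_OPERATOR":
        return state if ch in OPERATOR_CHARS else None
    return None


def emit(state, buffer, start):
    """Turn the buffered characters into a (type, value, position) token."""
    kind = TOKEN_TYPE[state]
    if kind == "IDENTIFIER" and buffer in KEYWORDS:
        kind = "KEYWORD"
    return (kind, buffer, start)


def scan(source):
    tokens, state, buffer, start, pos = [], "START", "", 0, 0
    while pos < len(source):
        ch = source[pos]
        target = next_state(state, ch)
        if target is not None:
            if state == "START":
                start = pos          # remember where the token begins
            state, buffer, pos = target, buffer + ch, pos + 1
        elif state != "START":
            # Token boundary: emit the buffer and re-examine ch from START.
            tokens.append(emit(state, buffer, start))
            state, buffer = "START", ""
        else:
            if ch in PUNCTUATION:
                tokens.append(("PUNCTUATION", ch, pos))
            elif not ch.isspace():
                tokens.append(("ERROR", ch, pos))
            pos += 1
    if state != "START":
        tokens.append(emit(state, buffer, start))  # flush the final token
    return tokens


print(scan("let x = 42 + y;"))
```

Note that the boundary character is not consumed when a token is emitted. It is looked at again from START, which is how `y;` yields an identifier followed by a punctuation token.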

Maximal munch: At each position the lexer consumes the longest run of characters that still forms a valid token before emitting it (e.g., "===" is one token, not three "=" tokens).
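
As a small illustration of maximal munch for operators, the sketch below tries every candidate length at a position and keeps the longest one that is a known operator. The `OPERATORS` set and the `longest_operator` helper are hypothetical; a real lexer would use its language's own operator table.

```python
# Assumption: a small operator table for a JavaScript-like toy language.
OPERATORS = {"=", "==", "===", "!", "!=", "!==", "+", "++", "+=", "<", "<=", ">", ">="}
MAX_OPERATOR_LENGTH = max(len(op) for op in OPERATORS)


def longest_operator(source, pos):
    """Return the longest operator starting at source[pos], or None if there is none."""
    best = None
    for length in range(1, MAX_OPERATOR_LENGTH + 1):
        candidate = source[pos:pos + length]
        if len(candidate) < length:
            break  # ran off the end of the input
        if candidate in OPERATORS:
            best = candidate  # keep going: a longer match may still exist
    return best


print(longest_operator("a === b", 2))
print(longest_operator("a == b", 2))
print(longest_operator("i+++j", 1))
```

With this table, `a === b` yields the single token `===`, and `i+++j` is split as `++` followed by `+`, because the lexer commits to the longest match at each position without looking at how the remainder will tokenize.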