13.1
Data Engineering

Bloom Filters

A space-efficient probabilistic data structure that tells you either "definitely not in set" or "probably in set" using multiple hash functions and a bit array

Presets
1.0x
Elements
0
Bits Set
0/32
Fill Ratio
0.0%
Hash Functions
3
Theoretical FPR
0.0%
Observed FPR
N/A
False Positives
0
Bit Array(32 bits)
Size:32
Hashes:
00
01
02
03
04
05
06
07
08
09
010
011
012
013
014
015
016
017
018
019
020
021
022
023
024
025
026
027
028
029
030
031
Saturation
0.0%
Hash Functions
Insert or query an element to see hash computations
Insert Element
Query Element
Inserted Elements
0 items
No elements inserted yet
Query Results
0 queries
No queries yet. Insert elements first, then query!
How Bloom Filters Work

A Bloom filter is a probabilistic data structure that tests set membership using k hash functions and an m-bit array.

Operations:

Insert: Hash element with k functions, set k bits to 1
Query: Hash element, check k bits. All 1? Probably yes. Any 0? Definitely no.
False Positives
Different elements can set overlapping bits. When all k positions for a non-member happen to be set by other elements, we get a false positive. The rate depends on m (bits), n (elements), and k (hash functions).
Optimal k
k = (m/n) * ln(2)
Too few hash functions means more collisions. Too many means the array fills up faster.
No False Negatives
If it says "no", it is 100% certain
No Deletion
Cannot unset bits (use counting BF)

Common uses: Spell checkers, database query optimization, network routers, cache filtering, cryptocurrency wallets (BIP37).