Data Engineering

Bloom Filters

A space-efficient probabilistic data structure that tells you either "definitely not in set" or "probably in set" using multiple hash functions and a bit array

Presets

1.0x

Elements

Bits Set

0/32

Fill Ratio

0.0%

Hash Functions

Theoretical FPR

0.0%

Observed FPR

N/A

False Positives

Bit Array(32 bits)

Size:32

Hashes:

010

011

012

013

014

015

016

017

018

019

020

021

022

023

024

025

026

027

028

029

030

031

Saturation

0.0%

Hash Functions

Insert or query an element to see hash computations

Insert Element

Query Element

Inserted Elements

0 items

No elements inserted yet

Query Results

0 queries

No queries yet. Insert elements first, then query!

How Bloom Filters Work

A Bloom filter is a probabilistic data structure that tests set membership using k hash functions and an m-bit array.

Operations:

Insert: Hash element with k functions, set k bits to 1

Query: Hash element, check k bits. All 1? Probably yes. Any 0? Definitely no.

False Positives

Different elements can set overlapping bits. When all k positions for a non-member happen to be set by other elements, we get a false positive. The rate depends on m (bits), n (elements), and k (hash functions).

Optimal k

k = (m/n) * ln(2)

Too few hash functions means more collisions. Too many means the array fills up faster.

No False Negatives

If it says "no", it is 100% certain

No Deletion

Cannot unset bits (use counting BF)

Common uses: Spell checkers, database query optimization, network routers, cache filtering, cryptocurrency wallets (BIP37).