I’ve had a few interesting conversations around text indexing recently with friends and something that’s fallen out is that there are some really cool data structures you build that are extensions of Bloom filters. The techniques for constructing them don’t seem to be that widely known so this blog post intends to remedy that.

We will start by examining a variant of the traditional Bloom filter and then we’ll look at ways we can generalise the model to produce new types of probabilistic data structures and understand what properties they might have.