Fast and concise probabilistic filters in Python – Daniel Lemire's blog
Sometimes you need to filter out or filter in data quickly. Suppose that your employer maintains a list of forbidden passwords or URLs or words. You may store them in a relational database and query them as needed. Unfortunately, this process can be slow and inefficient.
A better approach might be to use a probabilistic filter. A probabilistic filter is a sort of ‘approximate set’. You can ask it whether a key is present in the set, and if it is present, then you will always get ‘true’ (the corr...
Read more at lemire.me