A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. Unlike traditional data structures, Bloom filters can return false positives (indicating that an element is in the set when it is not), but never false negatives (indicating that an element is not in the set when it is). This makes them useful in scenarios where the cost of false positives is low, and memory efficiency is crucial.
To implement a Bloom filter in Go, follow these steps:
gopackage main
import (
"crypto/sha256"
"encoding/binary"
"hash"
"hash/fnv"
"math"
)
type BloomFilter struct {
bitArray []bool
numHashes int
}
// NewBloomFilter initializes a new Bloom filter with the given size and number of hash functions.
func NewBloomFilter(size int, numHashes int) *BloomFilter {
return &BloomFilter{
bitArray: make([]bool, size),
numHashes: numHashes,
}
}
// hashFunctions generates a list of hash functions.
func hashFunctions() []hash.Hash {
return []hash.Hash{fnv.New64(), sha256.New()}
}
// Add inserts an element into the Bloom filter.
func (bf *BloomFilter) Add(item []byte) {
hashes := hashFunctions()
for i := 0; i < bf.numHashes; i++ {
hash := hashElement(hashes, item, i)
index := hash % uint64(len(bf.bitArray))
bf.bitArray[index] = true
}
}
// Check tests if an element is in the Bloom filter.
func (bf *BloomFilter) Check(item []byte) bool {
hashes := hashFunctions()
for i := 0; i < bf.numHashes; i++ {
hash := hashElement(hashes, item, i)
index := hash % uint64(len(bf.bitArray))
if !bf.bitArray[index] {
return false
}
}
return true
}
// hashElement combines multiple hash functions to create a unique hash.
func hashElement(hashes []hash.Hash, item []byte, i int) uint64 {
hash := hashes[i%len(hashes)]
hash.Reset()
hash.Write(item)
return binary.BigEndian.Uint64(hash.Sum(nil))
}
func main() {
// Create a Bloom filter with a size of 1000 bits and 3 hash functions
bf := NewBloomFilter(1000, 3)
// Add elements to the Bloom filter
bf.Add([]byte("Hello"))
bf.Add([]byte("World"))
// Check for membership
println(bf.Check([]byte("Hello"))) // Output: true
println(bf.Check([]byte("Go"))) // Output: false (likely)
}
BloomFilter Structure:
bitArray
is a slice of booleans representing the bit array.numHashes
is the number of hash functions used.NewBloomFilter:
hashFunctions:
Add:
Check:
hashElement:
The false positive rate (FPR) of a Bloom filter can be approximated using the formula:
Where:
By adjusting and , you can control the trade-off between space efficiency and the likelihood of false positives.
Bloom filters provide a highly efficient means of membership testing with a controllable false positive rate. By understanding their probabilistic nature and implementation, you can effectively utilize Bloom filters in scenarios where memory efficiency and speed are critical. This example in Go demonstrates the fundamental concepts and implementation steps, providing a foundation for further optimization and application-specific adjustments.