Iter.fromArray / BloomFilter

Hello everyone!

I was looking at this tutorial concerning the implementation of a BloomFilter :

I don’t understand why bitMap[digest] is set to True inside the loop for.

It was explained that one index in bitMap was put to 1 after applying multiple hash functions and not that multiple bitMap indexs were found using different hash functions…
Maybe I don’t fully understand what Iter.fromArray does; but I don’t understand when I look at the explanation for the Iter module in the Motoko library.

Really appreciate if someone could explain that to me.

2 Likes

hey @Seb , welcome to the forum and grear first question!

Usually for a bloom filter you use multiple hash functions - which is also the case here. You take your item and hash it with each hash function f inside the hashFuncs array. The computed hash digest is used as the index for our bitMap to set the corresponding value to 1, indicating that we’ve seen this item.
This also gives a nice explanation of the concept:
https://llimllib.github.io/bloomfilter-tutorial/

I hope it helps, if not please don’t hesitate to ask further questions :slight_smile:

3 Likes

Hey! Thank you for your answer :slight_smile:

I totally get the concept of the bloom filter but I still don’t understand the way the code is structured.
The instruction bitMap[digest] := true should be put after the loop FOR and not inside.

Hey Seb, sorry for my misunderstanding! Why should it be outside? Are you referring to this?

1 Like

The formulation here might be a bit unclear but I believe it is referring to the behaviour of the code that your screenshot shows.

The filter takes in the value that’s being entered into the data structure, hashes it to multiple indices (ranging from 0 to the length - 1 of the bitmap) using several different hash functions, and stores a 1 at that particular index.

1 Like

Okay I get it this time !

My mistake was to think that the hash function number 2 was applied to the result of the hash function number 1…
Don’t know why I thought that way because it would present no interest…
We use multiples hash functions to decrease the risk of collusion, it makes sense now :slightly_smiling_face:

Really appreciated your help :wink:

1 Like