xuebaunion@vip.163.com

3551 Trousdale Rkwy, University Park, Los Angeles, CA

留学生论文指导和课程辅导

无忧GPA：https://www.essaygpa.com

工作时间：全年无休-早上8点到凌晨3点

微信客服：xiaoxionga100

微信客服：ITCS521

算法代写-COMS31900

时间：2021-01-25

Exercise Sheet 1: Hashing and Bloom filters

COMS31900 Advanced Algorithms 2019/2020

Please feel free to discuss these problems on the unit discussion board. If you would like to

have your answers marked, please either hand them in in person at the lecture or email them

to me with the email subject ”Problem sheet 1” by the deadline stated.

1 Weakly-universal Hashing

A hash function family H = {h1, h2, . . . } is weakly-universal iff for randomly and uniformly

chosen h ∈ H, we have Pr(h[x] = h[y]) ≤ 1/m for any distinct x, y ∈ U . Consider the following

hash function families. For each one, prove that it is weakly universal or give a counter-example.

1. Let p be a prime number and m be an integer, p ≥ m. Consider the hash function

family where you pick at random a ∈ {1, . . . , p− 1} and then define ha : {0, . . . , p− 1} →

{0, . . . ,m− 1} as ha(x) = (ax mod p) mod m.

2. Let p be a prime and m be an integer such that p ≥ m. Consider the hash function

family where you pick at random b ∈ {0, . . . , p− 1} and then define hb : {0, . . . , p− 1} →

{0, . . . ,m− 1} as hb(x) = ((x + b) mod p) mod m.

3. Let p be a multiple of m. Consider the hash function family where you pick at random

a ∈ {1, . . . ,m − 1} and b ∈ {0, . . . ,m − 1}. Define ha,b : {0, . . . , p − 1} → {0, . . . ,m − 1}

as ha,b(x) = ((ax + b) mod p) mod m).

2 Cuckoo Hashing

1. This question is about cuckoo hashing. Consider a small variant of cuckoo hashing where

we use two tables T1 and T2 of the same size and hash function h1 and h2. When inserting

a new key x, we first try to put x at position h1(x) in T1. If this leads to a collision,

then the previously stored key y is moved to position h2(y) in T2. If this leads to another

collision, then the next key is again inserted at the appropriate position in T1, and so on.

In some cases, this procedure continues forever, i.e. the same configuration appears after

some steps of moving the keys around to dissolve collisions.

(a) Consider two tables of size 5 each and two hash functions h1(k) = k mod 5 and

h2(k) = bk5c mod 5. Insert the keys 27, 2, 32 in this order into initially empty hash

tables, and show the result.

(b) Find another key such that its insertion leads to an infinite sequence of key displace-

ments.

2. In order to use cuckoo hashing under an unbounded number of key insertions, we cannot

have a hash table of fixed size. The size of the hash table has to scale with the number

1

of keys inserted. Suppose that we never delete a key that has been inserted. Consider

the following approach with Cuckoo hashing. When the current hash table fills up to its

capacity, a new hash table of doubled size is created. All keys are then rehashed to the

new table. Argue that the average time it takes to resize and rebuild the hash table, if

spread out over all insertions, is constant in expectation. That is, the expected amortised

cost of rebuilding is constant.

3 Bloom Filters

1. Answer the following three questions about Bloom filters:

(a) What operations do we perform on Bloom filters?

(b) What is the difference between hash tables and Bloom filters in terms of which data

we can access?

(c) Why is there is a problem when deleting elements from a Bloom filter?

2. Suppose you have two Bloom filters A and B (each having the same number of cells and

the same hash functions) representing the two sets A and B. Let C = A&B be the Bloom

filter formed by computing the bitwise Boolean and of A and B.

(a) C may not always be the same as the Bloom filter that would be constructed by

adding the elements of the set (A intersect B) one at a time. Explain why not.

(b) Does C correctly represent the set (A intersect B), in the sense that it gives a positive

answer for membership queries of all elements in this set? Explain why or why not.

(c) Suppose that we want to store a set S of n = 20 elements, drawn from a universe

of U = 10000 possible keys, in a Bloom filter of exactly N = 100 cells, and that we

care only about the accuracy of the Bloom filter and not its speed. For this problem

size, what is the best choice of the number of hash functions (the parameter r in the

lecture)? (That is what value of r gives the smallest possible probability that a key

not in S is a false positive?) What is the probability of a false positive for this choice

of r?

4 Perfect Hashing

This question is about perfect hashing:

1. Our perfect hashing scheme assumed the set of keys stored in the table is static. Suppose

instead that we want to add a few new items to our table after the initial construction.

Suggest a way to modify our intitial construction so that we can insert these new items

using no new space and without making significant changes to our existing table (in

particular, we don’t want to change our initial hash function). Your scheme should still

do lookups of all items in O(1) time, but you may use a bit more initial space.

2. Suppose now that we want to delete some of our initial items. Describe a simple way to

support deletions in our perfect hashing scheme.

COMS31900 Advanced Algorithms 2019/2020

Please feel free to discuss these problems on the unit discussion board. If you would like to

have your answers marked, please either hand them in in person at the lecture or email them

to me with the email subject ”Problem sheet 1” by the deadline stated.

1 Weakly-universal Hashing

A hash function family H = {h1, h2, . . . } is weakly-universal iff for randomly and uniformly

chosen h ∈ H, we have Pr(h[x] = h[y]) ≤ 1/m for any distinct x, y ∈ U . Consider the following

hash function families. For each one, prove that it is weakly universal or give a counter-example.

1. Let p be a prime number and m be an integer, p ≥ m. Consider the hash function

family where you pick at random a ∈ {1, . . . , p− 1} and then define ha : {0, . . . , p− 1} →

{0, . . . ,m− 1} as ha(x) = (ax mod p) mod m.

2. Let p be a prime and m be an integer such that p ≥ m. Consider the hash function

family where you pick at random b ∈ {0, . . . , p− 1} and then define hb : {0, . . . , p− 1} →

{0, . . . ,m− 1} as hb(x) = ((x + b) mod p) mod m.

3. Let p be a multiple of m. Consider the hash function family where you pick at random

a ∈ {1, . . . ,m − 1} and b ∈ {0, . . . ,m − 1}. Define ha,b : {0, . . . , p − 1} → {0, . . . ,m − 1}

as ha,b(x) = ((ax + b) mod p) mod m).

2 Cuckoo Hashing

1. This question is about cuckoo hashing. Consider a small variant of cuckoo hashing where

we use two tables T1 and T2 of the same size and hash function h1 and h2. When inserting

a new key x, we first try to put x at position h1(x) in T1. If this leads to a collision,

then the previously stored key y is moved to position h2(y) in T2. If this leads to another

collision, then the next key is again inserted at the appropriate position in T1, and so on.

In some cases, this procedure continues forever, i.e. the same configuration appears after

some steps of moving the keys around to dissolve collisions.

(a) Consider two tables of size 5 each and two hash functions h1(k) = k mod 5 and

h2(k) = bk5c mod 5. Insert the keys 27, 2, 32 in this order into initially empty hash

tables, and show the result.

(b) Find another key such that its insertion leads to an infinite sequence of key displace-

ments.

2. In order to use cuckoo hashing under an unbounded number of key insertions, we cannot

have a hash table of fixed size. The size of the hash table has to scale with the number

1

of keys inserted. Suppose that we never delete a key that has been inserted. Consider

the following approach with Cuckoo hashing. When the current hash table fills up to its

capacity, a new hash table of doubled size is created. All keys are then rehashed to the

new table. Argue that the average time it takes to resize and rebuild the hash table, if

spread out over all insertions, is constant in expectation. That is, the expected amortised

cost of rebuilding is constant.

3 Bloom Filters

1. Answer the following three questions about Bloom filters:

(a) What operations do we perform on Bloom filters?

(b) What is the difference between hash tables and Bloom filters in terms of which data

we can access?

(c) Why is there is a problem when deleting elements from a Bloom filter?

2. Suppose you have two Bloom filters A and B (each having the same number of cells and

the same hash functions) representing the two sets A and B. Let C = A&B be the Bloom

filter formed by computing the bitwise Boolean and of A and B.

(a) C may not always be the same as the Bloom filter that would be constructed by

adding the elements of the set (A intersect B) one at a time. Explain why not.

(b) Does C correctly represent the set (A intersect B), in the sense that it gives a positive

answer for membership queries of all elements in this set? Explain why or why not.

(c) Suppose that we want to store a set S of n = 20 elements, drawn from a universe

of U = 10000 possible keys, in a Bloom filter of exactly N = 100 cells, and that we

care only about the accuracy of the Bloom filter and not its speed. For this problem

size, what is the best choice of the number of hash functions (the parameter r in the

lecture)? (That is what value of r gives the smallest possible probability that a key

not in S is a false positive?) What is the probability of a false positive for this choice

of r?

4 Perfect Hashing

This question is about perfect hashing:

1. Our perfect hashing scheme assumed the set of keys stored in the table is static. Suppose

instead that we want to add a few new items to our table after the initial construction.

Suggest a way to modify our intitial construction so that we can insert these new items

using no new space and without making significant changes to our existing table (in

particular, we don’t want to change our initial hash function). Your scheme should still

do lookups of all items in O(1) time, but you may use a bit more initial space.

2. Suppose now that we want to delete some of our initial items. Describe a simple way to

support deletions in our perfect hashing scheme.