Hash Tables II: Open Addressing and Linear Probing
In Part I, we explored the basics of hash tables and used chaining to resolve collisions. While chaining is robust, it requires extra memory for pointers and list structures. Open Addressing offers an alternative: storing every element directly within the hash table's array.
This approach requires a probing sequence to find an alternative slot when a collision occurs. The simplest form of this is Linear Probing.
Linear Probing: The Sequence
When a collision occurs at index $i$, linear probing simply checks the next available slot: $$ \text{index} = (i + 1) \pmod{n} $$ If that slot is also occupied, it checks $i+2$, $i+3$, and so on, until an empty slot is found or the table is determined to be full.
The Challenge of Deletion
Deleting an entry in an open-addressed table isn't as simple as setting the slot to
None.
If we simply remove an item, we might "break" the probing path for other items that collided at
that same index but were stored further down the array.
To solve this, we use Tombstones. A tombstone is a special marker (often an
object like DELETED) that tells the lookup function: "There was something here,
keep
looking," while telling the insertion function: "This slot is available for use."
Implementation: Linear Probing and Tombstones
The following Python implementation demonstrates linear probing, the use of a DELETED
tombstone, and automatic resizing.
DELETED = object()
LOAD_TRESHOLD_FACTOR = 0.75
class MikeysBoxys:
# Step 1: Initialize the table
def __init__(self, size):
self.size = size
self.count = 0
self.mikeyboxy = [None for _ in range(size)] # Buckets
def get_array(self, n):
return self.mikeyboxy[n-1]
# Step 2: Simple hash function
def hash_function(self, key):
return sum(ord(c) for c in str(key)) % self.size
# Step 3: Insert or update a key (Linear Probing)
def mapit(self, key, value):
index = self.hash_function(key)
start_index = index
while self.mikeyboxy[index] is not None and self.mikeyboxy[index] != DELETED:
k, v = self.mikeyboxy[index]
if k == key:
self.mikeyboxy[index] = (key, value)
return
index = (index + 1) % self.size
if index == start_index:
break
self.mikeyboxy[index] = (key, value)
self.count += 1
if self.count/self.size > LOAD_TRESHOLD_FACTOR:
self.autosize()
# Find value for respective key
def lookupval(self, key):
index = self.hash_function(key)
start_index = index
while self.mikeyboxy[index] is not None:
if self.mikeyboxy[index] != DELETED:
k, v = self.mikeyboxy[index]
if k == key:
return v
index = (index + 1) % self.size
if index == start_index:
break
return None
# Step 4: Deletion using Tombstones
def deletekey(self, key):
index = self.hash_function(key)
start_index = index
while self.mikeyboxy[index] is not None:
if self.mikeyboxy[index] is not DELETED:
k, v = self.mikeyboxy[index]
if k == key:
self.mikeyboxy[index] = DELETED
self.count -= 1
return True
index = (index + 1) % self.size
if start_index == index:
break
return False
# Step 5: Double table size when threshold is crossed
def autosize(self):
old_table = self.mikeyboxy
self.size *= 2
self.mikeyboxy = [None for _ in range(self.size)]
self.count = 0
for entry in old_table:
if entry is not None and entry is not DELETED:
key, value = entry
self.mapit(key, value)
How it Works
- Insertion (mapit): If a collision occurs, we increment the index wrapping around the array until we find an empty slot or a tombstone.
- Lookup (lookupval): We follow the same probing path. If we hit a tombstone,
we skip it and keep looking. We only stop if we hit an truly empty (
None) slot. - Deletion (deletekey): When we find the key, we don't set it to
None; we set it toDELETED. This preserves the search trail for keys inserted after it.
Conclusion
Open addressing with linear probing is a powerful technique for cache-friendly hash tables. By understanding the role of tombstones, we ensure that our data retrieval remains reliable even after many deletions.
In the next post, we will look into Quadratic Probing and Double Hashing to reduce primary clustering issues.