5.2.1 Creating a Hash Table Class
Hash tables are a fundamental data structure in computer science, widely used for their efficiency in storing and retrieving data. In this section, we will delve into creating a hash table class in JavaScript, exploring its internal mechanisms, and implementing essential operations. By the end of this guide, you will have a solid understanding of how hash tables work and how to implement them effectively in JavaScript.
Understanding Hash Tables
A hash table is a data structure that maps keys to values. It uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found. The efficiency of a hash table comes from its ability to perform lookups, insertions, and deletions in constant average time complexity, O(1).
Why Use a Prime Number for Size?
Choosing a prime number for the size of the hash table helps in reducing the number of collisions. Collisions occur when two keys hash to the same index. A prime number size ensures a more uniform distribution of keys across the hash table, minimizing the likelihood of collisions.
Implementing a Hash Table Class
Let’s start by defining a basic HashTable
class in JavaScript. This class will include an array to store data and a simple hash function to compute indices.
class HashTable {
constructor(size = 53) {
this.keyMap = new Array(size);
}
}
The Hash Function
A hash function is a critical component of a hash table. It takes a key and returns an index in the array. Our hash function will be simple yet effective, using a prime number to reduce collisions.
_hash(key) {
let total = 0;
let WEIRD_PRIME = 31;
for (let i = 0; i < Math.min(key.length, 100); i++) {
let char = key[i];
let value = char.charCodeAt(0) - 96; // Assuming keys are lowercase letters
total = (total * WEIRD_PRIME + value) % this.keyMap.length;
}
return total;
}
This function iterates over the characters of the key, calculates a value based on the character’s ASCII code, and combines it using a prime number to produce a hash.
Setting and Getting Values
Next, we implement methods to set and get values in the hash table. These methods will handle collisions using separate chaining, where each index in the array contains a list of key-value pairs.
set(key, value) {
let index = this._hash(key);
if (!this.keyMap[index]) {
this.keyMap[index] = [];
}
this.keyMap[index].push([key, value]);
}
get(key) {
let index = this._hash(key);
if (this.keyMap[index]) {
for (let pair of this.keyMap[index]) {
if (pair[0] === key) {
return pair[1];
}
}
}
return undefined;
}
- Set Method: Computes the index using the hash function. If no bucket exists at that index, it initializes an empty array. It then pushes the key-value pair into the array.
- Get Method: Computes the index and checks if a bucket exists. If it does, it iterates through the bucket to find the key and returns the associated value.
Handling Collisions with Separate Chaining
Separate chaining is a technique to handle collisions by storing multiple key-value pairs at the same index in a list. This approach allows multiple keys to hash to the same index without overwriting each other.
Retrieving All Keys and Values
To retrieve all keys and values stored in the hash table, we implement the keys
and values
methods. These methods iterate over the buckets and collect unique keys and values.
keys() {
let keysArr = [];
for (let bucket of this.keyMap) {
if (bucket) {
for (let pair of bucket) {
if (!keysArr.includes(pair[0])) {
keysArr.push(pair[0]);
}
}
}
}
return keysArr;
}
values() {
let valuesArr = [];
for (let bucket of this.keyMap) {
if (bucket) {
for (let pair of bucket) {
if (!valuesArr.includes(pair[1])) {
valuesArr.push(pair[1]);
}
}
}
}
return valuesArr;
}
- Keys Method: Iterates over each bucket, checks for pairs, and adds unique keys to the array.
- Values Method: Similar to the keys method, but collects unique values.
Testing the Hash Table
Testing is crucial to ensure the correctness of our hash table implementation. We should test with various types of data, including edge cases like collisions and non-existent keys.
let ht = new HashTable();
ht.set("hello", "world");
ht.set("goodbye", "cruel world");
ht.set("foo", "bar");
ht.set("foo", "baz");
console.log(ht.get("hello")); // Output: world
console.log(ht.get("goodbye")); // Output: cruel world
console.log(ht.get("nonexistent")); // Output: undefined
console.log(ht.keys()); // Output: ['hello', 'goodbye', 'foo']
console.log(ht.values()); // Output: ['world', 'cruel world', 'baz']
Best Practices and Optimization Tips
- Prime Number Size: Always choose a prime number for the size of the hash table to reduce collisions.
- Load Factor: Monitor the load factor (number of entries divided by the number of buckets) and resize the hash table when necessary to maintain efficiency.
- Collision Resolution: Consider other collision resolution techniques like open addressing if separate chaining does not meet your needs.
- Testing: Thoroughly test your hash table with different data types and edge cases to ensure robustness.
Common Pitfalls
- Poor Hash Function: A hash function that does not distribute keys uniformly can lead to clustering and performance degradation.
- Ignoring Collisions: Failing to handle collisions properly can result in data loss or incorrect retrieval.
- Memory Management: Inefficient use of memory can lead to performance issues, especially with large datasets.
Conclusion
Creating a hash table class in JavaScript provides a deep understanding of how hash tables work and their importance in efficient data storage and retrieval. By implementing a hash table from scratch, you gain insight into the intricacies of hash functions, collision resolution, and performance optimization.
Further Reading and Resources
Quiz Time!
### What is the primary purpose of a hash function in a hash table?
- [x] To compute an index for storing a key-value pair
- [ ] To encrypt data
- [ ] To sort data
- [ ] To compress data
> **Explanation:** A hash function computes an index for storing a key-value pair in a hash table, enabling efficient data retrieval.
### Why is a prime number size beneficial for a hash table?
- [x] It reduces collisions
- [ ] It increases memory usage
- [ ] It simplifies the hash function
- [ ] It speeds up computation
> **Explanation:** Using a prime number size reduces collisions by ensuring a more uniform distribution of keys across the hash table.
### What is separate chaining in the context of hash tables?
- [x] A collision resolution technique using linked lists
- [ ] A method to encrypt data
- [ ] A way to sort keys
- [ ] A technique to compress data
> **Explanation:** Separate chaining is a collision resolution technique where each index in the hash table contains a linked list of key-value pairs.
### How does the `set` method handle collisions in the provided hash table implementation?
- [x] By storing key-value pairs in a list at each index
- [ ] By overwriting existing values
- [ ] By ignoring new entries
- [ ] By resizing the hash table
> **Explanation:** The `set` method handles collisions by storing key-value pairs in a list at each index, allowing multiple pairs to exist at the same index.
### What does the `get` method return if a key is not found in the hash table?
- [x] undefined
- [ ] null
- [ ] 0
- [ ] An empty string
> **Explanation:** If a key is not found, the `get` method returns `undefined`, indicating the key does not exist in the hash table.
### What is the load factor in a hash table?
- [x] The ratio of the number of entries to the number of buckets
- [ ] The total number of keys
- [ ] The size of the hash table
- [ ] The number of collisions
> **Explanation:** The load factor is the ratio of the number of entries to the number of buckets, used to determine when to resize the hash table.
### Which of the following is a common pitfall when implementing a hash table?
- [x] Poor hash function design
- [ ] Using too many buckets
- [ ] Storing too few keys
- [ ] Using a prime number size
> **Explanation:** A poor hash function design can lead to clustering and performance issues, making it a common pitfall in hash table implementation.
### What is the time complexity of retrieving a value from a hash table on average?
- [x] O(1)
- [ ] O(n)
- [ ] O(log n)
- [ ] O(n^2)
> **Explanation:** On average, retrieving a value from a hash table has a time complexity of O(1) due to efficient indexing.
### What is the purpose of the `keys` method in the hash table class?
- [x] To retrieve all unique keys stored in the hash table
- [ ] To delete keys from the hash table
- [ ] To sort keys
- [ ] To encrypt keys
> **Explanation:** The `keys` method retrieves all unique keys stored in the hash table, allowing users to access the stored keys.
### True or False: The `values` method in the hash table class can return duplicate values.
- [x] True
- [ ] False
> **Explanation:** The `values` method can return duplicate values if different keys map to the same value, although the implementation provided avoids duplicates.