Open Addressing in Hash Tables: Techniques and Implementations

October 25, 2024 8 min read Data Structures Algorithms JavaScript Hash Tables Open Addressing Collision Resolution Linear Probing Quadratic Probing Double Hashing

Explore open addressing in hash tables, focusing on collision resolution techniques like linear probing, quadratic probing, and double hashing, with practical JavaScript implementations.

On this page

5.3.1 Open Addressing

Open addressing is a collision resolution technique used in hash tables where all elements are stored directly within the hash table array itself. Unlike separate chaining, which uses linked lists to handle collisions, open addressing finds another spot within the array to store the colliding element. This approach requires a careful balance between hash function design and probing strategy to maintain efficient operations.

Understanding Open Addressing

In open addressing, when a collision occurs (i.e., two keys hash to the same index), the algorithm searches for the next available slot in the array according to a specific probing sequence. The main probing techniques include linear probing, quadratic probing, and double hashing. Each method has its own advantages and trade-offs, which we will explore in detail.

Linear Probing

Linear probing is the simplest form of open addressing. When a collision occurs, the algorithm checks the next slot in the array (index + 1) until it finds an empty slot. This method is straightforward but can lead to a phenomenon known as primary clustering, where a cluster of occupied slots forms, increasing the likelihood of future collisions.

Linear Probing Algorithm

class LinearProbingHashTable {
  constructor(size = 53) {
    this.keyMap = new Array(size);
  }

  _hash(key) {
    let total = 0;
    let PRIME = 31;
    for (let i = 0; i < Math.min(key.length, 100); i++) {
      let char = key[i];
      let value = char.charCodeAt(0) - 96;
      total = (total * PRIME + value) % this.keyMap.length;
    }
    return total;
  }

  set(key, value) {
    let index = this._hash(key);
    while (this.keyMap[index]) {
      index = (index + 1) % this.keyMap.length;
    }
    this.keyMap[index] = [key, value];
  }

  get(key) {
    let index = this._hash(key);
    while (this.keyMap[index]) {
      if (this.keyMap[index][0] === key) {
        return this.keyMap[index][1];
      }
      index = (index + 1) % this.keyMap.length;
    }
    return undefined;
  }
}

Quadratic Probing

Quadratic probing uses a quadratic function to compute the interval between probes. This method reduces primary clustering but can lead to secondary clustering, where keys that hash to the same initial index follow the same probe sequence.

Quadratic Probing Formula

The quadratic probing formula is:

index = (initialIndex + c1 * i + c2 * i * i) % arraySize;

Where c1 and c2 are constants, and i is the probe attempt number.

Quadratic Probing Implementation

class QuadraticProbingHashTable {
  constructor(size = 53) {
    this.keyMap = new Array(size);
  }

  _hash(key) {
    let total = 0;
    let PRIME = 31;
    for (let i = 0; i < Math.min(key.length, 100); i++) {
      let char = key[i];
      let value = char.charCodeAt(0) - 96;
      total = (total * PRIME + value) % this.keyMap.length;
    }
    return total;
  }

  set(key, value) {
    let index = this._hash(key);
    let i = 1;
    while (this.keyMap[index]) {
      index = (index + i * i) % this.keyMap.length;
      i++;
    }
    this.keyMap[index] = [key, value];
  }

  get(key) {
    let index = this._hash(key);
    let i = 1;
    while (this.keyMap[index]) {
      if (this.keyMap[index][0] === key) {
        return this.keyMap[index][1];
      }
      index = (index + i * i) % this.keyMap.length;
      i++;
    }
    return undefined;
  }
}

Double Hashing

Double hashing uses a second hash function to calculate the step size for probing. This method is more complex but effectively reduces clustering by ensuring a more uniform distribution of keys across the hash table.

Double Hashing Formula

The double hashing formula is:

index = (hash1(key) + i * hash2(key)) % arraySize;

The secondary hash function, hash2, must not evaluate to zero to ensure that the probing sequence covers all slots in the table.

Double Hashing Implementation

class DoubleHashingHashTable {
  constructor(size = 53) {
    this.keyMap = new Array(size);
  }

  _hash1(key) {
    let total = 0;
    let PRIME = 31;
    for (let i = 0; i < Math.min(key.length, 100); i++) {
      let char = key[i];
      let value = char.charCodeAt(0) - 96;
      total = (total * PRIME + value) % this.keyMap.length;
    }
    return total;
  }

  _hash2(key) {
    return (key % (this.keyMap.length - 1)) + 1;
  }

  set(key, value) {
    let index = this._hash1(key);
    let i = 0;
    while (this.keyMap[index]) {
      index = (this._hash1(key) + i * this._hash2(key)) % this.keyMap.length;
      i++;
    }
    this.keyMap[index] = [key, value];
  }

  get(key) {
    let index = this._hash1(key);
    let i = 0;
    while (this.keyMap[index]) {
      if (this.keyMap[index][0] === key) {
        return this.keyMap[index][1];
      }
      index = (this._hash1(key) + i * this._hash2(key)) % this.keyMap.length;
      i++;
    }
    return undefined;
  }
}

Comparing Probing Methods

Each probing method has its strengths and weaknesses:

Linear Probing: Simple to implement but can lead to primary clustering, which degrades performance as the table fills up.
Quadratic Probing: Reduces primary clustering but can still suffer from secondary clustering. It requires careful selection of constants to ensure all slots are probed.
Double Hashing: More complex but provides a more uniform distribution of keys, reducing clustering and improving performance.

Best Practices and Considerations

Load Factor: Maintain a load factor below 0.7 to minimize collisions and ensure efficient probing.
Table Size: Use a prime number for the hash table size to reduce clustering and improve distribution.
Hash Functions: Design hash functions to minimize collisions and ensure a uniform distribution of keys.
Performance Needs: Choose a probing method based on the specific performance requirements and complexity constraints of your application.

Conclusion

Open addressing is a powerful technique for collision resolution in hash tables, offering several methods to balance simplicity and performance. By understanding and implementing linear probing, quadratic probing, and double hashing, you can optimize hash table operations for a wide range of applications.

Quiz Time!

### Which of the following is a characteristic of open addressing? - [x] All elements are stored within the hash table array itself. - [ ] Elements are stored in linked lists outside the hash table. - [ ] It uses a separate chaining method for collision resolution. - [ ] It requires additional memory for storing pointers. > **Explanation:** Open addressing stores all elements within the hash table array itself, unlike separate chaining which uses linked lists. ### What is a primary disadvantage of linear probing? - [x] Primary clustering - [ ] Secondary clustering - [ ] Requires a second hash function - [ ] Increased memory usage > **Explanation:** Linear probing can lead to primary clustering, where a cluster of occupied slots forms, increasing the likelihood of future collisions. ### How does quadratic probing resolve collisions? - [x] By using a quadratic function to compute the interval between probes - [ ] By moving to the next slot in a linear fashion - [ ] By using a second hash function to calculate the step size - [ ] By storing elements in linked lists > **Explanation:** Quadratic probing uses a quadratic function to compute the interval between probes, reducing primary clustering. ### What is the formula for double hashing? - [x] index = (hash1(key) + i * hash2(key)) % arraySize - [ ] index = (initialIndex + c1 * i + c2 * i * i) % arraySize - [ ] index = (hash1(key) + hash2(key)) % arraySize - [ ] index = (initialIndex + i) % arraySize > **Explanation:** Double hashing uses the formula index = (hash1(key) + i * hash2(key)) % arraySize, where a second hash function calculates the step size. ### Which probing method is most effective at reducing clustering? - [x] Double hashing - [ ] Linear probing - [ ] Quadratic probing - [ ] Separate chaining > **Explanation:** Double hashing is effective at reducing clustering by using a second hash function to calculate the step size, ensuring a more uniform distribution. ### What is a key consideration when implementing quadratic probing? - [x] Choosing appropriate constants for the quadratic function - [ ] Ensuring the secondary hash function does not evaluate to zero - [ ] Using linked lists to store elements - [ ] Maintaining a load factor above 0.9 > **Explanation:** Quadratic probing requires careful selection of constants for the quadratic function to ensure all slots are probed. ### What is the recommended load factor for open addressing? - [x] Below 0.7 - [ ] Above 0.9 - [ ] Exactly 1.0 - [ ] Between 0.8 and 0.9 > **Explanation:** Maintaining a load factor below 0.7 minimizes collisions and ensures efficient probing in open addressing. ### Why is a prime number recommended for hash table size? - [x] To reduce clustering and improve distribution - [ ] To increase memory usage - [ ] To simplify hash function design - [ ] To ensure all slots are probed > **Explanation:** Using a prime number for the hash table size reduces clustering and improves the distribution of keys. ### What is the main advantage of double hashing over other probing methods? - [x] It provides a more uniform distribution of keys. - [ ] It is simpler to implement. - [ ] It uses linked lists to store elements. - [ ] It requires less memory. > **Explanation:** Double hashing provides a more uniform distribution of keys, reducing clustering and improving performance. ### True or False: Open addressing requires additional memory for storing pointers. - [ ] True - [x] False > **Explanation:** False. Open addressing stores all elements within the hash table array itself and does not require additional memory for storing pointers.

Monday, October 28, 2024

5.3.2 Separate Chaining

Browse Data Structures and Algorithms in JavaScript