Blog / Development

Understanding Hash Maps: A Simple Implementation in Go

Muhammed Nurullah Er

...

Introduction

Hash maps, also known as hash tables, are fundamental data structures used in computer science to store key-value pairs. They offer efficient insertion, deletion, and lookup operations, making them essential in various applications. In this blog post, we'll explore what a hash map is, its key characteristics, and delve into a simple implementation in Go.

What is a Hash Map?

A hash map is a data structure that organizes data using a technique called hashing. Hashing involves converting keys into numeric indices through a hash function. These indices are then used to store and retrieve corresponding values efficiently. The primary advantage of hash maps lies in their ability to provide constant-time average-case performance for insertion, deletion, and lookup operations, assuming a good hash function and proper handling of collisions.

Consider that you have to find your name between 10.000 names. You have to search through this large name array sequentially until you find your name. As you can expect this will take a long time. In the worst case, your name will be at the end of the list or not found at all, the time complexity is O(n) in this scenario. But if you have separated the names to buckets alphabetically, you would know that you have to look-up the bucket corresponding to the first letter of your name. Now you can get your name with relatively constant time complexity, because you simply know where to look for your name. Of course you should consider this classification when you insert a new name to this array too.

We have mentioned “good hash function and proper collision handling” above. Let’s explain these terms with our scenario. The hash function provides us a useful index. In our scenario our hash function simply takes the first letter of our name. If your name is “John”, you will look for your name in the bucket “J”. Normally you can say “But there will be other names starting with J too.”. That’s exactly what collision is. If our hash function generates the same index for different values, we have to put multiple values to a single index. For that reason a hashmap is not a simple array of values, it’s an array of linked lists. In our scenario there shouldn’t be a “name” value in each index. There should be a node that contains a name and a pointer to the next name in this index. We will have a better understanding with our custom hash map implementation.

Our Custom Hash Map Implementation in Go:

Let's delve into a simple implementation of a hash map in Go. Our implementation will focus on basic functionality, including insertion, retrieval, and deletion of key-value pairs.

In our scenario we just had names for the sake of simplicity. In an actual hashmap, we would have key-value pairs. And as you can see, each node in a bucket will include a key, a value and a pointer to the next node in the same bucket. With this implementation we would have a linked list in each index of the hash table.

type Node struct {
    Key   string
    Value string
    Next  *Node
}

And here is our hash table that simply includes an array of Node pointers and an integer represents the size of the table.

type HashMap struct {
    buckets []*Node
    size    int
}

Now we need a function to create a new hash map. That’s pretty simple:

func NewHashMap(size int) *HashMap {
    return &HashMap{
        buckets: make([]*Node, size),
        size:    size,
    }
}

Now we need a hash function to calculate the index for each key-value pair that we will store in our hash table. Normally a proper hash algorithm should be used to minimize the collision. But again for the sake of simplicity we'll just get the remainder of the length of our key divided by the size of our hash map as hash.

func hashFunction(key string, size int) uint {
	return uint(len(key) % size)
}

Here with the insert method of our HashMap we will better understand the structure of hashmaps.

// Insert method simply gets key and value as parameters.
func (hm *HashMap) Insert(key string, value string) {

    // Here we calculate the index that we will put our key-value pair (node) using the hashFunction.
    index := hashFunction(key, hm.size)

    // Here we create our node with key and value
    node := &Node{Key: key, Value: value}

    /* If there are no key-value pairs in the generated index, we simply put our new node to
     that index. */
    if hm.buckets[index] == nil {
        hm.buckets[index] = node
    } else {
       
        /* If there is a node in the generated index, that means we have a collision. No panic,
         that's why we have Next in our node. We should continuously set a current node to the
         next node until we have the next node "nil". That means there are no nodes after that
         current one. */
        current := hm.buckets[index]
        for current.Next != nil {
            current = current.Next
        }
        /* Now we can set the next node of the current node to our newly generated node with
        peace in mind. */
        current.Next = node
    }
}

Now, let’s implement a get method:

func (hm *HashMap) Get(key string) string {

    // Calculate the index with hashFunction to know where we should look.
    index := hashFunction(key, hm.size)

    // We get the first node in this index and assign it to a variable.
    current := hm.buckets[index]

    for current != nil {
        /* We can not simply compare our key with the key of the current node. Because it may
        not be the node that we are looking for, but what about the next node of it, the next
        node of the next node of it... Thus, we should continuously set the current node to the
        next node until the key of the current node matches with our key. Then we return the
        value of that node. */
        if current.Key == key {
            return current.Value
        }
        current = current.Next
    }
    
    /* If the code reaches here, it means that there is not a node that matches with our key.
    We simply return an empty string here. ( In the real world we would handle this situation
    differently.) */
    return ""
}

Our last method will be the delete.

func (hm *HashMap) Delete(key string) {
    // Calculate the index of the key using the hash function
    index := hashFunction(key, hm.size)
    // Get the first node at the calculated index
    current := hm.buckets[index]
    // Initialize a pointer to keep track of the previous node
    var prev *Node

    // Traverse the linked list at the calculated index
    for current != nil {
        // If the current node's key matches the key to be deleted
        if current.Key == key {
            // If the previous pointer is nil, it means the node to delete is the first node
            if prev == nil {
                /* Update the head of the linked list to skip the current node. It means we no
                longer have the current node. */
                hm.buckets[index] = current.Next
            } else {
                /* Skip the current node by updating the previous node's next pointer. When we
                set the next node of the previous node to the next node of the current node, we
                simply no longer have the current node. It looks complicated but give it a
                shoot, you will understand. */
                prev.Next = current.Next
            }
            // Exit the method after deletion
            return
        }
        /* If the key of current node not matches with our key, we move to the next node in the
        linked list */
        prev = current
        current = current.Next
    }

Example usage of our hashmap:

func main() {
    // Create a new hashmap with size 10
    myHashMap := NewHashMap(10)

    // Insert key-value pairs
    myHashMap.Insert("john", "doe")
    myHashMap.Insert("foo", "bar")

    // Get and print values
    value := myHashMap.Get("john")
    fmt.Println("Value for key john:", value)

    // Delete a key
    myHashMap.Delete("foo")
    /* If we try to get the value for key "foo" we will get an empty string. (You can return a
    proper error or a flag in your get method) */
}

Let’s Wrap Up the Key Characteristics of Hash Maps:

  • Key-Value Storage: Hash maps store data in key-value pairs, where each key is unique and associated with a specific value.
  • Hash Function: A hash function maps keys to indices in the hash table, facilitating fast access to values.
  • Collision Handling: Collisions occur when two keys produce the same hash value. Hash maps employ collision resolution techniques such as chaining or open addressing to handle collisions gracefully.
  • Dynamic Sizing: Many hash map implementations dynamically resize to maintain a balanced load factor, ensuring optimal performance as the number of elements changes.
  • Efficient Operations: Hash maps offer constant-time average-case performance for common operations like insertion, deletion, and lookup.

Note: Normally a hash map will have a grow/shrink implementation for efficient memory usage. We didn’t include it here for the sake of simplicity.

“Writing is seeing the future.” Paul Valéry