Heaps in Golang. Why are they better than sorted arrays?
Let’s go straight to the point. Heaps are not the most commonly known data structure out there. They have very specific properties with narrow use cases. But once I learned about them, I see them more and more useful.
When you need to get min/max from a collection of items, the first thing that comes to mind is to get it sorted and then take the first/last element. But you don’t need to keep everything sorted, you just need to keep the min/max at the top. This is where heaps come in handy. Especially if the collection is dynamic.
I am not going to explain how this datastructure works in detail, it is done well in a lot of places. Just know that heaps are a binary tree with the following properties, comparing to a sorted array:
Operation | Sorted Array | Heap |
---|---|---|
Insert | O(n) | O(log n) |
Delete | O(n) | O(log n) |
Get Min/Max | O(1) | O(1) |
Extract Min/Max (1) | O(n) | O(log n) |
Sort/Heapify (2) | O(n log n) | O(log n) |
(1) Extracting min/max is not the same as getting it. It removes the element from the collection. (2) Heapify is a process of converting an unsorted array into a heap.
container/heap#
Golang’s stdlib has a container/heap
package that does the heavy algorithmic lifting. The interface is simple.
type Interface interface {
sort.Interface
Push(x any) // add x as element Len()
Pop() any // remove and return element Len() - 1.
}
Don’t let the sort.Interface confuse you. It won’t sort anything - it just needs the same functions: Len()
, Less(i, j int) bool
, and Swap(i, j int)
. You can play around the the basic implementation in the official documentation.
For testing I am going use the IntHeap
from the example in the documentation.
type IntHeap []int
func (h IntHeap) Len() int { return len(h) }
func (h IntHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h IntHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *IntHeap) Push(x any) {
// Push and Pop use pointer receivers because they modify the slice's length,
// not just its contents.
*h = append(*h, x.(int))
}
func (h *IntHeap) Pop() any {
old := *h
n := len(old)
x := old[n-1]
*h = old[0 : n-1]
return x
}
sorted array based heap#
Just for performance comparison, let’s implement a heap using a sorted array. I am going to keep the same Pop
and Push
interface as the container/heap
package, but the implementation will be different. Instead of maintaining a binary tree, we will keep an array sorted at all times.
type IntHeapSortedArray []int
func (h IntHeapSortedArray) Len() int { return len(h) }
func (h *IntHeapSortedArray) Init() {
sort.Ints(*h) // Ensure the array is sorted
}
func (h *IntHeapSortedArray) Push(x any) {
*h = append(*h, x.(int))
sort.Ints(*h)
}
func (h *IntHeapSortedArray) Pop() any {
if len(*h) == 0 {
panic("heap is empty")
}
x := (*h)[0]
*h = (*h)[1:]
return x
}
benchmarking comparison#
Consider the following benchmark. We have 2 heap implementations: one using container/heap
and another using a sorted array. We will push and pop items from both heaps and compare their performance. We are going to use different item counts to see how they scale.
func BenchmarkHeaps(b *testing.B) {
h1 := &IntHeap{}
h3 := &IntHeapSortedArray{}
heaps := []struct {
name string
h SlimHeap
}{
{"container/heap", h1},
{"sorted array", h3},
}
for _, heap := range heaps {
for itemCount := 10; itemCount < 1001; itemCount *= 10 {
b.Run(fmt.Sprintf("%s/itemCount=%d", heap.name, itemCount), func(b *testing.B) {
for i := 0; i < b.N; i++ {
for j := 0; j < itemCount; j++ {
heap.h.Push(j * rand.Intn(1000))
}
for j := 0; j < itemCount; j++ {
heap.h.Pop()
}
assert.Equal(b, 0, heap.h.Len())
}
})
}
}
}
While the itemCount size grows exponentially (10, 100, 1000), the benchmark execution time grows linearly for the container/heap
implementation (498, 3087, 27326), while the sorted array implementation grows exponentially (635.8, 13085, 559916).
BenchmarkHeaps/container/heap/itemCount=10-8 2493511 498.0 ns/op 132 B/op 16 allocs/op
BenchmarkHeaps/container/heap/itemCount=100-8 344109 3087 ns/op 1562 B/op 195 allocs/op
BenchmarkHeaps/container/heap/itemCount=1000-8 43798 27326 ns/op 15944 B/op 1993 allocs/op
BenchmarkHeaps/sorted_array/itemCount=10-8 1881627 635.8 ns/op 292 B/op 18 allocs/op
BenchmarkHeaps/sorted_array/itemCount=100-8 92127 13085 ns/op 3098 B/op 196 allocs/op
BenchmarkHeaps/sorted_array/itemCount=1000-8 2151 559916 ns/op 33864 B/op 1996 allocs/op
One day I am going to learn how to better visualize benchmark results, but for now this needs to do