The Heapsort algorithm belongs to the family of efficient sorting algorithms, whose average complexity is the best that we can find O(log(n)).

Review

Name Heapsort
Best case O(log(n))
Average case O(log(n))
Worst case O(log(n))
Difficulty Easy
Family Efficient sorts

 

Heap data structure

The algorithm introduces another algorithm design technique: it uses a data structure called heap, but what is that?  

The heap data structure is an array object that we can see as a binary tree

In a binary tree, each parent node can have at most two children.

There are two important proprieties to mark...

Full binary tree

In a full binary tree, every parent node has either two or no children.

Complete binary tree

In a complete binary tree every level, except possibly the last, is completely filled, and all nodes in the last level are as far left as possible.

Our heap must adopt this property and be a complete binary tree, as shown below.

Complete binary tree

 

I talk about how to implement binary trees in my article on C++ tree implementation.

We can have two kinds of the heap: max-heaps and min-heaps.

We are going to use the max-heap structure for our algorithm.

Max-Heap

The max-heap has the property that for every node i

A[parent(i)]  A[i]

The value of a node is at most the value of its parent.

Thus, the largest element in the max-heap is stored at the root and the subtree contains values no larger than that contained at the node itself.

MaxHeap

In this pic, it is represented the correlation between heap and array. 

The number above the node is the corresponding index of the array.

We can summarize the structure in this way:

  • Parent(i) = (i -1)/2
  • Left (i) = 2i + 1
  • Right(i) = 2i + 2

For example, watching the pic, in the position i = 3 there's the value 8.

Parent(3) --> i = 1 --> 14

Left (3) --> i = 7  --> 2

Right(3) --> i = 8 --> 4

Algorithm

Now that we undestood what an heap is, we can analize the algorithm itself.

There are three steps

  1. Building the max-heap from the input data
  2. Now the largest element is stored in the root, we are going to replace it with the last element of the heap and decrease the size by 1
  3. Repeat the steps while the size of the heap is greater than 1

HeapSortThe algorithm was invented by Williams in the year 1964.

Pseudocode

  • Maintain the max-heap

In order to maintain the max-heap propriety, we are going to write the function Heapify.

In this way, the max element is always of the array is placed in the root of our heap.

Heapify

heapify(array[], n, index)

n is the length of the current heap

largest = index

left = 2*index

right = 2*index+1

if left < n AND array[left] > array[index]

largest = left

if right < n AND array[right] > array[index]

largest = right

if largest != index{

swap array[index] with array[largest]

heapify(array, largest)

}

At each step it is determined the largest element between array[index], array[left(index)] and array[right(index)]  and its index is stored in largest.

Base case

If array[index] is the largest, then the subtree rooted in i is already a max-heap tree and the procedure ends.

Recursive step

Otherwise one of the two subtrees has the largest element and array[index] is swapped with array[largest].

The subtree rooted at largest now might violate the propriety and we need to call the function recursively on that subtree.

  • Build the max - heap from the input array

In case of a complete tree, the index of a non-leaf node is given by  

(size/2) -1

Where size is the length of the array. 

All the node after that index are leaf nodes and thus they don't have subtrees to convert to max-heap.

So we can build a max heap from

for (i = size/2-1; i>=0; i--)

heapify(array, size, i)

For example, let's consider the array A = { 3, 4, 1, 2, 5, 9} and see what's happening

size = 6

i = size/2 -1 = 6/2 - 1 = 2

A[2] = 9

 

Heapify

heapify(A, 6, 2)

Heapify

 

i = 1

heapify(A, 6, 1)

Heapify

 

i = 0

heapify(A, 6, 0)

Heapify

We have finally obtained our max-heap!

Heapify

Once we have built the max heap from the input array, we can now go on with the heapSort.

  • HeapSort

Because of the max-heap propriety, the largest element is stored on the root.

So...

  1. Swap the first element with the last element of the array
  2. Remove the last element by decreasing the size of the array
  3. Heapify the tree by reducing the size by 1, so we can have the largest element in the root again
  4. Repeat the process while the size of the heap is greater than 1

for( i = size-1; i >= 0; i--)

swap array[0] with array[i]

heapify(array, i, 0)

Programming language implementation

C++

Heapify

void heapify(int array[], int n_size, int index)
{
    //Set the largest as the index
    int largest = index;
    //Left element in the array of max-heap
    int left = (2*index) + 1;
    //Right element in the array of max-heap
    int right = (2*index) + 2;
    
    //Check if the left children element is greater than its parent
    if (left < n_size && array[left] > array[largest])
        largest = left;
    
    //Check if the right children element is greater than its parent
    if (right < n_size && array[right] > array[largest])
        largest = right;
    
    //If the largest is not the parent
    if (largest != index)
    {
        //Swap it with the largest one
        std::swap(array[index], array[largest]);
        //Recursively check the subtrees
        heapify(array, n_size, largest);
    }
}

heapSort

//heapSort function
void heapSort(int arr[], int n)
{
    //Let's build the max-heap from the input array
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(arr, n, i);
    
    //Now we are sure that the first element is the max
    n--;                            //While cicle from the last element
    do{
        std::swap(arr[0], arr[n]);  //Let's swap the first element
        heapify(arr, n--, 0);       //Recall the heapify function to rebuild the max-heap
        
    } while(n > 0);
}

Complete program

//
//  HeapSort - Array implementation - C++
//
//  Created by Andrea Cappelletti on 15/04/2018.
//
//  Explanation: https://www.cmprogrammers.com/post.php?id=55

#include 

#define size 5

void heapify(int array[], int n_size, int index)
{
    //Set the largest as the index
    int largest = index;
    //Left element in the array of max-heap
    int left = (2*index) + 1;
    //Right element in the array of max-heap
    int right = (2*index) + 2;
    
    //Check if the left children element is greater than its parent
    if (left < n_size && array[left] > array[largest])
        largest = left;
    
    //Check if the right children element is greater than its parent
    if (right < n_size && array[right] > array[largest])
        largest = right;
    
    //If the largest is not the parent
    if (largest != index)
    {
        //Swap it with the largest one
        std::swap(array[index], array[largest]);
        //Recursively check the subtrees
        heapify(array, n_size, largest);
    }
}

//heapSort function
void heapSort(int arr[], int n)
{
    //Let's build the max-heap from the input array
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(arr, n, i);
    
    //Now we are sure that the first element is the max
    n--;                            //While cicle from the last element
    do{
        std::swap(arr[0], arr[n]);  //Let's swap the first element
        heapify(arr, n--, 0);       //Recall the heapify function to rebuild the max-heap
        
    } while(n > 0);
}

//This function allow us to print the array so we can check if the merge was successfull
void print(int array[]){
    for(int i = 0; i< size; i++){
        std:: cout <<array[i]<<" ";
        
    }
    std::cout<<std::endl;
}

int main() {
    
    //Let's fill our array of size defined in the beginning of the file
    int array[size]={1,4,3,8,0};
    std::cout<<"Array before heapSort"<<std::endl;
    print(array);
    heapSort(array, size);
    std::cout<<"Array after heapSort"<<std::endl;
    print(array);
}

 

The full code is available on my GitHub.

Feel free to contribute!

Properties

This algorithm has a limited use compared to Merge sort and Quick sort.

However, it uses the heap structure, that it's very common!

Complexity

The heapSort is a really efficient algorithm for a large amount of input.

The time complexity of healpify is O(log(n)) and the of heapSort is O(n), thus the complexity is

 Θ(log(n))

in each case (average, worst, best).

The basic operations on heaps run in time at most proportional to the height of the tree and thus take O(log(n)) time.