CPS222 Lecture: Heaps; Priority Queues              Last revised 1/25/2015

Objectives

1. To show how a complete binary tree can be mapped straightforwardly to an array.
2. To define a heap, and show how a heap can be maintained.
3. To show how a heap can be used to implement a priority queue.

I. Heaps
-  -----

   A. In today's lecture, we're going to cover the same ground as the
      assigned section in the book, but in a different order.

      1. We will begin by talking about a special kind of binary tree
         known as a heap.  We will then show a special use of heaps - to
         implement a data structure known as a priority queue.

      2. Your text starts by discussing priority queues, and then introduces
         heaps as a way of implementing them.
         
   B. Recall that in talking about binary trees we defined the notion of a
      complete binary tree. 

      1. A complete binary tree (called "almost-complete" by some writers) is
         a binary tree having the following properties:

         a. If the height of the tree is h, then all leaves lie at level h or
            at level h - 1.

         b. If any node has a descendant at level h in its right subtree, then
            all of the leaves in its left subtree are at level h.

        Ex:     A                       A
              /                       /   \
             B                       B     C
                                    / \   /
                                   D   E F

         Recall: a perfect binary tree can be converted to a complete, but not perfect,
         binary tree of the same height by removing nodes on the lowest level, starting
         from the right and working toward the left.  If all the nodes on the lowest level
         are removed this way, one ends up with another perfect tree of height one less.

      2. We also showed that, in a complete binary tree of height h, there are 
         at least 2^(h-1) nodes and at most 2^h - 1 nodes.
         
   C. There is a correspondence between an array and a COMPLETE binary tree.

      1. Consider what happens if we number the nodes in a complete binary 
         tree, using level order - e.g:

                          1
                       /     \
                      2       3
                     / \     / \
                    4   5   6   7
                   / \
                  8   9

      2. Observe that the following relationship holds between the number
         of a node and the number of its children:

         if m is the number of a node, then 2m is the number of its left
         child (unless 2m exceeds the number of nodes in the tree, in which
         case it has no left child.)  Likewise, 2m+1 is the number of its
         right child, unless 2m+1 exceeds the number of nodes.

      3. Likewise, if m is the number of a node, then m / 2 (using integer
         division, i.e. discarding any remainder) is the number of its
         parent - unless m / 2 = 0 (m = 1) - in which case the node is the
         root of the tree and has no parent.

      4. A complete binary tree, then, can be represented by an array without
         using any pointers.  Furthermore, in such a representation it is easily
         possible to go from a node to its children and also from a child back
         to its parent.  (When implementing such an array in C/C++/Java, it is
         convenient to not use slot 0 in the array, storing the nodes in slots
         1 .. size of tree, which means the total space allocated for the tree
         is one more slot than actually used.  There are ways to use slot 0 as a
         header slot for certain operations, or it can be used to store
         information about the total number of nodes in the tree.)

      5. Example: the tree

                APPLE
               /     \
            BANANA   CHERRY
            /
          DOGWOOD

         can be represented by the array:

                [1]     [2]     [3]     [4]
                APPLE   BANANA  CHERRY  DOGWOOD

         and the array 

                [1] [2] [3] [4] [5] [6] [7] [8]
                 A   C   F   G   I   M   Q   Z

         represents the tree

                 A
                / \
              C     F
             / \   / \
            G   I M   Q
           /
          Z
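
      The numbering relationships above can be sketched in C++ as follows.
      This is only an illustration: the function names are made up for this
      example, slot 0 is assumed to be unused, and a return value of 0 is
      used to mean "no such node".

```cpp
#include <cassert>

// Nodes are numbered 1 .. n in level order; slot 0 is not used.

// Slot of the left child of slot m, or 0 if m has no left child.
int leftChild(int m, int n) {
    assert(m >= 1 && m <= n);
    return (2 * m <= n) ? 2 * m : 0;
}

// Slot of the right child of slot m, or 0 if m has no right child.
int rightChild(int m, int n) {
    assert(m >= 1 && m <= n);
    return (2 * m + 1 <= n) ? 2 * m + 1 : 0;
}

// Slot of the parent of slot m, or 0 if m is the root.
// Integer division discards the remainder, so slots 2m and 2m+1
// both map back to m.
int parent(int m) {
    assert(m >= 1);
    return m / 2;
}
```

      For the eight-node array A C F G I M Q Z shown above, the children of
      slot 2 (C) are slots 4 and 5 (G and I), and the parent of slot 8 (Z)
      is slot 4 (G).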

   D. One special kind of complete binary tree is known as a
      HEAP.  A heap is a binary tree with the following properties:

      1. The STRUCTURE PROPERTY: it is complete

      2. The HEAP ORDER PROPERTY: The key at each node is <= the key at each
         of its children (if it has any.)

      3. Examples

         a. Both of the trees in the preceding example (the APPLE tree and the
            A .. Z tree) are heaps

         b. Example: the following is not a heap 
        
                         CAT
                       /     \
                    EEL       AARDVARK
                   /    \    /    \
              ZEBRA RACCOON FOX   SNAKE

            Why?

            ASK

            The heap order property is violated by AARDVARK, because it is
            not true that CAT <= AARDVARK

         c. Example: the following is not a heap

                         CAT
                       /     \
                    EEL       FOX
                   /    \         \
              ZEBRA RACCOON       SNAKE

            Why?

            ASK

            The heap structure property is violated by SNAKE: FOX has a right
            child but no left child, so the tree is not complete.

      4. Note: nothing is said about the relative order of the keys of the
         children - only the relationship between the parent and the
         child.  Thus, both of the following are heaps:

                CAT                             CAT
               /   \            and            /   \
            DOG     FOX                     FOX     DOG

      5. Note that this definition defines what is sometimes called a
         "minheap" because the key at the root is the minimum of all the
         keys in the tree.  It is also possible to define a "maxheap" by
         changing the <= requirement in the heap order property to >=.
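
      A minimal sketch of a check for the heap order property, assuming the
      array representation described earlier (slot 0 unused, items in slots
      1 .. n).  The function name is invented for this example.  Note that
      the array representation by itself guarantees the structure property,
      so only the order property needs to be tested.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns true if the keys in slots 1 .. n of a satisfy the minheap
// order property: every key is >= the key in its parent's slot.
// a[0] is an unused placeholder.
template <typename Key>
bool satisfiesHeapOrder(const std::vector<Key>& a) {
    assert(!a.empty());                       // slot 0 must be present
    for (std::size_t m = 2; m < a.size(); ++m) {
        if (a[m] < a[m / 2]) {
            return false;                     // child smaller than parent
        }
    }
    return true;
}
```

      Applied to the examples above, the APPLE tree passes the check, while
      the CAT / EEL / AARDVARK tree fails it at AARDVARK.  Both orderings of
      CAT, DOG, FOX pass, since nothing is required of siblings.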

II. Maintaining a heap
--  ------------------

   A. We now consider the basic strategy for maintaining a heap.  We
      need to support two basic operations:

      1. Construction: inserting new items into the heap either incrementally,
         or en masse (creating a heap from scratch from a mass of data.)

         a. Incrementally, each insertion can be done in O(log n) time.

         b. En masse, n items can be made into a heap in O(n) total time,
            i.e. amortized O(1) time per item.

      2. Removing the item with smallest value from the heap.  (Finding
         it is easy - it is always the top of the heap - what's a bit
         more complicated is replacing it with the next smallest value.)

         This can be done in O(log n) time.

      3. We do NOT consider an operation for removing a SPECIFIC item
         from the heap.  As it turns out, such an operation is not needed
         for the uses of heaps we discuss in this lecture, and it would take
         O(n) time just to FIND the specific item, since a heap is not
         intended as a search structure.

      4. Three preliminary remarks:

         a. We represent the heap by a data structure consisting of a count
            of the number of items currently in the heap (n) and an array
            of actual items (in slots [1] .. [n]).  We assume that the array
            has additional space available for adding new items - so to
            add an item we can make slot [n+1] part of the heap by
            incrementing n, and then adjust the information in the heap
            appropriately.

         b. Because a heap is a complete binary tree with at least 2^(h-1)
            nodes, we know that its height h is <= (log n) + 1, where the log
            is base 2.  Hence, any operation that performs at most a constant
            amount of work at each level in the tree takes time O(log n).
            
         c. The algorithms I'm presenting differ in some details from the
            ones in the book, but are essentially the same.
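
      As a small illustration of why the height grows only logarithmically,
      the sketch below computes the number of levels in a complete binary
      tree of n nodes by repeated halving.  It assumes the convention used
      earlier, in which a tree of height h has between 2^(h-1) and 2^h - 1
      nodes (so a one-node tree has height 1).  The function name is invented
      for this example.

```cpp
#include <cassert>

// Number of levels in a complete binary tree with n >= 1 nodes,
// counting the root as level 1.  Each halving of n corresponds to
// moving from a node to its parent, so the loop runs once per level.
int completeTreeHeight(int n) {
    assert(n >= 1);
    int h = 0;
    while (n > 0) {
        n /= 2;
        ++h;
    }
    return h;
}
```

      For instance, a heap of 8 items has 4 levels and a heap of 1000 items
      has 10, so an operation that does a constant amount of work per level
      touches at most 10 slots in a 1000-item heap.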

   B. Constructing a Heap
   
      1. The strategy for incremental construction is this: to add a new
         node to a heap:

         a. Declare slot n+1 to be part of the heap.  Call this the vacant
            slot.

         b. Perform the following operation repeatedly:
         
            i. Consider the parent of the vacant slot (slot (vacant slot / 2),
               using integer division).  If the parent does not exist (vacant
               slot is 1) or the current contents of the parent slot <= the
               new item, quit this loop.
               
           ii. Otherwise, move the contents of the parent slot into the vacant
               slot and declare the parent slot to be the vacant slot
               
         c. When the loop is done, insert the new entry in the vacant slot.

         d. Example: Add 3 to the following heap

                           1
                      4         2
                   7     5   10     9
                 8

            - Initially, vacant slot is right child of 7.

                           1
                      4         2
                   7     5   10     9
                 8   _

            - Since 7 > 3, move 7 into the vacant slot and declare its slot 
              the vacant slot.

                           1
                      4         2
                   _     5   10     9
                 8   7

            - 4 is the parent of the new vacant slot.  Since 4 > 3, move
              4 into the vacant slot and declare its slot vacant.
         
                           1
                      _         2
                   4     5   10     9
                 8   7

            - 1 is the parent of the new vacant slot.  Since 1 <= 3, stop.

            - Put 3 into the vacant slot

                           1
                      3         2
                   4     5   10     9
                 8   7

         e. Clearly, this process is O(h) = O(log n), since the vacant slot
            moves up one level on each pass through the loop.
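
         The incremental insertion steps above can be sketched in C++ as
         follows.  This is a minimal illustration for int keys, not the
         book's code: the heap is assumed to be a std::vector whose slot 0
         is an unused placeholder, and the function name is invented here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Adds newItem to the minheap stored in heap[1 .. n]; heap[0] is unused.
void insertItem(std::vector<int>& heap, int newItem) {
    assert(!heap.empty());                    // slot 0 must be present

    // Step a: slot n+1 becomes part of the heap; it is the vacant slot.
    heap.push_back(newItem);
    std::size_t vacant = heap.size() - 1;

    // Step b: while a parent exists and holds a larger item, move the
    // parent's item down and let the vacancy move up.
    while (vacant > 1 && heap[vacant / 2] > newItem) {
        heap[vacant] = heap[vacant / 2];
        vacant /= 2;
    }

    // Step c: the new item goes into the final vacant slot.
    heap[vacant] = newItem;
}
```

         Starting from the array 1 4 2 7 5 10 9 8 in slots 1 .. 8 and
         inserting 3 yields 1 3 2 4 5 10 9 8 7, matching the worked example.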

      2. If we have all the entries available to us at the outset, we can
         build the heap more efficiently as follows:

         a. Initially just put the entries into the array representation
            in any order.  The result, viewed as a binary tree, will satisfy
            the heap structure property, but not the heap order property.
            
         b. Convert this to a structure satisfying the heap order property -
            the algorithm for this is given in section 8.3.6 of the book
            (where it is called bottom-up heap construction.)

         c. The book gives an analysis that shows that the cost of building the 
            entire heap this way is O(n), which makes the amortized cost per 
            entry O(1).
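
         A minimal sketch of one common array-based form of bottom-up
         construction, for int keys.  It may differ in presentation from the
         version in the book.  The function names are invented here, and
         slot 0 of the vector is assumed to be an unused placeholder.  The
         idea is to work backward from the last node that has a child,
         moving each item down until heap order holds in its subtree.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Moves the item in slot start down within a[1 .. n] until it is
// <= both of its children (or has no children).
void siftDown(std::vector<int>& a, std::size_t start, std::size_t n) {
    int displaced = a[start];
    std::size_t vacant = start;
    while (2 * vacant <= n) {
        std::size_t child = 2 * vacant;               // left child
        if (child + 1 <= n && a[child + 1] < a[child]) {
            ++child;                                  // right child is smaller
        }
        if (displaced <= a[child]) {
            break;
        }
        a[vacant] = a[child];
        vacant = child;
    }
    a[vacant] = displaced;
}

// Rearranges a[1 .. n] (in any initial order) into a minheap.
void buildHeap(std::vector<int>& a) {
    assert(!a.empty());                               // slot 0 must be present
    std::size_t n = a.size() - 1;
    // Slots n/2 + 1 .. n are leaves, so start at the last node with a child.
    for (std::size_t m = n / 2; m >= 1; --m) {
        siftDown(a, m, n);
    }
}
```

         Most nodes are near the bottom of the tree and can move down only a
         level or two, which is the intuition behind the O(n) total cost.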

   C. Removing the minimum item from a heap (removeMin)

      1. The algorithm is similar to that for incremental construction.
         Since the minimum item is to be removed from the heap, we
         consider its slot (the root) to be vacant.  Likewise, since
         the size of the heap is being reduced from n to n-1, we must find
         a new home for the item currently in slot n (the displaced item).

         a. Perform the following process repeatedly:
         
            i. Consider the child or children of the vacant slot (slots
               2 * (vacant slot) and 2 * (vacant slot) + 1).

               - If neither is part of the heap (2 * (vacant slot) > new heap
                 size), quit this loop.

               - If there are two children, consider the child item with
                 the smaller value; if there is only one child, consider it.
                 We call the slot where this item occurs the child slot.

               - If the displaced item is <= this child item, quit this loop.
        
           ii. Otherwise, move the child item into the vacant slot, and
               consider the child slot to be the new vacant slot.
               
         b. When the loop is done, put the displaced item in the vacant slot. 
            
      2. Example: Remove the smallest item from the following heap:

                           1
                      3         2
                   4     5   10     9
                 8  7
                
            - Initially, the displaced item is 7.  The vacant slot is the
              one that contained 1

                           _            Displaced item = 7
                      3          2
                   4     5   10     9
                 8                      (Note that the slot that contained
                                         7 is no longer considered part of
                                         the heap)

            - Since 2 is the smaller child of the vacant slot, and 7 > 2,
              move 2 into the vacant slot and make its slot the new vacant
              slot.
                           2            Displaced item = 7
                      3          _
                   4     5   10     9
                 8
                        
            - Since 9 is the smaller child of the vacant slot, and 7 <= 9,
              stop.  Put the displaced item - 7 - into the vacant slot

                           2
                      3          7
                   4     5   10     9
                 8
                        
      3. Clearly, this process is O(h) = O(log n) - Why?

            ASK
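
      The removeMin steps above can be sketched in C++ as follows.  As with
      the other sketches, this is an illustration for int keys rather than
      the book's code: slot 0 of the vector is assumed to be an unused
      placeholder, and the heap is assumed to be nonempty.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Removes and returns the smallest item of the minheap in heap[1 .. n].
int removeMin(std::vector<int>& heap) {
    assert(heap.size() >= 2);                 // at least one item in the heap

    int smallest = heap[1];                   // always at the root
    int displaced = heap.back();              // item currently in slot n
    heap.pop_back();                          // heap is now slots 1 .. n-1
    std::size_t n = heap.size() - 1;          // new heap size
    if (n == 0) {
        return smallest;                      // the heap held only one item
    }

    std::size_t vacant = 1;                   // the root slot is vacant
    while (2 * vacant <= n) {                 // vacant slot has a child
        std::size_t child = 2 * vacant;
        if (child + 1 <= n && heap[child + 1] < heap[child]) {
            ++child;                          // pick the smaller child
        }
        if (displaced <= heap[child]) {
            break;
        }
        heap[vacant] = heap[child];           // move child item up
        vacant = child;
    }
    heap[vacant] = displaced;
    return smallest;
}
```

      Starting from 1 3 2 4 5 10 9 8 7 in slots 1 .. 9, the call returns 1
      and leaves 2 3 7 4 5 10 9 8, matching the worked example.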

III. Uses for Heaps
---  ---- --- -----

   A. One use discussed in the book is to represent a priority queue.
      (We assume here that smaller numbers mean higher priority - e.g.
      "priority 1" beats "priority 2").

      1. A priority queue is often used in conjunction with some kind of
         server that provides services on a priority basis - e.g.

         a. A priority CPU scheduler in an operating system assigns the CPU to 
            the process with the smallest priority value.
         
         b. The scheduler associated with a print queue might print the 
            shortest job (in terms of number of pages) first.

      2. The principal operation a priority queue needs to support is to
         find the entry with the smallest priority value and remove it from 
         the queue.  (The book calls this removeMin).

      3. Note that, with a heap based on priority values, the smallest
         value is always found "at the top of the heap".  Because we can
         remove this entry and replace it with the one having the next
         smallest priority value easily (as we saw in section II) we can
         use a heap as a priority queue.
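
      In C++, the standard library's std::priority_queue can serve as a
      ready-made priority queue.  The sketch below is one possible way to
      use it for the scheduler example.  The Job type and the removeMin
      wrapper are invented for this illustration.  By default
      std::priority_queue keeps the LARGEST element on top, so std::greater
      is supplied to make smaller priority values come out first.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// A job is a (priority value, name) pair; pairs compare by priority
// value first, so the smallest priority value is served first.
using Job = std::pair<int, std::string>;

// std::greater reverses the default ordering, giving minheap behavior.
using JobQueue =
    std::priority_queue<Job, std::vector<Job>, std::greater<Job>>;

// Finds the entry with the smallest priority value and removes it.
Job removeMin(JobQueue& jobs) {
    assert(!jobs.empty());
    Job next = jobs.top();
    jobs.pop();
    return next;
}
```

      If jobs with priority values 2, 1 and 3 are pushed in that order,
      successive calls to removeMin return them in the order 1, 2, 3.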

   B. Another use of heaps is in event-driven simulations of some system.

      1. Example: simulate the operation of a bank.  Events are

         a. New customer arrives and gets in line
         
         b. Finish processing a customer transaction

      2. The heart of such a simulation is the "event list" which maintains
         a list of simulated events in the order in which they occur.

      3. The principal operation the event list needs to support is the
         ability to find the next event that is scheduled to occur and
         remove it from the event list.

         Again, a heap based on the scheduled time for events works well 
         for this.
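
      A minimal sketch of an event list for the bank example, built on
      std::priority_queue.  The Event type, the comparison object, and the
      function name are all invented for this illustration, and a real
      simulation would carry more information in each event.

```cpp
#include <cassert>
#include <queue>
#include <vector>

enum class EventKind { Arrival, Finish };

struct Event {
    double time;        // simulated time at which the event is scheduled
    EventKind kind;     // customer arrives, or a transaction finishes
};

// Comparison that puts the EARLIEST scheduled event on top of the
// priority queue (std::priority_queue keeps the "largest" on top).
struct LaterThan {
    bool operator()(const Event& a, const Event& b) const {
        return a.time > b.time;
    }
};

using EventList = std::priority_queue<Event, std::vector<Event>, LaterThan>;

// Finds the next event scheduled to occur and removes it from the list.
Event nextEvent(EventList& events) {
    assert(!events.empty());
    Event e = events.top();
    events.pop();
    return e;
}
```

      Events can be pushed in any order as the simulation schedules them.
      nextEvent always hands back the one with the smallest scheduled time.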