CPS222 Lecture: Binary Trees
                                                        Last revised 1/18/2013

Objectives:

1. To define "binary tree"
2. To introduce traversals on binary trees
3. To introduce the use of binary trees to represent arithmetic expressions

Materials:

1. Excerpts from recursive code for binary tree traversals (to project)
2. Non-recursive inorder traversal code to project
3. Level order traversal code to project
4. Guessing game program - executable and code handout

I. Introduction
-  ------------

   A. In our discussion of general trees and forests, we noted that we can
      represent any general tree/forest by an equivalent binary tree; and
      we saw that operations on a tree can be mapped to equivalent operations
      on the binary tree equivalent.

   B. However, binary trees are of great importance in their own right; and
      it is in this sense that we consider them now.

   C. Definition: A binary tree is a set of nodes which is either empty, or it
      consists of a root and two disjoint subsets, designated the left subtree
      and the right subtree, each of which is a binary tree.

      1. Example:               A
                              /   \
                            B      D
                          /   \  /    \
                        C               E
                      /  \            /   \

        - A is the root.  Its left subtree consists of B and C; its right
          subtree consists of D and E.
        - B's left subtree is C, and its right subtree is empty
        - both of C's subtrees are empty
        - D's left subtree is empty, and its right subtree is E
        - both of E's subtrees are empty.

        - In drawing the above, I deliberately included pointers to empty
          subtrees. In practice, these are usually omitted, and the tree is
          drawn as:

                                A
                              /   \
                            B      D
                          /           \
                        C               E

      2. Note: (By our earlier definition of tree) a binary tree is not 
         necessarily a tree!  A binary tree can be empty - a tree cannot!
         
         a. Nonetheless, we use similar terminology for talking about binary
            trees - e.g. "parent", "child" etc.
            
         b. For clarity, I will sometimes use the term "general tree" to
            distinguish from "binary tree"

      3. Every binary tree has exactly two subtrees - though one or both may
         be empty.

      4. Not only are the subtrees of a binary tree ordered, but even if a
         binary tree has only one non-empty subtree, we still designate it
         as either "left" or "right".  Thus, the following two binary trees
         are distinct:

                A                               A
              /                                  \
            B                                      B
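      A minimal Python sketch of the definition above, assuming a
      hypothetical Node class in which None stands for the empty binary
      tree.  Because size() follows the recursive definition, the empty tree
      is its only special case; the equality test that @dataclass generates
      compares value, left and right, so it confirms that the two one-child
      trees just drawn are distinct.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A binary tree node; None represents an empty (sub)tree."""
    value: str
    left: Optional["Node"] = None   # left subtree (None = empty)
    right: Optional["Node"] = None  # right subtree (None = empty)


def size(tree: Optional[Node]) -> int:
    """Count nodes by following the recursive definition."""
    if tree is None:                # the empty binary tree has no nodes
        return 0
    return 1 + size(tree.left) + size(tree.right)


# The example tree from 1. above: A has subtrees B..C and D..E.
example = Node("A",
               Node("B", Node("C"), None),
               Node("D", None, Node("E")))

# The two trees from 4. above: B as a left child vs. B as a right child.
left_only = Node("A", Node("B"), None)
right_only = Node("A", None, Node("B"))

print(size(example))             # 5
print(size(None))                # 0 (an empty binary tree is allowed)
print(left_only == right_only)   # False: same labels, different trees
```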
            
   D. When discussing binary trees, it is useful to define a number of special
      kinds of binary tree.  Unfortunately, the terminology is not used
      consistently from one author to another, so one must be careful to
      determine what a given writer means!

      1. A proper or strictly binary tree is a binary tree in which every node 
         has either two non-empty subtrees or no non-empty subtrees.  (Some
         writers also include in the definition the requirement that the tree
         itself be non-empty.)

        Ex:     A               A
               / \            /   \
             B     C         B     C
                                 /   \
                                D     E
                              /   \
                             F     G

        but not:                A
                              /   \
                             B

      2. A perfect binary tree (called "full" or "complete" by some writers,
         though our book uses the term "complete" differently, as we shall 
         see) is a strictly binary tree in which all the leaves lie on the
         same level.  Equivalently, defined recursively: a binary tree of
         height 1 (where we use the intuitive definition of height) is
         perfect, and a binary tree of height h (h > 1) is perfect iff its
         two subtrees are perfect binary trees of height h-1.

        Ex:     A                       A
              /   \                   /   \
             B     C                 B     C
                                    / \   / \ 
                                   D   E F   G

         Observe: for a given height, a perfect binary tree contains the
         maximum possible number of nodes.

      3. A complete binary tree (called "almost-complete" by some writers) is
         a binary tree having the following properties:

         a. If the height of the tree is h, then all leaves lie at level h or
            at level h - 1.

         b. If any node has a descendant at level h in its right subtree, then
            all of the leaves in its left subtree are at level h.

        Ex:     A                       A
              /   \                   /   \
             B                       B     C
                                    / \   /
                                   D   E F

         Observe: a perfect binary tree can be converted to a complete, but not 
         perfect, tree of the same height by removing one or more (but not all)
         nodes on the lowest level, starting from the right and working toward
         the left.
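      The first two definitions translate directly into recursive tests.  A
      Python sketch, assuming the same kind of hypothetical Node class (None
      = empty tree).  Treating the empty tree as strictly binary but not
      perfect is an assumption, since writers differ on the empty case;
      perfect_height() follows the recursive form of the definition of
      "perfect".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_strict(t: Optional[Node]) -> bool:
    """Every node has either two non-empty subtrees or none."""
    if t is None:
        return True    # assumption: the empty tree counts (some writers disagree)
    if (t.left is None) != (t.right is None):
        return False   # exactly one non-empty subtree
    return is_strict(t.left) and is_strict(t.right)


def perfect_height(t: Optional[Node]) -> Optional[int]:
    """Height in nodes if t is perfect, else None.  A single node is perfect
    (height 1); a tree of height h > 1 is perfect iff both subtrees are
    perfect of height h - 1."""
    if t is None:
        return None    # assumption: the empty tree is not called perfect
    if t.left is None and t.right is None:
        return 1
    lh, rh = perfect_height(t.left), perfect_height(t.right)
    return lh + 1 if lh is not None and lh == rh else None


def is_perfect(t: Optional[Node]) -> bool:
    return perfect_height(t) is not None


# Examples drawn above
small = Node("A", Node("B"), Node("C"))
deeper = Node("A", Node("B"),
              Node("C", Node("D", Node("F"), Node("G")), Node("E")))
lopsided = Node("A", Node("B"), None)
seven = Node("A", Node("B", Node("D"), Node("E")),
                  Node("C", Node("F"), Node("G")))

print(is_strict(small), is_strict(deeper), is_strict(lopsided))  # True True False
print(is_perfect(small), is_perfect(deeper), is_perfect(seven))  # True False True
```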

   E. There are two important theorems about the number of nodes in various 
      kinds of binary trees:

      1. Thm: In a perfect binary tree of height h, the number of nodes is

                2^h - 1  (if height is measured intuitively in NODES)

         Pf: Assigned as a homework problem

      2. Thm: In a complete binary tree of height h (measured intuitively), 
         there are at least 2^(h-1) nodes and at most 2^h - 1 nodes.

         Pf:  If we delete all the nodes at level h, we end up with a
              perfect tree of height h-1.  By the previous theorem, this tree
              contains 2^(h-1) - 1 nodes.  Since our original tree was of
              height h, we must have deleted at least one node; therefore, our
              original tree had at least 2^(h-1) nodes.

              For the upper bound: if our complete tree of height h is also
              perfect, then it contains 2^h - 1 nodes; otherwise, we can add
              nodes at level h to produce a perfect tree of height h, ending
              up with 2^h - 1 nodes; therefore, our original complete tree had
              at most 2^h - 1 nodes to begin with.

        Observe: for any non-zero number of nodes, it is always possible
        to construct a complete binary tree. Also, for a given number of nodes, 
        a complete binary tree has the minimum height (though if the tree is 
        not perfect there are other arrangements of the bottom level that yield 
        equal performance).  Since many algorithms have time complexity 
        proportional to the height, this means that complete binary trees are 
        optimal.

        Observe: the preceding two theorems give us an important measure
        of the size of a binary tree.  They tell us that for a complete
        (and hence optimal) binary tree of n nodes and height h:

                2^(h-1) <= n <= 2^h - 1

        or:     h-1 <= ceiling(log_2 n) <= h

        That is - for a complete binary tree, any algorithm whose time is 
        proportional to the height of the tree is O(log n).
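      A quick numerical spot check of the two theorems (a check, not a
      proof), written as a Python sketch with hypothetical function names.
      perfect_node_count() counts nodes using the recursive definition of a
      perfect tree.  For integers, 2^(h-1) <= n <= 2^h - 1 is the same as
      2^(h-1) <= n < 2^h, which says that n has exactly h binary digits, so
      complete_height() can use Python's int.bit_length().

```python
import math


def perfect_node_count(h: int) -> int:
    """Nodes in a perfect binary tree of height h (h >= 1, in nodes):
    a root plus two perfect subtrees of height h - 1."""
    if h == 1:
        return 1
    return 1 + 2 * perfect_node_count(h - 1)


def complete_height(n: int) -> int:
    """Height (in nodes) of a complete binary tree with n >= 1 nodes:
    the h with 2^(h-1) <= n < 2^h, i.e. the number of bits in n."""
    return n.bit_length()


# Theorem 1: a perfect tree of height h has 2^h - 1 nodes.
for h in range(1, 11):
    assert perfect_node_count(h) == 2 ** h - 1

# Theorem 2 and the log bound, for every n up to 999.
for n in range(1, 1000):
    h = complete_height(n)
    assert 2 ** (h - 1) <= n <= 2 ** h - 1
    assert h - 1 <= math.ceil(math.log2(n)) <= h

print(perfect_node_count(10))  # 1023
print(complete_height(1000))   # 10, since 512 <= 1000 <= 1023
```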

II. Operations on Binary Trees
--  ---------- -- ------ -----

   A. One of the most useful operations on a binary tree is traversal.  Three
      orders of traversal are of special interest:

      1. Preorder:      visit the root
                        traverse the left subtree in preorder
                        traverse the right subtree in preorder

      2. Inorder:       traverse the left subtree in inorder
                        visit the root
                        traverse the right subtree in inorder

      3. Postorder:     traverse the left subtree in postorder
                        traverse the right subtree in postorder
                        visit the root

      4. Note that preorder and postorder were also defined for general trees.
         Inorder pertains only to binary trees, though we did make use of it
         when representing a general tree as a binary tree.

   B. The traversal algorithms are most commonly expressed recursively,
      since they are defined recursively.

      PROJECT Code for binary tree operations

      1. Node class

      2. preorder traversal

         What would you have to do to change this to inorder or postorder?

         ASK

         - change names
         - change relative order of visit and recursion
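      The projected code is not reproduced here; as a stand-in, here is a
      minimal Python sketch of the three recursive traversals (the Node
      class and function names are assumptions, not the course's code).
      The three functions differ only in where the visit falls relative to
      the two recursive calls, which is exactly the change discussed above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


Visit = Callable[[str], None]


def preorder(t: Optional[Node], visit: Visit) -> None:
    if t is not None:
        visit(t.value)             # visit the root
        preorder(t.left, visit)    # traverse the left subtree in preorder
        preorder(t.right, visit)   # traverse the right subtree in preorder


def inorder(t: Optional[Node], visit: Visit) -> None:
    if t is not None:
        inorder(t.left, visit)     # traverse the left subtree in inorder
        visit(t.value)             # visit the root
        inorder(t.right, visit)    # traverse the right subtree in inorder


def postorder(t: Optional[Node], visit: Visit) -> None:
    if t is not None:
        postorder(t.left, visit)   # traverse the left subtree in postorder
        postorder(t.right, visit)  # traverse the right subtree in postorder
        visit(t.value)             # visit the root


# The example tree from section I: A with subtrees B..C and D..E
example = Node("A", Node("B", Node("C")), Node("D", None, Node("E")))

for traversal in (preorder, inorder, postorder):
    seen = []
    traversal(example, seen.append)
    print(traversal.__name__, "".join(seen))
# preorder ABCDE
# inorder CBADE
# postorder CBEDA
```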

   C. These operations can also be expressed in non-recursive form.  

      1. Ex: inorder - PROJECT code for non-recursive inorder

      2. Non-recursive preorder traversal is very similar.

      3. Non-recursive postorder traversal is somewhat more complex.
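      One common way to remove the recursion is to keep an explicit stack of
      nodes whose visit (or right subtree) is still pending.  A Python sketch
      under that assumption, not the projected code: preorder differs from
      inorder only in visiting a node when it is pushed rather than when it
      is popped, while postorder must also remember the last node visited so
      it can tell whether it is returning to a node from its left or from
      its right subtree.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def inorder_iterative(t: Optional[Node]) -> List[str]:
    result, stack = [], []
    node = t
    while node is not None or stack:
        while node is not None:        # go as far left as possible,
            stack.append(node)         # remembering the nodes passed
            node = node.left
        node = stack.pop()             # leftmost node not yet visited
        result.append(node.value)      # visit it
        node = node.right              # then traverse its right subtree
    return result


def preorder_iterative(t: Optional[Node]) -> List[str]:
    result, stack = [], []
    node = t
    while node is not None or stack:
        while node is not None:
            result.append(node.value)  # visit on the way down
            stack.append(node)
            node = node.left
        node = stack.pop().right       # left side done; go right
    return result


def postorder_iterative(t: Optional[Node]) -> List[str]:
    result, stack = [], []
    node, last_visited = t, None
    while node is not None or stack:
        if node is not None:
            stack.append(node)         # descend left, deferring the visit
            node = node.left
        else:
            top = stack[-1]
            if top.right is not None and top.right is not last_visited:
                node = top.right       # right subtree not yet traversed
            else:
                result.append(top.value)   # both subtrees done: visit
                last_visited = stack.pop()
    return result


example = Node("A", Node("B", Node("C")), Node("D", None, Node("E")))
print(inorder_iterative(example))    # ['C', 'B', 'A', 'D', 'E']
print(preorder_iterative(example))   # ['A', 'B', 'C', 'D', 'E']
print(postorder_iterative(example))  # ['C', 'B', 'E', 'D', 'A']
```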

   D. Another traversal that is sometimes of interest is level-order.  For
      this, we use a non-recursive algorithm with a queue:

      1. PROJECT code for level-order

      2. To see that this algorithm works correctly, note the following:

         a. Nodes are visited in the order in which they are inserted into q.

         b. All the nodes at level L are inserted in the queue - in level order
            - before any nodes at level L+1 are inserted in the queue.  This
            can be shown inductively:

            i. Basis: the node at level 0 is inserted in the queue before the
               nodes at level 1 are inserted.

           ii. Hypothesis: assume that for all levels L <= some k (k >= 0) it
               is true that all the nodes at level L are inserted in the queue
               in level order before any node at level L+1 is inserted.  We wish
               to show that it is also true that all the nodes at level k + 1
               are inserted in the queue in level order before any node at
               level k+2 is inserted.

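               Proof: by the hypothesis, the nodes at level k enter the queue
               in level order, before any node at level k+1; since q is FIFO,
               they are also visited in that order, before any node at level
               k+1.  As each node at level k is visited, its children (if any)
               at level k+1 are inserted in the queue, left child first.
               Every node at level k+1 is a child of some node at level k, so
               when all the nodes at level k have been visited, all the nodes
               at level k+1 have been inserted in level order; but no node at
               level k+1 has yet been visited, and a node at level k+2 is
               inserted only when its parent at level k+1 is visited;
               therefore, no node at level k+2 has yet been inserted. QED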
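      The queue-based algorithm whose correctness was just argued can be
      sketched in Python as follows (the names are assumptions; the projected
      code may differ).  collections.deque serves as the FIFO queue q, and
      each node's children are enqueued left child first, as the proof
      requires.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def level_order(t: Optional[Node]) -> List[str]:
    result: List[str] = []
    if t is None:
        return result
    q = deque([t])
    while q:
        node = q.popleft()             # nodes leave q in the order they entered
        result.append(node.value)      # visit
        if node.left is not None:      # enqueue children, left first
            q.append(node.left)
        if node.right is not None:
            q.append(node.right)
    return result


example = Node("A", Node("B", Node("C")), Node("D", None, Node("E")))
seven = Node("A", Node("B", Node("D"), Node("E")),
                  Node("C", Node("F"), Node("G")))
print(level_order(example))  # ['A', 'B', 'D', 'C', 'E']
print(level_order(seven))    # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
```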

III. Uses of binary trees
---  ---- -- ------ -----

   A. We have already seen that any general forest or tree can be represented
      by an equivalent binary tree, and that when a linked structure is used,
      this binary tree is more space-efficient because it has fewer wasted null
      pointers.

   B. A very important use of binary trees is in representing arithmetic or
      logical expressions.  Such a tree is called an expression tree.  

      For example, the expression:

        A * (B + C) / D - E     can be represented by the tree:

                        -
                      /   \
                     /     E
                  /    \
                 *      D
               /   \
             A      +
                  /   \
                B       C

      1. Observe that in an expression tree, the internal nodes are operators
         and the external nodes are operands.  The subtrees are subexpressions.

      2. Observe further that:

         a. Traversing the tree in inorder yields the inorder form of the
            expression - though parentheses may be needed to show operator
            precedence.
                            Ex: the above:  A*B+C/D-E

         b. Traversing the tree in preorder yields the prefix form of the
            expression. 
                            Ex: the above:  -/*A+BCDE    

         c. Traversing the tree in postorder yields the postfix form of the
            expression.
                            Ex: the above:  ABC+*D/E-

      3. The tree form of an expression can obviously be used for conversion
         from one form to another - e.g. create the tree from infix, then
         traverse it in postorder to yield postfix.  But it has many other
         uses, as well:

         a. In an optimizing compiler, the tree form of an expression can be
            used for optimization.  For example, if the same subtree occurs
            more than once, it can be evaluated once and stored; then the
            resulting value can be plugged in wherever the common subexpression
            occurred.

                ex:     *
                      /   \
                     +     +
                    / \   / \
                   A   B A   B

         b. In an interpreter, an expression can be stored in tree form.
            Whenever the expression is to be evaluated, the tree can be
            traversed in postorder, with the current values of the various
            operands substituted for the leaf nodes.
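      A Python sketch of the expression tree for A * (B + C) / D - E and of
      the traversals and evaluation described above, assuming a hypothetical
      Node class and a dictionary env of current operand values.
      inorder_plain() shows the unparenthesized inorder form from 2.a;
      infix() parenthesizes every subexpression, which is the simplest way
      to avoid the precedence problem; evaluate() is the postorder
      evaluation of 3.b.

```python
import operator
from dataclasses import dataclass
from typing import Dict, Optional

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


@dataclass
class Node:
    value: str                        # an operator or an operand name
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def prefix(t: Optional[Node]) -> str:         # preorder
    return "" if t is None else t.value + prefix(t.left) + prefix(t.right)


def postfix(t: Optional[Node]) -> str:        # postorder
    return "" if t is None else postfix(t.left) + postfix(t.right) + t.value


def inorder_plain(t: Optional[Node]) -> str:  # inorder, no parentheses
    if t is None:
        return ""
    return inorder_plain(t.left) + t.value + inorder_plain(t.right)


def infix(t: Node) -> str:
    """Inorder, with every subexpression parenthesized."""
    if t.left is None and t.right is None:
        return t.value
    return "(" + infix(t.left) + t.value + infix(t.right) + ")"


def evaluate(t: Node, env: Dict[str, float]) -> float:
    """Postorder: evaluate both operands, then apply the operator."""
    if t.left is None and t.right is None:
        return env[t.value]           # leaf: current value of the operand
    a = evaluate(t.left, env)
    b = evaluate(t.right, env)
    return OPS[t.value](a, b)


# A * (B + C) / D - E
tree = Node("-",
            Node("/",
                 Node("*", Node("A"), Node("+", Node("B"), Node("C"))),
                 Node("D")),
            Node("E"))

print(inorder_plain(tree))  # A*B+C/D-E
print(infix(tree))          # (((A*(B+C))/D)-E)
print(prefix(tree))         # -/*A+BCDE
print(postfix(tree))        # ABC+*D/E-
print(evaluate(tree, {"A": 2, "B": 3, "C": 4, "D": 7, "E": 1}))  # 1.0
```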

   C. Decision trees

      1. Some classification or diagnosis problems can be solved by
         a protocol based on yes-no questions.  Such a problem can be
         modeled by a binary tree in which each non-leaf node represents a
         question, with the left subtree being the process to be followed if
         the answer is "no" and the right if the answer is "yes".  Leaf
         nodes represent conclusions.

      2. Example - a simple guessing game program.

         a. DEMO

         b. HANDOUT code
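      The handout's program is not reproduced here; this Python sketch shows
      only the core idea, assuming a hypothetical tree whose internal nodes
      hold questions and whose leaves hold conclusions, with the "no" branch
      on the left and the "yes" branch on the right.  It also assumes every
      question node has both subtrees (a strictly binary tree), so the walk
      always ends at a leaf.  Answers come from a function, so the example
      runs without user input.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    text: str                      # a question (internal node) or conclusion (leaf)
    no: Optional["Node"] = None    # left subtree: followed on "no"
    yes: Optional["Node"] = None   # right subtree: followed on "yes"


def classify(tree: Node, answer: Callable[[str], bool]) -> str:
    node = tree
    while node.no is not None or node.yes is not None:   # internal node: ask
        node = node.yes if answer(node.text) else node.no
    return node.text                                       # leaf: conclusion


# Hypothetical example tree
animals = Node("Does it live in water?",
               no=Node("Does it fly?", no=Node("dog"), yes=Node("eagle")),
               yes=Node("fish"))

facts = {"Does it live in water?": False, "Does it fly?": True}
print(classify(animals, facts.__getitem__))  # eagle
```

      For an interactive version, one could pass something like
      lambda q: input(q + " ").strip().lower().startswith("y") as answer.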

   D. We will see later that a very nice sorting algorithm - called heapsort -
      is based on a special kind of binary tree.

   E. A particular kind of binary tree that is often useful is a BINARY
      SEARCH TREE (BST).  (In fact, sometimes people mistakenly think that all
      binary trees are BSTs - not so!)  We will look at this shortly.