CPS222 Lecture: Algorithm Design Strategies             Last modified 4/29/2015

Materials

1. Projectable of brute force solution to max sum problem
2. Huffman algorithm Powerpoint
3. Dynamic Programming Fibonacci program example
4. Projectable of partially filled in LCS table
5. Projectable of figure 12.2 p. 563 in Goodrich, Tamassia, Mount
6. Projectable of same figure, but showing derivation of GTTTAA
7. Projectable of example use of optimal BST algorithm on a tree of 4 keys
8. obst program to demo and project

I. Introduction
-  ------------

   A. At this point in the course, we are going to shift our focus somewhat.
      
      1. Up until now, our focus has been on learning "standard" algorithms
         and their associated data structures.  

      2. When confronted with a problem to solve, you should always ask "can this problem 
         be viewed as an instance of a problem for which there exists a known algorithm?"
         If the answer is "yes", then you don't have to "reinvent the wheel".

         Example: We saw earlier that problems as diverse as scheduling tasks with 
         prerequisites, analyzing electrical circuits, and designing robust communication 
         or transportation networks can be solved by known graph algorithms.

      3. Sometimes, though, one is confronted with a problem which does not correspond 
         to any previously-solved problem.  In this case, it may be necessary to develop 
         an algorithm to solve the problem from scratch.

      4. Or, the problem may be a familiar problem for which no good algorithm
         is known - e.g. it may be an instance of an NP-complete problem.  In
         this case, we may need to develop an algorithm that produces an
         acceptable, though perhaps not optimal, solution.

         Example: If we have a problem to solve that is equivalent to the
         traveling salesman problem, we will not be able to find a practical
         algorithm that gives us a guaranteed optimal solution; but we may be
         able to develop an algorithm that gives us a solution that is close
         enough to optimal for the cases we are interested in.

   B. We now consider a number of strategies that can be used to tackle a
      problem which does not already have a known algorithmic solution.

      1. These are not solutions to a problem, but strategies to explore
         when trying to find a solution.

      2. Many of the "standard" algorithms that we have learned were first
         discovered by someone who applied one of these very strategies to the
         problem!

   C. For each strategy, we will consider one or more examples of algorithms
      that utilize that strategy.  We will see that algorithms we have already learned 
      exemplify the strategy we are considering, and we will also consider some new 
         algorithms.  In all cases, though, the goal here is to understand the design 
      strategy behind the algorithm, not just the algorithm itself.
  
II. Brute Force
--  ----- -----

   A. Given the sheer speed of a computer, it is tempting to try to solve a
      problem by brute force - e.g. trying all the possibilities.
      
      1. We saw an example of this when we first looked at algorithm analysis.
         Remember the maximum sum problem?  Our first attempt at a solution was
         the brute force solution.

        int naiveMaxSum(int a[], int n)
        /* Naive solution to the maximum subvector sum problem */
        {
            int maxSum = 0;

            for (int i = 0; i < n; i ++)
                for (int j = 0; j < n; j ++)
                {
                    int thisSum = 0;

                    for (int k = i; k <= j; k ++)
                        thisSum += a[k];

                    if (thisSum > maxSum)               
                        maxSum = thisSum;
                }

            return maxSum;
        }
             PROJECT
             
      2. Complexity of this solution?
      
         ASK
         
         theta(N^3)
         
      3. As you recall, we developed a series of better solutions, culminating
         in a theta(N) solution (sketched below) - which we argued is inherently
         the lower limit, because any solution must look at each element of the
         vector at least once.
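
         For reference, here is one way the linear-time solution can be coded - a
         sketch in the same style as the brute force version above (the running-sum
         idea is often attributed to Kadane; as in naiveMaxSum, an array with no
         positive elements yields 0, the sum of the empty subvector):

        int linearMaxSum(int a[], int n)
        /* Theta(n) solution to the maximum subvector sum problem */
        {
            int maxSum = 0;
            int thisSum = 0;

            for (int i = 0; i < n; i ++)
            {
                thisSum += a[i];

                if (thisSum < 0)        // a prefix with a negative sum can never
                    thisSum = 0;        // help a later subvector - drop it

                if (thisSum > maxSum)
                    maxSum = thisSum;
            }

            return maxSum;
        }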
         
   B. Of course, often we will be able to find a better solution to the problem
      than sheer brute force - as was the case with the max sum problem.
      
   C. However, this won't always be the case.  There will be some problems for
      which brute force is the only option.
      
      Examples?
      
      ASK
      
      1. Searching an unordered list, or one that is ordered on some basis
         other than the order of the search key.
          
         The only option is the brute force one of looking at every item.
         
      2. Problems like the traveling salesman - if we must have the absolute
         best solution.

III. Greedy Algorithms
---  ------ ----------

   A. Many problems have the general form "for a given set of data, what is the 
      best way to ____?".   

      1. Examples we have considered thus far:

         ASK

         a. Shortest path problem in a graph

         b. Minimal cost spanning tree of a graph

      2. For such a problem, what we are seeking is a GLOBAL OPTIMUM -
         i.e. the best overall way to solve the problem.  E.g. for a minimum
         cost spanning tree, we want to find the spanning tree having the lowest
         overall cost, though it may include some "expensive" individual edges.

      3. One way to solve such a problem would be exhaustive search - create
         all the possible solutions, and then choose the cheapest one.  
         Unfortunately, such an approach generally has exponential cost.

   B. The greedy strategy goes like this: build up an overall solution one step
      at a time, by making a series of LOCALLY OPTIMAL choices.

      1. Example: Dijkstra's shortest path algorithm builds up the list of
         shortest paths one node at a time by, at each step, choosing the
         not yet known node that has the shortest known path to the starting
         vertex.

      2. Example: Kruskal's minimum cost spanning tree algorithm builds up
         the tree one edge at a time by, at each step, adding to the tree
         the lowest cost edge which does not introduce a cycle.

   C. A good - and historically important - example of a greedy algorithm is 
      the Huffman algorithm.  We will now look at it, both as an algorithm that 
      is interesting in its own right, and as an example of the greedy strategy.

      1. One area of considerable interest in many applications is DATA 
         COMPRESSION - reducing the number of bits required to store a given
         body of data.  We consider one approach here, based on weight-balanced
         binary trees, and utilizing a greedy algorithm that produces an
         optimal solution.

      2. Suppose you were given the task of storing messages comprised of the
         7 letters A-G plus space (just to keep things simple.)  In the absence 
         of any information about their relative frequency of use, the best you 
         could do would be to use a three bit code - e.g.
   
         000        = space
         001 .. 111 = A .. G
   
      3. However, suppose you were given the following frequency of usage data.
         Out of every 100 characters, it is expected that:
   
           10 are A's           Note: these data are contrived!
           10 are B's
            5 are C's
            5 are D's
           30 are E's
            5 are F's
            5 are G's
           30 are spaces
   
         a. Using the three bit code we just considered, a typical message of
            length 100 would use 300 bits.
    
         b. Suppose, however, we used the following variable-length code instead:
  
            A     = 000         NOTE: No shorter code can be a prefix of
            B     = 001               any longer code.  Thus, we cannot
            C     = 0100              use codes like 00 or 01 - if we saw
            D     = 0101              these bits, we wouldn't know if they
            E     = 10                were a character in their own right or
            F     = 0110              part of the code for A/B or C/D.
            G     = 0111
            space = 11
   
            A message of length 100 with typical distribution would now need:
   
            (10 * 3) + (10 * 3) + (5 * 4) + (5 * 4) + (30 * 2) + (5 * 4) + 
             (5 * 4) + (30 * 2) = 260 bits = a savings of about 13%

      4. A variable length code can be represented by a decode tree, with
         external nodes representing characters and internal nodes representing
         a decision point at a single bit of the message - e.g.

                                    ( first bit)
                                / 0               \ 1
                         (2nd bit)              (2nd bit)
                        / 0     \ 1             / 0     \ 1
                  (3rd bit)   (3rd bit)       [E]     [space]
                  / 0  \ 1   / 0       \ 1
                [A]   [B]   (4th bit)  (4th bit)
                           / 0    \ 1   / 0   \ 1
                         [C]     [D]   [F]    [G]

          The optimum such tree is the one having the smallest weighted external path 
          length - i.e. the sum, over all the leaves, of each leaf's level times its
          weight.
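
          Decoding with such a tree is straightforward: start at the root, follow
          the 0 or 1 child for each bit of the message, and emit a character each
          time a leaf is reached.  A minimal C++ sketch (DecodeNode and decode are
          illustrative names, not from the text):

          #include <string>

          struct DecodeNode {
              char symbol;            // meaningful only at a leaf
              DecodeNode* left;       // followed on a 0 bit
              DecodeNode* right;      // followed on a 1 bit
          };

          std::string decode(const DecodeNode* root, const std::string& bits)
          {
              std::string message;
              const DecodeNode* cur = root;
              for (char b : bits)
              {
                  cur = (b == '0') ? cur->left : cur->right;   // one decision per bit
                  if (cur->left == nullptr && cur->right == nullptr)
                  {
                      message.push_back(cur->symbol);          // reached a character
                      cur = root;                              // start the next one
                  }
              }
              return message;
          }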
   
      5. An algorithm for computing such a weight-balanced code tree is the 
         Huffman algorithm, discussed in the book.

         a. Basic method: we work with a list of partial trees.

            i. Initially, the list contains one partial tree for each character.

           ii. At each iteration, we choose the two partial trees of least weight and
               construct a new tree consisting of an internal node plus these two as 
               its children.  We put this new tree back on the list, with weight equal 
               to the sum of its children's weights.

          iii. Since each step reduces the length of the list by 1 (two
               partial trees removed and one put back on), after n-1
               iterations we have a list consisting of a single node, which
               is our decode tree.

         b. Example: For the above data.  

            Initial list:        A    B    C    D    E    F    G   space
                                .10  .10  .05  .05  .30  .05  .05  .30
                                / \  / \  / \  / \  / \  / \  / \  / \

            Step 1 - remove C, D - and add new node:

                                 ()   A    B    E    F    G   space
                                .10  .10  .10  .30  .05  .05  .30
                                / \  / \  / \  / \  / \  / \  / \
                                C D

            Step 2 - remove F, G - and add new node:

                                 ()    ()   A    B    E   space
                                .10   .10  .10  .10  .30  .30
                                / \   / \  / \  / \  / \  / \
                                F G   C D

            Step 3 - remove A, B - and add new node:

                                 ()    ()   ()   E   space
                                .20   .10  .10  .30  .30
                                / \   / \  / \  / \  / \
                                A B   F G  C D

            Step 4 - remove two partial trees - and add new node:

                                 ()        ()    E   space
                                .20        .20   .30  .30
                                / \        / \   / \  / \
                              ()   ()      A B  
                             / \   / \
                             C D   F G

            Step 5 - remove two partial trees - and add new node:

                                 ()       E   space
                                .40      .30   .30
                                / \      / \   / \
                              ()   ()
                             / \   / \
                             A B ()   ()
                                 / \  / \
                                 C D  F G

            Step 6 - remove E, space - and add new node:

                                ()               ()
                                .60             .40
                                / \             / \ 
                                E  space      ()   ()
                                             / \   / \
                                             A B ()   ()
                                                 / \  / \
                                                 C D  F G
            Step 7 - construct final tree:

                                    ()
                                   1.00
                                  /    \
                                 ()     ()
                                / \     / \
                              ()   ()   E space
                             / \   / \
                             A B ()   ()
                                 / \  / \
                                 C D  F G

         c. Analysis:

            i. Constructing the initial list is theta(n).

           ii. Transforming to a tree involves n-1 (= theta(n)) iterations.  On each 
               iteration, we scan the entire list to find the two partial trees of 
               least weight = theta(n) - so this process, using the simplest mechanism 
               for storing the list of partial trees is theta(n^2).  

          iii. Printing the tree is theta(n).

           iv. Overall is therefore theta(n^2).  However, we could reduce time to
               theta(n log n) by using a more sophisticated data structure for
               the "list" of partial trees - e.g. a heap based on weight.
               (But given the small size of a typical alphabet, the theta(n^2)
               algorithm may actually be faster.)
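
             For concreteness, here is a minimal C++ sketch of the heap-based variant
             mentioned in (iv), assuming the characters and their weights are supplied
             in parallel vectors.  (Node and buildHuffman are illustrative names, and
             the sketch does not bother to free the nodes it allocates.)

             #include <queue>
             #include <vector>

             struct Node {
                 double weight;
                 char symbol;                 // meaningful only for leaf nodes
                 Node* left;
                 Node* right;
                 Node(double w, char s)
                     : weight(w), symbol(s), left(nullptr), right(nullptr) {}
                 Node(double w, Node* l, Node* r)
                     : weight(w), symbol(0), left(l), right(r) {}
             };

             struct HeavierThan {             // comparator giving a min-heap on weight
                 bool operator()(const Node* a, const Node* b) const
                     { return a->weight > b->weight; }
             };

             Node* buildHuffman(const std::vector<char>& symbols,
                                const std::vector<double>& weights)
             {
                 std::priority_queue<Node*, std::vector<Node*>, HeavierThan> pq;
                 for (size_t i = 0; i < symbols.size(); i ++)
                     pq.push(new Node(weights[i], symbols[i])); // one partial tree per character
                 while (pq.size() > 1)                          // n-1 greedy combining steps
                 {
                     Node* a = pq.top(); pq.pop();              // the two partial trees
                     Node* b = pq.top(); pq.pop();              //   of least weight
                     pq.push(new Node(a->weight + b->weight, a, b));
                 }
                 return pq.top();                               // the finished decode tree
             }

             Each heap insertion or removal is theta(log n), so the n-1 combining steps
             take theta(n log n) overall, as claimed in (iv).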

      6. We have applied this technique to individual characters in an alphabet.
         It could also be profitably applied to larger units - e.g. we might
         choose to have a single code for frequently occurring words (such as
         "the") or sequences of letters within words (such as "th" or "ing").

      7. The Huffman algorithm exemplifies the greedy algorithm strategy,
         because at each step we choose the two lowest weight subtrees to
         combine into a new subtree, thus increasing the code length for
         each of the characters in the subtrees by one.  We keep our cost
         down by increasing the code length of the lowest frequency subtrees.

   D. A significant limitation of the greedy strategy is that, for some problems, a 
      greedy algorithm fails to deliver a globally optimal solution.

      1. For the examples we have looked at thus far (shortest path,
         minimum cost spanning tree, shortest job first scheduling, and
         the Huffman algorithm), the greedy algorithm actually produces a
         result that can be shown to be globally optimal - i.e. it finds the
         best possible solution.

      2. For other problems, however, finding the globally optimal solution
         may require a step that is not locally optimal.  A simple example of
         this is finding one's way through a maze.

         a. A greedy algorithm for finding one's way through a maze is
            as follows: never go back to a square you've already visited
            unless you have no other choice; where two or more non-backtracking
            moves are possible, choose the one that moves you closer to
            the goal.

         b. An example where this greedy algorithm finds the best path:
            (S = start, G = goal)

                +-----------------------+
                |                       |
                |   +---+-----------+   |
                |   |///|           |   |
                |   +---+   |   |   |   |
                |     S     | G |   |   |
                |-----------+---+       |
                |///////////////|       |
                +-----------+---+-------+
                
         c. An example where this greedy algorithm fails to find the best
            path, because a move away from the goal (not locally optimal)
            is needed to find the best (globally optimal) path.

                +-----------------------+
                |                       |
                |   +---+---+-------+   |
                |           |       |   |
                |-------+   |       |   |
                |     S     | G |   |   |
                |   --------+   |   |   |
                |               |       |
                +---------------+-------+

    E. As it turns out, it is frequently the case that a problem for which a
       greedy algorithm fails to find the best solution may be one for which 
       finding the best solution inherently requires exponential effort.  In 
       such cases, a greedy algorithm may still be a useful approach to 
       finding a solution that is generally close enough - given that an 
       algorithm for finding the optimal solution may not be practical (e.g. it
       may be NP-complete) or an algorithmic solution may not exist at all.

       1. A good example of such a problem is the bin packing problem.

         a. The problem originates in the way the post office handles   
            packages: 

            i. The post office uses large cloth bins which are filled with
               packages and then loaded on a plane or a truck.  (Perhaps
                you've seen one at a PO.)

           ii. The problem is this: given a supply of bins of some fixed 
               capacity, and packages of varying sizes, find a way to put the 
               packages in the bins in such a way as to use the fewest 
               possible bins.
               
           iii. To keep the discussion manageable, we will simplify the problem in
                two ways:
               
               - We will assume that the size of each package can be
                 represented by a single number (i.e. we will not consider
                 issues of shape - only overall volume).
                 
               - We will normalize the sizes to the capacity of the bin, so
                 that a bin will be considered to have a capacity of 1, and
                  the size of each package will be represented as some fraction
                 of the bin capacity (e.g. 0.3).  We will assume that the bin
                 can hold any number of packages for which (sum of size) <= 1.
                 
           iv. Although we couch the problem in terms of packing bins with
                packages, similar problems arise in other areas - e.g. allocating
                memory using operator new (which satisfies requests by carving
                off smaller pieces from large blocks allocated by the operating
                system), or allocating space for files on disk, when holes are
                created by the deletion of other files.
                 
         b. The problem actually comes in two versions: the online version and
            the offline version.
            
            i. In the online version, a decision about where to place each
               package must be made before the next package is seen.  This
               would correspond to a situation like the following:
               
                          Wall with small window in it
                                        
                                        |
                                O       |
                                |       
                              --+--     _
                               / \      |
                                        |   Customers hand packages
                         Clerk and bins |   to clerk one at a time
                         
               The clerk must place each package in a bin as it is handed
               through the window, before getting to see the next package.
               
           ii. In the offline version, it is possible to look at the entire list of 
               packages before making a decision about where to place each one.
              
         c. It is easy to show that there cannot be an algorithm that always
            finds the optimal packing for the online version of the problem.
            
            Suppose such an algorithm exists, and is asked to pack a total of
            four packages, using the minimum possible number of bins.
            Suppose the first two packages have sizes 0.45 and 0.45.  Into
            which bin should the algorithm place the second package?
            
            It turns out that the answer depends on the size of the next two
            packages, which the online version is not allowed to know until a
            decision has been made about the second package.
               
            i. If the next two packages are size 0.55 and 0.55, then the
               optimal choice would be to place the second package in an
                empty bin.  This would yield a final packing using just two bins:
               
               Bin 1: First package (0.45) + Third package (0.55)
               Bin 2: Second package (0.45) + Fourth package (0.55)
               
               However if the second package is placed in the same bin as
               the first, the final packing would require three bins:
               
               Bin 1: First package (0.45) + Second package (0.45)
               Bin 2: Third package (0.55)
               Bin 3: Fourth package (0.55)
               
           ii. If the next two packages are size 0.60 and 0.60, then the
               optimal choice would be to place the second package in the
               same bin as the first.  This would yield a final packing using
               three bins:
               
               Bin 1: First package (0.45) + Second package (0.45)
               Bin 2: Third package (0.60)
               Bin 3: Fourth package (0.60)
               
               However, if the second package is placed in an empty bin, the
               final packing would require four bins:
               
               Bin 1: First package (0.45)
               Bin 2: Second package (0.45)
               Bin 3: Third package (0.60)
                Bin 4: Fourth package (0.60)
               
            Since either choice made by the algorithm for the second package
            could turn out to be wrong in some case, there cannot be an
            algorithm that always makes the right choice.
           
         d. For the offline version of the bin packing problem, it is possible
            to find an optimal packing.  (Consider all possibilities and pick
            the best, which takes time exponential in the number of packages.)
            
            It turns out that offline bin-packing has been proved to be
            NP-complete.  Thus, if the commonly held view of the relationship
            between P and NP is true, then ANY offline algorithm that always
            discovers an optimal solution to the bin-packing problem must
            take exponential time.
              
         e. Since there is not a practical algorithmic solution to either form
            of the bin-packing problem, it is worth considering whether a greedy
            algorithm might yield a solution that is close enough to optimal.
         
       2. We consider first the online version of the problem.
         
         a. There are three greedy strategies we might consider.
         
            i. One greedy strategy, called NEXT-FIT, goes like this: if the
               package we are dealing with would fit in the same bin as the
               previously packed package, then put it there - else start a new
               bin.  
            
               (Note that, once we start packing a new bin, we never go back
                and put any packages in previous bins.  This might be
                advantageous in some applications, because once a bin is 
                declared packed, it can be moved out the door and loaded on the
                truck or whatever.)
             
           ii. A second greedy strategy, called FIRST-FIT, goes like this: as 
               we pack each package, look at each of the bins in turn, and place
               it in the first bin we find where it fits.  Start a new bin only
               if we cannot fit the package in any of the others.
            
           iii. A third greedy strategy, called BEST-FIT, goes like this: as we
                pack each package, look at each of the bins in turn, and place it
                in the bin where it fits best - i.e. leaves the least unused 
               space. Start a new bin only if we cannot fit the package in any
               of the others.
            
         b. To see the difference between these strategies, suppose we are
            trying to pack a package of size 0.2 under the following scenario
            (where the last package packed was placed in bin 3)
            
            Bin 1: Currently contains 0.7 
            Bin 2: Currently contains 0.8
            Bin 3: Currently contains 0.3
            
            Next fit would put the package in bin 3
            First fit would put the package in bin 1
            Best fit would put the package in bin 2
            
         c. Which strategy is best?  
         
            i. Next fit will never yield an overall result that is better than
               first fit or best fit.  However, it is the simplest to implement,
               and is the fastest running.  (Each choice is theta(1), since
               only the most recently used bin has to be examined, as opposed to
               theta(n) for the other two.)  Also, once next fit declares a bin
               full, it can never be considered again, whereas with the other
               two algorithms no bins can be "shipped" until all the packages
               have been placed.
               
           ii. It turns out that there are sets of data for which first fit
               gives the optimal result, and the others don't; and there are 
               other sets of data where best fit gives the optimal result, and
               the others don't.
           
               Example: sequence of sizes 0.3 0.8 0.1 0.6 0.2
               
                        NF: Bin 1: 0.3
                            Bin 2: 0.8 0.1
                            Bin 3: 0.6 0.2
                            
                        FF: Bin 1: 0.3 0.1 0.6
                            Bin 2: 0.8 0.2
                            
                        BF: Bin 1: 0.3 0.6
                            Bin 2: 0.8 0.1
                            Bin 3: 0.2
                            
               Example: sequence of sizes 0.3 0.8 0.2 0.7
               
                        NF: Bin 1: 0.3
                            Bin 2: 0.8 0.2
                            Bin 3: 0.7
                            
                        FF: Bin 1: 0.3 0.2
                            Bin 2: 0.8
                            Bin 3: 0.7
                            
                        BF: Bin 1: 0.3 0.7
                            Bin 2: 0.8 0.2
                            
         d. It is possible to analyze the behavior of each of these strategies, and
            to show that:
            
            i. Next fit is guaranteed to find a result that requires no
               more than twice the optimal number of bins (and there is some
               data that will force it to use very close to this number.)
               
           ii. First fit is guaranteed to find a result that requires no
               more than 17/10 times the optimal number of bins (and again there
               is some data that will force it to use very close to this
               number.)
               
          iii. Best fit is also guaranteed to find a result that requires no
               more than 17/10 times the optimal number of bins (and again there
               is some data that will force it to use very close to this
               number.)
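
          e. For concreteness, here is a minimal C++ sketch of the online FIRST-FIT
             strategy described in (ii), assuming the package sizes have already
             been normalized to a bin capacity of 1.0.  (The name firstFitPack is
             just illustrative.)

             #include <vector>

             // Returns the number of bins used; remaining[i] is the unused space in bin i
             int firstFitPack(const std::vector<double>& sizes)
             {
                 std::vector<double> remaining;                 // one entry per open bin
                 for (double s : sizes)
                 {
                     bool placed = false;
                     for (double& r : remaining)                // scan the bins in order
                         if (s <= r) { r -= s; placed = true; break; }
                     if (!placed)
                         remaining.push_back(1.0 - s);          // start a new bin
                 }
                 return (int) remaining.size();
             }

             On the second example in (ii) above, firstFitPack({0.3, 0.8, 0.2, 0.7})
             uses three bins, matching the FF packing shown there.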
               
      3. For the offline version of the problem, a greedy algorithm is still
         of interest, even though it cannot guarantee optimal results, since
         the problem is NP-complete.
         
         a. The offline versions of the greedy algorithms are derived from the
            online versions based on the observation that we will generally
            get better results by packing the bigger items first, and then
            fitting the smaller items into the remaining spaces.
            
         b. An offline version of the first fit algorithm is called FIRST FIT
            DECREASING.  It considers packages in decreasing order of size,
            beginning with the largest.  Each is placed using first-fit.
        
            Example: earlier we showed that the sequence 0.3 0.8 0.2 0.7
                     requires three bins if packed using an online first fit
                     algorithm.  If we use first fit decreasing offline, we
                     consider the packages in the order 0.8, 0.7, 0.3, 0.2, and
                     pack them as follows:
                     
                     Bin 1: 0.8 0.2
                     Bin 2: 0.7 0.3
               
            It is possible to prove that if M is the optimal number of bins needed to 
            pack some list of items, then first fit decreasing never uses more than
            11/9 M + 4 bins to pack the same items.
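
             A first fit decreasing packer can be sketched by sorting a copy of the
             sizes into decreasing order and then handing them to the first-fit
             packer sketched in (e) above (firstFitDecreasingPack is an illustrative
             name):

             #include <algorithm>
             #include <vector>

             int firstFitDecreasingPack(std::vector<double> sizes)   // copy is intentional
             {
                 std::sort(sizes.begin(), sizes.end(),
                           [](double a, double b) { return a > b; });  // biggest first
                 return firstFitPack(sizes);           // the first-fit sketch given earlier
             }

             On the example above, firstFitDecreasingPack({0.3, 0.8, 0.2, 0.7}) considers
             the packages in the order 0.8, 0.7, 0.3, 0.2 and packs them into two bins.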
            
         c. It is also possible to derive offline versions of next fit and
            best fit, which we won't discuss.
            
IV. Divide-And-Conquer Algorithms
--  ------------------ ----------

   A. An algorithm-design strategy behind several of the algorithms we have seen
      is divide and conquer.  
      
      1. The basic strategy is this:
      
         partition the initial problem into two or more smaller subproblems
         solve each subproblem (recursively)
         stitch the solutions to the subproblems together to yield a solution to
          the original problem
       
      2. Examples we have seen?
      
         ASK
         
          a. One of the solutions to the maximal vector subsequence sum problem
             we discussed when we introduced algorithm analysis
         b. Fibonacci Numbers
         c. Towers of Hanoi
         d. Traversal of a binary tree
         e. Quick Sort
         f. Merge Sort
         
   B. Divide and conquer is often a useful strategy for finding good algorithms.
      Let's look at another example:
      
      1. As you know, standard integer representations are limited by the
         number of bits used to represent an integer (64 on modern machines). 
         What happens if we need to represent integers larger than this?
         
         a. The typical solution is to use an array of int (32-bit integers), 
            treated as digits base 2^32.
            
            Example: a 100 decimal digit integer a might be represented by an
                     array of 10 32-bit binary integers as
                     
                      a_9*2^288 + a_8*2^256 + a_7*2^224 + a_6*2^192 + a_5*2^160 +
                      a_4*2^128 + a_3*2^96 + a_2*2^64 + a_1*2^32 + a_0*2^0
 
            In general, we can measure the size of such a representation by
            the size of the array - e.g. we would consider the size of the
            above example to be 10.
               
         b. Now suppose we had two large integers (a and b) each represented 
            using an array of n 32-bit integers Let's consider the complexity of
            various arithmetic operations.
            
             i. Addition: We will require n additions - i.e. sum_0 = a_0 + b_0;
                sum_1 = a_1 + b_1 + carry from sum_0; etc.
                   
                - so the operation is theta(n)

           ii. Subtraction is similar, and is also theta(n).
          
          iii. However, for multiplication, it looks like we will require theta(n^2) 
               multiplications, since
                
                (a_9*2^288 + a_8*2^256 + a_7*2^224 + ...) *
                (b_9*2^288 + b_8*2^256 + b_7*2^224 + ...) =
             
                a_9*b_9*2^576 + (a_9*b_8 + a_8*b_9)*2^544 +
                (a_9*b_7 + a_8*b_8 + a_7*b_9)*2^512 + ...
             
                - so each of the n coefficients in a is multiplied by each of the n
                  coefficients in b.
                 
      2. We could consider a divide and conquer approach
      
          a. divide the arrays representing each number in half (which we call 
             A_1, A_0 and B_1, B_0 below).  Then the product becomes
             
             (A_1*2^(16n) + A_0)(B_1*2^(16n) + B_0) =
              
             A_1*B_1*2^(32n) + (A_1*B_0 + A_0*B_1)*2^(16n) + A_0*B_0*2^0
             
         b. Then we can continue by calculating each of the A's and B's by
            dividing the arrays in two until we get to arrays having a
            single element, at which point ordinary multiplication works.

          c. However, this hasn't reduced the total effort: each of the products
             after the first division only requires n^2/4 multiplications, but 
             since there are 4 of them the overall computation is still theta(n^2).
            
         d. At this point, though, we could take advantage of an observation
            first made by Gauss in a different context.  Observe that
            
             A_1*B_0 + A_0*B_1 = (A_1 + A_0)(B_1 + B_0) - A_1*B_1 - A_0*B_0
                              
             Since we need to calculate A_1*B_1 and A_0*B_0 anyway, we can use this to
             replace the original four products by three products, plus a few extra
             additions and subtractions.
            
         e. That means, at each stage in the divide and conquer, we only need
            to create 3 subproblems with 1/4 the effort, rather than 4.  And
            that benefit compounds itself at each stage.  (We will look
            at the effect quantitatively in a bit)
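
          f. For concreteness, here is a minimal C++ sketch of this three-multiplication
             (Karatsuba) idea.  To keep it short, it works on numbers stored as vectors
             of base-10 digits (least significant digit first) rather than the base-2^32
             digits discussed above, and all of the names (Digits, karatsuba, etc.) are
             just illustrative:

             #include <algorithm>
             #include <vector>

             using Digits = std::vector<int>;   // base-10 digits, least significant first

             static Digits addDigits(const Digits& a, const Digits& b)
             {
                 Digits r;  int carry = 0;
                 for (size_t i = 0; i < std::max(a.size(), b.size()) || carry; i ++)
                 {
                     int s = carry;
                     if (i < a.size()) s += a[i];
                     if (i < b.size()) s += b[i];
                     r.push_back(s % 10);  carry = s / 10;
                 }
                 return r;
             }

             static Digits subDigits(const Digits& a, const Digits& b)  // assumes a >= b
             {
                 Digits r = a;  int borrow = 0;
                 for (size_t i = 0; i < r.size(); i ++)
                 {
                     int d = r[i] - borrow - (i < b.size() ? b[i] : 0);
                     borrow = (d < 0);
                     if (borrow) d += 10;
                     r[i] = d;
                 }
                 while (r.size() > 1 && r.back() == 0) r.pop_back();
                 return r;
             }

             static Digits shiftDigits(Digits a, size_t k)   // multiply by 10^k
             {
                 a.insert(a.begin(), k, 0);
                 return a;
             }

             Digits karatsuba(const Digits& a, const Digits& b)
             {
                 if (a.size() == 1 && b.size() == 1)          // base case: single digits
                 {
                     int p = a[0] * b[0];
                     return p < 10 ? Digits{p} : Digits{p % 10, p / 10};
                 }
                 size_t half = std::max(a.size(), b.size()) / 2;
                 // Split each number:  a = A1*10^half + A0,  b = B1*10^half + B0
                 Digits A0(a.begin(), a.begin() + std::min(half, a.size()));
                 Digits A1(a.size() > half ? Digits(a.begin() + half, a.end()) : Digits{0});
                 Digits B0(b.begin(), b.begin() + std::min(half, b.size()));
                 Digits B1(b.size() > half ? Digits(b.begin() + half, b.end()) : Digits{0});

                 Digits hi  = karatsuba(A1, B1);                              // A1*B1
                 Digits lo  = karatsuba(A0, B0);                              // A0*B0
                 Digits mid = subDigits(karatsuba(addDigits(A1, A0), addDigits(B1, B0)),
                                        addDigits(hi, lo));                   // A1*B0 + A0*B1
                 // a*b = hi*10^(2*half) + mid*10^half + lo (only three recursive products)
                 return addDigits(addDigits(shiftDigits(hi, 2 * half),
                                            shiftDigits(mid, half)), lo);
             }

             Each level does three roughly half-size products plus a handful of theta(n)
             additions and subtractions - exactly the T(n) = 3T(n/2) + O(n) recurrence
             analyzed in section D below.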
        
   C. An algorithmic pattern that is very similar to divide and conquer is 
      decrease and conquer.
      
      1. In this pattern, we partition a problem into some number of
         subproblems, but then discard all but one of these subproblems and
         solve the original problem by solving this one.
         
          (The term "divide and conquer" is usually reserved for algorithms that
          must solve all of the subproblems; when all but one are discarded, the
          name "decrease and conquer" is used instead.) 
         
      2. It turns out that many search strategies are actually examples of this 
         pattern.

         a. Example: binary search of an ordered array - we compare the search
            target to the middle key of the array.  Based on the outcome of
            this comparison, we continue our search in either the first or
            last half of the array, ignoring the other half.
            
         b. Example: search in any sort of m-way search tree (binary, 2-3-4,
            or B-Tree) - we compare the search target to the keys stored in
            a node, and then continue our search in one of its children,
            ignoring the others.
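
          Returning to example (a): the whole decrease and conquer search fits in a
          few lines (a minimal iterative sketch; binarySearch is an illustrative name):

             int binarySearch(const int a[], int n, int target)
             {
                 int lo = 0, hi = n - 1;
                 while (lo <= hi)
                 {
                     int mid = lo + (hi - lo) / 2;    // middle key of the current range
                     if (a[mid] == target)
                         return mid;                  // found - return its position
                     else if (a[mid] < target)
                         lo = mid + 1;                // discard the first half
                     else
                         hi = mid - 1;                // discard the last half
                 }
                 return -1;                           // not present
             }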
            
      3. Moreover, maintenance of an m-way search tree is a form of decrease
         and conquer.  
         
         a. For example, when we insert into a binary search tree, at each level
            we use comparison of the key we are inserting with the key at the
            current node to decide whether to insert into its left or right
            subtree.
            
         b. Deletion is similar.
         
      4. Let's look at another example.  Suppose we have an unordered list of n 
         numbers, and want to find the k-th smallest member.
         
         a. If we wanted the smallest (or the nth smallest - which would be
            the largest), there is a straightforward theta(n) algorithm.
            
         b. For arbitrary k, it would be possible to sort the list and then take 
            element in position k of the result.  However, this would require
            theta(n log n) time because of the sort.
            
         c. Can we do this for arbitrary k in just theta(n) time?  It turns out
            the answer is yes.
            
             i. Choose a partitioning element (perhaps at random, or using
                some arbitrary scheme such as taking the first element).  Set it
                aside, and partition the remaining elements into two sublists, one
                containing all the elements less than or equal to the partitioning
                element and one containing all the elements greater.  While doing
                this, keep track of the count of elements (c) in the list containing
                the smaller elements.
                
            ii. Now, if k <= c, the element we want is also the kth smallest
                element in the first sublist.  If k > c + 1, the element we want
                is the (k - c - 1)th smallest element in the second sublist.  (Of
                course, if k = c + 1, the partitioning element itself is the one
                we want, but this would be rare).
               
          iii. What is the complexity of this process?  Since, on the average,
               partitioning with a random pivot like this produces sublists
                of roughly equal length, the first partitioning would require
               looking at all elements, but the second would look at only n/2,
               the third only n/4 ...
               
           iv. Therefore, the total number of elements examined is
                n + n / 2 + n / 4 + ... + 1 = 2n.  So we now have a theta(n)
               algorithm!
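
             v. A minimal C++ sketch of this selection procedure, coded iteratively
                since only one subproblem survives each step (kthSmallest is an
                illustrative name; k is 1-based, and a random pivot is used as
                suggested in (i)):

                #include <random>
                #include <utility>
                #include <vector>

                // Returns the k-th smallest element (k = 1 .. v.size()) of an unordered vector
                int kthSmallest(std::vector<int> v, int k)
                {
                    std::mt19937 gen{std::random_device{}()};
                    int lo = 0, hi = (int) v.size() - 1;
                    while (true)
                    {
                        // Pick a random partitioning element; move it to the end
                        std::uniform_int_distribution<int> pick(lo, hi);
                        std::swap(v[pick(gen)], v[hi]);
                        int pivot = v[hi], i = lo;
                        for (int j = lo; j < hi; j ++)           // partition the rest
                            if (v[j] <= pivot) std::swap(v[i ++], v[j]);
                        std::swap(v[i], v[hi]);                  // pivot lands at index i
                        int c = i - lo;                          // size of the "smaller" sublist
                        if (k <= c)           hi = i - 1;        // answer is in the first sublist
                        else if (k == c + 1)  return v[i];       // the partitioning element itself
                        else { k -= c + 1;    lo = i + 1; }      // answer is in the second sublist
                    }
                }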
               
   D. Analysis of Divide and Conquer Algorithms
         
       1. Recursive algorithms of the sort that arise in connection with divide
          (or decrease) and conquer can be hard to analyze.  However, there is a
          general approach that works for many (but not all) divide/decrease and
          conquer algorithms.
     
         a. Let T(n) = the time it takes the algorithm to solve a problem of
            size n.  (Assume T(n) = O(1) for sufficiently small n.)
         
         b. Assume that, for the recursive case, the algorithm solves a problem
             of size n by partitioning it into a subproblems of size n/b, where a
            and b are integer constants.
        
            E.g. for the average case of Quick Sort a is 2 and b is 2 - we
                 partition a problem of size n into two subproblems of size n / 2.
             
                 the same is true for Merge Sort
             
         c. Suppose, further, that the time for partitioning a problem of size n
            into subproblems is f(n), and the time for stitching the solutions
            together after the subproblems have been solved is g(n).
       
            E.g. for Quick Sort, f(n) is O(n) and g(n) is O(1).  For
                 Merge Sort, f(n) is O(1) and g(n) is O(n).
                          
         d. Then the time to solve a problem of size n is given by the recurrence
     
            T(n) = time to partition + time to solve subproblems + time to stitch
        
                 = f(n) + aT(n/b) + g(n).
             
                 = aT(n/b) + (f(n) + g(n))
             
         e. There is a general rule for solving recurrences of this form (which we
            state here without proof)
     
             If a recurrence is of the form 
             
                T(n) = aT(ceiling(n/b)) + theta(n^k) - where a, b, and k are constants,
                                                       with a > 0, b > 1, and k >= 0
            
             Then
             
                T(N) = theta(N^(log_b a))   if a > b^k
                
                       theta(N^k log N)     if a = b^k
                       
                       theta(N^k)           if a < b^k
                      
         f. This formula is known as the "master theorem"
            
      2. Examples of applying this:
        
         a. Traversal of a binary tree: 
        
            - We visit the root (which we'll assume is O(1)), and traverse
              each of the subtrees in some order
            - On the average, each subtree has almost N/2 nodes
        
            Recurrence is T(N) = O(1) + 2 T(N/2), so a = 2, b = 2, k = 0
        
            First case applies:  T(N) = theta(N) - which is, of course, what we would
            expect since we visit each node exactly once
           
         b. Multiplication of big integers as discussed above - here's a case where the 
            formula really helps
     
            - At each step, We split into two sublists of length N / 2 and perform 
              three multiplications.  Splitting takes O(1) time but stitching the
              result together requires O(N) additions to handle carry, so 

             Recurrence is T(N) = 3 T(N/2) + O(N), so a = 3, b = 2, k = 1
             
             First case applies:  T(N) = theta(N^(log_2 3)) = theta(N^1.58) - a significant 
             improvement over the theta(n^2) algorithm we considered at first.
           
         c. Merge Sort:
        
            - We split into two sublists of length N/2, which takes O(1) time,
              sort them, then merge them together (which takes O(N) time) 
     
            Recurrence is T(N) = 2 T(N/2) + O(N), so a = 2, b = 2, k = 1
        
            Second case applies: T(N) = theta(N log N)
        
             (Quick sort is similar, except the split is O(N) and the stitch is
              O(1), but the recurrence equation and hence the solution are the same.)

         d. Binary search
     
            At each step, we create two subproblems, but only need to solve one.
            Since both splitting and stitching are O(1), we get the recurrence: 
            T(N) = T(N/2)+O(1), so a = 1, b = 2, and k = 0
            
            Second case applies: T(n) = theta(log N)
       
         e. k-selection.
     
            At each step, we create two subproblems in O(N) time, but only need
            to solve one, so recurrence is T(N) = T(N/2) + O(N), so a = 1,
            b = 2, and k = 1.
        
            Third case applies: T(n) = theta(N)
                 
      3. Note that the master theorem does not apply to all divide and conquer
         algorithms, because it requires a, b, and k to be constants.
         
         For example, it does not apply to the recursive computation of the Fibonacci
         numbers using the definition Fib(n) = Fib(n-1) + Fib(n-2)[ with base cases
         n = 1 and n = 2]
         
         a. The recurrence is
     
            T(n) = T(n-1) + T(n-2)
        
            (Note that, by inspection, T(n) is O(Fib(n)))
        
         b. Here, if we wished to attempt to apply the master theorem, we could
            argue that a = 2 and k = 0 (the partition/stitch time is constant.)
            However, b is n / (n - 1), which while always greater than 1 becomes
            increasingly close to 1 as n increases, so the master theorem does not
            apply.

         c. In fact, the recursive divide and conquer algorithm to calculate Fibonacci 
            numbers is impractical for n of any significant size, so the analysis is 
            not useful in any case.  Fortunately, there is a linear time algorithm, 
            as we shall see when we talk about dynamic programming!
         
V. Dynamic Programming
-  ------- -----------

   A. In the last section, we were reminded that sometimes a recursive divide
      and conquer algorithm can have very poor performance.
      
      1. A good example of this is Fibonacci numbers.  To see why, consider the
        tree generated by the computation of Fib(6):
      
                                                Fib(6)
                                         /              \
                                Fib(5)                          Fib(4)
                              /        \                     /          \
                        Fib(4)          Fib(3)          Fib(3)        Fib(2)
                       /      \        /      \        /     \
                   Fib(3)    Fib(2)  Fib(2)  Fib(1)  Fib(2)  Fib(1)
                  /    \
                Fib(2)  Fib(1)
                                                  
          Observe that we do certain computations many times - e.g. we compute
      
                Fib(5) once  
                Fib(4) twice 
                Fib(3) thrice
                Fib(2) 5 times 
                Fib(1) 3 times
       
      2. A much more efficient approach is to save previously computed results
         and re-use them when needed, instead of repeating the computation.  
         This would yield the following tree for Fib(6), which would require 
         linear time.  (Cases marked with an asterisk re-use previously computed
         results instead of re-doing them - note that each Fibonacci number 
         value from 1 to 6 is computed just once.)
      
                                        Fib(6)
                                         /              \
                                Fib(5)                          Fib(4) *
                              /        \ 
                        Fib(4)          Fib(3) *
                       /      \ 
                   Fib(3)    Fib(2) *
                  /    \
                Fib(2)  Fib(1)
        
       3. The following linear time algorithm incorporates this insight:
       
          int fibAux(int n, int saved [])
          {
              if (saved[n-1] == -1)
                  saved[n-1] = fibAux(n-1, saved) + fibAux(n-2, saved);
              return saved[n-1];
          }
          
           int fib(int n)
           {
               if (n <= 2)
                   return 1;                  // Fib(1) = Fib(2) = 1 by definition
                   
               // Use an array to save previously computed values.  An
               // initial value of -1 indicates we have not yet computed
               // the value and so need to do so.  (saved[i] holds Fib(i+1);
               // the variable-length array is a compiler extension in C++.)
               
               int saved[n];
               for (int i = 0; i < n; i ++)
                   saved[i] = -1;
               saved[0] = saved[1] = 1;  // By definition
                   
               return fibAux(n, saved);
           }
                 
       4. A simpler algorithm that builds up the solution from small values
          is the following.
       
          int fib(int n)
          {
              if (n <= 2)
                  return 1;
              int last = 1;
              int nextToLast = 1;
              int current = 1;
              for (int i = 3; i <= n; i ++)
              {
                  current = nextToLast + last;
                  nextToLast = last;
                  last = current;
              }
              return current;
          }     
       
   B. The strategy we just used to improve the calculation of the Fibonacci
      numbers is an illustration of a general algorithm design technique
      called Dynamic Programming.
      
      In Dynamic Programming, we use a table of previously calculated results
      to assist us in deriving new results, rather than calculating them
      from scratch.
      
   C. An example developed in the book: Longest Common Subsequence (LCS).
   
      1. Recall the following from the book discussion:
      
          a. A subsequence of a sequence is a sequence of elements that occur, in the
             same order, somewhere in the sequence - though not necessarily contiguously
             (there may be gaps between the chosen elements).
         
            Example: For the string ABC, the subsequences are
         
            <empty>, A, B, C, AB, AC, BC, and ABC
         
         b. A common subsequence of two sequences is a subsequence of both sequences
         
            Example: For the strings ABC and DADCD, the common subsequences are
            
            <empty>, A, C, and AC - since they are also subsequences of DADCD, while 
            the subsequences of ABC that contain B are not subsequences of DADCD
            
          c. The longest common subsequence (LCS) is a common subsequence of maximal length
          
           i. In the example we have been using, the LCS is AC
           
          ii. It may be for some pairs of strings that the LCS is of length 0 - e.g.
              the LCS of ABC and DEF is <empty>
              
         iii. It may be that the LCS of two strings is not unique - i.e. there may be
              two or more different subsequences that both have the same maximal length.
              
              Example: both AB and AC are LCSs of ABC and ACB
              
          d. As the text notes, LCS is useful in genetics for comparing DNA strings
             (sequences of the bases A, C, G, and T) and in other areas as well.
           
      2. A brute force algorithm would compute all the subsequences of the shorter 
         string and then test each to see if it is a subsequence of the longer - an 
         approach that is more than exponential in the length of the shorter string, 
         and hence usually not practical.
         
      3. The book discusses how dynamic programming might be used to develop an algorithm 
         whose complexity is proportional to the product of the lengths of the two 
         strings - i.e. theta(n^2) if the two strings have the same length.  The basic 
         idea is to make use of a table with rows corresponding to the characters of one 
         string, and columns corresponding to characters of the second string (and with 
          an extra row and column at the start).  The entry at a given position is the 
          length of the LCS of the prefixes of the two strings up through that position, 
          with the bottom rightmost entry representing the length of the overall LCS.

          For the example in the book (the LCS of the DNA sequences GTTCCTAATA and 
          CGATAATTGAGA), the initial table would look like this (dummy row and column 
          filled in with 0):
         
         PROJECT

                 C  G  A  T  A  A  T  T  G  A  G  A
             -1  0  1  2  3  4  5  6  7  8  9 10 11
          -1  0  0  0  0  0  0  0  0  0  0  0  0  0
         G 0  0
         T 1  0
         T 2  0
         C 3  0
         C 4  0
         T 5  0
         A 6  0
         A 7  0
         T 8  0
          A 9  0
         
      4. The table is filled in row by row from top to bottom.
      
         a. If an entry corresponds to a place where the two strings agree, the value
            is 1 more than the entry diagonally above it.
            
         b. If an entry corresponds to a place where the two strings disagree, the 
            value is the maximum of the value just to the left and just above it
            
         c. Example - the entry in row 0, column 0 (G, C) is filled in with 0.
            
         d. Example - the entry in row 0, column 1 (G, G) is filled in with 1.
         
         e. The last entry to be filled in - the bottom right one - represents the
            length of the LCS.
            
         PROJECT: Figure 12.2 - page 563
         
     5. The table gives the _length_ of the LCS.  To get the LCS itself, one works
        backwards from the bottom right corner, finding entries in the LCS from last
        to first.
        
        a. If an entry corresponds to a place where the two strings agree, the
           character in question is part of the LCS, and one moves diagonally up and
           to the left.
           
        b. If an entry corresponds to a place where the two strings don't agree, one
           moves either left or up, choosing the bigger of the two or choosing one
           direction arbitrarily if the two are the same.  (Of course, in this case, a
           character is not included in the LCS).
           
           PROJECT same figure - note trace of finding CTAATA
           
        c. Sometimes, a pair of sequences will have two or more LCSs of the same 
           length.  This will be reflected in a situation in the table where the choice 
           of moving up or left is arbitrary because of a tie.  
           
           Example: the example in the book actually has two LCSs of length 6.  The
           second can be found by making the choice to go up rather than left in row
           8, column 10.
           
           PROJECT - same figure, but showing trace of finding GTTTAA
           
            ASK - are there others?  (Yes - follow the first derivation, but go up
            rather than left at row 4 column 2, yielding GTAATA.)
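
         d. The table-fill and traceback just described fit in a short C++ function.
            (A minimal sketch; longestCommonSubsequence is an illustrative name, and
            ties are broken by moving up, so it recovers just one LCS.)

            #include <algorithm>
            #include <string>
            #include <vector>

            std::string longestCommonSubsequence(const std::string& x, const std::string& y)
            {
                size_t n = x.size(), m = y.size();
                // L[i][j] = length of the LCS of the first i chars of x and first j of y
                std::vector<std::vector<int>> L(n + 1, std::vector<int>(m + 1, 0));
                for (size_t i = 1; i <= n; i ++)
                    for (size_t j = 1; j <= m; j ++)
                        L[i][j] = (x[i-1] == y[j-1]) ? L[i-1][j-1] + 1
                                                     : std::max(L[i-1][j], L[i][j-1]);
                // Trace back from the bottom right corner to recover one LCS
                std::string lcs;
                for (size_t i = n, j = m; i > 0 && j > 0; )
                {
                    if (x[i-1] == y[j-1])            { lcs.push_back(x[i-1]); i --; j --; }
                    else if (L[i-1][j] >= L[i][j-1]) i --;      // move up
                    else                             j --;      // move left
                }
                std::reverse(lcs.begin(), lcs.end());
                return lcs;
            }

            For the strings in the book's example, L[n][m] comes out to 6, and the
            traceback returns one of the length-6 LCSs discussed above.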

   D. Another Example: Weight-Balanced Binary Search Trees

      1. Earlier, we talked about strategies for maintaining height-balanced
         binary search trees.  Where the set of keys to be stored in a tree
         is fixed, and we know the relative probabilities of accessing the
         different keys, it is possible to build a WEIGHT-BALANCED tree in
         which the average cost of tree accesses is minimized.

         a. For example, suppose we had to build a binary search tree consisting
            of the following C/C++ reserved words.  Suppose further that we had
            data available to us as to the relative frequency of usage of each
            (expressed as a percentage of all uses of words in the group), as 
            shown:

                 break  55%             Note: The numbers are contrived to make a point!
                 case   25%             In no way do they represent actual frequencies
                 for    11%             for typical C/C++ code! 
                 if      5%             
                 int     2%
                 switch  1%
                 while   1%

         b. Suppose we constructed a height-balanced tree, as shown:

                           if
                       /       \
                     case     switch
                    /   \    /       \
                 break  for int      while

            - 5% of the lookups would access just 1 node (if)
            - 25% + 1% = 26% would access 2 nodes (case, switch)
            - 55% + 11% + 2% + 1% = 69% would access 3 nodes (the rest)

            Therefore, the average number of accesses would be

            (.05 * 1) + (.26 * 2) + (.69 * 3) = 2.64 nodes accessed per lookup

         c. Now suppose, instead, we constructed the following search tree

                   break
                        \
                        case
                            \
                            for
                              \
                               if
                                  \
                                   int
                                     \
                                   switch
                                       \
                                       while

            The average number of nodes visited by lookup is now

            - 55% access 1 node (break)
            - 25% access 2 nodes (case)
            - 11% access 3 nodes (for)
            - 5%  access 4 nodes (if)
            - 2%  access 5 nodes (int)
            - 1%  access 6 nodes (switch)
            - 1%  access 7 nodes (while)

            (.55 * 1) + (.25 * 2) + (.11 * 3) + (.05 * 4) + (.02 * 5) +
            (.01 * 6) + (.01 * 7) = average 1.81 nodes accessed

            This represents over a 30% savings in average lookup time

         d. Interestingly, for the particular distribution of probability
            values we have used, this tree is actually optimal.  To see
            that, consider what would happen if we rotated the tree about
            one of the nodes - e.g. around the root:

                    case
                   /    \
                break     for
                              \
                                if
                                  \
                                   int
                                     \
                                   switch
                                        \
                                       while


            We have now reduced the number of nodes accessed for lookups in
             every case, save one.  But since break is accessed 55% of the
             time, the net change in average number of accesses is
            (.55 * +1) + ((1 - .55) * - 1) = .55 - .45 = +.10.  Thus, this
            change makes the performance worse.  The same phenomenon would
            arise with other potential improvements.

         e. In general, weight balancing is an appropriate optimization only
            for static trees - i.e. trees in which the only operations performed
            after initial construction are lookups (no inserts, deletes.)  Such
            search trees are common, though, since programming languages, 
            command  interpreters and the like have lists of reserved or 
            predefined words that need to be searched regularly.  Of course, 
            weight balancing also requires advance knowledge of probability 
            distributions for the accesses.  (For a compiler for a given
            programming language, this might be discovered by analyzing
            frequency of reserved word usage in a sample of "typical" programs.)
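
          To make the arithmetic in b. and c. concrete, here is a small C++ sketch
          (names and layout are mine, not part of the course materials) that computes
          the average number of nodes accessed per successful lookup from the access
          probabilities and the depth of each key; applied to the two trees above it
          reproduces the 2.64 and 1.81 figures.

             #include <iostream>
             using namespace std;

             // Average nodes accessed per successful lookup:
             // sum over keys of (access probability * depth, counting the root as 1)
             double averageAccesses(const double prob[], const int depth[], int n)
             {
                 double avg = 0;
                 for (int i = 0; i < n; i ++)
                     avg += prob[i] * depth[i];
                 return avg;
             }

             int main()
             {
                 // break, case, for, if, int, switch, while (contrived frequencies)
                 double p[7] = { .55, .25, .11, .05, .02, .01, .01 };

                 int heightBalanced[7] = { 3, 2, 3, 1, 3, 2, 3 };  // tree in b.
                 int rightChain[7]     = { 1, 2, 3, 4, 5, 6, 7 };  // tree in c.

                 cout << averageAccesses(p, heightBalanced, 7) << endl;  // prints 2.64
                 cout << averageAccesses(p, rightChain, 7) << endl;      // prints 1.81
             }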

      2. We could consider a greedy approach to discovering the optimal
         binary search tree.
         
         a. The basic idea would be to make the key of highest probability
            the root of the tree.  The keys of next highest probability would
            be its children, etc. - subject to the constraints of the tree
            being a binary search tree (e.g. only a key smaller than the
            root of the overall tree could be the root of the left subtree.)
            
         b. Applying this approach to the example we just considered would
            yield an optimal tree.
            
         c. However, the greedy strategy will not always find the optimal tree.
            
          d. Unlike previous cases where the greedy strategy fails to find the 
             optimal tree, though, finding the optimal tree here does not require 
             exponential time.
            
      3. We now consider a method for finding the optimal binary search tree for a 
         given static set of keys, given an advance knowledge of the probabilities of
         various values being sought, which finds the optimal tree in theta(n^2) time.

         a. The basic idea
         
             i. For an optimal tree containing n keys, if key k is the root, then
                the two subtrees are optimal trees made up of the first k-1 keys and
                the last n-k keys. 
               
            ii. We build up a table with rows describing optimal trees with 1 key, 
                2 keys ... n keys, and columns corresponding to possible starting 
                positions of the subtree (e.g. the first column corresponds to 
                subtrees that start with the first key).
               
               (a) If there are n keys, there will be n rows, with the last describing 
                   the optimal tree that contains all n keys - which is what we want.
                   
               (b) While the first row has n columns, the second row (describing 
                   subtrees containing two keys) has only n-1 columns, since a subtree 
                   that contains two keys cannot start with the last key.  This pattern
                   continues until the last row has only one column, since the subtree
                   it describes must start with the first key.
                   
         b. Filling in the first row of the table is trivial, since there is only
            one possibility in each case for a tree containing only one key.
                   
         c. We then fill in the rest of the table row-by-row, using information
            from the previous row.
                   
            Example: when filling in the entry for an optimal tree containing the
            first four keys, we consider four possibilities:
                   
                root = key 1:  left = empty subtree;                      right = optimal subtree containing keys 2-4
                root = key 2:  left = optimal subtree containing key 1;   right = optimal subtree containing keys 3-4
                root = key 3:  left = optimal subtree containing keys 1-2; right = optimal subtree containing key 4
                root = key 4:  left = optimal subtree containing keys 1-3; right = empty subtree
                   
             Since the costs of the different subtrees have already been calculated,
             we choose the least expensive choice of root, record it, and then continue
             working across the row.
              
         d. Of course, we must also allow for the possibility of unsuccessful search.  
            To handle this, we convert our search tree into an EXTENDED TREE
            by adding FAILURE NODES (by convention, drawn as square boxes.)

             Example: the balanced tree we built earlier for the seven C++ keywords:

                           if
                       /        \
                     case     switch
                    /   \       /  \
                 break  for    int  while
                 /  \   /  \   /  \   /  \
                []  [] []  [] []  [] []  []

            Each failure node represents a group of keys for which the search would fail 
            - e.g. the leftmost one represents all keys less than break [e.g. a, apple, 
            boolean]; the second all keys between break and case [c, class] etc.
            
            To discover the optimal tree, we need to consider both the probabilities of
            the keys and the probabilities of the various failure nodes - i.e. the
            probability that we will be searching for something that is not in the tree
            and will end up at that node.

      4. To find an optimal tree, we need to define some terms and measures:

         a. We will number the keys 1 .. n
         
         b. Probabilities connected with the various keys
         
            i. Let p  be the probability of searching for key  (1 <= i <= n)
                    i                                        i

           ii. Let q  be the probability of searching for a non-existent key
                    i
               lying between key  and key   .  (Of course q  represents
                                i        i+1               0
               all values less than key , and q  all values greater than key .)
                                       1       n                            n

          iii. Clearly, since we are working with probabilities, the sum of
               all the p's and q's must be 1.

         c. T    is the optimal binary search tree containing key    through key .
             ij                                                  i+1            j
                                 
         d. T   , then, is an empty tree, consisting only of the failure node
             ii 
            lying between key  and key   .
                             i        i+1

         e. We will denote the weight of T   by w  . Clearly,
                                          ij     ij
            the weight of T   is p    + p   + ... + p  + q  + q   + ... q
                           ij     i+1    i+2         j    i    i+1       j

            which is the probability that a search will end up in T   .  The
                                                                   ij
            weight of the empty tree T  , then, is q  - the probability of
                                      ii            i
            the failure node lying between key  and key   .  Note that, for a
                                              i        i+1
            non-empty tree, the weight is simply the probability of the root
            plus the sum of the weights of the subtrees.

         f. We will denote the cost of T   - i.e. the average number of comparisons
                                        ij
            needed by a search that ends in T   by c  .
                                             ij     ij
            c   is calculated as follows:
             ij

            - If T   is empty (consists only of a failure node), then its
                  ij
              cost is zero - i.e. once we get to it, we need do no further comparisons.
        
            - Otherwise, its cost is the weight of its root, plus the sum
              of the weights of its subtrees, plus the sum of the costs of
              its subtrees.

              - The first term represents the fact that search for the key
                at the root costs one comparison.

              - The rationale for including the costs of the subtrees in the
                overall cost should be clear.  To this, we add the WEIGHTS
                of the subtrees to reflect the fact that we must do one
                comparison at the root BEFORE deciding which subtree to go into,
                and the probability that that comparison will lead
                into the subtree is equal to the weight of the subtree.

         g. Clearly, an optimal binary search tree is one whose cost is minimal.
         
         h. We will denote the root of T   by r   .
                                        ij     ij

         i. Example - the balanced tree we considered earlier would be optimal if the 
            probabilities of all keys and failures were equal (i.e. each p and q = 1/15)
            
                           if
                       /        \
                     case     switch
                    /   \       /  \
                 break  for    int  while
                 /  \   /  \   /  \   /  \
                []  [] []  [] []  [] []  []

            i. Cost of external nodes = 0 in each case, and weights of
               external nodes = 1/15 in each case.  So
               
               c   = c   = c   = c   = c   = c   = c   = c   = 0.
                 00   11    22    33    44    55    66    77   
                 
               w   = w   = w   = w   = w   = w   = w   = w   = 1/15.
                 00   11    22    33    44    55    66    77   

           ii. Cost of each tree rooted at a level 3 node (break, for, int, while) = 
               weight of root (1/15) + sum of costs of subtrees (0) + sum of weights of 
               subtrees (2/15) = 3/15.  The weight of each subtree is also 3/15.  So
               
               c   = c   = c   = c   =  3/15
                01    23    45    67

               w   = w   = w   = w   =  3/15
                01    23    45    67

          iii. Cost of each tree rooted at a level 2 node (case, switch) is 1/15 (weight
               of root) plus 2 x 3/15 (costs of two subtrees) + 2 x 3/15 (weights of
               two subtrees) = 13/15, and weight is 1/15 + 2 x 3/15 = 7/15.  So
               
               c   = c   = 13/15
                 03   47

               w   = w   = 7/15
                 03   47

           iv. Cost of overall tree (c   ) =
                                      07
                                      
                Probability of root (4) = p  = 1/15 +
                                           4
                                           
                Weight of left subtree (T  ) = w   = 7/15 +
                                         03     03
                                         
                Weight of right subtree (T  ) = w   = 7/15 +
                                          47     47
                                          
                Cost of left subtree (T  ) = c   = 13/15 +
                                       03     03

                Cost of right subtree (T  ) = c  = 13/15
                                        47     47
                                        
                So total cost is 41/15
                
            v. Weight of overall tree (w  ) = 
                                        07
                                        
                Probability of root = 1/15 +

                Weight of left subtree = 7/15 +
        
                Weight of right subtree = 7/15  =         1
                
                (as expected)
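
          The weight and cost recursions in e. and f. are easy to check mechanically.
          The following C++ sketch (my own, not part of the course materials, and
          simplified by assuming every failure node has the same probability q)
          reproduces the numbers above: weight 1 and cost 41/15 for the balanced
          tree of 7 keys with every p and q equal to 1/15.

             #include <iostream>
             using namespace std;

             // Internal node of an extended BST; a null pointer stands for a
             // failure node, whose probability q is supplied by the caller.
             struct Node { double p; Node * left; Node * right; };

             // weight = probability that a search ends up somewhere in this tree
             double weight(Node * t, double q)
             {
                 if (t == nullptr) return q;
                 return t->p + weight(t->left, q) + weight(t->right, q);
             }

             // cost = probability of root + weights of subtrees + costs of subtrees
             double cost(Node * t, double q)
             {
                 if (t == nullptr) return 0;
                 return t->p + weight(t->left, q) + weight(t->right, q)
                             + cost(t->left, q) + cost(t->right, q);
             }

             int main()
             {
                 double u = 1.0 / 15;          // every p and q is 1/15
                 Node n[7];
                 for (int i = 0; i < 7; i ++)
                     { n[i].p = u; n[i].left = n[i].right = nullptr; }
                 // n[3] = if (root), n[1] = case, n[5] = switch; 0,2,4,6 are leaves
                 n[3].left = &n[1]; n[3].right = &n[5];
                 n[1].left = &n[0]; n[1].right = &n[2];
                 n[5].left = &n[4]; n[5].right = &n[6];

                 cout << "weight = " << weight(&n[3], u) << endl;      // 1
                 cout << "cost   = " << cost(&n[3], u)                 // 41/15
                      << " (41/15 = " << 41.0 / 15 << ")" << endl;
             }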

      5. Dynamic programming is used in  an algorithm for finding an optimal tree, given
         a set of values for the p's and q's.  

         a. T   is the OPTIMAL tree including keys i+1 .. j.
             ij
            Therefore, T  is the optimal tree for the whole set of keys,
                        0n
            and is what we want to find.

         b. w   is the WEIGHT of T   .  
             ij                   ij

            - For i = j, w   = q .
                          ij    i

            - For i < j, w   = p    + w         +  w
                          ij    r      i r - 1      r  j
                                 ij       ij         ij
 
         c. c   is the COST of T  .
             ij                 ij

            - For i = j, c   = 0.
                          ij
             - For i < j, c   = p    + w          + w       + c        + c
                          ij    r      i r  - 1     r   j     i r - 1    r   j
                                 ij       ij         ij          ij       ij

                             = w    + c        + c
                                ij     i r - 1    r   j
                                          ij       ij

         d. r   is the ROOT of T   .
             ij                 ij

            - Obviously, r   is undefined if i = j.
                          ij
              (We will record the value as 0 in this case.)

            - If i < j, then the subtrees of T   are T         and T
                                              ij      i r - 1       r   j
                                                         ij          ij
              (Clearly, if T   is optimal then its subtrees must be also.)
                            ij
                            
            - We consider each possible value for r   and then pick the one that
                                                   ij
              yields the lowest value for c   .  Because we build the tree up by
                                           ij
              first considering trees containing 0 keys, then 1, then 2 ... we have
              already calculated the w and c values we need to perform this comparison.
              
            - It turns out that, in exploring possible values for r   , we don't need
                                                                   ij 
              to consider values less than r      or greater than r     , which greatly
                                            i j-1                  i+1 j
              reduces the effort.
                                           
       6. As an example, the operation of the algorithm for four keys looks like this, 
         if the probabilities are: p = (3/16, 3/16, 1/16, 1/16) 
         and q = (2/16, 3/16, 1/16, 1/16, 1/16).
         
          PROJECT - For convenience, the probabilities are multiplied by 16, which doesn't
          affect the correct operation of the algorithm but eliminates a lot of "/16" fractions.

         a. The first row represents empty trees, whose weights are simply
            the appropriate "q" value, whose costs are 0, and whose roots
            are undefined.

         b. The second row represents trees containing just one key.
            In each case, the weight is the sum of the weight of the one key
            plus the weights of the two adjacent failure nodes, and the
            cost is the weight of the one key (since the costs of failure
            nodes are zero.)  The root, of course, is the one key.

         c. The third row represents the optimal choice for constructing
            trees of two nodes.  

            i. For example, the first entry represents a tree including keys 1
                and 2 - i.e. T  .  The two options would have been to let key 1
                              02
               be the root or key 2 be the root.  Calculating the costs:

               - if key 1 is the root, then the cost is 

                 p  + w   + w   + c   + c  = 3 + 2 + 7 + 0 + 7 = 19
                  1    00    12    00    12

               - if key 2 is the root, then the cost is

                 p  + w   + w   + c   + c  = 3 + 8 + 1 + 8 + 0 = 20
                  2    01    22    01    22 

               Thus, 1 is chosen as r   and the cost of 19 is recorded.
                                     02
           ii. The remaining entries in the row are calculated in the
               same way.  Note that the weights and costs needed to compare
               root choices are always available from previous rows.

          d. Subsequent rows represent optimal trees with 3 and then 4
             keys.  The latter is, of course, the final answer.

             Note that, in each case, we consider all viable possibilities for the root
             using information already recorded in the table, and then select the one
             with the lowest cost.

       7. This algorithm is implemented by the following program:

         PROJECT CODE

       8. Time complexity?  (ASK CLASS)
      
         a. At first it may appear to be theta(n^3) [ three nested loops ]
         
          b. The code incorporates an improvement suggested by Donald Knuth that makes 
             this theta(n^2): when searching for the optimal root of each entry, it limits 
             the range of candidate roots considered, again taking advantage of previously 
             computed values.  We won't pursue this.
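
          The projected obst program is not reproduced here, but a minimal C++ sketch
          of the recurrences from 5.b - 5.d (all names below are my own) looks like
          the following.  It fills the w, c, and r tables diagonal by diagonal; for
          simplicity it searches the full range of candidate roots for each entry, so
          this version is theta(n^3) rather than using the Knuth range restriction.

             #include <iostream>
             #include <vector>
             using namespace std;

             // p[1..n] = key probabilities, q[0..n] = failure-node probabilities.
             // w[i][j], c[i][j], r[i][j] = weight, cost, and root of the optimal
             // tree T(i,j) containing keys i+1 .. j.
             void optimalBST(const vector<double> & p, const vector<double> & q, int n,
                             vector< vector<double> > & w,
                             vector< vector<double> > & c,
                             vector< vector<int> > & r)
             {
                 w.assign(n + 1, vector<double>(n + 1, 0));
                 c.assign(n + 1, vector<double>(n + 1, 0));
                 r.assign(n + 1, vector<int>(n + 1, 0));

                 for (int i = 0; i <= n; i ++)          // empty trees: weight q[i], cost 0
                     w[i][i] = q[i];

                 for (int len = 1; len <= n; len ++)          // trees with len keys
                     for (int i = 0; i + len <= n; i ++)
                     {
                         int j = i + len;
                         // add key j and failure node j to T(i,j-1); this equals
                         // p(root) + weight(left subtree) + weight(right subtree)
                         w[i][j] = w[i][j - 1] + p[j] + q[j];

                         // try each candidate root k; the theta(n^2) version would
                         // only try k in the range r[i][j-1] .. r[i+1][j]
                         double best = -1;
                         for (int k = i + 1; k <= j; k ++)
                         {
                             double cand = w[i][j] + c[i][k - 1] + c[k][j];
                             if (best < 0 || cand < best) { best = cand; r[i][j] = k; }
                         }
                         c[i][j] = best;
                     }
             }

             int main()
             {
                 // the 4-key example above, with all probabilities scaled by 16
                 int n = 4;
                 vector<double> p = { 0, 3, 3, 1, 1 };    // p[0] unused
                 vector<double> q = { 2, 3, 1, 1, 1 };
                 vector< vector<double> > w, c;
                 vector< vector<int> > r;
                 optimalBST(p, q, n, w, c, r);
                 cout << "cost of optimal tree = " << c[0][n]
                      << ", root = key " << r[0][n] << endl;
             }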
          
VI. Randomized Algorithms
--  ---------- ----------

   A. A final category of algorithm design approaches we want to consider
      is randomized algorithms.  

      1. One variant on this approach is to use randomization to deal with
         the possibility of worst case data sets.

      2. A second variant arises when exhaustively testing all the data we 
         need to test to get a guaranteed answer is computationally infeasible.
         In such a case, it may be possible to test a random sample
         and get an answer that is sufficiently reliable.

   B. As an example of the first category of uses of randomization, consider
      quick sort.  

      1. We know that if we choose the first element in the unsorted data as 
         the pivot element, the algorithm degenerates to O(n^2) performance
         in the case where the data is already sorted in either forward or
         reverse order.

      2. Now consider what would happen if we chose a RANDOM element as the
         pivot element.

         a. Obviously, it could still be the case that we happen to make
            a bad choice - indeed, we could end up with a bad choice even
            if the data itself is random, if we happened to choose the
            smallest (or largest) element.

         b. However, the probability of making a bad choice is small, and
            the probability of making bad choices over and over again on
            successive iterations becomes increasingly small.  

         c. Further, the pathological case of already sorted data now poses
            no more problem than any other data set.  If there is a
            significant probability that we will have significant pre-existing
            order in our data, randomly choosing the pivot element may
            greatly reduce the likelihood of pathological behavior (though it
            cannot eliminate it, of course.)
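
          A minimal sketch of this in C++ (the partitioning scheme and names are my
          own assumptions, not tied to any particular quicksort we looked at earlier):
          the only change from a deterministic quicksort is swapping a randomly chosen
          element into the pivot position before partitioning.

             #include <cstdlib>      // random(), srandom()
             #include <algorithm>    // swap()
             using namespace std;

             // Quicksort a[lo..hi] with a randomly chosen pivot, so already-sorted
             // input is no more dangerous than any other input.
             // (Call srandom(someSeed) once before the first call.)
             void randomizedQuickSort(int a[], int lo, int hi)
             {
                 if (lo >= hi) return;

                 // pick a random index in lo..hi and move that element to the front
                 int pivotIndex = lo + random() % (hi - lo + 1);
                 swap(a[lo], a[pivotIndex]);

                 // standard partition around a[lo]
                 int pivot = a[lo];
                 int i = lo;
                 for (int j = lo + 1; j <= hi; j ++)
                     if (a[j] < pivot)
                         swap(a[++ i], a[j]);
                 swap(a[lo], a[i]);          // pivot lands in its final position

                 randomizedQuickSort(a, lo, i - 1);
                 randomizedQuickSort(a, i + 1, hi);
             }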

   C. As an example of the second category of uses of randomization, consider
      testing an integer to see if it is prime.

      1. This is an important problem in connection with cryptography, since
         the most widely used encryption scheme generates its key from two
         large prime numbers (potentially 100's of bits.)

      2. To exhaustively test an integer n to see if it is prime, we would
         have to try dividing it by all possible factors less than or equal to
         sqrt(n).  This would seem to be an O(n^1/2) operation, which is
         certainly not bad.  However, when dealing with cryptographic
         algorithms, we tend to use the NUMBER OF BITS as the measure of
          problem size.  For a b bit number, the maximum value is 2^b - 1, and
          we need to test possible factors in the range 2 .. 2^(b/2).  This
         means exhaustively testing an integer to see if it is prime takes
         time exponential in the number of bits.

       3. There are various results from number theory that allow us to test the
          number against a small set of randomly-chosen values (often called
          "witnesses"), rather than against all possible factors.  If any of these
          tests declares the number to be composite, it is definitely not prime.  If
          the number passes all the tests, we can say with a very high probability
          that it is prime.  (Since I don't claim any expertise in the relevant
          number theory, I leave the details to someone like Dr. Crisman.)
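
          As an illustration only (this is my own sketch of the Miller-Rabin test for
          64-bit odd numbers, not part of the course materials, and real cryptographic
          primes are far larger): each round picks a random witness a and checks a
          condition that every prime must satisfy; failing any round proves the number
          composite, while passing many rounds makes primality overwhelmingly likely.
          The code uses the GCC/Clang unsigned __int128 extension to avoid overflow
          when multiplying two 64-bit values.

             #include <cstdint>
             #include <cstdlib>      // random()
             using namespace std;

             static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
             {
                 return (uint64_t)((unsigned __int128) a * b % m);
             }

             static uint64_t powmod(uint64_t a, uint64_t e, uint64_t m)
             {
                 uint64_t result = 1;
                 a %= m;
                 while (e > 0)
                 {
                     if (e & 1) result = mulmod(result, a, m);
                     a = mulmod(a, a, m);
                     e >>= 1;
                 }
                 return result;
             }

             // Returns false if n is definitely composite; returns true if n passed
             // every round.  A composite n slips through all the rounds with
             // probability at most 4^-rounds.
             bool probablyPrime(uint64_t n, int rounds = 20)
             {
                 if (n < 2)      return false;
                 if (n < 4)      return true;           // 2 and 3
                 if (n % 2 == 0) return false;

                 uint64_t d = n - 1;                    // write n - 1 as d * 2^s, d odd
                 int s = 0;
                 while (d % 2 == 0) { d /= 2; s ++; }

                 for (int round = 0; round < rounds; round ++)
                 {
                     // random() only supplies 31 bits; a real implementation would
                     // draw a full-width random witness in 2 .. n-2
                     uint64_t a = 2 + (uint64_t) random() % (n - 3);
                     uint64_t x = powmod(a, d, n);
                     if (x == 1 || x == n - 1) continue;          // passes this round
                     bool composite = true;
                     for (int i = 1; i < s && composite; i ++)
                     {
                         x = mulmod(x, x, n);
                         if (x == n - 1) composite = false;       // passes this round
                     }
                     if (composite) return false;                 // definitely composite
                 }
                 return true;                                     // almost certainly prime
             }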

   D. One further issue with using a randomized algorithm, of course, is how
      do we get random numbers on a deterministic machine?

      1. Absent very specialized hardware, the answer is that we settle for
         PSEUDORANDOM SEQUENCES that behave, statistically, like random
         numbers.

      2. One good way to generate such a sequence is by using a linear
         congruential generator, which generates each new element of the
         sequence x(i+1) from the previous member of the sequence x(i) by
         using the congruence:

                x    = A x   mod M
                 i+1      i

         for appropriately chosen values of A and M.  

      3. It is important to choose appropriate values of A and M, and also to 
         deal appropriately with the possibility of overflow in the computation.  
         (Multiplying two 32-bit integers can yield a product as big as 64 bits).  
          Some widely-used "random number" functions actually have some very bad 
         characteristics.
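
          As an illustration (my own sketch - the constants A = 16807 and M = 2^31 - 1
          are the classic Park-Miller "minimal standard" choice, not something these
          notes prescribe), the overflow problem can be avoided simply by doing the
          multiplication in 64 bits:

             #include <cstdint>

             // Lehmer / linear congruential generator:  x(i+1) = A * x(i) mod M
             class LinearCongruential
             {
             public:
                 explicit LinearCongruential(uint32_t seed) : x(seed % M)
                 { if (x == 0) x = 1; }                 // the state must never be 0

                 uint32_t next()
                 {
                     // 16807 * x fits comfortably in 64 bits, so no overflow
                     x = (uint32_t)(((uint64_t) A * x) % M);
                     return x;
                 }

             private:
                 static const uint32_t A = 16807;            // multiplier
                 static const uint32_t M = 2147483647;       // modulus, 2^31 - 1
                 uint32_t x;                                 // current state
             };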

      4. As a practical matter, when writing randomized algorithms on a
         Unix system, use the newer random number function random() instead
         of the older rand(), whose lower bits cycle through the same
         pattern over and over.  (On Linux systems, rand() is actually
         random() - the old rand() is not used.)