CPS222 Lecture: Disk-Based Search Structures; B-Trees    Last revised 1/30/2013

Objectives:

1. To introduce hashtables on disk.
2. To introduce B-Trees and a key variant (the B+ tree)


I. Introduction
-  ------------

   A. All of the search structures we have considered thus far have one thing 
      in common: they are stored in main memory, where access to any item
      is equally fast (true random access).  However, it is often the case
      that we must build search structures on disk rather than in primary
      memory, for two reasons:

      1. Size.  Large structures cannot be kept in their entirety in main
         memory.

      2. Permanency.  Structures in main memory are volatile and need to be 
         created (or read in from disk) whenever a program using them is run.

      One important use of disk-based search structures is to index tables in
      databases, to expedite operations like selection and natural join.  (The
      use of an index avoids the need to read every row in the table to find
      a desired key value.)

   B. When we build structures on disk, we must deal with certain realities of
      access and transfer time:

      1. Random access to disk typically requires on the order of 10-20 ms
         access time to position the head and wait for data to come up under
         it.  This is equivalent to about 10 million CPU cycles!

      2. However, once the head position is right, data can be transferred at 
         rates in excess of 100 million bytes/sec.

      3. Observe, then, how total transfer times behave for different size
         blocks (assuming a 10 ms access time, and 100 megabyte/sec transfer 
         rate)

         Size of block   Access time     Transfer time   Total time

              1 byte      10 ms          .01 micro-sec   10.00001 ms
             10 bytes     10 ms          .1 micro-sec    10.0001 ms
            100 bytes     10 ms          1 micro-sec     10.001 ms
           1000 bytes     10 ms          10 micro-sec    10.01 ms
          10000 bytes     10 ms          100 micro-sec   10.1 ms
         100000 bytes     10 ms          1 ms            11 ms

         Clearly, then, transfers have very high overheads for access time
         (often in excess of 99%), so one would prefer to organize a search 
         structure in such a way as to allow fairly large transfers to/from 
         disk.
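To make the overhead concrete, the table's figures can be reproduced with a few lines of arithmetic (a sketch; the constants are the assumed 10 ms access time and 100 million bytes/sec transfer rate):

```python
# Total time to fetch a block = fixed positioning cost + size-dependent
# transfer cost.  These figures match the table above.
ACCESS_MS = 10.0                    # 10 ms per random access
TRANSFER_BYTES_PER_MS = 100_000     # 100 million bytes/sec = 100,000 bytes/ms

def total_time_ms(block_size_bytes):
    return ACCESS_MS + block_size_bytes / TRANSFER_BYTES_PER_MS

def access_overhead(block_size_bytes):
    """Fraction of the total time spent just positioning the head."""
    return ACCESS_MS / total_time_ms(block_size_bytes)

for size in [1, 10, 100, 1000, 10000, 100000]:
    print(f"{size:>7} bytes: {total_time_ms(size):.5f} ms total, "
          f"{100 * access_overhead(size):.1f}% access overhead")
```

Even at 100,000 bytes per block, over 90% of the total time is still access overhead - hence the preference for large transfers.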

      4. For this reason, disk files are typically block-oriented.

         a. Data is stored in blocks of some fixed size (determined by disk
            geometry) - typically a power of 2 ranging from 512 on up.  While
            access/transfer time considerations argue for fairly large blocks,
            too large of a block can result in wasting space in the last block
            of a file when the file size is not a multiple of the block size -
            as it rarely is.  (On the average, each file wastes about 1/2 block
            of storage.)

         b. Blocks in a given file are numbered 1, 2, 3, ... (there is no
            block 0.)  Any given block can be accessed at any time by
            specifying its block number, which serves as a kind of on-disk
            "pointer" to the block.  (In the discussion that follows, when
            we use the term "pointer" we mean such a block number.)  A block
            number of 0 refers to a non-existent block - the disk equivalent
            of a null pointer.

         c. However, different blocks are usually stored at different
            places on the disk, so that accessing data in two different
            blocks entails the costs associated with two disk accesses.
            (Defragmentation of a disk can bring consecutive blocks
            physically together, which reduces this cost somewhat, but the
            benefits are not permanent, and the code cannot be written to 
            assume the ability to access two successive blocks in one 
            operation.)

      5. Most of the structures we have considered do not lend themselves well
         to on-disk implementation because they require accesses to too many
         different parts of the structure.
         
         E.g. a binary search tree of 1000 nodes has minimum height 10, so
         even a perfectly balanced tree could require 10 disk accesses per
         search.

II. Hashtables on disk
--  ---------- -- ----

   A. While most of the search structures for main memory that we have looked
      at do not adapt well for use on disk, one does - the hashtable.
      
   B. Recall that a hashtable consists of a number of buckets, each of which
      in turn consists of a number of slots that can hold a key and its
      associated value, with the hash function determining what bucket holds
      a given key-value pair.
      
      1. When a hashtable is stored in main memory, normally each bucket has
         just a single slot.
         
      2. But when a hashtable is stored on disk, an entire block is normally 
         used as a bucket, with as many slots as there is room in a block for
         key-value pairs.
         
         Example: If the block size is 4096, and a key-value pair requires
         100 bytes, then a bucket will have 40 slots (with some bytes
         available for overhead or wasted)
         
         a. In this case, the hash function determines which disk block should
            hold a given key and its associated value, with a search of the
            bucket being needed to find the correct slot.
            
         b. However, the time for this search is typically very small relative
            to the access time for the bucket in the first place.
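The slot count in the example is just integer division of block size by pair size; a quick sketch (the function name is ours, not from any particular system):

```python
def slots_per_bucket(block_size, pair_size):
    """Number of key-value slots that fit in one disk block."""
    return block_size // pair_size

# The example above: 4096-byte blocks, 100-byte key-value pairs
print(slots_per_bucket(4096, 100))   # 40 slots, with 96 bytes left over
```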
            
   C. Another difference between in-memory and on-disk hashtables arises in
      conjunction with handling collisions.
      
      1. When a collision occurs with an in-memory hashtable, a strategy like
         linear probing is used to find an available slot for the item being
         added.
         
      2. Of course, collisions are less frequent when using a bucket with
         multiple slots; but if a bucket fills up, the normal strategy is to
         allocate an additional disk block, with the original block containing
         a pointer to this overflow block.  This strategy is called chaining.
         
         Example: Suppose we had a hashtable on disk using a bucket size of 5,
         and six keys hashed to the same bucket.  The following situation would
         result:
         
         ---------------------    ---------------------
         | Primary bucket    |    | Overflow bucket   |
         | [ first five keys |    | [ sixth key and   |
         |  and associated   |    |   associated      |
         |  values ]         |    |   value ]         |
         |               o---|--->|                   |
         ---------------------    ---------------------
         
         a. At this point, any new keys that hash to this chain would be added
            to the overflow bucket until it fills up - at which time another
            overflow bucket could be allocated, with the chain consisting of
            three buckets.
         
         b. In the worst case - typically resulting from an overfilled table or
            a poor hash function - a chain could become long, resulting in
            performance tending toward O(n) rather than O(1).  But this is
            improbable if the hash function and table size are chosen well.
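The chaining scheme can be sketched with a toy in-memory model of the on-disk layout (bucket capacity, class and function names are our choices for illustration):

```python
BUCKET_SLOTS = 5   # slots per bucket, as in the example above

class Bucket:
    def __init__(self):
        self.pairs = []        # up to BUCKET_SLOTS key-value pairs
        self.overflow = None   # stands in for the on-disk overflow pointer

def insert(buckets, key, value):
    b = buckets[hash(key) % len(buckets)]
    # Walk the chain until we find a bucket with a free slot
    while len(b.pairs) == BUCKET_SLOTS:
        if b.overflow is None:
            b.overflow = Bucket()   # allocate an overflow bucket
        b = b.overflow
    b.pairs.append((key, value))

def lookup(buckets, key):
    b = buckets[hash(key) % len(buckets)]
    while b is not None:            # search the whole chain
        for k, v in b.pairs:
            if k == key:
                return v
        b = b.overflow
    return None

# Six keys forced into one bucket (table of size 1) produce a chain of two,
# as in the diagram above
table = [Bucket()]
for i, k in enumerate("ABCDEF"):
    insert(table, k, i)
print(len(table[0].pairs), len(table[0].overflow.pairs))   # 5 1
```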

III. B-Trees
---  -------

   A. Now, we consider a search structure specifically designed for use with
      disk files: the B-Tree.

      1. A B-Tree is a form of search tree in which breadth is traded for depth.
         Each node contains multiple keys (instead of 1 as in a binary search
         tree), and so has multi-way branching rather than 2-way branching.

      2. The result is a very "bushy" sort of tree, with typical heights in the
         range 2-4, thus requiring only 2-4 disk accesses per operation.

      3. Further performance improvements can be had by keeping a copy of the 
         root block in main memory while the file is open, reducing the
         effective height of the tree by 1.  It may also be possible to
         cache the blocks at the next level down.  These two steps could
         reduce the number of disk accesses needed for most operations to 1
         or 2.
         
   B. Preliminary: An m-way search tree.

      1. An m-way search tree is a tree of degree m in which:

         a. Each node having s children (2 <= s <= m) contains s-1 keys.
            Let the children be called t_0 .. t_(s-1) and the keys
            k_0 .. k_(s-2).

         b. The keys and children obey the following properties:

            i.   k_0 < k_1 < k_2 < ... < k_(s-2)   (or <= if duplicate keys
                 are allowed; we will assume < and no duplicates for our
                 examples.)

            ii.  All the keys in child t_0 are < k_0.

            iii. All the keys in child t_i (1 <= i < s-1) lie between
                 k_(i-1) and k_i.

            iv.  All the keys in child t_(s-1) are > k_(s-2).

      2. Observe that a binary search tree is simply an m-way search tree
         with m = 2.

      3. Examples of 4-way search trees:

                        C     F    K                    A
                      /    /    \    \                 / \
                    A B   D E    HIJ  L                   B
                   / | \ / | \  /||\  /\                 / \
                                                            C
        (where the empty children are called               / \
         failure nodes, and are represented by                D
         "null pointers")                                    / \
                                                                E
                                                               / \
                                                                  F
                                                                 / \
                                                                    etc..

         a. Clearly, the first is much more desirable than the second!  

         b. Note that if the first example were implemented as a binary search
            tree instead, it would have height 4 instead of 2, a sizable cost
            increase at 10ms per disk access.  (The savings become even larger
            as m increases.  For example, the first example could be implemented
            as a 12-way search tree with only 1 level.)

      4. Observe that a 2-3-4 tree is simply a variant of a 4-way search
         tree.  In fact, B-Trees generalize some of the ideas we used with
         2-3-4 trees, though the standard algorithms for maintaining B-Trees
         are slightly different from those used with 2-3-4 trees.

      5. When a search tree is stored on disk, each node is typically one
         block (or cluster) and the branching factor m is chosen so that
         a node with a maximal number of keys and children just fits in
         one block (cluster).

   C. Definition of a B-Tree

      1. As was true with binary search trees, we recognize that m-way
         search trees can be very efficient if well-balanced, but have
         undesirable degenerate cases.  With binary search trees, we defined
         variants such as AVL trees and Red-Black trees that avoid the
         degenerate behavior.  We do the same here.

      2. A B-Tree of order m is an m-way search tree in which:

         a. All the failure nodes are on the same level.  (The term "failure
            node" refers to the empty subtree one ultimately encounters when
            searching for a nonexistent key.  Of course, there isn't
            really any such node - it is represented by an impossible block
            number (corresponding to a null pointer in an in-memory tree.))

         b. Each internal (non-failure) node, except the root, has at least
            ceil(m/2) children.

         c. The root, if it is not itself a failure node (which would mean
            the tree is totally empty), has at least 2 children.

         d. Of our two examples, only the first is a B-Tree

      3. Examples: which of the following is/are B-Tree(s) of order 5:
        
         E J O              E J O             E  J  O            E  J  O
       /  / \  \          / /  \  \         /  / \   \          /  / \   \
     ABC  F KLM PQRS   ABF  GH  KL PQRS    AB  FI  KL  PQRS   AB  HI  KL  PQRS
    //\\ /\ //\\//|\\ //\\ /|\ /|\ //|\\  /|\ /|\ /|\ //|\\   /|\ /|\ /|\ //|\\
                                                                 FG
                                                                 /|\
     no: Node "F"    no: the tree is    yes             no: all the failure
     has only 2      not a search tree,                 nodes are not on the
     children        since F > E                        same level

      4. Note: because all the failure nodes of a B-Tree are on the same
         (bottom) level, we normally do not bother to draw them.  Thus,
         we will draw the one good tree in the above example as follows from
         now on:
                                E  J  O
                               /  / \   \
                             AB  FI  KL  PQRS

   D. Some properties of a B-Tree

      1. What is the MAXIMUM number of KEYS in a B-Tree of order m of
         height h?  (Measuring height in terms of the number of NODES).
                
         a. In such a tree, each non-failure node would have the maximal
            number of children (m), and thus the maximal number of keys (m-1).
            Thus, we would have:

            1 node              m-1 keys                 at level 1
            m nodes             m * (m-1) keys           at level 2
            m**2 nodes          m * m * (m-1) keys       at level 3
            ...
            m**(h-1) nodes      m**(h-1) * (m-1) keys    at level h
                                --------------------
                                m**h - 1 keys total

         b. Compare our result for complete binary trees of height h -
            2**h - 1 nodes.  (h measured in NODES)

      2. What is the MINIMUM number of KEYS in a B-Tree of order m of
         height h?  (Measuring height in terms of the number of NODES).

         a. In such a tree, the root would have only 2 children (1 key), since
            this is the minimum allowed for a root.  All other nodes would
            have ceil(m/2) children, and ceil(m/2) - 1 keys.

         b. For convenience, let c = ceil(m/2).

         1 node                 1 key                   at level 1
         2 nodes                2 * (c-1) keys          at level 2
         2*c nodes              2 * c * (c-1) keys      at level 3
         2*c**2 nodes           2 * c**2 * (c-1) keys   at level 4
         ...
         2*c**(h-2) nodes       2 * c**(h-2)*(c-1) keys at level h
                                -----------
                                2 * [c**(h-1)-1] + 1 =

                                2 * c**(h-1) - 1 keys total
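Both closed forms can be sanity-checked by summing the per-level counts directly (a quick check, not part of the derivation):

```python
from math import ceil

def max_keys(m, h):
    """(m-1) keys in each of 1 + m + ... + m**(h-1) nodes."""
    return sum(m**(level - 1) * (m - 1) for level in range(1, h + 1))

def min_keys(m, h):
    """1 key at the root, then 2 * c**(level-2) nodes of c-1 keys each below."""
    c = ceil(m / 2)
    return 1 + sum(2 * c**(level - 2) * (c - 1) for level in range(2, h + 1))

# The level sums agree with the closed forms m**h - 1 and 2*c**(h-1) - 1
for m in (3, 4, 5, 200):
    for h in (1, 2, 3, 4):
        assert max_keys(m, h) == m**h - 1
        assert min_keys(m, h) == 2 * ceil(m / 2)**(h - 1) - 1
print("both closed forms check out")
```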

      3. To determine the height of a B-Tree of order m containing n keys, we 
         solve each of the above for h, as follows:

         a. From the equation for the maximum number of keys, we know:

                n <= m**h - 1

            or, solving for h:

                n+1 <= m**h

                log_m(n+1) <= h

            Now, since h must be an integer, we can take the ceiling of the
            log to obtain:

                ceil(log_m(n+1)) <= h

         b. From the equation for the minimum number of keys, we know:

                n >= 2 * c**(h-1) - 1

            or, solving for h

                (n + 1)
                ------- >= c**(h-1)
                   2

                log_c((n+1)/2) >= h-1

                h <= 1 + log_c((n+1)/2)       (where c = ceil(m/2))

            Now, since h must be an integer, we can use the floor of the log
            to obtain:

                h <= 1 + floor(log_c((n+1)/2))

         c. Combining the above results from minimal and maximal trees, we 
            obtain the following bounds for h:

                ceil(log_m(n+1)) <= h <= 1 + floor(log_c((n+1)/2)),
                where c = ceil(m/2)

      4. Some examples:

         a. 1 million keys - B-Tree of order 200: height is 3 

            - Lower bound is ceil(log_200(1,000,001)) = 3

            (Note that a maximal tree of height 2, order 200, contains
             39,999 keys - so tree must have height at least 3)

            - Upper bound is 1 + floor(log_100(500,001)) = 3

            (Note that a minimal tree of height 4, order 200, contains 
             1,999,999 keys, so tree must have height no greater than 3)

         b. 2 million keys - B-Tree of order 200: 

            - Lower bound is still 3

            - Upper bound is now 4

            so the height could be 3 or 4.
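These bounds are easy to check numerically; the sketch below counts heights by integer comparison rather than floating-point logs, to avoid rounding trouble near exact powers:

```python
from math import ceil

def height_bounds(n, m):
    """Smallest and largest possible height of a B-Tree of order m
    holding n keys, using the max/min key counts derived above."""
    c = ceil(m / 2)
    lo = 1
    while m**lo - 1 < n:          # maximal tree of height lo still too small?
        lo += 1
    hi = 1
    while 2 * c**hi - 1 <= n:     # minimal tree of height hi+1 still fits?
        hi += 1
    return lo, hi

print(height_bounds(1_000_000, 200))   # (3, 3): height is exactly 3
print(height_bounds(2_000_000, 200))   # (3, 4): height is 3 or 4
```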

   E. An important note: In our discussion, we have talked only about nodes
      containing KEYS.  In practice, we build search structures to allow us
      to associate keys with VALUES (e.g. name = key; phone number and address
      = value).  In the form of B-Tree we are discussing, then, a node actually
      contains s pointers to children, s-1 keys, and s-1 values.  These can be
      stored in one of two ways:

      1. The actual value can be stored in the node.  This, however, can reduce
         m, and thus the branching factor of the tree, if the size of the value
         is large compared to that of the key (as it often is).

         Example: node size 8000, key length 12, pointer size 4 bytes, allows
                  m = 500 if we don't store any value with the key
                  (since 4m + 12(m-1) <= 8000 gives m <= 500)

                  if we also have to store a value of size 36, however, we
                  would reduce m to 154 (since 4m + 48(m-1) <= 8000)

      2. The node can contain the number of another disk block that stores the
         actual value.  (This additional pointer adds minimally to the size of
         the node.)  However, this means that successful searches require an
         additional access to get the data.

      3. We can use a variant of the B-Tree called a B+ Tree - to be discussed
         shortly.

IV. Operations on B-Trees:
--  ---------- -- -------

    A. For our examples, we will use the following B-Tree of order 3 (sometimes
       called a 2-3 tree, since each node has 2 or 3 children and 1 or 2 keys). 
       We use order 3 to keep the size of the examples down:

                 J   T

         C F      M P         Y

      AB DE GH  KL NO R    UW   Z

    B. Locate a given key k in a (sub) tree whose root is block t.  (Assume
       blocks in the file are numbered 1 .. , with 0 denoting a failure
       node.)

        InfoType locate(KeyType k, int t)
        {
            if (t == 0)
                report failure - k is not in the tree
            else
            {
                read block t from the disk
                determine how many keys it holds
                int i = 0;
                while (i < number of keys && key[i] < k)
                    i ++;
                if (i < number of keys && key[i] == k)
                    return associated information
                else
                    return locate(k, child[i])
            }
        }               

        Example: Locate J: succeeds immediately at the root

                 Locate L: at the root, we end the while loop with i = 1,
                           since key[1] = 'T', so we go to the second child of 
                           the root

                           in that child, the while loop exits with i = 0,
                           (since key[0] = 'M' >'L'), so we go to the first 
                           child.

                           In that node, we find what we are looking for.

                 Locate Z: at the root, we end the while loop with i = 2,
                           since i = Number of keys, so we go to the third
                           child of the root

                           in that child, the while loop exits with i = 1, for
                           the same reason, so we go to the second child.

                           In that node, we find Z

                 Locate X: at the root, we end the while loop with i = 2

                           In the third child, the while loop exits with i = 0,
                           since key[0] = 'Y' > 'X', so we go to the first 
                           child.

                           In that node, we exit the while loop with i = 2.
                           Since the third child (and all children, in fact) of
                           this node is a failure node, the search fails.
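The pseudocode above translates almost line for line into the following runnable sketch. The blocks dict models the example tree's disk blocks (the block numbering and node layout are our assumptions; values are omitted, so a successful search simply returns the number of the block where the key was found):

```python
# Each "block" holds sorted keys and child block numbers (0 = failure
# node / null pointer).  Block 1 is the root, per the numbering convention.
blocks = {
    1:  {"keys": ["J", "T"], "children": [2, 3, 4]},
    2:  {"keys": ["C", "F"], "children": [5, 6, 7]},
    3:  {"keys": ["M", "P"], "children": [8, 9, 10]},
    4:  {"keys": ["Y"],      "children": [11, 12]},
    5:  {"keys": ["A", "B"], "children": [0, 0, 0]},
    6:  {"keys": ["D", "E"], "children": [0, 0, 0]},
    7:  {"keys": ["G", "H"], "children": [0, 0, 0]},
    8:  {"keys": ["K", "L"], "children": [0, 0, 0]},
    9:  {"keys": ["N", "O"], "children": [0, 0, 0]},
    10: {"keys": ["R"],      "children": [0, 0]},
    11: {"keys": ["U", "W"], "children": [0, 0, 0]},
    12: {"keys": ["Z"],      "children": [0, 0]},
}

def locate(k, t):
    if t == 0:
        return None                    # failure node: search fails
    node = blocks[t]                   # stands in for "read block t"
    i = 0
    while i < len(node["keys"]) and node["keys"][i] < k:
        i += 1
    if i < len(node["keys"]) and node["keys"][i] == k:
        return t                       # found in block t
    return locate(k, node["children"][i])

print(locate("L", 1))   # 8: found in block KL after two descents
print(locate("X", 1))   # None: falls off at a failure node
```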

   C. Inserting a new node.  (Assume that we disallow duplicate keys).

      1. We first proceed as in locate, until we get to a leaf - that is,
         a node whose children are empty.  (Of course, if we find the key we
         are looking for along the way, we declare an error and quit.)

      2. If the leaf node we have arrived at contains fewer than the maximum
         number of keys, then we simply insert the new key at the appropriate
         point, and add an extra failure node child pointer (0).

        Example:        Insert S: We work our way down to the node containing
                                  R.  Since it contains only one key, and can
                                  hold two, we add S, and our tree becomes:

                 J   T

         C F      M P         Y

      AB DE GH  KL NO RS    UW   Z


        Note that inserting a new key in a leaf may require moving other
        keys over.

        Example:        Insert Q in the original tree.  Result:

                 J   T

         C F      M P         Y

      AB DE GH  KL NO QR    UW   Z
                       ^
                       |__ R has been moved over one place


      3. Life becomes more interesting if the leaf we reach is already
         full.  (E.g. consider trying to insert "X" in the above.)  In this
         case, we cannot add a new node on a lower level, since this would
         violate one of the B-Tree constraints.  Instead, we proceed as
         follows:

         a. Allocate a new node from a free list, or extend the file by one
            node.

         b. Redistribute the keys in the original node, plus the new key,
            so that:

            - The first half remain in the original node
            - The middle key in the order of keys is held out for a use to
              be explained shortly.
            - The second half go into the new node.

          Note:

            The key we were inserting can go into either of the nodes, or
            it might be the middle key.  (e.g. if we are inserting X, it
            will go into the new node; but if we are inserting V into
            the same node, it would be the middle key.)

         c. Insert the middle key we saved out, plus a pointer to the newly
            created node, into the parent at an appropriate point, just after
            the pointer that we followed to go down to the node we split.  Of
            course, this means we move keys and pointers into the parent to
            make room.

        Example: insert X into the original tree

                               __________ this key was promoted into parent
                 J   T        |
                              v
         C F      M P         WY

      AB DE GH  KL NO R    U  X   Z
                           ^  ^
                           |  |___ new node now contains X
                           |______ original node now contains only U

      4. Observe that this strategy guarantees that the resulting tree will
         still meet all the tests for a B-Tree.

         a. Clearly, all the leaves are still on the same level.

         b. What about the number of children of the new nodes?

          - If we are forced to split a node, it is because it contained
            the maximum number of keys before insertion - m-1.  With the
            new key, this gives m keys, to be divided two ways plus the
            promoted key.  This leaves m-1 keys to be divided.

            - If m is odd, then each node gets (m-1)/2 keys, and has
              (m+1)/2 children, which is exactly ceil(m/2), as required.

            - If m is even, then one node gets m/2 - 1 keys, and the other
              gets m/2 keys.  The smaller node then has m/2 children, which is 
              exactly ceil(m/2), as required.  (The larger node has more than 
              the minimum, which is fine.)
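The redistribution step can be sketched as a pure function on the node's sorted key list (children are omitted for brevity; the layout is ours):

```python
def split(keys, new_key):
    """Split an already-full node's keys plus one new key into
    (left keys, promoted middle key, right keys)."""
    all_keys = sorted(keys + [new_key])
    mid = len(all_keys) // 2
    return all_keys[:mid], all_keys[mid], all_keys[mid + 1:]

# Inserting X into the full node UW (order 3): W is promoted
print(split(["U", "W"], "X"))   # (['U'], 'W', ['X'])
# Inserting V into the same node: V itself is the middle key
print(split(["U", "W"], "V"))   # (['U'], 'V', ['W'])
```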

      5. Now what if there is no room for the promoted key in the parent?
         (Example: insert I into the original tree.  Node GH splits, with
         H to be promoted to node CF.  But this node has no room for another
         key and child.)

         a. Solution: split the parent as before, creating a new parent to
            hold half of the keys and pointers to half of the children.
            Again, promote one key and a pointer to the new node up one
            level.

         b. Note that, if carried to its ultimate, this can result in splitting
            the root, which is how the tree gains height.  At this point, the
            single middle key resulting from the splitting of the root becomes
            the one key in the new root.  (This is why we allow the root of a
            B-Tree to have as few as 2 children.)

        Example: insert I:

                    J
                /         \
              F             T
           /   \          /    \
         C     H        M P      Y
        / \   / \     / | \     /  \
      AB DE   G I   KL NO  R    UW   Z

      6. You will note that the approach we have taken to splitting nodes in
         a B-Tree is somewhat different from the one we used with 2-3-4 trees.
         Here, we have postponed splitting a node until absolutely necessary.
         
         a. If one splits a full node when it is not necessary to do so, the
            result won't be a B-Tree if m is odd - e.g. when inserting S into
            the above example, if we split MP (which we don't have to), and
            promoted one of the keys to the node containing T, we'd end up
            with one empty node in the middle of the tree!  If m were odd and
            greater than 3, we would not end up with an empty node, but would
            end up with one child having fewer than ceil(m/2) children.

         b. The price is more complex code; but given the time required for disk
            accesses this policy makes some sense, since it postpones height
            increases until the absolute last moment possible. (One could choose
            to compromise the B-Tree requirement and use the 2-3-4 tree approach
            of anticipating the need for splits on the way down the tree,
            however - trading simpler code for earlier splitting of the root.)

   D. Deletion from a B-Tree

      1. As we have seen in other kinds of trees, deleting a key from a
         leaf will be much simpler than deleting a key from an interior node.
         As before, then, we use the trick of converting a deletion from an
         interior node into a deletion from a leaf: we replace the key to be
         deleted with its successor - the first key in the leftmost leaf of
         the subtree just after the key - and then delete that key from the
         leaf.

         Example: to delete J from the root of our original tree, we would
                  promote K to take its place, and delete K from the leaf.

      2. Deleting a key from a leaf is basically trivial - we simply slide
         other keys over as necessary to fill in the gap.  

        Example: Delete N from our original tree:

                 J   T

         C F      M P         Y

      AB DE GH  KL  O R    UW   Z


      3. However, we can run into a problem if the leaf we are deleting from
         already contains the minimal number of keys.

        Example: Delete R from our original tree.

      4. In this case, we essentially reverse the process we used to deal with
         an over-full node on insert.

         a. We find one of the siblings of the node from which we are deleting
            (we can use either side.)

         b. We rearrange keys between the node we are working on and the sibling
            so as to give each the minimal number, if possible.  This will mean
            changing the divider key between them in the parent.

        Example: When deleting R from the original tree, we can combine R's
                 node with NO and rearrange as follows:

                 J   T

         C F      M O         Y

      AB DE GH  KL N  P    UW   Z

         c. If, as a result, we do not have enough keys to make two legal
            nodes (i.e. if the sibling we are using also contains the minimal
            number of keys), then we combine the two nodes into one, also
            removing a key and child pointer from the parent.

        Example: working with the above (not the original tree), we can now
                 try to delete P.  Since the only node we can combine with
                 is N, and it has the minimal number of keys already, we must
                 pull O down from the parent and combine everything into one
                 node, recycling the other:

                 J   T

         C F       M          Y

      AB DE GH   KL NO     UW   Z

         d. Of course, removing a key from the parent may get us into trouble
            as well.  (e.g. suppose that, in succession, we removed L, N, and
            then O from the above).  In this case, the parent may have to
            "borrow" keys and children from a sibling.  In an extreme case,
            we may even have to merge the parent with its sibling, and could
            ultimately even reduce the height of the tree if we had to merge
            two children of the root.

        Example: Recall the tree we got by splitting the root:

                    J
                /         \
              F             T
           /   \          /    \
         C     H        M P      Y
        / \   / \     / | \     /  \
      AB DE   G I   KL NO  R    UW   Z

        Suppose we now try to delete Z:
        - The vacated node borrows from its sibling UW: W moves up into
          the parent, and Y comes down.

                    J
                /         \
              F             T
           /   \          /    \
         C     H        M P      W
        / \   / \     / | \     /  \
      AB DE   G I   KL NO  R    U   Y

        Suppose we now delete Y: We must merge with U, but this pulls W
        out of the parent, leaving it with too few keys.  (Zero, in this
        case; but in a higher degree tree we're in trouble when the
        number of keys drops below ceil(m/2) - 1.)

        - We therefore rearrange keys and children with MP:

                    J
                /         \
              F             P
           /   \         /    \
         C     H       M       T
        / \   / \     / \     /  \
      AB DE   G I   KL  NO   R    UW
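The borrow-or-merge decision used in these deletions can be sketched at the leaf level.  The following is a minimal Python model (the function names and the list-based node representation are illustrative assumptions, not part of the lecture):

```python
import math

def min_keys(m):
    # Every node except the root must keep at least ceil(m/2) - 1 keys.
    return math.ceil(m / 2) - 1

def fix_underflow(parent_keys, i, left, right, m):
    """Repair an underfull right leaf using its left sibling:
    borrow through the parent if the sibling can spare a key,
    otherwise merge both leaves and their separator into one node."""
    if len(left) > min_keys(m):
        # Borrow: separator moves down, sibling's largest key moves up.
        right.insert(0, parent_keys[i])
        parent_keys[i] = left.pop()
        return left, right
    # Merge: pull the separator down and combine everything.
    return (left + [parent_keys.pop(i)] + right,)

# Deleting P above empties a leaf whose only sibling N is already at
# the minimum, so the separator O is pulled down and the nodes merge:
print(fix_underflow(['O'], 0, ['N'], [], 3))   # (['N', 'O'],)
```

In an order-3 tree the minimum is one key per node, so a one-key sibling forces the merge branch, just as in the example above.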
        
V. Variants of the B-Tree
-  -------- -- --- ------

   A. Achieving good performance with a disk-based search structure requires
      keeping the height of the tree down (2 is rarely practical, but 3 is
      often ideal) - which in turn entails using a large branching factor.
      Indeed, if we know the desired size of the tree, we can calculate the
      minimum branching factor needed to guarantee a certain height, using
      formulas derived earlier.  In particular, for a B-Tree, we can guarantee
      a maximum height of 3 as follows:

         n              Minimum m

        1000             15
        10,000           34
        100,000          73
        1,000,000       158
        10,000,000      341
        100,000,000     736

      (Clearly, m grows rather slowly with n)
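Assuming the usual occupancy bound (the root has at least 2 children; every other internal node at least ceil(m/2)), the minimum order for a given n can be computed directly.  A Python sketch follows; most entries agree with the table above, though an entry may differ by one depending on the rounding conventions of the earlier derivation:

```python
import math

def min_keys(m, levels):
    """Fewest keys a B-Tree of order m can hold with this many
    levels: the root has >= 2 children, every other internal node
    has >= ceil(m/2) children."""
    return 2 * math.ceil(m / 2) ** (levels - 1) - 1

def min_order(n, max_levels=3):
    """Smallest order m guaranteeing at most max_levels for n keys:
    a tree one level deeper must already require more than n keys."""
    m = 3
    while min_keys(m, max_levels + 1) <= n:
        m += 1
    return m

print(min_order(1_000))        # 15
print(min_order(100_000))      # 73
print(min_order(10_000_000))   # 341
```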

   B. However, considerations of node size may make it difficult to achieve
      the necessary branching factor.  Recall that, in a B-Tree, each node
      contains up to m pointers, m-1 keys, AND M-1 VALUES ASSOCIATED WITH THE
      M-1 KEYS.

      1. Suppose we need an m-value of 20 for some application in which the
         keys are 10 bytes long and the associated values are 100 bytes long.
         If a pointer is 4 bytes, then the minimum node size is

         4*20 + 19*(10+100) = 2170 bytes 

      2. For efficient performance, we must guarantee that each node can be
         read or written with a single disk access - which requires that the
         entire node reside in a single contiguous block on disk.  Since the
         block size is dictated by disk geometry and system software, the
         designer of a B-Tree is usually faced with a fixed upper bound on
         node size.
         
         Example: If we were working with a disk that restricted the block
                  (or cluster) size to 2048 bytes, then we could not achieve
                  the desired branching factor with these parameters.
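The sizing arithmetic above generalizes to a one-line formula; here is a small Python sketch (the function name is an assumption for illustration):

```python
def node_size(m, key_bytes, value_bytes, ptr_bytes=4):
    """Bytes needed by a B-Tree node of order m: m child pointers
    plus m - 1 keys, each with its associated value stored inline."""
    return m * ptr_bytes + (m - 1) * (key_bytes + value_bytes)

# The example from the text: 4*20 + 19*(10 + 100) = 2170 bytes,
# which will not fit in a 2048-byte disk block.
print(node_size(20, 10, 100))   # 2170
```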

   C. Two techniques can be used to hold the node size down while still
      achieving a desired branching factor.  These techniques lead to two
      variants of the B-Tree.

   D. An often-used variant of the B-Tree is the B+ tree.

      1. We saw that the total size of a node is generally the limiting factor
         in terms of the value of "m" that can be used for a B-Tree.  Here, of
         course, the main villain is usually the value that is stored in the
         node along with the key.  If the value is - say - 10 times bigger than
         the key, then its presence in the node reduces the potential branching
         factor by a ratio of almost 10:1!

      2. One way to address this would be to not store the values in the tree
         at all.

         a. Rather, each node would contain up to m child pointers, m-1 keys
            and m-1 POINTERS TO VALUES STORED ELSEWHERE.  

         b. The difficulty with this scheme, though, is that once the tree has
            been searched to find the desired key, an additional disk access is
            needed to find the data.  The effect on performance is the same as 
            if the height of the tree were increased by one, so this may undo 
            the gain obtained by using the higher branching factor.

      3. A B+ tree addresses this problem as follows:

         a. Values are only stored in the lowest level of the tree.  Nodes at
            higher levels contain keys, but not values.

         b. This means that the branching factor in the upper levels is much
            greater than the branching factor at the lowest level (where the
            children are failure nodes.)

            Example: assume nodes are 512 bytes, keys are 10 bytes, values are
                     90 bytes, and pointers are 4 bytes.

                     Each node in the lowest level of a B+ tree could store up 
                     to 5 key-value pairs, with 12 bytes to spare.  (No pointers
                     need be stored, because the 6 children are all failure 
                     nodes.)

                     Each node at upper levels would have branching factor 37.
                     It would store up to 36 key-pointer pairs, plus one extra
                     pointer, with 4 bytes to spare.

                     We assume that we can distinguish between a leaf and a
                     non-leaf node in some way during our processing - perhaps
                     by keeping track of the height of the tree or by tagging 
                     the node itself or the pointer to it in some special way.

         c. Of course, this means that all keys must occur at the lowest level
            of the tree, so that a value can be stored with them.  The keys in
            the upper levels, then, are copies of keys stored lower down; some
            keys are stored twice.  In particular, each upper level key is a 
             copy of the least key in the subtree to its right.  (Alternatively,
             we could store the greatest key in the subtree to the left.)

             Example: given the above scenario, assume that we have a B+ tree
                      that holds the 26 letters of the alphabet as keys.

                     Since the maximum branching factor of a leaf is 6, the
                     minimum branching factor would be 3, and each leaf would
                     hold 2-5 keys.  Thus, we would have 6-13 leaves, which 
                     could easily be accommodated as children of a single root
                     node. Thus, our tree might look like this:

                     -------------------------------------
                     | C    F    J    L   O    R   T   W |
                     -------------------------------------
                    /   |     |    |    |    |   |    |    \
                   AB  CDE  FGHI  JK   LMN  OPQ  RS  TUV  WXYZ

                     Note that the separator keys in the root are copies of the
                     first key in the leaf to the right of the separator.  We
                     could also have chosen to store the last key in the leaf
                     to the left.

         d. For this particular set of characteristics, we might contrast:

            i. This B+ tree of height 2, with plenty of room to grow without
               gaining height.

           ii. An ordinary B-Tree of order 6 (where all levels hold values) -
               which would be of height 3 for this configuration.

          e. Just to illustrate the concept of a B+ tree further, we consider
             what might happen with different assumptions about how many keys
             would fit in a non-leaf node, so that we end up with a three-level
             tree, like the following:
            
                                   -----
                                   | L |
                                   -----
                                   /   \
                     ---------------    ------------------
                     | C    F    J |    | O    R   T   W |
                     ---------------    ------------------
                    /   |     |    |    |    |   |    |    \
                   AB  CDE  FGHI  JK   LMN  OPQ  RS  TUV  WXYZ

             Note that the root key - L - is a copy of the smallest key in its
             right subtree; it doesn't actually occur in the root's child, but
             only at the bottom of the tree.
            
       4. A common modification of the B+ Tree is to add links across the
          bottom level, like this:
         
                                   -----
                                   | L |
                                   -----
                                   /   \
                     ---------------    ------------------
                     | C    F    J |    | O    R   T   W |
                     ---------------    ------------------
                    /   |     |    |    |    |   |    |    \
                   AB->CDE->FGHI->JK-->LMN->OPQ->RS->TUV->WXYZ
                   
         a. There is a slight overhead for this, of course - but typically
            the key/value size doesn't evenly divide the block size, so some
            extra bytes are available for the link.
            
         b. This arrangement facilitates range queries, where we want to find
            all entries whose keys lie in a certain range.
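A range query over the linked leaves might look like this in Python (a simplified sketch: the Leaf class and names are assumptions, and a real implementation would first descend the tree to the leaf containing the lower bound):

```python
class Leaf:
    """Stand-in for a B+ tree leaf: sorted keys plus the link to
    the next leaf (the arrows in the picture above)."""
    def __init__(self, keys, next_leaf=None):
        self.keys = keys
        self.next = next_leaf

def range_query(leaf, lo, hi):
    """Collect every key in [lo, hi] by walking the leaf chain
    instead of re-descending the tree for each key."""
    found = []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:        # keys are sorted, so we can stop early
                return found
            if k >= lo:
                found.append(k)
        leaf = leaf.next
    return found

# Build the chain AB -> CDE -> ... -> WXYZ from the figure.
head = None
for keys in reversed(["AB", "CDE", "FGHI", "JK", "LMN",
                      "OPQ", "RS", "TUV", "WXYZ"]):
    head = Leaf(list(keys), head)

print(range_query(head, "D", "H"))   # ['D', 'E', 'F', 'G', 'H']
```

Without the links, answering the same query would require a separate descent from the root for each key in the range.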

    E. A B* Tree of order m is an m-way search tree in which each node (save
       the root) has a minimum of ceil((2m-1)/3) children and a maximum of m
       children.

      1. Nodes in a B* Tree of order m have the same size as those in a
         B Tree of order m, but their minimum branching factor is greater.

      2. This is achieved by using the following strategy on insertion of
         a new key:

         a. If the leaf in which the key belongs has room for the new key,
            then it is put there (as with the basic B-Tree.)

         b. However, if the leaf is full, then instead of splitting the
            leaf we choose one of its siblings and attempt to redistribute
            keys between the two leaves.  (This is sort of the reverse of what
            we did when deleting a key from a B-Tree.)

             Example: A B* Tree of order 5 has 3-5 children (2-4 keys) for
                      each node other than the root.  Consider the following
                      example of such a tree:

                           E   K   P   V
                         /   |   |   |   \
                      ABCD FGHI LMN  RSTU XYZ

            If we go to insert Q, we find that leaf RSTU is full.  Instead
            of splitting it (which would force a split of the root and a new
            level in the tree), we combine RSTU with one of its siblings - say
            LMN - and rearrange keys between them and the divider in the
            parent to get

                           E   K   Q   V
                         /   |   |   |   \
                      ABCD FGHI LMNP RSTU XYZ

         c. If the chosen sibling is also full, then we combine the keys from
            the two nodes and split the result to give three nodes.  This
            preserves the ceil((2m-1)/3) branching factor.
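The key rearrangement in the insertion example can be sketched as pooling both siblings with their separator and splitting the pool evenly (a leaf-level Python sketch; the names are assumptions):

```python
def redistribute(parent_keys, i, left, right):
    """B* insertion fix-up: pool the overfull node's keys with its
    sibling's keys and the separator between them, then split the
    pool evenly, sending the median back up into the parent."""
    pool = left + [parent_keys[i]] + right
    mid = len(pool) // 2
    parent_keys[i] = pool[mid]
    return pool[:mid], pool[mid + 1:]

# Inserting Q into the full leaf RSTU, with sibling LMN and
# separator P, as in the example above:
parent = list("EKPV")
left, right = redistribute(parent, 2, list("LMN"), list("QRSTU"))
print(parent, left, right)
# ['E', 'K', 'Q', 'V'] ['L', 'M', 'N', 'P'] ['R', 'S', 'T', 'U']
```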

       3. To see the advantage of the B* Tree, consider the following table
          showing the minimum number of keys in a B Tree and a B* Tree of
          height 3 for different values of m:

             m      min keys, B Tree of height 3    min keys, B* Tree of height 3

            5      17                              17
            10     49                              97
            20     199                             337
            50     1249                            2177
            100    4999                            8977
            200    19,999                          35,377

            (The advantage is even greater for taller trees.)
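The table's entries follow from the fact that a minimally filled tree of height 3 holds 2t^2 - 1 keys, where t is the minimum branching factor: ceil(m/2) for a B Tree, ceil((2m-1)/3) for a B* Tree.  A Python check:

```python
import math

def min_keys_h3(t):
    """Fewest keys in a height-3 tree: the root holds 1 key and has
    2 children; each other node has t children and t - 1 keys, so
    1 + 2*(t - 1) + 2*t*(t - 1) = 2*t*t - 1."""
    return 2 * t * t - 1

def btree_min(m):
    return min_keys_h3(math.ceil(m / 2))

def bstar_min(m):
    return min_keys_h3(math.ceil((2 * m - 1) / 3))

for m in (5, 10, 20, 50, 100, 200):
    print(m, btree_min(m), bstar_min(m))
```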