CPS222 Lecture: Lists                           Last revised 12/17/14

Objectives:

1. To introduce the data type sequence (ordered list).
2. To show how sequences can be implemented by arrays, vectors, or linked lists.
3. To introduce the representation of matrices as arrays with two or more
   dimensions.

I. Sequences
-  ---------

   A. Many of the interesting "standard" abstract data types are variations
      on the theme of a sequence.   

   B. A sequence is a group of items that has the following basic properties.

      1. Either the sequence is empty, or it has a unique first item and
         a unique last item.  (If it consists of exactly one item, these
         two are the same; otherwise they are different.)

      2. Each item, except the last, has a unique successor.

      3. If you start with the first item and apply the successor operation
         repeatedly, you will eventually visit each item exactly once, ending
         at the last item.

      4. We may also want to define the operation of predecessor analogous
         to successor:

         a. Each item, except the first, has a unique predecessor

         b. If you start with the last item and apply the predecessor
            operation repeatedly, you will eventually visit each item
            exactly once, ending at the first item.

         c. Successor and predecessor are inverses - i.e.

            if B is the successor of A, then A is the predecessor of B,
            and vice versa
            
      5. We may want to be able to access items by relative position in
         the list (with position 0 being the first).

   C. For a sequence, we have the following set of values and set of
      potential operations.  Note that, for different kinds of sequences,
      we may be interested only in a subset of the set of operations:

           Values: { all sequences of items (of some type) }
           Operations: { add a new item at a specified position - interesting
                           special cases are beginning, end, or specific
                           numbered position.
                         access the item at a specified position - same 
                             options as above
                         delete the item at a specified position - same 
                             options as above
                         determine whether the sequence is empty
                         obtain the successor of a given item
                         obtain the predecessor of a given item
                       }
                       
II. Representations for Sequences
--  --------------- --- ---------

   A. There are two basic alternatives for implementing a sequence - using
      an array (or a variant known as a vector), or using a linked list.
      
   B. Arrays

      1. Since arrays are supported directly in almost all programming
         languages, they are an attractive representation for sequences.

         a. In an array, LOGICAL ADJACENCY (B follows A) is modelled by
            PHYSICAL ADJACENCY (B occurs just after A in memory.)  
            
         b. If the sequence is allowed to grow or shrink over time, we
            might also store a count of the number of items, along with
            the actual array of items, which would be allocated with
            extra space to allow for growth.
            
         c. In C/C++, an array is declared by a declaration of the form
            <type> <name>[<size>], which both declares the array and allocates
            the needed storage.
            
            i. Example:
            
               int n[100];   // declares n to refer to an array of integers
                             // and allocates storage for 100 integers.
            
           ii. Contrast this with Java, where the declaration of an array and
               storage allocation are two distinct steps - e.g.
               
               int n[];
               n = new int[100];
               
          iii. An array element is accessed by subscript - e.g. n[i] is the
               ith element of the array.  (Subscripts are 0 origin, as in Java)
               
           iv. A potential source of errors in C/C++ programs is that array
               subscripts are not checked for legitimacy - e.g. given the
               above declaration of n as an array of 100 ints, it would be
               possible to refer to n[200].  The behavior of such an access
               is undefined; in practice it typically accesses a storage
               location belonging to some other variable.  Storing a value
               into this location could result in a hard-to-find error!
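
               One way to guard against this is to check the subscript
               explicitly before using it.  The following is only a sketch;
               the helper name checked_get and the constant SIZE are
               hypothetical, chosen for illustration.  (The standard library
               containers std::array and std::vector offer a member function
               at() that performs a similar check.)

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

const std::size_t SIZE = 100;   // matches the declaration int n[100]

// Hypothetical helper: returns a reference to a[i] only if i is a legal
// subscript; otherwise throws instead of touching memory outside the array.
int& checked_get(int (&a)[SIZE], std::size_t i) {
    if (i >= SIZE) {
        throw std::out_of_range("subscript out of range");
    }
    return a[i];
}
```

               With this helper, checked_get(n, 10) behaves like n[10], while
               checked_get(n, 200) throws an exception rather than silently
               accessing some other variable's storage.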

      2. With an array representation of a sequence, certain operations
         are very easy:

         a. Accessing an item at an arbitrary position.  If the items in the
            sequence are numbered 0, 1, ... and we know the address in
            memory of the first item, then the address of the ith item is

             (address of first item) + i * (size of an item)

            Example: Given the array declaration 
         
            int n[100];
            
            and assuming that the array n starts at location 1000 in memory
            and an int occupies 4 bytes of memory, then n[10] is at location
            1000 + 10 * 4 = 1040
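
            The formula above can be checked in C++ by doing the address
            computation "by hand" in units of bytes.  This is a sketch for
            illustration only: the function name address_of_item is
            hypothetical, and on a real machine the starting address and
            the size of an int will differ from the 1000 and 4 assumed above.

```cpp
#include <cassert>
#include <cstddef>

// Computes the address of element i using the formula
// (address of first item) + i * (size of an item), measured in bytes.
const int* address_of_item(const int* first, std::size_t i) {
    const char* base = reinterpret_cast<const char*>(first);
    return reinterpret_cast<const int*>(base + i * sizeof(int));
}
```

            For any legal subscript i, address_of_item(n, i) yields the same
            address as &n[i], which is what the compiler computes for us
            when we write n[i].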

         b. Obtaining the successor or predecessor of an item.  If we know
            the address of a particular item, then its successor is at
            address:

             (address of current item) + (size of an item)

            and its predecessor is at address:

             (address of current item) - (size of an item)

         c. Adding a new last item (assuming there is room for one more
            item in the array)

            - Store the item at address
     
              (address of first item) + (item count) * (size of an item)

            - Increment the item count

         d. Deleting the last item

            - Decrement the item count.  (The old value is still stored
              in memory, but is no longer considered part of the sequence.)

         All of the above are O(1).
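
         The easy operations above might be sketched as follows for a
         sequence of ints.  All of the names here (ArraySequence, CAPACITY,
         add_last, delete_last, item_at) are hypothetical, and the capacity
         of 100 is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t CAPACITY = 100;   // assumed maximum size, chosen arbitrarily

// A sequence of ints stored in a fixed-size array plus an item count.
struct ArraySequence {
    int items[CAPACITY];
    std::size_t count = 0;          // number of slots currently in use
};

// Adds a new last item: store it at index count, then increment the count.
void add_last(ArraySequence& s, int value) {
    assert(s.count < CAPACITY);     // caller must ensure there is room
    s.items[s.count] = value;
    s.count++;
}

// Deletes the last item: only the count changes; the old value stays in memory.
void delete_last(ArraySequence& s) {
    assert(s.count > 0);            // caller must ensure the sequence is not empty
    s.count--;
}

// Accesses the item at position i with a single subscript operation.
int item_at(const ArraySequence& s, std::size_t i) {
    assert(i < s.count);
    return s.items[i];
}
```

         None of these functions contains a loop, which is why each takes
         the same amount of time no matter how many items are stored.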

      3. With an array representation of a sequence, certain operations
         are relatively hard:

         a. Adding a new item at an arbitrary position (or at the beginning)
            entails moving all the items currently at the same or 
            higher-numbered positions up one slot.

         b. Deleting an item at an arbitrary position (or at the beginning)
            entails moving all the items currently at higher-numbered 
            positions down one slot.

         The above are O(n), where n is the number of items in the sequence.
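
         A sketch of the two operations above, showing where the O(n) cost
         comes from.  The function names insert_at and delete_at are
         hypothetical; the caller is assumed to pass an array with room for
         at least one more item, together with the current item count.

```cpp
#include <cassert>
#include <cstddef>

// Inserts value at position pos in items[0..count-1], moving every item at
// position pos or higher up one slot.  The array must have room for one more.
void insert_at(int items[], std::size_t& count, std::size_t pos, int value) {
    assert(pos <= count);
    for (std::size_t i = count; i > pos; i--) {
        items[i] = items[i - 1];    // shift from the high end to avoid overwriting
    }
    items[pos] = value;
    count++;
}

// Deletes the item at position pos, moving every higher-numbered item
// down one slot.
void delete_at(int items[], std::size_t& count, std::size_t pos) {
    assert(pos < count);
    for (std::size_t i = pos; i + 1 < count; i++) {
        items[i] = items[i + 1];
    }
    count--;
}
```

         In the worst case (pos = 0) each loop touches every item in the
         sequence.  When pos is the last position the loops do no work,
         which matches the O(1) special cases discussed earlier.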
            
      4. Many programming languages (including C/C++) support creating arrays
         with two or more dimensions.  (A two-dimensional array is often used
         for modelling mathematical matrices.)  Though these are not sequences
         as we have been talking about, we mention them briefly here.  You
         will use a matrix in your "Game of Life" project.
         
         C/C++ Example
         
         a. Declaration
            
            float x[10][20];   // Declares x to be a matrix of floats
                               // The matrix has 10 rows and 20 columns
                               // Allocates storage for 200 floats
                               
         b. Access
        
            x[i][j] refers to the element in row i and column j
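
         A two-dimensional array is typically processed with nested loops,
         one per dimension.  The following sketch uses the same 10 by 20
         shape as the declaration above; the names ROWS, COLS, fill, and
         row_sum are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t ROWS = 10;
const std::size_t COLS = 20;

// Fills the matrix so that the element in row i, column j holds i * COLS + j.
void fill(float x[ROWS][COLS]) {
    for (std::size_t i = 0; i < ROWS; i++) {
        for (std::size_t j = 0; j < COLS; j++) {
            x[i][j] = static_cast<float>(i * COLS + j);
        }
    }
}

// Returns the sum of the elements in row i.
float row_sum(float x[ROWS][COLS], std::size_t i) {
    assert(i < ROWS);
    float sum = 0.0f;
    for (std::size_t j = 0; j < COLS; j++) {
        sum += x[i][j];
    }
    return sum;
}
```

         As with one-dimensional arrays, neither subscript is checked, so
         it is up to the program to keep i below ROWS and j below COLS.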

   C. Vectors
   
      1. When we create an array, we must specify how many items it
         may contain.  If the sequence grows larger than this, we
         typically have to move the entire array to some new, larger
         location in memory, since the memory allocator typically will
         have put other variables immediately after the space we reserved
         for the array.  This is a non-trivial O(n) exercise at best - and
         may not even be easily possible.

         For this reason, we may be tempted to allocate memory to more
         than adequately accommodate the potential growth of the sequence -
         which leads to either wasting memory or an unpleasant surprise
         when we discover we guessed too small!
            
      2. Many languages provide a variant typically known as a vector which can be
         resized at any time - though increasing the size can take
         O(n) time because the vector is implemented by an array that
         may need to be copied to a new larger location in storage.
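
         In C++ the standard library provides such a type as std::vector.
         The sketch below appends items one at a time and counts how often
         the vector's capacity changes, i.e. how often its underlying array
         had to be replaced by a larger one.  The function name
         append_and_count_reallocations is hypothetical, and the exact
         number of reallocations depends on the library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends the values 0 .. n-1 to v and returns how many times the vector's
// capacity changed, i.e. how many times its storage had to be reallocated.
std::size_t append_and_count_reallocations(std::vector<int>& v, int n) {
    std::size_t reallocations = 0;
    for (int i = 0; i < n; i++) {
        std::size_t before = v.capacity();
        v.push_back(i);             // may copy all items to a larger array
        if (v.capacity() != before) {
            reallocations++;
        }
    }
    return reallocations;
}
```

         Most calls to push_back find spare room and are cheap; only the
         occasional call that exhausts the capacity pays the O(n) cost of
         copying.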
         
   D. Linked lists
         
      1. The use of a linked implementation of a sequence typically requires
         that the programming language support variables of pointer or
         reference type - which we will discuss in the next lecture.

      2. The fundamental idea is that we abandon the notion of modelling
         logical adjacency by physical adjacency.  Instead, we associate
         with each item an explicit LINK - the address in memory of its
         successor.

         EXAMPLES: We often represent linked lists using a "box and arrow"
                   notation

                     -----   -----   -----
                     | A |   | B |   | D |
                     | o-|-->| o-|-->| o-|--
                     -----   -----   ----- |
                                          ---
                                           -

                   (It is common to refer to the individual boxes as NODES.)

                   Form class into a list linked in alphabetical order
                   by pointing to each other.
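
         As a preview, the boxes in the drawing above might be declared in
         C++ roughly as follows.  This is only a sketch: the names Node,
         item, next, and contents are hypothetical, and the details of
         pointers are left for the next lecture.

```cpp
#include <cassert>
#include <string>

// One "box" in the drawing: an item plus a link to its successor.
struct Node {
    std::string item;
    Node* next;     // address of the successor; nullptr marks the last node
};

// Follows links from first to the end, concatenating the items visited.
std::string contents(const Node* first) {
    std::string result;
    for (const Node* p = first; p != nullptr; p = p->next) {
        result += p->item;
    }
    return result;
}
```

         The nodes need not be adjacent in memory; the order of the
         sequence is determined entirely by the links.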

      3. With a linked representation of a sequence, certain operations
         are very easy:

         a. Adding a new item at an arbitrary position is a matter of
            readjusting links - assuming we know its predecessor.

            EXAMPLES: Show modifications to above drawing to insert a node
                      containing "C" just after "B".
                     
                      Show process of inserting a new person into class
                      list.

         b. Deleting an item at an arbitrary position is a matter of
            readjusting links - assuming we know its predecessor.

            EXAMPLE: Show modifications to above drawing to delete node
                     containing "B".

                     Show process of deleting a person from class list.

         c. Accessing the successor of an item involves following its link.

            The above pointer operations are O(1).
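
         The link readjustments described above might be sketched as
         follows, assuming in each case that we already hold a pointer to
         the predecessor.  The names Node, insert_after, and delete_after
         are hypothetical.

```cpp
#include <cassert>
#include <string>

struct Node {
    std::string item;
    Node* next;     // nullptr marks the last node
};

// Inserts a new node holding value just after pred and returns it.
Node* insert_after(Node* pred, const std::string& value) {
    assert(pred != nullptr);
    Node* fresh = new Node{value, pred->next};  // new node points to old successor
    pred->next = fresh;                         // predecessor points to new node
    return fresh;
}

// Unlinks and frees the node just after pred.
void delete_after(Node* pred) {
    assert(pred != nullptr && pred->next != nullptr);
    Node* doomed = pred->next;
    pred->next = doomed->next;                  // bypass the doomed node
    delete doomed;
}
```

         Each function changes a fixed number of links regardless of the
         length of the list, and no other items are moved.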

      4. With a linked representation of a sequence, certain operations
         are relatively hard:

         a. Accessing an item at an arbitrary position entails starting
            at the beginning and following links (traversing the list)
            the required number of steps - e.g. to access item 10,
            we start at the beginning (item 0) and follow links 10
            times.  

            This is an O(n) operation.

            (Note that this may also be part of the cost of adding or
             deleting an item at an arbitrary position, since we need
             access to its predecessor - unless we are already there
             for some reason.)

         b. Accessing the predecessor of an item entails starting
            at the beginning of the list and following links until
            we find a node whose successor is the one we want - i.e.
            the links are "one way streets".

            This is an O(n) operation.

            (This can be avoided by maintaining a doubly-linked list,
             in which each node has two links - one to its successor
             and one to its predecessor.)
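
         The two traversals described above might be sketched as follows
         for a singly-linked list of ints.  The names Node, node_at, and
         predecessor_of are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

struct Node {
    int item;
    Node* next;     // nullptr marks the last node
};

// Returns the node at position i (0 = first), or nullptr if the list is
// shorter than that.  Follows i links, so the cost grows with i.
Node* node_at(Node* first, std::size_t i) {
    Node* p = first;
    while (p != nullptr && i > 0) {
        p = p->next;
        i--;
    }
    return p;
}

// Returns the predecessor of target, or nullptr if target is the first
// node or is not in the list.  Must search from the beginning.
Node* predecessor_of(Node* first, const Node* target) {
    assert(target != nullptr);
    for (Node* p = first; p != nullptr; p = p->next) {
        if (p->next == target) {
            return p;
        }
    }
    return nullptr;
}
```

         Both functions contain a loop whose length depends on the position
         of the node sought, in contrast to the single address computation
         needed for an array.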

      5. Provided memory is not totally full, it is easy to grow the
         sequence by allocating a new node and linking it in at the
         right place.  There is no need to specify a size up front.
                      
   E. You should already be quite familiar with working with arrays.  In the next 
      lecture, we turn to implementing linked lists, using C++.