Home Previous Next

CSC205AA::Lecture Note::Week 10
Assignments | Handouts | Resources | Email Thurman { compufoo::Twitter | Facebook | YouTube}
GDT::Bits:: Time  |  Weather  |  Populations  |  Special Dates

Overview Assignment(s):  Code [stack] NumberSystemStack.java | Stack.java | ParenChecker.java | Stack2.java
[queue] Queue.java | FootNotes.java (footnotes.in.txt) | Q.java | TestQ.java | PriorityQueue.java
...... CircularArrayQ.java | Buffer.java
[linked-list] SinglyLinkedList.java | TestLinkList.java [new version] SinglyLinkedList.java | DoublyLinkedList.java
[sort/search] BubbleSort.java | Bsearch.java


A  stack  is a data structure in which all access is restricted to the most-recently-inserted elements (LIFO: Last-In-First-Out).

My favorite real world example of a stack is a salad bar. Typically, at the start of the salad bar line is a stack of plates and normally you take the top-most plate (unless it is dirty or has some type of foreign looking object on it, in which case you dig down and take a plate from somewhere lower in the stack [atypical stack behavior]). In actualality, a stack is the wrong way to store plates at a salad bar because of the following scenario: The dishwasher cleans the dirty plates, brings them out front to the salad bar, and  pushes  them on top of the stack -- now John Q. Customer comes up and is stuck using a hot plate.

A stack has three natural operations: insert, delete, and find; but in stack terminology these operations are called  push ,  pop  and  top .

Elements are inserted onto a stack by using a push operation.

Elements are deleted from a stack by using a pop operation.

The most-recently inserted element can be examined by using a top operation.

Stacks are heavily used in system software (e.g. compilers). A stack can be used to check for unbalanced symbols (e.g. (), {}, []). Example algorithm:

  1. create an empty stack
  2. read symbols until EOF
    1. if the symbol is an opening symbol, push it onto the stack
    2. if the symbol is a closing symbol and the stack is empty, then report an error
    3. otherwise, pop the stack (if symbol popped is not the corresponding opening symbol, then report an error)
  3. at EOF, if stack is *not* empty, then report an error

A stack is used to implement function (method) calls in most languages. For example in C++, when a function is called, memory for parameters and locally defined variables is allocated from the stack. When the function returns, the memory for the parameters and local variables is de-allocated from the stack. In addition, "jump" information and return value data is stored on the stack.

Stacks are also used in the evaluation of expressions (e.g. operator precedence parsing).

{TopOfPage} {Oracle.com::API Specification | Tutorial} {Additional Resources}


A  queue  is a data structure in which all access is restricted to the least-recently-inserted elements (FIFO: First-In-First-Out).

Operations are:

Examples of a queue: buffered input and a line printer spooler.

Another queue example: message queues.

{TopOfPage} {Oracle.com::API Specification | Tutorial} {Additional Resources}

Priority Queues

A  priority queue  allows the highest priority element to be removed from a queue even though it may not be the "oldest."

Real world example of a priority queue: You go to a restaurant with a friend that has a wait. There are a number of queues: a queue for two person table, a queue for a four person table, and a queue for large party table. You are placed on the queue for a two person table. There are three "elements" ahead of you. In comes a VIP with a friend. The next two person table that opens up is given to the VIP.

An operating system may use a priority queue when scheduling jobs. By default, each job is given the same priority. CPU access is granted using a queue (first come, first served). In certain cases, however, you may have a job that needs to get more CPU usage. In those cases, it will be given a priority higher than the default. The OS will use a scheduling algorithm to ensure the higher priority job gets the CPU with more frequency.

{TopOfPage} {Oracle.com::API Specification | Tutorial} {Additional Resources}

Linked Lists

A linked list is a low-level structure upon which high-level data structures can be built (e.g. stacks and queues can be implemented using a linked-list).

Sometimes the only Abstract Data Type (ADT) available for organizing data is the built-in array; however, this can be problem if the elements of the array need to be stored in some sort of order and elements are going to be inserted and deleted from the array. It can also be problem when arrays need to grow (i.e. increase in length [capacity]). In these cases a linked-list may be the solution.

A linked-list is a collection of records, called nodes, each containing at least one field (member) that gives the location of the next node in the list. In the simplest case, each node contains two members: a data member (the data value of the list item) and a link member (a value locating the next node).

If each node contains only one link member, then we have a singly linked list.

Unlike an array, the nodes of a linked-list do not necessarily occupy consecutive memory locations.

The implementation of a linked-list relies heavily on dynamic memory allocation. [Every node is dynamically allocated. When a linked-list is created, its length (i.e. # of nodes) is unknown.]


A linked list consists of a set of nodes. Each node contains data and link members. The first element is called the front and is pointed to by head. The end-of-list (EOL) is called the rear its link data member has the value NULL (i.e. 0). In some implementations the rear is pointed to by tail. List applications traverse the linked list using a cursor as a reference (or pointer) to the current element.

class Node

Example instance variables.

   some_data_type data   // 1 or more of these
   Node next             // pointer to next node 
   Node prev             // if doubly-linked-list

class LinkedList

Example instance variables.

   head      // handle to the 1st node in the linked-list
   size      // # of nodes in linked-list
   capacity  // maximum nodes allowed
   // maybe more...

Possible instance methods.

   constructors to initial linked-list objects
   add() -- add nodes to the end of linked-list
   insert() -- insert nodes into a linked-list
   delete() -- delete (remove) nodes from a linked-list
   update() -- update (modify) a node in a linked-list
   search() -- search a linked-list for data
   // maybe more...
Bjarne Stroustrup: Why you should avoid Linked Lists

{TopOfPage} {Oracle.com::API Specification | Tutorial} {Additional Resources}

Introduction to Sorting

Typically, sorting is executed on files that contain records having keys. The keys are used to control the sort.

The objective of a sort is to rearrange records so that their keys are ordered according to some well-defined ordering rule (e.g. numerical order or alphabetical order).

The sorting problem is defined as follows:

   Given a set of records r1, r2, ..., rn with key 
   values k1, k2, ..., kn, respectively, arrange the
   the records into any order  s  such that
   records rs1, rs2, ..., rsn
   have keys obeying the property 
   ks1 <= ks2 <= ... <= ksn.

If the file being sorted can fit into memory, then an internal sort can be done; otherwise, sorting files from a disk (or some other secondary storage device) is called external sorting.

For discussion purposes, we will focus on internal sorting (more specifically, the sorting of arrays).

A characteristic of sorting methods is stability. A sort is stable if it preserves the relative order of equal keys in a file; otherwise, it is non-stable. For example, if an alphabetized class list is sorted by grade, then a stable method produces a list in which students with the same grade are still in alphabetical order, but a non-stable method is likely to produce a list that is no longer in alphabetical order.

The efficiency of a sorting algorithm depends on two measurements:

The amount of extra memory used by a sort is important. Extra memory is used by sort in the following ways:

In general, if the records being sorted are large, then avoid "shuffling" them around by rearranging the elements of an array of pointers (or indicies). This is called indirect sorting.

Swapping adjacent records (i.e. items) is called an exchange.

There are numerous "simple" sorting algorithms are based on comparing and/or exchanging adjacent records -- they are: bubbleSort, exchangeSort, selectionSort, and insertionSort.

There are numerous sorting algorithms that might be better than the aforementioned "simple" sorts. Some of these sorts are: shellSort, mergeSort, and quickSort.

Sorting and Searching: The Art of Computer Programming by Donald E. Knuth is a classic text on the subject of sorting. It covers about 25 different sorting algorithms. The following a quote from the book:

"But you can't look up all those license numbers in time," Drake objected. "We don't have to, Paul. We merely arrange a list and look for duplications." -- Perry Mason (The Case of the Angry Mourner, 1951)

YouTube.com::Sorting Out Sorting::Part 1 | Part 2 | Part 3 (heap sort) | Part 4

{TopOfPage} {Oracle.com::API Specification | Tutorial} {Additional Resources}


BubbleSort is an exchange sort that is inefficient, but easy to to understand.

Keep passing through the list, exchanging adjacent elements, if necessary; when no exchanges are required on some pass, the file is sorted.

  1. begin with elements 0 and 1 ('i' and 'j')
  2. compare two adjacent elements 'i' and 'j'
  3. if element 'i' less than element 'j', swap them
  4. move one position to the right

When sorting in ascending order, the largest element "sinks" to the end of the list and the smallest values "bubble up" towards their proper positions near the front of the list.

BubbleSort pseudo-code:

   // assume we have an array 'a' with length 'n'

   end_i = n-1;
   while (end_i > 0) {
      last_i = 0;
      for (next_i = 0; next_i < end_i; next_i++) {
         if (COMPARE (a, next_i, next_i+1)) {
            SWAP (a, next_i, next_i+1);
            last_i = next_i;
      end_i = last_i;

BubbleSort is terrible [O(n^2)] if the list is sorted in reverse order. It works great given a list that is already sorted.

YouTube.com::Bubble Sort by CS50 at Harvard

{TopOfPage} {Oracle.com::API Specification | Tutorial} {Additional Resources}


In most cases, InsertionSort is the best of the simple sorts (i.e. BubbleSort, ExchangeSort, and SelectionSort).

InsertionSort works well if only a few elements need sorting; however, it is a poor choice if dealing with a large number of elements.

Often times InsertionSort is used as the final stage of more advanced sorts (e.g. QuickSort).

For an example of InsertionSort, assume we have an array A and its length is N. Let's look at pass i (1 <= i <= N-1). The sublist A[0] to A[i-1] is already sorted in ascending order. The pass assigns A[i] to the list. Let A[i] be the TARGET and move down the list, comparing the target with items A[i-1], A[i-2], and so forth. Stop the scan at the first element A[j] that is less than or equal to TARGET or at the beginning of the list (j=0). As we move down the list, slide each element to the right (A[j]=A[j-1]). When the correct location for A[i] is found, insert it at location j.

Said another way.

Assume elements are being collected using array named 'a'. The initial state is that the first element, considered by itself, is sorted. Each time a new element with value 'n' is "added" to the collection, it is inserted into a slot 'i' such that a[i-1] <= a[i] <= a[i+1]. If the new element is inserted into the array (i.e. it is not the last element), then the elements to the right of the insertion point must be shifted.

Many card players use the insertion sort. They are dealt a hand of cards. Each time they receive a card they place it in their hand in a sorted order. Luckily for them, shifting cards is simple. The following is an example of a card player that stores their cards in ascending order (lowest-to-highest).

   start with an empty hand
   receive an 8:  8                 //1 element is always sorted
   receive a  4:  4 8               //4 is inserted, the 8 was shifted
   receive a 10:  4 8 10            //10 added (no shifting necessary)
   receive a  2:  2 4 8 10          //4,8,10 all shifted
   receive a  7:  2 4 7 8 10        //8,10 shifted

InsertionSort pseudo-code.

   // assume we have an array 'a' with 'n' elements

   for (cur_i = 1; cur_i < n; cur_i++) {
      next_i = cur_i;
      while (next_i > 0) {
         if (COMPARE (a, next_i, next_i-1))
            SWAP (a, next_i, next_i-1);

YouTube.com::Insertion Sort by CS50 at Harvard

{TopOfPage} {Oracle.com::API Specification | Tutorial} {Additional Resources}


ExchangeSort sort is an exchange sorts that works as follows: The item at index 0 is compared with each subsequent item in the list at index 1, 2, ..., N-1. For each comparison, if the subsequent item is smaller than the element at index 0, the two entries are exchanged. Repeat this process with index 1, index 2 and so on up to index N-2.

ExchangeSort pseudo-code:

   // assume we have an array 'a' with 'n' elements

   for (cur_i = 0, limit = n-1; cur_i < limit; cur_i++) {
      for (next_i = cur_i+1; next_i < n; next_i++) {
         if (COMPARE (a, next_i, cur_i))
            SWAP (a, next_i, cur_i);

{TopOfPage} {Oracle.com::API Specification | Tutorial} {Additional Resources}


SelectionSort is a simple sorting algorithm that makes a number of passes through a list (or sublist) and, on each pass, selects one element to be correctly position. Each element is moved only once; therefore, this type of sort can be handy when sorting files with very large records and small keys.

The item at index 0 is compared with each subsequent item in the list at index 1, 2, ..., N-1. The index of the smallest item encountered is saved. After the comparisons have been completed, the smallest item is swapped with the item at index 0. Repeat this process with index 1, 2, and so on up to N-1.

SelectionSort pseudo-code:

   // assume we have an array 'a' with 'n' elements

   for (cur_i = 0, limit = n-1; cur_i < limit; cur_i++) {
      small_i = cur_i;
      for (next_i = cur_i+1; next_i < n; next_i++) {
         if (COMPARE (a, next_i, small_i))
            small_i = next_i;
      if (small_i != cur_i)
         SWAP (a, cur_i, small_i);

YouTube.com::Selection Sort by CS50 at Harvard

{TopOfPage} {Oracle.com::API Specification | Tutorial} {Additional Resources}


ShellSort was discovered by a computer scientist named Donald Shell in 1959. It is also referred to as the diminishing gap sort.

It is based on InsertionSort, but adds a new feature that improves performance. The basic idea behind this sort is that in early stages, far-apart elements are compared, rather than adjacent ones as in simplier exchange sorts. This tends to eliminate large amounts of disorder quickly, so later stages have less work to do. The interval (i.e. gap) between compared elements is gradually decreased to one, at which point the sort effectively becomes an exchange method. The performance of this algorithm can be influenced by the gaps that are used.

ShellSort is good for medium-sized arrays (few thousand items). By default, ShellSort should probably be your choice for sorting. If performance becomes an issue, then change to a more advanced sorting technique such as QuickSort.

ShellSort pseudo-code:

   // assume we have an array 'a' with length 'n'

   for (gap = n/2;  gap > 0;  gap /= 2) {
      for (i = gap;  i < n;  i++)
         for (j = i-gap;  j >= 0;  j -= gap) {
            if (!COMPARE (a, j, j+gap))
            SWAP (a, j, j+gap);

YouTube.com::Java Shell Sort by Derek Banas

{TopOfPage} {Oracle.com::API Specification | Tutorial} {Additional Resources}


QuickSort  was first developed by C.A.R. Hoare. It requires the picking of a partition element (pivot) and then partially sorting the array about the partition. You can then sort each of the two partitions by recursive application of the same technique. The algorithm can sort quite rapidly, but it can also sort very slowly [O(n^2) worst case].

QuickSort is a fast divide-an-conquer algorithm when comparison sorting arrays in place (frequently O(n log n)).

The basic algorithm for QuickSort is recursive and consists of the following steps:

  1. If the number of elements is 0 or 1, return.
  2. Pick any element and call it the pivot.
  3. Partition remaining elements into two disjoint groups called L and R.
  4. Return the result of QuickSort(L), followed by the pivot, followed by QuickSort(R).
The pivot divides array elements into two groups: those smaller than the pivot and those larger than the pivot.

The partition step places every element except the pivot in one of two groups.

   13  81 43  92  31  65  57  26  75  0

   Assume 65 is selected as the pivot.  Partition the elements.

    Left:  13  43  31  57  26  0
   Right:  81  92  75

   Quicksort the left items.  Quicksort the right items.
   0  13  26  31  43  57      75  81  92
   Original list:
      8  1  4  9  6  3  5  2  7  0

   Select the pivot  6  and swap the pivot with the last element.

      8  1  4  9  0  3  5  2  7  6

   Run   i  from left-to-right, and  j  from right-to-left.
   When  i  sees a large element,  i  stops.  When  j  sees
   a small element,  j  stops.  If  i  and  j  have not
   crossed, swap their items and continue; otherwise, stop
   this loop.

      i  stops at element 8; j stops at element 2
      the 8 and 2 are swapped

   2  1  4  9  0  3  5  8  7  6

      i  stops at element 9; j stops at element 5
      the 9 and 5 are swapped

   2  1  4  5  0  3  9  8  7  6

      i  stops at element 9; j stops at element 3
      i and j have crossed -- swap pivot with
      i'th element (6 with 9)

   2  1  4  5  0  3  6  8  7  9
Picking the pivot is important to good performance. Never choose the first element as the pivot (you should also avoid the last element). The middle element is usually a reasonable choice. The perfect pivot is one that results in equally sized partitions. Picking a random pivot has proven efficient.

The  median-of-three  is a popular technique for picking a pivot. Sort the first, middle and last elements of the list and use the median (in this case middle) value. Using this technique is good because while in the process of determining the pivot, you are also getting started on the sorting of the list (element 0 will be less than the pivot and element N-1 will be greater than the pivot).

QuickSort pseudo-code:

   quickSort(T a[], int left, int right) 

      if (left >= right) return;
      SWAP (a, left, (left+right)/2);
      last = left;
      for (i = left+1; i <= right; i++) {
         if (COMPARE (a, i,  left) {
            SWAP (a, last, i);

      SWAP (a, left, last);

      quickSort(a, left, last-1);
      quickSort(a, last+1, right);

{TopOfPage} {Oracle.com::API Specification | Tutorial} {Additional Resources}


A tree is a "nonlinear" structure that consists of nodes and edges (or branches). A tree is a set of nodes and edges that connect the nodes. There is exactly one path between two nodes. A path is a connected sequence of edges.

Arrays, stacks, queues, and linked-lists define collections of objects that are sequentially accessed. These are linear lists: they have first and last elements, and each interior element has a unique successor.

In a nonlinear structure, an element may have multiple successors.

Tree structures are used to represent a "hierarchy" of information. A commonly used hierarchy is an operating system's file system.

A tree structure consists of a set of nodes that originate from a unique starting node called the root.

Some nodes may be considered a parent node which implies they have zero or more children nodes to which they are connected. Every node except the root node has a parent.

The children of a node and children of those children are called descendants, and parents and grandparents of a node are its ancestors.

Nodes are siblings when they have the same parent.

A node that doesn't have any children is called a leaf.

Each node in a tree is the root of a subtree, which is defined by the node and all descendants of the node.

            |          |           |
         father       uncle       aunt
      |           |
   brother      sister

      grandfather  is the  root  node.  grandfather is the parent
      node to father, uncle and aunt.  father, uncle, aunt, brother
      and sister are all descendants of grandfather.  father and
      grandfather are ancestors of brother and sister.  brother and
      sister are children of father.  brother are sister are leaf
      nodes.  brother and sister are siblings.  father is the root
      of the subtree that contains the father, brother, sister nodes.

A single-inheritance class hierarchy can be represented by a tree.

             |                  |            |            |
         Container            Button       Canvas    TextComponent
             |                                            |
   +---------+----------+                            +----+----+
   |         |          |                            |         |
 Panel  ScrollPane   Window                      TextArea   Textfield
                  |          |
               Dialog      Frame

Movement from a parent node to its child and other descendants is done along a path.

The level of a node is the length of the path from the root to the node.

The depth of a tree is the maximum level of any node in the tree. The depth of a node is the length of its path from the node to the root.

            A               level 0
          / | \
         B  C  D            level 1
          \  \
           E  F             level 2
          / \
         G   H              level 3

   The depth of this tree is 3 (the longest path from the root
   to a node).  The path length for node E is 2.  The path
   length for node D is 1.  The path for node H is:  A->B->E->H.


A general tree is a set of nodes that is either empty, or has a designated node called the root from which descend zero or more subtrees. Each node is not an ancestor of itself, and each subtree itself satisfies the definition of a tree.

Note that a tree is defined in a recursive fashion; consequently, most tree-processing algorithms are recursive.

YouTube.com::CS 61B Lecture 23: Trees and Traversals

{TopOfPage} {Oracle.com::API Specification | Tutorial} {Additional Resources}

Binary Trees

A binary tree is a popularly used tree in which each node has at most two (2) subtrees. These subtrees are designated as the left and right, respectively. Either or both of these subtrees may be empty.

The following operations are performed on a binary tree:

[Definition] An algorithm that systematically "visits" all items in a tree is called a tree traversal.

Here is a binary tree and its respective (depth-first) traversals:

                     /     \
                  B           C
               /            /   \
            D             E        F
                        /   \
                      G       H

       Preorder traversal: A (root); B,D (left); C,E,G,H,F (right)
        Inorder traversal: D,B (left); A (root); G,E,H,C,F (right)
      Postorder traversal: D,B (left); G,H,E,F,C (right); A (root)

The following tree takes on a heap property:

                         /    \
                        /      \
                      84        63
                     /  \         \
                   68    79        12
                   /      \       /  \
                 32       67     6   10
                                    /  \
                                   8    9

   The data in any given node of the tree is greater than or 
   equal to the data in its left and right subtrees.

A binary tree can have a hierarchy that exhibits the property known as the ordering property.

                         /    \
                        /      \
                      84        103
                     /  \       / \
                   68    86    90  109
                  /  \        /  \
                32   74      88   97
                    /  \
                  70   80

   The data in each node of the tree are greater than all of the
   data in that node's left subtree and less than or equal
   to all the data in the right subtree.

A binary tree with the ordering property is often called a binary search tree (BST).

A complete binary tree of depth N is a tree in which each level 0 to N-1 has a full set of nodes and all leaf nodes at level N occupy the leftmost positions in the tree.

               A          Complete tree with depth 3.
             /   \
            B     C
           / \   / \
         D    E F   G
        / \   /\
       H   I J  K         Leaf nodes occupy leftmost positions.

A full binary tree is a complete tree that contains 2N nodes at level N.

               A           Full tree with depth 2.
             /   \
            B     C        Every node has two children.
           / \   / \
         D    E F   G      22 is 4 and there are 4 nodes.

A degenerate tree is one that has a single leaf node and each non-leaf node has only one child. [Note: a degenerate tree is equivalent to a linked-list.]


Many compilers make use of tree structures in obtaining forms of an arithmetic expression for evaluation.

   Operators are non-leaf nodes and operands are leaf nodes.

                    /   \
                  -       *
                 / \     / \
                A   B   C   ÷
                           / \
                          E   F

      (A - B) + C * (E / F)       infix
      + - A B * C / E F           prefix
      A B - C E F / * +           postfix

Binary Tree Structure

A binary tree structure is built with nodes. A tree node contains a data field and two pointer (i.e. link) fields.

Given the non-linear structure of trees, a linked-list type of searching (i.e. sequential) is not available; therefore, a variety of traversal methods (inorder, preorder, postorder) are commonly used.

Tree traversal methods rely heavily on recursion. Each traversal method performs three actions at a node:

The descent terminates when an empty tree pointer is encountered.

A node visit depends on the application (i.e. it is application-dependent).

Inorder Traversal

An inorder traversal begins its action at a node by first descending to its left subtree. After recursively descending through the nodes in that subtree, the inorder traversal takes the second action at the node and uses the data value. The traversal completes its action at the node by performing a recursive scan of the right subtree. In the recursive descent through the subtrees, the actions of the algorithm are repeated at each new node.

  1. traverse the left subtree
  2. visit the node
  3. traverse the right subtree
Postorder Traversal

The postorder traversal delays a visit to a node until after a recursive descent of the left subtree and a recursive descent of the right subtree.

  1. traverse the left subtree
  2. traverse the right subtree
  3. visit the node
Preorder Traversal

The preorder traversal visits the node first, then it does recursive descents of the left and right subtrees, respectively.

  1. visit the node
  2. traverse the left subtree
  3. traverse the right subtree
Breadth First Traversal (or Level Traversal)

A breath first (or level) traversal scans the tree nodes level-by-level starting with the root at level 0.

A common implementation of this algorithm is queue-based.

   create queue  Q
   insert root into  Q
   while Q is not empty
      delete front node  p  from  Q
      use node  p  to identify its children at next level
         if (p.left() != null)
         if (p.right() != NULL)
      Using the queue helps ensure that nodes on the same
      level are visited in the correct order (in this case


Trees perform inserts, deletes, and finds in O(log N) time (on average). A degenerate tree can be O(N).


The following inputs inserted in the given order produce a good binary tree.


What the binary tree look like? [answer]

What order do these state abbreviations need to be inserted in order to produce a degenerate tree? [answer]

WolframAlpha.com::Binary Tree

{TopOfPage} {Oracle.com::API Specification | Tutorial} {Additional Resources}

Binary Search Tree

A general binary tree can hold a large collection of data and yet provide fast access.

Recall, creating a "container" or "collection" class using an array or linked-list, requires those collections to be searched using a linear search. This can be inefficient when dealing with large amounts of data [O(N)].

Trees are efficient for searching because the path to any data value is no greater than the depth of the tree. Searching is maximized with a complete binary tree O(log2N).

To store elements in a tree for efficient access, a  binary search tree  (BST) can be used.

   For each node, the data values in the left subtree
   are less than the value of the node and the data
   values in the right subtree are greater than or equal
   to the value of the node.
Using a BST implies that the data found in each node contain a  key . In many cases, the data of a node is a record and the key is a field in the record.

Operations performed on a BST are:

   find -- search a tree of a specific data value
   insert -- find appropriate insertion spot and add new
             data value to the tree
   delete -- search a tree and remove the first occurrence
             of a given data value; reattach all subtrees
             to maintain BST structure
   update -- if key at current position matches key for
             the data item, update data value; otherwise,
             insert data value into the tree

Insertion on a Binary Search Tree

Search the tree and find the location where the new node is to reside. This is done be scanning the left and right subtrees until the insertion point is found. For each step in the path, maintain a record of the current node and the parent of the current node. The process terminates when an empty subtree is encountered. At this location, the new node is inserted as a child of the parent.
        /  \
      20    35
     /        \
   12          40

   Insert the value 32 into this tree.
   Step 1:  compare 32 with 25 -- traverse the right subtree
   Step 2:  comapre 32 with 35 -- traverse the left subtree
   Step 3:  insert 32 as left child of parent (i.e. 35).

        /  \
      20    35
     /     /  \
   12    32    40 

BST Pseudo-code For find(), min(), max(), delete()

BST is a collection of Node objects

class BST                class Node
   Node root                int key
   int nodecnt              Node parent
                            Node left
                            Node right

   if root is null
      return null
   return find(key, root)

find(key, node)
   if key equals node.key
      return node
   if key less than node.key
      if node.left null
         return null
      return find(key, node.left)
   if node.right null
      return null
   return find(key, node.right)

   if node.left null
      return node
   return min(node.left)

max(node)  // left to the reader
   node = find(key)
   if node null
   decrement nodecnt

   // case 0: delete childless root node

...the rest is available by request...

Update on a Binary Search Tree

Find the node that needs updating and copy in new data. If node note not found, then insert it. What if key being modified? Find node, delete it and then insert new node containing the updated key.

{TopOfPage} {Oracle.com::API Specification | Tutorial} {Additional Resources}

Threaded Binary Search Trees

If a binary tree has N nodes, then the total number of links is 2*N (each node has a left subtree pointer and a right subtree pointer).

Each node has exactly one pointer to it (except the root, which has none).

At most (given a complete binary tree) only N-1 subtree pointers will have non-null values.

The idea behind a threaded binary search tree is to make use of unused pointers (i.e. those set to null.

Threaded BSTs allow for efficient non-recursive traversals.

Given a BST, here is an algorithm that converts it into a right-threaded BST.

   Perform an inorder traversal of a BST.  Whenever a node  'x' 
   with a null right pointer is encountered, replace the right 
   pointer with a thread to the inorder successor of 'x'.

   The "inorder successor" of a node 'x' is it parent if 'x'
   is a left child of its parent; otherwise, it is the nearest
   ancestor of 'x' that contains 'x' in its left subtree.

Here is an example of a right-threaded BST.

               |                          |
               E                          K
              / \                        / \
             /   \                      /   \
            B     F                    J     L
           / \     \                  /       \
          A   D     G                I         M

   The threads are:  

      A.right ---> B
      C.right ---> D
      D.right ---> E
      G.right ---> H
      I.right ---> J
      J.right ---> K

For a threaded tree, each node must contain a flag that is used to indicate if the right pointer is a thread pointer.

{TopOfPage} {Oracle.com::API Specification | Tutorial} {Additional Resources}

AVL Trees

Binary search trees (BSTs) can be efficiently searched, but this is true only if the tree is "balanced" (O(log N) because binary-searching can be executed). A de-generate tree (i.e. a binary tree in which each node has only one child) results in O(N) speed.

If you are going to have a tree that is going to be used extensively for searching, then you may want to keep the tree balanced. A balanced BST is called an AVL tree. [Technique invented in the 60's by Russian mathematicians Georgii Maksimovich Adel'son-Vel'skii and Evgenii Mikhailovich (say these names fast three times).]

Implementing an AVL tree requires a balance factor be added to each node of the tree. A balance factor is defined as follows:

   The balance factor of node  'x'  is the height of the left
   subtree of 'x' minus the of the right subtree of 'x'.

   The height of a tree is the number of levels in it.

The following tree is used to demonstrate balancing factors.

               |                          |
               E                          K
              / \                        / \
             /   \                      /   \
            B     F                    J     L
           / \     \                  /       \
          A   D     G                I         M

   Balancing factors:

      H:  +1
      E:  +1
      B:  -1
      F:  -1
      D:  -1
      J:  +1
      L:  -1
      A,C,G,K,I,M:  0

Keeping a tree balanced requires making rotations when balancing factors take on values other than 0, 1, or -1.

The following trees are un-balanced.

   Inputs:  L, I, N, U, X

                           L (-2)
                          / \
                     (0) I   N (-2)
                               U (-1)
                                 X (0)

   To balance:  perform a left rotation on node N

                           L (-1)
                          / \
                     (0) I   U (0)
                            / \
                       (0) N   X (0) 


   Inputs: Z, E, B, R, A

                           Z (2)
                         E (1)
                       B (0)

   To balance:  perform a right rotation on node Z

                           E (0)
                          / \
                     (0) B   Z (0)

                           E (0)
                          / \
                     (1) B   Z (1)
                        /   /
                   (0) A   R (0)

There are four types of rotations:

Here is an example where a left-right rotation is needed:

   The following tree is balanced.

                               PA (1)
                              /  \
                        (0) GA    RI (0)
                           /  \
                     (0) DE    OH (0)
   Insert MA.
                               PA (2)
                              /  \
                       (-1) GA    RI (0)
                           /  \
                     (0) DE    OH (1)
                            MA (0)

   Balance by first doing a left rotation on node GA.

                              /  \
                            OH    RI
                         /  \
                       DE    MA

   Next, do a right rotation on node PA.

                              OH (0)
                             /  \
                       (0) GA    PA (-1)
                          /  \     \
                        DE    MA    RI
                       (0)   (0)   (0)

Coming up with an example of when is right-left rotation is needed and is left as an exercise.

{TopOfPage} {Oracle.com::API Specification | Tutorial} {Additional Resources}

2-3-4 Trees

2-3-4 Trees are trees that have nodes which may have more than two children.

A benefit of a 2-3-4 tree is that it contains shorter search paths because there are fewer levels in the tree.

When searching a binary tree, you either go left or right at each node. In a 2-3-4 tree, there can be several different directions that a search path may follow.


   An  m-node  in a search tree stores  m-1  data values
   k1 < k2 < ... < km-1,
   and has links to  m  subtrees  T1, ..., Tm,
   where for each value  i, all data values in  
   Ti < ki <= all data values in Ti+1

Here is an example 2-3-4 tree.

                            /  \
            +---------------    ------------+
            |                               |
        27     38                       60     70
       /    |    \                     /    |    \
   16,25  33,36  41,46,48          55,59  65,68  73,75,79

      This tree has 9 nodes.  The root node contains one value.
      Both nodes at level 1 have two values.  On level two,
      there are 4 nodes containing 2 values, and 2 nodes 
      containing 3 values.

   Search for the value 36.

      36 is less than 53, traverse the left subtree.
      36 greater than 27 and less than 38, traverse 
      the center subtree.  36 less than 33, compare
      to the next number.  36 equals 36 -- found.

2-3-4 trees satisfy the following properties:

In a nutshell, 2-3-4 tree is constructed as follows:

   Construct a 2-3-4 tree using the inputs:  10 3 9 15 21 44 18 1 12 5 7

   insert 10:       10
   insert  3:       3,10
   insert  9:       3,9,10
   insert 15:       split
               3              10

         15 greater than 9 -- search right subtree

               3              10,15

   insert 21:
               3              10,15,21

   insert 44:
      44 greater than 9 -- search right subtree
      internal node is full -- split
      median value moved into parent

            3        10       21,44

   insert 18:
      18 greater than 15 -- search right subtree
            3         10       18,21,44

   insert 1:
      1 less than 9 -- search left subtree
         1,3         10       18,21,44

   insert 12:
      12 greater than 9 and less than 15 -- search middle subtree
         1,3        10,12       18,21,44

   insert 5:
      5 less than 9 -- search left subtree
         1,3,5     10,12       18,21,44

   insert 7:  
      7 less than 9 -- search left subtree
      internal node is full; split
      median value 3 moved into parent
                  3,     9,     15
                1    5,7   0,12    18,21,44

2-3-4 trees stay balanced during insertions and deletions.

Think about top-down insertion versus the demonstrated bottom-up insertion. [Don't allow any parent nodes to become 4-nodes.]

2-3-4 trees have numerous unused pointers. Explore Red-Black trees for an alternative.

   2-3-4 tree with N nodes has 4*N pointers
   each node (excluding the root) has only one pointer to it
   4*N - (N-1) = 3N+1 of the pointers are null 
      (i.e. approximately 75% of the pointers are null)

Example 2-3-4 tree node class.

   class Node234 {
      Object data[3];
      Node234 links[4];

azrael.digipen.edu::Deleting Elements From a 2-3-4 Tree

YouTube.com::UCBerkeley::CS61B::Lecture 26: Balanced Search Trees

{TopOfPage} {Oracle.com::API Specification | Tutorial} {Additional Resources}

Big-O Notation

The run-time efficiency of a program has two main ingredients: space and time.

Space efficiency is a measure of the amount of memory a program requires.

Time efficiency is a measure of how long a program takes to execute.

In general, there is a trade-off between space and time. If your algorithm uses more space, then typically it will execute faster and vice versa.

The relationship between N (where N is a variable representing the number of inputs on which your algorithm must operate) and the running time of an algorithm as N becomes large is called the computational complexity of that algorithm.

Computational complexity is denoted using BigO notation. The letter O stands for the word order as it is used in the phrase on the order of. It signifies approximation.

Given two functions f and g, the function g is said to dominate function f if there is some positive constant c such that:

   c * g(x) >= f(x), for all possible values of  x

Asymptotic dominance is a variation of dominance such that:

   c * g(x) >= f(x), for all  x >= x'

In other words, function g asymptotically dominates function f for all "large" values of x. Big-O notation is represented using asymptotic dominance.

          5N   versus     N2
   N=1:   5    versus     1
   N=2:   10   versus     4
   N=3:   15   versus     9
   N=4:   20   versus     16
   N=5:   25   versus     25
   N=6:   30   versus     36
   N=7:   35   versus     49

      In this example, x'  is equal to 6 and the constant is 1.

The following are some examples of how dominance is used to determine BigO values:

   O(N3) + O(N2)  =>  O(N3)
      As  N  gets large, N cubed grows at a faster rate than
      does  N squared.  For large values of N, N cubed dominates.
   O(10 * N) + O(1000 * N) + O(1)   =>  O(N)
      Constant values are ignored.  The significance of the
      constant values are minimized for large values of N.
   O(N log N) + O(log N) + O(N2)  =>  O(N2)
      N squared dominates  N log N  which in turn dominates  log N.

[About logarithms.]

Let's review:

BigO-notation is used for three distinct purposes.

  1. To bound the error that is made when we ignore small terms in mathematical formulas.
  2. To bound the error that is made when we ignore parts of a program that contribute a small amount to the total being analyzed.
  3. To allow us to classify algorithms according to upper bounds on their total running times.

Categories of Common Running Times
O(1) constant time (most efficient)
O(log N) logarithmic time
O(N) linear time (polynomial time)
O(N log N) "linearithmic" time (polynomial time)
O(N2) quadratic time (polynomial time)
O(N3) cubic time (polynomial time)
O(2N) exponential time (impractical for large values of N)

The following table presents growth rates for some of the common functions (i.e running times):

    N     log N         N   N log N      N**2          N**3          2**N
    1         0         1         0         1             1             2
    2         1         2         2         4             8             4
    4         2         4         8        16            64            16
    8         3         8        24        64           512           256
   16         4        16        64       256          4096         65536
   32         5        32       160      1024         32768   4.29497e+09
   64         6        64       384      4096        262144   1.84467e+19
  128         7       128       896     16384       2097152   3.40282e+38

Typically constant factors are ignored in BigO analysis (although in reality constants frequently matter).

To compare the speed of different algorithms, computer scientists use bigO notation. The bigO notation doesn't exactly predict performance times. Instead, what the bigO notation tells us is that given two algorithms with the same order, then both will slow down at roughly the same rate as the number of items being sorted gets larger.

Search algorithms also have bigO values. Linear search is O(n), binary search is O(log n). Once again, these differences are most important for large data sets. If you have 1 million items in an array, a linear search will require about 1 million tests, but a binary search will only require 20 tests (because 2 to the 20 power is about 1 million -- the 'log' in bigO notation is always log-base-2 -- when you have a logarithmic algorithm, the analysis need not be concerned with the base of the logarithm, since this can change the total running time only by a constant factor (and constant factors are ignored).

   for (i = SOME_CONSTANT; i <= ANOTHER_CONSTANT; i++)      // O(1)
      // loop body requiring constant time
   for (i = SOME_CONSTANT; i <= N; i++)                     // O(N)
      // loop body requiring constant time
   for (i = SOME_CONSTANT; i <= N; i++)                     // O(N*N)
      for (j = ANOTHER_CONSTANT; j <= N; j++)
         // loop body requiring constant time

   for (i = SOME_CONSTANT; i <= N; i++)                     // O(N*M)
      // loop body requiring time O(M), where M is a variable

The following is example of how one can go about determining the approximate running time of an algorithm.

    // the following logic initializes a 'n' element array
    index = 0;                // assignment is 1 step
    while (index < n) {       // EXPR  index < n  is 1 step
       array[index] = -1;     // assignment is 1 step
       index++;               // increment is 1 step

             Si + Sc*(n+1) + Sb*n
   n = 0:    1  + 1*1      + 2*0    =  2
   n = 1;    1  * 1*2      + 2*1    =  5
   n = 2;    1  * 1*3      + 2*2    =  8
   n = 3:    1  + 1*4      + 2*3    =  11
   n = 6:    1  + 1*7      + 2*6    =  20
   n = 9:    1  + 1*10     + 2*9    =  29
   n = 300:  1  + 1*301    + 300*2  =  902
      Si => # of initialization steps
      Sc => # of steps in the condition EXPR
      sb => # of steps in the loop body
   Conclusion:  linear -- O(n)

Analysis of a Couple of Code Snippets

   for (k = 1; k <= n / 2; k++) {
      for (j = 1; j <= n * n; j++) {

Since these loops are nested, the number of times statements within the innermost loop are executed is the product of the number of repetitions of the two individual loops. Hence the efficiency is n3, or O(N3) in BigO terms, with a constant of proportionality equal to 1/2.

   for (k = 1; k <= n * 10; k++) {

   for (j = 1; j <= n * n * n; j++) {

Since one loop follows the other, the number of operations executed by both loops is the sum of the individual loop efficencies. Hence the efficiency is 10n + n3, or O(N3) in BigO terms.

Analysis of Exchange Sort

Assume we have array A with length N and we want to find the minimum element in the array.

   Pass 1:  compare the  n-1  elements A[1]...A[n-1] 
            with A[0] and, if necessary, exchange elements
            so that A[0] always has the smallest value

   Pass 2:  compare the  n-2  elements A[2]...A[n-1] 
            with A[1] and, if necessary, exchange elements
            so that A[1] less than all elements to the right

   Pass i:  compare the  n-i  elements A[i]...A[n-1]
            with A[i-1] and, if necessary, exchange elements 

   The total number of comparisons is given by the
   arithmetic series  f(n) from 1 to n-1:
      f(n) = (n-1) + (n-2) + ... + 3 + 2 + 1 =  n * (n-1) / 2

   The number of comparisons depends on  n2.

Using an Array for a List

Append an item: O(1). Insert an item at the front of the array: O(N).

Simple Sorts

Bubble, selection, insertion, and exchange are all O(N2).

Shell sort can give much better performance than O(N2) in the worst case.

Quick sort in the worst case can be O(N2), but on average it is O(N log N).

Merge sort is an O(N log N) sort.

Order of a Function: Mathematical Definition

The order of a function is defined as follows:

Given two non-negative functions f and g, the order of f is g if and only if g asymptotially dominates f. Using BigO notation, this can said as: f = O(g) (f is of order g).

Rules for Comparing BigO Estimates

Some rules to help calculate and compare BigO estimates:

   Let  f  and  g  be functions, and  C  a constant:
      O(C * f) = O(f)
      O(f * g) = O(f) * O(g)
      O(f / g) = O(f) / O(g)
      O(f) > O(g), if and only if  f  dominates  g
      O(f + g) = Max[O(f), O(g)]

About Logarithms

A logarithm of base b for the value y is the power to which you must raise b to get y. Normally, this is written as: logby = x.

   logby = x  <=>  bx = y  <=>  blogby = y

      where <=>  means "is equivalent to"

Logarithms have the following properties:

    For any positive values of  m,  n, and  r, and any positive integers
    a  and  b:
      log nm = log n + log m
      log n/m = log n - log m
      log nr = r log n
      logan = logb n / logba

{TopOfPage} {Oracle.com::API Specification | Tutorial} {Additional Resources}

Home Previous Next