Contents

Introduction

These notes accompany Data Structures and Algorithms II. The course, to a large extent, follows on from Data Structures and Algorithms I. However, while DS&A I focused on fundamental datastructures, DS&A II will focus on practical algorithms, applicable to a wide range of tasks. The approach will be somewhat less formal, with a little more focus on applications.

There is huge range of reasonable textbooks on data structures and algorithms. Standish (1994), recommended in DS&A 1, covers a fair amount of the course, but has relatively little on string processing algorithms. Sedgewick (1992) is another established text, and the material on string processing in this course is based on this text.

Motivation: 4 Reasons to Learn about Algorithms

Why should you want to know about algorithms? There are a number of possible reasons: For all these reasons it is useful to have a broad familiarity with a range of algorithms, and to know what they may be used for.

Overview of Course

This course will have two main sections. The first, larger section will look at different sorts of algorithms, and in particular string processing algorithms and graph algorithms. The second, fairly short section will look at object oriented programming (building on the brief introduction in DS&A 1), discussing inheritance, and how we could rework some of the ADTs discussed to date to exploit objects properly. Although we will use Turbo Pascal, the principles obviously apply to other object oriented languages, such as C++.

Before starting these main sections I'll briefly introduce a new List ADT, which as well as being important in its own right, provides a review of some of the term 1 ideas, and is the basis of the first assessed exercise.

Review of DS&A 1: Abstract DataTypes

DS&A 1 focused on the specification and implementation of some useful abstract data types. I'll very briefly review this material here, to provide the context for the rest of the course.

A given programming language will have a number of data types available, along with some operations on them. Pascal includes boolean, reals, integers, characters, records etc. Operations on booleans include and, or, etc. We don't normally worry about how these datatypes and operations are implemented, but just know what we can do with them.

However, most real programming tasks require more complex and sophisticated data types. If your programming language doesn't provide them, you'll have to define them yourself. Once you've defined such data types, you again don't want to worry again about how they're implemented, just know what operations are allowed on that data type. The definition of your abstract data type (ADT) should specify the operations allowed, but the implementation details should be hidden from the ``user'' of that abstract data type.

As a simple example, mentioned in DS&A 1, some high level programming languages will provide a ``stack'' data type built in, with operations ``initialise'', ``empty'', ``push'' and ``pop'' (or some equivalents) and maybe ``full''. In a language such as C or Pascal that does not provide such a thing, a stack data type can easily be defined, and implemented using either an array or a linked representation. Once implemented, we no longer need be concerned about which implementation technique is used, just what operations are allowed and what they do.

The notion of abstract data types obviously allows much more complex data types to be defined, even in languages that do have a good range of built in data types. So, for example, Common Lispgif has hash tables, stacks, lists, but does not have a graph ADT (discussed later in this course). If a programming problem requires the use of a graph, it should be defined and implemented in such a way that the main program only uses a set of well defined operations on that datastructure (independent of implementation decisions). It also helps if these operations are fairly standard, so that standard algorithms can be used on that graph.

Review Case Study: Generalised Lists

To review some of the ideas from term 1, and to introduce some new concepts, I'll discuss the generalised list ADT. This is also presented in section 8.3 of Standish (though my implementation and specification will be a little different).

The Generalised List ADT

For many applications, it is useful if we allow list structures where items in the list may be lists themselves, for example (2 (3 (4 5)) 6). This is a list with 3 items in it, the second of which is a list itself. So, a list item can be a list, or of a given simple type such as integer or string. A list item that is of a simple type is referred to as an 'atom', and we'll call its type AtomType. The list itself, and its elements will all be of type GenList. An object of this type can be either a list of items, or just an atom -- you may find the name 'GenList' confusing for something that need not be a list, in which case choose your own name!. Anyway, we can think of an object of this Genlist type as a container that may contain a list, or it may contain something of AtomType.

The operations we want to allow on such a structure are similar to those we might want on a normal linear list (though the names I use will be a bit different):

Access Primitives:
Function First(I: Genlist): Genlist;

Returns the first item in a list, which may be a list itself.
Function Rest(I: Genlist): Genlist;

Returns the remainder of the list.
Function Null (I: Genlist): Boolean;

A boolean function returning TRUE if the argument is an empty list.
Function Atomic (I: Genlist): Boolean;

A boolean function returning TRUE if the node contains an atom.
 
 
Storage and Retrieval Primitives
Function GetValue(I: Genlist): Atomtype;

Returns an item of Atomtype (e.g., integer) given a node in the list.
Preconditions: I is atomic.
(Note: We could also have a procedure PutValue, but this is omitted for simplicity).
 
 
Insertion/Construction primitives:
Function CreateNode(I: Atomtype): Genlist;

Creates a new object of GenList type containing an item of Atomtype
Function Cons(I1, I2: Genlist): Genlist

Returns a new list with I1 stuck in front of I2.
It is not normally essential that I2 is a list, but if it is not then we don't have a simple list returned, so I'll assume it is a list.
 
 
Note that we have just used functions in the above specification. So, rather than destructively modifying an existing list when we ``cons'' (ie insert) a new node, we return a new list. This, on the one hand, allows for a programming style close to functional programming, but on the other hand may lead to memory management problems in a non-garbage collected language, so may not be practical for many problems.

A Possible Implementation

To implement structures that may be either one type (list) or another, we can use Pascal's variant records. Look it up in your favourite Pascal book if you haven't met them. Figure 1.1 gives an example implementation of the basic datatype, along with a few of the operations, while figure 1.2 illustrates schematically how the nested list (1 (2 3)) is implemented. The remainder of the operations are left as an exercise for the reader. Note that, for now, I'm just using Pascal units rather than objects to define abstract datatypes. We will come back to objects later in the course.

[figure39]
Figure 1.1: Partial Generalised List Implementation

[figure44]
Figure 1.2: Linked Representation of Generalised List (1 (2 3))

A simple extension of the above representation allows us to have nodes of different types (e.g.,integer, character and string), allowing list structures like ((1234 Fred) (2345 Jim)). We'd just extend the node-types to explicitly include these, rather than just the category simplenode. This is essentially what is done in the Lisp programming language. (A diagram illustrating this is given in Standish, fig 8.13.) Providing such flexible datatypes may be useful in some applications, though possibly at the expense of efficiency and some compile-time typechecking (which may catch certain errors).

Manipulating Generalised Lists

Anyway, when manipulating and constructing generalised lists, you obviously don't have to know anything about how they are defined (typically using pointers).You just have to know the operations defined at the interface. For example, using the given operations, and assuming an integer AtomType, then we can construct and access lists as follows. To construct a list (1 2) we can call Cons(CreateNode(1), Cons(CreateNode(2),nil)). A more complex list (1 (2 3)) could be constructed using Cons(CreateNode(1), Cons(Cons(CreateNode(2), Cons(CreateNode(3), nil)), nil)). (Think about it, to convince yourself that this is right). If we assigned this to a list L, then to access the integer value in the first element we'd use GetValue(First(L)). However, this would give an error if it turned out that the first value was a (sub)list.

Now that we can construct and do some basic operations on generalised lists we can use these to define higher level operations. Two common list processing operations are ``append'' and ``reverse''. These can straightforwardly be implemented using the existing primitives. It is also clearly necessary to write some reading and writing functions/procedures -- creating a list using all these Cons's is a bit awkward, so we'd really like to be able to say something like l := ReadList('(1 (2 3))'). This can be viewed as a very simple kind of parsing problem, and will be set as part of the assessed exercise.

Once we have defined a useful library of higher level operations we can then consider what algorithms can be used to manipulate the generalised list to solve our programming task. For example, we might use a generalised structure to represent logical expressions, such as (forall X (man x) -> (mortal x)). (The AtomType is now string). Then we could construct a theorem prover to use these structures to derive new conclusions.

Exercises

The first assessed exercise will involve implementing the generalised List ADT, and developing a simple program to 'evaluate' expressions of the Lisp programming language. The first stage of this, to be done week 1, is to complete the generalised list ADT and write a procedure to print the lists in the form given in the examples.

Further Reading

The main textbooks that will be referred to are: For this lecture, read sections 8.1, 8.2, 8.3 and 8.4 of Standish. The first two sections provide a useful revision of last terms work, which you can omit if you feel confident about this. The next two discuss generalised lists and their applications.

Comments?

Please email me if you have any constructive suggestions or comments on the course, or serious difficulties with the material. This is the first time this course has run in this form, so there may no doubt be teething problems. My email is 'alison'.

String Processing Algorithms

Introduction

Having discussed the motivation for studying algorithms, and reviewed the concept of abstract datatypes, in this chapter we'll move on to look at a class of useful algorithms, used in many practical applications: string processing algorithms. These include: You might want to know about these algorithms for all the reasons listed in chapter 1: they give the basic ideas that will allow you do develop better string processing algorithms of your own (e.g., perhaps you want an algorithm to score two strings according to some somplex notion of closeness of match); they are useful in wide variety of programming tasks; they provide the basis for understanding existing tools (e.g., encryption and compression tools); and some of them are surprisingly neat and simple, and interesting in their own right.

A String ADT

Although many languages have strings (and associated operations) built in, we'll define an abstract string datatype, and discuss how it might be implemented using arrays or pointers. This is based on Standish, section 8.5, with the addition of GetChar and PutChar (Standish assumes the standard array operations are used, e.g., s[1] := 'c'). As far as possible the algorithms discussed in the next section will use the operations defined for this ADT. Different versions of Pascal (or other languages) may have slightly different operations and names for the operations. Turbo Pascal supports both standard Pascal strings, which have very few operations defined on them, and a special Strings unit that provides greater functionality. Anyway, here just the following operations will be assumed:
Function GetChar(S: String, I:Integer): Char;

Returns Ith character of string S;
Procedure PutChar(S: String, I:Integer, C:Char);

Replaces the Ith character of string S with C;
Function Length(S: String): Integer;

The number of characters in the string.
Function Pos(T, S: String): Integer;

The first position of string T inside S, or zero if no match.
Function Concat(S1, S2: String): String;

Returns a new string consisting of characters in S1 followed by characters in S2
Function Copy(S: String; I, M: Integer): String;

A substring of length M starting at position I in string S
Procedure Delete(S: String; I, M: Integer);

Deletes M characters from S starting at position I.
Procedure Insert(T,S: String; I: Integer);

Changes S into a new string with T inserted in position I
 
 
A simple representation for strings (normally used for the standard string type in Pascal) uses a fixed length packed array (normally 0..255 els), where the first element is used to store the length of string, e.g, [6,a,l,i,s,o,n,.....]. You should be able to fairly straightforwardly implement the above operations using this representation (extending Pascal's standard string type to have a more useful range of operations). (You would have to store the first element as a character, and convert to integer when required).

However, this representation is obviously wasteful for small strings and inflexible for strings more than 255 characters. So another approach would be to use a linked representation with its end marked with a null character. This approach is obviously much more flexible, but some operations (like length) are more expensive. Turbo Pascal's strings unit uses this implementation.

Standish discusses how strings are represented in text files, in terms of the physical organisation of the storage medium (e.g., hard disk). However, I'll just stick to the basic logical operations on strings, and string processing algorithms which use these.

String Searching

Motivation

As mentioned above, string searching algorithms are important in all sorts of applications that we meet everyday. In text editors, we might want to search through a very large document (say, a million characters) for the occurence of a given string (maybe dozens of characters). In text retrieval tools, we might potentially want to search through thousands of such documents (though normally these files would be indexed, making this unnecessary). Other applications might require string matching algorithms as part of a more complex algorithm (e.g., the Unix program ``diff'' that works out the differences between two simiar text files).

Sometimes we might want to search in binary strings (ie, sequences of 0s and 1s). For example the ``pbm'' graphics format is based on sequences of 1s and 0s. We could express a task like ``find a wide white stripe in the image'' as a string searching problem.

In all these applications the naive algorithm (that you might first think of) is rather inefficient. There are algorithms that are only a little more complex which give a very substantial increase in efficiency. In this section we'll first introduce the naive algorithm, then two increasingly sophisticated algorithms that give gains in efficiency. We'll end up discussing how properties of your string searching problem might influence choice of algorithm and average case efficiency, and how you might avoid having to search at all!

A Naive Algorithm

The simplest algorithm can be written in a few lines:
Procedure NaiveSearch(s1, s2: String): Integer;
{Returns an index of S2, corresponding to first match of S1 with S2, or}
{ -1 if there is no match}
var i, j: integer;
match: boolean;
begin
   i:=1; match := FALSE;
   while match = FALSE and i <= length(s1) do 
     begin
       j := 1;
       {Try matching from start to end of s2, quitting when no match}
       while GetChar(s1, i) = GetChar(s2, j) and (j <= length(s2)) do
          begin
           j := j+1;           {Increment both pointers while it matches}
           i := i+1
          end;
       match := j>length(s2);       {There's a match if you got to the end of s2}
       i:= i-j+2                    {move the pointer to the first string back to
                                    {just after where you left off}
     end;
   if match
      then NaiveSearch := i-1       {If it's a match, return index of S1}
                                    {corresponding to start of match
      else NaiveSearch := -1
end;
We'll illustrate the algorithms by considering a search of string 'abc' (s2) in document 'ababcab' (s1). The following diagram should be fairly self-explanatory (the îndicates where the matching characters are checked each round):
'ababcab'               i=1,j=1 
'abc'
 ^                      matches: increment i and j

'ababcab'               i=2.j=2
'abc'
  ^                     matches: increment i and j

'ababcab'               i=3,j=3
'abc'   
   ^                    match fails: exit inner loop, i=i-j+2 

'ababcab'               i=2, j=1        
 'abc'                  match fails, exit inner loop, i=i-j+2
  ^

'ababcab'               i=3, j=1
  'abc'                 matches: increment i and j
   ^

'ababcab'               i=4, j=2
  'abc'                 matches: increment i and j
    ^

'ababcab'               i=5, j=3
  'abc'                 matches: increment i and j
     ^

                        i=6, j=4, exit loop (j>length(s2)), 
                        i = 4, match=TRUE, 
                        NaiveSearch = 3
Note that for this example 7 comparisons were required before the match was found. In general, if we have a string s1 or length N, s2 of length M then a maximum of approx (M-1)*N matches may be required, though often the number required will be closer to N (if few partial matches occur). To illustrate these extremes consider: s1= 'aaaabaaaabaaaaab', s2 = 'aaaaa', and s1= 'abcdefghi', s2= 'fgh'.

The Knuth-Morris-Pratt Algorithm

[Note: In this discussion I assume that we the index 1 indicates the first character in the string. Other texts may assume that 0 indicates the first character, in which case all the numbers in the examples will be one less].

The Knuth-Morris-Pratt (KMP) algorithm uses information about the characters in the string you're looking for to determine how much to `move along' that string after a mismatch occurs. To illustrate this, consider one of the examples above: s1= 'aaaabaaaabaaaaab', s2 = 'aaaaa'. Using the naive algorithm you would start off something like this:

'aaaabaaaabaaaaab'
'aaaaa'
 ^

'aaaabaaaabaaaaab'
'aaaaa'
  ^

'aaaabaaaabaaaaab'
'aaaaa'
   ^

'aaaabaaaabaaaaab'
'aaaaa'
    ^

'aaaabaaaabaaaaab'   
'aaaaa'
     ^                  match fails, move s2 up one..

'aaaabaaaabaaaaab'
 'aaaaa'
  ^

etc etc
but in fact if we look at s2 (and the 'b' in s1 that caused the bad match) we can tell that there is no chance that a match starting at position 2 will work. The 'b' will end up being matched against the 4th character in s2, which is an 'a'. Based on our knowledge of s2, what we really want is the last iteration above replaced with:
'aaaabaaaabaaaaab'
     'aaaaa'
      ^
We can implement this idea quite efficiently by associating with each element position in the searched for string the amount that you can safely move that string forward if you get a mismatch in that element position. For the above example..
a 1     If mismatch in first el, just move string on 1 place.
a 2     If mismatch here, no point in trying just one place, as
        that'll involve matching with the same el (a) so move 2 places
a 3
a 4
a 5
In fact the KMP algorithm is a little more cunning than this. Consider the following case:
'aaaab'   i=3,j=3
'aab'
   ^
We can only move the second string up 1, but we KNOW that the first character will then match, as the first two elements are identical, so we want the next iteration to be:
'aaaab'  i=3,j=2
 'aab'
   ^
Note that i has not changed. It turns out that we can make things work by never decrementing i (ie, just moving forward along s1), but, given a mismatch, just decrementing j by the appropriate amount, to capture the fact that we are moving s2 up a bit along s1, so the position on s2 corresponding to i's position is lower. We can have an array giving, for each position in s2, the position in s2 that you should backup to in s2 given a mismatch (while holding the position in s1 constant). We'll call this array next[j].
j       s2[j]   next[j]
1       a       0
2       b       1
3       a       1
4       b       2
5       b       3
6       a       1
7       a       2
In fact next[1] is treated as a special case. In fact, if the first match fails we want to keep j fixed and increment i. One way to do this is to let next[1] = 0, and then have a special case that says if j=0 then increment i and j.
'abababbaa'      i=5, j=5
'ababbaa'
     ^          mismatch, so j = next[j]=3

'abababbaa'      i=5, j=3
  'ababbaa'
     ^
-------------------
'abaabbaa'      i=4, j=4
'ababbaa'
    ^           mismatch, so j = next[j]= 2

'abaababbaa'      i=5, j=2
  'ababbaa'
    ^
-------------------
'bababbaa'      i=1, j=1
'ababbaa'
 ^              mismatch, so j = next[j]= 0, increment i and j.

'abaababbaa'      i=2, j=1
 'ababbaa'
  ^
It's easy enough to implement this algorithm once you have the next[..] array. The bit that is mildly more tricky is how to calculate next[..] given a string. We can do this by trying to match a string against itself. When looking for next[j] we'd find the first index k such that s2[1..k-1] = s2[j-k+1..j-1], e.g:
'ababbaa'      s2[1..2] = s2[3..4]
  'aba....'    so next[5] = 2.
     ^
(Essentially we find next[j] by sliding forward the pattern along itself, until we find a match of the first k-1 characters with the k-1 characters before (and not including) position j).

The detailed implementations of these algorithms are left as an exercise for the reader - it's pretty easy, so long as you get the boundary cases right and avoid out-by-one errors.

The KMP algorithm is extremely simple once we have the next table:

i := 1; j:= i;
while (i <= length(s1) and j <= length(s2)) do
begin
  if (j = 0) or (GetChar(s1,i) = GetChar(s2, j)))  then begin
     i := i+1; j:= j+1
  end
  else j := next[j]
end;
(If j = length(s2) when the loop exits we have a match and can return something appropriate, such as the index in s1 where the match starts).

The Boyer-Moore Algorithm

Although the above algorithm is quite cunning, it doesnt help that much unless the strings you are searching involve alot of repeated patterns. It'll still require you to go all along the document (s1) to be searched in. For most text editor type applications, the average case complexity is little better than the naive algorithm (O(N), where N is the length of s1). (The worst case for the KMP is N+M comparisons - much better than naive, so it's useful in certain cases).

The Boyer-Moore algorithm is significantly better, and works by searching the target string s2 from right to left, while moving it left to right along s1. The following example illustrates the general idea:

'the caterpillar'    Match fails:
'pill'               There's no space (' ') in the search string, so move it
    ^                right along 4 places

'the caterpillar'   Match fails. There's no e either, so move along 4
    'pill'
        ^

'the caterpillar'    'l' matches, so  continue trying to match right to left
        'pill'
            ^

'the caterpillar'    Match fails. But there's an 'i' in 'pill' so move along
        'pill'       to position where the 'i's line up.
           ^
'the caterpillar'    Matches, as do all the rest..
         'pill'
             ^
This still only requires knowledge of the second string, but we require an array containing an indication, for each possible character that may occur, where it occurs in the search string and hence how much to move along. So, index['p']=1, index['i']=2, index['l'] = 4 (index the rightmost 'l' where repetitions) but index['r']=0 (let the value be 0 for all characters not in the string). When a match fails at a position i in the document, at a character C we move along the search string to a position where the current character in the document is above the index[C]th character in the string (which we know is a C), and start matching again at the right hand end of the string. (This is only done when this actually results in the string being moved right - otherwise the string is just moved up one place, and the search started again from the right hand end.)

The Boyer-Moore algorithm in fact combines this method of skipping over characters with a method similar to the KMP algorithm (useful to improve efficiency after you've partially matched a string). However, we'll just assume the simpler version that skips based on the position of a character in the search string.

It should be reasonably clear that, if it is normally the case that a given letter doesnt appear at all in the search string, then this algorithm only requires approx N/M character comparisons (N=length(s1), M=length(s2)) - a big improvement on the KMP algorithm, which still requires N. However, if this is not the case then we may need up to N+M comparisons again (with the full version of the algorithm). Fortunately, for many applications we get close to the N/M performance. If the search string is very large, then it is likely that a given character WILL appear in it, but we still get a good improvement compared with the other algorithms (approx N*2/alphabet_size if characters are randomly distributed in a string).

Tradeoffs and Issues

There are other string search algorithms that use different methods, but the Boyer-Moore algorithm outlined above is the most well known, fairly simple, and widely used. Even today, new algorithms are being developed based on the Boyer-Moore one, but (for example) to work with patterns rather than fixed search strings.

The best choice of algorithm (and the average case efficiency of your algorithm) will depend on your 'alphabet size' (how many different characters may appear in a string), and the amount of repetition in the strings. For normal natural language text (e.g., an English document), the KMP algorithm gives little advantage over the naive algorithm, but Boyer-Moore does very significantly better. For a binary string where subsequences of the search string may occur frequently, KMP may do quite alot better than the naive algorithm, while Boyer-Moore may do little better than KMP.

For applications such as text retrieval, where you may be (conceptually) searching though many text files for the occurence of a given string, approaches that go through each file from start to end are likely to be too inefficient. The only practical thing is to create a giant index which indicates for each word/character sequence just which files contain this sequence (and where in the file). You could have a large hash table, that allowed you to rapidly get from a given sequence to the list of files and locations. This obviously requires alot of preprocessing of files, and a fair amount of storage for the index/hash table. However, for any application where many users will be searching the same (fixed) set of texts such indexing will be worthwhile. I won't discuss details of how you'd do it here, just point out that in such cases we would pre-process the files to create indexes rather than searching them on-the-fly from the raw text.

Conclusion

The string matching algorithms introduced vary from a worst case N*(M-1), best case N complexity, to worst case N+M, best N/M complexity. This is a significant gain, without much cost in terms of algorithm complexity (just some jolly clever chaps to think of it in the first place). This is a good illustration of how knowledge of state-of-the-art algorithms may give good gains in performance if you use them in your programs.

Exercise

Try implementing the Boyer-Moore string search algorithm, and the naive algorithm. Do some tests on a large file and count how many character comparisons are done in each for some example searches.

Further Reading

Standish only covers the string DT, not the algorithms. Sedgewick (and most other algorithm books), cover the algorithms above.

Pattern Matching

Motivation

For many search and retrieval tasks we don't want to have to specify exact strings to match against, but rather we want to specify a pattern. For example, we may want to retrieve all texts that inclue the word `carrot', followed by zero or more exclamation marks, followed by space. One way would be to specify a pattern such as `carrot!* '. The `*' means zero or more occurences of the last character.

Matching against patterns is important for many text processing applications. New languages have been developed whose major feature is good string manipulation features such as pattern matching. A good example is the PERL language, which is widely used to process and generate text files to be used on the World Wide Web.

To allow matches against such patterns we need to consider both how to describe such patterns (so that they are both easy to specify by the user and easy to process by the machine), and how we can do efficient matches and searches given such patterns. The next section will briefly discuss how we can represent patterns, and then we'll look at algorithms to do the matching.

Representing Patterns

A pattern can be represented as a regular expression, using special symbols to describe optional repetition and alternatives. etc. A regular expression language may be based on just two special symbols (+ brackets):
[tex2html_wrap_inline1270]
to represent alternatives (OR), e.g., 'a[tex2html_wrap_inline1272]bcd' will match a or bcd. '(a[tex2html_wrap_inline1274]b)cd' will match 'acd' or 'bcd'.
'*'
to allow zero or more occurences of the last character (or bracketted expressin). e.g., 'ab*' matches 'a', 'ab', 'abb' etc.

 

 

Other special symbols are often included for representing regular expressions (e.g., `.' to match any character, '+' for 1 or more occurences, often a character to represent any character except the last one, etc). However, the above is sufficient, just may be rather verbose.

This simple language is in fact very flexible. We can build up quite complex expressions, such as '[ab[tex2html_wrap_inline1276]c]*z' to match 'cz', 'abababz', 'ccabccz' and a whole lot more.

A regular expression may be represented graphically as a finite state machine as illustrated below. (The numbers in brackets are just so we can go discuss it more easily). Each possible string corresponds to a path from start (right arrow on LHS) to end node (box on RHS) of the network. So, for example 'abcabz' corresponds to following the path between nodes numbered 1,2,3,5,1,4,5,1,2,3,5,6.

[figure122]
Figure 2.1: finite state machine for pattern '[ab|c]+z'

We can implement a datastructure based on this graphical representation by having an array holding, for each state, the possible next states. So, state 1 has possible next states 2 and 4. (We could use a Nx2 array, where N is number of states, and agree on what to put in the second position where there is only one successor). There is then a fairly simple algorithm for checking matches of strings with this representation of a pattern. It is based on using a variant of the list datastructure to hold a representation of the states in the network to check when matching along the string. A list such as (1 2 + 5 6) means ``Possible states to go to in order to process the current character are 1 and 2, and possible next states assuming current character already matched are 5 and 6.''. So the representation allows you to explore many paths simultaneously, by keeping the alternatives next states on the list. The special symbol + is used essentially to separate possible states corresponding to the current character from states corresponding to the next character.

A Simple Pattern Matcher

Using this special representation it turns out that we can check if a string matches a pattern using the following simple algorithm (in pseudocode):
Put '+' on list;
State := initial-state; quit := false;

While not state=final-state and position <= length(string) and not quit do
  begin
    if State = '+' then
      increment position in string and put a new '+' at end of list.
    else if State contains current character then put the next state at end of list
    else if State contains choice point, put both next states at front of list

    If list is empty then quit := true
      else pop State off front of list.
  end
A match succeeds if we reach the final state. The position in the string when the loop exits will indicate the shortest initial substring matching the pattern (we'd need slightly different exit conditions to check if the complete string matched the pattern).

As an illustration, consider what would happen given the network above if we try to match the pattern against 'abcz'. To simplify things we'll not show 'State', but just assume it is the first element of the previous list (which is 'popped off' after each iteration):

String position   List of states to process
1                 (1 +)        Choice point: put next states at front
1                 (2 4 +)      State 2 contains 'a', so put next state at end
1                 (4 + 3)      State 4 doesnt match 'b' so no action.
1                 (+ 3)        '+', so increment string pointer, put + at end
2                 (3 +)        State 3 contains 'b', so put next state at end
2                 (+ 5)        '+', so increment string pointer etc
3                 (5 +)        Choice: put next states at front
3                 (1 6 +)      Choice: Put next states at front
3                 (2 4 6 +)    State 2  doesnt match 'c' so no action
3                 (4 6 +)      .. but State 4 does, so put next state at end
3                 (6 + 5)      no action..
3                 (+ 5)        increment string pointer
4                 (5 +)        Choice: put next states at front
4                 (1 6 +)      Choice: Put next states at front
4                 (2 4 6 +)    State 2  doesnt match 'z' so no action
4                 (4 6 +)      State 2 doesnt..
4                 (6 +)        but state 6 does, so put next state at end
4                 (+ 7)        increment string pointer..
5                 (7 +)        REACHED END! Must match pattern.
Think for a minute what one of the lists corresponds to. The list (6 + 5) given string position 3 means: another node to check for the 3rd character is '6'; but there's a next node 5 corresponding to a path we've already found containing the third character (1 2 3 5 1 4).

The algorithm outlined works reasonably efficiently: with a small modification to avoid the possibility of having redundant repeats of the same state, we have a worst case on [tex2html_wrap_inline1280] where M is the number of states and N the length of the string.

To complete our pattern matcher we need code to translate our regular expressions into a finite-state machine (our network). This is a non-trivial problem in itself.

Further Reading

Sedgewick describes the above pattern matching methods in more detail.

Parsing

Parsing is the process of taking a sequence (e.g., of characters), and checking whether it is a legal sequence in a given ``language''. The sequence may also be decomposed into a structure which allows further processing.

As a very simple example, we might parse an expression ``1 + 2 * 3'', determine that it is a legal arithmetic expression, and return a tree structure such as:

               +
              / \
             1   *
                / \
               2   3
In this section we will give a very brief introduction to parsing. However, it is a very large subject, and so the introduction here will be somewhat superficial, and the methods presented should not be blindly adopted for more serious applications without further study of the subject.

Motivation

Parsing is essential for processing computer programs. A computer program must be checked to see if it is syntactically correct, and a datastructure reflecting the structure of the program built up (e.g., making clear where different statements begin/end). Then it can be compiled into a low level assembly language (or interpreted).

Parsing is also important for the interpretation of ``natural'' language such as English, and for other applications such as the processing of ``marked up'' text files (SGML). We'll include some examples of parsing natural language, as the grammar involved should be more familiar.

Context Free Grammars

Parsers presuppose that there is some grammar that defines the legal expressions in a language. For parsing natural language this will just be a grammar of English. For parsing expressions of a programming language this will be the grammar defining legal constructs of that language.

Programming languages are often described using a particular type of grammar known as a context-free grammar. For example, the following grammar rules define legal arithmetic expressions involving * and +:

<expression> ::= <term> | <term> + <expression>       (rule 1)
<term>       ::= <factor> | <factor> * <term>         (rule 2)
<factor>     ::= ( <expression> ) | v                 (rule 3)
This grammar is meant to allow expressions like `1 + (1 + 3) * 6', but NOT expressions like `1 + * 2 ('. The grammar is such that the right structure will be assigned to the expression, as well as invalid expressions rejected.

The rules above can be read as follows: ``An expression consists of a term OR a term followed by a `+' followed by an expression. A term consists of a factor OR a factor followed by a `*' followed by a term. A factor consists of an expression in brackets OR a single letter/digit.'' (`v' stands for any letter or digit). So in the notation above, `::=' can be read as `consists of' or 'is a' and `[tex2html_wrap_inline1282]' means OR. In the above grammar, '+' and '*' are examples of terminal symbols corresponding to symbols in the language described, while 'expression' and 'term' are examples of non-terminal symbols, which are internal to the grammar. One non-termal is distinguished as the start symbol (in this case `expression'). A sequence is recognised by the grammar if we can rewrite it into a sequence containing just the start symbol, by repeatedly replacing sequences matching the right hand side of a rule with sequences matching the left hand side of the rule.

Simple Parsing Methods

Parsing methods can be divided into top-down methods and bottom-up methods. These are illustrated schematically in figure 2.2, for a tiny grammar of English. The illustration for bottom up parsing starts with the sentence itself, and uses the grammar rules to determine how combinations of words can be rewritten as more abstract grammatical categories (e.g., ``The dog'' is a noun phrase). The parse succeeds if this rewriting process concludes with the whole sentence rewritten to the start symbol (sentence). Top down parsing, on the other hand, starts with this start symbol, and rewrites/expands symbols until it finally gets down to a sequence matching the sequence being parsed.

[figure163]
Figure 2.2: Top Down and Bottom up Parses, for a fragment of English

Both top-down and bottom-up parsing are based on the idea of repeatedly modifying a sequence of symbols (based on allowed rewrites in the grammar rules), with the aim of getting from a sequence consisting of just the start symbol, to our sequence to be parsed. This can be illustrated as follows:

Top Down

(Sentence)                       apply rule 1
(NounPhrase VerbPhrase)          apply rule 2
(Article Noun VerbPhrase)        apply rule 4
(the Noun VerbPhrase)            apply rule 5
(the dog VerbPhrase)             apply rule 3
(the dog Verb)                   apply rule 6
(the dog jumps)                  SUCCEEDS

Bottom up

(the dog jumps)                  apply rule 6
(the dog Verb)                   apply rule 3
(the dog VerbPhrase)             apply rule 5
(the Noun VerbPhrase)            apply rule 4
(Article Noun VerbPhrase)        apply rule 2
(NounPhrase VerbPhrase)          apply rule 1
(Sentence)                       SUCEEDS
Now, of course, the grammar in the figure is a bit limited! In fact, it can ONLY recognise the sentence 'The dog jumps'. There are no options or alternatives. In any real grammar there will be many alternative structures to explore, and that's where the difficulty in parsing arises. This is particularly a problem when parsing natural language. For artificial languages such as programming languages it is not so bad, but we still need ways of choosing between alternatives. For example, if we have the rule:
<factor>     ::= ( <expression> ) | v
and we are doing a top-down parse, how do we know whether to rewrite <factor> to ( <expression> ) or to v (letter/digit). For programming languages, it is often possible to do this by looking ahead one character. In the above rule, if the next character in the sequence you are parsing is '(' then the first option is correct, otherwise it is the second option. Often, in order to make this one-char lookahead work, the grammar must be modified into an equivalent, but easier to process form.

A Top-Down Parser

A simple top-down parser (with one char lookeahead) can be written by effectively translating the grammar rules into procedures. (This may result in a hard to modify system, but illustrates the general idea). We don't bother with an explicit sequence of symbols (we're basically relying on the sequence being implicitly represented as the internal stack of procedures still to be executed). We assume that we are traversing an input string S, and that the current position in that sequence is held in a global variable j.

As an example, the grammar rule above could be implemented as:

Procedure ParseFactor();
begin
  If GetChar(S, j) = '(' then
    begin
      j := j+1;
      ParseExpression;
      If GetChar(S, j) = ')' then 
         j := j+1
      else 
         error
     end
   else if LetterOrDigit(GetChar(S, j)) then
     j := j+1
   else error()
end;
This assumes that there is some error function that will, at least, set a flag to indicate failure. Although rather inelegant, the approach is at least simple, and illustrates the 'one char lookahead' very explicitly in selecting between alternatives in the relevant grammar rule.

A Bottom-Up Parser

A simple bottom-up parser, known as a shift-reduce parser can be implemented using a stack to hold a sequence of terminal and non-terminal symbols. Symbols from the input string can be shifted onto this stack, or the items on the stack can be reduced by applying a grammar rule, such that the right-hand-side of the rule matches symbols on the top of the stack.

The basic mechanism can best be illustrated using an example. (Note that this is similar to the example near the beginning of this section, but we are specifying more precisely the actions to be taken at each iteration). I'll give examples both for natural language, and for arithmetic expressions (I'll use square brackets in the arithmetic expressions as otherwise its confusing).

Stack                  Input Sequence
()                     (the dog jumps)
(the)                  (dog jumps)      SHIFT word onto stack
(Art)                  (dog jumps)      REDUCE using grammar rule
(Art dog)              (jumps)          SHIFT..
(Art Noun)             (jumps)          REDUCE..
(NounPhrase)           (jumps)          REDUCE
(NounPhrase jumps)     ()               SHIFT
(NounPhrase Verb)      ()               REDUCE
(NounPhrase VerbPhrase)()               REDUCE
(Sentence)             ()               SUCCESS

()                       (2 * [ 1 + 3 ])  SHIFT
(2)                      (* [ 1 + 3 ])    REDUCE using rule 3
(<Factor>)               (* [ 1 + 3])     SHIFT
(<Factor> *)             ([ 1 + 3])       SHIFT
(<Factor> * [)           (1 + 3])         SHIFT
(<Factor> * [ 1)         (1 + 3])         REDUCE (2 times..)
(<Factor> * [ <Term>)     (+ 3 ])         SHIFT (twice)
(<Factor> * [ <Term> + 3) ( ])            REDUCE (3 times)
(<Factor> * [ <Term> + <Expression>) ( ]) REDUCE using rule 1
(<Factor> * [ <Expression>) ( ])          SHIFT
(<Factor> * [ <Expression> ]) ()          REDUCE using rule 3
(<Factor> * <Factor>)     ()              REDUCE using rule 2
(<Factor> * <Term>)       ()              REDUCE using rule 2
(<Term>)                  ()              REDUCE using rule 1
(<Expression>)            ()              SUCCESS
In the second example, there were many cases where it was unclear whether to shift or reduce. This is a complicated issue. We can build shift-reduce parsers (for certain types of grammar) that use some look-ahead to determine whether to shift or reduce, in a similar manner to that outlined in the example for top down parsers. However, we'll leave it as an important issue to be discussed in further courses.

Further Reading

Both Standish and Sedgewick have relevant sections on parsing (Sedgewick ch 21, Standish ch 14).

Exercises

Try to write code to 'parse' a string such as '(1 (2 3))' and return a generalised list datastructure corresponding to that string. (You don't really need to use any of the parsing ideas in this section, but it illustrates some of the problems. A couple of functions is all that is required). Write a procedure to write out your list structure, but now in the following form '[1, [2, 3]]'.

Note that now we have got part way towards writing a simple LISP interpreter. A Lisp interpreter takes an input string such as '(first (quote (a b c)))', constructs a generalised list representation of it, then will evaluate the expression by applying a function corresponding to the first item in the list to the results of evaluating the remaining items in the list. (The 'quote' function suppresses evaluation, so the list is treated literally).

File Compression

In this section we will briefly introduce file compression algorithms. These are concerned with ways to reduce the space a file occupies on the file store. Methods are required that can (fairly) rapidly compress a file, such that the compressed file occupies less space, yet can easily be uncompressed to recreate the original file.

As a simple example of how we might compress a file, suppose we replace every occurence of the word 'the' with the symbol '#' (and deal with actual occurences of '#' in some way!). The resulting file would take up less space, but it would be easy to recreate the original. In general we can try and replace common sequences with shorter 'codes' for those sequences.

Motivation

Despite the rapid decrease in the cost of storage (hard disk, CD etc), file compression methods are getting more and more important. It is becoming common to keep the complete text of an encyclopaedia, complex colour images, or even longish video extracts, on a hard disk or CD. Such texts, images and videos normally contain alot of redundancy -- texts have oft repeated words, images large homogeneous areas, and videos have frames that are very similar to the last. By exploiting this redundancy we can compress these files to a fraction of the original (maybe 50% for text, 10-20% for images/videos).

There are four broad classes of compression algorithm, each good for different types of data. These will be discussed in turn.

Run-Length Encoding

A simple compression method uses the fact that, in certain sorts of files, we often get `runs' of repeated characters. A sequence such as:
aaaaabbbbbbbbbccccc
could be represented more concisely as:
5a 9b 5c
For binary files we can be even more concise:
11111111111000000011111111111111
could be represented simply as
11 8 14
as we can assume that we are alternating between 1s and 0s. Of course, numbers take up more space (bits) than 1s/0s to represent, but we can still get good savings if we have quite long runs of 1s or 0s (as may be common in some simple image formats, for example).

Run-length encoding isn't much good for text files, as repeated sequences aren't that common. However, it is quite useful for images. Many standard image-file formats are essentially compressed versions of the simple bitmap representation -- run-length encoding is one compression method used.

Variable-Length Encoding

Variable-length encoding exploits the fact that some characters in a string are more common than others. So, rather than using, say, a full 8 bits to represent each character, we encode common characters using a smaller number of bits. Consider the string 'the'. Using an ASCII file representation, we might encode 't' as 116, h as 104, e as 101, or in binary: 1110100 1101000 1100101. However, we might say that as 'e' is most common, followed by 't', we'll encode these as '1' and '10'. If, say, 'h' was '101' then we'd have the shorter encoding: 10 101 1.

One problem with this is that we have to include spaces between the codes somehow, in order to tell where one code begins and another ends. If we had an encoding such as 101 we wouldnt know whether this represented 'h' or 'te'. One way to avoid this problem is if we avoid using two codes such that one code consists of the first few characters of another (e.g., 101 10).

An acceptable set of codes can be represented as leaves in a binary tree:

[tex2html_wrap1290]

A code for a letter corresponds to the path from the root to the leaf labelled with the letter. So, we have T: 00, H: 010, A: 011, E: 1. If we have an encoding such as '000101' we can uniquely determine what it represents and translate back to the uncompressed form. The decoding can even be done efficiently, using the binary tree as a datastructure to traverse to find the decoding of a given bit sequence.

To actually construct an optimal tree, we can use data on the frequencies of difference characters. A general method for finding optimal codes was developed by D. Huffman, and the coding scheme is referred to as Huffman encoding.

The method involves building up a binary tree, bottom up, storing in each node the total frequency count for the characters below that node. Lets say we're only dealing with the letters t, a, e and h, and the frequencies are 10, 5, 15 and 3. We start off by creating leaf nodes for each of these:

      t:10       a:5          e:15         h:3
Then we pick the two with lowest frequency, and combine them, creating a new node with a combined frequency count:
         *:8
       /     \
     h:3      a:5       e:15         t:10
We then repeat this, but now the nodes we consider are the new one, e and t:
                *:18
             /      \
         *:8         \
       /     \        \
     h:3      a:5      t:10     e:15
and finally:
                       *:33
                     /    \
                *:18       \
             /      \       \
         *:8         \       \
       /     \        \       \
     h:3      a:5      t:10     e:15
We now have an efficient encoding scheme where the most common letter 'e' has a single bit code (1), the next most common 't' has a 2 bit code,

The code to construct such a tree, and use it in encoding/decoding is fairly simple, and described in Sedgewick, ch 22.

Huffman encoding is worth doing for text files, though you won't get any very dramatic compression (maybe 20-30%). It may also be useful for image files, where (for example) a small range of colours may be much more common than others.

Substitutional Compressors

The basic idea behind a substitutional compressor is to replace an occurrence of a particular phrase or group of bytes in a piece of data with a reference to a previous occurrence of that phrase. There are two main classes of schemes, named after Jakob Ziv and Abraham Lempel, who first proposed them in 1977 and 1978.

The LZ78 family of compressors

LZ78-based schemes work by entering phrases into a *dictionary* and then, when a repeat occurrence of that particular phrase is found, outputting the dictionary index instead of the phrase. The dictionary is constructed as the string is traversed. New strings are only added if they extend a current string in the dictionary. By this means, you will tend only to get long strings in the dictionary if they occur frequently in the text.

The LZ77 family of compressors

LZ77-based schemes keep track of the last n bytes of data seen, and when a phrase is encountered that has already been seen, they output a pair of values corresponding to the position of the phrase in the previously-seen buffer of data, and the length of the phrase.So, we might replace the string 'the cat ate the dog' with 'the cat ate (1,4)dog'.

Substitutional compressors are good for strings with frequently reoccuring subsequences. So, they are useful for text files, but also for graphics files where a certain texture or colour may be represented by a repeated sequence. Many popular compression and archive programs (e.g., zip, gzip, compress) are based primarily on substitututional methods.

JPEG and MPEG

All the compression methods discussed so far are lossless, which means that you can restore an exact copy of the original. Although vital for texts, for images lossy compression may be acceptible, where some fine detail in the image may be lost. JPEG is a lossy compression method for images where very good compression may be achieved at the expense of losing some detail in the image. JPEG is designed so that the detail lost is normally not perceivable by the human eye.

JPEG allows you to vary the degree of ``lossiness''. If you want a very small file size, but can put up with a slightly poor quality image, you can set parameters appropriately.

JPEG is jolly complicated. However, there is one key idea that is worth explaining. The basic idea is that you transform (bits of) the image into frequency space by applying a kind of fourier transform. Then you throw away some of the details of the high (spatial) frequencies. The result of this is that gradual changes are kept, but fast or abrupt changes may be lost. (If you've never heard of fourier transforms this probably won't make much sense, but it would take to long to explain).

As an illustration, here's a Piglet before and after its been compressed with JPEG with quality set very low:

[tex2html_wrap1292]

[tex2html_wrap1294]

The size of the file has been reduced by a factor of 20, but the image quality is poor. With higher quality settings any decrease in quality may be barely perceivable by human eye, yet the image size may still be reduced by a factor of 5 or so.

MPEG is a lossy compression algorithm for video. Very roughly, it uses JPEG like compression for a frame, but does uses difference information between frames to reduce the information needed for each frame. (Two successive frames may be virtually identical).

Any lossy compression algorithm is useless for text. It's no good saving your essay, only to find your compression program has removed all the 'details'! However, typically lossy algorithms can give greater compression than lossless methods, and for video, audio or graphics the result may not be noticeably poorer than the original.

Summary

We've discussed here four broad categories of compression algorithm. Each tends to be good for different kinds of file. For example, run-length may be good for artificial images, where there may be long 'runs' of one colour. Substututional will be quite good for texts, as will Huffman encoding. Lossy algorithms such as JPEG which lose detail are OK for images/video, but no good for text. All the algorithms discussed result in reasonable compression/decompression times (though we've just focused on the actual formats, not the details of the algorithms).

In practice many compression programs combine aspects of each approach. For example, Huffman encoding can be used in combination with any of the other techniques, achieving further compression.

Further Reading

Standish, 9.10-9.11, Sedgewick ch22.

Exercises

Try implementing run-length encoding, and testing it on: some images created with a 'draw' program; some scanned in images. Which get greater compression?

Play with an image processing program such as XV, looking at the resulting size of some different image formats (GIF, JPEG, Sun Raster etc). Again, try both with artifificial and natural images.

Cryptography

In the last section we looked at methods to encode a string in order to save space. Here we'll look at methods for encoding a string in order to keep it secret from your enemies!

We'll briefly discuss a number of encryption methods, from the simplest, to an outline of current techniques. We'll be focusing on the encoding methods (cryptography) rather than methods of trying to break the codes (cryptanalysis).

Motivation

Although traditionally used for military/diplomatic communications, encryption is now becoming particularly important for things like electronic funds transfer over the network. There are, for example, encryption methods being used within WWW browsers such as Netscape, to try and ensure that credit card transfers etc are secure. There are active arguments about the security of the encryption methods used (the current system has been broken, but only with very significant computational resources).

A Basic Outline and Terminology

Encryption/decryption systems fit into the following framework. The sender (of the secret message) encrypts their message (the plaintext) using an encryption method and a key. The resulting string (the cyphertext) is sent to the receiver, who uses a matching decryption method, and a key, to tranform the cyphertext back into the original message.

The aim is to develop encryption methods such that someone can't work out the message given the code (cyphertext) without knowing the key.

In practice, for many applications we might be willing to have a system where, in principle, one could decode it without the key, but it would be so very expensive it wouldnt be worth it. (Internet financial applications fit this: People are probably not going to use many computer-years of effort to obtain credit card details which might result in a few hundred pounds gain).

Encryption methods can be divided into symmetric and asymmetric systems. In symmetric systems the same key is used by both the sender and the receiver. In asymmetric systems the sender uses one key (the public) key, and the receiver uses another (the private key. Symmetric systems are simpler, so will be described first, but asymmetric methods are now more important for most applications.

Simple Symmetric Methods

One of the simplest methods (favoured by Calvin) is to encode the message such that the Nth letter in the alphabet is replaced by the N+Kth. (The key is jut K). So, for K=2:
message: attack
code:    cvvcem
However, this is pretty easy to decode: Just try the 26 possibilities until you get one that results in a meaningful message.

A better approach is to use a key that gives a mapping from letters of the alphabet (plus spaces) to letters that should replace them:

 abcdefghij....
zqjprtoisva...

message: a cab
code:    qzpqj
but this is also pretty easy to break: we could use letter frequency information (e.g., spaces, ts and es are common) to guess at the key.

To avoid this we could use a method closer to the first, but with a repeated key to decide on the value of K.

key:     abcabc
message: attack
code:    bvwben
or a method that combines these two ideas, with an offset depending both on a repeated key and the letter itself.

If the key is as long as the message itself (and only used once) then this encyption method is provably secure. But there is the problem of distributing the key. It is only really useful in applications where a long key can be distributed ahead of time, in order to allow urgent messages to be safely transmitted at a later date.

For simple encryption methods it is often possible to decode a message if you know (or can guess) part of the message. Perhaps you know that an email message starts with a standard header (From: ... Return-Path ..). Obviously if a simple repeated key encryption method is used, with a short key, then this would allow someone to break it. Speech signals often have large areas of silence. This too may be used to find the key.

Many encryption methods involve ``generating'' a long key from a shorter key. So, rather than just repeating a short key, we have some cunning method for creating a longer key out of the shorter one, that creates a pseudo random key that is hard to break, even if some plaintext is known.

If you don't know any of the plaintext, breaking the code will be harder whatever encryption method: possible strategies are brute force methods (try all keys until one produces something with dictionary words in), and statistical methods (use knowledge of letter/word frequencies). However, these strategies will also tend to be foxed if a long pseudo key has been generated.

Asymmetric Systems: Public Key CyptoSystems

Symmetric cryptosystems have a problem: how do you transport the secret key from the sender to the recipient securely and in a tamperproof fashion? If you could send the secret key securely, you wouldn't need the symmetric cryptosystem in the first place (because you would simply use that same secure channel to send your message). Frequently, trusted couriers are used as a solution to this problem. Another, more efficient and reliable solution is a public key cryptosystem.

In a public key system, different keys are used to encrypt and decrypt. The receiver has a private key that can be used to decode any message encoded using his public key. But the public key is NOT enough to decode the message. To send a message to someone, you look up their public key (which is, as you might expect, quite public), encode your message with it, and sent it. The receiver can decode it using his private key. We can express this semi-formally as follows. If P(M) is the encoding of M using the public key, S(M) is the decoding using the private key, we want a system where S(P(M))=M (for all M), but where S cannot be (easily) derived from P.

A method that enables this is RSA public-key encryption. Here, the public and private keys are just pairs of BIG numbers. Let the public key be (N, p) and the private key (N, s). The plain text is broken up and translated into numbers (by whatever method), then encryption is just: [tex2html_wrap_inline1296], decryption [tex2html_wrap_inline1298]. For this to work we want keys such that [tex2html_wrap_inline1300] (and such that S cant be derived easily from P). It turns out that if we generate 3 random prime numbers x,y,and z, make S be the largest (say z), N the product of x and y, and P a number such that PS mod (X-1)(Y-1)=1, then we have suitable keys. (You can try this out with small primes, but really we'd use prime numbers with hundreds of digits!). Finding suitable keys (and trying to break codes!) using this system requires number theory algorithms, such as algorithms for finding greatest common divisors, large primes, prime factors etc.

Encryption and decryption using this method is relatively time consuming, compared with other methods. It is therefore common to effectively use public key encryption just to encode a session key to be passed with the message to the receiver, and to encode the message itself using simpler methods (using the session key).

Netscape (a WWW browser) uses RSA public-key encryption to encode session keys that are then used in less costly symmetric encryption algorithms. If, say, a bookstore wanted to setup so that people could securely send orders which included their credit card details and address, they would get hold of suitable public and private keys, then tell their customers their public key (or rather, incorporate the key in the electronic order form so that it is automatically used at the client end to encode messages sent with forms from the server). Now, if a client sends a message using that public key, that message can (in principle) only be read by the bookstore, so it is only the store which has the credit card details.

Summary

Encryption/decryption methods can be divided into symmetric and asymmetric, depending on whether the same key is used for encryption and decryption. There are many simple symmetric methods, but most are liable to attack, using knowledge of letter/word frequencies, or given some fragment of known plaintext. More complex methods avoid this by generating a pseudo-key from a given key which is sufficiently random looking to avoid statistical tricks, and which ``spreads'' a key throughout the text. Asymmetric (public key) methods work by letting the receiver tell the sender a key which can be used to encrypt the message, but having a separate key for decryption. This method can also make encryption transparent to the user: the system they use can lookup the receivers public key and encode the message without them ever even knowing what a key or encryption is.

Exercises

Implement a simple repeated key encryption scheme. Implement a decryption method that will work if you know some initial plaintext at least as long as the key. Suggest a way of generating a long pseudo-key that will foil your simple scheme.

Further Reading

Sedgewick (Ch 23) has a reasonable intro. More up to date material is easily available on the WWW. The details of commercial and government encryption methods are often hard to find out about for understandable reasons.

Graph Algorithms

Introduction

Motivation

Many real-life problems can be formulated in terms of sets of objects and relationships or connections between objects. Examples include: A graph is just a datastructure that consists of a set of vertices (or nodes) (which can represent objects), and a set of edges linking vertices (which can represent relationships between the objects). A tree (met in DS&A 1) is a special kind of graph (with certain restrictions). Graph algorithms operate on a graph datastructure, and allow us to, for example, search a graph for a path between two given nodes; find the shortest path between two nodes; or order the vertices in the graph is a particular way. These very general algorithms can then be used to solve problems of the kind mentioned above, where the data is represented as a graph. For example, search algorithms can be used find a possible winning move in a game; shortest path algorithms can be used to find the shortest route between two cities; ordering algorithms can be used to find a possible sequence of courses to take, given prerequisite relationships between them.

Terminlogy and Definitions

There is alot of terminology associated with graphs that needs to be introduced. Figure ? shows examples of two graphs, illustrating some of the main ideas:

[figure259]
Figure 3.1: Example Graphs

Graphically we can think of a graph as a collection of points, some of which are connected. These connections can be two way or one way (illustrated by arrows, for one way connections). Graphs with one way connections are referred to as ordered. In an unordered graph, if there is a connection from vertex 1 to vertex 2, there also is from node 2 to node 1. Ordered graphs may be cyclic or acyclic depending on whether it is possible to get back to a vertex by following arrowed links. Two nodes are adjacent if an edge connects them, and a path is a sequence of adjacent vertices. Graphs may be connected or unconnected as illustrated in the figure. Vertices may be labelled as illustrated in the first example in the figure, where the labels are just integers.

We often use family tree terminology for graphs. The children of a node n1 are those nodes such that there is an edge from n1 to the child node. (These are also sometimes called the neighbours of the node, particularly when talking about general graphs rather than trees). The parents of a node is the reverse. Ancestors, siblings and descendants are what you might expect.

Formally, a graph G = (V, E) where V is a set of vertices, E is a set of edges, where each edge is a pair of vertices from V. Standish gives formal definitions for paths, cycles, adjacency, connectivity etc in terms of this formal description.

A Graph ADT

An abstract datatype for a graph should provide operations for constructing a graph (adding and removing edges and vertices) and for checking connections in a graph. If we allow labelled nodes then there will be additional operations to add, access and remove labels. Nodes can be just indicated by integers. Assuming that we have suitable type declarations for label and graph, the following is a reasonable minimal set of operations:
Procedure CreateGraph(var G: graph; size: integer);
{ Creates and initialises a graph of given size (no. of nodes)}

Procedure AddEdge(var G: graph; N1, N1: integer);
{ Adds edge from N1 to N2}
{Pre: N1 and N2 are nodes in graph}

Procedure RemoveEdge(var G: graph; N1, N2: integer);
{ Removes an edge}
{ Pre: There is an edge in G from N1 to N2}

Function EdgeExists(var G: graph; N1, N2: integer): boolean;
{ True if there is an edge in G from N1 to N2}

Procedure SetLabel(var G: graph; L: graphlabel; N: integer);
{ Adds a label L to node N}
{Pre: N is a node in G}

Function GetLabel(var G: graph; N: integer): Graphlabel;
{ Returns the label of a node}
{ Pre: N is node of G}

Function Neighbours(var G: graph; N: integer): EdgeList;
{Returns a linear list containing all the neighbours of N}
{Pre: N is a node in G}
(This assumes type definitions for Graphlabel and EdgeList; GraphLabel could be a fixed length string, while EdgeList could be a fairly standard one way linear list or stream).

Other operations might be useful, but this provides a reasonable minimal set.

Implementing a Graph ADT

It turns out that there are two reasonable implementations for a graph ADT.

Adjacency Matrix

For a graph of N nodes, a simple representation is just an NxN matrix of boolean values -say G[1..Max,1..Max] of boolean. An edge between nodes n and m is indicated by a 'True' entry in the array G[n,m] (lack of an edge by 'false'). For a labelled graph, a further 1d array can give the labels of each node. Finally, an integer can be used to represent the size (number of nodes) of a specific graph.

The first graph in the figure would be represented as follows:

  1 2 3 4 5
1 F T F T F
2 F F T F F
3 F F F F F
4 F F T F T
5 F F F F F
The details of this are left as an exercise. In fact, exercise 2. All the ADT operations specified above can be straightforwardly implemented using this representation.

Edge Lists

The above representation is very simple, but it is inneficient (in terms of space) for sparse graphs, ie, those without many edges compared with nodes (vertices). Such a graph would still need an NxN matrix, but would be almost full of 'False's. So an alternative is to have associated with each node a list (or set) of all the nodes it is linked to via an edge. So, in this representation we have a one dimensional array, where each element in the array is a list of nodes.

For the same graph as the one illustrated above this would give:

1   (2 4)
2   (3)
3   ()
4   (3 5)
5   ()
Now comes the question of how to represent the list of nodes. One way is to use linked lists. The graph would involve an array, where each element of that array is a linked list. Some of the graph operations are a little more tricky using this representation. However if you have a linked list ADT defined, you can just 'use' this, rather than redefining basic operations - In pascal unit terms you would have one unit using another (more primitive) unit.) You can actually go further in re-using existing ADTs, as the table ADT gives much what is required.

You might think that an easier method would be to use Pascal's set datatype rather than a linked list. But, in fact, Pascal sets are implemented as bit-arrays (e.g., 1000100, for a set containing item 1 and item 5) so this in fact reduces back to the previous representation.

Which to use?

As is so often the case, the best implementation for a graph depends on properties of the graphs we want to operate on, and on which operations are likely to be used most often. The adjacency matrix representation is not space efficient for sparse graphs, but certain operations such as EdgeExists are likely to be much more efficient using adjacency matrices (particularly for large dense graphs).

Graph Search Algorithms

A graph search (or graph traversal) algorithm is just an algorithm to systematically go through all the nodes in a graph, often with the goal of finding a particular node, or one with a given property. Searching a linear structure such as a list is easy: you can just start at the beginning, and work through to the end. Searching a graph is obviously more complex.

Breadth first and depth first search

There are two main ways to traverse a graph: depth first and breadth first. If we start at a particular node (say, n1), then in breadth first search, all nodes path length M away from n1 are searched before all nodes path length M+1 away. In depth first search, if a node n2 is searched, all nodes connected to n2 are searched before any other nodes. We can also describe it in terms of family tree terminology: in depth first the node's descendants are searched before its (unvisited) siblings; in breadth first the siblings are searched before its descendants.

Tree search

For trees (which are, as we said, just a specialised kind of graph, where each node has only one 'parent' and cycles arent allowed)) this distinction is clear graphically. As we see in figure 3.2, in breadth first we search across the tree before we search down. In depth first we follow a path down, before we search across.

[figure292]
Figure 3.2: Depth first and depth first search

For trees, there is a very simple algorithm for depth first search, which uses a stack of nodes:

push(start-node, stack); {(assumes stack initially empty..)}
repeat
  current-node := pop(stack);
  visit(current-node);
  for {each node n in Neighbours(graph, current-node)} do 
     push(n, stack);
until empty(stack)
Pseudocode has been used to describe putting all neighbours of the node on the stack; you'd actually need to either use list traversal of neighbours or check through all possible nodes and add them if they are a neighbour. 'Visit' is assumed to be some procedure that processes each node. If, rather than processing a node, you want to find a given node, the code would be the same, but would exit when the current-node is the one you're looking for.

We can see how this will work for the tree in the figure above. (The trace below shows 'stack' on entering the body of the loop each time, and 'current node' after it has been popped off stack).

Stack           Current-node    Neighbours
(1)             1               2,3,4
(2 3 4)         2               5,6
(5 6 3 4)       5               none
(6 3 4)         6               none
(3 4)           3               7
(7 4)           7               none
(4)             4               8,9
(8 9)           8               none
(9)             9               none
()
Note that the order of current-nodes is correct for depth first.

The algorithm for breadth first is exactly the same BUT we use a queue rather than a stack of nodes: put the neighbours on the BACK of the queue, but remove current-node from the front. I'll leave it as an exercise for the reader to verify that this leads to the right search. Depth first search also has a simple recursive version of the algorithm, but using a stack makes it more explicit what is going on.

Graph search

If we are searching a general graph rather than a tree it is necessary to keep track of which nodes have already been searched, as they might be met again. If we don't do this, then if there are cycles in the graph the loop might never terminate. Even if there are no cycles, redundant work is done, re-visiting old nodes. Avoiding revisitted previously visited nodes leads to the following modified algorithm, using a set Visited of visited nodes. Again, I'll leave it a an exercise for the reader to work through how it works for some example graphs, and how it is modified for breadth first.

Note that for any of these algorithms, it will only result in nodes in a connected region of the graph being traversed. To traverse the whole of a not fully connected graph you would have to try different starting nodes.

push(start-node, stack);
visited := []; 
repeat
  current-node := pop(stack);
  if not (current-node in visited) then begin
     visit(current-node);
     visited := visited + [current-node];
     for {each node n in Neighbours(graph, current-node)} do 
         if not (n in visited) then
            push(n, stack)
  end
until empty(stack)

Returning the path

So far, the algorithm(s) will process (visit) all the nodes in a graph, or with a minor modification, check whether a node satisfying a given property can be found in the graph, from a given starting point. However, what is often of more use is if the path from a start node to a target node can be returned. The graph is searched for that target node, and when reached, the loop exits and the path is returned.

It turns out that there is a simple trick that can be used to find the path. Whenever we push a node onto the stack, we also make a record of that node's parent (ie, the current-node). As we only push nodes on the stack if they aren't already visited, this will result in a single parent being recorded for each node (so we can just hold the parent record in a 1-d node array). Once the target has been found, we can find its parent, its parent's parent, that node's parent, and so on to the start node. That sequence will be a path from start to target.

This is illustrated in the modification of the algorithm given below. A 'writepath' procedure would also be required to write out the resulting path, given path array, start node and target node.

push(start-node, stack);
visited := []; 
repeat
  current-node := pop(stack);
  if not (current-node in visited) then begin
     visit(current-node);
     visited := visited + [current-node];
     for {each node n in Neighbours(graph, current-node)} do 
         if not (n in visited) then begin
            push(n, stack);
            parent[n] := current-node
         end
  end
until empty(stack) or current-node=target;

Example application

The second assessed exercise will involve constructing and searching a graph. Graph search methods are used in, among many other things, solving simple AI problems which involve find a way to achieve one state from another state. Such problems include game playing problems, when the 'target' state is a winning state; and robot planning problems, where the target state specifies what you want the robot to achieve (e.g., he's brought you your beer, or assembled a car). The methods are often illustrated by considering simple puzzles. The general idea is that a possible 'state' of the problem is going to be represented as a node in the graph. Possible actions, which get us from one state to another, are represented as edges. If we can search from an initial state, and find a target state, then the path between these states will correspond to the sequence of actions that will achieve your target state.

Weighted Graphs and Shortest Path Algorithms

For many applications it is useful if edges in a graph can be labelled with a weight. For example, we could have a graph representing possible routes between cities, where the weight might be the distance along a given route. In other applications the weight might represent the time taken to traverse an edge (e.g., time to get from one city to another, or to do some task), or the cost of traversing an edge (e.g., the cost of taking some action corresponding to an edge).

Such a graph is referred to as a weighted graph. A weighted graph is generally implemented using an adjacency matrix: whereas before the matrix was an NxN array of booleans, now we use an NxN array of integers or reals, where the number indicates the weight of an edge (where there is no edge between two nodes this corresponds in principle with a weight of infinity, but some convention can be adopted for what to put in the matrix for such cases). The ADT definition would be much as before, but with further operations to modify and access weights.

Given a weighted graph, we may want to find the shortest path between two nodes, where the 'length' of the path is just defined as the sum of the weights on the relevant edges. This might be used to, say: find the shortest route between cities; the quickest train journey; the cheapest way to do a sequence of tasks.

If the weights on every edge are the same (e.g., a weight of 1) then we already have an algorithm that will do the trick: Breadth first search. This always explores paths of length N before paths of length N+1, and so will find the shortest path first.

For arbitrary weights, things are a little more complex (at least, if we want to do the task reasonably efficiently). It is necessary to keep track, as we search, of the shortest distance we have found so far connecting the start node to each other node. This information can be kept in an array (say, called shortest). Then, as we search, given a number of possible nodes to explore, the one that is the shortest distance from the start node is explored first. (This is like breadth first, but it is kind of 'shortest' first). If we do this we find that when we find the target node, the path we have effectively traversed is guaranteed to be the shortest.

I'll go through how this works with an example (modified from Standish's example). Suppose we are searching the graph in figure ?, in order to find the shortest path from node 1 to node 5. We'll use a set W to denote the nodes so far examined, and I'll give entries in the wet as Node:PathWeight, so initially :

[figure319]
Figure 3.3: Example weighted graph

W=[1:0].

We then look at the nodes connected to 1, and record the shortest distance from node 1 to these nodes, ie:

shortest[2] = 3; shortest[6]=5

The one with the shortest distance is added to W, and the 'shortest' array updated to also hold information about the nodes connected to 2:

W = [1:0,2:3]; shortest[3]=10

Now, the node with the shortest distance, not already in W is 6, so this is added, and shortest updated as before:

W = [1:0,2:3,6:5]; shortest[4]=7

note that we will also have discovered another path to node 3 via 6, but the distance is 13, so more than shortest[3]=10 (added earlier). So this path is ignored.

Now the best node not in W is 4:

W= [1:0,2:3,6:5,4:7]; shortest[5]=7+6=13

Best is now node 3:

W = [1:0,2:3,6:5,4:7,3:10]; shortest[5]=11 (overwriting old value)

Best is now node 5, with shortest path length 11. If that's the node we were looking for, we now know its shortest path from node 1. Or we could continue to find the shortest paths for all the nodes. With a small extension to the approach, we could have recorded the path taken, which could be returned.

The actual algorithm for this is as follows. T[i,j] gives the weight between two nodes; NodeSet is the set of all nodes; Max is the largest number that we need consider (ie, same as using infinity). I've only elaborated the last part of the loop: more details on the rest of the stages are in Standish, if it is not obvious.

{Initialise shortest array so shortest[i]=MAX if i<>start, else 0.}
W := [start-node];
V := NodeSet - W;
while W<>V do begin
   {find the vertex w in V-W at minimum distance from start-node,
    ie, where shortest[w] is lowest}
   {add w to W}
   {Update the shortest distances to vertices in V-W}
      for {each u in V-W} do 
         Shortest[u] := min(shortest[u], shortest[w]+T[w,u])
end;
It is possible to prove that this algorithm (known as Dijkstra's algorithm) works correctly. Essentially you use induction to show that if the shortest distances are correct for all the nodes in W at one stage, they are still correct after you add a new node to W as specified.

Topological Ordering

Motivation

There are many problems where we can easily say that one task has to be done before another, or depends on the other, but can't easily then work out what order to do things to a whole buch of such things. For example, it is easy to specify/look up prerequisite relationships between modules in a course, but it may be hard to find an order to take all the modules so that all prerequisite material is covered before the modules that depend on it.

The same problem arises in spreadsheets. To recalculate cells after some of the values are changed, it is necessary to consider how one cell's value depends on the value in another, e.g.,

Cell     Contents            Value
1        100                 100
2        (Cell 1) + 10       110
3        (Cell 1) * 2        200
4        (Cell 2) + (Cell 3) 310
In the above example, if you change the value in cell 1, you need to first recalculate the values in cell 2 and 3, and only when that is done can you recalculate the value in cell 4.

Both these problems are essentially equivalent. The data of both can be represented in a directed graph (see fig ?). In the first each node is a module; in the second example each node is a spreadsheet cell. Directed edges occur when one node depends on the other, because of prerequisite relationships among courses or dependencies among nodes. The problem in both is to find an acceptable ordering of the nodes satisfying the dependencies. This is referred to as a topological ordering.

[figure331]

The Algorithm

A straightforward approach to finding this order would be first to find each node's dependent nodes, e.g.,
Node     Dependent Nodes
1        none
2        1
3        1
4        2,3
You would first remove any node that has NO dependencies, as an acceptable first node. That node could then be removed from the dependency lists of the other nodes:
Node order: 1

Node     Dependent Nodes
2        none
3        none
4        2,3
The process then repeats, with either node 2 or 3 being chosen. Let's say node 2.
Node Order: 1,2

Node     Dependent Nodes
3        none
4        3
This continues in the obvious manner, with the final order being 1,2,3,4.

In fact, to implement this we don't even need to store lists of dependent nodes, just the NUMBER of dependent nodes. So you start by finding the number of dependent nodes, for each node. Then when a node (with zero dependent nodes) is removed, you decrement the number associate with each of the dependent nodes:

ie,

Node     Dependent Nodes
1        none
2        1
3        1
4        2

remove 1, then 2 and 3 have 0 dependent.
remove 2, then 4 has 1 dependent
remove 3, then 4 has 0 dependent
remove 4
To implement this, we can use a list datastructure L to return the topologically sorted nodes, and a queue data structure Q to hold the nodes will zero dependent nodes waiting to be processed. When a node N is removed from Q and added to L, all of N's dependent naighbours have their number decremented, and any that are then zero are added to Q. Something like:
Q               L
1               1
2 3             1 2
3               1 2 3
4               1 2 3 4

Summary: Graph Algorithms

This chapter has presented a range of useful graph algorithms, for undirected, directed, and weighted graphs. The algorithms presented are fairly simple. There are many dozens more algorithms that have been invented for related tasks, and a number of tasks for which there is no efficient algorithm. One of these is the 'travelling salesman' problem, where the objective is to find the minimum cost tour in a weighted graph whereby every node is visited. The development of graph algorithms is still a very active research area.

What key ideas should you remember from all this? First, you may, when solving some computational problem, be able to spot that the data can be best represented as a graph, and that solving the problem involves analysing the graph in some way. You can then look up graph algorithms in a major textbook (like Sedgewick) to find out if there is a standard algorithm that fits. Second, the very basic algorithms mentioned here are part of the basic `vocabulary' of computer scientists, so might be referred to when people describe their system or method. A basic familiarity with the algorithms should help in understanding such descriptions.

Geometric Algorithms

Motivation

Many computer applications involve manipulating `geometric' objects: points, lines, polygons, cubes and so on. Obvious examples are drawing packages and computer-aided design (CAD) tools, tools that involve manipulating maps, and even just graphical user interfaces (GUIs).

Geometric problems are often easy for humans, but hard to find good algorithmic solutions. As an example, given a polygon defined by its vertices, it is easy for humans to: decide whether a given point falls within the polygon; find the smallest convex polygon (or convex hull) that encloses the vertices (see figure ?).

[figure339]
Figure 4.1: Polygon with point inside (A) and outside (B), and 'convex hull'

Finding out whether a given point falls within a specified polygon is obviously essential for drawing packages - for example, most of them will allow you to create an arbitrary polygon and have it 'filled' with a given colour. Finding the 'convex hull' surrounding a set of points could be useful when working out the shortest boundary enclosing some set of objects (e.g., a fence around a number of trees..).

In this course I'll only give a very brief introduction to geometric algorithms, roughly corresponding to chapter 24 in Sedgewick. You should, at least, be aware of this category of algorithms, so you know where to look things up if you need such a thing in your programming and problem solving.

Representing Points, Lines and Polygons

Before discussing the algorithms, I'll give datastructures for points, lines and polygons:
type point = record
               x, y: integer
             end;
     line = record
               p1, p2: point
            end;
     polygon = array[0..Max] of point.
We will require that the end point in a polygon will be the same as the first point, so it is possible to know where the polygon array ends.

Line Intersection

Given this representation, how do we find out if two line segments intersect, ie:
\      /    vs    \  /
 \    /            \/
  \  /             /\
   \              /
One way is basically to use school maths (y=mx+c etc) to find the equations, and then intersection points of the lines, and then find out if these intersection points lie 'within' each line.. Another approach, given in Sedgewick, which is introduces ideas useful in other algorithms is as follows.

First define a function that, given three points, tells you whether you have to turn clockwise or counter clockwise when travelling from the first to the second to the third. Call this CCW: it will return TRUE if it is counter clockwise. Now, two lines l1 and l2 will intersect if [tex2html_wrap_inline1312]AND [tex2html_wrap_inline1314]. Basically, this just means that both endpoints of each line are on different 'sides' of the other line.

The CCW function can be defined fairly straightforwardly, by calculating the gradient between point 1 and point2, point 2 and point 3, and checking if one gradient is larger than the other. (The implementation must cope with the fact that the slopes may be infinite, or that they are equal, but that's just a minor fiddle - see Sedgewick).

The following gives what might be an initial attempt at both ccw and intersect. I'll leave it as an exercise for the reader to work out how to make it deal with infinite slopes (ie, dx=0), and the case where two lines are collinear.

function ccw(p1, p2, p3: point): boolean;
{Slightly deficient function to determine if the two lines p1, p2 and
p2, p3 turn in counter clockwise direction}

var dx1, dx2, dy1, dy2: integer;
begin
  dx1 := p2.x - p1.x; dy1 := p2.y - p1.y;
  dx2 := p3.x - p2.x; dy2 := p3.y - p2.y;
  CCW := (dy1/dx1 < dy2/dx2)
end;

function intersect(l1, l2: line): boolean;
begin
   intersect := 
       ((ccw(l1.p1, l1.p2, l2.p1) <> ccw(l1.p1, l1.p2, l2.p2))
        and (ccw(l2.p1, l2.p2, l1.p1) <> ccw(l2.p1, l2.p2, l1.p2)))
end;

Inclusion in a Polygon

A basic approach to finding if a given point is within a polygon is as follows. Draw a line out from that point, and count how many times it intersects lines in the polygon. If it is an odd number, the point must be within the polygon. See figure.

[figure348]
Figure 4.2: Determining if point is included in polygon

This is slightly complicated by the fact that the line might intersect a vertex of the polygon. If so, it may sometimes be counted as two counts and sometimes as one, as illustrated by the two such examples in the figure. Without this complication, the algorithm to determine inclusion is the following (which ignores such points). 'n' is the number of points in the polygon.

function inside(p: point; poly: polygon; n: integer): boolean;
var i, j, count: integer;
    lt, lp: line;
begin
   count := 0; j := 0;
   lt.p1 := p; lt.p2 := p; lt.p2.x := MAXINT; {Defines horizontal line
                                               out from p}
   for i := 0 to n do begin
       lp.p1 := p[i]; lp.p2 := p[i+1];        {Create line for segment in
                                               polygon}
       if intersect(lp, lt) then count := count +1;
   end;
   inside := (count mod 2 <> 0)
end;
To handle the case where our line intersects a vertex, we need to check whether the two points in the polygon either side of the vertex are on the same 'side' of the test line. (Convince yourself that this is right). Implementation details are left to the reader, or to look up in Sedgewick.

Finding the Convex Hull

The convex hull of a set of points, is really its natural boundary or enclosure, as illustrated in the figure. It is defined to be the smallest convex polygon enclosing all the points. (Convex has its usual meaning. One way to describe a convex polygon is that it is one such that any line connecting two points within the polygon also fall within the polygon. Another description could be one such that, if you follow round the polygon, each line segment bends in the same (clockwise or anticlockwise) direction.)

[figure356]
Figure 4.3: Convex hulls

A function for finding a convex hull takes an array of points, and returns a polygon (another array of points) (a subset of the points in the first array). Sometimes it may be easier to modify the original point array, rather than returning a new one.

The most natural algorithm for finding the convex hull is referred to as package wrapping. Starting with some point guaranteed to be on the hull (say, the one with smallest y coordinate), take a horizontal line in positive direction and sweep it up until it hits a point. This point is guaranteed to be on the hull. Record this point, anchor the line at this point, and continue sweeping until another point is hit. Continue in this way until the `package' is fully `wrapped', as illustrated.

[figure360]
Figure 4.4: `Package Wrapping' a Convex Hull

So, how do we find the next point on the hull, assuming we have a suitable line which is to be `wrapped' round to this point? Basically, the next point is the one for which the angle between the current line, and the line from the last point on the hull to the point considered, is a minimum. (This angle is labelled theta ([tex2html_wrap_inline1318]) on the diagram). If we have a function theta, that takes two points and returns the angle between the associate line and the horizontal, then this results in the following algorithm:

procedure wrap(var points: polygon; n: integer);
var i, min, m: integer;
    th, v: real;                 {v will be current 'sweep' angle;}
begin                            {th will keep track of minimum angle}  
   min := 0;
   for i := 1 to n-1 do 
      if p[i].y < p[min].y then min := i;  {find starting point}
   th := 0.0; p[n] := p[min];              {initialise 'th' and p[n]}
   m := 0; finished := false;
   while m < n and not finished begin  {we'll be looking at the m'th point on hull}
      swap(points, m, min);      {Move the next point into the M'th position;}
                                 and vice versa}
      min := n; v := th; th := 360;  {set sweep angle, initialise min & th}
      for i:= m+1 to n do
         if (theta(p[m],p[i]) > v) and (theta(p[m], p[i]) < th) then begin
         {if the angle to this point is > current sweep angle, and is
          the lowest such angle come across, then set is as a new min}
             min := i; th := theta(p[m], p[i])
          end;
      if min=N then finished := true;
      m := m+1;
    end;
end;
Some steps of the above may not be obvious. As points on the hull are found, they are swapped with an actual point in the array, so at any point the array contains the first m points of the hull plus the rest of the points. So that the algorithm can exit when the hull is complete, the starting point in the hull is placed in the N'th position in the array. When the current point on the hull just found is that one, then exit.

The algorithm should look vaguely like the sort algorithms which you should have already met. This is not surprising, as we are basically sorting a bunch of points according to their position around the hull. Like other simplistic sort procedures (selection sort), this algorithm doesnt have very good worst case complexity (N*N). Other methods for finding convex hull can have N log N complexity, and have analogs in other sorting methods.

Summary (and other points)

This section has just given a flavour of what geometric algorithms are like. There are many other algorithms for the given problems (e.g., finding convex hull), and many other problems that reduce to geometric terms. For example, in database queries it is natural to specify queries to request all objects that whose attributes fall in a specified `range' (e.g., height > 6ft; 11st < weight < 13st; 25 < age < 35). This can be viewed in terms of finding the set of points that fall within some (multi-dimensional) region. This `geometric' way of viewing data (where each attribute can be mapped to a scale) is now seen as important when trying to `visualise' the contents of a database, in order to search it. Two attributes can be chosen, (say, height and weight) and points drawn on the screen for each object, the position of the object being based on these two attribute values. (So, a short light chap would be in the lower left corner). The user then gets a feel for how many objects with different values there are. All the other attributes have `sliders' associated with them - the user can use these to constrain which objects are displayed on the screen (e.g., only those between the ages of 25-35). Each object on the screen can be clicked on to obtain more information about it.

Efficiently implementing this kind of interface to a database clearly relies on geometric notions, and some knowledge of geometric algorithms may enable it to be done efficiently. (Very roughly, range searching can either be done by, say, searching all points to find those falling within range in first dimension, then searching those points to find those falling in range, and so on; or mor efficiently by storing the data in a (binary tree like) form that enables comparisons to be done very quickly.)

Anyway, that's it for geometric algorithms, and indeed pretty much for algorithms in general. We've looked at three main categories: string processing, graph algorithms and geometric algorithms. In practical terms, this should be useful if you have to solve problems that might involve finding a suitable data representation and algorithm: you might vaguely think `maybe a geometric algorithm could be used for this' and then go away and look it up in your favourite algorithm book. In examination terms of course a little more is expected: you should be able to explain and work through the algorithms mentioned, giving pseudo code (or Pascal) where this is straightforward, and be able to at least suggest how algorithms might be extended, combined, or applied to given tasks.

Dynamic Programming

Before completely leaving the subject of algorithms I'll briefly talk about a principle that has guided the development of some more advanced algorithms. The principle is analogous to `divide and conquer' (ie, split up your problem, solve each bit, and combine the bits) which is a very general technique applied to the design of many algorithms (e.g., merge sort). In fact, dynamic programming can be viewed as this principle taken to extremes. If it is not possible to work out exactly how to divide up a problem into smaller problems, it is sometimes worth taking the approach of solving ALL the smaller problems, and storing them away to be combined and used when solving the larger problem. (This idea is illustrated to some extent by the shortest path algorithm we looked at: a table of shortest distances found so far to each node is maintained, and this used when working out new shortest paths.).

The basic idea of dynamic programming can be illustrated by looking at the `knapsack' problem. In this problem, a thief robbing a safe finds it filled with N items of various size and value. But his bag has only limited capacity (M). How does he choose which items to take to maximise his loot's value.

As an example, perhaps there are four objects in the safe of following sizes/values: size 12, value 20; size 5 value 8; size 3 value 10; size 6 value 10. His bag's capacity is 12. He should clearly go for the first object and scrap the rest. But what if his bag has capacity 15. This time he should leave the big object, and go for the smaller ones. This problem is clearly of some practical relevance, for example in real packing (e.g., shipping) applications.

In a dynamic programming approach we do a rather non-intuitive thing. We calculate the best combination of objects for ALL knapsack sizes up to M. This can be done very efficiently, as illustrated in the following program (which assumes that we have a potentially number of items of each given size/value, these sizes/values being stored in an array).

for j:=1 to N do             {Go through each item}
   for i := 1 to M do begin  {Consider each size knapsack}
      if i >= size[j] then
         if (cost[i] < cost[i-size[j]] + value[j]) then begin
            cost[i] := cost[i-size[j]] + value[j];
            best[i] := j
         end;
Cost[i] is the highest value that can be achieved with a knapsack of capacity i and is initialised to zero; best[i] is the last item that was added to achieve that maximum. First we calculate the best we can do only using objects of type 1 (j=1). Then we calculate the best considering items of type 1 and 2 (using our result for just type 1). And so on.

The revised calculation of 'cost' when considering a new item is very simple. We can find out the value (cost) of items that we'd be able to take of the other kinds considered so far IF we removed enough to leave room for our new kind of item (cost[i-size[j]]). If that value PLUS the value of the item we're considering adding is greater than the old cost just with all the old items then replacing the old item(s) with this new one is a good idea, and so cost (and best) are updated to reflect this. (We can see parallels with shortest path algorithms).

As an example, suppose we have the following items, and are looking to fill a bag of size 4:

Item  Size  Value
1     3     6
2     2     5
3     1     2
Initially we just consider item 1. For a bag of size i=1 and 2 the item won't fit in the bag. For a bag of size 3, we can fit one item and as cost[3]<cost[0]+value[1], that one is added. So cost[3] is now 6; best[3]=1. For a bag of size 4, we check if cost[4] < cost[1]+value[1] - this is true, so for this size too we have item 1 in the bag, and cost[4]=6; best[4]=1.

Now we consider item 2. For bag size 1, it won't fit. For bag size 2 we check whether cost[2] < cost[0] + value[2]. This is clearly the case (as cost[2] is zero, as item 1 wouldnt fit in the bag) so we make that the `last' (and only) item in this bag, and cost[2]=5. However, for size 3 we could fit a type 1 object - so when we check if cost[3] < cost[1]+value[2] the answer is no. For size 4 we check if: cost[4] < cost[2]+value[2]. Now cost[4] is 6, cost[2] has just been calculated as 5, so the check succeeds, which means it is worth throwing out the previous contents of bag 4, and replacing the top item with item 2 (and assuming the rest of teh bag is te same as what is `best' for bags of size 2). So now we have: cost[0]=0; cost[1]=0; cost[2]=5; cost[3]=6; cost[4]=10.

Now, considering item 3. Item 3 fits in a bag of size 1, so cost[1]=2. It is not worth using for bags of size 2, but for a bag of size 3 we have: cost[3] < cost[2]+value[3], so the top item in this bag is item 3 (and we assume the rest is the best we can get for a bag of size 2). For a bag of size 4, this is not worth doing, so at the end of the day we have: cost[0]=0; cost[1]=2; cost[2]=5; cost[3]=7; cost[4]=10. We also have best[4]=2; best[3]=1; best[2]=2; best[1]=3.

It turns out that when the calculation is complete we can find out the contents of the knapsack using the 'best' array. Suppose best[M] is item k, of size size[k]. Then the next item in the bag must be the last item in a bag of size[M-size[k]]. And so on. So, in the above case, best[4]=2, so we have an item 2 in the bag. best[4-size[2]]= best[2]=2, so we have another item of type 2 in the bag.

All this can be proved correct by induction. ie, we assume that it gives the right result for N items, it gives the right result for N+1 items. Then we just check it is correct for 1 item. We check, for N items, that if it gives right result for bag of size M it gives right result for bag of size M+1. I'll leave that as an exercise for the reader.

This approach to the knapsack problem takes NxM steps, which isn't bad. Try thinking of other solutions to the problem. However, the approach does have a limitation - it only works for bags whose size can be represented as an integer (or where this is an acceptable approximation). The more fine-grained a result we want, in terms of bag sizes, the more partial solutions we need calculate and store for smaller sizes. For dynamic programming it is necessary to think hard about the time and space requirements of the approach. However, the general idea, of solving for simpler cases and then using these results to solve the harder case, is clearly a very widely applicable technique, and it should at least be viewed as a possible problem solving technique to be used for all sorts of hard problems.

Object Oriented Programming

Introduction

In the last section of this course we'll develop further notions of object oriented programming. So far you've used Turbo Pascal objects as a convenient way to define abstract datatypes. In this section I'll re-introduce object oriented programming, and show how it extends the idea of abstract datatypes. Although the examples and exercise will use Turbo Pascal, the ideas should generalise to other object oriented languages such as C++.

Motivation: Programming Paradigms

Object-oriented programming is one of a number of approaches to programming and programming languages: It is important to be aware of these different approaches to programming. A given programming language typically supports more than one paradigm, and it is to some extent up to the programmer what style of programming they adopt even in a given language. For different problems, different approaches are more natural, so there is no panacaea. However, imperative and object oriented approaches are arguably the most important, and widely applicable.

Turbo Pascal and C++ support both imperative and object-oriented programming. You can write C++ programs which don't exploit its object oriented features, however very often there will be a cleaner solution that uses objects.

Object Oriented Programming and ADTs

Object-oriented programming can be viewed as an extension of the idea of abstract datatypes. An abstract datatype involves a data structure with a set of defined operations on that data structure. The data structure is only accessed or modified through these operations. The main extension of these idea in object oriented programming is that a given datatype (say, t1) may be a subtype of another (say, t2), and so access the operations and data of that other type. If an operation (or data field) is defined for both t1 and t2, in different ways, the version for t1 overrides the version for t2, and is the one used.

This simple feature introduces a great deal of flexibility, and more importantly makes it relatively easy to re-use code in different applications, where minor variations of a basic approach are required. Libraries of generally useful datatypes (objects) are made available, but the programmer can easily define special customised versions with slightly different behaviour.

Key Concepts and Terminology

In object oriented programming, the datatypes are referred to as classes, and variable of a given type are referred to as instances of the class, or just objects. Operations are referred to as methods and we sometimes talk of sending a message to an object, rather than just calling a method. Subtypes are referred to as subclasses. A subclass relation exists between a class and its subclass, and the set of all these relations defines a class hierarchy (basically, a tree structure of classes). Family tree terminology is used to refer to relations between classes (e.g., parent class). Where the methods or data of a parent class are used by an object (say, the object is an instance of class C1, which is a subclass of class C2) we say that it inherits the methods of data.

The key concepts in object oriented programming are encapsulation and inheritance. Encapsulation has been met before for ADTs - the internals of the data representation should be hidden, and all communication with the object should be via the specified methods. Inheritance is where methods and data of parent (or ancestor) classes are used by instances of a given class.

Object Oriented Programming in Turbo Pascal: Basics

It is probably best to look at concrete examples of object oriented programming in Turbo pascal before going into further theory or concepts. We'll consider two kinds of examples. The first will involve graphical and user interface objects, the second will involve reworking some old list ADTs as objects.

Suppose we wanted to write a simple drawing program that could (among other things) display different shapes on the screen. The shapes to handle are circles and rectangles. They each have a position and size, though this will be defined differently for the different shapes. A circle will be defined by center point and radius, rectangle by a center point and the length and width. Shapes also have various characteristics such as colour.

Useful methods for these objects are: Initialise; Move; SetColour; Draw. Initialise should set the position and size, while Move, SetColour and Draw should do the obvious. It should be clear that Initialise and Draw will be different for circles and rectangles, while SetColour and Move could be defined in a shape-independent way.

The first thing to do in object oriented programming is decide on a class hierarchy - that is, a hierarchy of kinds of objects, from very general ones to very specific kinds. In this simple example we can have a general class of 'shapes' and more specific classes for 'circles' and 'rectangles' (a realistic program would of course have many more such classes). A particular circle with specific size, colour etc will be an instance of the circle class. We can now decide on which operations and datafields can be defined for the most general class (shape). The following gives the type definition, and an example method.

type 
  Shape = object
            procedure Move(x,y: integer);
            procedure SetColour(col: ColourType);

          private
            xpos,ypos: integer;
            colour: colourtype;
          end;

procedure Shape.SetColour(col: ColourType);
begin
  colour := col
end;
Note that in the SetColour procedure 'colour' is not treated as a global variable, but as the value of the colour field of the object whose colour is being set. We can also access these fields when calling further procedures within the procedure (e.g., we could call a procedure MovePen(xposn, yposn) and the x and y position of the object in question would be accessed, without these needing to be specified as arguments to the procedure.) Note too that the position and colour of the shape are `private' and can only be accessed or modified via any defined operations.

Now, a circle and a rectangle are both kinds of shape. To specify this we put 'Shape' in brackets after 'object'. Class definitions for Circle and Rectangle could be:

type 
  Circle = object(Shape)
             procedure Initialise(posn, rad: integer);
             procedure Draw;

            private
              radius: integer;
            end;

  Rectangle = object(Shape)
                procedure Initialise(posn, width, length: integer);
                procedure Draw;
              private
                width, length: integer;
              end;
Circle and rectangle would inherit the procedures and datafields from Shape, so it is not necessary to repeat them. I'll just assume that the relevant procedures can be straightforwardly written. Note that the class 'Shape' is not a class that you'd have instances of. It is just defined so that specific shapes like circle and rectangle can be defined much more simply. Such classes are referred to as abstract classes.

If we now had an object instance Ob of type Circle, and wanted to set its colour, we could call Ob.SetColour. It would use the SetColour procedure inherited from Shape. We can think of this as sending a message SetColour tothe object Ob.

Now, suppose we wanted to deal with a specialised kind of rectangle: say, a rounded rectangle (one with rounded corners). This can use the standard rectangle's initialise method, but will need a special draw method. It will also need an extra field to specify the radious of the roundedness, so to speak:

type  
  RoundedRectangle = object(Rectangle)
                       procedure Draw; override;
                     private
                       roundedness: integer;
                     end;
Note that we explicitly state that the new Draw procedure should override, or replace, the one that would be inherited from the standard rectangle object.

This example pretty much covers the basics of object oriented programming. The last assessed exercise involves developing the example. However, there are many subtleties that must be mentioned. These will be discussed next, and then we'll see how some of the List ADT stuff could be reworked as objects with inheritance.

Further Reading

Standish has a reasonable chapter on OOP. Many other books also have introductions. One quite good one is `Software Design and Data Structures in Turbo Pascal' by Koffman & Maxim, Addison Wesley, 1994.

Lists as Objects: Exploiting Inheritance

In this section I'll consider how the development of a range of list datatypes can be made simpler if we exploit inheritance, by making stacks and queues subclasses of a more general linear list datatype. The general linear list datatype will provide a variety of means to access and modify the list datastructure. Stacks and queues are more restricted and specialised in that they only allow items to be removed from the front, and added to front (for stacks) or rear (for queues). The details of this example are largely taken from the Koffman and Maxim book cited above, but with some changes to be more consistent with DS & A 1.

An Initial Class Hierarchy

To start with we'll just assume a hierarchy with a parent class link_list, and two `children' stack and queue. The class link_list will have the following operations - these allow movement back to the beginning of the list, so is more general than a one way linear list. One example procedure is given, but the implementation should be fairly standard based on your knowledge of OWL lists.
link_list = object
             procedure Init;
             function IsEmpty: boolean;
             function IsFull: boolean;
             function Length: integer;
             procedure Insert(el: item; var success: boolean);
                       {Inserts element at current position}
             procedure InsertAtEnd(el: item; var success: boolean);
                       {Inserts element at end}
             procedure AccessFirst;
             procedure AccessNext;
                       {Advances current position}
             procedure AccessLast;
             procedure GetValue(var el:item; var success: boolean);
                       {Accesses contents of current node}
             procedure Delete;
                       {Deletes current node}
             procedure Clear;
           private
             start, current, previous, end: listpointer; 
             size: integer;
           end;

procedure link_list.InsertAtEnd(el: item; var success: boolean);
{ Inserts item el as the contents of new node at end of list, and
  makes the current node be this new one.
  Pre: List is initialised
  Post: If `start' previously nil, it now points to new element else
        New element is successor of one previously pointed to be 'end'}
begin
  new(node);                        {Create a new node}
  if start=nil then begin
     node^.successor := nil;
     start := node; end := node
  end else begin
     end^.successor := node;
     node^.successor := nil;
     end := node;
   end;
   node^.contents := El;
   size := size+1;
end;
Have defined a fairly general purpose linked list object, we can now define a stack object, making the linked list its parent. Again, I'll give the type declaration, and an example procedure.
type stack = object(link_list)
              procedure Push(x: item);
              procedure Pop(var x: item)
             end;

procedure stack.push(x: item);
{ Pushes x onto stack
  Pre: Stack is initialised
  Post: First & Current point to new node containing x, which has the
        previous first node as its successor}
begin
   AccessFirst;
   Insert(x)
end;
Note that useful stack operations like Init, IsEmpty, IsFull and the data fields are all inherited from link_list, and further that push (and pop) can make use of link_list operations to simplify their implementation. Although we can achieve this sort of code-reuse using Pascal units, it is cleaner through objects.

A queue object can be defined in a similar manner. This is left as an exercise for the reader.

In the example above, none of the linear list methods are overridden -- new ones are just added. However, it may be appropriate to redefine some of the methods. For example, GetValue might be redefined in a stack/queue implementation where the first item could be examined without it being removed. For ordered lists (mentioned below), insert might be redefined to insert a new item such as to retain the order of items in the list.

There is one (arguable) problem with the approach outlined -- that is, that all of the linear list's other methods are inherited by stacks and queues, not just the ones we really want. But if we make these methods private then the link_list is less useful in itself (we might want to define instances of linear list, and require all the operations). We could get around it by re-organising things so that link_list is an abstract class with (mainly) private methods, and, say, owl_list is another descendant.

More List Objects?

In this course we've come across a number of other list like things: generalised lists; double-ended queues; ordered lists; circular lists; priority queues and so on. These could be added to a list class hierarchy. The following is a first pass at what such a thing might look like, along with a few other datastructures. A complete object library of lists (and other datastructures) would be based on a more sophisticated version of a hierachy of this sort.

[tex2html_wrap1324]

A C++ or Turbo Pascal programmer would normally have access to class libraries providing a range of datatypes such as those above. The programmer could then develop specialised versions of a particular datastructure by creating a new class within the hierarchy. This requires knowledge of the classes available in the library (or browsing/search tools to rapidly navigate the hierarchy), an understanding of the underlying (documented) implementation, and an understanding of inheritance.

Object Browsers

Before getting on to more theory, this may be a good point to talk about object browsers. If extending an object-oriented program, or making use of a class library, it is extremely useful if you have a graphical object browser. An object browser will display graphically the hierarchy of objects in your compiled program (including any in `used' units). A typical hierarchy is given below. This is what you get in Turbo Pascal if you select `object' from the `search' menu, having loaded some of the class libraries in Turbo Pascal. Well, in fact it's what you get in MY version of Turbo Pascal (version 7). It looks like the version in the labs may not have an object browser.

Now, if you DID have one, then you could double click on any of the objects in the hierarchy, and get information about that object. The information would normally include the declarations for all the objects methods and data fields, with an indication of where they were inherited from (and perhaps further info, such as whether they are declared virtual methods). If the source code was available, then by selecting one of the methods you should be taken directly to the place in the source code where that method is defined.

An object browser allows you to rapidly move within the text files defining a large object oriented system. If, say, you're editing a procedure which calls the `draw' method, and want to check out/modify that method, you need to go straight to just that version of the draw method that would be used there. Lets say the procedure you're editing is ob1.display. You'd find ob1 in the graphical class hierarchy, select that, find its draw method, select that, and then be at the appropriate draw method. Simple search tools (that may allow you to search for the procedure draw's definition) are just not adequate. There may be many versions of draw, and ob1 may inherit its version from any of its ancestors.

Object browsers allow static browsing of the classes. However, many systems also provide more run-time debugging tools, showing the object instances created at run-time, their inherited methods, and their location inthe hierarchy.

Static and Virtual Methods (and some other things)

At this point I'll shift to a small artificial test example to illustrate a number of other features of OOP in Turbo Pascal. (Similar features and distinctions exist in C++). The test program is given in the figure.

[figure410]
Figure 6.1: Test program illustrating Static methods

Static and Virtual Methods

This example program has the display method calling a `writeout' method. There are two versions of `writeout', one for the parent class and one for the child class. A child object is created and initialised, and the display method called. (Note: I'm using the names `parent' and `child' for objects so it is clear which is the parent! You'd never use these names in a real program.) Anyway what happens when the program is run is the following is output:
------------
Parent Value: 3
------------
This is perhaps unexpected, as it was a child object that was used. You might hope that the `child' writeout method would be used. But no. This is because writeout is (by default) a static method. This means that which version is used at any point is fully determined at compile time. And when a procedure, say, P1 is called from within a procedure Ob.P2 the compiler determines that it is Ob.P1 that should be used at that point. In this case, within parent.display, the compiler determined that the call to writeout should really be parent.writeout. This process is sometimes referred to as early binding. The called routine is bound to the calling method at run-time.

Now if we want the method used to always that appropriate to the actual instance we can define that method to be virtual. Then the choice of which method is used will be made at run-time, based on the class of the actual instance. Declaring a method to be virtual involves modifying the declarations in the type definitions for parent and child as follows:

procedure WriteOut;  virtual;
(Declarations for all occurences of Writeout must be modified). If you do this, and run the program then you get the expected output:
------------
Child Value: 3
------------
In fact, for virtual methods to work another small change has to be made to the program. We have to declare the initialisation procedure to be a `constructor' and ensure that it gets run before the virtual methods are accessed. When the constructor procedure is run, some initialisation is done to ensure that the virtual methods get set up correctly. Anyway, the new `virtual method' program would be:
type
  parent = object
           constructor init(value: integer);
           procedure Display;
           procedure WriteOut; virtual;
        private
           x: integer;
        end;

  child = object(parent)
          procedure WriteOut; virtual;
         end;

constructor parent.init;
begin
     x := value
end;
This process of determining which method is to be called at run time is referred to as late binding.

Some other things

While we're looking at this example, I'll introduce some other features. First, what if we want the child to have a specialised display method that does what the parent display method does PLUS some other things. There are two ways we could write such a method:
procedure child.display;
begin
  parent.display;
  writeln('Some extra stuff for child objects')
end;

OR

procedure child.display;
begin
  inherited display;
  writeln('Some extra stuff for child objects')
end;
For this example either of these would result in the output:
------------
Child Value: 3
------------
Some extra stuff for child objects
The former version would be used if you specifically wanted the method associated with a particular class. The latter would be used if you just wanted to use first the method that would normally be inherited by the child (from its parents or ancestors), but then add some more stuff.

A final basic concept to introduce here is the special identifier self. As things stand, within a procedure (or function) there is no way to get hold of the actual object instance which the method is being applied to. This contrasts with non-OOP where the `object' would be passed explicitly as a parameter. Normally this doesn't matter, as we're interested in the data fields (and methods) of the object, and the method will always access the data fields and methods of the instance in question. But occasionally we DO want to have a variable whose value is the object itself. The indentifier `self' can be used in these situations. Then self.x could refer to the x datafield of the relevant instance, rather than just `x'. Or we can set a local variable to `self', as illustrated below, where it is used in a procedure to write out the elements of a simple linear list datastructure defined as an object.

procedure list.writeels;
var temp: list;
begin
  temp := self;
  while (not temp=nil) do begin
     write(temp.value);
     temp := temp.successor
  end;
end;

More Data Structures as Objects: A Graph Library?

As some more examples of how our old data structures would look in a new (improved?) OO framework, here's a simple adjacency matrix graph. Labelled graph is made a subclass of graph. We could additionally have, say, weighted graph as a subclass. Note how `labelledgraph.creategraph' specialises the method of the normal graph initialisation to do some extra stuff too. (Other methods have been omitted, as hardly changed from non-object version - obviously now SetLabel is called labelledgraph.SetLabel, and so on; and datafields previously accessed by, e.g., g.nodearray can now be accessed just as nodearray).
type
    graphlabel = string[20];
    node = 1..maxnodes;
    graph = object
       procedure CreateGraph(size: integer);
       procedure AddEdge(n1, n2: node);
       procedure RemoveEdge(n1, n2: node);
       function EdgeExists(n1, n2: node): boolean;
    private
      numnodes: integer;
      nodearray: array[node,node] of boolean;
    end;

    labelledgraph = object(graph)
       procedure creategraph(size: integer);
       procedure SetLabel(l: graphlabel; n: node);
       function GetLabel(n: node): graphlabel;
    private
       labels: array[node] of graphlabel;
    end;

implementation

procedure graph.CreateGraph(size: integer);
var i, j: integer;
begin
  numnodes := size;
  for i := 1 to size do
      for j := 1 to size do
          nodearray[i,j] := false;

end;

procedure labelledgraph.creategraph(size: integer);
var i: integer;
begin
  inherited creategraph(size);
   for i := 1 to size do
      labels[i] := emptylabel
end;
Exercise for reader: Consider how to create A) Undirected graphs and B) Weighted graphs as subclasses of graph.

Dynamic Objects

So far, all the examples have used static instances of object types. When an instance of a given class is declared (in a var statement) then the memory required for that object is assigned on the heap (the bit of memory where such data gets put). However, it is also possible to declare a variable as a pointer to an object, and assign the memory dynamically (as the program is run) using new (just as we did for list datastructures etc.). It turns out that this results in a useful feature: rather than, say, in a shapes program declaring variables of type circle, triangle, square etc, and then only being able to use that variable to describe the stated type of object, we will be able to have a variable of type `shape' which can take values corresponding to any specific kind of shape.

Anyway, we can modify existing programs very simply to use dynamic objects, without having to make any changes in the actual object declarations:

program drawtriangles;

uses shapes;

var t1: ^triangle;            {t1 is a pointer to a triangle object}
    x,y: integer;
begin
     win := createsimplewin;
     x := 100; y := 100;
     new(t1);                 {allocate memory for a triangle}
     t1^.create(x, y, 50);    {Create a triangle (note the ^)}        
     t1^.changecolour(red);
     t1^.changelinestyle(dashed);
     t1^.draw(win);
     dispose(t1);             {De-allocate the memory}
     stopwin;
end.
If a dynamic object contains virtual methods then the object's constructor must be called after the new procedure allocates space, but before any method calls are made. To simplify this the `new' procedure has been modified to allow the constructor call as a second parameter. So the following is legal, and does the obvious thing:
new(t1, create(x,y,50))
When using virtual methods you should also really declare one of the methods as a destructor. When that is invoked the virtual method stuff for that object is `switched off'. Like new, dispose can be given an extra argument to specify the destructor method:
triangle = object(shape)
            destructor destroy;
            .....
.....           
dispose(t1, destroy)

Typecasting

Before getting on to more interesting things, we need to review typecasting in Pascal. This is just how we can translate things from one type into another, such as a character into an integer. Standard pascal functions for such things include chr and ord. But in Turbo Pascal a much wider range of type conversions is possible (between any enumerated type, such as integer, char, boolean or user defined types) with the name of the type used as the conversion function. e.g.,:
type colour = (red, green, blue);
....
Typecast         Result
integer(green)   1
integer('a')     97
boolean('red')   False
colour(True)     green
char(100)        d
Turbo Pascal also permits Typecasts of record and object datatypes. However, the results may be meaningless unless the `shapes' of the datastructures are the same (there is a a requirement that the total size of the two datastructures is the same, but not that the individual field sizes match - odd results occur if they don't).
type rec1 = record
              a: integer;
              b: char
             end;
      ob2 = object
              c: integer;
              d: char;
      end;

var r1: rec1; r2: ob2;

begin
   r1.a := 1; r1.b := 'a';
   r2 := ob2(r1);          {Covert r1 into something of type ob2 and assign to r2}
   writeln(r2.c, r2.d);

   ob2(r1).c := 2;         {Convert r1 into an ob2 and put 2 into field c}
   rec1(r2) := r1;         {Convert r2 into a rec1 and out r1 in it}
   writeln(r2.c, r2.d);
end.
Typecasting can be used in this way to convert between any two objects, of different classes. It doesn't matter if some of the fields of the object are inherited, but it DOES matter that the resulting field sizes match up. This is a severe restriction, but fortunately got around through the use of dynamic objects.

Typecasting and Dynamic Objects

If we use pointers to objects, then everything becomes much nicer. If something is declared a pointer to an object of a given class, then it can also validly point to any subclass of that class. So the following is legal:
type squareptr: ^square;
     shapeptr: ^shape;

var square1: squareptr; shape1: shapeptr;

begin
    square1^.create;
    shape1 := square1
end;
We can also use typecasting when allocating space on the heap (using new) so that, when allocating space for a `shape' it in fact sneakily allocates some extra space so that a `square' will fit. So really now the shape pointer points to something which is a square!

The following illustrates this:

var shape1: shapeptr;

begin
   new(squareptr(shape1));         {Create space for a square}
   shape1^.create(100,100,100);     {Shape1 is now square shaped! So create it}
   shape1^.draw;                    {And draw the square}
end.
The typecast operation means that first shape1 is converted to a square pointer, and then the appropriate space on the heap is allocated for the square.

Alternative valid forms of the first two statements above include:

shape1 := new(squareptr);
shape1^.create;

and

new(squareptr(shape1), create(100,100,100));
This new capability of using a general shape pointer to store squares is jolly handy. For example, we could let the user choose the shape to display, and then dynamically create the appropriate datastructure to handle it:
var s: shapeptr;
    choice: char;

begin
write('Choose a shape: ');
readln(choice);
if choice = 's' then new(squareptr(s))
else if choice = 'c' then new(circleptr(s))
else if choice = 'r' then new(rectangleptr(s));
....
s^.draw;
this is developed a bit more in Standish (but ignore what he says about handles, which is specific for a different version of Pascal which uses objects).

Note: In order for the right method to be used when a variable may point to subclasses of the defined type, it is necessary for the methods in question to be dynamic methods. At compile time there is no way of knowing in examples like the above whether 's' is a square or a circle or a triangle, so the right method cannot be determined.

Generalised Lists

Using dynamic objects, it is possible to develop a fairly neat implementation of generalised lists. The basic idea is to define a parent class generalised list, with subclasses list and atom. A list has datafields first and rest, which each contain items of type generalised list (or rather, pointers to generalised lists). An atom just has a datafield `contents'. A variable of type generalised list will also be able to take values corresponding to pointers to subclasses of generalised list (ie, list or atom), so it will be possible to have both list and atomic list items. As the detailed working out of all this is moderately complex in Turbo Pascal, here's just the general flavour:
     genlist = object
                  ...
     atom = object(genlist)
            ...
            private
              contents: atomtype;
            end;

     list = object(genlist)
            ...
            private
              first: ^genlist;
              rest: ^genlist;

Building Systems using OOP

This section summarises the equivalent section in Standish, looking at a couple of case studies in object oriented design.

A Drawing Tool

Suppose you want to develop a drawing tool, allowing different shapes to be selected from a menu, graphically positioned and sized, modified (changing their colour etc), filled, and so on. This is a natural extension of the examples being used in the text so far, and in exercise 3, so it should be clear how OO programming applies. A list of (pointers to) object instances could be maintained, corresponding to all the objects on the screen, created by the user. As they select new objects and position them, this list would be extended. As the user chooses to modify an object, its datafields would be altered, and hence how it is drawn. Although for this application there is clearly a fair amount of work on the basic interface, it should be clear that an object oriented approach is reasonable for maintaining and manipulating the shapes themselves.

A special purpose design tool might make more, um, special purpose `shapes' available. For example, a tool for office layout might have built in classes for desk, filing cabinet and so on. The user would be able to manipulate their attributes (size, colour) but would have available standard methods for displaying, moving etc the objects.

File and Printing Systems

Standish talks about this in detail. The general idea is that almost all computer applications require facilities to save stuff to disk, and print things out (consider: spreadsheets, drawing packages, netscape, programming language compilers,..). Getting this right in all its details is very hard, and might take someone size months to program from scratch. Yet most print/save facilities are pretty much the same, with just minor details. So the programmer will want to re-use code from existing applications (or special purpose libraries). Program re-use is what OOP is good at. The programmer can import classes that have all the basic print/save functionality built in (procedures to save files while making various checks to ensure other files not overwritten, or data lost if disk full or system crashes etc). Very little additional code will have to be written ( basically, code to covert the data format of the application into a form suitable for printing or writing to a disk). And if the application programmer doesn't like some of the functionality that the library provides (maybe they don't like having a dialogue box popping up asking if you REALLY want to print this file) then they can create subclasses of the library classes and override any procedures they don't like with new ones.

Window and User Interface Systems

Another thing required by almost all applications these days is a nice graphical user interface (GUI). This is regarded as so important that most professional programming environments provide some kind of interface toolkit. (Turbo Pascal provides a `resource workshop' for this purpose, but it is not well integrated with the Pascal language - probably links in some C code or something..). Anyway, an interface toolkit will normally provide a library of user interface objects, and some kind of pallette/menu through which the user can select the interface objects they want and arrange them in a new window. The figure below gives the general idea. The left hand box contains a selection of user interface objects: some text, a clickable `radio button', a check box (with text), a scroll bar, and a button. The `programmer' can pick these up with their mouse, move them into the other window (right box), and change their attributes (e.g., the text that goes with them; what procedure to call when a button is pressed). As selecting some of the interface objects will result in code being run (e.g., procedure called when button pressed), the interface can be easily linked in with the main program.

Generally the interface objects themselves are defined as object classes. This provides advantages for the developer of the interface toolkit, as inheritance can be exploited both to simply the programming and also (as an added extra) encourage some uniformity between different user interface objects. Typically user interface objects such as buttons and check boxes (as illustrated above) are subclasses of the class of `dialog item'. All dialog items share the code concerning how they are positioned in a window, sized, have colour set, and so on. A particular dialog item (e.g., button) will have additional data fields (e.g., the procedure to call) and special display/draw methods.

[tex2html_wrap1326]

Dialog items themselves are often a subclass of a a more general class, which might include (among its subclasses) the class of windows. A window, like buttons and things, can be positioned, has a particular text font, and so on. It also needs special code to allow windows to be opened and closed, to be brought to the front, iconified, etc. A particular toolkit might also provide special purpose subclasses for, say, a text window or a graphics window.

Although these different user interface objects are defined as classes, using inheritance, a naive user of a given toolkit may only be vaguely of the class hierarchy, just selecting options from a menu/pallette to create their application interface. However, a more sophisticated applications programmer would probably want to get under the surface a bit, and be able to create special purpose subclasses with slightly different behaviour to the that provided in the toolkit/library. The underlying OO implementation of the interface objects should make that easy.

Other Object Oriented Languages

Here we'll VERY briefly look at a couple of other OO languages. First C++. C++ is not so different to the stuff we've been talking about. Concepts like virtual vs static methods; constructors and destructors; dynamic memory allocation are all relevant.

Here's an example C++ class, defining an array-based stack:

class Stack 
  {
    private:
      itemtype *stack         {The Stack data field is a pointer to
                               something of type itemtype}
      int p;                  {p is an integer indicating position in stack}
    public:
      stack(int max=100)
        { stack = new itemType[max]; p=0}
                              {stack is the constructor proc. It allocates
                               a new array, and sets the position to 0}
      ~stack()
        { delete stack;}      {~stack is the destructor, and deletes it}
      void push(itemtype v)
        { stack[p++] = v}     { push increments p, and sets the array item to v}
      itemtype pop()
        { return stack[p--] } { pop decrements p and returns the array el}
   };
It should be reasonably clear that this is pretty much the same as the equivalent Pascal version, just with different syntax.

As a contrast, I'll give an example of OOP in CLOS (Common Lisp Object System).

(defclass square(shape)
  ((size  :initform 100)           
   (name  :initform ``A Square'')))

(defmethod printshape(s square)          {a print procedure for squares}
    (print (slotvalue s 'name)))
The syntax here is very different, but we still define square a subclass of shape, give it various data fields (with initial values), and define methods that are used for instances of a given class.

Multiple Inheritance

A full object oriented programming language allows classes to have more than one parent class. For example, in the graph library example we might want a class that is both undirected AND labelled. It would seem to make sense to make its parents include both the undirected graph class and the labelled graph class, and have it inherit both the graph label code and the modified AddEdge procedure (which would add edges in both directions). Turbo Pascal does NOT allow this, but most object-oriented languages do. Indeed, some view it as a defining characteristic of object-oriented languages, as other OO features have equivalent mechanisms in other paradigms.

Multiple inheritance is a mixed blessing, as there are often conflicts concerning which version of a procedure to inherit. Suppose X is a subclass of both Y and Z. Both Y and Z have different `display' procedures. Which one should be used by X? Or maybe Y inherits its method from another class Q. One way to determine which class to inherit from is to use topological sorting described earlier in the course. Suppose we have the following class hierarchy:

         S
       /  \
      Q    R  
      |    |
      Y    Z
       \  /
        X
A topological sort of these nodes would give: X, Y, Q, Z, R, S. Or equally well: X, Z, R, Y, Q, S or X,Y,Z,Q,R, S. There has to be some mechanism to choose between these alternatives, and the details depend on the particular OO language. One language uses the following: Each subclass must give parent classes in a precedence order (e.g., X might have Y before Z). Use this order to decide between alternatives. If this isnt enough, use the following rule: select the class which has a subclass rightmost in the sorted list so far. These two rules would result in the first of the alternatives above.

This sorted list is called the class precedence list. If the system is deciding which version of a method (or data field) to use, it will use the one associated with the first class it comes across in this precedence list that has the relevant method/field. Sp, if Q, Z and S all had a display method, X would inherit Q's version.

Conclusion

Advantages and Disadvantages of OOP

(this just summarises Standish).

OOP has the following advantages over conventional approaches:

However, there is a slight cost in terms of efficiency. As objects are normally referenced by pointers, with memory allocated dynamically, there is a small space ovearhead to store the pointers, and a small speed overhead to find space on the heap (at run-time) to store the objects. For dynamic methods there is an additional small time penalty, as the method to choose is found at run time, by searching through the object hierarchy (using the class precedence list for the object). These days, these efficiency penalties seem to be of relatively little importance compared with the software engineering benefits.

Summary

Object oriented programming focusses on defining a hierarchy of classes of objects, each with associated data fields and methods. Subclasses inherit data fields and methods from their parent and ancestor classes, but can override these with more specific versions. When one method can override another it is a good idea to define it as a virtual method, which means that the version chosen will determined at run-time. Otherwise the wrong version might be used when, for example, one method is called from within another. When using virtual methods, you also need to define constructor and destructor methods. By invoking these, virtual method code is switched on/off.

Objects are normally accessed via pointers, with memory allocated dynamically. Using this approach, a variable of a given (pointer) type can contain pointers to subclasses of the declared type. Explicit typecasting may be needed to allow this. Dynamic memory allocation also means that space can be allocated and deallocated in a flexible way. (You could have memory allocated and de-allocated for an object by declaring the object variable locally within a procedure - when that procedure exited the memory would be de-allocated. But this is less flexible).

Object oriented programming is generally considered good for software re-use and maintenance, as objects with clearly defined interfaces can be used and modified with relatively little effort. It has been particularly widely used in creating user interface libraries and toolkits. Although there are some time penalties in OOP, these are becoming less significant as machine speed increases.

However, to navigate round a large object-oriented program requires the use of good object browsers. It is very difficult to follow a large OO program simply by inspecting the code.

...Lisp
Lisp is an almost-functional language used alot in AI programming, which has the list as its key data structure.

 

 



Alison Cawsey

Mon Mar 4 15:08:05 GMT 1996