"Linux Gazette...making Linux just a little more fun!"


A Glimpse of Icon

By Clinton Jeffery and Shamim Mohamed


Motivation


Many languages introduce special capabilities for specific kinds of applications, but few present us with more powerful control structures or programming paradigms. You may be comfortable sticking with a language you already know, but if you are challenged to write complex programs and are short on time, you need the best language for the job. Icon is a high-level programming language that looks like many other programming languages but offers many advanced features that add up to big gains in productivity. Before we get to all that, let us write the canonical first program:

procedure main()
   write("Hello, world!")
end
If you've installed Linux Icon, Save this in a file named hello.icn and run icont, the Icon translator on it:
icont hello -x
icont performs some syntax checking on hello.icn and transforms the code into instructions for the Icon virtual machine, which will be saved in hello. The -x option tells icont to execute the program also.

We are introducing many concepts, so don't expect to understand everything the first time through -- the only way to learn a language is to write programs in it; so get Icon, and take it for a test drive.

Genealogy

Despite its name, Icon is not a visual programming language -- its look-and-feel descends from Pascal and C. The important thing about Icon is not that its syntax is easy to learn. Icon's semantics, which generalize ideas from SNOBOL4, succeed in adding considerable power to the familiar notation found in most programming languages. This is noteworthy because most languages that add real power (APL, Lisp, and SmallTalk are examples) do so with a syntax that is so different that programmers must learn a new way of thinking to use them. Icon adds power `under the hood' of a notation most programmers are already comfortable with.

Icon was developed over several years at the University of Arizona by a team led by Ralph Griswold. Today, it runs on many platforms and is used by researchers in algorithms, compilers, and linguistics as well as system administrators and hobbyists. The implementation and source code are in the public domain.


Variables, Expressions, and Type


Icon's expression syntax starts out much as do most languages. For example, i+j represents the arithmetic addition of the values stored in the variables i and j, f(x) is a call to f with argument x, variables may be global or local to a procedure, and so on.

Variable declarations are not required, and variables can hold any type of value. However, Icon is a strongly typed language; it knows the type of each value and it does not allow you to mix invalid types in expressions. The basic scalar types are integers, real numbers, strings, and sets of characters (csets). Integers and strings can be arbitrarily large; and strings can contain any characters. There are also structured types: lists, associative arrays, sets and records. Icon performs automatic storage management.

Goal-directed Expression Evaluation

Icon's major innovation is its expression evaluation mechanism. It avoids certain problems widely found in conventional languages in which each expression always computes one result. In such languages, if no valid result is possible, a sentinel value such as -1, NULL, or EOF (end-of-file) is returned instead. The program must check for such errors using boolean logic and if-then tests, and the programmer must remember many different sentinel values used in different circumstances. This is cumbersome. Alternative ideas such as exceptions have been developed in some languages, but they introduce complexity and problems of their own.

Success and Failure

In Icon, expressions are goal-directed. When it is evaluated, every expression has a goal of producing results for the surrounding expression. If an expression succeeds in producing a result, the surrounding expression executes as intended, but if an expression cannot produce a result, it is said to fail and the surrounding expression cannot be performed and in turn fails. This powerful concept subsumes Boolean logic and the use of sentinel values, and allows a host of further improvements. As an example, consider the expression i > 3 -- if the value i is greater than 3 the expression succeeds, otherwise it fails.

Control structures such as if check for success, so

   if i > 3 then ...
does the expected thing. Since the expression semantics are not encumbered with the need to propagate boolean (or 0 and 1) values, comparison operators can instead propagate a useful value (their right operand), allowing expressions such as 3 > i > 7 which is standard in mathematics, but doesn't work in most languages.

Since functions that fail do not need to return an error code separately from the results, detecting cases such as end-of-file is simpler, as in:

   if line := read() then write(process(line))
On end-of-file, read() fails, causing the assignment expression tested in the if-part to fail. When the test fails, the then branch is not executed so the call to write() does not occur. Since failure propagates through an expression, the above example is equivalent to
   write(process(read())

Generators

Some expressions can naturally produce more than one result. These expressions are called generators. Consider the task of searching for a substring within a string, and returning the position at which the substring occurs, as in Icon's find() function:
   find("or", "horror")
In conventional languages, this would return one of the possible return values, usually the first or the last. In Icon, this expression is capable of returning all the values, depending on the execution context. If the surrounding expression only needs one value, as in the case of an if test or an assignment, only the first value of a generator is produced. If a generator is part of a more complex expression, then the return values are produced in sequence until the whole expression produces a value. In the expression
   find("or", "horror") > 3 
the first value produced by find(), a 2, causes the > operation to fail. Icon resumes the call to find(), which produces a 4, and the expression succeeds.

The most obvious generator is the alternation operator |. The expression

   expr1 | expr2
is a generator that produces its lefthand side followed by its righthand side, if needed by the surrounding expression. Consider f(1|2) -- f is first invoked with the value 1; if that does not produce a value, the generator is resumed for another result and f will be called again with the value 2. As another example of the same operator,
   x = (3 | 5)
is equivalent to but more concise than C's (x == 3) || (x == 5). When more than one generator is present in an expression, they are resumed in a LIFO manner.
   (x | y) = (3 | 5)
is the Icon equivalent of C's
   (x == 3) || (x == 5) || (y == 3) || (y == 5)

In addition to | , Icon has a generate operator ! that generates elements of data structures, and a generator to that produces ranges of integers. For example, !L generates the elements of list L, and 1 to 10 generates the first ten positive integers. Besides these operators that generate results, most generators in Icon take the form of calls to built-in and user-defined procedures. Procedures are discussed below.

Iteration

Icon has the ordinary while loop where the control expression is evaluated before each iteration. For generators, an alternative loop is available where the loop body executes once per result produced by a single evaluation of the control expression. The alternative loop uses the reserved word every and can be used in conjunction with the to operator to provide the equivalent of a for-loop:
   every i := 1 to 10 do ...
The point of every and to is not that you can use them to implement a for-loop; Icon's generator mechanism is quite a bit more general. The every loop lets you walk through all the results of a generator giving you iterators for free. And every isn't limited to sequences of numbers or traversals of specific data structures like iterators in some languages; it works on any expression that contains generators.
   every f(1 | 1 | 2 | 3 | 5 | 8)
executes the function f with the first few fibonacci numbers, and the example could be generalized to a user-defined generator procedure that produced the entire fibonacci sequence. Using generators requires a bit of practice, but then it is fun!


Procedures


Procedures are a basic building block in most languages, including Icon. Like C, an Icon program is organized as a collection of procedures and execution starts from a procedure named main(). Here is an example of an ordinary procedure. This one generates and sums the elements of a list L, whose elements had better be numbers (or convertible to numbers).

procedure sum(L)
   total := 0
   every total +:= !L
   return total
end

A user can write her own generator by including a

   suspend expr
in a procedure where a result should be produced. When a procedure suspends, it transfers a result to the caller, but remains available to continue where it left off and generate more results. If the expression from which it is called needs more or different results in order to succeed, the procedure will be resumed. The following example generates the elements from parameter L, but filters out the zeros.
procedure nonzero(L)
   every i := !L do
      if i ~= 0 then suspend i
end
The fail expression makes the procedure fail, i.e. causes control to go back to the calling procedure without returning a value. A procedure also fails implicitly if control flows off the end of the procedure's body.


String Processing


Besides expression evaluation, Icon offers compelling features to reduce the effort required to write complex programs. From Icon's ancestor SNOBOL4, the granddaddy of all string processing languages, Icon inherits some of the most flexible and readable built-in data structures found in any language.

Strings

Parts of Icon strings are accessed using the subscript operator. Indexes denote the positions between characters, and pairs of indexes are used to pick out substrings. If s is the string "hello, world" then the expressions
   s[7] := " linux "
   s[14:19] := "journal"
change s into "hello, linux journal", illustrating the ease with which insertions and substitutions are made. A myriad of built-in functions operate on strings; among them are the operators for concatenation (s1 || s2) and size (*s).

String Scanning

The string analysis facility of Icon is called scanning. A scanning environment is set up by the ? operator:
   s ? expr
A scanning environment has a string and a current position in it. Matching functions change this position, and return the substring between the old and new positions. Here is a simple example:
   text ? {
      while move(1) do
         write(move(1))
   }
move is a function that advances the position by its argument; so this code writes out every alternate character of the string in text. Another matching function is tab, which sets the position to its argument. String analysis functions examine a string and generate the interesting positions in it. We have already seen find, which looks for substrings. These functions default their subject to the string being scanned. Here is a procedure that produces the words from the input:
procedure getword()
   while line := read() do 
      line ? while tab(upto(wchar)) do {
         word := tab(many(wchar))
         suspend word
      }
end
upto(c) returns the next position of a character from the cset c; and many(c) returns the position after a sequence of characters from c. The expression tab(upto(wchar)) advances the position to a character from wchar, the set of characters that make up words; then tab(many(wchar)) moves the position to the end of the word and returns the word that is found.


Regular Expressions


The Icon Program Library (included with the distribution) provides regular expression matching functions. To use it, include the line link regexp at the top of the program. Here is an example of `search-and-replace':

procedure re_sub(str, re, repl)
   result := ""
   str ? {
      while j := ReFind(re) do {
         result ||:= tab(j) || repl
         tab(ReMatch(re))
      }
      result ||:= tab(0)
   }
   return result
end


Structures


Icon has several structured (or non-scalar) types as well that help organize and store collections of arbitrary (and possibly mixed) types of values. A table is an associative array, where values are stored indexed by keys which may be of arbitrary type; a list is a group of values accessed by integer indices as well as stack and queue operations; a set is an unordered group of values, etc.

Tables

A table is created with the table function. It takes one argument: the default value, i.e. the value to return when lookup fails. Here is a code fragment to print a word count of the input (assuming the getword function generates words of interest):
   wordcount := table(0)
   every word := getword() do
      wordcount[word] +:= 1
   every word := key(wordcount) do
      write(word, "\t", wordcount[word])
(The key function generates the keys with which values have been stored.) Since the default value for the table is 0, when a new word is inserted, the default value gets incremented and the new value (i.e. 1) is stored with the new word. Tables grow automatically as new elements are inserted.

Lists

A list can be created by enumerating its members:
   L := ["linux", 2.0, "unix"]
Lists are dynamic; they grow or shrink through calls to list manipulation routines like pop() etc. Elements of the list can be obtained either through list manipulation functions or by subscripting:
   write(L[3])
There is no restriction on the kinds of values that may be stored in a list.

Records and Sets

A record is like a struct in C, except that there is no restriction on the types that can be stored in the fields. After a record is declared:
record complex(re, im)
instances of that record are created using a constructor procedure with the name of the record type, and on such instances, fields are accessed by name:
   i := complex(0, 0)
   j := complex(1, -1)
   if a.re = b.re then ...

A set is an unordered collection of values with the uniqueness property i.e. an element can only be present in a set once.

   S := set(["rock lobster", 'B', 52])
The functions member, insert, and delete do what their names suggest. Set intersection, union and difference are provided by operators. A set can contain any value (including itself, thereby neatly sidestepping Russell's paradox!).


Graphs


Since there is no restriction on the types of values in a list, they can be other lists too. Here's an example of a how a graph or tree may be implemented with lists:

record node(label, links)
   ...
   barney := node("Barney", list())
   betty := node("Betty", list())
   bambam := node("Bam-Bam", list())
   put(bambam.links, barney, betty)

An Example

Let us now do a little example to illustrate the above concepts. Here is a program to read a file, and generate a concordance i.e. for every word, print a list of the lines it occurs on. We want to skip short words like `the' though, so we only count the words longer than 3 characters.
global wchar
procedure main(args)
   wchar := &ucase ++ &lcase
   (*args = 1) | stop("Need a file!")
   f := open(args[1]) | stop("Couldn't open ", args[1])
   wordlist := table()
   lineno := 0
   while line := read(f) do {
      lineno +:= 1
      every word := getword(line) do 
         if *word > 3 then {
            # if word isn't in the table, set entry to empty list
            /wordlist[word] := list()
            put(wordlist[word], lineno)
         }
   }
   L := sort(wordlist)
   every l := !L do {
      writes(l[1], "\t")
      linelist := ""
      # Collect line numbers into a string
      every linelist ||:= (!l[2] || ", ")
      write(linelist[1:-2])
   }
end

procedure getword(s)
   s ? while tab(upto(wchar)) do {
      word := tab(many(wchar))
      suspend word
   }
end
If we run this program on this input:
Sing, Mother, sing.
Can Mother sing?
Mother can sing.
Sing, Mother, sing!
the program writes this output:
Mother  1, 2, 3, 4
Sing    1, 4
sing    1, 2, 3, 4
While we may not have covered all the features used in this program, it should give you a feeling for the flavour of the language.


Co-expressions


Another novel control facility in Icon is the co-expression, which is an expression encapsulated in a thread-like execution context where its results can be picked apart one at a time. Co-expressions are are more portable and more fine-grained than comparable facilities found in most languages. Co-expressions let you `capture' generators and then use their results from multiple places in your code. Co-expressions are created by

   create expr
and each result of the co-expression is requested using the @ operator.

As a small example, suppose you have a procedure prime() that generates an infinite sequence of prime numbers, and want to number each prime as you print them out, one per line. Icon's seq() function will generate the numbers to precede the primes, but there is no way to generate elements from the two generators in tandem; no way except using co-expressions, as in the following:

   numbers := create seq()
   primes := create prime()
   every write(@numbers, ": ", @primes)
More information about co-expressions can be found at http://www.drones.com/coexp/ and a complete description is in the Icon language book mentioned below.


Graphics


Icon features high-level graphics facilities that are portable across platforms. The most robust implementations are X Window and Microsoft Windows; Presentation Manager, Macintosh, and Amiga ports are in various stages of progress. The most important characteristics of the graphics facilities are:

As a short example, the following program opens a window and allows the user to type text and draw freehand on it using the left mouse button, until an ESC char is pressed. Clicking the right button moves the text cursor to a new location. Mode "g" in the call to open stands for "graphics". &window is a special global variable that serves as a default window for graphics functions. &lpress, &ldrag, and &rpress are special constants that denote left mouse button press and drag, and right mouse button press, respectively. &x and &y are special global variables that hold the mouse position associated with the most recent user action returned by Event(). "\e" is a one-letter Icon string containing the escape character.
procedure main()
   &window := open("LJ example","g")
   repeat case e := Event() of {
      &lpress | &ldrag : DrawPoint(&x,&y)
      &rpress : GotoXY(&x,&y)
      "\e"    : break
      default : if type(e)=="string" then writes(&window, e)
      }
end
A complete description of the graphics facilities is available on the web at http://www.cs.arizona.edu/icon/docs/ipd281.html


POSIX Functions


An Icon program that uses the POSIX functions should include the header file posixdef.icn. On error, the POSIX functions fail and set the keyword &errno; the corresponding printable error string is obtained by calling sys_errstr().

Unix functions that return a C struct (or a list, in Perl) return records in Icon. The fields in the return values have names similar to the Unix counterparts: stat() returns a record with fields ino, nlink, mode etc.

A complete description of the POSIX interfaces is included in the distribution; an HTML version is available on the web, at http://www.drones.com/unicon/. We look at a few small examples here.


An Implementation of ls


Let us look at how a simple version of the Unix ls command may be written. What we need to do is to read the directory, and perform a stat call on each name we find. In Icon, opening a directory is exactly the same as opening a file for reading; every read returns one filename.

     f := open(dir) | stop(name, ":", sys_errstr(&errno))
     names := list()
     while name := read(f) do
          push(names, name)
     every name := !sort(names) do
          write(format(lstat(name), name, dir))
The lstat function returns a record that has all the information that lstat(2) returns. One difference between the Unix version and the Icon version is that the mode field is converted to a human readable string -- not an integer on which you have to do bitwise magic on. (And in Icon, string manipulation is as natural as a bitwise operation.)

The function to format the information is simple; it also checks to see if the name is a symbolic link, in which case it prints the value of the link also.

link printf
procedure format(p, name, dir)
   s := sprintf("%7s %4s %s %3s %8s %8s %8s %s %s",
	   p.ino, p.blocks, p.mode, p.nlink,
	   p.uid, p.gid, p.size, ctime(p.mtime)[5:17], name)

   if p.mode[1] == "l" then
      s ||:= " -> " || readlink(dir || "/" || name)

   return s
end


Polymorphism and other pleasant things


It's not just stat that uses human-readable values -- chmod can accept an integer that represents a bit pattern to set the file mode to, but it also takes a string just like the shell command:

     chmod(f, "a+r")
And the first argument: it can be either an opened file or a path to a file. Since Icon values are typed, the function knows what kind of value it's dealing with -- no more fchmod or fstat. The same applies to other functions -- for example, the Unix functions getpwnam, getpwuid and getpwent are all subsumed by the Icon function getpw which does the appropriate thing depending on the type of the argument:
     owner := getpw("ickenham")
     root := getpw(0)
     while u := getpw() do ...
Similarly, trap and kill can accept a signal number or name; wait returns human-readable status; chown takes a username or uid; and select takes a list of files.


Using select


The select() function waits for input to become available on a set of files. Here is an example of the usage -- this program waits for typed input or for a window event, with a timeout of 1000 milliseconds:

   repeat {
      while *(L := select([&input, &window], 1000)) = 0 do
         ... handle timeout
      if &errno ~= 0 then
         stop("Select failed: ", sys_errstr(&errno))

      every f := !L do 
         case f of {
            &input  : handle_input()
            &window : handle_evt()
      }
   }
If called with no timeout value, select will wait forever. A timeout of 0 performs a poll.


Networking


Icon provides a much simpler interface to BSD-style sockets. Instead of the four different system calls that are required to start a TCP/IP server using Perl, only one is needed in Icon--the open function opens network connections as well as files. The first argument to open is the network address to connect to -- host:port for Internet domain connections, and a filename for Unix domain sockets. The second argument specifies the type of connection.

Here is an Internet domain TCP server listening on port 1888:

procedure main()
     while f := open(":1888", "na") do
          if fork() = 0 then {
               service_request(f)
               exit()
          } else
               close(f)
     stop("Open failed: ", sys_errstr(&errno))
end
The "na" flags indicate that this is a network accept. Each call to open waits for a network connection and then returns a file for that connection. To connect to this server, the "n" (network connect) flag is used with open. Here's a function that connects to a `finger' server:
procedure finger(name, host)
     static fserv
     initial fserv := getserv("finger") |
          stop("Couldn't get service: ", sys_errstr(&errno))

     f := open(host || ":" || fserv.port, "n") | fail
     write(f, name) | fail
     while line := read(f) do
          write(line)
end
Nice and simple, isn't it? One might even call it Art! On the other hand, writing socket code in Perl is not much different from writing it in C, except that you have to perform weird machinations with pack. No more! Eschew obfuscation, do it in Icon.

UDP

UDP networking is similar; using "nu" as the second argument to open signifies a UDP connection. A datagram is sent either with write or send, and is received with receive. Here is a simple client for the UDP `daytime' service, something like rdate(1):
   s := getserv("daytime", "udp")
   
   f := open(host || ":" || s.port, "nu") |
      stop("Open failed: ", sys_errstr(&errno))

   writes(f, " ")

   if *select([f], 5000) = 0 then
      stop("Connection timed out.")

   r := receive(f)
   write("Time on ", host, " is ", r.msg)
Since UDP is not reliable, the receive is guarded with select (timeout of 5000 ms), or the program might hang forever if the reply is lost.


Icon and other languages


The popular languages Perl and Java have been covered in LJ, and we think it is worth discussing how Icon stacks up against these dreadnaughts.

Perl and Icon

Perl and Icon are both used for similar purposes. Both languages offer high-level data structures like lists, associative arrays, etc. Both make it easy to write short prototypes by not requiring extensive declarations; and both were intended by their designers to be `user friendly' i.e. intended to make programming easier for the user rather than to prove some theoretical point.

But when it comes to language design, Perl and Icon are not at all alike. Perl has been designed with very little structure -- or, as Larry Wall puts it, it's more like a natural language than a programming language. Perl looks strange but underneath the loose syntax its semantics are those of a conventional imperative language. Icon, on the other hand, looks more like a conventional imperative language but has richer semantics.

Advantages of Perl

Perl's pattern matching, while not as general a mechanism as Icon's string scanning, is more concise for recognizing those patterns that can be expressed as regular expressions. Perl's syntax looks and feels natural to long-time die-hard UNIX gurus and system administrators, who have been using utilities such as sh, sed, and awk. For some people, Perl is especially appealing because mastering its idiosyncracies places one in an elite group.

Some misfeatures of Perl

Let us look at some things that are (in our opinion) undesirable qualities of Perl. These problems do not negate Perl's ingenious features, they merely illustrate that Perl is no panacea.

Namespace confusion: it is a bad idea to allow scalar variables, vector variables and functions to have the same name. This seems like a useful thing to do, but it leads to write-only code. We think this is primarily why it's hard to maintain Perl programs. A couple of things are beyond belief -- $foo and %foo are different things, but the expression $foo{bar} actually refers to an element of %foo!

Parameter passing is a mess. Passing arrays by name is just too confusing! Even after careful study and substantial practice, we still are not absolutely certain about how to use *foo in Perl. As if to make up for the difficulty of passing arrays by reference, all scalars are passed by reference! That's very unaesthetic.

Why are there no formal parameters? Instead, one has to resort to something that looks like a function call to declare local variables and assign @_ to it. Allowing the parentheses to be left off subroutine calls is also unfortunate; it is another `easy to write, hard to read' construct. And the distinction between built-in functions and user-defined subroutines is ugly.

Variables like $` are a bad idea. We think of special characters as punctuation, we don't expect them to be (borrowing Wall's terminology) nouns. And the mnemonics that are required are evidence that these variables place an additional burden of memorization upon the programmer. (Quick, if you write Perl programs: What's `$('?)

The distinction between array and scalar contexts also leads to obfuscated code. Certainly after you've been writing Perl for a while, you get used to it (and might even like it), but again, this is just confusing. All the warnings in the Perl documentation about being certain you are evaluating in the right context is evidence of this.

Java and Icon

Java takes the middle road in between C/C++ and the class of `very high level languages' such as Icon and Perl. Java and Icon use a similar virtual machine (VM) model. Java's VM is both lower-level and more machine-independent than the Icon VM, but these are implementation artifacts and it would be possible to implement Java on the Icon VM or Icon on the Java VM.

The important differences between Java and Icon are differences of philosophy. The Java philosophy is that everything is an object, nothing is built-in to the language, and programmers should learn class libraries for all non-trivial structures and algorithms. Java's lack of operator overloading means that its object-oriented notation allows no "shorthand" as does C++. Java's simplicity is a welcome relief after C++, but its expressive power is so weak compared to Icon (and several other very high level languages) that it is hard to argue that Java can supplant these languages. Most of Java's market share is being carved out of the C and C++ industries.

How to Improve Java

Java has few bad mistakes. The Sumatra Project has itemized some of them in The Java Hall of Shame at http://www.cs.arizona.edu/sumatra/hallofshame/. Most of Java's misfeatures are sins of omission, because the language designers were trying to be elegant and minimal. We would like to see a Java dialect with features such as Icon's goal-directed evaluation, Perl's pattern matching, and APL's array-at-a-time numeric operators; a description of such a dialect is at http://segfault.cs.utsa.edu/godiva/.


Getting Icon


Users who become serious about the language will want a copy of `The Icon Programming Language', by Ralph and Madge Griswold, Peer-to-Peer Communications 1997, ISBN 1-57398-001-3.

Lots of documentation for Icon is available from the University of Arizona, at http://www.cs.arizona.edu/icon/ There is also a newsgroup on Usenet: comp.lang.icon.

The Icon source distribution is at:
ftp://ftp.cs.arizona.edu/icon/packages/unix/unix.tgz
The POSIX functions are in the following patch that you need to apply if you wish to build from sources:
ftp://ftp.drones.com/unicon-patches.tar.gz

Linux binaries (kernel 2.0 ELF, libgdbm 2.0.0, libX11 6.0, libdl 1.7.14, libm 5.0.0 and libc 5.2.18) for Icon (with X11 and POSIX support) are available at

ftp://ftp.drones.com/icon-9.3-3.i386.rpm Icon ftp://ftp.drones.com/icon-ipl-9.3-3.i386.rpm Icon Program Library
ftp://ftp.drones.com/icon-idol-9.3-3.i386.rpm Idol: Object-oriented Icon
ftp://ftp.drones.com/icon-vib-9.3-3.i386.rpm VIB: The Visual Interface Builder
ftp://ftp.drones.com/icon-docs-9.3-3.i386.rpm Documentation


Copyright © 1998, Clinton Jeffery and Shamim Mohamed
Published in Issue 27 of Linux Gazette, April 1998


[ TABLE OF CONTENTS ] [ FRONT PAGE ]  Back  Next