CHAPTER 5 Some sample programs

This chapter contains a small collection of very simple sample programs. These programs are not random. They are based on the set of simple language benchmark tests used in The Great Computer Language Shootout by Doug Bagley ( http://www.bagley.org/~doug/shootout/ ) and The Great Win32 Computer Language Shootout by Aldo Calpini ( http://dada.perl.it/shootout/ ). These programs have been chosen because at those sites you can view programs written to exactly the same specification in almost any programming language you are likely to know.

The specification of this benchmark suite demands that some of the programs are implemented the same way as they are implemented in the other languages. Others are merely required to do the same thing.

Many of the tests take a single optional command-line argument being the integer number of times some loop is to be repeated. This is typically obtained in each program with a line like:

Some tests expect input data, which is generally read from standard input. See the sites mentioned above for further details.

No comment will be made on code that should be unsurprising to someone who knows C.

Array access

This test must be implemented in the same way in all languages. It must first build an array full of integers, then repeatedly add them to a second array, with each loop running backwards through the array.

Notice the use of the build() function to make the first array. The "i" argument to build() causes the content to be auto-incrementing integers, the 1 is the start value.

The second call to build() makes an array of size n with each element initialised to 0. The "c" argument means "apply the initialiser(s) cyclically to leaf elements of the built data".

n = argv[1] ? int(argv[1]) : 1;

x = build(n, "i", 1);
y = build(n, "c", 0);

for (k = 0; k < 1000; ++k)
{
    for (i = n - 1; i >= 0; --i)
        y[i] += x[i];
}

printf("%d %d\n", y[0], y[n - 1]);

Count lines/words/characters

This test must count lines, words and characters from standard input and must do the same thing as the versions in other languages. However, it is not allowed to read the whole input at once, but must limit its read to no more than 4K bytes. There is no easy way to do this ICI except by reading lines.

Notice the use of the smash() function to get the words of each line. smash() is the most common method of breaking up strings. The \S+ pattern matches one or more non-whitespace characters. If we really wanted the words, the last argument to the smash() call would have been "\\&" (meaning append the matched portion to the array being built). However, we only want to count the words, so we just push empty strings. This saves the cost of actually extracting and creating the string.

The nels() function returns the number of elements in an array or string (or anything else).

nl = nw = nc = 0;
while (l = getline())
{
     ++nl;
    nc += nels(l) + 1;
    nw += nels(smash(l, #\S+#, ""));
}
printf("%d %d %d\n", nl, nw, nc);

Echo client/server

For this test, each language is required to do the same thing. The specification says it should fork a child process that repeatedly sends a message to the parent (server), which echoes it back to the child (client), which checks it is correct. Because fork() is only available in versions of ICI running on UNIX-like systems (in the sys module), we actually use a thread here.

This test uses the ICI net module, which provides sockets-based networking primitives (it is documented separately).

Notice the use of waitfor to wait for the child thread to finish. The status field of the child thread will be "active" until the echo_client function returns (or fails). The thread object istself is waited on. A wakeup is automatically done on any thread object on thread termnation.

Notice also that the iteration count n is implicitly created by simple assignment, but data is explicitly declared static. Implicit variable are always created in the innermost scope. At the file parse level there is a local "auto" scope which exists and is visible only at the file parse level, just as the "auto" variables of a function only exist and are visible within an invocation of a function. The n isn't visible to the running function echo_client and doesn't need to be. However data does need to be. The static declaration of data gives it a sufficiently outer scope to be visible inside the running function echo_client. (The function is also running in a separate thread, but that doesn't change the scoping at all.)

Finally, notice the use of the := operator to assign to sock in echo_client. This is the commonest way of introducing new local (auto) variables in a function. The := operator forces the assignment to be in the most-local scope, even if a variable of the same name already existed in an outer scope.

n = argv[1] ? int(argv[1]) : 1;
static data = "Hello there sailor\n";

static
echo_client(n, port)
{
    sock := net.connect(net.socket("tcp/ip"), port);
    for (i := 0; i < n; ++i)
    {
        net.send(sock, data);
        if ((ans := net.recv(sock, nels(data))) != data)
            fail(sprintf("received \"%s\", expected \"%s\"",
                ans, data));
    }
    net.close(sock);
}

ssock = net.listen(net.bind(net.socket("tcp/ip"), 0));
client = thread(echo_client, n, net.getportno(ssock));
csock = net.accept(ssock);
t = 0;
while (str = net.recv(csock, nels(data)))
{
    net.send(csock, str);
    t += nels(str);
}
waitfor(client.status != "active"; client)
    ;
printf("server processed %d bytes\n", t);

Exception mechanisms

For this test, each language is required to implement it the same way. The outer loop calls hi_function() which calls lo_function() which calls blowup(). The blowup() function throws two types of exceptions, one of which must be caught by lo_function() and the other by hi_function().

ICI cannot selectively catch exceptions, so in lo_function() we must catch and re-throw the exception that is not for us. ICI exceptions are very simple, just being a string. They really are intended just for errors, not as a general programming mechanism. (Although they are reasonably efficient.)

N = argv[1] ? int(argv[1]) : 1;

static HI = 0;
static LO = 0;

static
blowup(n)
{
    fail(n & 1 ? "low" : "hi");
}

static
lo_function(n)
{
    try
        blowup(n);
    onerror
    {
        if (error !~ #low#)
            fail(error);
        ++LO;
    }
}

static
hi_function(n)
{
    try
        lo_function(n);
    onerror
        ++HI;
}

static
some_function(n)
{
    try
        hi_function(n);
    onerror
        fail(error + " -- we shouldn't get here");
}

while (N)
    some_function(N--);

printf("Exceptions: HI=%d / LO=%d\n", HI, LO);

Hash (associative array) access

All languages must implement this test the same way. We store the integers from 1..N in an array indexed by the hex string of the integer, then access it with decimal strings. Only some of the decimal strings will strike values we stored under the hex string keys, we must print how many.

Notice that the "struct" is ICI's associative array type (a.k.a. hash, map, dict, etc).

n = argv[1] ? int(argv[1]) : 1;

x = struct();
for (i = 1; i <= n; ++i)
    x[sprintf("%x", i)] = i;

c = 0;
for (i = n; i > 0; --i)
    c += x[string(i)] != NULL;

printf("%d\n", c);

Hashes, part II

This is like the above, but isn't swamped by the time to make strings. The strings are made first, then used repeatedly.

Notice the use of the forall statement in the main loop. This could have been a for loop with some variable stepping from 0 to 10000. However, when looping over all the elements of an aggregate, a forall loop is generally clearer and faster. Notice that in this case there are two loop variables, v, the values in the aggregate, and k, the key (i.e. index) you would use to find that value.

n = argv[1] ? int(argv[1]) : 1;

h1 = struct();
for (i = 0; i < 10000; ++i)
    h1[sprintf("foo_%d", i)] = i;

h2 = struct();
for (i = 0; i < n; ++i)
{
    forall (v, k in h1)
    {
        if (h2[k] == NULL)
            h2[k] = 0;
        h2[k] += v;
    }
}

printf("%d %d %d %d\n", h1["foo_1"], h1["foo_9999"],
    h2["foo_1"], h2["foo_9999"]);

Heapsort

In this test each language is required to implement an in-place heapsort in the same way. Notice again the explicit declaration of some variables static to make them visible inside functions. Also notice the declaration of last as static inside the gen_random() function. This is almost completely pointless, as it gets exactly the same visibility as the ones declared outside the function.

static IM = 139968;
static IA = 3877;
static IC = 29573;

static
gen_random(max)
{
    static last = 42;

    return max * (last = (last * IA + IC) % IM) / IM ;
}

static
heapsort(n, ra)
{
    ir = n;
    l = (n >> 1) + 1;
    for (;;)
    {
        if (l > 1)
        {
            rra = ra[--l];
        }
        else
        {
            rra = ra[ir];
            ra[ir] = ra[1];
            if (--ir == 1)
            {
                ra[1] = rra;
                return;
            }
        }
        i = l;
        j = l << 1;
        while (j <= ir)
        {
            if (j < ir && ra[j] < ra[j+1])
                ++j;
            if (rra < ra[j])
            {
                ra[i] = ra[j];
                j += (i = j);
            }
            else
            {
                j = ir + 1;
            }
        }
        ra[i] = rra;
    }
}

N = argv[1] ? int(argv[1]) : 1;
ary = array();
for (i = 0; i <= N; ++i)
    ary[i] = gen_random(1.0);
heapsort(N, ary);
printf("%.10f\n", ary[N]);

List operations

All languages must implement this test the same way. In short, using a native data structure, make a list of integers from 1 through 10000, then copy it, then item by item transfer the head item to the end of a new list, then item by item transfer the end item of that list to the end of a new list, then reverse the new list.

For this test we use arrays, which can be efficiently pushed and popped at both ends. Notice the use of build() again to make the array of integers. There is no built-in reverse function in ICI, so it is done manually. Notice the use of the swap operator, <=>, in the reversal code.

NUM = argv[1] ? int(argv[1]) : 1;

static SIZE = 10000;

static
test_lists()
{
    li1 := build(SIZE, "i", 1);
    li2 := copy(li1);
    li3 := array();
    while(nels(li2))
        push(li3, rpop(li2));
    while (nels(li3))
        push(li2, pop(li3));
    n := SIZE / 2;
    for (i := 0; i < n; ++i)
        li1[i] <=> li1[SIZE - i - 1];
    if (li1[0] != SIZE || li1 != li2)
        return 0;
    return nels(li1);
}

for (i = 0; i < NUM; ++i)
    result = test_lists();
printf("%d\n", result);

Matrix multiplication

In this test each language is required to multiply two 30 x 30 matrices.

As it happens, the required data in the matricies are the numbers from 1 to 30 in row-column order. We can use the build() function to easily make these. This is a good illustration of the way the build function separates the structure being built, from the generation of the content used to fill leaf elements. This is a simple two-dimensional array, but more complex data structures can be built, also including nested structures.

Note that there are no true multi-dimensional arrays in ICI. Each matrix is a single 30 element array of 30 element sub-arrays.

The actual matrix multiplication might be most naturally done by three nested for loops over the array dimensions. However, forall loops have been used here because they turned out to be slightly faster. The two outer forall loops loop over the sub-arrays (columns) and the leaf values within them of the output matrix. This is a bit artificial in the middle loop because the loop variable val is not used. The inner-most loop foralls over one of the columns of the first input matrix, while it steps along the rows of the second matrix.

static
mmult(rows, cols, m1, m2, m3)
{
    forall (col, i in m3)
    {
        m1i := m1[i];
        forall (val, j in col)
        {
            val = 0;
            forall (m1ik, k in m1i)
                val += m1ik * m2[k][j];
            col[j] = val;
        }
    }
}

SIZE := 30;
n := argv[1] ? int(argv[1]) : 1;
m1 := build(SIZE, SIZE, "i", 1);
m2 := build(SIZE, SIZE, "i", 1);
mm := build(SIZE, SIZE);
for (i = 0; i < n; ++i)
    mmult(SIZE, SIZE, m1, m2, mm);
printf("%d %d %d %d\n", mm[0][0], mm[2][3], mm[3][2],
    mm[4][4]);

Method calls

Each language is required to implement this test in the same way. In short, we must make a class Toggle with a state member, and a sub-class NthToggle with additional count and count_max members. Toggle has an acivate() method that flips the state. But NthToggle's overridden activate() does an extra flip every count_max calls.

Instances of classes, and classes themselves, are just structs. They are in scope when executing methods. So note below in the activate() methods we can simply refer to state, count, and count_max, rather than this.state (which would also work). However, the only way to call a method is with the : or :^ operators, and they require an object on the left, so you have to use this when you call another method in your own class, even though the function itself is directly visible in your current scope.

An imporant thing to note here is the explicit calls to super-class functions with, for example, this:^new(). In general, the : operator is used to reference a function of an object and make a callable method, as in toggle:activate(). The :^ operator is used (in methods) to reference a function in a super class, as opposed to calling yourself again (or even worse, a sub-class function of the same name).

Notice that new() is a "class" method, while activate() is an "instance" method. There is no distinction in their declaration, it is just what they do that makes it so.

static Toggle = [class

    new(start_state)
    {
        t := this:^new();
        t.state := start_state;
        return t;
    }

    activate()
    {
        state = !state;
        return this;
    }
    
    value()
    {
        return state;
    }
];

static NthToggle = [class:Toggle,

    new(start_state, count_max)
    {
        t := this:^new(start_state);
        t.count_max := count_max;
        t.counter := 0;
        return t;
    }

    activate()
    {
        this:^activate();
        if (++counter >= count_max)
        {
            state = !state;
            counter = 0;
        }
        return this;
    }
];

n := argv[1] ? int(argv[1]) : 1;

toggle := Toggle:new(1);
for (i = 0; i < n; ++i)
    val = toggle:activate():value();
printf(val ? "true\n" : "false\n");

ntoggle := NthToggle:new(val, 3);
for (i = 0; i < n; ++i)
    val = ntoggle:activate():value();
printf(val ? "true\n" : "false\n");

Nested loops

Each language is required to implement this the same way. This is a very simple test, but here are two methods. The first is slightly faster.

n := argv[1] ? int(argv[1]) : 1;
x := 0;
z := build(n, "i");
forall (a in z)
    forall (b in z)
        forall (c in z)
            forall (d in z)
                forall (e in z)
                    forall (f in z)
                        ++x;
printf("%d\n", x);

The following is probably more natural.

n := argv[1] ? int(argv[1]) : 1;
x := 0;
for (a = n; a--; )
    for (b = n; b--; )
        for (c = n; c--; )
            for (d = n; d--; )
                for (e = n; e--; )
                    for (f = n; f--; )
                        ++x;
printf("%d\n", x);

Producer/consumer threads

Each language must implement this test in the same way. In this test two threads share the common data variable, which the producer uses to pass successive integers to the consumer. Access to the shared data variable is gated with a flag count.

This test illustrates the use of waitfor and wakeup(). Notice that waitfor has an expression that it waits to be true, and a second arbitrary object that is "waited" on. That is, if the expression is not true, it suspends execution until that object is "woken up", then it re-evaluates the expression. We have used the string "count change" as the object to wait on because it is clear, and, as strings are atomic, it will be the same string object wherever it is written. Everything in a waitfor statement is indivisible except the wait it does on the object. This includes the compound statement that it executes at completion when the condition is finally met.

Finally, note the method of waiting for each thread to finish. The thread object returned by thread() is woken up automatically when a the thread finishes. The status field of the thread object reveals its state.

static n = argv[1] ? int(argv[1]) : 1;
static count = 0;
static consumed = 0;
static produced = 0;
static data = 0;

static
producer()
{
    for (i := 1; i <= n; ++i)
    {
        waitfor (count == 0; "count change")
        {
            data = i;
            count = 1;
            wakeup("count change");
        }
        ++produced;
    }
}

static
consumer()
{
    do
    {
        waitfor (count != 0; "count change")
        {
            i = data;
            count = 0;
            wakeup("count change");
        }
        ++consumed;

    } while (i != n);
}

p := thread(producer);
c := thread(consumer);
waitfor (p.status != "active"; p)
    ;
waitfor (c.status != "active"; c)
    ;
printf("%d %d\n", produced, consumed);

Random number generator

Each language is required to implement this test the same way. The random number generator is exactly specified and is the same one used in the heapsort test above. The specification says we should use symbolic constants (and have maximum performance).

Notice the use of the $ pseudo-operator which we use to evaluate the symbolic names at parse-time (i.e. compile time). This gives the same run-time behaviour as if the numbers had been typed in directly, for a slight performance improvement. The $ operator can be used to evaluate any expression at "compile time". Like $sqrt(2.0).

static IM = 139968;
static IA = 3877;
static IC = 29573;
static last = 42;

static
gen_random(max)
{
    return max * (last := (last * $IA + $IC) % $IM) / $IM;
}

n = argv[1] ? int(argv[1]) : 1;
while (--n)
    gen_random(100.0);
printf("%.9f\n", gen_random(100.0));

Regular expression matching

Each language is required to implement this test the same way. A file of lines, some of which contain phone numbers, is loaded into an array of strings. Then they are repeatedly matched against a regular expression and the components of any phone number extracted. Matching numbers are printed in a normalised form on the last iteration.

Notice the use of gettokens() to read all the input as an array of lines. The functions gettokens() and gettoken() are one of the commonest and most efficient ways to read text files. The other is to read the entire file with getfile() and then break it up with smash(). This is also reasonably efficient as long as you don't mind reading the file all at once.

Notice both the operator ~~~ and the literal compiled regular expression enclosed in # characters. The ~~~ operator matches and extracts the matched sub-expressions. Notice that the regular expression was too long for one line, and so was broken in two. Both strings and regular expression literals can be broken up in this way (like string literals in C).

n := argv[1] ? int(argv[1]) : 1;
lines = gettokens(stdin, '\n', "");
j = 0;
while (n--)
{
    forall (l in lines)
    {
        a = l ~~~ #^[^\d(]*(?:\((\d\d\d)\)|(\d\d\d)) #
                  #(\d\d\d)[ -](\d\d\d\d)(?:\D|$)#;
        if (n == 0 && a)
            printf("%d: (%s%s) %s-%s\n", ++j, a[0], a[1],
                a[2], a[3]);
    }
}

Reverse a file

In this test, each language is required to do the same thing -- reverse lines from standard input to standard output.

Notice the use of smash() to break the whole file into an array of lines. The call to smash() repeatedly matches the regular expression against the input file. The string "\\&" instructs it to push each matched portion onto the new array it will return. The smash() function is often the first step in text parsing. Its tokenising ability is limited only by the complexity of the regular expression you need to write.

f = smash(getfile(), #[^\n]*\n#, "\\&");
while (nels(f))
    put(pop(f));

Below is an alternative version that is slightly faster because it avoids the many one line calls to put(). Instead it builds a new array with the elements in reverse, then uses implode() to concatenate all the strings in that array into a single string for output.

f = smash(getfile(), #[^\n]*\n#, "\\&");
r = array();
forall (l in f)
    rpush(r, l);
put(implode(r));

Spell checker

Each language is required to implement this test the same way. In short, load a dictionary of words, then read words, one per line, from stdin, and print the ones that aren't in the dictionary.

Notice the use of a set to store the words in. A set is simply an unordered collection (hash table) of objects where the only thing you are interested in is whether an object is in the set or not. Compared with, say, a struct where there is also an associated value.

dict := set();
forall (w in gettokens(fopen("Usr.Dict.Words"), "\n", ""))
    dict[w] = 1;

while (w = getline())
{
    if (!dict[w])
        printf("%s\n", w);
}

Statistical moments

For this test, each language is required to do the same thing. In short, we must read numbers from standard input, then compute a bunch of statistics on them in double precision.

ICI's "float" type is always double precision. Overall this code is unremarkable. Note the use of sort() to sort the list and find the median. Here it is used with a default comparison function, but an explicit comparison function can be given.

sum := 0.0;
nums := array();
forall (f in gettokens(stdin, "\n", ""))
{
    push(nums, f = float(f));
    sum += f;
}

n := nels(nums);
mean := sum / n;

deviation := 0.0;
average_deviation := 0.0;
standard_deviation := 0.0;
variance := 0.0;
skew := 0.0;
kurtosis := 0.0;

forall (num in nums)
{
    deviation = num - mean;
    average_deviation += abs(deviation);
    variance += (t := deviation * deviation);
    skew += (t *= deviation);
    kurtosis += (t *= deviation);
}
average_deviation /= n;
variance /= (n - 1);
standard_deviation = sqrt(variance);

if (variance > 0.0)
{
    skew /= n * variance * standard_deviation;
    kurtosis = kurtosis / (n * variance * variance) - 3.0;
}

sort(nums);
mid := n / 2;
if (n % 2 == 0)
    median = (nums[mid] + nums[mid - 1])/2;
else
    median = nums[mid];

printf("n:                  %d\n", n);
printf("median:             %f\n", median);
printf("mean:               %f\n", mean);
printf("average_deviation:  %f\n", average_deviation);
printf("standard_deviation: %f\n", standard_deviation);
printf("variance:           %f\n", variance);
printf("skew:               %f\n", skew);
printf("kurtosis:             %f\n", kurtosis);

String concatenation

Each language is required to implement this test the same way. In short, start with an empty string, then, n times, append the string "hello\n".

This illustrates the use of non-atomic (mutable) strings in ICI. Strings are almost invariably atomic (immutable) objects in ICI. Their use as variable names is based on this. However mutable strings can be created with the strbuf() function. These can be modified by assigning to individual characters and grown by appending, as is done here, with the strcat() function. (Note that a non-atomic string will not access the same element of a struct as an atomic string of the same value. They must be eq to access the same element, not just equal as in the == operator.)

n := argv[1] ? int(argv[1]) : 1;
s := strbuf();
for (i = 0; i < n; ++i)
    strcat(s, "hello\n");
printf("%d\n", nels(s));

Non-atomic (mutable) strings are important for the efficiency of operations like this. The following implementation of this test uses ordinary atomic strings. But this method will be O(n 2), and given this is normally run with 40,000 iterations, the performance will be very bad:

n := argv[1] ? int(argv[1]) : 1;
s := "";
for (i = 0; i < n; ++i)
    s += "hello\n";   /* Don't do this for large n */
printf("%d\n", nels(s));

In the above version, each time the += is done, a new string is formed.

Sum a column of integers

Each language is required to implement this test the same way. In short, use built-in line-oriented I/O to sum a column of integers in constant space.

The only aspect of note here is the int() function to convert the string to an integer. The float() and string() functions allow similar simple conversions.

count := 0;
while (l = getline())
    count += int(l);
printf("%d\n", count);

Word frequency count

For this test, each language is required to do the same thing. In short, from standard input, extract all the words, convert them to lowercase, and count their frequency. The program should run in constant space (i.e. not read the whole file at once). The output is lines of counts and words sorted in descending order.

This test shows that ICI has no built-in transliteration function.

The main loop is unremarkable. The smash() function is used to get words from each line. We only call our tolower() function when there are upper-case letters in the word for efficiency.

The tolower() function itself shows the commonest way of dealing with a string at the character level. That is, first of all explode() it into an array of integers, then manipulate the array or make a new one (which can be a mixture of integer character codes and strings) then implode() the array back into an atomic string. Notice the use of $ to evaluate the sub-expression once only when the statement is parsed.

The program finishes by pushing the output lines onto an array. The forall over the counts struct will produce output in (pseudo) random order. The array is then sorted with sort(). However the default comparison function won't do because the result must be descending. We don't bother to declare a separate named function for this, but just put an unnamed function literal in-line as an argument. We could have declared a named function in the normal manner and placed it's name as the second argument to sort() with the same effect.

static counts = struct()

static
tolower(s)
{
    s = explode(s);
    forall (c, i in s)
    {
        if (c >= 'A' && c <= 'Z')
            s[i] += $('a' - 'A');
    }
    return implode(s);
}

while (l = getline())
{
    forall (w in smash(l, #\w+#, "\\&"))
    {
        if (w ~ #[A-Z]#)
            w = tolower(w);
        if (counts[w] == NULL)
            counts[w] = 1;
        else
            ++counts[w];
    }
}

out = array();
forall (c, w in counts)
    push(out, sprintf("%7d\t%s\n", c, w));
sort(out, [func(a, b){return a > b ? -1 : a < b;}]);
put(implode(out));