CHAPTER 3 A brief tutorial for C programmers

This chapter is intended as a quick tutorial introduction to ICI for programmers familiar with C or C++. It does not dwell on formal definitions and exceptions. For precise definitions, see the next chapter: ICI Language Reference. Because ICI's syntax and flow control constructs are based on those of C, a C programmer has a particular advantage in learning to use ICI. This tutorial will take advantage of that and move quickly through areas that are unsurprising to C programmers.

This tutorial will also occasionally allude to how things work inside the interpreter as, to a programmer, this can aid comprehension and give an idea of the implications of using certain constructs.

Variables and arithmetic

Because ICI is a dynamically typed language, the nature of a variable is of course different from that of a C variable. But for typical arithmetic the differences are invisible. All ICI variables refer to storage that records both the type and data of the variable's current value. Thus we can say:

x = 1;

which makes x refer to the integer one. Then

x = 1.0;

which updates x to refer to the float one. Then

x = "one";

which updates x to refer to the string "one".

As a C programmer, you can consider all ICI variables to be typeless pointers to objects that record both type and value. But because ICI's built-in operators know this is the case, they read and generate the pointed-to values automatically. Thus ordinary arithmetic is unsurprising:

fahr = 100.0;
celsius = (5.0 / 9.0) * (fahr - 32.0);
printf("%g deg. F is %g deg. C\n", fahr, celsius);

works as one would expect. So most of the time you don't need to consider this at all. All objects in ICI are subject to automatic garbage collection, so no explicit freeing is required.
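For readers who think in C, the "typeless pointer to a typed object" picture can be sketched as below. This is only an illustration of the idea, not ICI's actual implementation: the names obj_type, object, variable, type_name and show are all invented for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical type tags; ICI's real object headers are not shown here. */
enum obj_type { T_INT, T_FLOAT, T_STRING };

struct object {
    enum obj_type type;          /* what kind of value this is */
    union {
        long        i;
        double      f;
        const char *s;
    } u;                         /* the value itself */
};

/* A "variable" is just a pointer that can be re-aimed at any object. */
typedef const struct object *variable;

const struct object int_one    = { T_INT,    { .i = 1 } };
const struct object float_one  = { T_FLOAT,  { .f = 1.0 } };
const struct object string_one = { T_STRING, { .s = "one" } };

/* Return the name of the type of whatever the variable refers to. */
const char *type_name(variable v)
{
    assert(v != NULL);
    switch (v->type) {
    case T_INT:    return "int";
    case T_FLOAT:  return "float";
    case T_STRING: return "string";
    }
    return "unknown";
}

/* Print a value by consulting its tag, as a built-in operator would. */
void show(variable v)
{
    assert(v != NULL);
    switch (v->type) {
    case T_INT:    printf("%ld\n", v->u.i); break;
    case T_FLOAT:  printf("%g\n", v->u.f);  break;
    case T_STRING: printf("%s\n", v->u.s);  break;
    }
}
```

Assigning x = 1, then x = 1.0, then x = "one" corresponds here to pointing a single variable at int_one, float_one and string_one in turn. The variable itself never changes type, because it has none.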

Because ICI variables are dynamically typed, you don't need to declare them. But ICI supports hierarchical modularisation and it is often desirable to declare at what scope a variable lives. Thus we have:

extern xxx; /* xxx is visible to other files. */
static sss; /* sss is visible in this file. */
auto aaa;    /* aaa is visible in the local scope. */

The word static is used in the C sense of the value being persistent. This variable will exist with persistent value as long as functions in this module are still callable. Extern variables are also persistent; they just have more global scope. Consider:

static
func(arg)
{
    auto  local;

    local = 10;
    for (i = 0; i < local; ++i)
        printf("%d\n", i * arg);
}

This function (which is declared static) has an auto variable. Auto variables are, as in C, the variables that spring into existence (on the stack) for the duration of a single execution of a function. The function also uses the variable i. If an undeclared variable is assigned to, it is implicitly declared auto. That can be dangerous in large programs, where a variable of the same name may already exist at a more global scope. So, as a style rule, the names of implicit autos are normally kept to one or two characters, and the names of more global variables should be longer.

Auto variables, and their implicit declaration, also work at the file level, where they have similar semantics (in a sense). While the file is being parsed, they exist. But they evaporate afterwards. They are not visible to functions defined within the file. We used implicit auto variables in our Fahrenheit to Celsius conversion above.

Lexicon, syntax and flow control statements

ICI's lexicon is (basically) the same as C's. Same tokens, comments (including //) and literal data values. Sorry, no preprocessor.

ICI's syntax is, wherever possible, the same as C's. Naturally differences arise due to the different nature of the environment, as we have seen above.

As we have seen, expressions are as in C. There are of course additional data types, literals, and operators, but these build from the initial C compatible set.

The flow control constructs if-else, while, for, do-while, switch (including case and default), continue, break and return all have the same basic syntax and semantics as C. But there is no goto.

In addition to these classic C statement forms, ICI has forall, try-onerror, waitfor, and critsect. But before considering these, we will look at aggregate data types and the nature of objects, which is the one aspect a C programmer needs to understand before writing effective ICI code.

Aggregate data types and the nature of objects

ICI supports a number of "aggregate" data types. Principally:

array: Simple ordered collections of values that can be indexed by integers. The first item is at index 0. They can be efficiently pushed and popped at either end.
struct: Mappings from an index (any object) to a value. Also known as associative arrays, dictionaries, maps, hashes, etc. in other languages. Adding entries, lookup and deletion are all efficient operations irrespective of the complexity of the objects involved.
set: Simple unordered collections of values.

Each of these holds a collection of references to other objects. There is a significant distinction between these aggregate types and the simple types such as int, float, and string. These simple types have no modifiable internal structure. They are read-only. In fact, when an object of one of these types is required (say as the result of some arithmetic operation) it is looked for in a hash table of all such objects, and the entry found there is used. It is created and added if it does not exist.

Thus we can see that all strings "xyz" in an ICI program are just pointers to the same single object in memory. The same is true for integers (which are 32 bit signed values) and floats (which are double precision values). An object that has been resolved to its single unique (read-only) version is said to be atomic.1
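The "look it up in a hash table, create it if missing" step can be sketched in C for strings as follows. This is a minimal illustration of the general technique (often called interning), not ICI's own code. The function name atom, the fixed table size and the hash function are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 64            /* fixed size keeps the sketch short */

static char *table[TABLE_SIZE];  /* open-addressed table of unique strings */
static size_t used;              /* number of occupied slots */

static size_t hash_string(const char *s)
{
    size_t h = 5381;

    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/*
 * Return the single shared copy of s, creating it on first use.
 * Returns NULL if the table is full or memory runs out.
 */
const char *atom(const char *s)
{
    size_t i, len;

    assert(s != NULL);
    i = hash_string(s) % TABLE_SIZE;
    while (table[i] != NULL) {
        if (strcmp(table[i], s) == 0)
            return table[i];         /* already known: reuse it */
        i = (i + 1) % TABLE_SIZE;    /* linear probing */
    }
    if (used + 1 >= TABLE_SIZE)
        return NULL;                 /* keep one slot free so probing ends */
    len = strlen(s) + 1;
    table[i] = malloc(len);
    if (table[i] == NULL)
        return NULL;
    memcpy(table[i], s, len);
    ++used;
    return table[i];
}
```

Once every string has been passed through atom(), two strings are equal exactly when their pointers are equal, so later comparisons and hash lookups need only look at the pointer.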

Aggregates, on the other hand, are internally modifiable in-place.

In ICI, "indexing" an aggregate is the most primitive way of accessing internal elements. But we use the term indexing in a more general sense than simple array indexing. For example, array indexing is unsurprising, so:

a[0] = 3;

sets the first element of an array a to 3. With a struct s we might say:

s.value = 3;

which sets the value field of the struct to 3. But this is just an "indexing" operation on the struct. In fact it is just a syntactic variation on:

s["value"] = 3;

Arrays, structs and sets are all objects that support indexing to refer to internal values (i.e. object references) for read or write. Each varies only in how they are structured internally, and how they interpret the "key", or index, applied to them.

Arrays are growable circular buffers of object references that can only be indexed by integers, which are interpreted as an offset from the first element.
Structs are hash tables that map one object reference to another. The index reference itself is the basis for indexing, not the details of the index object (that is, the indexing operation only looks at the index as a pointer, not at what it points to). But because ints, strings, floats, etc are already resolved to unique pointers based on their values, this behaviour is indistinguishable from full value hashing and comparison for simple (atomic) types.
Sets are hash tables that merely record the presence or absence of an object, in the same manner as structs. They have no associated value, although indexing a set yields an "implicit" value of 1 if the object is in the set.
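To make "growable circular buffer" concrete, here is a small C sketch of such a buffer holding object references. It shows why adding and removing at either end is cheap: only the head position or the length changes. The struct layout and the ring_* names are invented for illustration and are not taken from the ICI sources.

```c
#include <assert.h>
#include <stdlib.h>

struct ring {
    void  **slots;   /* object references */
    size_t  cap;     /* allocated slots */
    size_t  head;    /* position of element 0 */
    size_t  len;     /* number of elements in use */
};

/* Double the capacity, copying elements out in logical order. */
static int ring_grow(struct ring *r)
{
    size_t ncap = r->cap ? r->cap * 2 : 4, i;
    void **n = malloc(ncap * sizeof *n);

    if (n == NULL)
        return 0;
    for (i = 0; i < r->len; ++i)
        n[i] = r->slots[(r->head + i) % r->cap];
    free(r->slots);
    r->slots = n;
    r->cap = ncap;
    r->head = 0;
    return 1;
}

/* Add at the end; returns 0 on allocation failure. */
int ring_push_back(struct ring *r, void *v)
{
    assert(r != NULL);
    if (r->len == r->cap && !ring_grow(r))
        return 0;
    r->slots[(r->head + r->len) % r->cap] = v;
    ++r->len;
    return 1;
}

/* Add at the start; the head steps backwards, wrapping round. */
int ring_push_front(struct ring *r, void *v)
{
    assert(r != NULL);
    if (r->len == r->cap && !ring_grow(r))
        return 0;
    r->head = (r->head + r->cap - 1) % r->cap;
    r->slots[r->head] = v;
    ++r->len;
    return 1;
}

/* Remove from the end; NULL if empty. */
void *ring_pop_back(struct ring *r)
{
    assert(r != NULL);
    if (r->len == 0)
        return NULL;
    --r->len;
    return r->slots[(r->head + r->len) % r->cap];
}

/* Remove from the start; NULL if empty. */
void *ring_pop_front(struct ring *r)
{
    void *v;

    assert(r != NULL);
    if (r->len == 0)
        return NULL;
    v = r->slots[r->head];
    r->head = (r->head + 1) % r->cap;
    --r->len;
    return v;
}

/* Index as an offset from the first element; NULL when out of range. */
void *ring_get(const struct ring *r, size_t i)
{
    assert(r != NULL);
    return i < r->len ? r->slots[(r->head + i) % r->cap] : NULL;
}
```

A ring starts out as struct ring r = { NULL, 0, 0, 0 } and allocates on first use. Returning NULL for an out-of-range index in ring_get is a choice made for this sketch.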

Arrays, structs and sets all return the special object NULL if the key is not in their current domain.

Making and manipulating aggregates

The simplest ways to make aggregates are the functions array(), set() and struct(). For example:

a = array(1, 2.5, "hello");
b = set("bye", 5.5, 9);
c = struct("a", 12, "b", 13);

The struct() function interprets its arguments pair-wise as key-value pairs. If, after executing the above code, we do:

printf("a[2] = %s\n", a[2]);
if (b[9])
    printf("The set b contains 9.\n");
printf("c.a = %d\n", c.a);

we will see:

a[2] = hello
The set b contains 9.
c.a = 12

It is equally common to see these functions used to make empty aggregates that are then added to through further code. For example:

things = array();
while ((thing = get_next_thing()) != NULL)
    push(things, thing);

Or:

node = struct();
node.name = name;
node.left = a;
node.right = b;

Literal data items

ICI supports in-line literal aggregates. These are like an initialised structure in C but, instead of being tied to a variable declaration, they are self-describing and can be used anywhere. For example:

[array 1, 2, 3]

is a term in an expression. Just like a literal string in C:

"Hello world.\n"

the compiler builds the data structure in memory somewhere and the term evaluates to a reference to it. Examples of array, set, and struct literals are:

a = [array 1, 2.5, "hello"];
b = [set "bye", 5.5, 9];
c = [struct a = 12, b = 13];

Arrays and sets have syntax almost identical to the run-time functions that create the same types. But structs have a more convenient syntax for the commonest activity: associating values with named keys.

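Be careful not to confuse literals with the run-time functions of the same name. Confusion often arises because at the file level, where a statement is parsed and then immediately executed, there isn't much effective difference. But in a loop or function there is a very big difference: a literal is built once, when it is parsed, and the term refers to that same object every time it is evaluated, whereas the function builds a new object each time it is called.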
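A loose C analogy, assuming the behaviour described above (a literal is built once and the term evaluates to a reference to it), is the difference between returning a pointer to static storage and allocating afresh on each call. The function names literal_like and function_like are invented for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Analogue of a literal: the storage is built once, and every call
 * returns a reference to that same storage.
 */
int *literal_like(void)
{
    static int data[3] = { 1, 2, 3 };

    return data;
}

/*
 * Analogue of the run-time function: each call builds a fresh object.
 * The caller owns the result (C has no garbage collector), and it is
 * NULL if allocation fails.
 */
int *function_like(void)
{
    int *data = malloc(3 * sizeof *data);

    if (data != NULL) {
        data[0] = 1;
        data[1] = 2;
        data[2] = 3;
    }
    return data;
}
```

A change made through the pointer from literal_like() is still there on the next call. A change made to one result of function_like() never affects another.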

Other operations and core functions

As with all dynamically typed interpreted languages, execution speed is very different from that of fully compiled statically typed languages. Achieving useful performance relies heavily on the use of operations and functions that perform the "inner loops" of a program, but are fully compiled and carefully optimised.

ICI is no exception to this principle. So it is wise to be aware of the full repertoire of operations, core functions, and extension modules available. However, in this brief tutorial we won't attempt to enumerate all such features. They are listed in subsequent chapters, and a skim through Operators in the ICI Language Reference chapter, the Core language functions chapter, and Some extension modules is recommended. Having said that, a few of the commonest non-C features and idioms are worth illustrating here.

Regular expressions

Regular expressions are "simple" (atomic) types in ICI, just like ints, floats and strings. A literal regular expression is delimited by # characters (like a string is delimited by " characters). For example:

while ((line = getline()) != NULL)
{
    if (line ~ #^abc#)
        printf("%s\n", line);
}

will print all lines starting with abc. The ~ operator is read as "matches" and !~ is read as "doesn't match". Other operators exist which extract sub-matches. Regular expressions can be very useful for avoiding character-by-character operations on text. They are a very efficient way of matching and breaking up text.

For example, one of my first resorts in dealing with some new regular text file is to load the entire file, use a function called smash() to break it up into lexical units based on regular expressions, then rearrange the result into the data I want. Consider doing this to load a "CSV" file (Comma Separated Values: each line is a series of comma-separated fields, each field optionally surrounded by double quotes).

/*
 * Smash the file into fields and separators. Each
 * separator is either a "," or a "\n". Fields are
 * either plain or quoted, but the quotes
 * are removed. Notice the regular expression is
 * broken into two parts for clarity.
 */
csv = smash
(
    getfile(f),                        /* The file. */
    #(([^,"\n]*)|"([^"\n]*)")#         /* ... or "..." */
        #([,\n])#,                     /* then , or \n */
    "\\2\\3",                          /* For each.. */
    "\\4"                              /* ..push these*/
);
/*
 * Re-build the linear array into an array of arrays
 * based on the "\n" separators.
 */
a = array(aa = array());
while (nels(csv) > 0)
{
    push(aa, rpop(csv));
    if (rpop(csv) == "\n")
        push(a, (aa = array()));
}
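The regrouping step in the second half of this example (turning a flat field, separator, field, separator... sequence into rows) can be restated in C as below. This is a sketch of the same logic only: it uses fixed-size storage, and the names table and regroup and the MAX_* limits are invented for the illustration. Like the loop above, it leaves an empty final row when the input ends with a "\n" separator.

```c
#include <assert.h>
#include <string.h>

#define MAX_ROWS   16
#define MAX_FIELDS 16

struct table {
    const char *cell[MAX_ROWS][MAX_FIELDS];
    size_t      nfields[MAX_ROWS];
    size_t      nrows;   /* rows started so far */
};

/*
 * tokens alternates field, separator, field, separator, ...
 * A "\n" separator closes the current row and starts a new one.
 * Returns 0 if the fixed limits are exceeded, 1 otherwise.
 */
int regroup(const char **tokens, size_t ntokens, struct table *t)
{
    size_t i;

    assert(tokens != NULL && t != NULL);
    *t = (struct table){ 0 };
    t->nrows = 1;                        /* start with one empty row */
    for (i = 0; i + 1 < ntokens; i += 2) {
        size_t row = t->nrows - 1;

        if (t->nfields[row] == MAX_FIELDS)
            return 0;
        t->cell[row][t->nfields[row]++] = tokens[i];
        if (strcmp(tokens[i + 1], "\n") == 0) {
            if (t->nrows == MAX_ROWS)
                return 0;
            ++t->nrows;                  /* begin the next row */
        }
    }
    return 1;
}
```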

1. This type of mechanism is typical for dynamically typed interpretive languages such as ICI, although it is less common to apply it uniformly to all data types of the language, even numbers.