# Yet another Forth structures package

Many ways to add a feature like C's `struct` or Pascal's `RECORD` have been presented and discussed in the Forth community. One of them was posted on the USENET newsgroup comp.lang.forth in 1989 (unfortunately I don't remember, by whom; possibly John Hayes) and convinced me with its simplicity, elegance, and power (in a word, with its Forth-ness).

I have used this basic approach ever since, e.g, in the parser generator Gray [ertl89][ertl97]. It also inspired my approach for an object-oriented Forth extension. The package I present here adds automatic handling of alignments, a bit of syntactic sugar, and optimization of fields with offset 0.

## Why explicit structure support?

If we want to use a structure containing several fields, we could simply reserve memory for it, and access the fields using address arithmetic. As an example, consider a structure with the following fields
a
is a float
b
is a cell
c
is a float
Given the (float-aligned) base address of the structure we get the address of the field
a
without doing anything further.
b
with `float+`
c
with `float+ cell+ faligned`
It is easy to see that this can become quite tiring.

Moreover, it is not very readable, because seeing a `cell+` tells us neither which kind of structure is accessed nor what field is accessed; we have to somehow infer the kind of structure, and then look up in the documentation, which field of that structure corresponds to that offset.

Finally, this kind of address arithmetic also causes maintenance troubles: If you add or delete a field somewhere in the middle of the structure, you have to find and change all computations for the fields afterwards.

So, instead of using `cell+` and friends directly, how about storing the offsets in constants:

```0 constant a-offset
0 float+ constant b-offset
0 float+ cell+ faligned constant c-offset
```
Now we can get the address of field `x` with ```x-offset +```. This is much better in all respects. Of course, you still have to change all later offset definitions if you add a field. You can fix this by declaring the offsets in the following way:
```0 constant a-offset
a-offset float+ constant b-offset
b-offset cell+ faligned constant c-offset
```
Since we always use the offsets with `+`, using a defining word `cfield` that includes the `+` in the action of the defined word offers itself:
```: cfield ( n "name" -- )
create ,
@ + ;

0 cfield a
0 a float+ cfield b
0 b cell+ faligned cfield c
```
Instead of `x-offset +`, we now simply write `x`.

The structure field words now can be used quite nicely. However, their definition is still a bit cumbersome: We have to repeat the name, the information about size and alignment is distributed before and after the field definitions etc. The structure package presented here addresses these problems.

## Usage

You can define a structure for a (data-less) linked list with
```struct
cell% field list-next
end-struct list%
```
With the address of the list node on the stack, you can compute the address of the field that contains the address of the next node with `list-next`. E.g., you can determine the length of a list with:
```: list-length ( list -- n )
\ "list" is a pointer to the first element of a linked list
\ "n" is the length of the list
0 begin ( list1 n1 )
over
while ( list1 n1 )
1+ swap list-next @ swap
repeat
nip ;
```

You can reserve memory for a list node in the dictionary with `list% %allot`, which leaves the address of the list node on the stack. For the equivalent allocation on the heap you can use `list% %alloc` (or, for an `allocate`-like stack effect (i.e., with ior), use `list% %allocate`).

Note that in ANS Forth the body of a `create`d word is `aligned` but not necessarily `faligned`; therefore, if you do a

```create name foo% %allot
```
then the memory alloted for `foo%` is guaranteed to start at the body of `name` only if `foo%` contains only character, cell and double fields.

You can also include a structure `foo%` as field of another structure, with:

```struct
...
foo% field ...
...
end-struct ...
```
Instead of starting with an empty structure, you can also extend an existing structure. E.g., a plain linked list without data, as defined above, is hardly useful; You can extend it to a linked list of integers, like this:
```list%
cell% field intlist-int
end-struct intlist%
```
`intlist%` is a structure with two fields: `list-next` and `intlist-int`.

This feature is also known as extended records [wirth88]. It is the main innovation in the Oberon language; in other words, adding this feature to Modula-2 led Wirth to create a new language, write a new compiler etc. Adding this feature to Forth just requires a few lines of code.

You can specify an array type containing n elements of type `foo%` like this:

```foo% n *
```
You can use this array type in any place where you can use a normal type, e.g., when defining a `field`, or with `%allot`.

The first field is at the base address of a structure and the word for this field (e.g., `list-next`) actually does not change the address on the stack. You may be tempted to leave it away in the interest of run-time and space efficiency. This is not necessary, because the structure package optimizes this case and compiling such words does not generate any code. So, in the interest of readability and maintainability you should include the word for the field when accessing the field.

## Naming convention

The field names that come to (my) mind are often quite generic, and, if used, would cause frequent name clashes. E.g., many structures probably contain a `counter` field. The structure names that come to (my) mind are often also the logical choice for the names of words that create such a structure.

Therefore, I have adopted the following naming conventions:

• The names of fields are of the form `struct-field`, where `struct` is the basic name of the structure, and `field` is the basic name of the field. You can think about field words as converting converts the (address of the) structure into the (address of the) field.
• The names of structures are of the form `struct%`, where `struct` is the basic name of the structure.
This naming convention does not work that well for fields of extended structures; e.g., the integer list structure has a field `intlist-int`, but has `list-next`, not `intlist-next`.

## Implementation

The central idea in the implementation is to pass the data about the structure being built on the stack, not in some global variable. Everything else falls into place naturally once this design decision is made.

The type description on the stack is of the form align size. Keeping the size on the top-of-stack makes dealing with arrays very simple.

`field` is a defining word that uses `create` and `does>`. The body of the field contains the offset of the field, and the normal `does>` action is

```@ +
```
i.e., add the offset to the address, giving the stack effect `addr1 -- addr2` for a field.

This simple structure is slightly complicated by the optimization for fields with offset 0, which requires a different `does>`-part (because we cannot rely on there being something on the stack if such a field is invoked during compilation). Therefore, we put the different `does>`-parts in separate words, and decide which one to invoke based on the offset. For a zero offset, the field is basically a noop; it is immediate, and therefore no code is generated when it is compiled.

## Glossary

`%align ( align size -- )`
align the data space pointer to the alignment `align`.
`%alignment ( align size -- align )`
extract the alignment of a type descriptor.
`%alloc ( align size -- addr )`
allocate `size` address units with alignment `align`, giving a data block at `addr`; `throw`s an ior code if not successful.
`%allocate ( align size -- addr ior )`
allocate `size` address units with alignment `align`, similar to `allocate`.
`%allot ( align size -- addr )`
allot `size` address units of data space with alignment `align`; the resulting block of data is found at `addr`.
`cell% ( -- align size )`
`char% ( -- align size )`
`create-field ( align1 offset1 align size "name" -- align2 offset2 )`
`name` execution: `-- addr`

like `field`, but without the `does>`-part.

`dfloat% ( -- align size )`
`double% ( -- align size )`
`end-struct ( align size "name" -- )`
`name` execution: `-- align size2`

`size2` is `size` aligned with `align`; this ensures that all elements of an array of `name` elements have alignment `align`.

`field ( align1 offset1 align size "name" -- align2 offset2 )`
`name` execution: `addr1 -- addr1+offset1`

create a field `name` with offset `offset1`, and the type given by `size align`. `offset2` is the offset of the next field, and `align2` is the alignment of all fields.

`float% ( -- align size )`
`nalign ( addr1 n -- addr2 )`
`addr2` is the aligned version of `addr1` wrt the alignment `n`.
`%size ( align size -- size )`
extract the size of a type descriptor.
`sfloat% ( -- align size )`
`struct ( -- align size )`
an empty structure, used to start a structure definition.

## References

[ertl89] M. Anton Ertl.
GRAY - ein Generator f\"ur rekursiv absteigende Ybersetzer. Praktikum, Institut f\"ur Computersprachen, Technische Universit\"at Wien, 1989. In German.
[ertl97] M. Anton Ertl.
GRAY - ein Generator f\"ur rekursiv absteigende Ybersetzer. In Forth-Tagung, Ludwigshafen, 1997. In German.
[wirth88] Niklaus Wirth.
From Modula to Oberon. Software--Practice and Experience, 18 no. 7 pp. 661-670, July 1988.