struct
or Pascal's
RECORD
have been presented and discussed in the Forth
community. One of them was posted on the
USENET newsgroup comp.lang.forth in
1989 (unfortunately I don't remember, by whom; possibly John Hayes)
and convinced me with its simplicity, elegance, and power (in a word,
with its Forth-ness).
I have used this basic approach ever since, e.g, in the parser generator Gray [ertl89][ertl97]. It also inspired my approach for an object-oriented Forth extension. The package I present here adds automatic handling of alignments, a bit of syntactic sugar, and optimization of fields with offset 0.
float+
float+ cell+ faligned
Moreover, it is not very readable, because seeing a
cell+
tells us neither which kind of structure is
accessed nor what field is accessed; we have to somehow infer the kind
of structure, and then look up in the documentation, which field of
that structure corresponds to that offset.
Finally, this kind of address arithmetic also causes maintenance troubles: If you add or delete a field somewhere in the middle of the structure, you have to find and change all computations for the fields afterwards.
So, instead of using cell+
and friends directly, how
about storing the offsets in constants:
0 constant a-offset 0 float+ constant b-offset 0 float+ cell+ faligned constant c-offsetNow we can get the address of field
x
with x-offset
+
. This is much better in all respects. Of course, you still
have to change all later offset definitions if you add a field. You can
fix this by declaring the offsets in the following way:
0 constant a-offset a-offset float+ constant b-offset b-offset cell+ faligned constant c-offsetSince we always use the offsets with
+
, using a defining
word cfield
that includes the +
in the
action of the defined word offers itself:
: cfield ( n "name" -- ) create , does> ( name execution: addr1 -- addr2 ) @ + ; 0 cfield a 0 a float+ cfield b 0 b cell+ faligned cfield cInstead of
x-offset +
, we now simply write x
.
The structure field words now can be used quite nicely. However, their definition is still a bit cumbersome: We have to repeat the name, the information about size and alignment is distributed before and after the field definitions etc. The structure package presented here addresses these problems.
struct cell% field list-next end-struct list%With the address of the list node on the stack, you can compute the address of the field that contains the address of the next node with
list-next
. E.g., you can determine the length of a list
with:
: list-length ( list -- n ) \ "list" is a pointer to the first element of a linked list \ "n" is the length of the list 0 begin ( list1 n1 ) over while ( list1 n1 ) 1+ swap list-next @ swap repeat nip ;
You can reserve memory for a list node in the dictionary with
list% %allot
, which leaves the address of the list node
on the stack. For the equivalent allocation on the heap you can use
list% %alloc
(or, for an allocate
-like stack
effect (i.e., with ior), use list% %allocate
).
Note that in ANS Forth the body of a create
d word is
aligned
but not necessarily faligned
;
therefore, if you do a
create name foo% %allotthen the memory alloted for
foo%
is
guaranteed to start at the body of name
only if
foo%
contains only character, cell and double fields.
You can also include a structure foo%
as field of
another structure, with:
struct ... foo% field ... ... end-struct ...Instead of starting with an empty structure, you can also extend an existing structure. E.g., a plain linked list without data, as defined above, is hardly useful; You can extend it to a linked list of integers, like this:
list% cell% field intlist-int end-struct intlist%
intlist%
is a structure with two fields:
list-next
and intlist-int
.
This feature is also known as extended records [wirth88]. It is the main innovation in the Oberon language; in other words, adding this feature to Modula-2 led Wirth to create a new language, write a new compiler etc. Adding this feature to Forth just requires a few lines of code.
You can specify an array type containing n elements of
type foo%
like this:
foo% n *You can use this array type in any place where you can use a normal type, e.g., when defining a
field
, or with
%allot
.
The first field is at the base address of a structure and the word
for this field (e.g., list-next
) actually does not change
the address on the stack. You may be tempted to leave it away in the
interest of run-time and space efficiency. This is not necessary,
because the structure package optimizes this case and compiling such
words does not generate any code. So, in the interest of readability
and maintainability you should include the word for the field when
accessing the field.
counter
field. The structure names
that come to (my) mind are often also the logical choice for the names
of words that create such a structure.
Therefore, I have adopted the following naming conventions:
struct-field
, where
struct
is the basic name of the structure, and
field
is the basic name of the field. You can
think about field words as converting converts the (address of the)
structure into the (address of the) field.
struct%
, where
struct
is the basic name of the structure.
intlist-int
, but has list-next
, not
intlist-next
.
The type description on the stack is of the form align size. Keeping the size on the top-of-stack makes dealing with arrays very simple.
field
is a defining word that uses create
and does>
. The body of the field contains the offset
of the field, and the normal does>
action is
@ +i.e., add the offset to the address, giving the stack effect
addr1 -- addr2
for a field.
This simple structure is slightly complicated by the optimization
for fields with offset 0, which requires a different
does>
-part (because we cannot rely on there being
something on the stack if such a field is invoked during
compilation). Therefore, we put the different does>
-parts
in separate words, and decide which one to invoke based on the
offset. For a zero offset, the field is basically a noop; it is
immediate, and therefore no code is generated when it is compiled.
%align ( align size -- )
align
.
%alignment ( align size -- align )
%alloc ( align size -- addr )
size
address units with alignment
align
, giving a data block at addr
;
throw
s an ior code if not successful.
%allocate ( align size -- addr ior )
size
address units with alignment align
, similar to allocate
.
%allot ( align size -- addr )
size
address units of data space with alignment
align
; the resulting block of data is found at addr
.
cell% ( -- align size )
char% ( -- align size )
create-field ( align1 offset1 align size "name" -- align2 offset2 )
name
execution: -- addr
like field
, but without the does>
-part.
dfloat% ( -- align size )
double% ( -- align size )
end-struct ( align size "name" -- )
name
execution: -- align size2
size2
is size
aligned with
align
; this ensures that all elements of an array of
name
elements have alignment align
.
field ( align1 offset1 align size "name" -- align2 offset2 )
name
execution: addr1 -- addr1+offset1
create a field name
with offset offset1
,
and the type given by size align
. offset2
is
the offset of the next field, and align2
is the alignment
of all fields.
float% ( -- align size )
nalign ( addr1 n -- addr2 )
addr2
is the aligned version of addr1
wrt the alignment n
.
%size ( align size -- size )
sfloat% ( -- align size )
struct ( -- align size )