-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME
293 lines (233 loc) · 12.9 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
Copyright 2012, Robert Bieber
col is a simple programming language, inspired by John Backus' FP language,
although not strictly adhering to it. Currently the interpreter only supports
file execution, a full REPL environment will come later.
-------------------
BUILDING COL
-------------------
To build the col interpreter, extract the project files into a directory and
cd to it. Then run the following commands:
mkdir build
cd build
cmake ..
make
You can install col with the command:
make install
If you modify any of the primitive functions or functional forms in
primitives.h/c and forms.h/c, you will also need to run the script
utils/gendoc.py
Be sure to execute this script from the project root directory. It will update
auto-generated function pointer tables in the src/gen directory as well as the
PRIMITIVE_FUNCTIONS and FUNCTIONAL_FORMS documentation files in the root
directory.
-------------------
RUNNING COL
-------------------
To execute a file, simply run
colint [-v] <source file> [optional command line arguments]
and the interpreter will load the code in program.col and execute its main
function. The main function is called with any command-line arguments passed
to it as a sequence of strings (see language reference for details). If the -v
flag is passed, the interpreter will print debugging information before and
after running the program.
-------------------
LANGUAGE REFERENCE
-------------------
1 - Constants and Identifiers
Allowable constants in col are numbers, characters, strings, bottom,
true or false, and sequences.
A constant number is any integer or floating point number, with an optional
+/- in front of it. Numbers may also be specified in scientific notation.
Examples of valid numerical constants:
5
5.2
+5.2
-5.2
0.2
.2
5.2e12
5.2E12
A character is any character enclosed in single quote marks. Alternatively,
some two-character escape sequences are allowed using the backslash. Valid
escape sequences are listed below:
\\ - Backslash
\' - Single quote
\n - Newline
\t - Tab character
A string is any series of characters enclosed in double quote marks. Strings
also support two-character escape sequences using the backslash. All of the
escape sequences used in characters are supported with the exception of the
single quote, which need not be escaped in a string. Instead, the double
quote may be escaped in strings. Valid escape sequences are listed below:
\\ - Backslash
\" - Double quote
\n - Newline
\t - Tab character
Bottom represents an undefined value. Sequences and functions in col are
bottom-preserving. This means that any sequence which contains bottom as an
element is equivalent to bottom, and any function that accepts bottom as an
argument will return bottom as its result. Bottom is simply written in
lowercase text as follows:
bottom
Logical true and false are simply written in lowercase text, as follows:
true
false
A sequence is any series of valid constants surrounded by angle brackets and
separated by commas. A sequence may be any length, and its elements need not
be of the same type. Examples of valid sequence constants:
<1, 2, 3>
<"Sequence", "of", "strings">
<"Nested", <"Sequence", true>>
<>
An identifier is any combination of lower or uppercase letters, numbers, and
the symbols +, -, *, /, !, _, and -, with some caveats. An identifier must
not be of a form that could be read as a number: any text which is a valid
numerical constant will be read as one, and never as an identifier.
Identifiers also cannot use the reserved words bottom, true, or false, and
using the name of a built-in function or functional form as an identifier will
cause an error. Examples of valid identifiers:
my-function
my_function
1+
!
25!
In the interest of readability, it is advisable to avoid using identifiers
that might be too easily mistaken for numbers, but any combination of the
allowable characters that cannot be read as a number will be read as a valid
identifier.
2 - Comments
The # symbol marks the beginning of a comment. The lexer will disregard any
text between the # and the following newline.
3 - Data Types
Types are never written explicitly in a col program, but col is a strongly
typed language, and primitive functions will normally only behave correctly
when operating on a limited number of types. To represent the listed types as
constants, see section 1.
Integers: An integer is a positive or negative whole number. They are
represented internally using C's int type, which will define the limitations
on their size.
Floating Point Numbers: Floating point numbers are represented internally
using C's double type, with corresponding limitations. When reading numeric
constants, the interpreter will regard any number with a decimal point or
using scientific notation as floating point.
Characters: ASCII characters, represented internally by C's char type.
Strings: Any series of characters. Note that strings are not equivalent to
sequences of characters.
Boolean Values: Either true or false. Note that false is not equivalent to
bottom.
Bottom: Bottom represents an undefined value. All functions and sequences in
col are bottom-preserving, meaning that any function which receives bottom as
an argument will return it as a result, and any sequence containing bottom is
equivalent to bottom. Bottom is commonly returned as a result of failed
computation.
Sequence: A sequence is an ordered collection of values. The elements of a
sequence may be of any type, including sequences. The elements may also be
heterogeneous. The empty sequence must always be written "<>", there is no
special symbol for it, and it is distinct from logical false and bottom.
4 - Functions
Every function in col accepts a single argument and produces a single return
value. col functions, with the exception of I/O operations, cannot (at
present) have any side-effects. Aside from I/O, every col function will
always maintain referential transparency.
col functions are always defined in terms of other functions, combined using
functional forms (defined below). The language provides a set of primitive
functions, and more complex functions may be constructed from them.
col functions do not have type signatures. A function may accept any value as
an argument, and some primitive functions may accept many possible input types
and return multiple corresponding output types. If a function receives as
input a value that it cannot logically handle, it will return bottom.
Functions are normally represented simply by their identifier, but some
primitive functions may also accept arguments at load-time (think of these as
similar to templates in C++ or generics in Java) which alter their behavior.
These are called specializers, and they must be constant values. As an
example, consider the function const, which always returns the same value
regardless of its input (unless its input is bottom, of course). Because
including a separate const function for every possible constant value would be
impractical, the const function accepts a specializer which specifies the
constant that it should return. For instance, the primitive function written
const(15)
will always return the value 15, regardless of its input. At this time, only
primitive functions can accept specializers.
5 - Functional Forms
Functional forms are the tools with which functions are combined to create
more complex functions. A functional form accepts one or more functions as
arguments, and creates a function that differs from the original(s) in some
useful way. Take, for instance, the compose functional form. It accepts
two or more functions and creates a new function which accepts an argument,
feeds it to the last function in the argument list, feeds that function's
result to the next-to-last function in the list as its argument, and so on
until it reaches the first function in the list, whose argument becomes the
return value of the combined function. Consider the following function
definition (more on function definitions later):
inc-then-print = compose{ print, str, 1+ }
The primitive function 1+ increments a numeric value by one, the str function
converts it to a string, and the print function writes a string out to the
screen. The result of combining the three with the compose form is a function
which first increments the value passed to it and then prints the incremented
value to the screen.
6 - Function Definitions
A program in col is written as a series of function definitions. The
interpreter reads the input file, loads the definition, and then runs the
function called main (which must be present). It passes as arguments to main
a sequence of strings containing the command-line arguments given to the
interpreter, or the empty sequence if none were present.
An actual function definition is written as a valid identifier followed by the
assignment operator, '=', followed by either the name of a function to
duplicate or, more likely, a functional form combining one or more existing
functions into a new, more useful function. Functional forms may be nested,
so a function definition can be as complex as necessary to achieve the
desired functionality. As an example, consider the following function
definition for double, a function which accepts a numerical value and doubles
it.
double = compose{ *, construct{ id, const(2) }}
We have already defined the compose functional form and the const primitive
function. The id primitive function simply returns its input as its output.
The construct functional form accepts one or more functions and creates a
function which creates a sequence from the results of applying each of its
argument functions to its input. The * primitive function accepts a sequence
of numerical values and multiplies them together.
To understand the operation of this function, consider its application to the
number 3. The compose functional form will first pass this value to its
last argument, which is
construct{ id, const(2)}
The construct function will in turn pass that value to each of its argument
functions, and return a sequence from the results. The id function always
returns its argument, so it will return 3. The function const(2) always
returns the number 2, so it will return 2. The construct form combines them
into the sequence
<3, 2>
The compose form will then take this result and pass it to its next function
argument to the left, in this case *. * accepts the sequence and multiplies
its elements together, returning 6. Because * is the first function in
compose's argument list, it is returned as the result of the combined
function. The end result is that the number 3 has been doubled to yield 6.
The arguments to functional forms can be either primitive or user-defined
functions, so more complicated functions can be broken down into more simple
sub-functions, and existing functions can be reused in more complicated ones.
Because all function definitions are loaded before executing the main
function, their order in the source file is inconsequential. If there are any
references to functions that don't exist, an error will be triggered at
runtime.
7 - Grammar
What follows is the grammar of the col language in Backus-Naur Form (BNF).
This should match the textual description given in section 6. The
<identifier> and <value> nonterminals are lexed according to the
descriptions given in section 1. Comments are not included in the grammar,
but discarded by the lexer before parsing.
program ::= <function-definition>
| <function-definition> <program>
<function-definition> ::= <identifier> "=" <function-expression>
<function-expression> ::= <identifier>
| <identifier> "(" <constant-arguments> ")"
| <identifier> "{" <function-arguments> "}"
<constant-arguments> ::= <constant>
| <constant> "," <constant-arguments>
<function-arguments> ::= <function-expression>
| <function-expression> "," <function-arguments>
<constant> ::= <value>
| "<" <constant-arguments> ">"
8 - Further Reference
All of the available primitive functions and functional forms are documented
in the files FUNCTIONAL_FORMS and PRIMITIVE_FUNCTIONS in the project root
directory. They are listed in alphabetical order with a definition and
description of each provided.