Casperlang
Properties of casper:
- a pure functional language, inspired by Haskell
- a focus on simplicity, with a minimal number of language concepts
- ; and = are operators
- pattern matching instead of traditional typing
- dynamic dispatch, allowing OOP-style patterns without explicit language support
- indents are only significant for statements (function definitions and import)
- JSON is valid syntax
This page acts as a reference document for the concrete compiler implementation (wip).
In the REPL you can type:
echo "Hello world"
echo is a low-level builtin function and takes a single String value as an argument.
Functions are called without parentheses. The arguments are separated by whitespace.
An example of a hello world script:
# main.cas (comments start with the # symbol)
main = echo "Hello world"
Here main is a function which takes no arguments.
Note that the echo function actually returns an IO object, which acts as an OS action.
Literal strings are denoted by "..." (double quotes). Single quotes and backticks aren't used for literal strings (and could be made available as operators).
Literal strings support expression substitution. Stringable expressions can be inserted into a literal string using ${...}.
The \ character is used for escaping:
- newline character \n
- tab character \t
- the \ character itself
- the $ character
- the " character
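The substitution and escaping rules above can be modelled in a few lines of Python. This is only a sketch and not the casper compiler's implementation: the name interpolate, the env dictionary, and the restriction to plain variable names inside ${...} (rather than arbitrary Stringable expressions) are assumptions made for illustration.

```python
def interpolate(template, env):
    """Expand ${name} and backslash escapes in the body of a casper-style string literal."""
    # Assumption: \n and \t are special; any other escaped character
    # (such as \, $ or ") stands for itself.
    escapes = {"n": "\n", "t": "\t"}
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "\\" and i + 1 < len(template):
            nxt = template[i + 1]
            out.append(escapes.get(nxt, nxt))
            i += 2
        elif template.startswith("${", i):
            end = template.index("}", i)
            name = template[i + 2:end].strip()
            out.append(str(env[name]))  # hypothetical lookup of a variable by name
            i = end + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Raw strings are used so that Python itself leaves the backslashes alone.
print(interpolate(r"hello ${name}\n", {"name": "world"}), end="")
print(interpolate(r"cost: \$${price}", {"price": 5}))
```

Escaping the $ is what allows a literal dollar sign to appear directly in front of a substitution, as in the second call.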
Contrary to many other languages, a String in casper isn't treated as a list of char-like integers. Strings are treated as primitives in their own right.
casper has some builtin functions for manipulating strings.
Length:
len "hello world" # 11
Concatenation:
"hello " + "world"
Getting a character:
echo "hello world".0 # "h"
echo "hello world".0. 0 # also "h"
# without the whitespace we would be calling . "hello world" Float
Convert to a list of integers:
toInts "hello world" # [104,101,108,108,111,32,119,111,114,108,100]
Convert a list of integers to a string:
toString [104,101,108,108,111,32,119,111,114,108,100] # "hello world"
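For readers who want to check the integer values above, the two conversions can be mirrored in Python. This sketch assumes that each integer is a Unicode code point; the document doesn't say whether casper uses code points or bytes, and the names to_ints and to_string are merely Python spellings of toInts and toString.

```python
def to_ints(s):
    # Assumption: one integer per Unicode code point.
    return [ord(c) for c in s]


def to_string(ints):
    return "".join(chr(i) for i in ints)


print(to_ints("hello world"))
print(to_string(to_ints("hello world")))
```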
casper has builtin support for integers and floating-point numbers (Int and Float respectively).
Int represents an underlying 64-bit signed integer.
Literal integers can be written as follows:
1
0b1 # binary
0x01 # hex
0o1 # octal
Float represents an underlying 64-bit double precision floating-point number.
Literal floats can be written as follows:
1.0
1e0
0.1e1
0.01e+2
10.0e-1
1e-0
The . by itself is used as an operator, so for convenience a literal float can't start or end with a . (in those cases the . would be interpreted as an operator symbol, and the number part would end up as an Int).
The scientific notation exponent uses an e but not an E (nowadays E seems to be less common anyway).
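The literal rules above are compact enough to express as two regular expressions. The following Python sketch is not taken from the casper lexer; the function name classify is hypothetical, and accepting both upper- and lowercase hex digits is an assumption.

```python
import re

# Int: decimal, binary (0b), hex (0x) or octal (0o).
INT_RE = re.compile(r"0b[01]+|0x[0-9a-fA-F]+|0o[0-7]+|[0-9]+")
# Float: digits, optional fraction with digits on both sides of the '.',
# optional lowercase 'e' exponent with an optional sign.
FLOAT_RE = re.compile(r"[0-9]+(?:\.[0-9]+)?(?:e[+-]?[0-9]+)?")


def classify(literal):
    """Return "Int", "Float", or None when the text is not a single number literal."""
    if INT_RE.fullmatch(literal):
        return "Int"
    if FLOAT_RE.fullmatch(literal):
        return "Float"
    return None


for text in ["1", "0x01", "1.0", "0.01e+2", "1e-0", "1.", ".5", "1E0"]:
    print(text, classify(text))
```

Checking Int first matters because the float pattern on its own would also accept a bare run of digits.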
Many core mathematical functions are available as builtin functions (+, -, *, /, sqrt, pow, exp, log, etc.).
A Bool value in casper is a builtin enum-like type. A Bool value can be instantiated using the True or False constructor functions.
Bool values are returned by the comparison functions >, >=, <, <=, ==, !=. The && operator should be used for logical and, the || operator should be used for logical or.
Literal lists are denoted by [...]. The list items are comma separated expressions.
Due to immutability a list can also be handled as a tuple (useful in pattern-matching).
Length:
len [1,2,3,4] # 4
Concatenation:
[1,2] + [3,4]
Reduction:
fold \($+$) 0 [1,2,3,4] # sum==10
Indexing:
[1,2,3,4].0 # 1
[1,2,3,4].(-1) # 4
Map:
map \($*2) [1,2,3,4] # [2,4,6,8]
Sorting:
sort \($<$) [3,2,4,1] # [1,2,3,4]
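As a rough cross-check of the list examples, here are Python counterparts. It is assumed here (the document doesn't say) that fold reduces from the left and that the function passed to sort is a "less than" predicate returning a Bool; the Python names are only stand-ins for the casper builtins.

```python
from functools import cmp_to_key, reduce


def fold(fn, init, xs):
    # Assumption: a left fold, starting from init.
    return reduce(fn, xs, init)


def sort(less, xs):
    # Translate a "less than" predicate into the -1/0/1 comparator Python expects.
    def compare(a, b):
        if less(a, b):
            return -1
        if less(b, a):
            return 1
        return 0

    return sorted(xs, key=cmp_to_key(compare))


print(fold(lambda a, b: a + b, 0, [1, 2, 3, 4]))  # sum
print([1, 2, 3, 4][-1])                           # last item, like .(-1)
print(list(map(lambda x: x * 2, [1, 2, 3, 4])))
print(sort(lambda a, b: a < b, [3, 2, 4, 1]))
```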
Literal dicts are denoted by {...}. The dict items are comma separated key:value pairs.
Keys can be words or literal strings, and always end up as strings. Keys can't be expressions. Dict values can be arbitrary expressions.
The colon, separating key from value, is parsed as a context specific symbol, and not an operator, so dict values can contain other colons as actual operators.
Dicts in casper preserve the order of their entries.
Length:
len {qty: 100, price: 0.9} # 2
Merging:
{qty: 100} + {price: 0.9}
There is a builtin if function. Its first argument is a Bool value, its second argument is the value returned when the condition is True, and its last argument is the value returned otherwise:
max a b = if (a>b) a b
Functions can be defined at file scope with function statements.
The most basic function is a function that doesn't take any arguments:
greeting = "hello world"
casper isn't really a typed language and you can't specify the return type of a function. Function arguments support pattern-matching though:
# no pattern-matching on argument
greeting name = "hello " + name
#
# constructor pattern-matching on argument
greeting name::String = "hello " + name
Pattern-matching can be nameless:
# convert a Bool into a String
show True = "true"
show False = "false"
The function definition body is always a single expression. This expression doesn't necessarily need to appear on the same line as the = symbol though. It can appear indented on the next line:
# returns an IO object
main =
  echo "hello world"
The function definition body ends when all groups are matched and the indent-level of the next line is the same as the indent-level of the function header.
Multiple indented expressions are chained together using the ; operator:
main =
  # syntactic sugar for (echo "message 1\n"; echo "message 2\n"):
  echo "message 1\n";
  echo "message 2\n"; # final semicolon is optional (ignored by compiler)
Note that there is no automatic semicolon insertion in casper (see the discussion of significant whitespace at the end of this page).
Consider the following simple program:
# greeter program
main =
  echo "what is your name?";
  name = readLine;
  echo "hello " + name
The = symbol is treated as a ternary operator at the same precedence level as the ; symbol. Both are right-to-left associative. So this simple program is actually syntactic sugar for:
# pseudo code!
(echo "what is your name?"); (= (readLine) \(name = $; echo "hello " + name))
Note that the readLine function is an asynchronous OS function that calls a callback upon completion. readLine's callback is everything following the assignment expression.
Now take a look at this example:
calcDistance x0::Float y0::Float x1::Float y1::Float =
  dx = x1 - x0;
  dy = y1 - y0;
  sqrt dx*dx + dy*dy
There is no IO, but we are using intermediate variables for better readability/DRYness. calcDistance's body is syntactic sugar for:
calcDistance x0::Float y0::Float x1::Float y1::Float =
  # pseudo code!
  (= (x1 - x0) \(dx = $; = (y1 - y0) \(dy = $; sqrt dx*dx+dy*dy)))
The = operator here is simply defined as:
= a fn::\1 = fn a
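The same desugaring can be written out in Python, where the non-IO = operator becomes an ordinary function that hands a value to a one-argument function. This is a sketch of the idea only: bind is a hypothetical name for casper's =, and reading sqrt dx*dx + dy*dy as the square root of the whole sum is an assumption about operator precedence.

```python
import math


def bind(a, fn):
    # Model of casper's "= a fn": pass the value to a function of one argument.
    return fn(a)


def calc_distance(x0, y0, x1, y1):
    # Each lambda plays the role of "everything after the assignment".
    return bind(x1 - x0, lambda dx:
           bind(y1 - y0, lambda dy:
           math.sqrt(dx * dx + dy * dy)))


print(calc_distance(0.0, 0.0, 3.0, 4.0))
```

Every intermediate variable is simply the parameter of a nested function, which is why no mutable state is needed.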
You might've noticed anonymous functions several times above. They are denoted by \(...), and contain $'s where arguments are substituted.
An anonymous function that adds two numbers, concatenates two strings/lists, merges two dicts, etc.:
\($ + $)
The arguments can be numbered:
\($1+$2) # all (or none) must be numbered
Because the types of the arguments of an anonymous function are unknown, nested pattern-matching of an anonymous function's arguments isn't possible. We can only pattern-match the number of arguments. \1 matches an anonymous function that takes 1 argument, \2 matches 2 arguments, etc. There is no \0 anonymous function and there is no pattern for an arbitrary number of arguments.
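One way to picture how the number of arguments follows from the $ placeholders is the Python sketch below. It is a guess at the counting rule rather than the compiler's logic: it assumes unnumbered placeholders are counted in order of appearance and numbered ones contribute their highest index, and it ignores complications such as ${...} inside string literals. The name placeholder_arity is hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"\$(\d*)")


def placeholder_arity(body):
    """Return the number of arguments implied by the placeholders in an anonymous function body."""
    indices = PLACEHOLDER.findall(body)
    if not indices:
        # There is no \0 anonymous function.
        raise ValueError("an anonymous function needs at least one $ placeholder")
    numbered = [i for i in indices if i]
    if numbered and len(numbered) != len(indices):
        raise ValueError("either all placeholders are numbered or none are")
    if numbered:
        return max(int(i) for i in numbered)
    return len(indices)


print(placeholder_arity("$ + $"))   # would be matched by \2
print(placeholder_arity("$1+$2"))   # would be matched by \2
print(placeholder_arity("$*2"))     # would be matched by \1
```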
An anonymous function can reference external variables (i.e. can be a closure).
Functions that start with capital letters are constructors. Each constructor can only be defined once (i.e. constructors can't be overloaded).
Constructors add tags to their return values, and values can have a hierarchy of attached tags depending on how constructor calls are nested. Constructor tags can be used for pattern-matching.
There are some primitive tags that don't have an associated constructor:
Int
Float
String
[]
{}
In order to construct a value with any of these primitive tags you must use literals or builtin readers/converters.
Any is a special builtin constructor. All values have Any as a top-level tag.
The builtin definition of Bool might look like:
# Note it is legal but nonsensical to call Bool directly
Bool = Any
True = Bool
False = Bool
Constructor arguments can be used in pattern-matching:
# the arguments don't need to be used in the rhs
Pair Any Any = Any
#
# stringify a Pair
show (Pair a b) = (show a) + " " + (show b)
casper doesn't have support for interfaces or type classes. But something analogous can be achieved by defining panicking methods dispatched on a base type:
Number = Any
== Number Number = panic "not yet implemented"
+ Number Number = panic "not yet implemented"
- Number Number = panic "not yet implemented"
- Number = panic "not yet implemented"
* Number Number = panic "not yet implemented"
Stringable = Any
show Stringable = panic "not yet implemented"
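A comparable pattern exists in Python, which may help readers coming from that side: a generic function gets a failing default for the base type and working versions for specific types. The sketch uses functools.singledispatch; the classes Money and Opaque are hypothetical and exist only for this illustration.

```python
from functools import singledispatch


class Stringable:
    """Plays the role of the Stringable base tag."""


class Money(Stringable):
    """Hypothetical type that implements show."""

    def __init__(self, amount):
        self.amount = amount


class Opaque(Stringable):
    """Hypothetical type that never implements show."""


@singledispatch
def show(value):
    raise TypeError(f"show is not defined for {type(value).__name__}")


@show.register
def _(value: Stringable):
    # Mirrors: show Stringable = panic "not yet implemented"
    raise NotImplementedError("not yet implemented")


@show.register
def _(value: Money):
    return f"{value.amount:.2f}"


print(show(Money(3)))
```

Calling show on an Opaque value falls back to the base-type version and fails, which is the Python analogue of the panic.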
Implementing multiple interfaces isn't (yet) possible as the needed type-union operator could be abused and result in unclear function dispatching (see diamond problem).
IO actions are usually client-server style queries, which might be performed asynchronously. The result of an IO function call is unwrapped using the = operator. The result can be an Error, Ok, void, or some non-error data.
File and Dir, both inheriting from Path, are wrappers for String that dispatch filesystem-specific functions.
Overview of builtin IO functions:
- echo: prints to stdout, returns IO <void>
- readLine: reads a single line from the terminal stdin, returns IO String
- readArgs: returns a list of command-line arguments, including the name of the script (but not the name of the parser), returns IO ([] String)
- read File: returns IO (Error String) or IO String
- write File: returns IO (Error String) or IO Ok
- send (HttpReq method::String url::String payload::String): performs an http request, returns IO (Error String) if the status isn't 200, on timeout, etc., or IO data::String otherwise
Pattern-matching is a powerful alternative to overloading functions based on argument types.
The most basic pattern creates some named arguments. Argument variable names must start with lowercase letters:
add a b = a + b
Named pattern-matching can be nested (which is called destructuring):
# [a,b] and [c,d] are destructured tuples
dot [a,b] [c,d] = a*c + b*d
#
echo (show (dot [1,2] [3,4])) # 11
#
# with piping
dot [1,2] [3,4] | show | echo # 11
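The piped form reads left to right: the result of each step becomes the argument of the next function. A minimal Python sketch of that reading follows; the helper pipe is hypothetical, and it is assumed that | binds more loosely than function application, so that dot [1,2] [3,4] is evaluated first.

```python
from functools import reduce


def dot(a, b):
    # Destructure both two-element lists, like the [a,b] [c,d] patterns.
    (a0, a1), (b0, b1) = a, b
    return a0 * b0 + a1 * b1


def pipe(value, *fns):
    # Feed the value through each function in turn.
    return reduce(lambda acc, fn: fn(acc), fns, value)


pipe(dot([1, 2], [3, 4]), str, print)
```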
The _ name acts as a sink and doesn't create a variable:
first [a,_] = a
second [_,b] = b
Note that instead of _ you could also use Any:
first [a,Any] = a
# or
first [a,_::Any] = a
# or
first [a::Any,_::Any] = a
Tags can be matched without creating a variable, as shown in this example:
# card suit enum
Suit = Any
Club = Suit
Heart = Suit
Spade = Suit
Diamond = Suit
# eq
== Club Club = True
== Heart Heart = True
== Spade Spade = True
== Diamond Diamond = True
== Suit Suit = False
Sometimes we might want to match a tag and create a variable. The :: operator is used for that:
# card constructor
Card Int Suit = Any
# return the winning card
max trump::Suit a::(Card ia sa) b::(Card ib sb) =
  if (sb == sa) (
    if (ib > ia) b a
  ) ( # else
    if (sb == trump) b a
  )
There are 2 container types:
- [] (i.e. lists)
- {} (i.e. dicts)
We can match lists and dicts with arbitrary content by using the empty brackets or braces:
isList [] = True
isList Any = False
isDict {} = True
isDict Any = False
The empty brackets or braces can be given a parameter, in order to match specific content:
isStringList ([] String) = True
isStringList ([] Any) = False
isStringListDict ({} ([] String)) = True
isStringListDict ({} Any) = False
Note that the parentheses can only be used for argument/parameter matching and can't be used to arbitrarily nest pattern-matching expressions.
Destructuring can be used inside list or dict pattern-matches:
# the pattern-matching turns a and b into lists
flatten ([] (Pair a b)) = fold \($+$) [] (zip a b)
The content of tuples, or dicts with known entries, can also be destructured:
mag [a::Float,b::Float] = sqrt a*a + b*b
# the price variable shadows function definitions
price {qty: _, price: price} = price
Note that all lists can be treated as tuples.
Non-constructor functions can be defined multiple times with different arguments. The function that is eventually dispatched depends on the following:
- first, only the functions that take the correct number of arguments are kept
- then (nested) pattern-matching generates vectors of scores, the lower the score the better the match
- the function with the longest score vector is selected
- if there is a tie, then the function with the best score for every score component is selected
- remaining ambiguity throws an error
Each score component corresponds to a single tag match. For example a value constructed with True matches the True tag pattern with a score of 0, the Bool tag pattern with a score of 1, and the Any tag pattern with a score of 2.
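The score of a single tag match can be read as the distance between the value's own tag and the pattern's tag in the tag hierarchy. The Python sketch below models only the True/False/Bool/Any chain from the example; the PARENT table and the name tag_score are assumptions for illustration, not compiler internals.

```python
# Each tag points at the tag it was constructed from; Any is the top.
PARENT = {"True": "Bool", "False": "Bool", "Bool": "Any", "Any": None}


def tag_score(value_tag, pattern_tag):
    """Count the steps from the value's tag up to the pattern's tag, or return None if there is no match."""
    score = 0
    tag = value_tag
    while tag is not None:
        if tag == pattern_tag:
            return score
        tag = PARENT[tag]
        score += 1
    return None


for pattern in ["True", "Bool", "Any", "False"]:
    print(pattern, tag_score("True", pattern))
```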
Another example:
Vec2 a::Float b::Float = [a,b]
#
# would match (Vec2 1.0 1.0) with score [0,0,0]
mag (Vec2 a::Float b::Float) = sqrt a*a + b*b
#
# would match (Vec2 1.0 1.0) with score [0,2,2]
mag (Vec2 a b) = sqrt a*a + b*b
#
# would match (Vec2 1.0 1.0) with score [1,0,0]
mag [a::Float, b::Float] = sqrt a*a + b*b
Note that an ambiguity error would be thrown if the first version of mag wasn't defined.
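The selection rules can be tried out on the three score vectors from the mag example. The following Python sketch is one possible reading of those rules, under the assumption that "best score for every score component" means lower than or equal to every other finalist in each position; select, AmbiguityError and the labels mag1 to mag3 are hypothetical names.

```python
class AmbiguityError(Exception):
    pass


def select(candidates):
    """Pick a function name from a dict that maps names to score vectors of matching functions."""
    if not candidates:
        raise LookupError("no matching function")
    # The longest score vector wins outright.
    longest = max(len(vec) for vec in candidates.values())
    finalists = {name: vec for name, vec in candidates.items() if len(vec) == longest}
    # On a tie, a winner must be at least as good as every finalist in every component.
    winners = [
        name
        for name, vec in finalists.items()
        if all(all(a <= b for a, b in zip(vec, other)) for other in finalists.values())
    ]
    if len(winners) != 1:
        raise AmbiguityError(f"ambiguous dispatch between {sorted(finalists)}")
    return winners[0]


print(select({"mag1": [0, 0, 0], "mag2": [0, 2, 2], "mag3": [1, 0, 0]}))
```

Without mag1, neither [0,2,2] nor [1,0,0] is best in every component, so the sketch raises AmbiguityError, in line with the note above.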
All source files in the same directory are part of the same module.
Functions defined in any file of a given module can be called from any other file in the same module, without the need for an import statement.
Non-constructor functions are attached to locally defined constructor tags, or to the module scope in case those tags were defined outside that module.
Import statements must appear first in source files:
import "./<sub-module-relative-to-current-dir>"
import "/<sub-module-relative-to-project-root>"
Errors are thrown when modules try to import themselves, or when circular module dependencies are encountered.
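Detecting self-imports and circular dependencies amounts to finding a cycle in the module import graph. A depth-first search is one common way to do that; the Python sketch below shows the idea and is not a description of how the casper compiler does it. The function name and the dict-of-lists graph representation are assumptions.

```python
def find_import_cycle(imports):
    """Return one import cycle as a list of module names, or None if the graph is acyclic."""
    VISITING, DONE = 1, 2
    state = {}

    def visit(mod, path):
        state[mod] = VISITING
        for dep in imports.get(mod, []):
            if state.get(dep) == VISITING:
                # dep is already on the current path: we found a cycle.
                return path[path.index(dep):] + [dep]
            if dep not in state:
                cycle = visit(dep, path + [dep])
                if cycle:
                    return cycle
        state[mod] = DONE
        return None

    for mod in imports:
        if mod not in state:
            cycle = visit(mod, [mod])
            if cycle:
                return cycle
    return None


print(find_import_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))
print(find_import_cycle({"a": ["b", "c"], "b": ["c"], "c": []}))
```

A module that imports itself shows up as the shortest possible cycle.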
Functions and constructors with a _ prefix are never exported, and are private to a module.
External modules are declared in a package.json file in the project root:
{
  "dependencies": {
    "std": {
      "version": "0.1.2",
      "url": "github.com/openengineer/casperlang-stdlib"
    }
  }
}
In this example the casper standard library can be imported as follows:
#main.cas
import "std"
#
main =
  println "the std library contains many basic functions"
Note that there is no namespacing mechanism, and libraries have to be designed carefully to avoid name-conflicts when being imported.
It is also possible to only access sub-modules of external modules:
import "std/css"
#
genStyleSheet = (
  h = 4:rem;
  CSS (
    body (
      nav (
        flex "hsc";
        h1 (lineHeight h; fontWeight "normal");
        id "app-links" (
          height h; flex "hsc";
          a (
            cursor "pointer";
            hasAttr "active" (fontWeight "bold"; color "black"; hover (color "black"))
          )
        )
      )
    )
  )
)
Significant whitespace inside expressions could be implemented as automatic semicolon insertion (ASI) combined with automatic parenthesis insertion.
This might allow getting rid of some ugly parentheses, but inlining chained expressions would be less apparent. And in some fringe cases it would be hard for the user to guess where semicolons and parentheses would be inserted.
Significant whitespace doesn't seem to provide an overwhelming advantage for expressions, so that's why I've decided to make casper a mostly non-significant-whitespace language (indents are only significant for import statements and function bodies).
For comparison I've written a version of the css example using significant whitespace inside expressions:
import "std/css"
# example with automatic semicolon and parentheses insertion
genStyleSheet =
  h = 4:rem
  CSS
    body
      nav
        flex "hsc"
        h1
          lineHeight h
          fontWeight "normal"
        id "app-links"
          height h
          flex "hsc"
          a
            cursor "pointer"
            hasAttr "active"
              fontWeight "bold"
              color "black"
              hover
                color "black"