-
Notifications
You must be signed in to change notification settings - Fork 61
KAST
This is an old page; see https://github.com/kframework/k/wiki/KAST-and-KORE.
#Introduction
This proposal is about the syntax of KAST, which is the fragment of K where sentences
only use K-style ASTs, that is, they do not use concrete syntax. Recall that K in its
full generality requires a two-phase parser: one parsing the syntax declarations and
generating a parser for the rules (or contexts, macros, etc.), and the other parsing
the rules using the parser generated in phase one. KAST is a fragment of K meant to
be very easy to parse, eliminating the need for a two-phase parser. From a user
perspective, the difference between K and KAST is that instead of writing terms which
need a concrete-syntax parser to parse, such as A + B
, one writes terms using the
KAST syntax, such as _+_(A,B)
(or, to avoid ambiguities, `_+_`(A,B)
, where
`_`
is the bracket of K; this is explained below).
The idea is that KAST should be nothing but a textual reflection of the data-structure instance (a Java object) holding the K definition, the same way the KAST of a program is nothing but a textual reflection of the AST data-structure instance (also a Java object) of that program. We are going to have a trivial "loader" of a textual KAST definition, which in principle is a parser, but a very simple one, much simpler and faster than the current K parser.
- Dorel: Here we speak for the first time of Java classes (implementation). These classes and their relation with KAST is clear for familiar with the current version of K, bur for a beginner it could be obscure. I think that KAST is defined independent of its implementation but having in mind an OO (object-oriented) implementation such that syntactical constructs in KAST are in correspondence 1-1 with the classes from the implementation. It would be interesting to see if we can make transparent this relationship (may be in a different document).
There are LOTS of advantages for doing this. For example, this gives us a convenient
separation of concerns, where the complex K parser for concrete syntax is completely
separated out from the kompile tool, so both of them can then be designed and improved
in isolation. Further, we hope that the syntax of KAST will be simple enough that we
will not need to change it as we change the support for parsing in K, so middle-end
and back-end developers will not depend on other crazy syntactic tricks that we may
apply to the front-end (e.g., sort inferencers, syntactic conventions for builtins
like using X |-> I
in cells, etc.).
Let us start with the generic syntax of K, that is, the K syntax for defining
language syntax and semantics which is independent upon the particular language
that is being defined. We refer to this syntax as the outer syntax. For
simplicity, we do not include the syntax for comments or for documentation,
because these can be designed separately; in general, we assume usual many-line
comments surrounded with /*
and */
, and single-line comments starting with
//
, and allow them to appear anywhere in a definition; also, documentation is
added by using /*@
(for main text) or /*!
(for preamble) instead of /*
,
and similarly //@
or //!
instead of //
. A parser for the outer syntax
would put all the K terms into "bubbles". In what follows, we adopt the K
notation for defining syntax, extended with a few constructs which we may
adopt in K, too, such as Sort1 | Sort2
for on-the-fly disjunction of two
sorts, List{Sort}{separator}
for non-empty separator
-separated lists of
elements of sort Sort
, etc.:
syntax String ::= usual double-quoted strings
syntax NAME ::= any sequences of characters; the context may further
restrict them (e.g., variables/sorts start with capital
letter, etc.); we are well aware that is is just too
little said to get us any close to an actual parser,
but we prefer to keep it this way for now, so we can
focus on naming conventions (for modules, for files,
for labels, etc.) later.
syntax K // The outer syntax parser does not care about the concrete
// or abstract syntax; all syntactic terms are parsed as
// K "bubbles" at this stage.
syntax Definition ::= Requires Modules
syntax Requires ::= List{Require}{""}{""}
syntax Require ::= "require" String
syntax Modules ::= List{Module}{""}{""}
syntax Module ::= "module" NAME Imports Sentences "endmodule"
syntax Imports ::= List{Import}{""}{""}
syntax Import ::= "import" NAME
syntax Sentences ::= List{Sentence}{""}{""}
syntax Sentence ::= Configuration | Syntax | Context | Rule
syntax Configuration ::= "configuration" K OptionalAttributes
// cells/configurations represented as K terms
syntax Syntax ::= "syntax" Sort OptionalAttributes
// sort declarations, possibly with attributes
| "syntax" Sort "::=" List{Production}{"|"}
// non-empty list
| "syntax" Sort "::=" SpecialProduction OptionalAttributes
// at most one allowed for a Sort
// We should also keep the operator priority declarations
// in KAST; I'll add those as soon as we are done with
// the first round of the design.
syntax Sort ::= NAME
syntax Production ::= SymbolsOrToken OptionalAttributes
syntax SymbolsOrToken ::= Symbols | Token
syntax Symbols ::= List{Symbol}{""} // non-empty list
syntax Symbol ::= String | Sort
syntax Token ::= "Token" "{" String "}"
// with default follows restrictions
| "Token" "{" String "}" "{" String "}"
// with custom follows restrictions
syntax SpecialProduction ::= List
syntax List ::= "List" "{" Sort "}" "{" String "}" "{" String "}"
// Like for Token, we may add shortcuts for lists
// with default unit (.Sorts)
// and/or default separator ('_,_); but this is not
// urgent, it can wait.
syntax Context ::= "context" K OptionalAttributes
syntax Rule ::= "rule" OptionalName K OptionalAttributes
syntax OptionalName ::= "[" NAME "]" ":"
// has priority over user-defined syntax
syntax OptionalAttributes ::= "" | "[" Attributes "]"
syntax Attributes ::= List{Attribute}{","}
// (non-empty list) Not sure about this,
// e.g., Latex macros, etc. Still open.
syntax Attribute ::= NAME AttributesOrNot
| String
// E.g., unstructured Latex macros, etc.
syntax AttributesOrNot ::= "" | "(" Attributes ")"
The K terms marked as "bubbles" by the outer parser need to be further parsed with the inner parser (either KAST or Concrete). Note that each such K term bubble appears in a construct that also allows optional attributes. One of those attributes can, for example, state the specific parser to be used to parse the corresponding K term bubble.
-
Brandon: We discussed attributes quite a bit. The most relevant conclusion is that because the basic parser is already implemented and works, we can implement and test KAST for most serialization purposes without needing to redesign the syntax. Second, we all like the idea of doing more structured parsing eventually. Dwight is currently cleaning up attribute-handling code, and will tell us next week what kind of parsing attributes currently do, so we'll have a more grounded idea. Third, we think Latex and everything can be handled nicely just by adding
syntax Attribute ::= String | MarkdownString
to the above, so unstructured leaf nodes can be written as doublequoted or backticked strings. However, this will certainly require fixing all latex attributes, if only to add two backticks, and Dwight will have more information next week so we're not quite ready to propose this. -
Grigore: There is no basic parser anymore :). We should, however, take this opportunity and redesign the outer parser as well, to fit the syntax above, and while doing so, to also refine the syntax until we reach something we like. The fact that we have something implemented should not be used as an argument to not improve it. I added String as an option for Attribute above. However, I don't think that we should allow MarkdownString, because we are going to use the backticks as "the brackets of K", and the brackets are not meant to allow unstructured or unparsable things inside.
####Important Note
The outer parser should be as simple and efficient as possible. It should not attempt to parse any K terms at all (except for taking into account the K brackets, explained below) and it should give very nice error messages. The error messages will be, of course, language-independent; that's because the syntax of the defined language will not be known to the outer parser. The inner parser will only be concerned with parsing the K term bubbles (possibly in parallel).
####Important Restriction
The users therefore add their custom names (modules, sorts, terminals,
etc.) by two basic means: NAME
s and String
s. In order to keep parsing
efficient and error messages easy to understand, for the time being we are
going to reserve one character to K: the bactick `
. The backtick
will be used by K for grouping, e.g., `1+2 => 3` ~> HOLE*4 ~> ...
,
and K's grouping must unambiguously work regardless of the particular syntax
of the defined language. It is still possible to use backtick in your
language syntax, but each use of it must be escaped with a \
in the K
definitions. For example, abc\`def
is how to refer to the name
abc`def
in one's definition, abc\\`def
is how to refer to
abc\`def
, etc., and similarly for backticks appearing in strings.
There are ways to avoid escaping the backtick (see the Kuestions below),
which will probably be eventually supported if there is sufficient user
pressure, but for now let us simply assume that the backtick is reserved
to K.
-
Brandon: Didn't multiple backticks replace every other way of escaping backticks?
-
Grigore: No. We use the backticks as possibly nested brackets and we do not want to worry about dealing with nested backticks which are not brackets. Once this becomes clear, I will make it into a Kuestion. See also the existing Kuestions and let me know if they still don't clarify this issue.
#Inner KAST Syntax
KAST adds to the outer syntax a more specific syntax for K terms, but one
which is still very easy to parse (without requiring 2 phase parsing).
Specifically, it adds the following: the abstract syntax of K terms;
support for variables, sort memberships and cells; the
rewrite relation _=>_
; support for brackets `_`
(i.e., K's way
of grouping); and the HOLE. We are likely going to use the K brackets
(grouping) more than needed in what follows; hopefully the new parser
will be smart enough to not need all these explicit grouping, but we
prefer to overuse the K brackets here because we want to thoroughly
illustrate how they work.
Here is the abstract syntax of K ground terms, which can, in particular, be used to represent ASTs of programs:
syntax KLabel ::= NAME
// only constants; see below for details on KLabel
// names corresponding to concrete syntax.
syntax KToken ::= NAME
// basic tokens; see below for details on KToken
// names corresponding to concrete syntax.
| #klabel(KLabel)
// allows to regard KLabels as tokens, to pass around.
syntax KItem ::= KToken | KLabel "(" KList ")" // generic
syntax KList ::= List{K}{","}{".KList"} // generic
syntax K ::= List{KItem}{"~>"}{".K"} // generic
The biggest change is that atomic values like tokens are now directly
subsorted into KItem
, rather than into KLabel
and then applied to
an empty list like it used to be. There are only constant KLabel
s and
KToken
s, and there is no other way to add more of these except by adding
them as constants. Some of the KLabel
s can be tagged with the attribute
function
, but that changes nothing in what regards KORE parsing.
Since rewrite rules will only be allowed to rewrite K
terms to K
terms
(see below), functions are implicitly only allowed to take KList
s and
return K
terms.
-
Brandon: this grammar should describe the heavyweight forms of labels and tokens
-
Grigore: Why? The above is enough to allow users to define any language they want. For example:
syntax KToken ::= "zero" syntax KLabel ::= "succ" | "plus" rule plus(zero,N) => N rule plus(succ(M),N)) => succ(plus(M,N))
I understand your concern, namely that you may have an infinite number of tokens (and even of labels), and so we need some generic, heavy mechanism to encode them. Well, I don't think we need. In any given semantics we are going to only have a finite number of them, which we can simply define as constants. For example
syntax KToken ::= "0" | "main" | "Object"
Yes, programs can use infinitely many of them, but I still fail to see where the problem is. The parser will know how to deal with them. Remember that we are going to keep the syntax declarations of the form
syntax KToken ::= Token{[1-9][0-9]*}
as a means to define infinitely many
KToken
constants (and we may even adopt a similar one if we decide we want infinitely manyKLabel
constants, too). As a parenthesis, we should also fix the representation problem for tokens that we have right now, where we make a distinction between"main"
andToken{main}
. There should be no difference. They should just be both the same constant token. I would prefer the lighter notation, generalized with regular expressions. That is,syntax Token ::= "[1-9][0-9]*"
, etc. And this should work not only for constants of sortKToken
, but for constants of any sort, that is,syntax Number ::= "[1-9][0-9]*"
should generate tokens and not
KLabels
. In other words, tokens are regarded as special labels that take no arguments, which we prefer to write ast
instead oft(.KList)
. Finally, if we want to also have follows restrictions for a token, then we can use attributes for those.
Recall that K allows concrete syntax productions and rules of the form:
syntax Int ::= Token{"[1-9][0-9]*"}
syntax Exp ::= Int
| Exp "+" Exp
syntax Ints ::= List{Int}{","}{".Ints"}
syntax Exps ::= List{Exp}{","}{".Exps"}
rule 0 + E => E
rule E + 0 => E
but that all of these eventually yield KToken
s and KLabel
s, which
then can be used to write semantic rules. What are the names of these
special KToken
s and KLabel
s? Any names would work, provided that
they contain the sequence of characters that the KToken
or the KLabel
comprises, as well as the sort/type information. For uniformity and for
conceptual back-compatibility with the K concrete parser, we propose the
following syntax for both KLabel
s and KToken
s:
NAME "::" Type
where we will refer to NAME
as the root of the token or label
(escape mechanisms will be needed to deal with whitespaces, double colons,
etc.) and where Type
is either Sort
(for tokens) or
Sort1*Sort2*....*Sortn->Sort
(with no spaces) for labels. Additionally,
for each label label::Sort1*Sort2*...*Sortn->Sort
we also automatically
add its root label label
to KLabel
s, which for all rewriting purposes
can be regarded as the top-lattice variant of the corresponding operation,
that is, label:K*K*...*K->KItem
; conceputally, it stands for all the
overloaded operations with the same root. For example, the above would
be equivalent to the following KToken
s, KLabel
s and rules:
syntax KToken ::= "[1-9][0-9]*::Int" // or Token{[1-9][0-9]*::Int}
syntax KLabel ::= "_+_::Exps*Exps->Exps"
| "_+_"
| "_,_::Ints*Ints->Ints"
| "_,_"
| ".Ints::->Ints"
| ".Ints"
| "_,_::Exps*Exps->Exps"
| ".Exps::->Exps"
| ".Exps"
rule _+_::Exps*Exps->Exps(0::Int, E:K) => E:K
rule `_+_::Exps*Exps->Exps`(E:K, `0::Int`) => E:K
// As a variation, we used K's brackets (described below).
// These can be useful to avoid ambiguities.
These rules can also be written as (see the discussion below):
rule _+_(0::Int, E:K) => E:K
rule `_+_`(E:K, `0::Int`) => E:K
The "::" notation is consistent with the static cast convention in Full K.
There is a subtlety regarding overloaded KLabel
s, which in fact
determined the decision to add the root labels as well, but that is
more a concern of the backend rewrite engines. Suppose that you want
to define a function length
for lists of expressions, here using
concrete syntax:
syntax Int ::= length(Exps) [function]
rule length(.Exps) => 0
rule length(E:Exp,Es:Exps) => 1 + length(Es)
The frontend will need to be smart enough to add a common KToken
for
the unit of the overloaded list, say ._,_
, and add the ktoken(._,_)
attribute to both .Exps
and .Ints
, and then translate the above
into the following KAST (provided that we continue to work with cons-lists;
things would be different with associative lists):
syntax Int ::= length(Exps) [function]
rule `length`(`._,_`) => `0::Int`
// using K's brackets (backticks) for clarity (explained below)
rule `length`(`_,_`(E:Exp,Es:Exps)) => `_+_`(`1::Int`, `length`(Es))
Since the rewrite engine calculates the least sort of a term anyway, the
above rules would safely work, because _,_
without sorting information
is the same as _,_::K*K->K
.
Adding support for dynamic sorting/casting and variables:
syntax Variable ::= NAME
// possibly restricted to start with capital letter
syntax KLabel ::= Variable ":" KLabel
syntax KItem ::= Variable
| KItem ":" Sort // KItem must obey its sort at runtime
// sorting arbitrary KItems; Sort can also be
// K, KItem, KToken, but not KList
In KAST, we want all the variable occurrences to be sorted. Although our
syntax above allows for unsorted variables to keep the number of
productions small, the KAST parser will reject definitions where variables
are not unsorted. Also, the same variable should have the same sort in all
occurrences in the same sentence. If we want to not enforce a runtime
sort membership check for a variable, for performance reasons, then all we
have to do is to sort is with the sort K, e.g., T:K
. Since any term has
the sort K anyway, backends will likely not insert an unnecessary runtime
check in this case.
Note that the syntax above suffices to declare variables of any of the
sorts KItem
, KList
and K
! This may look wrong or tricky at first
sight, because one would expect a term/variable sorted as a KList
to
be a KList
term and not a KItem
, but it is in fact correct; that's
because all those sorts include KItem
. For example, the term
L:KLabel(KL1:KList,K:KItem,Kl2:KList)
is a well-formed KItem
term. It is likely that a backend stores and handles Kl:KList
as a
term of sort KList
, but that is an orthogonal issue; our aim here is
to devise a simple syntax for KORE, independently from how backends may
store or handle it.
##Cells and Configurations
In the long run we want cells and configurations to be embedded into K
the same way we decided to embed the builtins (integers, maps, etc.).
Probably <k> T </k>
will become something of the form <k>_</k>(T)
,
or to avoid ambiguities, `<k>_<.k>`(T)
. However, for now we
decided to make a notational exception and allow cells to be written
in an XML-ish notation in KAST (this is orthogonal to how cells are
represented in the KAST data-structures):
syntax K ::= Cells
syntax Cells ::= List{Cell}{""} // non-empty list
syntax Cell ::= "<" NAME OptionalTags ">" K "</" NAME ">"
// the two NAMEs should be the same
syntax OptionalTags ::= List{NAME "=" String}{""}{""}
-
Brandon: We need to allow rewrites and bracketing as entries in a Cells
-
Grigore: Right ... and I don't like it. Maybe we should take this as an opportunity to also make the cells be K terms, and thus allow cells to be normally passed and handled like any other K terms?
##Rewriting and HOLE
For now, we believe that allowing rewrites only between terms of sort K should be sufficient:
syntax K ::= K "=>" K // should be greedy
| "`" K "`" // bracket of K, used for grouping;
// multi-backticks allowed (discussed shortly)
| "HOLE" // further restricted, only in some contexts
| K "requires" K
| K 'ensures" K
syntax KLabel ::= "`" KLabel "`" // bracket for KLabels
A rewrite arrow =>
, or a requires
, or an ensures
construct is not
allowed to appear as a descendant of any other rewrite arrow, or requires
or ensures
construct. Any rule must contain one or more rewrite arrow
constructs =>
, and the requires
or ensures
constructs can only appear
at the top of the K term in the rule (in any order and with any multiplicities).
For example, the following is a correct rule at this stage (additional checks
may reject it for other reasons):
rule L1(KL1,`K1 => K1'`, KL2, L2(K2 => K2'))
requires K3
ensures K4
requires K5
ensures K6
The ensures
construct is typically useful to state constraints over
fresh variables introduced by the right-hand-side of the rule.
Note that we do not want to add parentheses as a bracket in KAST, because we know it that those can be in conflict with parenthesis constructs in the user-defined syntax; we were not able to handle the syntax of Scheme, for example, before.
-
Brandon: HOLE should probably be a KItem like other variables, putting it here disallows freezing one item out of a KSequence as in
label(HOLE~>Rest)
. -
Grigore: I don't understand this.
label(HOLE~>Rest)
is aKItem
whenHOLE
is a K term, so I don't see what the problem is here.
To avoid conflicts with user syntax and to allow nested brackets, we plan to
adopt the Markdown
code convention for brackets. In short, that says that you can use arbitrarily
many backticks as open bracket and precisely the same number of backticks as
close bracket, and the nested brackets should use fewer ticks. For example,
`` `_+_`(1,2) => 3 ``
represents a top bracket enclosing a rewrite rule
whose LHS is the the label _+_
applied to 1 and 2 and whose RHS is 3.
Approximating the convention with context free productions, K brackets include the following
syntax K ::= "`" K "`" | "``" K "``" | "```" K "```" | ....
The rationale for adopting Markdown's convention is twofold:
-
It is a widely used approach to mark bubbles of text within text, designed by experts in this area with millions of users in mind. There are not too many complaints posted on the internet about Markdown's backtick convention, so it is probably quite good. We need precisely the same characteristics for our brackets/bubbles in K, so why reinvent the wheel? Our best shot here is to take over Markdown's convention as is, without changing anything in it, because any change we make can have unexpected consequences. Think about it this way: is there anything that Markdown tried to achieve with its bubbles that we would't need? I cannot think of anything. We want a bit more than Markdown, though, namely to have nested brackets (i.e., bubbles in bubbles), but that does not seem to be a problem because we can just apply the same mechanism inside the bubbles.
-
It only reserves one character, the backtick, so users will only need one thing to worry about (escaping that one character only); this is obviously a consequence of the overall good design of Markdown.
The outer parser will consider brackets when parsing a K bubble. This allows
the object language syntax to contain keywords like rule
which would
otherwise end the bubble, by enclosing them in a bracket.
The outer parser scans the text immediately following the current outer syntax keyword until the next keyword is encountered, skipping all the brackets. The beginning and the end of brackets are very easy to identify because they consist of the same number of backticks; since nested brackets should contain fewer backquotes, the text inside the bracket can be very effectively skipped.
#Example: IMP in KAST
Below is IMP defined in the KAST language above, including all the builtins
that it needs. For simplicity, it uses the root-only syntax for KLabel
s,
namely _+_
instead of _+_::Exp*Exp->Exp
.
###IMP Syntax and Builtins
We can define the syntax either using sorts and productions, which are
easier to read and hold more information that may be useful in some backends,
or using only KLabel
s, which are more minimal but also lose sort information.
Here is the syntax using sorts and productions. This is almost identical to the IMP syntax in K, except for purely parsing-related features, such as bracket productions, operation priorities, prefer and avoid attributes, etc.:
module BUILTINS
syntax Int [builtin(Integer)]
syntax Bool [builtin(Boolean)]
syntax Id [builtin(Identifier)]
syntax Int ::= Int "/Int" Int [builtin(IntegerDivision)]
| Int "+Int" Int [builtin(IntegerAddition)]
syntax Bool ::= Int "<=Int" Int [builtin(IntegerLeq)]
| Int "=/=Int" Int [builtin(IntegerNeq)]
| "notBool" Bool [builtin(BoolNot)]
syntax Map ::= ".Map" [builtin(MapEmpty)]
| lookup(Map) [builtin(MapLookup)]
| update(Map,K,K) [builtin(MapUpdate)]
| addBinding(Map,K,K) [builtin(MapAddBinding)]
syntax Bool ::= K "in" Map [builtin(MapMembership)]
syntax K ::= keys(Map) [builtin(MapGetKeys)]
endmodule
module IMP-SYNTAX
imports BUILTINS
syntax AExp ::= Int | Id
| AExp "/" AExp [strict]
| AExp "+" AExp [strict]
syntax BExp ::= Bool
| AExp "<=" AExp [seqstrict]
| "!" BExp [strict]
| BExp "&&" BExp [strict(1)]
syntax Block ::= "{" "}"
| "{" Stmt "}"
syntax Stmt ::= Block
| Id "=" AExp ";" [strict(2)]
| "if" "(" BExp ")" Block "else" Block [strict(1)]
| "while" "(" BExp ")" Block
| Stmt Stmt
syntax Pgm ::= "int" Ids ";" Stmt
syntax Ids ::= List{Id}{","}{".Ids"}
endmodule
Here is the KLabel
alternative of the syntax above, which may be preferred in some cases:
module BUILTINS
syntax Int [builtin(Integer)]
syntax Bool [builtin(Boolean)]
syntax Id [builtin(Identifier)]
// Should we also preserve the subsorting relation, just in case
// the rewrite engine needs to calculate the least sort of a term
// dynamically? It would be nice to not have to, if possible.
syntax KLabel ::= "_/Int_" [builtin(IntegerDivision)]
| "_+Int_" [builtin(IntegerAddition)]
| "_<=Int_" [builtin(IntegerLeq)]
| "_=/=Int_" [builtin(IntegerNeq)]
| "notBool_" [builtin(BoolNot)]
| "lookup_" [builtin(MapLookup)]
| "update_" [builtin(MapUpdate)]
| "addBinding_" [builtin(MapAddBinding)]
| "_in_" [builtin(MapMembership)]
| "keys_" [builtin(MapGetKeys)]
syntax KToken ::= ".Map" [builtin(MapEmpty)]
// All constant labels can be tokens
endmodule
module IMP-SYNTAX
imports BUILTINS
syntax KLabel ::= "_/_" [strict]
| "_+_" [strict]
| "_<=_" [seqstrict]
| "!_" [strict]
| "_&&_" [strict(1)]
| "{_}"
| "_=_;" [strict(2)]
| "if(_)then_" [strict(1)]
| "while(_)_"
| "__"
| "int_;_"
| "_,_" [assoc]
// not sure: we may want an attribute to allow label
// take arbitrarily many arguments.
syntax KToken ::= "{}"
| ".Ids"
endmodule
Recall that sorted variables belong to the syntactic category KItem
, which
only means that they can appear anywhere a KItem
variable can appear, that is,
anywhere except as a KLabel
, and that unsorted variables belong to either the
KLabel
or the KItem
syntactic categories; this yields no ambiguity, because
the context quickly tells which is which. The internal representation of the K
definition can and probably should change the sort of each variable to whatever
it wants. Our objective here was to have a very simple and efficient KAST syntax
and parser. For clarity, we backtick all the KLabel
s and KToken
s, although
a good parser should probably not need all these K brackets, hopefully even none:
module IMP
imports IMP-SYNTAX
rule isKResult(I:Int) => `true::Bool`
rule isKResult(B:Bool) => `true::Bool`
configuration <T>
<k> $PGM:Pgm </k>
// or $PGM:K if one prefers to pass other syntactic
// categories (e.g., Stmt, AExp, etc)
<state> `.Map` </state>
</T>
rule <k> X:Id => `lookup`(Sigma,X:Id) ...</k> <state> Sigma </state>
rule `_/_`(I1:Int,I2:Int) => `_/Int_`(I1:Int,I2:Int)
requires `_=/=Int_`(I2:Int,`0::Int`)
rule `_+_`(I1:Int,I2:Int) => `_+_`(I1:Int,I2:Int)
rule `_<=_`(I1:Int,I2:Int) => `_<=_`(I1:Int,I2:Int)
rule `!_`(T:Bool) => `notBool_`(T:Bool)
rule `_&&_`(`true::Bool`),B:BExp) => B:BExp
rule `_&&_`(`false::Bool`,_) => `false::Bool`
rule `{}`(.KList) => .K [structural]
rule `{_}`(S:Stmt) => S:Stmt [structural]
rule <k> `_=_;`(X:Id,I:Int) => .K ...</k>
<state> Sigma => `update`(Sigma,X:Id,I:Int) </state>
rule `__`(S1:K,S2:K) => S1:K ~> S2:K [structural]
rule `if(_)_else_`(`true::Bool`,S:Stmt,_) => S:Stmt
rule `if(_)_else_`(`false::Bool`,Bool),_,S:Stmt) => S:Stmt
rule `while(_)_`(B:BExp,S:Stmt)
=> `if(_)_else_`(B:BExp,
`{_}`(`__`(S:Stmt,`while(_)_`(B:BExp,S:Stmt))),
`{}`) [structural]
rule <k> `int_;_`((`_,_`(X:Id, Xs:Ids) => Xs:Ids), _) </k>
<state> Sigma:Map
=> `addBinding`(Sigma:Map, X:Id,`0::Int`) </state>
requires `notBool`(`_in_`(X:Id, `keys`(Sigma:Map)))
rule `int_;_`(`.Ids`,S:Stmt) => S:Stmt [structural]
endmodule
#Kuestions, Koncerns, etc.
Below we compiled a list of questions that were directly or indirectly asked at various moments. Please feel free to ask more questions, even if you are not sure what you are asking precisely :) The questions help us all nail down the corner cases and perhaps even come up with something better, if needed.
In case we decide to disallow the unescaped use of the special backquote symbol in user syntax, wouldn't it make sense to implement a simple bracket checker, to ensure that brackets are properly nested?
Answer: This would be very useful and easy to implement (linear?). Users would be given insightful error messages very quickly. Moreover, the checker can put the (nested) bracket contents in special intermediate data-structures to make the outer and the inner parsers more efficient.
How can we use the new K bracket to tell the outer parser that the entire
contents of a rule should be considered as a bubble, and thus prevent it from
matching rule
or syntax
or other of its keywords in that rule's contents?
Answer: Suppose that your rule is rule contents [attributes]
and that
contents
happens to have no K brackets in it; then you can simply write the
rule as rule `contents` [attributes]
. Now suppose that contents
happens
to have some subterms containing brackets with both one and two backticks, say
`k1`
and ``k2``
; then just use three backticks as a bracket for
contents
in the rule: rule ```contents``` [attribute]
. And so on and
so forth. The point is that once we have so easy to parse brackets in K, there
is no need to have heredocs or other similar mechanisms to mark the rule contents.
Aren't we now asking too much from the outer parser? It now has to know how to parse Markdown bubbles, which is kind of non-trivial.
Answer: Not really. Very little needs to change in the outer parser compared to, say, our previously discussed approach based on heredocs. First, note that you can obtain precisely the same effect as that obtained with heredocs by adding a bracket with enough backticks around the contents of the rule, in the same style as we did in the answer of the question above. Since Markdown enforces only brackets with fewer backticks inside brackets with more backticks, scanning the contents of the rule is equally easy now to how it was with heredocs, and easier than how it would have been with ordinary nested parentheses. So nothing is lost in terms of capabilities, on the contrary.
One may argue that something is lost in terms of user-experience in the presence of errors, specifically that outer parsing errors may be now harder to detect. Consider, for example, a definition snippet of the form
syntax Exp ::= rule(Exp)
rule `_+_`(1,rule(2)) => 3 // rule appears unbracketed in K term
rule `_+_`(1,`rule(2)) => 3 // the close bracket forgotten in `rule(2)
rule `_+_`(1,"rule(2)") => 3 // rule appears in a string in K term
Each of the three rules above would yield admittedly horrible outer parsing errors. However, with the capability that we can pretty-print the current definition at each step, one should be able to see what happened. Users will just have to learn that if they use outer syntax keywords as terminals in their syntax, then they just have to properly bracket the rules that use those keywords. The same happens if you forget to put the heredocs around such rules, so there isn't really much of a difference in terms of user-experience.
The described approach of how the outer parser deals with brackets seems to work well only under the assumption that users cannot use unescaped backquotes in their syntax. Would it extent easily if we later on decide to allow users to use backquotes in their syntax?
Answer:
We think so, but it of course depends on how we decide to handle the user
backquotes, and what expectations we want our users to have (a user study may
be needed). For example, we may want writers and especially readers of K
definitions to have to worry only about their own language inside of bubbles
(and the K constructs added to let them write rules). The outer parser cannot
always recognize brackets in precisely the same place as the inner parser
(because it must not depend on the object language, so a token like
"ab`cd"
which the inner parser treats as a unit may look like it opens or
closes a bubble to the outer parser). The reader may only want to look at the
first character of the rule to see whether they may need to keep in mind the
outer parser's weaker lexical idea of brackets, in addition to the object
language grammar. In that case, if the contents of the bubble begins with a
backquote, the outer parser finds the end of the bubble by parsing a sequence
of K brackets and unbracketed text until it finds an outer syntax keyword outside
of a bracket. If the bubble does not begin with a backquote, then the outer
parser doesn't try to find brackets starting later, and just ends the bubble
wherever it appears an outer syntax keyword appears (which may even be inside
a user-defined token like "a rule b"
, because it doesn't know the user syntax).
It is nice that with one reserved character, the backtick `
, we solved
both the problem of brackets in K and the problem of bubbles in the outer parser.
But does that mean that users cannot use the backtick in their syntax anymore?
What if I want to have backtick-enclosed strings in my language? Wouldn't I run
into a nasty syntactic conflict with K's brackets, which I wouldn't have with
heredocs?
Answer: Yes, you may run into parsing conflicts if you use the backtick in your
language's syntax, but note that the same would happen no matter what brackets you
use for K! And we do need brackets in K. So this is a problem related to brackets,
and not to heredocs. That is, by adopting heredocs for expressing rule bubbles
would only solve the problem of rule contents possibly containing restricted
keywords/characters for the outer parser, but not in general for K, not even for
KAST. We had exactly that problem previously with the K ::= "(" K ")"
brackets,
which conflicted very badly with user-defined parentheses, especially when they were
not defined as brackets in the user-defined syntax. This is a common problem with
string-like or bracket-like notations. For example, if you want a string to contain
the string delimiter as a character, then that character needs to be escaped
(e.g., "\""
is the string containing only one character, "
). Therefore, we
decided for the time being to adopt a similar escaping convention, and thus require
users to escape all the backticks that are part of their syntax. For example,
they would have to always write \`abc\`
for their backtick-enclosed strings.
Ugly, but not completely unexpected. After all it is only one character that K
asks from its users for itself, which is not that much to remember.
Nevertheless, some users will likely complain with the escaping solution above, because they want to see their syntax unaffected by artifacts of K. To keep such users happy, we believe that it would be nice to allow for local renaming of syntax in K, something like this:
...
rename K_BRACKET to $
...
rule $_+_$($`my_`::MyString$, $`string`::MyString$) => $`my_string`$
// exaggerating the use brackets on purpose
...
rename K_BRACKET to `
...
Regardless of what solutions we pick, I want to reemphasize that this problem is a problem of brackets and not of trying to replace the heredocs with something else. Having heredocs would not help at all this problem, because all the conflicts above would happen inside the heredoc anyway. And once we solve the problem for brackets, which we have to do anyway, we get a solution for our heredoc problem for free. So heredocs are not needed.
###Kuestion
The bracket does not really solve the problem of KLabel
s, because
`_+_`
can also parse as +
applied to two anonymous variables using the
concrete syntax of the language, so the concrete syntax parser for K may report
an ambiguity anyway in terms like `_+_`(1,2)
.
Answer: This is not a problem of KAST. It is a problem that the concrete syntax
parser needs to deal with. A simple solution is to have the K parser consider the
KLabel
and KToken
constants to have the prefer
attribute. That means that
if constants which can be KLabel
s or KToken
s are involved in any ambiguity,
then we prefer them to be KLabel
s or KToken
s and not something else. The
rationale behind this decision is that users are given other means to state their
defined operations in K, for example using whitespaces or their explicit KLabel
representation.
If the prefer
solution above is adopted, then the number of bacticks appearing
in rules will be significantly reduced anyway, because in that case there is only
one way to parse _+_(1,2)
, the desired one.
###Kuestion
Hold on, does that mean that if I have an operation in prefix notation whose
KLabel
has the same name, then I will never get a chance to refer to my
operation because the KLabel
takes over? For example:
syntax KLabel ::= "f"
syntax Exp ::= f(Exp) [klabel(f)]
rule f(3) => 0 // I want f::Exp->Exp here and not f::KLabel!
Answer: This is again a problem with concrete syntax parsing in K and not
with KAST, but since KAST will be smoothly extended by K, and so everything
working in KAST should work unchanged when we regard it as a K definition, we
address this issue here, too. Yes, you are right, there does not seem to be
any way to refer to your original f::Exp->Exp
with this convention.
Moreover, you have a parsing ambiguity on f(3)
.
We can think of two ways to deal with this problem, which are open for further
discussions. One approach is that the concrete K parser should not report
ambiguities in which both parses yield the same AST; if there is no ambiguity
anyway, why would you care whether f
parses as your original operation or as
its label? Another approach is to regard the user productions as explicit
mechanisms for making syntax that looks like an application, so maybe we can let
those productions take priority over label application, especially because you
can always write `f`(Args)
or maybe even ``f``(Args)
, if there's
an explicit production f(Exps)
that prevents f(Args)
from being a label
application (and perhaps the production f(Exps)
is explicitly assigned a
different KLabel
- maybe fExp
to separate it from a similar-looking
production in another sort).
BTW, if f
is used on a different number of arguments than declared, for
example
syntax KLabel ::= "f"
syntax Exp ::= f(Exp) [klabel(f)]
rule f(3,4) => 0 // I want f::Exp->Exp here and not f::KLabel!
then an error should be reported, regardless of the fact that we currently
allow KLabel
s to apply to as many arguments as they want to. Some people
think that we should disallow that in K anyway, that is, that we should
enforce each label to declare how many arguments it takes. But that is
another discussion.
###Kuestion
What would some other reasonable possibilities for representing basic
(i.e., ones generated from user-defined concrete syntax productions)
KLabel
s and KToken
s be? The concern here is that maybe the backticked
notation gets in some unexpected conflicts with the user-defined syntax and
the users have no escape mechanism, no matter how heavy that would be.
Answer: We are contemplating the possibility to allow both the lightweight backticked notation and a more heavyweight syntax. Exact lexical details still open, but something like this:
Lightweight | Heavyweight
------------------------------------------------------
`_+_::Exp*Exp->Exp` | #label("Exp*Exp->Exp","_+_")
`12::Int` | #token("Int","12")
Since we allow multi-backtick brackets now, the heavy notation may not be necessary after all. But we keep the possibility open.
Note a lightweight label and token can be distinguished by the arrow even
if the label doesn't take arguments: pi::->Float
is a label,
12.0::Float
is a token.
###Kuestion:
Should there be a .Cells
for empty cells?
Should we allow rewrites to be written aside cells, as in
<k>'spawn(P)~>_</k> (.K => <thread><k>P</k></thread>)
(or maybe (.Cells => <thread><k>P</k></thread>)
Answer:
Not sure about this. We would like cells to be just K terms in the long
run, in which case we would prefer .K
(which we will rename to .
probably) instead of .Cells
.
###Kuestion:
Do we need a more elaborate syntax for declaring configurations and cells, or
we can limit ourselves to KItem
s? How about modularity, where we want to
name and reuse configuration fragments?
Answer: I think this can be addressed separately once we are done with the
first version of KAST. That is, we can keep configurations and cells as are
for now, using the XML notation which is easy to parse, and we can worry about
the general case later. The general solution could, indeed, be to turn cell
and configuration constructs into special KLabel
s, say ones with an
attribute configuration
. For example:
syntax KLabel ::= "'<T>_</T>" [configuration]
syntax KLabel ::= "'<k>_</k>" [configuration]
Then the rewrite engine / backend should recognize these special labels and interpret them accordingly, for example by applying the configuration concretizers to them.
Anyway, we still have to discuss and design the modular configurations.
###Kuestion:
If we reduce configurations and cells to ordinary KLabel
s like above, then
don't we need to allow rewrite rules to have the syntax
KList ::= KList "=>" KList
in order to allow rewrites that take sets of
cells to sets of cells?
Answer: Hm, maybe. We have to think about this some more.
###Kuestion: Shouldn't we keep all the syntax declarations in KAST like in K, so that we can use the priority, prefer, bracket, etc., information for unparsing?
Answer: Probably yes.
###Kuestion
Don't we need static cast KItem "::" Sort
in KAST?
Answer: Doesn't seem so. The syntactic casts are useful in the front-end
to tell the concrete syntax parser which KLabel
to generate for the parsed
concrete syntax, but once all the KLabel
s are in place we don't need the
syntactic casts anymore.
###Kuestion:
Rules of the form rule k1 => k2[X]
turned out to be a source of problems
until now, because [X]
is interpreted as the optional attribute of the
rule and sometimes you want it to be interpreted as concrete language syntax,
for example an array index.
Answer: Yes, that is a problem, but a problem of the front-end and not of
KAST. The current solution in the front end is to use parentheses:
rule (k1 => k2[X])
. But we should at the same time improve the front-end
to give good error messages. And we should allow the outer parser to show
its output to the user, so one would immediately know when [X]
is parsed
as an attribute.
###Kuestion:
How about rules between KLabel
terms, as we can do with our current
concrete parser, for example:
syntax KLabel ::= "foo" | "bar" | "baz"
rule foo => bar
rule Context[(bar => baz)(kl)]
// kl is some KList term, and Context some K holed context
Answer: We found that in all cases we looked at we can complete/change the rules to apply between K terms:
rule foo(KL:KList) => bar(KL:KList)
rule Context[bar(kl) => baz(kl)]
We need to develop such automatic translators as part of the front-end.
###Kuestion:
So now we can wrap a KLabel
as a KToken
and then pass it around like
any KItem
. But the whole point of KLabel
s is that they can be applied
to KList
s. How can you apply a wrapped KLabel
, say #klabel('_+_)
,
to a KList
?
Answer:
We can easily define a KLabel
, say #kapply
, together with a rule
rule #kapply(#klabel(L:KLabel), Kl:KList) => L:KLabel(Kl:KList)
which can now be used to apply KLabel
s represented as KToken
s to KList
s.
For example, if you want to apply the wrapped label '_+_
to the list 1,2
,
then you can write #kapply(#klabel('_+_),1,2)
. However, as explained below
in answers to other questions, you may want to also wrap the KList
s that you
want to pass around with a KLabel
, say #klist
. In that case, you would write
the rule above as
rule #kapply(#klabel(L:KLabel), #klist(Kl:KList)) => L:KLabel(Kl:KList)
and the particular example as #kapply(#klabel('_+_),#klist(1,2))
.
###Kuestion:
Can we support complex constructors for KLabel
s in KAST, like we can currently
do in K with concrete syntax? For example:
syntax KLabel ::= foo(KLabel)
rule Context[foo(L:KLabel)(KL:KList)]
syntax KLabel ::= bar(KList)
rule Context[bar(KL1:KList)(KL2:KList)]
Answer: No problem. One way to do it is to regard them as ordinary KLabel
s
that yield KItem
s meant to be (wrapped) KLabel
s, and then use #kapply
(and #klist
) as we did above. For example:
syntax KLabel ::= "foo"
rule Context[#kapply(foo(#klabel(L:KLabel)), #klist(KL:KList))]
syntax KLabel ::= "bar"
rule Context[#kapply(bar(#klist(KL1:KList)), #klist(KL2:KList))]
So #kapply
needs not always be used as a "function"; it can also be used as
a "constructor". I quote these words because KAST does not provide any specific
support for them. They are so only in our mind. From a KAST perspective they
are just KLabel
s which are matched and rewritten.
###Kuestion: KAST should support anything we want to do in K with concrete syntax, such as, for example, "functions" of the form
syntax KList ::= rev(KList)
rule rev(K,KL) => rev(KL),K
rule rev(.KList) => .KList
How can we encode this in KAST?
Answer: Again, same basic principle like above: we add the various
operations as KLabel
s, and then we add corresponding rules between
corresponding KItem
terms. For some of them we also need to get some
special/additional labels involved, like #klist
, #kapply
, etc.
For example:
syntax KLabel ::= "rev"
rule rev(#klist(K:K,Kl:KList)) => #klist.snoc(rev(#klist(Kl:KList)),K:K)
rule rev(#klist(.KList)) => #klist(.KList)
where #klist.snoc
(snoc
is the reverse of cons
) is a special
KLabel
"function" defined as follows:
rule #klist.snoc(#klist(Kl:KList),K:K) => #klist(Kl:KList,K:K)
Note that we cannot simply make rev
a KLabel
and nothing else, because
then we have strange situations like this one that Traian noticed:
rev(rev(1,2)) => rev(1,2) => 2,1
. The trick with #klist
achieves the
desired innermost evaluation strategy. We will have to find smart ways to
automatically generate such KAST rules from the original concrete syntax K
rules as part of the front-end.
###Kuestion One of the major claims of K is that it offers both the core and the meta-level (in Maude terminology) at the same, core level. If KAST is powerful enough, then it should be able to define generic up and down functions that take a K term and lifts it to its meta-representation (also a K term), and, respectively, take a meta-representation of a K term down to a K term.
Answer: Here it is. The code below is manually adapted, to fit the new KAST, from a K working definition (on March 30, 2014), so there could be typos:
module META_K
syntax KLabel ::= "#kapply" [function]
rule #kapply(#klabel(L:KLabel),#klist(Kl:KList)) => L:KLabel(Kl:KList)
syntax KLabel ::= "#klist.cons" [function] | "#kseq.cons" [function]
rule #klist.cons(K:K,#klist(Kl:KList)) => #klist(K,Kl:Klist)
// sorting K:K needed, otherwise default sort would be KList
rule #kseq.cons(K:KItem,#kseq(Kl:KList)) => #kseq(K:KItem,Kl:KList)
syntax KLabel ::= "#ktoken" | "#kapp" | "#klist" | "#kseq"
// constructors
syntax KLabel ::= "#upKItem" [function]
| "#upK" [function]
| "#upKList" [function]
syntax KLabel ::= "#down" [function]
rule #upKItem(T:KToken) => #ktoken(T:KToken)
rule #upKItem(L:KLabel(Kl:KList))
=> #kapp(#klabel(L:KLabel),#upKList(#klist(Kl:KList)))
rule #upKList(#klist(K:K,Kl:KList))
=> #klist.cons(#upK(K:K),#upKList(#klist(Kl:KList)))
rule #upKList(#klist(.KList)) => #klist(.KList)
rule #upK(K:KItem ~> K':K)
=> #kseq.cons(#upKItem(K:KItem),#upK(K':K))
rule #upK(.K) => #kseq(.KList)
rule #down(#ktoken(T:KToken)) => T:KToken
rule #down(#kapp(#klabel(L:KLabel),MKl:KItem))
=> #kapply(#klabel(L:KLabel),#down(MKl))
rule #down(#klist(.KList)) => #klist(.KList)
rule #down(#klist(K:K,Kl:KLabel))
=> #klist.cons(#down(K:K),#down(#klist(Kl:KList)))
rule #down(#kseq(.KList)) => .K
rule #down(#kseq(Ki:KItem,Kl:KList))
=> #down(Ki:KItem) ~> #down(#kseq(Kl:KList))
endmodule
If we can use KAST for serialization between compiler passes, how does it preserve calculated information on AST nodes that later compiler passes might need?
-
Brandon: We agreed that any information can be expressed as attributes, but it's not clear what syntax could work, even if we don't try to provide the term-metadata syntax in the concrete syntax.
-
Grigore: To keep things simple, we should not worry about any other meta-data for terms. Of course, we need sort memberships, but those are already allowed as part of KAST, using the syntax
term:Sort
. We will continue to keep the meta-data for rules within rule attributes. If we need to report an error message regarding a particular position of a particular syntactic construct, then: 1) if that error happened during the concrete syntax phase then the concrete parser has the location information that it needs; 2) if that error happened during the kompilation, then the location of the rule and the KAST syntax of the term where the error happened are good enough for the beginning (and probably for good).
We allow the text meant to parse as K to use backticks `
. A backtick tells the GeneriK parser that the next character is part of the K bubble, so it should not treat it as part of any of its kewords. This feature is useful when the defined language has syntax that overlaps with that of GeneriK, and thus may confuse the GeneriK parser with regards to where the bubbles start and where they end. For example, the GeneriK parser should parse the following module
module TRY
syntax Sort ::= "rule" | "syntax" | "configuration"
configuration <T> <k> $PGM </k> </T>
rule <k> (`rule => configu`ration) ~> `syntax ~> rule ...<k> [attribute]
endmodule
as something like
module TRY
syntax Sort ::= "rule" | "syntax" | "configuration"
configuration GeneriK_BUBBLE(<T> <k> $PGM </k> </T>)
rule GeneriK_BUBBLE(<k> (`rule => configu`ration) ~> `syntax ~> )
rule GeneriK_BUBBLE(...<k>) [attribute]
endmodule
and a corresponding error message will be then given by the bubble parser (full K or the KORE). Once one adds a backtick before any of the four characters of the last rule
token, say before the last one, then the module would be parsed by the GeneriK parser as
module TRY
syntax Sort ::= "rule" | "syntax" | "configuration"
configuration GeneriK_BUBBLE(<T> <k> $PGM </k> </T>)
rule GeneriK_BUBBLE(<k> (`rule => configu`ration) ~> `syntax ~> rul`e ...<k>) [attribute]
endmodule
end eventually kompiled correctly. In practice, we will probably put the backtick at the beginning of the token, but it is simpler and more general to allow it anywhere.