-
Notifications
You must be signed in to change notification settings - Fork 61
KAST
#Introduction
This proposal is about the syntax of KAST, which is the fragment of K where sentences only use K-style ASTs, that is, they do not use concrete syntax. Recall that K in its full generality requires a two-phase parser: one parsing the syntax declarations and generating a parser for the rules (or contexts, macros, etc.), and the other parsing the rules using the parser generated in phase one. KAST is a fragment of K meant to be very easy to parse, eliminating the need for a two-phase parser. From a user perspective, the difference between K and KAST is that instead of writing terms which need a concrete-syntax parser to parse, such as A + B
, one writes terms using the KAST syntax, such as _+_(A,B)
(or, to avoid ambiguities, `_+_`(A,B)
, where `_`
is the bracket of K; this is explained below).
The idea is that KAST should be nothing but a textual reflection of the data-structure (a Java object) holding the K definition, the same way the KAST of a program is nothing but a textual reflection of the AST data-structure (also a Java object) of that program. We are going to have a trivial "loader" of a K definition in KAST format, which in principle is a parser, but a very simple one, infinitely simpler and faster than the current K parser.
There are LOTS of advantages for doing this. I'll only mention the beautiful separation of concerns that we will achieve, where the complex K parser is completely separated out from the kompile tool, so both of them can then be designed and improved in isolation. Further, we hope that the syntax of KAST will be simple enough that we will not need to change it as we change the support for parsing in K, so middle-end and back-end developers will not depend on other crazy syntactic tricks that we may apply to the front-end (e.g., sort inferencers, syntactic conventions for builtins like using X |-> I
in cells, etc.).
Let us start with the generic syntax of K, that is, the K syntax for defining language syntax and semantics which is independent upon the particular language that is being defined. We refer to this syntax as the outer syntax. For simplicity, we do not include the syntax for comments or for documentation, because these can be designed separately; in general, we assume usual many-line comments surrounded with /*
and */
, and single-line comments starting with //
, and allow them to appear anywhere in a definition; also, documentation is added by using /*@
(for main text) or /*!
(for preamble) instead of /*
, and similarly //@
or //!
instead of //
. A parser for the outer syntax would put all the K terms into "bubbles". In what follows, we adopt the K notation for defining syntax, extended with a few constructs which we may adopt in K, too, such as Sort1 | Sort2
for on-the-fly disjunction of two sorts, List{Sort}{separator}
for non-empty separator
-separated lists of elements of sort Sort
, etc.:
syntax String ::= usual double-quoted strings
syntax NAME ::= any sequences of characters; context may further restrict them
(e.g., variables/sorts start with capital letter, etc.); we are well aware
that is is just too little said to get us any close to an actual parser,
but we prefer to keep it this way for now, so we can focus on naming conventions
(for modules, for files, for labels, etc.) later.
syntax K // The outer syntax parser does not care about the concrete or abstract syntax;
// all syntactic terms are parsed as K "bubbles" at this stage
syntax Definition ::= Requires Modules
syntax Requires ::= List{Require}{""}{""}
syntax Require ::= "require" String
syntax Modules ::= List{Module}{""}{""}
syntax Module ::= "module" NAME Imports Sentences "endmodule"
syntax Imports ::= List{Import}{""}{""}
syntax Import ::= "import" NAME
syntax Sentences ::= List{Sentence}{""}{""}
syntax Sentence ::= Configuration | Syntax | Context | Rule
syntax Configuration ::= "configuration" K OptionalAttributes
// cells/configurations represented as K terms
syntax Syntax ::= "syntax" Sort OptionalAttributes // sort declarations, possibly with attributes
| "syntax" Sort "::=" List{Production}{"|"} // non-empty list
| "syntax" Sort "::=" SpecialProduction OptionalAttributes
// at most one allowed for a Sort
// We should also keep the operator priority declarations in KAST; I'll
// add those as soon as we are done with the first round of the design.
syntax Sort ::= NAME
syntax Production ::= SymbolsOrToken OptionalAttributes
syntax SymbolsOrToken ::= Symbols | Token
syntax Symbols ::= List{Symbol}{""} // non-empty list
syntax Symbol ::= String | Sort
syntax Token ::= "Token" "{" String "}" // with default follows restrictions
| "Token" "{" String "}" "{" String "}" // with custom follows restrictions
syntax SpecialProduction ::= List
syntax List ::= "List" "{" Sort "}" "{" String "}" "{" String "}"
// like for Token, we may add shortcuts for lists with default unit (.Sorts)
// and/or default separator ('_,_); but this is not urgent, it can wait.
syntax Context ::= "context" K OptionalAttributes
syntax Rule ::= "rule" OptionalName K OptionalAttributes
syntax OptionalName ::= "[" NAME "]" ":" // has priority over user-defined syntax
syntax OptionalAttributes ::= "" | "[" Attributes "]"
syntax Attributes ::= List{Attribute}{","} // (non-empty list) Not sure about this, e.g., Latex macros, etc. Still open.
syntax Attribute ::= NAME AttributesOrNot
syntax AttributesOrNot ::= "" | "(" Attributes ")"
The K terms marked as "bubbles" by the outer KAST parser need to be further parsed with specialized parsers (e.g., the KORE parser, or the parser for Full K, etc.; see next). Note that each such K term bubble appears in a construct that also allows optional attributes. One of those attributes can, for example, state the specific parser to be used to parse the corresponding K term bubble (in case the default one, probably the one for Full K, generates ambiguities).
- Brandon: We discussed attributes quite a bit. The most relevant conclusion is that because the basic parser is already implemented and works, we can implement and test KAST for most serialization purposes without needing to redesign the syntax. Second, we all like the idea of doing more structured parsing eventually. Dwight is currently cleaning up attribute-handling code, and will tell us next week what kind of parsing attributes currently do, so we'll have a more grounded idea. Third, we think Latex and everything can be handled just nicely just by adding
syntax Attribute ::= String | MarkdownString
to the above, so unstructured leaf nodes can be written as doublequoted or backticked strings. However, this will certainly require fixing all latex attributes, if only to add two backticks, and Dwight will have more information next week so we're not quite ready to propose this.
####Important Note
The GeneriK parser should be as simple and efficient as possible. It should not attempt to parse any K terms at all (except for taking into account the K brackets, explained below) and it should give very nice error messages. The error messages will be, of course, language-independent; that's because the syntax of the defined language will not be known to the GeneriK parser. Any other subsequent parsers for K will only be concerned with parsing the K term bubbles (possibly in parallel).
####Important Restriction
The users therefore add their custom names (modules, sorts, terminals, etc.) by two basic means: NAME
s and String
s. In order to keep parsing efficient and error messages easy to understand, for the time being we are going to reserve one character to K: the bactick `
. The backtick will be used by K for grouping, e.g., `1+2 => 3` ~> HOLE*4 ~> ...
, and K's grouping must unambiguously work regardless of the particular syntax of the defined language. It is still possible to use backtick in your language syntax, but each use of it must be escaped with a \
in the K definitions. For example, abc\`def
is how to refer to the name abc`def
in one's definition, abc\\`def
is how to refer to abc\`def
, etc., and similarly for backticks appearing in strings. There are ways to avoid escaping the backtick (see the Kuestions below), which will probably be eventually supported if there is sufficient user pressure, but for now let us simply assume that the backtick is reserved to K.
- Brandon: Didn't multiple backticks replace every other way of escaping backticks?
#KORE
KORE adds to GeneriK a more specific syntax for K terms, but one which is still very easy to parse (without requiring 2 phase parsing). Specifically, it adds to GeneriK the following: KAST syntax; support for variables, sort memberships and cells; the rewrite relation _=>_
; support for brackets `_`
(i.e., K's way of grouping); and the HOLE. We are likely going to use the K brackets (grouping) more than needed in what follows; hopefully the new parser will be smart enough to not need all these explicit grouping, but we prefer to overuse the K brackets here because we want to thoroughly illustrate how they work.
##KAST
KAST, which is an abbreviation for K Abstract Syntax Tree, is meant to represent any ground computation (or K) terms, including ones corresponding to programs in AST format. Here is the syntax of KAST:
syntax KLabel ::= NAME // only constants; see below for KLabels corresponding to concrete syntax
syntax KToken ::= NAME // basic tokens; see below for KTokens corresponding to concrete syntax
| #klabel(KLabel) // allows to regard a KLabel as a token, to pass around
syntax KItem ::= KToken | KLabel "(" KList ")" // generic
syntax KList ::= List{K}{","}{".KList"} // generic
syntax K ::= List{KItem}{"~>"}{".K"} // generic
The biggest change is that atomic values like tokens are now directly subsorted into KItem
, rather than into KLabel
and then applied to an empty list like it used to be. There are only constant KLabel
s and KToken
s, and there is no other way to add more of these except by adding them as constants. Some of the KLabel
s can be tagged with the attribute function
, but that changes nothing in what regards KORE parsing. Since rewrite rules will only be allowed to rewrite K
terms to K
terms (see below), functions are implicitly only allowed to take KList
s and return K
terms.
- Brandon: this grammar should describe the heavyweight forms of labels and tokens
Recall that the full K allows concrete syntax productions and rules of the form:
syntax Int ::= Token{"[1-9][0-9]*"}
syntax Exp ::= Int
| Exp "+" Exp
syntax Ints ::= List{Int}{","}{".Ints"}
syntax Exps ::= List{Exp}{","}{".Exps"}
rule 0 + E => E
rule E + 0 => E
but that all of these eventually yield KLabel
s, which then can be used to write semantic rules. What are the names of these special KLabel
s? Any names would work, provided that they contain the sequence of characters that the KToken
or the KLabel
comprises, as well as the sort/type information. For uniformity and for conceptual back-compatibility with the Full K concrete parser, we propose the following syntax for both KLabel
s and KToken
s: NAME "::" Type
, where we will refer to NAME
as the root of the label (escape mechanisms will be needed to deal with whitespaces, double colons, etc.) and where Type
is either Sort
(for tokens) or Sort1*Sort2*....*Sortn->Sort
(with no spaces) for labels. Additionally, for each label label::Sort1*Sort2*...*Sortn->Sort
we also automatically add its root label label
to KLabel
s, which for all rewriting purposes can be regarded as the top-lattice variant of the corresponding operation, that is, label:K*K*...*K->KItem
; conceputally, it stands for all the overloaded operations with the same root. For example, the above would generate the following KLabel
s and rules:
syntax KLabel ::= "_+_::Exps*Exps->Exps"
| "_+_"
| "_,_::Ints*Ints->Ints"
| "_,_"
| ".Ints::->Ints"
| ".Ints"
| "_,_::Exps*Exps->Exps"
| ".Exps::->Exps"
| ".Exps"
rule _+_::Exps*Exps->Exps(0::Int, E:K) => E:K
rule `_+_::Exps*Exps->Exps`(E:K, `0::Int`) => E:K // to avoid ambiguities we can use K's brackets (described below)
These rules can also be written as (see the discussion below):
rule _+_(0::Int, E:K) => E:K
rule `_+_`(E:K, `0::Int`) => E:K
The "::" notation is consistent with the static cast convention in Full K.
There is a subtlety regarding overloaded KLabel
s, which in fact determined the decision to add the root labels as well, but that is more a concern of the backend rewrite engines. Suppose that you want to define a function length
for lists of expressions, here using concrete syntax:
syntax Int ::= length(Exps) [function]
rule length(.Exps) => 0
rule length(E:Exp,Es:Exps) => 1 + length(Es)
The frontend will need to be smart enough to add a common KLabel
for the unit of the overloaded list, say ._,_
, and add the klabel(._,_)
attribute to both .Exps
and .Ints
, and then translate the above into the following KORE (provided that we continue to work with cons-lists; things would be different with associative lists):
syntax Int ::= length(Exps) [function]
rule `length`(`._,_`(.KList)) => `0::Int` // using K's brackets (backticks) for clarity (explained below)
rule `length`(`_,_`(E:Exp,Es:Exps)) => `_+_`(`1::Int`, `length`(Es))
Since the rewrite engine calculates the least sort of a term anyway, the above rules would safely work, because _,_
without sorting information is the same as _,_::K*K->K
.
##Variables and Sort Memberships
Adding support for dynamic sorting/casting and variables:
syntax Variable ::= NAME // possibly restricted to start with capital letter
syntax KLabel ::= Variable ":" KLabel
syntax KItem ::= Variable
| KItem ":" Sort // sorting arbitrary KItems; Sort can also be
// K, KItem, KToken, but not KList
In KORE, we want all the variable occurrences to be sorted. Although our syntax above allows for unsorted variables to keep the number of productions small, the KORE parser will reject definitions where variables appear unsorted. Also, the same variable should have the same sort in all occurrences in the same sentence.
Note that the syntax above suffices to declare variables of any of the sorts KItem
, KList
and K
! This may look wrong or tricky at first sight, because one would expect a term/variable sorted as a KList
to be a KList
term and not a KItem
, but it is in fact correct; that's because all those sorts include KItem
. For example, the term L:KLabel(KL1:KList,K:KItem,Kl2:KList)
is a well-formed KItem
term. It is likely that a backend stores and handles Kl:KList
as a term of sort KList
, but that is an orthogonal issue; our aim here is to devise a simple syntax for KORE, independently from how backends may store or handle it.
##Cells and Configurations
In the long run we want cells and configurations to be embedded into K the same way we decided to embed the builtins (integers, maps, etc.). Probably <k> T </k>
will become something of the form <k>_</k>(T)
, or to avoid ambiguities, `<k>_<.k>`(T)
. However, for now we decided to make a notational exception and allow cells to be written in an XML-ish notation in KORE (this is orthogonal to how cells are represented in KIL):
syntax K ::= Cells
syntax Cells ::= List{Cell}{""} // non-empty list
syntax Cell ::= "<" NAME OptionalTags ">" K "</" NAME ">" // the two NAMEs should be the same
syntax OptionalTags ::= List{NAME "=" String}{""}{""}
- Brandon: We need to allow rewrites and bracketing as entries in a Cells
##Rewriting and HOLE
For now, we believe that allowing rewrites only between terms of sort K should be sufficient:
syntax K ::= K "=>" K // should be greedy
| "`" K "`" // bracket in K, used for grouping; backticks can be multiplied (discussed shortly)
| "HOLE" // further restricted to only appear in certain contexts
| K "requires" K
| K 'ensures" K
A rewrite arrow =>
, or a requires
, or an ensures
construct is not allowed to appear as a descendant of any other rewrite arrow, or requires
or ensures
construct. Any rule must contain one or more rewrite arrow constructs =>
, and the requires
or ensures
constructs can only appear at the top of the K term in the rule (in any order and with any multiplicities). For example, the following is a correct rule at this stage (additional checks may reject it for other reasons):
rule L1(KL1,`K1 => K1'`, KL2, L2(K2 => K2')) requires K3 ensures K4 requires K5 ensures K6
The ensures
construct is typically useful to state constraints over fresh variables introduced by the right-hand-side of the rule.
Note that we do not want to add parentheses as a bracket in KORE, because that can be in conflict with parenthesis constructs in the user defined syntax; we were not able to handle the syntax of Scheme, for example, before.
- Brandon: HOLE should probably be a KItem like other variables, putting it here disallows freezing one item out of a KSequence as in
label(HOLE~>Rest)
To avoid conflicts with user syntax and to allow nested brackets, we plan to adopt the MarkDown code convention for brackets. In short, that
says that you can use arbitrarily many backticks as open bracket and precisely the same number of backticks as close bracket, and the nested brackets should use fewer ticks. For example, `` `_+_`(1,2) => 3 ``
represents a top bracket enclosing a rewrite rule whose LHS is the the label _+_
applied to 1 and 2 and whose RHS is 3.
Approximating the convention with context free productions, K brackets include the following
syntax K ::= "`" K "`" | "``" K "``" | "```" K "```" | ....
The rationale for adopting MarkDown's convention is twofold:
-
It is a widely used approach to mark bubbles of text within text, designed by experts in this area with millions of users in mind. There are not too many complaints posted on the internet about Markdown's backtick convention, so it is probably quite good. We need precisely the same characteristics for our brackets/bubbles in K, so why reinvent the wheel? Our best shot here is to take over MarkDown's convention as is, without changing anything in it, because any change we make can have unexpected consequences. Think about it this way: is there anything that MarkDown tried to achieve with its bubbles that we would't need? I cannot think of anything. We want a bit more than MarkDown, though, namely to have nested brackets (i.e., bubbles in bubbles), but that does not seem to be a problem because we can just apply the same mechanism inside the bubbles.
-
It only reserves one character, the backtick, so users will only need one thing to worry about (escaping that one character only); this is obviously a consequence of the overall good design of MarkDown.
The GeneriK parser will consider brackets when parsing a K bubble. This allows the object language syntax to contain keywords like rule
which would otherwise end the bubble, by enclosing them in a bracket.
GeneriK scans the text immediately following the current GeneriK keyword until the next GeneriK keyword is encountered, skipping all the brackets. The beginning and the end of brackets are very easy to identify because they consist of the same number of backquotes; since nested brackets should contain fewer backquotes, the text inside the bracket can be very effectively skipped.
#Example: IMP in Kore
Below is IMP defined in the KORE language above, including all the builtins that it needs. For simplicity, it uses the root-only syntax for KLabel
s, namely _+_
instead of _+_::Exp*Exp->Exp
.
###IMP Syntax and Builtins
We can define the syntax either using sorts and productions, which are easier to read and hold more information that may be useful in some backends, or using only KLabels, which are more minimal but also lose sort information.
Here is the syntax using sorts and productions. This is almost identical to the IMP syntax in full K, except for purely parsing-related features, such as bracket productions, operation priorities, prefer and avoid attributes, etc.:
module BUILTINS
syntax Int [builtin(Integer)]
syntax Bool [builtin(Boolean)]
syntax Id [builtin(Identifier)]
syntax Int ::= Int "/Int" Int [builtin(IntegerDivision)]
| Int "+Int" Int [builtin(IntegerAddition)]
syntax Bool ::= Int "<=Int" Int [builtin(IntegerLeq)]
| Int "=/=Int" Int [builtin(IntegerNeq)]
| "notBool" Bool [builtin(BoolNot)]
syntax Map ::= ".Map" [builtin(MapEmpty)]
| lookup(Map) [builtin(MapLookup)]
| update(Map,K,K) [builtin(MapUpdate)]
| addBinding(Map,K,K) [builtin(MapAddBinding)]
syntax Bool ::= K "in" Map [builtin(MapMembership)]
syntax K ::= keys(Map) [builtin(MapGetKeys)]
endmodule
module IMP-SYNTAX
imports BUILTINS
syntax AExp ::= Int | Id
| AExp "/" AExp [strict]
| AExp "+" AExp [strict]
syntax BExp ::= Bool
| AExp "<=" AExp [seqstrict]
| "!" BExp [strict]
| BExp "&&" BExp [strict(1)]
syntax Block ::= "{" "}"
| "{" Stmt "}"
syntax Stmt ::= Block
| Id "=" AExp ";" [strict(2)]
| "if" "(" BExp ")" Block "else" Block [strict(1)]
| "while" "(" BExp ")" Block
| Stmt Stmt
syntax Pgm ::= "int" Ids ";" Stmt
syntax Ids ::= List{Id}{","}{".Ids"}
endmodule
Here is the KLabel
alternative of the syntax above, which may be preferred in some cases:
module BUILTINS
syntax Int [builtin(Integer)]
syntax Bool [builtin(Boolean)]
syntax Id [builtin(Identifier)]
// Should we also preserve the subsorting relation, just in case the rewrite engine needs to calculate // the least sort of a term dynamically? It would be nice to not have to store that, if possible,
syntax KLabel ::= "_/Int_" [builtin(IntegerDivision)]
| "_+Int_" [builtin(IntegerAddition)]
| "_<=Int_" [builtin(IntegerLeq)]
| "_=/=Int_" [builtin(IntegerNeq)]
| "notBool_" [builtin(BoolNot)]
| ".Map" [builtin(MapEmpty)] // we may want to make this a token
| "lookup_" [builtin(MapLookup)]
| "update_" [builtin(MapUpdate)]
| "addBinding_" [builtin(MapAddBinding)]
| "_in_" [builtin(MapMembership)]
| "keys_" [builtin(MapGetKeys)]
endmodule
module IMP-SYNTAX
imports BUILTINS
syntax KLabel ::= "_/_" [strict]
| "_+_" [strict]
| "_<=_" [seqstrict]
| "!_" [strict]
| "_&&_" [strict(1)]
| "{}"
| "{_}"
| "_=_;" [strict(2)]
| "if(_)then_" [strict(1)]
| "while(_)_"
| "__"
| "int_;_"
| "_,_" [assoc] // not sure: we may want an attribute to allow label take arbitrarily many arguments
| ".Ids" // we may want to make this a token
endmodule
###IMP Semantics
Recall that sorted variables belong to the syntactic category KItem
, which only means that they can appear anywhere a KItem
variable can appear, that is, anywhere except as a KLabel
, and that unsorted variables belong to either the KLabel
or the KItem
syntactic categories; this yields no ambiguity, because the context quickly tells which is which. The internal representation of the K definition can and probably should, of course, change the sort of each variable to whatever it wants. Our objective here was to have a very simple and efficient KORE syntax and parser. For clarity, we backtick all the KLabel
s and KToken
s, although a good parser should probably not need all these K brackets, hopefully even none:
module IMP
imports IMP-SYNTAX
rule isKResult(I:Int) => `true::Bool`
rule isKResult(B:Bool) => `true::Bool`
configuration <T>
<k> $PGM:Pgm </k> // or $PGM:K if one prefers to pass other syntactic categories (e.g., Stmt, AExp, etc)
<state> `.Map`(.KList) </state> // or a token, if we decide to make .Map a token
</T>
rule <k> X:Id => `lookup`(Sigma,X:Id) ...</k> <state> Sigma </state>
rule `_/_`(I1:Int,I2:Int) => `_/Int_`(I1:Int,I2:Int) when `_=/=Int_`(I2:Int,`0::Int`)
rule `_+_`(I1:Int,I2:Int) => `_+_`(I1:Int,I2:Int)
rule `_<=_`(I1:Int,I2:Int) => `_<=_`(I1:Int,I2:Int)
rule `!_`(T:Bool) => `notBool_`(T:Bool)
rule `_&&_`(`true::Bool`),B:BExp) => B:BExp
rule `_&&_`(`false::Bool`,_) => `false::Bool`
rule `{}`(.KList) => .K [structural]
rule `{_}`(S:Stmt) => S:Stmt [structural]
rule <k> `_=_;`(X:Id,I:Int) => .K ...</k> <state> Sigma => `update`(Sigma,X:Id,I:Int) </state>
rule `__`(S1:K,S2:K) => S1:K ~> S2:K [structural]
rule `if(_)_else_`(`true::Bool`,S:Stmt,_) => S:Stmt
rule `if(_)_else_`(`false::Bool`,Bool),_,S:Stmt) => S:Stmt
rule `while(_)_`(B:BExp,S:Stmt) => `if(_)_else_`(B:BExp, `{_}`(`__`(S:Stmt, `while(_)_`(B:BExp,S:Stmt))), `{}`(.KList)) [structural]
rule <k> `int_;_`((`_,_`(X:Id, Xs:Ids) => Xs:Ids), _) </k>
<state> Sigma:Map => `addBinding`(Sigma:Map, X:Id, `0::Int`) </state>
when `notBool`(`_in_`(X:Id, `keys`(Sigma:Map)))
rule `int_;_`(`.Ids`(.KList),S:Stmt) => S:Stmt [structural]
endmodule
#Kuestions, Koncerns, etc.
Below we compiled a list of questions that were directly or indirectly asked at various moments. Please feel free to ask more questions, even if you are not sure what you are asking precisely :) The questions help us all nail down the corner cases and perhaps even come up with something better, if needed.
###Question In case we decide to disallow the unescaped use of the special backquote symbol in user syntax, wouldn't it make sense to implement a simple bracket checker, to ensure that brackets are properly nested?
Answer: This would be very useful and easy to implement (linear?). Users would be given insightful error messages very quickly. Moreover, the checker can put the (nested) bracket contents in special data-structures to make the GeneriK and the inner parsers more efficient.
###Question
How can we use the new K bracket to tell the GeneriK parser that the entire contents of a rule should be considered as a bubble, and thus prevent it from matching rule
or syntax
or other of its keywords in that rule's contents?
Answer: Suppose that your rule is rule contents [attributes]
and that contents
happens to have no K brackets in it; then you can simply write the rule as rule `contents` [attributes]
. Now suppose that contents
happens to have some subterms containing brackets with both one and two backticks, say `k1`
and ``k2``
; then just use three backticks as a bracket for contents
in the rule: rule ```contents``` [attribute]
. And so on and so forth. The point is that once we have so easy to parse brackets in K, there is no need to have heredocs or other similar mechanisms to mark the rule contents.
###Question Aren't we now asking too much from the GeneriK parser? It now has to know how to parse MarkDown bubbles, which is kind of non-trivial.
Answer: Not really. Very little needs to change in the GenerK parser compared to, say, our previously discussed approach based on heredocs. First, note that you can obtain precisely the same effect as that obtained with heredocs by adding a bracket with enough backticks around the contents of the rule, in the same style as we did in the answer of the question above. Since MarkDown enforces only brackets with fewer backticks inside brackets with more backticks, scanning the contents of the rule is equally easy now to how it was with heredocs, and easier than how it would have been with ordinary nested parentheses. So nothing is lost in terms of capabilities, on the contrary.
One may argue that something is lost in terms of user-experience in the presence of errors, specifically that Generik errors may be now harder to detect. Consider, for example, a definition snippet of the form
syntax Exp ::= rule(Exp)
rule `_+_`(1,rule(2)) => 3 // rule appears unbracketed in K term
rule `_+_`(1,`rule(2)) => 3 // the close bracket forgotten in `rule(2)
rule `_+_`(1,"rule(2)") => 3 // rule appears in a string in K term
Each of the three rules above would yield admittedly horrible GeneriK parsing errors. However, with the capability that we can pretty-print the current definition at each step, one should be able to see what happened. Users will just have to learn that if they use GeneriK keywords in they syntax, then they just have to properly bracket the rules that use those keywords. The same happens if you forget to put the heredocs around such rules, so there isn't really much of a difference in terms of user-experience.
###Question The described approach of how GeneriK deals with brackets seems to work well only under the assumption that users cannot use unescaped backquotes in their syntax. Would it extent easily if we later on decide to allow users to use bacquotes in their syntax?
Answer:
We think so, but it of course depends on how we decide to handle the user backquotes, and what expectations we want our users to have (a user study may be needed). For example, we may want writers and especially readers of K definitions to have to worry only about their own language inside of bubbles (and the K constructs added to let them write rules). The GeneriK parser cannot always recognize brackets in precisely the same place as the inner parser (because it must not depend on the object language, so a token like "ab`cd"
which the inner parser treats as a unit may look like it opens or closes a bubble to the GeneriK parser). The reader may only want to look at the first character of the rule to see whether they may need to keep in mind the GeneriK parser's weaker lexical idea of brackets, in addition to the object language grammar. In that case, if the contents of the bubble begins with a backquote, the GeneriK parser finds the end of the bubble by parsing a sequence of K brackets and unbracketed text until it finds a GeneriK keyword outside of a bracket. If the bubble does not begin with a backquote, then the GeneriK parser doesn't try to find brackets starting later, and just ends the bubble wherever it appears a GeneriK keyword appears (which may even be inside a user-defined token like "a rule b"
, because it doesn't know the user syntax).
###Question
It is nice that with one reserved character, the backtick `
, we solved both the problem of brackets in K and the problem of bubbles in GeneriK. But does that mean that users cannot use the backtick in their syntax anymore? What if I want to have backtick-enclosed strings in my language? Wouldn't I run into a nasty syntactic conflict with K's brackets, which I wouldn't have with heredocs?
Answer: Yes, you may run into parsing conflicts if you use the backtick in your language's syntax, but note that the same would happen no matter what brackets you use for K! And we do need brackets in K. So this is a problem related to brackets, and not to heredocs. That is, by adopting heredocs for expressing rule bubbles would only solve the problem of rule contents possibly containing restricted keywords/characters for the GeneriK parser, but not in general for K, not even for KORE. We had exactly that problem previously with the K ::= "(" K ")"
brackets, which conflicted very badly with user-defined parentheses, especially when they were not defined as brackets in the user-defined syntax. This is a common problem with string-like or bracket-like notations. For example, if you want a string to contain the string delimiter as a character, then that character needs to be escaped (e.g., "\""
is the string containing only one character, "
). Therefore, we decided for the time being to adopt a similar escaping convention, and thus require users to escape all the backticks that are part of their syntax. For example, they would have to always write \`abc\`
for their backtick-enclosed strings. Ugly, but not completely unexpected. After all it is only one character that K asks from its users for itself, which is not that much to remember.
Nevertheless, some users will likely complain with the escaping solution above, because they want to see their syntax unaffected by artifacts of K. To keep such users happy, we believe that it would be nice to allow for local renaming of syntax in K, something like this:
...
rename K_BRACKET to $
...
rule $_+_$($`my_`::MyString$, $`string`::MyString$) => $`my_string`$ // exaggerating the use brackets on purpose
...
rename K_BRACKET to `
...
Regardless of what solutions we pick, I want to reemphasize that this problem is a problem of brackets and not of trying to replace the heredocs with something else. Having heredocs would not help at all this problem, because all the conflicts above would happen inside the heredoc anyway. And once we solve the problem for brackets, which we have to do anyway, we get a solution for our heredoc problem for free.
###Question
The bracket does not really solve the problem of KLabel
s, because `_+_`
can also parse as +
applied to two anonymous variables using the concrete syntax of the language, so the parser for Full K may report an ambiguity anyway in terms like `_+_`(1,2)
.
Answer: This is not a problem of KORE. It is a problem that the parser for Full K needs to deal with. A simple solution there is to have the Full K parser consider the KLabel
and KToken
constants to have the prefer
attribute. That means that if constants which can be KLabel
s or KToken
s are involved in any ambiguity, then we prefer them to be KLabel
s or KToken
s and not something else. The rationale behind this decision is that users are given other means to state their defined operations in Full K, for example using whitespaces or their explicit KLabel
representation.
If the prefer
solution above is adopted, then the number of bacticks appearing in rules will be significantly reduced anyway, because in that case there is only one way to parse _+_(1,2)
, the desired one.
###Question
Hold on, does that mean that if I have an operation in prefix notation whose KLabel
has the same name, then I will never get a chance to refer to my operation because the KLabel
takes over? For example:
syntax KLabel ::= "f"
syntax Exp ::= f(Exp) [klabel(f)]
rule f(3) => 0 // I want f::Exp->Exp here and not f::KLabel!
Answer: This is again a problem with Full K and not with KORE, but since KORE will be smoothly extended by Full K, and so everything working in KORE should work unchanged when we regard it as a Full K definition, we address this issue here, too. Yes, you are right, there does not seem to be any way to refer to your original f::Exp->Exp
with this convention. Moreover, you have a parsing ambiguity on f(3)
.
We can think of two ways to deal with this problem, which are open for further discussions.
One approach is that the Full K parser should not report ambiguities in which both parses yield the same AST (or KIL term); if there is no ambiguity anyway, why would you care whether f
parses as your original operation or as its label? Another approach is to regard the user productions as explicit mechanisms for making syntax that looks like an application, so maybe we can let those productions take priority over label application, especially because you can always write `f`(Args)
or maybe even ``f``(Args)
, if there's an explicit production f(Exps)
that prevents f(Args)
from being a label application (and perhaps the production f(Exps)
is explicitly assigned a different KLabel
- maybe fExp
to separate it from a similar-looking production in another sort).
BTW, if f
is used on a different number of arguments than declared, for example
syntax KLabel ::= "f"
syntax Exp ::= f(Exp) [klabel(f)]
rule f(3,4) => 0 // I want f::Exp->Exp here and not f::KLabel!
then an error should be reported, regardless of the fact that we currently allow KLabel
s to apply to as many arguments as they want to. Some people think that we should disallow that in K anyway, that is, that we should enforce each label to declare how many arguments it takes. But that is another discussion.
###Question
What would some other reasonable possibilities for representing basic (i.e., ones generated from user-defined concrete syntax productions) KLabel
s and KToken
s be? The concern here is that maybe the backticked notation gets in some unexpected conflicts with the user-defined syntax and the users have no escape mechanism, no matter how heavy that would be.
Answer: We are contemplating the possibility to allow both the lightweight backticked notation and a more heavyweight syntax. Exact lexical details still open, but something like this:
Lightweight | Heavyweight
------------------------------------------------------
`_+_::Exp*Exp->Exp` | #label("Exp*Exp->Exp","_+_")
`12::Int` | #token("Int","12")
Since we allow multi-backtick brackets now, the heavy notation may not be necessary after all. But we keep the posibility open.
Note a lightweight label and token can be distinguished by the arrow even if the label doesn't take arguments: pi::->Float
is a label, 12.0::Float
is a token.
###Question: Should there be a .Cells for empty cells? Should we allow rewrites to be written aside cells, as in
<k>'spawn(P)~>_</k> (.K => <thread><k>P</k></thread>)
(or maybe (.Cells => <thread><k>P</k></thread>)
Answer:
Not sure about this. We would like cells to be just K terms in the long run, in which case we would prefer .K
(which we will rename to .
probably) instead of .Cells
.
###Question:
Do we need a more elaborate syntax for declaring configurations and cells, or we can limit ourselves to KItem
s? How about modularity, where we want to name and reuse configuration fragments?
Answer: I think this can be addressed separately once we are done with the first version of KORE. That is, we can keep configurations and cells as are for now, using the XML notation which is easy to parse, and we can worry about the general case later. The general solution could, indeed, be to turn cell and configuration constructs into special KLabel
s, say ones with an attribute configuration
. For example:
syntax KLabel ::= "'<T>_</T>" [configuration]
syntax KLabel ::= "'<k>_</k>" [configuration]
Then the rewrite engine / backend should recognize these special labels and interpret them accordingly, for example by applying the configuration concretizers to them.
Anyway, we still have to discuss and design the modular configurations.
###Question:
If we reduce configurations and cells to ordinary KLabel
s like above, then don't we need to allow rewrite rules to have the syntax KList ::= KList "=>" KList
in order to allow rewrites that take sets of cells to sets of cells?
Answer: Hm, maybe. We have to think about this some more.
###Question: Shouldn't we keep all the syntax declarations in KORE like in K, so that we can use the priority, prefer, bracket, etc., information for unparsing?
Answer: Probably yes.
###Question
Don't we need static cast KItem "::" Sort
in KORE?
Answer: Doesn't seem so. The syntactic casts are useful in the frontend to tell the concrete syntax parser which KLabel
to generate for the parsed concrete syntax, but once all the KLabel
s are in place we don't need the syntactic casts anymore.
###Question:
Rules of the form rule k1 => k2[X]
turned out to be a source of problems until now, because [X]
is interpreted as the optional attribute of the rule and sometimes you want it to be interpreted as concrete language syntax, for example and array index.
Answer: Yes, that is a problem, but a problem of the frontend and not of KORE. The current solution in the front end is to use parentheses: rule (k1 => k2[X])
. But we should at the same time improve the frontend to give good error messages. And we should allow the GeneriK parser to show its output to the user, so one would immediately know when [X]
is parsed as an attribute.
###Question:
How about rules between KLabel
terms, as we can do with our current concrete parser, for example:
syntax KLabel ::= "foo" | "bar" | "baz"
rule foo => bar
rule Context[(bar => baz)(kl)] // kl is some KList term, and Context some K holed context
Answer: We found that in all cases we looked at we can complete/change the rules to apply between K terms:
rule foo(KL:KList) => bar(KL:KList)
rule Context[bar(kl) => baz(kl)]
We need to develop such automatic translators as part of the frontend.
###Question:
So now we can wrap a KLabel
as a KToken
and then pass it around like any KItem
. But the whole point of KLabel
s is that they can be applied to KList
s. How can you apply a wrapped KLabel
, say #klabel('_+_)
, to a KList
?
Answer:
We can easily define a KLabel
, say #kapply
, together with a rule
rule #kapply(#klabel(L:KLabel), Kl:KList) => L:KLabel(Kl:KList)
which can now be used to apply KLabel
s represented as KToken
s to KList
s. For example, if you want to apply the wrapped label '_+_
to the list 1,2
, then you can write #kapply(#klabel('_+_),1,2)
. However, as explained below in answers to other questions, you may want to also wrap the KList
s that you want to pass around with a KLabel
, say #klist
. In that case, you would write the rule above as
rule #kapply(#klabel(L:KLabel), #klist(Kl:KList)) => L:KLabel(Kl:KList)
and the particular example as #kapply(#klabel('_+_),#klist(1,2))
###Question:
Can we support complex constructors for KLabel
s in KORE, like we can do in Full K? For example:
syntax KLabel ::= foo(KLabel)
rule Context[foo(L:KLabel)(KL:KList)]
syntax KLabel ::= bar(KList)
rule Context[bar(KL1:KList)(KL2:KList)]
Answer: No problem. One way to do it is to regard them as ordinary KLabel
s that yield KItem
s meant to be (wrapped) KLabel
s, and then use #kapply
(and #klist
) as we did above. For example:
syntax KLabel ::= "foo"
rule Context[#kapply(foo(#klabel(L:KLabel)), #klist(KL:KList))]
syntax KLabel ::= bar(KList)
rule Context[#kapply(bar(#klist(KL1:KList)), #klist(KL2:KList))]
So #kapply
needs not always be used as a "function"; it can also be used as a "constructor". I quote these words because KORE does not provide any specific support for them; they are so only in our mind, from a KORE perspective they are just KLabel
s which are matched and rewritten.
###Question: KORE should support anything we want to do in Full K, such as, for example, "functions" of the form
syntax KList ::= rev(KList)
rule rev(K,KL) => rev(KL),K
rule rev(.KList) => .KList
How can we encode this in KORE?
Answer: Again, same basic principle like above: we add the various operations as KLabel
s, and then we add corresponding rules between corresponding KItem
terms. For some of them we also need to get some special/additional labels involved, like #klist
, #kapply
, etc. For example:
syntax KLabel ::= "rev"
rule rev(#klist(K:K,KL:KList)) => #klist.snoc(rev(KL:KList),K:K)
rule rev(#klist(.KList)) => #klist(.KList)
where #klist.snoc
(snoc
is the reverse of cons
) is a special KLabel
"function" defined as follows:
rule #klist.snoc(#klist(Kl:KList),K:K) => #klist(Kl:KList,K:K)
Note that we cannot simply make rev
a KLabel
and nothing else, because then we have strange situations like this one that Traian noticed: rev(rev(1,2)) => rev(1,2) => 2,1
. The trick with #klist
achieves the desired innermost evaluation strategy. We will have to find smart ways to automatically generate such KORE rules from the original Full K rules as part of the frontend.
###Question One of the major claims of K is that it offers both the core and the meta-level (in Maude terminology) at the same, core level. If KORE is powerful enough, then it should be able to define generic up and down functions that take a K term and lifts it to its meta-representation (also a K term), and, respectively, take a meta-representation of a K term down to a K term.
Answer: Here it is. The code below is manually adapted, to fit the new KORE, from a K working definition (on March 30, 2014), so there could be typos:
module META_K
syntax KLabel ::= "#kapply" [function]
rule #kapply(#klabel(L:KLabel),#klist(Kl:KList)) => L:KLabel(Kl:KList)
syntax KLabel ::= "#klist.cons" [function] | "#kseq.cons" [function]
rule #klist.cons(K:K,#klist(Kl:KList)) => #klist(K,Kl:Klist) // sorting K:K needed, otherwise default sort would be KList
rule #kseq.cons(K:KItem,#kseq(Kl:KList)) => #kseq(K:KItem,Kl:KList)
syntax KLabel ::= "#ktoken" | "#kapp" | "#klist" | "#kseq" // constructors
syntax KLabel ::= "#upKItem" [function] | "#upK" [function] | "#upKList" [function]
syntax KLabel ::= "#down" [function]
rule #upKItem(T:KToken) => #ktoken(T:KToken)
rule #upKItem(L:KLabel(Kl:KList)) => #kapp(#klabel(L:KLabel),#upKList(#klist(Kl:KList)))
rule #upKList(#klist(K:K,Kl:KList)) => #klist.cons(#upK(K:K),#upKList(#klist(Kl:KList)))
rule #upKList(#klist(.KList)) => #klist(.KList)
rule #upK(K:KItem ~> K':K) => #kseq.cons(#upKItem(K:KItem),#upK(K':K))
rule #upK(.K) => #kseq(.KList)
rule #down(#ktoken(T:KToken)) => T:KToken
rule #down(#kapp(#klabel(L:KLabel),MKl:KItem)) => #kapply(#klabel(L:KLabel),#down(MKl))
rule #down(#klist(.KList)) => #klist(.KList)
rule #down(#klist(K:K,Kl:KLabel)) => #klist.cons(#down(K:K),#down(#klist(Kl:KList)))
rule #down(#kseq(.KList)) => .K
rule #down(#kseq(Ki:KItem,Kl:KList)) => #down(Ki:KItem) ~> #down(#kseq(Kl:KList))
endmodule
###Question:
If we can use KAST for serialization between compiler passes, how does it preserve calculated information on AST nodes that later compiler passes might need?
- Brandon: We agreed that any information can be expressed as attributes, but it's not clear what syntax could work, even if we don't try to provide the term-metadata syntax in the concrete syntax.
We allow the text meant to parse as K to use backticks `
. A backtick tells the GeneriK parser that the next character is part of the K bubble, so it should not treat it as part of any of its kewords. This feature is useful when the defined language has syntax that overlaps with that of GeneriK, and thus may confuse the GeneriK parser with regards to where the bubbles start and where they end. For example, the GeneriK parser should parse the following module
module TRY
syntax Sort ::= "rule" | "syntax" | "configuration"
configuration <T> <k> $PGM </k> </T>
rule <k> (`rule => configu`ration) ~> `syntax ~> rule ...<k> [attribute]
endmodule
as something like
module TRY
syntax Sort ::= "rule" | "syntax" | "configuration"
configuration GeneriK_BUBBLE(<T> <k> $PGM </k> </T>)
rule GeneriK_BUBBLE(<k> (`rule => configu`ration) ~> `syntax ~> )
rule GeneriK_BUBBLE(...<k>) [attribute]
endmodule
and a corresponding error message will be then given by the bubble parser (full K or the KORE). Once one adds a backtick before any of the four characters of the last rule
token, say before the last one, then the module would be parsed by the GeneriK parser as
module TRY
syntax Sort ::= "rule" | "syntax" | "configuration"
configuration GeneriK_BUBBLE(<T> <k> $PGM </k> </T>)
rule GeneriK_BUBBLE(<k> (`rule => configu`ration) ~> `syntax ~> rul`e ...<k>) [attribute]
endmodule
end eventually kompiled correctly. In practice, we will probably put the backtick at the beginning of the token, but it is simpler and more general to allow it anywhere.