Skip to content
This repository has been archived by the owner on Feb 1, 2020. It is now read-only.
grosu edited this page Oct 28, 2014 · 6 revisions

This is an old page; see https://github.com/kframework/k/wiki/KAST-and-KORE.

#Introduction

This proposal is about the syntax of KAST, which is the fragment of K where sentences only use K-style ASTs, that is, they do not use concrete syntax. Recall that K in its full generality requires a two-phase parser: one parsing the syntax declarations and generating a parser for the rules (or contexts, macros, etc.), and the other parsing the rules using the parser generated in phase one. KAST is a fragment of K meant to be very easy to parse, eliminating the need for a two-phase parser. From a user perspective, the difference between K and KAST is that instead of writing terms which need a concrete-syntax parser to parse, such as A + B, one writes terms using the KAST syntax, such as _+_(A,B) (or, to avoid ambiguities, `_+_`(A,B), where `_` is the bracket of K; this is explained below).

The idea is that KAST should be nothing but a textual reflection of the data-structure instance (a Java object) holding the K definition, the same way the KAST of a program is nothing but a textual reflection of the AST data-structure instance (also a Java object) of that program. We are going to have a trivial "loader" of a textual KAST definition, which in principle is a parser, but a very simple one, much simpler and faster than the current K parser.

  • Dorel: Here we speak for the first time of Java classes (implementation). These classes and their relation with KAST is clear for familiar with the current version of K, bur for a beginner it could be obscure. I think that KAST is defined independent of its implementation but having in mind an OO (object-oriented) implementation such that syntactical constructs in KAST are in correspondence 1-1 with the classes from the implementation. It would be interesting to see if we can make transparent this relationship (may be in a different document).

There are LOTS of advantages for doing this. For example, this gives us a convenient separation of concerns, where the complex K parser for concrete syntax is completely separated out from the kompile tool, so both of them can then be designed and improved in isolation. Further, we hope that the syntax of KAST will be simple enough that we will not need to change it as we change the support for parsing in K, so middle-end and back-end developers will not depend on other crazy syntactic tricks that we may apply to the front-end (e.g., sort inferencers, syntactic conventions for builtins like using X |-> I in cells, etc.).

The Outer Syntax

Let us start with the generic syntax of K, that is, the K syntax for defining language syntax and semantics which is independent upon the particular language that is being defined. We refer to this syntax as the outer syntax. For simplicity, we do not include the syntax for comments or for documentation, because these can be designed separately; in general, we assume usual many-line comments surrounded with /* and */, and single-line comments starting with //, and allow them to appear anywhere in a definition; also, documentation is added by using /*@ (for main text) or /*! (for preamble) instead of /*, and similarly //@ or //! instead of //. A parser for the outer syntax would put all the K terms into "bubbles". In what follows, we adopt the K notation for defining syntax, extended with a few constructs which we may adopt in K, too, such as Sort1 | Sort2 for on-the-fly disjunction of two sorts, List{Sort}{separator} for non-empty separator-separated lists of elements of sort Sort, etc.:

syntax String ::= usual double-quoted strings
syntax NAME ::= any sequences of characters; the context may further
                restrict them (e.g., variables/sorts start with capital
                letter, etc.); we are well aware that is is just too
                little said to get us any close to an actual parser,
                but we prefer to keep it this way for now, so we can
                focus on naming conventions (for modules, for files,
                for labels, etc.) later.

syntax K   // The outer syntax parser does not care about the concrete
           // or abstract syntax; all syntactic terms are parsed as
           // K "bubbles" at this stage.

syntax Definition ::= Requires Modules
syntax Requires ::= List{Require}{""}{""}
syntax Require ::= "require" String
syntax Modules ::= List{Module}{""}{""}
syntax Module ::= "module" NAME Imports Sentences "endmodule"
syntax Imports ::= List{Import}{""}{""}
syntax Import ::= "import" NAME
syntax Sentences ::= List{Sentence}{""}{""}
syntax Sentence ::= Configuration | Syntax | Context | Rule

syntax Configuration ::= "configuration" K OptionalAttributes 
                  // cells/configurations represented as K terms
syntax Syntax ::= "syntax" Sort OptionalAttributes
                  // sort declarations, possibly with attributes
                | "syntax" Sort "::=" List{Production}{"|"}
                  // non-empty list
                | "syntax" Sort "::=" SpecialProduction OptionalAttributes
                  // at most one allowed for a Sort
                // We should also keep the operator priority declarations
                // in KAST; I'll add those as soon as we are done with
                // the first round of the design.
syntax Sort ::= NAME
syntax Production ::= SymbolsOrToken OptionalAttributes
syntax SymbolsOrToken ::= Symbols | Token
syntax Symbols ::= List{Symbol}{""}   // non-empty list
syntax Symbol ::= String | Sort
syntax Token ::= "Token" "{" String "}"
                 // with default follows restrictions
               | "Token" "{" String "}" "{" String "}"
                 // with custom follows restrictions
syntax SpecialProduction ::= List
syntax List ::= "List" "{" Sort "}" "{" String "}" "{" String "}"
                // Like for Token, we may add shortcuts for lists
                // with default unit (.Sorts)
                // and/or default separator ('_,_); but this is not
                // urgent, it can wait.

syntax Context ::= "context" K OptionalAttributes
syntax Rule ::= "rule" OptionalName K OptionalAttributes

syntax OptionalName ::= "[" NAME "]" ":"
                // has priority over user-defined syntax

syntax OptionalAttributes ::= "" | "[" Attributes "]"
syntax Attributes ::=  List{Attribute}{","}
                       // (non-empty list) Not sure about this,
                       // e.g., Latex macros, etc.  Still open.
syntax Attribute ::= NAME AttributesOrNot
                   | String
                     // E.g., unstructured Latex macros, etc.
syntax AttributesOrNot ::= "" | "(" Attributes ")"

The K terms marked as "bubbles" by the outer parser need to be further parsed with the inner parser (either KAST or Concrete). Note that each such K term bubble appears in a construct that also allows optional attributes. One of those attributes can, for example, state the specific parser to be used to parse the corresponding K term bubble.

  • Brandon: We discussed attributes quite a bit. The most relevant conclusion is that because the basic parser is already implemented and works, we can implement and test KAST for most serialization purposes without needing to redesign the syntax. Second, we all like the idea of doing more structured parsing eventually. Dwight is currently cleaning up attribute-handling code, and will tell us next week what kind of parsing attributes currently do, so we'll have a more grounded idea. Third, we think Latex and everything can be handled nicely just by adding syntax Attribute ::= String | MarkdownString to the above, so unstructured leaf nodes can be written as doublequoted or backticked strings. However, this will certainly require fixing all latex attributes, if only to add two backticks, and Dwight will have more information next week so we're not quite ready to propose this.

  • Grigore: There is no basic parser anymore :). We should, however, take this opportunity and redesign the outer parser as well, to fit the syntax above, and while doing so, to also refine the syntax until we reach something we like. The fact that we have something implemented should not be used as an argument to not improve it. I added String as an option for Attribute above. However, I don't think that we should allow MarkdownString, because we are going to use the backticks as "the brackets of K", and the brackets are not meant to allow unstructured or unparsable things inside.

####Important Note

The outer parser should be as simple and efficient as possible. It should not attempt to parse any K terms at all (except for taking into account the K brackets, explained below) and it should give very nice error messages. The error messages will be, of course, language-independent; that's because the syntax of the defined language will not be known to the outer parser. The inner parser will only be concerned with parsing the K term bubbles (possibly in parallel).

####Important Restriction

The users therefore add their custom names (modules, sorts, terminals, etc.) by two basic means: NAMEs and Strings. In order to keep parsing efficient and error messages easy to understand, for the time being we are going to reserve one character to K: the bactick `. The backtick will be used by K for grouping, e.g., `1+2 => 3` ~> HOLE*4 ~> ..., and K's grouping must unambiguously work regardless of the particular syntax of the defined language. It is still possible to use backtick in your language syntax, but each use of it must be escaped with a \ in the K definitions. For example, abc\`def is how to refer to the name abc`def in one's definition, abc\\`def is how to refer to abc\`def, etc., and similarly for backticks appearing in strings. There are ways to avoid escaping the backtick (see the Kuestions below), which will probably be eventually supported if there is sufficient user pressure, but for now let us simply assume that the backtick is reserved to K.

  • Brandon: Didn't multiple backticks replace every other way of escaping backticks?

  • Grigore: No. We use the backticks as possibly nested brackets and we do not want to worry about dealing with nested backticks which are not brackets. Once this becomes clear, I will make it into a Kuestion. See also the existing Kuestions and let me know if they still don't clarify this issue.

#Inner KAST Syntax

KAST adds to the outer syntax a more specific syntax for K terms, but one which is still very easy to parse (without requiring 2 phase parsing). Specifically, it adds the following: the abstract syntax of K terms; support for variables, sort memberships and cells; the rewrite relation _=>_; support for brackets `_` (i.e., K's way of grouping); and the HOLE. We are likely going to use the K brackets (grouping) more than needed in what follows; hopefully the new parser will be smart enough to not need all these explicit grouping, but we prefer to overuse the K brackets here because we want to thoroughly illustrate how they work.

Abstract Syntax of K Ground Terms

Here is the abstract syntax of K ground terms, which can, in particular, be used to represent ASTs of programs:

syntax KLabel ::= NAME
                  // only constants; see below for details on KLabel
                  // names corresponding to concrete syntax.
syntax KToken ::= NAME
                  // basic tokens; see below for details on KToken
                  // names corresponding to concrete syntax.
                | #klabel(KLabel)
                  // allows to regard KLabels as tokens, to pass around.
syntax KItem ::= KToken  |  KLabel "(" KList ")"  // generic
syntax KList ::= List{K}{","}{".KList"}           // generic
syntax K ::= List{KItem}{"~>"}{".K"}              // generic

The biggest change is that atomic values like tokens are now directly subsorted into KItem, rather than into KLabel and then applied to an empty list like it used to be. There are only constant KLabels and KTokens, and there is no other way to add more of these except by adding them as constants. Some of the KLabels can be tagged with the attribute function, but that changes nothing in what regards KORE parsing. Since rewrite rules will only be allowed to rewrite K terms to K terms (see below), functions are implicitly only allowed to take KLists and return K terms.

  • Brandon: this grammar should describe the heavyweight forms of labels and tokens

  • Grigore: Why? The above is enough to allow users to define any language they want. For example:

    syntax KToken ::= "zero"
    syntax KLabel ::= "succ" | "plus"
    rule plus(zero,N) => N
    rule plus(succ(M),N)) => succ(plus(M,N))
    

    I understand your concern, namely that you may have an infinite number of tokens (and even of labels), and so we need some generic, heavy mechanism to encode them. Well, I don't think we need. In any given semantics we are going to only have a finite number of them, which we can simply define as constants. For example

    syntax KToken ::= "0" | "main" | "Object"
    

    Yes, programs can use infinitely many of them, but I still fail to see where the problem is. The parser will know how to deal with them. Remember that we are going to keep the syntax declarations of the form

    syntax KToken ::= Token{[1-9][0-9]*}
    

    as a means to define infinitely many KToken constants (and we may even adopt a similar one if we decide we want infinitely many KLabel constants, too). As a parenthesis, we should also fix the representation problem for tokens that we have right now, where we make a distinction between "main" and Token{main}. There should be no difference. They should just be both the same constant token. I would prefer the lighter notation, generalized with regular expressions. That is, syntax Token ::= "[1-9][0-9]*", etc. And this should work not only for constants of sort KToken, but for constants of any sort, that is,

    syntax Number ::= "[1-9][0-9]*"
    

    should generate tokens and not KLabels. In other words, tokens are regarded as special labels that take no arguments, which we prefer to write as t instead of t(.KList). Finally, if we want to also have follows restrictions for a token, then we can use attributes for those.

Special KLabels and KTokens Corresponding to Concrete Syntax

Recall that K allows concrete syntax productions and rules of the form:

syntax Int ::= Token{"[1-9][0-9]*"}
syntax Exp ::= Int
             | Exp "+" Exp
syntax Ints ::= List{Int}{","}{".Ints"}
syntax Exps ::= List{Exp}{","}{".Exps"}
rule 0 + E => E
rule E + 0 => E

but that all of these eventually yield KTokens and KLabels, which then can be used to write semantic rules. What are the names of these special KTokens and KLabels? Any names would work, provided that they contain the sequence of characters that the KToken or the KLabel comprises, as well as the sort/type information. For uniformity and for conceptual back-compatibility with the K concrete parser, we propose the following syntax for both KLabels and KTokens:

NAME "::" Type

where we will refer to NAME as the root of the token or label (escape mechanisms will be needed to deal with whitespaces, double colons, etc.) and where Type is either Sort (for tokens) or Sort1*Sort2*....*Sortn->Sort (with no spaces) for labels. Additionally, for each label label::Sort1*Sort2*...*Sortn->Sort we also automatically add its root label label to KLabels, which for all rewriting purposes can be regarded as the top-lattice variant of the corresponding operation, that is, label:K*K*...*K->KItem; conceputally, it stands for all the overloaded operations with the same root. For example, the above would be equivalent to the following KTokens, KLabels and rules:

syntax KToken ::= "[1-9][0-9]*::Int"  // or Token{[1-9][0-9]*::Int}
syntax KLabel ::= "_+_::Exps*Exps->Exps"
                | "_+_"
                | "_,_::Ints*Ints->Ints"
                | "_,_"
                | ".Ints::->Ints"
                | ".Ints"
                | "_,_::Exps*Exps->Exps"
                | ".Exps::->Exps"
                | ".Exps"
rule _+_::Exps*Exps->Exps(0::Int, E:K) => E:K
rule `_+_::Exps*Exps->Exps`(E:K, `0::Int`) => E:K
  // As a variation, we used K's brackets (described below).
  // These can be useful to avoid ambiguities.

These rules can also be written as (see the discussion below):

rule _+_(0::Int, E:K) => E:K
rule `_+_`(E:K, `0::Int`) => E:K

The "::" notation is consistent with the static cast convention in Full K.

There is a subtlety regarding overloaded KLabels, which in fact determined the decision to add the root labels as well, but that is more a concern of the backend rewrite engines. Suppose that you want to define a function length for lists of expressions, here using concrete syntax:

syntax Int ::= length(Exps)  [function]
rule length(.Exps) => 0
rule length(E:Exp,Es:Exps) => 1 + length(Es)

The frontend will need to be smart enough to add a common KToken for the unit of the overloaded list, say ._,_, and add the ktoken(._,_) attribute to both .Exps and .Ints, and then translate the above into the following KAST (provided that we continue to work with cons-lists; things would be different with associative lists):

syntax Int ::= length(Exps)  [function]
rule `length`(`._,_`) => `0::Int`
  // using K's brackets (backticks) for clarity (explained below)
rule `length`(`_,_`(E:Exp,Es:Exps)) => `_+_`(`1::Int`, `length`(Es))

Since the rewrite engine calculates the least sort of a term anyway, the above rules would safely work, because _,_ without sorting information is the same as _,_::K*K->K.

Variables and Sort Memberships

Adding support for dynamic sorting/casting and variables:

syntax Variable ::= NAME
                    // possibly restricted to start with capital letter

syntax KLabel ::= Variable ":" KLabel
syntax KItem  ::= Variable
                | KItem ":" Sort  // KItem must obey its sort at runtime
                  // sorting arbitrary KItems; Sort can also be
                  // K, KItem, KToken, but not KList

In KAST, we want all the variable occurrences to be sorted. Although our syntax above allows for unsorted variables to keep the number of productions small, the KAST parser will reject definitions where variables are not unsorted. Also, the same variable should have the same sort in all occurrences in the same sentence. If we want to not enforce a runtime sort membership check for a variable, for performance reasons, then all we have to do is to sort is with the sort K, e.g., T:K. Since any term has the sort K anyway, backends will likely not insert an unnecessary runtime check in this case.

Note that the syntax above suffices to declare variables of any of the sorts KItem, KList and K! This may look wrong or tricky at first sight, because one would expect a term/variable sorted as a KList to be a KList term and not a KItem, but it is in fact correct; that's because all those sorts include KItem. For example, the term L:KLabel(KL1:KList,K:KItem,Kl2:KList) is a well-formed KItem term. It is likely that a backend stores and handles Kl:KList as a term of sort KList, but that is an orthogonal issue; our aim here is to devise a simple syntax for KORE, independently from how backends may store or handle it.

##Cells and Configurations

In the long run we want cells and configurations to be embedded into K the same way we decided to embed the builtins (integers, maps, etc.). Probably <k> T </k> will become something of the form <k>_</k>(T), or to avoid ambiguities, `<k>_<.k>`(T). However, for now we decided to make a notational exception and allow cells to be written in an XML-ish notation in KAST (this is orthogonal to how cells are represented in the KAST data-structures):

syntax K ::= Cells
syntax Cells ::= List{Cell}{""}   // non-empty list
syntax Cell ::= "<" NAME OptionalTags ">" K "</" NAME ">"
                // the two NAMEs should be the same
syntax OptionalTags ::= List{NAME "=" String}{""}{""}
  • Brandon: We need to allow rewrites and bracketing as entries in a Cells

  • Grigore: Right ... and I don't like it. Maybe we should take this as an opportunity to also make the cells be K terms, and thus allow cells to be normally passed and handled like any other K terms?

##Rewriting and HOLE

For now, we believe that allowing rewrites only between terms of sort K should be sufficient:

syntax K ::= K "=>" K     // should be greedy
           | "`" K "`"    // bracket of K, used for grouping;
                          // multi-backticks allowed (discussed shortly)
           | "HOLE"       // further restricted, only in some contexts
           | K "requires" K
           | K 'ensures" K
syntax KLabel ::= "`" KLabel "`"  // bracket for KLabels

A rewrite arrow =>, or a requires, or an ensures construct is not allowed to appear as a descendant of any other rewrite arrow, or requires or ensures construct. Any rule must contain one or more rewrite arrow constructs =>, and the requires or ensures constructs can only appear at the top of the K term in the rule (in any order and with any multiplicities). For example, the following is a correct rule at this stage (additional checks may reject it for other reasons):

rule L1(KL1,`K1 => K1'`, KL2, L2(K2 => K2'))
  requires K3
  ensures K4
  requires K5
  ensures K6

The ensures construct is typically useful to state constraints over fresh variables introduced by the right-hand-side of the rule.

Note that we do not want to add parentheses as a bracket in KAST, because we know it that those can be in conflict with parenthesis constructs in the user-defined syntax; we were not able to handle the syntax of Scheme, for example, before.

  • Brandon: HOLE should probably be a KItem like other variables, putting it here disallows freezing one item out of a KSequence as in label(HOLE~>Rest).

  • Grigore: I don't understand this. label(HOLE~>Rest) is a KItem when HOLE is a K term, so I don't see what the problem is here.

K Brackets

To avoid conflicts with user syntax and to allow nested brackets, we plan to adopt the Markdown code convention for brackets. In short, that says that you can use arbitrarily many backticks as open bracket and precisely the same number of backticks as close bracket, and the nested brackets should use fewer ticks. For example, `` `_+_`(1,2) => 3 `` represents a top bracket enclosing a rewrite rule whose LHS is the the label _+_ applied to 1 and 2 and whose RHS is 3.

Approximating the convention with context free productions, K brackets include the following

syntax K ::= "`" K "`" | "``" K "``" | "```" K "```" | ....

The rationale for adopting Markdown's convention is twofold:

  • It is a widely used approach to mark bubbles of text within text, designed by experts in this area with millions of users in mind. There are not too many complaints posted on the internet about Markdown's backtick convention, so it is probably quite good. We need precisely the same characteristics for our brackets/bubbles in K, so why reinvent the wheel? Our best shot here is to take over Markdown's convention as is, without changing anything in it, because any change we make can have unexpected consequences. Think about it this way: is there anything that Markdown tried to achieve with its bubbles that we would't need? I cannot think of anything. We want a bit more than Markdown, though, namely to have nested brackets (i.e., bubbles in bubbles), but that does not seem to be a problem because we can just apply the same mechanism inside the bubbles.

  • It only reserves one character, the backtick, so users will only need one thing to worry about (escaping that one character only); this is obviously a consequence of the overall good design of Markdown.

Brackets and the Outer Parser

The outer parser will consider brackets when parsing a K bubble. This allows the object language syntax to contain keywords like rule which would otherwise end the bubble, by enclosing them in a bracket.

The outer parser scans the text immediately following the current outer syntax keyword until the next keyword is encountered, skipping all the brackets. The beginning and the end of brackets are very easy to identify because they consist of the same number of backticks; since nested brackets should contain fewer backquotes, the text inside the bracket can be very effectively skipped.

#Example: IMP in KAST

Below is IMP defined in the KAST language above, including all the builtins that it needs. For simplicity, it uses the root-only syntax for KLabels, namely _+_ instead of _+_::Exp*Exp->Exp.

###IMP Syntax and Builtins

We can define the syntax either using sorts and productions, which are easier to read and hold more information that may be useful in some backends, or using only KLabels, which are more minimal but also lose sort information.

Here is the syntax using sorts and productions. This is almost identical to the IMP syntax in K, except for purely parsing-related features, such as bracket productions, operation priorities, prefer and avoid attributes, etc.:

module BUILTINS
  syntax Int                           [builtin(Integer)]
  syntax Bool                          [builtin(Boolean)]
  syntax Id                            [builtin(Identifier)]
  syntax Int  ::= Int "/Int" Int       [builtin(IntegerDivision)]
                | Int "+Int" Int       [builtin(IntegerAddition)]
  syntax Bool ::= Int "<=Int" Int      [builtin(IntegerLeq)]
                | Int "=/=Int" Int     [builtin(IntegerNeq)]
                | "notBool" Bool       [builtin(BoolNot)]
  syntax Map  ::= ".Map"               [builtin(MapEmpty)]
                | lookup(Map)          [builtin(MapLookup)]
                | update(Map,K,K)      [builtin(MapUpdate)]
                | addBinding(Map,K,K)  [builtin(MapAddBinding)]
  syntax Bool ::= K "in" Map           [builtin(MapMembership)]
  syntax K    ::= keys(Map)            [builtin(MapGetKeys)]
endmodule

module IMP-SYNTAX
  imports BUILTINS
  syntax AExp  ::= Int | Id
                 | AExp "/" AExp                         [strict]
                 | AExp "+" AExp                         [strict]
  syntax BExp  ::= Bool
                 | AExp "<=" AExp                        [seqstrict]
                 | "!" BExp                              [strict]
                 | BExp "&&" BExp                        [strict(1)]
  syntax Block ::= "{" "}"
                 | "{" Stmt "}"
  syntax Stmt  ::= Block
                 | Id "=" AExp ";"                       [strict(2)]
                 | "if" "(" BExp ")" Block "else" Block  [strict(1)]
                 | "while" "(" BExp ")" Block
                 | Stmt Stmt
  syntax Pgm ::= "int" Ids ";" Stmt
  syntax Ids ::= List{Id}{","}{".Ids"}
endmodule

Here is the KLabel alternative of the syntax above, which may be preferred in some cases:

module BUILTINS
  syntax Int                        [builtin(Integer)]
  syntax Bool                       [builtin(Boolean)]
  syntax Id                         [builtin(Identifier)]
  // Should we also preserve the subsorting relation, just in case
  // the rewrite engine needs to calculate the least sort of a term
  // dynamically?  It would be nice to not have to, if possible.

  syntax KLabel ::= "_/Int_"        [builtin(IntegerDivision)]
                  | "_+Int_"        [builtin(IntegerAddition)]
                  | "_<=Int_"       [builtin(IntegerLeq)]
                  | "_=/=Int_"      [builtin(IntegerNeq)]
                  | "notBool_"      [builtin(BoolNot)]
                  | "lookup_"       [builtin(MapLookup)]
                  | "update_"       [builtin(MapUpdate)]
                  | "addBinding_"   [builtin(MapAddBinding)]
                  | "_in_"          [builtin(MapMembership)]
                  | "keys_"         [builtin(MapGetKeys)]
  syntax KToken ::= ".Map"          [builtin(MapEmpty)]
                    // All constant labels can be tokens
endmodule

module IMP-SYNTAX
  imports BUILTINS
  syntax KLabel ::= "_/_"           [strict]
                  | "_+_"           [strict]
                  | "_<=_"          [seqstrict]
                  | "!_"            [strict]
                  | "_&&_"          [strict(1)]
                  | "{_}"
                  | "_=_;"          [strict(2)]
                  | "if(_)then_"    [strict(1)]
                  | "while(_)_"
                  | "__"
                  | "int_;_"
                  | "_,_"           [assoc]
                    // not sure: we may want an attribute to allow label
                    // take arbitrarily many arguments.
  syntax KToken ::= "{}"
                  | ".Ids"
endmodule

IMP Semantics

Recall that sorted variables belong to the syntactic category KItem, which only means that they can appear anywhere a KItem variable can appear, that is, anywhere except as a KLabel, and that unsorted variables belong to either the KLabel or the KItem syntactic categories; this yields no ambiguity, because the context quickly tells which is which. The internal representation of the K definition can and probably should change the sort of each variable to whatever it wants. Our objective here was to have a very simple and efficient KAST syntax and parser. For clarity, we backtick all the KLabels and KTokens, although a good parser should probably not need all these K brackets, hopefully even none:

module IMP
  imports IMP-SYNTAX
  rule isKResult(I:Int) => `true::Bool`
  rule isKResult(B:Bool) => `true::Bool`

  configuration <T>
                  <k> $PGM:Pgm </k>
                    // or $PGM:K if one prefers to pass other syntactic
                    // categories (e.g., Stmt, AExp, etc)
                  <state> `.Map` </state>
                </T>

  rule <k> X:Id => `lookup`(Sigma,X:Id) ...</k> <state> Sigma </state>
  rule `_/_`(I1:Int,I2:Int) => `_/Int_`(I1:Int,I2:Int)
    requires `_=/=Int_`(I2:Int,`0::Int`)
  rule `_+_`(I1:Int,I2:Int) => `_+_`(I1:Int,I2:Int)
  rule `_<=_`(I1:Int,I2:Int) => `_<=_`(I1:Int,I2:Int)
  rule `!_`(T:Bool) => `notBool_`(T:Bool)
  rule `_&&_`(`true::Bool`),B:BExp) => B:BExp
  rule `_&&_`(`false::Bool`,_) => `false::Bool`
  rule `{}`(.KList) => .K   [structural]
  rule `{_}`(S:Stmt) => S:Stmt   [structural]
  rule <k> `_=_;`(X:Id,I:Int) => .K ...</k>
       <state> Sigma => `update`(Sigma,X:Id,I:Int) </state>
  rule `__`(S1:K,S2:K) => S1:K ~> S2:K   [structural]
  rule `if(_)_else_`(`true::Bool`,S:Stmt,_) => S:Stmt
  rule `if(_)_else_`(`false::Bool`,Bool),_,S:Stmt) => S:Stmt
  rule `while(_)_`(B:BExp,S:Stmt)
    => `if(_)_else_`(B:BExp,
                     `{_}`(`__`(S:Stmt,`while(_)_`(B:BExp,S:Stmt))),
                     `{}`)   [structural]
  rule <k> `int_;_`((`_,_`(X:Id, Xs:Ids) => Xs:Ids), _) </k>
       <state> Sigma:Map
            => `addBinding`(Sigma:Map, X:Id,`0::Int`) </state>
    requires `notBool`(`_in_`(X:Id, `keys`(Sigma:Map)))
  rule `int_;_`(`.Ids`,S:Stmt) => S:Stmt   [structural]
endmodule

#Kuestions, Koncerns, etc.

Below we compiled a list of questions that were directly or indirectly asked at various moments. Please feel free to ask more questions, even if you are not sure what you are asking precisely :) The questions help us all nail down the corner cases and perhaps even come up with something better, if needed.

Kuestion

In case we decide to disallow the unescaped use of the special backquote symbol in user syntax, wouldn't it make sense to implement a simple bracket checker, to ensure that brackets are properly nested?

Answer: This would be very useful and easy to implement (linear?). Users would be given insightful error messages very quickly. Moreover, the checker can put the (nested) bracket contents in special intermediate data-structures to make the outer and the inner parsers more efficient.

Kuestion

How can we use the new K bracket to tell the outer parser that the entire contents of a rule should be considered as a bubble, and thus prevent it from matching rule or syntax or other of its keywords in that rule's contents?

Answer: Suppose that your rule is rule contents [attributes] and that contents happens to have no K brackets in it; then you can simply write the rule as rule `contents` [attributes]. Now suppose that contents happens to have some subterms containing brackets with both one and two backticks, say `k1` and ``k2``; then just use three backticks as a bracket for contents in the rule: rule ```contents``` [attribute]. And so on and so forth. The point is that once we have so easy to parse brackets in K, there is no need to have heredocs or other similar mechanisms to mark the rule contents.

Kuestion

Aren't we now asking too much from the outer parser? It now has to know how to parse Markdown bubbles, which is kind of non-trivial.

Answer: Not really. Very little needs to change in the outer parser compared to, say, our previously discussed approach based on heredocs. First, note that you can obtain precisely the same effect as that obtained with heredocs by adding a bracket with enough backticks around the contents of the rule, in the same style as we did in the answer of the question above. Since Markdown enforces only brackets with fewer backticks inside brackets with more backticks, scanning the contents of the rule is equally easy now to how it was with heredocs, and easier than how it would have been with ordinary nested parentheses. So nothing is lost in terms of capabilities, on the contrary.

One may argue that something is lost in terms of user-experience in the presence of errors, specifically that outer parsing errors may be now harder to detect. Consider, for example, a definition snippet of the form

syntax Exp ::= rule(Exp)
rule `_+_`(1,rule(2)) => 3     // rule appears unbracketed in K term
rule `_+_`(1,`rule(2)) => 3    // the close bracket forgotten in `rule(2) 
rule `_+_`(1,"rule(2)") => 3   // rule appears in a string in K term

Each of the three rules above would yield admittedly horrible outer parsing errors. However, with the capability that we can pretty-print the current definition at each step, one should be able to see what happened. Users will just have to learn that if they use outer syntax keywords as terminals in their syntax, then they just have to properly bracket the rules that use those keywords. The same happens if you forget to put the heredocs around such rules, so there isn't really much of a difference in terms of user-experience.

Kuestion

The described approach of how the outer parser deals with brackets seems to work well only under the assumption that users cannot use unescaped backquotes in their syntax. Would it extent easily if we later on decide to allow users to use backquotes in their syntax?

Answer: We think so, but it of course depends on how we decide to handle the user backquotes, and what expectations we want our users to have (a user study may be needed). For example, we may want writers and especially readers of K definitions to have to worry only about their own language inside of bubbles (and the K constructs added to let them write rules). The outer parser cannot always recognize brackets in precisely the same place as the inner parser (because it must not depend on the object language, so a token like "ab`cd" which the inner parser treats as a unit may look like it opens or closes a bubble to the outer parser). The reader may only want to look at the first character of the rule to see whether they may need to keep in mind the outer parser's weaker lexical idea of brackets, in addition to the object language grammar. In that case, if the contents of the bubble begins with a backquote, the outer parser finds the end of the bubble by parsing a sequence of K brackets and unbracketed text until it finds an outer syntax keyword outside of a bracket. If the bubble does not begin with a backquote, then the outer parser doesn't try to find brackets starting later, and just ends the bubble wherever it appears an outer syntax keyword appears (which may even be inside a user-defined token like "a rule b", because it doesn't know the user syntax).

Kuestion

It is nice that with one reserved character, the backtick `, we solved both the problem of brackets in K and the problem of bubbles in the outer parser. But does that mean that users cannot use the backtick in their syntax anymore? What if I want to have backtick-enclosed strings in my language? Wouldn't I run into a nasty syntactic conflict with K's brackets, which I wouldn't have with heredocs?

Answer: Yes, you may run into parsing conflicts if you use the backtick in your language's syntax, but note that the same would happen no matter what brackets you use for K! And we do need brackets in K. So this is a problem related to brackets, and not to heredocs. That is, by adopting heredocs for expressing rule bubbles would only solve the problem of rule contents possibly containing restricted keywords/characters for the outer parser, but not in general for K, not even for KAST. We had exactly that problem previously with the K ::= "(" K ")" brackets, which conflicted very badly with user-defined parentheses, especially when they were not defined as brackets in the user-defined syntax. This is a common problem with string-like or bracket-like notations. For example, if you want a string to contain the string delimiter as a character, then that character needs to be escaped (e.g., "\"" is the string containing only one character, "). Therefore, we decided for the time being to adopt a similar escaping convention, and thus require users to escape all the backticks that are part of their syntax. For example, they would have to always write \`abc\` for their backtick-enclosed strings. Ugly, but not completely unexpected. After all it is only one character that K asks from its users for itself, which is not that much to remember.

Nevertheless, some users will likely complain with the escaping solution above, because they want to see their syntax unaffected by artifacts of K. To keep such users happy, we believe that it would be nice to allow for local renaming of syntax in K, something like this:

...
rename K_BRACKET to $
...
rule $_+_$($`my_`::MyString$, $`string`::MyString$) => $`my_string`$
     // exaggerating the use brackets on purpose
...
rename K_BRACKET to `
...

Regardless of what solutions we pick, I want to reemphasize that this problem is a problem of brackets and not of trying to replace the heredocs with something else. Having heredocs would not help at all this problem, because all the conflicts above would happen inside the heredoc anyway. And once we solve the problem for brackets, which we have to do anyway, we get a solution for our heredoc problem for free. So heredocs are not needed.

###Kuestion The bracket does not really solve the problem of KLabels, because `_+_` can also parse as + applied to two anonymous variables using the concrete syntax of the language, so the concrete syntax parser for K may report an ambiguity anyway in terms like `_+_`(1,2).

Answer: This is not a problem of KAST. It is a problem that the concrete syntax parser needs to deal with. A simple solution is to have the K parser consider the KLabel and KToken constants to have the prefer attribute. That means that if constants which can be KLabels or KTokens are involved in any ambiguity, then we prefer them to be KLabels or KTokens and not something else. The rationale behind this decision is that users are given other means to state their defined operations in K, for example using whitespaces or their explicit KLabel representation.

If the prefer solution above is adopted, then the number of bacticks appearing in rules will be significantly reduced anyway, because in that case there is only one way to parse _+_(1,2), the desired one.

###Kuestion Hold on, does that mean that if I have an operation in prefix notation whose KLabel has the same name, then I will never get a chance to refer to my operation because the KLabel takes over? For example:

syntax KLabel ::= "f"
syntax Exp ::= f(Exp)  [klabel(f)]
rule f(3) => 0   // I want f::Exp->Exp here and not f::KLabel!

Answer: This is again a problem with concrete syntax parsing in K and not with KAST, but since KAST will be smoothly extended by K, and so everything working in KAST should work unchanged when we regard it as a K definition, we address this issue here, too. Yes, you are right, there does not seem to be any way to refer to your original f::Exp->Exp with this convention. Moreover, you have a parsing ambiguity on f(3).

We can think of two ways to deal with this problem, which are open for further discussions. One approach is that the concrete K parser should not report ambiguities in which both parses yield the same AST; if there is no ambiguity anyway, why would you care whether f parses as your original operation or as its label? Another approach is to regard the user productions as explicit mechanisms for making syntax that looks like an application, so maybe we can let those productions take priority over label application, especially because you can always write `f`(Args) or maybe even ``f``(Args), if there's an explicit production f(Exps) that prevents f(Args) from being a label application (and perhaps the production f(Exps) is explicitly assigned a different KLabel - maybe fExp to separate it from a similar-looking production in another sort).

BTW, if f is used on a different number of arguments than declared, for example

syntax KLabel ::= "f"
syntax Exp ::= f(Exp)  [klabel(f)]
rule f(3,4) => 0   // I want f::Exp->Exp here and not f::KLabel!

then an error should be reported, regardless of the fact that we currently allow KLabels to apply to as many arguments as they want to. Some people think that we should disallow that in K anyway, that is, that we should enforce each label to declare how many arguments it takes. But that is another discussion.

###Kuestion What would some other reasonable possibilities for representing basic (i.e., ones generated from user-defined concrete syntax productions) KLabels and KTokens be? The concern here is that maybe the backticked notation gets in some unexpected conflicts with the user-defined syntax and the users have no escape mechanism, no matter how heavy that would be.

Answer: We are contemplating the possibility to allow both the lightweight backticked notation and a more heavyweight syntax. Exact lexical details still open, but something like this:

Lightweight            |  Heavyweight
------------------------------------------------------
`_+_::Exp*Exp->Exp`    |  #label("Exp*Exp->Exp","_+_")
`12::Int`              |  #token("Int","12")

Since we allow multi-backtick brackets now, the heavy notation may not be necessary after all. But we keep the possibility open.

Note a lightweight label and token can be distinguished by the arrow even if the label doesn't take arguments: pi::->Float is a label, 12.0::Float is a token.

###Kuestion: Should there be a .Cells for empty cells? Should we allow rewrites to be written aside cells, as in

<k>'spawn(P)~>_</k> (.K => <thread><k>P</k></thread>)
(or maybe (.Cells => <thread><k>P</k></thread>)

Answer: Not sure about this. We would like cells to be just K terms in the long run, in which case we would prefer .K (which we will rename to . probably) instead of .Cells.

###Kuestion: Do we need a more elaborate syntax for declaring configurations and cells, or we can limit ourselves to KItems? How about modularity, where we want to name and reuse configuration fragments?

Answer: I think this can be addressed separately once we are done with the first version of KAST. That is, we can keep configurations and cells as are for now, using the XML notation which is easy to parse, and we can worry about the general case later. The general solution could, indeed, be to turn cell and configuration constructs into special KLabels, say ones with an attribute configuration. For example:

syntax KLabel ::= "'<T>_</T>"  [configuration]
syntax KLabel ::= "'<k>_</k>"  [configuration]

Then the rewrite engine / backend should recognize these special labels and interpret them accordingly, for example by applying the configuration concretizers to them.

Anyway, we still have to discuss and design the modular configurations.

###Kuestion: If we reduce configurations and cells to ordinary KLabels like above, then don't we need to allow rewrite rules to have the syntax KList ::= KList "=>" KList in order to allow rewrites that take sets of cells to sets of cells?

Answer: Hm, maybe. We have to think about this some more.

###Kuestion: Shouldn't we keep all the syntax declarations in KAST like in K, so that we can use the priority, prefer, bracket, etc., information for unparsing?

Answer: Probably yes.

###Kuestion Don't we need static cast KItem "::" Sort in KAST?

Answer: Doesn't seem so. The syntactic casts are useful in the front-end to tell the concrete syntax parser which KLabel to generate for the parsed concrete syntax, but once all the KLabels are in place we don't need the syntactic casts anymore.

###Kuestion: Rules of the form rule k1 => k2[X] turned out to be a source of problems until now, because [X] is interpreted as the optional attribute of the rule and sometimes you want it to be interpreted as concrete language syntax, for example an array index.

Answer: Yes, that is a problem, but a problem of the front-end and not of KAST. The current solution in the front end is to use parentheses: rule (k1 => k2[X]). But we should at the same time improve the front-end to give good error messages. And we should allow the outer parser to show its output to the user, so one would immediately know when [X] is parsed as an attribute.

###Kuestion: How about rules between KLabel terms, as we can do with our current concrete parser, for example:

syntax KLabel ::= "foo" | "bar" | "baz"
rule foo => bar
rule Context[(bar => baz)(kl)]
  // kl is some KList term, and Context some K holed context

Answer: We found that in all cases we looked at we can complete/change the rules to apply between K terms:

rule foo(KL:KList) => bar(KL:KList)
rule Context[bar(kl) => baz(kl)]

We need to develop such automatic translators as part of the front-end.

###Kuestion: So now we can wrap a KLabel as a KToken and then pass it around like any KItem. But the whole point of KLabels is that they can be applied to KLists. How can you apply a wrapped KLabel, say #klabel('_+_), to a KList?

Answer: We can easily define a KLabel, say #kapply, together with a rule

rule #kapply(#klabel(L:KLabel), Kl:KList) => L:KLabel(Kl:KList)

which can now be used to apply KLabels represented as KTokens to KLists. For example, if you want to apply the wrapped label '_+_ to the list 1,2, then you can write #kapply(#klabel('_+_),1,2). However, as explained below in answers to other questions, you may want to also wrap the KLists that you want to pass around with a KLabel, say #klist. In that case, you would write the rule above as

rule #kapply(#klabel(L:KLabel), #klist(Kl:KList)) => L:KLabel(Kl:KList)

and the particular example as #kapply(#klabel('_+_),#klist(1,2)).

###Kuestion: Can we support complex constructors for KLabels in KAST, like we can currently do in K with concrete syntax? For example:

syntax KLabel ::= foo(KLabel)
rule Context[foo(L:KLabel)(KL:KList)]

syntax KLabel ::= bar(KList)
rule Context[bar(KL1:KList)(KL2:KList)]

Answer: No problem. One way to do it is to regard them as ordinary KLabels that yield KItems meant to be (wrapped) KLabels, and then use #kapply (and #klist) as we did above. For example:

syntax KLabel ::= "foo"
rule Context[#kapply(foo(#klabel(L:KLabel)), #klist(KL:KList))]

syntax KLabel ::= "bar"
rule Context[#kapply(bar(#klist(KL1:KList)), #klist(KL2:KList))]

So #kapply needs not always be used as a "function"; it can also be used as a "constructor". I quote these words because KAST does not provide any specific support for them. They are so only in our mind. From a KAST perspective they are just KLabels which are matched and rewritten.

###Kuestion: KAST should support anything we want to do in K with concrete syntax, such as, for example, "functions" of the form

syntax KList ::= rev(KList)
rule rev(K,KL) => rev(KL),K
rule rev(.KList) => .KList

How can we encode this in KAST?

Answer: Again, same basic principle like above: we add the various operations as KLabels, and then we add corresponding rules between corresponding KItem terms. For some of them we also need to get some special/additional labels involved, like #klist, #kapply, etc. For example:

syntax KLabel ::= "rev"
rule rev(#klist(K:K,Kl:KList)) => #klist.snoc(rev(#klist(Kl:KList)),K:K)
rule rev(#klist(.KList)) => #klist(.KList)

where #klist.snoc (snoc is the reverse of cons) is a special KLabel "function" defined as follows:

rule #klist.snoc(#klist(Kl:KList),K:K) => #klist(Kl:KList,K:K)

Note that we cannot simply make rev a KLabel and nothing else, because then we have strange situations like this one that Traian noticed: rev(rev(1,2)) => rev(1,2) => 2,1. The trick with #klist achieves the desired innermost evaluation strategy. We will have to find smart ways to automatically generate such KAST rules from the original concrete syntax K rules as part of the front-end.

###Kuestion One of the major claims of K is that it offers both the core and the meta-level (in Maude terminology) at the same, core level. If KAST is powerful enough, then it should be able to define generic up and down functions that take a K term and lifts it to its meta-representation (also a K term), and, respectively, take a meta-representation of a K term down to a K term.

Answer: Here it is. The code below is manually adapted, to fit the new KAST, from a K working definition (on March 30, 2014), so there could be typos:

module META_K
  syntax KLabel ::= "#kapply"  [function]
  rule #kapply(#klabel(L:KLabel),#klist(Kl:KList)) => L:KLabel(Kl:KList)

  syntax KLabel ::= "#klist.cons"  [function] | "#kseq.cons"  [function]
  rule #klist.cons(K:K,#klist(Kl:KList)) => #klist(K,Kl:Klist)
       // sorting K:K needed, otherwise default sort would be KList
  rule #kseq.cons(K:KItem,#kseq(Kl:KList)) => #kseq(K:KItem,Kl:KList)

  syntax KLabel ::= "#ktoken" | "#kapp" | "#klist" | "#kseq"
                    // constructors
  syntax KLabel ::= "#upKItem"  [function]
                  | "#upK"      [function]
                  | "#upKList"  [function]
  syntax KLabel ::= "#down"     [function]

  rule #upKItem(T:KToken) => #ktoken(T:KToken)
  rule #upKItem(L:KLabel(Kl:KList))
    => #kapp(#klabel(L:KLabel),#upKList(#klist(Kl:KList)))

  rule #upKList(#klist(K:K,Kl:KList))
    => #klist.cons(#upK(K:K),#upKList(#klist(Kl:KList)))
  rule #upKList(#klist(.KList)) => #klist(.KList)

  rule #upK(K:KItem ~> K':K)
    => #kseq.cons(#upKItem(K:KItem),#upK(K':K))
  rule #upK(.K) => #kseq(.KList)

  rule #down(#ktoken(T:KToken)) => T:KToken
  rule #down(#kapp(#klabel(L:KLabel),MKl:KItem))
    => #kapply(#klabel(L:KLabel),#down(MKl))

  rule #down(#klist(.KList)) => #klist(.KList)
  rule #down(#klist(K:K,Kl:KLabel))
    => #klist.cons(#down(K:K),#down(#klist(Kl:KList)))

  rule #down(#kseq(.KList)) => .K
  rule #down(#kseq(Ki:KItem,Kl:KList))
    => #down(Ki:KItem) ~> #down(#kseq(Kl:KList))
endmodule

Kuestion:

If we can use KAST for serialization between compiler passes, how does it preserve calculated information on AST nodes that later compiler passes might need?

  • Brandon: We agreed that any information can be expressed as attributes, but it's not clear what syntax could work, even if we don't try to provide the term-metadata syntax in the concrete syntax.

  • Grigore: To keep things simple, we should not worry about any other meta-data for terms. Of course, we need sort memberships, but those are already allowed as part of KAST, using the syntax term:Sort. We will continue to keep the meta-data for rules within rule attributes. If we need to report an error message regarding a particular position of a particular syntactic construct, then: 1) if that error happened during the concrete syntax phase then the concrete parser has the location information that it needs; 2) if that error happened during the kompilation, then the location of the rule and the KAST syntax of the term where the error happened are good enough for the beginning (and probably for good).

Old Things that May Still Be Useful

The Backtick Rule

We allow the text meant to parse as K to use backticks `. A backtick tells the GeneriK parser that the next character is part of the K bubble, so it should not treat it as part of any of its kewords. This feature is useful when the defined language has syntax that overlaps with that of GeneriK, and thus may confuse the GeneriK parser with regards to where the bubbles start and where they end. For example, the GeneriK parser should parse the following module

module TRY
  syntax Sort ::= "rule" | "syntax" | "configuration"
  configuration <T> <k> $PGM </k> </T>
  rule <k> (`rule => configu`ration) ~> `syntax ~> rule ...<k> [attribute]
endmodule

as something like

module TRY
  syntax Sort ::= "rule" | "syntax" | "configuration"
  configuration GeneriK_BUBBLE(<T> <k> $PGM </k> </T>)
  rule GeneriK_BUBBLE(<k> (`rule => configu`ration) ~> `syntax ~> )
  rule GeneriK_BUBBLE(...<k>) [attribute]
endmodule

and a corresponding error message will be then given by the bubble parser (full K or the KORE). Once one adds a backtick before any of the four characters of the last rule token, say before the last one, then the module would be parsed by the GeneriK parser as

module TRY
  syntax Sort ::= "rule" | "syntax" | "configuration"
  configuration GeneriK_BUBBLE(<T> <k> $PGM </k> </T>)
  rule GeneriK_BUBBLE(<k> (`rule => configu`ration) ~> `syntax ~> rul`e ...<k>) [attribute]
endmodule

end eventually kompiled correctly. In practice, we will probably put the backtick at the beginning of the token, but it is simpler and more general to allow it anywhere.

Clone this wiki locally