Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autosemi after comments #134

Open
wants to merge 5 commits into
base: new-ast
Choose a base branch
from

Conversation

flbulgarelli
Copy link
Contributor

@flbulgarelli flbulgarelli commented Mar 12, 2022

Hi! Currently, there is a subtle difference in the way language-javascript deals with auto semicolons when comments are present.

Let's have a look at this code sample:

function f1() {
  return // hello
    4
}

function f2() {
  return /* hello */
    4
}


function f3() {
  return /* hello
    */ 4
}

function f4() {
  return
    4
}

And how esprima parses it:

{
  "type": "Program",
  "body": [
    {
      "type": "FunctionDeclaration",
      "id": {
        "type": "Identifier",
        "name": "f1"
      },
      "params": [],
      "body": {
        "type": "BlockStatement",
        "body": [
          {
            "type": "ReturnStatement",
            "argument": null
          },
          {
            "type": "ExpressionStatement",
            "expression": {
              "type": "Literal",
              "value": 4,
              "raw": "4"
            }
          }
        ]
      },
      "generator": false,
      "expression": false,
      "async": false
    },
    {
      "type": "FunctionDeclaration",
      "id": {
        "type": "Identifier",
        "name": "f2"
      },
      "params": [],
      "body": {
        "type": "BlockStatement",
        "body": [
          {
            "type": "ReturnStatement",
            "argument": null
          },
          {
            "type": "ExpressionStatement",
            "expression": {
              "type": "Literal",
              "value": 4,
              "raw": "4"
            }
          }
        ]
      },
      "generator": false,
      "expression": false,
      "async": false
    },
    {
      "type": "FunctionDeclaration",
      "id": {
        "type": "Identifier",
        "name": "f3"
      },
      "params": [],
      "body": {
        "type": "BlockStatement",
        "body": [
          {
            "type": "ReturnStatement",
            "argument": null
          },
          {
            "type": "ExpressionStatement",
            "expression": {
              "type": "Literal",
              "value": 4,
              "raw": "4"
            }
          }
        ]
      },
      "generator": false,
      "expression": false,
      "async": false
    },
    {
      "type": "FunctionDeclaration",
      "id": {
        "type": "Identifier",
        "name": "f4"
      },
      "params": [],
      "body": {
        "type": "BlockStatement",
        "body": [
          {
            "type": "ReturnStatement",
            "argument": null
          },
          {
            "type": "ExpressionStatement",
            "expression": {
              "type": "Literal",
              "value": 4,
              "raw": "4"
            }
          }
        ]
      },
      "generator": false,
      "expression": false,
      "async": false
    }
  ],
  "sourceType": "script"
}

As you can see, the parser is introducing an automatic semicolon in all the four scenarios. This behavior is consistent with modern V8 and SpiderMonkey engines.

However, let's have a look at how language-javascript handles these scenarios. If you add the following tests to ProgramParser.hs...

describe "function with autosemi on return" $ do
  it "return with inline comment" $ do
      testProg "function f1(){ return //hello\n 4 }"  `shouldBe` "Right (JSAstProgram [JSFunction 'f1' () (JSBlock [JSReturn ,JSDecimal '4'])])"

  it "return with multiline comment" $ do
      testProg "function f2(){ return /* hello */\n  4 }"  `shouldBe` "Right (JSAstProgram [JSFunction 'f2' () (JSBlock [JSReturn ,JSDecimal '4'])])"

  it "return with multiline comment with newline" $ do
      testProg "function f3(){ return /* hello\n  */ 4 }"  `shouldBe` "Right (JSAstProgram [JSFunction 'f3' () (JSBlock [JSReturn ,JSDecimal '4'])])"

  it "return with newline" $ do
      testProg "function f4(){ return \n 4 }"  `shouldBe` "Right (JSAstProgram [JSFunction 'f4' () (JSBlock [JSReturn ,JSDecimal '4'])])"

...only the fourth and last test will actually produce the expected results. The other three will fail with the same error:

  test/Test/Language/Javascript/ProgramParser.hs:29:13:
  1) Program parser:, function with autosemi on return, return with inline comment
      expected: "Right (JSAstProgram [JSFunction 'f1' () (JSBlock [JSReturn ,JSDecimal '4'])])"
        but got: "Right (JSAstProgram [JSFunction 'f1' () (JSBlock [JSReturn JSDecimal '4' ])])"

  To rerun use: --match "/Program parser:/function with autosemi on return/return with inline comment/"

  test/Test/Language/Javascript/ProgramParser.hs:32:13:
  2) Program parser:, function with autosemi on return, return with multiline comment
      expected: "Right (JSAstProgram [JSFunction 'f2' () (JSBlock [JSReturn ,JSDecimal '4'])])"
        but got: "Right (JSAstProgram [JSFunction 'f2' () (JSBlock [JSReturn JSDecimal '4' ])])"

  To rerun use: --match "/Program parser:/function with autosemi on return/return with multiline comment/"

  test/Test/Language/Javascript/ProgramParser.hs:35:13:
  3) Program parser:, function with autosemi on return, return with multiline comment with newline
      expected: "Right (JSAstProgram [JSFunction 'f3' () (JSBlock [JSReturn ,JSDecimal '4'])])"
        but got: "Right (JSAstProgram [JSFunction 'f3' () (JSBlock [JSReturn JSDecimal '4' ])])"

  To rerun use: --match "/Program parser:/function with autosemi on return/return with multiline comment with newline/"

In other words, this parser is only inserting an automatic semicolon after a newline, but not after comments. This PR fixes this issue.

I have also refactored the Lexer tests, so that they don't just inspect the tokens produced by alex, but also those sent to happy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant