Bug causing memory corruption #17

Open
uben0 opened this issue Dec 23, 2023 · 4 comments

Comments

@uben0
Owner

uben0 commented Dec 23, 2023

In the Helix editor, when writing any file format that supports embedded languages, such as raw blocks in Markdown or Typst:

    ```typst
    _
    -
    ```

It causes a segfault when the embedded Typst code combines a delimited context, like _ or *, with indentation-sensitive items, like - or +.

I don't know if it happens in other editors. It seems to be caused by the external scanner when it calls lexer->get_column(). However, the segfault does not occur during this call; it occurs outside of the scanner.

Could somebody try this in Helix on their machine to check whether it is reproducible? And also test with another editor, like Neovim or Emacs.

@Ziqi-Yang
Collaborator

Ziqi-Yang commented Dec 24, 2023

In Emacs, markdown-mode can open a code block in another buffer with the correct major mode (like rust, c, typst-ts). The opened buffer works correctly for other tree-sitter modes (syntax highlighting, indentation, etc.). However, when opening typst-ts-mode, it produces an error on startup:

Debugger entered--Lisp error: (wrong-type-argument stringp nil)
  typst-ts-mode()

It is probably not caused by the Emacs Lisp code, but by the parser. As a result, all the features (like syntax highlighting) are gone.


The tested markdown content is:

```typst-ts

```

@monaqa
Contributor

monaqa commented Dec 26, 2023

I tested this with Neovim (v0.10.0-dev-530 g8376e8700) on macOS, which I use regularly, and I could reproduce the problem.
The editor itself was killed when I tried to paste the target code into a Typst or Markdown file (reproduced in both).

uben0 changed the title from "Embedded Typst code causes segfault in Helix" to "Embedded Typst code causes segfault" on Dec 27, 2023
uben0 changed the title from "Embedded Typst code causes segfault" to "Bug causing memory corruption" on Jan 12, 2024
@uben0
Owner Author

uben0 commented Jan 12, 2024

The bug seems to be broader than the indentation handling in embedded Typst code. I regularly get a Tree-sitter failure that stops syntax highlighting, even for arbitrary simple files. I will try to pin down the problem, but I need a way to reproduce it reliably.

Another track I am looking at is lexer simplification. I believe the current lexer is more complex than it needs to be. The problem is that it tries to match each available token in sequence, instead of looking at the next character and selecting the corresponding token directly. I will attempt a rewrite with this approach.
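
To illustrate the idea of selecting the token from the next character, here is a minimal sketch. It is not the actual scanner: the token names, the `next_token` function, and the string cursor are hypothetical and stand in for the real Tree-sitter lexer interface.

```c
#include <stddef.h>

/* Hypothetical token kinds, for illustration only;
   this is not the grammar's real token set. */
typedef enum {
    TOK_EMPH,   /* '_' */
    TOK_STRONG, /* '*' */
    TOK_ITEM,   /* '-' */
    TOK_ENUM,   /* '+' */
    TOK_TEXT,   /* run of other characters */
    TOK_EOF
} TokenKind;

/* Returns 1 if c opens one of the special tokens above. */
static int is_special(char c) {
    return c == '_' || c == '*' || c == '-' || c == '+';
}

/* Choose the token by looking at the next character once, instead of
   trying every candidate token in sequence until one matches.
   `pos` is advanced past the consumed characters. */
TokenKind next_token(const char *src, size_t *pos) {
    char c = src[*pos];
    if (c == '\0') return TOK_EOF;
    (*pos)++;
    switch (c) {
        case '_': return TOK_EMPH;
        case '*': return TOK_STRONG;
        case '-': return TOK_ITEM;
        case '+': return TOK_ENUM;
        default:
            /* Consume plain text up to the next special character. */
            while (src[*pos] != '\0' && !is_special(src[*pos])) {
                (*pos)++;
            }
            return TOK_TEXT;
    }
}
```

Each call inspects one character to decide which token applies, so the cost does not grow with the number of token kinds. A real scanner would also have to check which tokens the parser currently accepts.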

Also, I want to migrate all tokenization to the external lexer. This would remove the need for get_column() calls (which I suspect are the cause of the bug), something that is impossible with the current approach.

@uben0
Owner Author

uben0 commented May 5, 2024

It looks like the bug has been fixed in the new version of Tree-sitter. Could someone try to reproduce it with a recent version of Tree-sitter? On my side, I could not.


3 participants