`tex_parser`

⚠️ Warning: This is not a formal TeX parser. TeX cannot be parsed with a context-free grammar; handling certain edge cases requires a Turing machine.

Despite this, most real-world TeX is "well-behaved" enough that a context-free grammar may suffice. To that end, this library attempts to provide a "best effort" solution to parsing TeX, and it is necessarily opinionated.

How it works

tex_parser uses pest under the hood to parse the file (the grammar can be found in tex.pest) and turns it into an AST.

Example file and AST output

Consider the following file

\begin{document}
  Hello World
\end{document}

The resulting AST would be

[
    Cmd(
        Command {
            name: "begin",
            args: [
                Required(
                    [
                        Text(
                            "document",
                        ),
                    ],
                ),
            ],
        },
    ),
    Text(
        "Hello World",
    ),
    Cmd(
        Command {
            name: "end",
            args: [
                Required(
                    [
                        Text(
                            "document",
                        ),
                    ],
                ),
            ],
        },
    ),
]
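
The debug output above suggests AST types roughly like the ones below. This is only a sketch inferred from that output, not the crate's actual definitions: the variant and field names (`Cmd`, `Text`, `Command`, `Required`, `name`, `args`) come from the output, while the type names `Node` and `Argument` and the helper `hello_world_ast` are assumptions made for illustration.

```rust
// Hypothetical type names (`Node`, `Argument`); only the variant and field
// names are taken from the debug output shown above.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    /// A command such as `\begin{document}`.
    Cmd(Command),
    /// A run of plain text.
    Text(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Command {
    pub name: String,
    pub args: Vec<Argument>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Argument {
    /// A braced argument, e.g. `{document}`.
    Required(Vec<Node>),
}

/// Builds, by hand, the same tree as the example output (hypothetical helper).
pub fn hello_world_ast() -> Vec<Node> {
    // Both `\begin{document}` and `\end{document}` carry one required argument.
    let document_cmd = |name: &str| {
        Node::Cmd(Command {
            name: name.to_string(),
            args: vec![Argument::Required(vec![Node::Text(
                "document".to_string(),
            )])],
        })
    };
    vec![
        document_cmd("begin"),
        Node::Text("Hello World".to_string()),
        document_cmd("end"),
    ]
}
```

Pretty-printing the result with `{:#?}` produces output of the same shape as the listing above.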

Once this is done, several (opinionated) optimizations can be performed such as

environment merging

Example

The following AST

[
    Cmd(
        Command {
            name: "begin",
            args: [
                Required(
                    [
                        Text(
                            "document",
                        ),
                    ],
                ),
            ],
        },
    ),
    Text(
        "Hello World",
    ),
    Cmd(
        Command {
            name: "end",
            args: [
                Required(
                    [
                        Text(
                            "document",
                        ),
                    ],
                ),
            ],
        },
    ),
]

would turn into

[
    Environment {
        name: "document",
        args: [
            Required(
                [
                    Text(
                        "document",
                    ),
                ],
            ),
        ],
        contents: [
            Text(
                "Hello World",
            ),
        ],
    },
]
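
One way to implement this kind of merging is a single pass with a stack of open environments. The sketch below is not the crate's implementation: the type names `Node` and `Argument`, the helper `env_name`, and the function `merge_environments` are assumptions, and it only looks at top-level nodes (it does not descend into command arguments).

```rust
// Hypothetical AST types modelled on the debug output in this README.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Cmd(Command),
    Text(String),
    Environment { name: String, args: Vec<Argument>, contents: Vec<Node> },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Command {
    pub name: String,
    pub args: Vec<Argument>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Argument {
    Required(Vec<Node>),
}

/// Returns the environment name if the first argument is a single text node.
fn env_name(cmd: &Command) -> Option<String> {
    match cmd.args.first() {
        Some(Argument::Required(nodes)) => match nodes.as_slice() {
            [Node::Text(name)] => Some(name.clone()),
            _ => None,
        },
        None => None,
    }
}

/// Merges matching `\begin{x}` ... `\end{x}` pairs into `Environment` nodes.
pub fn merge_environments(nodes: Vec<Node>) -> Vec<Node> {
    // Each frame is an open environment: (name, `\begin` args, contents so far).
    let mut open: Vec<(String, Vec<Argument>, Vec<Node>)> = Vec::new();
    let mut top_level: Vec<Node> = Vec::new();

    for node in nodes {
        let finished = match node {
            Node::Cmd(cmd) => {
                let env = env_name(&cmd);
                if cmd.name == "begin" && env.is_some() {
                    // Open a new environment; nothing is emitted yet.
                    open.push((env.unwrap(), cmd.args, Vec::new()));
                    continue;
                }
                let closes_innermost = cmd.name == "end"
                    && env.is_some()
                    && open.last().map(|frame| &frame.0) == env.as_ref();
                if closes_innermost {
                    let (name, args, contents) = open.pop().unwrap();
                    Node::Environment { name, args, contents }
                } else {
                    // Ordinary command, or an `\end` that matches nothing.
                    Node::Cmd(cmd)
                }
            }
            other => other,
        };
        // Attach the node to the innermost open environment, if any.
        match open.last_mut() {
            Some(frame) => frame.2.push(finished),
            None => top_level.push(finished),
        }
    }

    // Environments left open have no matching `\end`: restore each as a
    // plain `\begin` command followed by its contents.
    while let Some((_, args, contents)) = open.pop() {
        let mut restored = vec![Node::Cmd(Command { name: "begin".to_string(), args })];
        restored.extend(contents);
        match open.last_mut() {
            Some(frame) => frame.2.extend(restored),
            None => top_level.extend(restored),
        }
    }
    top_level
}
```

Keeping the `\begin` arguments on the `Environment` node mirrors the output above, where `args` still contains the `document` argument. Unmatched `\begin` or `\end` commands are left as ordinary commands rather than treated as errors, which fits the "best effort" approach.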

command substitution (not yet implemented)
Example

For the following file
```
\def\R{\mathbb R}

\R
```
All instances of \R in the AST would be replaced with {\mathbb R}.
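
Since this feature is not implemented, the following is only a sketch of how such a substitution pass could look. It assumes the definitions have already been collected into a map from command name to replacement nodes; how `\def` itself appears in the AST is not covered here. The type names `Node` and `Argument` and the function `substitute` are hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical AST types modelled on the debug output in this README.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Cmd(Command),
    Text(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Command {
    pub name: String,
    pub args: Vec<Argument>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Argument {
    Required(Vec<Node>),
}

/// Replaces every argument-less command found in `macros` with its body.
/// Bodies are inserted as-is (single pass), so a self-referential
/// definition cannot cause infinite expansion.
pub fn substitute(nodes: Vec<Node>, macros: &HashMap<String, Vec<Node>>) -> Vec<Node> {
    let mut out = Vec::with_capacity(nodes.len());
    for node in nodes {
        match node {
            // A use of a defined macro: splice in a copy of its body.
            Node::Cmd(cmd) if cmd.args.is_empty() && macros.contains_key(&cmd.name) => {
                out.extend(macros[&cmd.name].iter().cloned());
            }
            // Any other command: keep it, but substitute inside its arguments.
            Node::Cmd(Command { name, args }) => {
                let args = args
                    .into_iter()
                    .map(|arg| match arg {
                        Argument::Required(inner) => {
                            Argument::Required(substitute(inner, macros))
                        }
                    })
                    .collect();
                out.push(Node::Cmd(Command { name, args }));
            }
            text @ Node::Text(_) => out.push(text),
        }
    }
    out
}
```

Note that this sketch only matches macro uses that were parsed without arguments. A use such as `\R{x}` would be parsed as a command with one argument and would be left untouched.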

Current (known) Limitations

Parsing TeX is hard. Consider the following example

% command 1
\section
{My Title}

% command 2
\medskip
{\bf unrelated}

Command 1 is \section, which takes one argument, while command 2, \medskip, takes none.

The grammar keeps the definitions of commands as general as possible, but this introduces false positives: the parser currently parses both commands above as commands of one argument.

One solution to this problem might be to check all commands in the AST for the correct number of arguments. This could be done by keeping a list of default (or common) TeX commands and their argument counts, and building such rules dynamically for new commands defined in the file.

Currently, tex_parser does not do this.
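
As a rough illustration of the check described above, the sketch below walks the AST and reports commands whose parsed argument count differs from a table of known counts. Everything here is an assumption made for illustration: the type names `Node` and `Argument`, the `ArityMismatch` struct, the function `check_arity`, and the idea of passing the table in as a `HashMap`.

```rust
use std::collections::HashMap;

// Hypothetical AST types modelled on the debug output in this README.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Cmd(Command),
    Text(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Command {
    pub name: String,
    pub args: Vec<Argument>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Argument {
    Required(Vec<Node>),
}

/// A command whose parsed argument count disagrees with the table.
#[derive(Debug, PartialEq)]
pub struct ArityMismatch {
    pub name: String,
    pub expected: usize,
    pub found: usize,
}

/// Reports every known command parsed with the wrong number of arguments.
/// Commands missing from `arities` are skipped, since nothing is known
/// about them.
pub fn check_arity(nodes: &[Node], arities: &HashMap<&str, usize>) -> Vec<ArityMismatch> {
    let mut mismatches = Vec::new();
    for node in nodes {
        if let Node::Cmd(cmd) = node {
            if let Some(&expected) = arities.get(cmd.name.as_str()) {
                if cmd.args.len() != expected {
                    mismatches.push(ArityMismatch {
                        name: cmd.name.clone(),
                        expected,
                        found: cmd.args.len(),
                    });
                }
            }
            // Commands nested inside arguments are checked as well.
            for arg in &cmd.args {
                let Argument::Required(inner) = arg;
                mismatches.extend(check_arity(inner, arities));
            }
        }
    }
    mismatches
}
```

With a table mapping `section` to 1 and `medskip` to 0, the example above would yield a single mismatch for `medskip` (expected 0, found 1). A fuller version could then detach the extra arguments instead of only reporting them.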

Trying it

Despite the limitations described above, tex_parser still works surprisingly well on a large portion of TeX files. You can try it yourself by running

cargo run --example text

Found a Bug?

If you think you've found a bug, or an example of incorrectly parsed TeX, open an issue and I'll be happy to look into it! Contributions and feedback are very welcome.
