SAOC 2023: replace libdparse with dmd #589

snprajwal · 2023-09-15T08:11:46Z

This PR tracks the commits for replacing libdparse with dmd. Unlike with other projects like D-Scanner, dfmt generates the AST only once (in the main function), and uses it across all 5 files. Since these files are tightly coupled together, making this change across multiple PRs is not possible as it will lead to a broken master branch. Instead, this PR will track every change in independent commits, and will be marked ready once dfmt+dmd works exactly how dfmt+libdparse previously did.

NOTE: This PR is work in progress, and will most likely be broken for the foreseeable future.

cc @RazvanN7

dub.json

WebFreak001

is it possible to keep the changes of formatter.d in there and not rename it to ast.d, so the diff can keep all the indentation related stuff if possible?

snprajwal · 2023-11-07T18:33:04Z

is it possible to keep the changes of formatter.d in there and not rename it to ast.d

I've not renamed formatter.d, I've just removed the code that will not be used anymore. ast.d has been introduced as a new file since it deals with walking the AST and doesn't fit into any of the existing files. formatter.d currently only contains the driver code for invoking the AST walk.

so the diff can keep all the indentation related stuff if possible?

I'm not sure I understand what you're referring to as "indentation related"...

Signed-off-by: Prajwal S N <[email protected]>

- dfmt_space_before_function_parameters - dfmt_space_after_cast - dfmt_align_switch_statements - dfmt_space_before_aa_colon Signed-off-by: Prajwal S N <[email protected]>

WebFreak001 · 2023-11-11T14:34:07Z

ah sorry I didn't actually check the diff too much there and was thinking about the layout inside libdparse, where there is also a formatter that works through the visitor pattern like this one does.

I think in the end the changes here will be a useful tool, being accurate with the compiler library, but I don't think the new code should go into this dfmt repository. I think it would be good to have this as a second repository in parallel that could support the same settings and reuse code such as the editorconfig parsing from dfmt, however the differences in how the two formatters are formatting code are too big to upgrade to with old code bases. Additionally new language features will require upgrade and maintenance each time, since otherwise code will be completely unable to parse/format, which with dfmt often just results in new features not looking good, but the rest still working as expected, so in the future it might still be a desired stable formatter that just doesn't get the latest and greatest features like this dmd replacement does.

Instead I think this new dmd-based formatter is something we can introduce slowly to new projects once it's ready. It fundamentally works differently to dfmt since it's actually rewriting the code based on the AST instead of just inserting/removing whitespace like what dfmt does. This of course also adds new opportunities, like being able to reorder imports alphabetically while formatting and adding/removing braces / shortened methods to be consistent with project settings - features which would never make it into dfmt with its current architecture.

For now I think it's good to keep it in this PR here, just for people to also look at it, but eventually we will probably want a new repo for it, for example calling it dmdfmt.

RazvanN7 · 2023-11-15T12:47:18Z

@WebFreak001 Our aim is to have the same functionality as the current dfmt, so there shouldn't be any breaking changes once this project is finished. However, I don't really understand how using dmd as a lib requires more upgrade & maintenance time as opposed to the status quo. Let's say a new feature is added in dmd. If you want to upgrade dfmt right now, you need to add some code to libdparse and then potentially add some code to dfmt to support the new case. With dmd as a lib, you will most likely need to update to a newer version and I suspect that in most cases you won't need to add anything new to the formatter or you would just need to update the dfmt code that uses the visitor classes a bit. All in all, I expect the changes to require less time than what you need to do now.

Now, regarding old code bases: dfmt gets updates from time to time. There is nothing that forces you as a user to use the latest version. If you have an old code base, then it makes sense to use an old compiler and an old version of the formatter. If you are using new features, then you need a newer compiler and a newer formatter. The versioning system in dub (or manual versioning) should take care of this.

My personal opinion is that we should just upgrade dfmt and make sure that people can select the proper version of the formatter that is in sync with the compiler version that they are using. However, if you fear breaking changes, one other way we can do it is to just create a branch with dmddfmt that goes in parallel with the old formatter. That way, people can select the branch that they are interested in. Having 2 separate repos that essentially do the same thing is not something that I am attracted to - we will need to always backport dfmt-frontend additions just to keep in sync.

WebFreak001 · 2023-11-15T13:05:58Z

However, I don't really understand how using dmd as a lib requires more upgrade & maintenance time as opposed to the status quo

currently dfmt needs almost no changes in case of a compiler change, since it mostly doesn't use AST information for formatting. (although it does need it to recognize some things like locations of struct initializers and AAs, where libdparse updates can help in case of new syntax preceding them)

With the new code every AST node seems to have a format function, so every introduction or change to the code how it can be represented as AST needs to be covered by the format functions here.

With old dfmt, new syntax would at worst break through whitespaces, but you could disable it with //dfmt off/on and the rest would still work. With this new approach we need to release a new formatter for every syntax change added, since it seems it needs to implement an AST -> formatted code output visitor method. Also this means that if we allow things like omitting parentheses, e.g. the new @int syntax, it might always be formatted into @(int) until new things are supported.

Since the formatting algorithm and procedure itself is really so different from what dfmt currently does, I would really say it would be better to have it a separate project because there are going to be a bunch of people who will want the more stable old token-based approach instead of something that literally rewrites the AST each time. Maintaining old versions is something we can't do just because of missing maintainers and we don't have the branches setup for it - no user of old dfmt wanting to change how it works will try to touch the old dfmt code branch if it's not master.

Signed-off-by: Prajwal S N <[email protected]>

maxhaton · 2023-12-01T18:12:42Z

Is this working well enough to start getting stuff into the main tree?

maxhaton · 2023-12-01T18:13:03Z

@RazvanN7 I'd like to try getting the tools into the dmd repo at some point.

snprajwal · 2023-12-01T19:21:19Z

Is this working well enough to start getting stuff into the main tree?

It largely works, but there's still some work left. It does not retain comments at the moment, and also does not allow all the configurations that dfmt supports. I'm working on hitting feature parity. I guess it's @RazvanN7's call to make about how this will be merged (into a different branch/a new repo/something else)

WebFreak001 · 2023-12-01T20:42:37Z

I think retaining comments is a very important issue to solve first before getting it to users, extending the feature support can be done later as well

snprajwal · 2023-12-02T08:49:49Z

I think retaining comments is a very important issue to solve first before getting it to users, extending the feature support can be done later as well

Agreed, I've been working on it across the last couple of days. It requires some changes in DMD that I'll have to make before adding support into dmdfmt for it, so it'll take some time.

maxhaton · 2023-12-02T11:28:57Z

I think retaining comments is a very important issue to solve first before getting it to users, extending the feature support can be done later as well

Agreed, I've been working on it across the last couple of days. It requires some changes in DMD that I'll have to make before adding support into dmdfmt for it, so it'll take some time.

Can you put your findings either here or in the D slack. I'm pretty sure lots of people can help you (including me) do this so make sure you get in touch.

Feels like deja Vu, it's come up in the past.

snprajwal · 2023-12-04T11:56:35Z

Here's the situation right now:

DMD lexes comments, but does not store normal comments. It creates a TOK.comment token and moves on with the lexing (see this line. It looks like it doesn't even create the token if we're using DMD as a library. It only stores the comment text if it is a doc comment (///, /** **/, or /++ ++/) -> see this line.
The doc comments stored in the previous step get added to the respective lang item nodes in the AST during parsing in ~78 places in parse.d

One way to store comments is to just change the lexing logic to store all types of comments rather than just doc comments. I don't think this is a feasible solution since it'll require a lot of changes across the rest of the compiler, especially in the DDoc generation logic. It also bloats the size of the AST during parsing with information that is effectively useless to the compiler.

Another way is to introduce a new field into struct Token, something like rawComment. It will store the raw comment strings that the lexer encounters, including the prefix and delimiter (//, /* */, etc). This will again require some change to be made to the parser to propagate this info into the AST nodes, but should not break anything that is currently present. This will allow us to simply dump the raw comment string into the output stream during formatting. I'm still not sure if this is the way to go, or if there is an even easier solution that I'm not able to think of.

The objectively "correct" way to solve this would be to introduce a new stage in the parsing phase where we build a concrete syntax tree (CST). This is the AST enriched with additional information including whitespaces, line breaks, comments, offset in the source file, etc. This CST can be used in the formatter to generate better context-aware output. The compiler will just have another pass that strips all of this info from the CST and reduces it to the current AST (before the semantic passes are run). Again, I'm not sure if this is the way to go, owing the the complexity of implementing this, potential of breaking the existing compiler pipeline, and adding additional overhead into the parsing phase.

WebFreak001 · 2023-12-04T11:59:30Z

in libdparse we solved this by Token having leading and trailing Trivia[] attached to it. Trivia is just tokens of type whitespace and comment (and some other skipped things)

in libdparse each AST node has an array of tokens it was constructed from, so you can eventually go all the way down to the trivia.

This is copied from Roslyn's design and makes it possibly to do a 1:1 reconstruction of the input code from the AST, even after modifying parts of it.

maxhaton · 2023-12-04T12:57:06Z

in libdparse we solved this by Token having leading and trailing Trivia[] attached to it. Trivia is just tokens of type whitespace and comment (and some other skipped things)

in libdparse each AST node has an array of tokens it was constructed from, so you can eventually go all the way down to the trivia.

This is copied from Roslyn's design and makes it possibly to do a 1:1 reconstruction of the input code from the AST, even after modifying parts of it.

The way Roslyn does this is the way this should be done indeed although I'm not sure if it's exactly the same as having some Trivia bolted on (or maybe it is in practice?)

snprajwal · 2023-12-05T18:34:59Z

I discussed this with Razvan, and we both feel that the correct way for us to go ahead would be to store this data correctly during the parsing stage and provide it through the AST, similar to having a trivia[] field. I'm working on adding this into DMD, and will open a PR once I make sure it works with dfmt.

On a tangent, with regards to the question about merging the work in this PR, I don't think we've reached a consensus yet on whether it should go into a separate dmdfmt project repo, or a separate branch in this repo.

WebFreak001 · 2023-12-06T02:29:05Z

since the functionality is so drastically different from how dfmt works I would suggest creating a separate project. Or alternatively it might be better to just have both the libdparse and dmd based formatters in dfmt, but be able to switch between them using a CLI switch like --engine=dmd, since it going to support the dfmt formatting flags as well.

Signed-off-by: Prajwal S N <[email protected]>

The non-conditional styles for `dfmt_template_constraint_style` are supported for now. The conditional ones will require line length tracking, which is yet to be implmented. Signed-off-by: Prajwal S N <[email protected]>

RazvanN7 · 2023-12-14T12:42:54Z

@WebFreak001 I brought the issue of merging dmdlib with dfmt at our Foundation Monthly Meeting and the general consensus was to integrate this work into the mainline dfmt and have dfmt be part of the release archive. That way dfmt is going to be tied with a compiler version and everytime you update the compiler you also get the latest version of dfmt. How does that sound?

maxhaton · 2023-12-14T14:11:49Z

Put it in the compiler tree!

…

On Thu, 14 Dec 2023, 12:43 Razvan Nitu, ***@***.***> wrote: @WebFreak001 <https://github.com/WebFreak001> I brought the issue of merging dmdlib with dfmt at our Foundation Monthly Meeting and the general consensus was to integrate this work into the mainline dfmt and have dfmt be part of the release archive. That way dfmt is going to be tied with a compiler version and everytime you update the compiler you also get the latest version of dfmt. How does that sound? — Reply to this email directly, view it on GitHub <#589 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABLI75BNHIY4DO5EMIZSBDTYJLX5TAVCNFSM6AAAAAA4ZNYQYCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNJVG44DGNRUHE> . You are receiving this because you commented.Message ID: ***@***.***>

WebFreak001 · 2023-12-14T19:26:29Z

as long as we keep both engines (libdparse and dmd) around for a while that sounds like a good idea, it's just that we will still want to support the old dfmt users for a while especially while early issues in the dmd issues are still being sorted out.

We don't want to just introduce completely different formatting behavior onto full projects that have relied on dfmt for a long time and are suddenly getting formatted differently (it will just make the git history of all these projects a mess as well)

maxhaton · 2024-01-03T16:37:56Z

src/dfmt/ast.d

+import dmd.rootobject;
+import dmd.target;
+import dmd.root.string : toDString;
+import dmd.optimize : optimize;


This is breaking CI for some reason.

maxhaton · 2024-01-03T16:39:04Z

src/dfmt/ast.d

+import dmd.rootobject;
+import dmd.target;
+import dmd.root.string : toDString;
+import dmd.optimize : optimize;


Suggested change

import dmd.optimize : optimize;

looks like it can be zapped

It's needed for calling Expression.optimize, which was moved out of expression.d into optimize.d in a recent PR. It breaks CI because I don't think there's been a release with the change yet, I compile dmdfmt with the master branch of dmd locally so I'm sometimes ahead of the CI pipeline. I'll update dub.json to use v2.107.0-beta.1, that should fix it.

Why do you call .optimize specifically?

It's called here and here. It was introduced to fix https://issues.dlang.org/show_bug.cgi?id=7375, and was taken from hdrgen.d in DMD. Now that I look closer at it, it's a compile time check, so shouldn't be needed in dfmt. I'll get rid of it.

maxhaton · 2024-01-03T16:39:57Z

README.md

-* Run ```git submodule update --init --recursive``` in the dfmt directory
-* To compile with DMD, run ```make``` in the dfmt directory. To compile with
-  LDC, run ```make ldc``` instead. The generated binary will be placed in ```dfmt/bin/```.
+* Run `git submodule update --init --recursive` in the dfmt directory


Move these changes into a different PR if this is intentional.

maxhaton · 2024-01-03T16:40:21Z

.editorconfig

-insert_final_newline = true
-max_line_length = 120
-tab_width = 8
+dfmt_single_template_constraint_indent = true


Why are these changes here? Intentional?

Yes, the configs that were removed were redundant, because they're the defaults. The two lines that were added were to maintain consistency between multi-line formatting across the codebase (the default behaviour uses two tabs to indent, which is somewhat unidiomatic, imo). There's also a discussion on the forum about whether to keep this knob or not on dmdfmt.

Its good practice (for good reason) to make PRs as small as possible. If you want to change this it should be a separate pull request for that reason.

maxhaton · 2024-01-03T16:41:09Z

makefile

+DMD_ROOT_SRC := \
+	$(shell find dmd/compiler/src/dmd/common -name "*.d")\
+	$(shell find dmd/compiler/src/dmd/root -name "*.d")
+DMD_LEXER_SRC := \
+	dmd/compiler/src/dmd/console.d \
+	dmd/compiler/src/dmd/entity.d \
+	dmd/compiler/src/dmd/errors.d \
+	dmd/compiler/src/dmd/errorsink.d \
+	dmd/compiler/src/dmd/location.d \
+	dmd/compiler/src/dmd/file_manager.d \
+	dmd/compiler/src/dmd/globals.d \
+	dmd/compiler/src/dmd/id.d \
+	dmd/compiler/src/dmd/identifier.d \
+	dmd/compiler/src/dmd/lexer.d \
+	dmd/compiler/src/dmd/tokens.d \
+	dmd/compiler/src/dmd/utils.d \
+	$(DMD_ROOT_SRC)
+
+DMD_PARSER_SRC := \
+	dmd/compiler/src/dmd/astbase.d \
+	dmd/compiler/src/dmd/parse.d \
+	dmd/compiler/src/dmd/parsetimevisitor.d \
+	dmd/compiler/src/dmd/transitivevisitor.d \
+	dmd/compiler/src/dmd/permissivevisitor.d \
+	dmd/compiler/src/dmd/strictvisitor.d \
+	dmd/compiler/src/dmd/astenums.d \
+	$(DMD_LEXER_SRC)


Might dmd -i be useful here? This makes you quite brittle towards upstream changes.

Yeah I'm not really sure about keeping these changes, because they refused to work once I started using ASTCodegen. I tried a bunch of things to fix it, but didn't manage to make it work. dmdfmt currently works with dub build, but the Makefile itself is pretty broken. Even with every single D file in DMD included, it would die during linking.

maxhaton · 2024-01-03T16:44:16Z

src/dfmt/formatter.d

-import dparse.lexer;
-import dparse.parser;
-import dparse.rollback_allocator;
-import dfmt.ast_info;


keep libdparse around for now, abstract your additions into some kind of API.

You mean libdparse and dmd should be a switchable flag in dfmt?

Yes, both for redundancy and because it will make your diff smaller. It's much easier to merge when you are adding rather than replacing.

Some breaking changes in upstream borked the AST walker, it's been fixed in this commit. Signed-off-by: Prajwal S N <[email protected]>

Signed-off-by: Prajwal S N <[email protected]>

_sigh_ should've done this at the beginning, but better late than never. Signed-off-by: Prajwal S N <[email protected]>

Signed-off-by: Prajwal S N <[email protected]>

snprajwal force-pushed the saoc-2023 branch from 43b2e48 to 5fab59b Compare September 15, 2023 08:49

rikkimax reviewed Sep 22, 2023

View reviewed changes

dub.json Outdated Show resolved Hide resolved

snprajwal force-pushed the saoc-2023 branch from 5fab59b to 09a580e Compare September 27, 2023 17:57

snprajwal force-pushed the saoc-2023 branch 2 times, most recently from c3a276f to 14a2326 Compare November 6, 2023 18:12

WebFreak001 reviewed Nov 7, 2023

View reviewed changes

snprajwal force-pushed the saoc-2023 branch from 14a2326 to 412a0ce Compare November 8, 2023 11:28

snprajwal added 6 commits November 8, 2023 23:40

chore(deps): add dmd library

34174ba

Signed-off-by: Prajwal S N <[email protected]>

refactor: lexer in tokens.d

61206bc

Signed-off-by: Prajwal S N <[email protected]>

refactor: lexer in indentation.d and wrapping.d

806e758

Signed-off-by: Prajwal S N <[email protected]>

feat: make it compile with dmd AST

6c1c543

Signed-off-by: Prajwal S N <[email protected]>

feat: implement visitors and writers

3ac8e5a

Signed-off-by: Prajwal S N <[email protected]>

fix: add indentation logic

decdc79

Signed-off-by: Prajwal S N <[email protected]>

snprajwal force-pushed the saoc-2023 branch from 412a0ce to 3ac8e5a Compare November 8, 2023 18:17

snprajwal added 2 commits November 10, 2023 00:02

fix: iron out some bugs

ded6e8d

Signed-off-by: Prajwal S N <[email protected]>

feat: add 4 transformations

90c9040

- dfmt_space_before_function_parameters - dfmt_space_after_cast - dfmt_align_switch_statements - dfmt_space_before_aa_colon Signed-off-by: Prajwal S N <[email protected]>

snprajwal added 3 commits November 23, 2023 00:11

feat: add dfmt_space_after_keywords

62e47bb

Signed-off-by: Prajwal S N <[email protected]>

feat: add dfmt_selective_import_space

b4e36d9

Signed-off-by: Prajwal S N <[email protected]>

feat: add brace styles

07bfe5c

Signed-off-by: Prajwal S N <[email protected]>

snprajwal added 3 commits December 13, 2023 18:40

feat: add dfmt_compact_labeled_statements

14024a5

Signed-off-by: Prajwal S N <[email protected]>

feat: add dfmt_space_before_named_arg_colon

0259887

Signed-off-by: Prajwal S N <[email protected]>

feat: non-conditional template constraints styles

0897c83

The non-conditional styles for `dfmt_template_constraint_style` are supported for now. The conditional ones will require line length tracking, which is yet to be implmented. Signed-off-by: Prajwal S N <[email protected]>

maxhaton reviewed Jan 3, 2024

View reviewed changes

maxhaton requested changes Jan 3, 2024

View reviewed changes

snprajwal added 6 commits January 4, 2024 21:18

chore: make it compile with latest DMD

3b80011

Some breaking changes in upstream borked the AST walker, it's been fixed in this commit. Signed-off-by: Prajwal S N <[email protected]>

fix: store unit tests in AST

16028a8

Signed-off-by: Prajwal S N <[email protected]>

feat: track line length

6e97ea8

Signed-off-by: Prajwal S N <[email protected]>

feat: conditional template constraint style

7f81285

Signed-off-by: Prajwal S N <[email protected]>

chore: run dfmt on all files

3eb0928

_sigh_ should've done this at the beginning, but better late than never. Signed-off-by: Prajwal S N <[email protected]>

fix: use temporary buffer for conditional newline

90bb91d

Signed-off-by: Prajwal S N <[email protected]>

snprajwal force-pushed the saoc-2023 branch from 7d1584b to 90bb91d Compare January 4, 2024 15:48

snprajwal added 4 commits January 4, 2024 21:18

chore: revert changes to README

6ef9fba

Signed-off-by: Prajwal S N <[email protected]>

feat: add dfmt_split_operator_at_line_end

325f093

Signed-off-by: Prajwal S N <[email protected]>

feat: add dfmt_single_template_constraint_indent

864caad

Signed-off-by: Prajwal S N <[email protected]>

fix: boolean logic for line split

b845d96

Signed-off-by: Prajwal S N <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SAOC 2023: replace libdparse with dmd #589

SAOC 2023: replace libdparse with dmd #589

snprajwal commented Sep 15, 2023

WebFreak001 left a comment

snprajwal commented Nov 7, 2023

WebFreak001 commented Nov 11, 2023

RazvanN7 commented Nov 15, 2023 •

edited

Loading

WebFreak001 commented Nov 15, 2023

maxhaton commented Dec 1, 2023

maxhaton commented Dec 1, 2023

snprajwal commented Dec 1, 2023

WebFreak001 commented Dec 1, 2023

snprajwal commented Dec 2, 2023

maxhaton commented Dec 2, 2023

snprajwal commented Dec 4, 2023

WebFreak001 commented Dec 4, 2023

maxhaton commented Dec 4, 2023

snprajwal commented Dec 5, 2023

WebFreak001 commented Dec 6, 2023

RazvanN7 commented Dec 14, 2023

maxhaton commented Dec 14, 2023 via email

WebFreak001 commented Dec 14, 2023

maxhaton Jan 3, 2024

maxhaton Jan 3, 2024

snprajwal Jan 3, 2024

maxhaton Jan 3, 2024

maxhaton Jan 3, 2024

snprajwal Jan 4, 2024

maxhaton Jan 3, 2024

maxhaton Jan 3, 2024

snprajwal Jan 3, 2024

maxhaton Jan 3, 2024

maxhaton Jan 3, 2024

snprajwal Jan 3, 2024

maxhaton Jan 3, 2024

snprajwal Jan 3, 2024

maxhaton Jan 3, 2024

SAOC 2023: replace libdparse with dmd #589

Are you sure you want to change the base?

SAOC 2023: replace libdparse with dmd #589

Conversation

snprajwal commented Sep 15, 2023

WebFreak001 left a comment

Choose a reason for hiding this comment

snprajwal commented Nov 7, 2023

WebFreak001 commented Nov 11, 2023

RazvanN7 commented Nov 15, 2023 • edited Loading

WebFreak001 commented Nov 15, 2023

maxhaton commented Dec 1, 2023

maxhaton commented Dec 1, 2023

snprajwal commented Dec 1, 2023

WebFreak001 commented Dec 1, 2023

snprajwal commented Dec 2, 2023

maxhaton commented Dec 2, 2023

snprajwal commented Dec 4, 2023

WebFreak001 commented Dec 4, 2023

maxhaton commented Dec 4, 2023

snprajwal commented Dec 5, 2023

WebFreak001 commented Dec 6, 2023

RazvanN7 commented Dec 14, 2023

maxhaton commented Dec 14, 2023 via email

WebFreak001 commented Dec 14, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RazvanN7 commented Nov 15, 2023 •

edited

Loading