SAOC 2023: replace libdparse with dmd #589
is it possible to keep the changes of formatter.d in there and not rename it to ast.d, so the diff can keep all the indentation related stuff if possible?
I've not renamed
I'm not sure I understand what you're referring to as "indentation related"...
Signed-off-by: Prajwal S N <[email protected]>
- dfmt_space_before_function_parameters
- dfmt_space_after_cast
- dfmt_align_switch_statements
- dfmt_space_before_aa_colon

Signed-off-by: Prajwal S N <[email protected]>
Ah sorry, I didn't actually check the diff too closely there. I was thinking about the layout inside libdparse, which also has a formatter that works through the visitor pattern like this one does.

I think in the end the changes here will be a useful tool, since they stay accurate with the compiler library, but I don't think the new code should go into this dfmt repository. It would be good to have this as a second repository in parallel that supports the same settings and reuses code such as the editorconfig parsing from dfmt. However, the differences in how the two formatters format code are too big for old code bases to upgrade to.

Additionally, new language features will require an upgrade and maintenance each time, since otherwise code will be completely unable to parse or format. With dfmt, new features often just end up not looking good while the rest still works as expected. So in the future dfmt might still be desired as a stable formatter that just doesn't get the latest and greatest features like this dmd replacement does.

Instead, I think this new dmd-based formatter is something we can introduce slowly to new projects once it's ready. It works fundamentally differently from dfmt, since it actually rewrites the code based on the AST instead of just inserting and removing whitespace like dfmt does. This of course also adds new opportunities, like reordering imports alphabetically while formatting, or adding/removing braces and shortened methods to be consistent with project settings. Those are features which would never make it into dfmt with its current architecture.

For now I think it's good to keep it in this PR, just so people can also look at it, but eventually we will probably want a new repo for it, for example called dmdfmt.
@WebFreak001 Our aim is to have the same functionality as the current dfmt, so there shouldn't be any breaking changes once this project is finished. However, I don't really understand how using dmd as a lib requires more upgrade and maintenance time than the status quo.

Let's say a new feature is added in dmd. If you want to upgrade dfmt right now, you need to add some code to libdparse and then potentially add some code to dfmt to support the new case. With dmd as a lib, you will most likely just need to update to a newer version. I suspect that in most cases you won't need to add anything new to the formatter, or you would only need to slightly update the dfmt code that uses the visitor classes. All in all, I expect the changes to require less time than what you need to do now.

Now, regarding old code bases: dfmt gets updates from time to time. There is nothing that forces you as a user to use the latest version. If you have an old code base, then it makes sense to use an old compiler and an old version of the formatter. If you are using new features, then you need a newer compiler and a newer formatter. The versioning system in dub (or manual versioning) should take care of this.

My personal opinion is that we should just upgrade dfmt and make sure that people can select the version of the formatter that is in sync with the compiler version they are using. However, if you fear breaking changes, another way we can do it is to create a branch with dmdfmt that goes in parallel with the old formatter. That way, people can select the branch they are interested in. Having 2 separate repos that essentially do the same thing is not something I am attracted to: we would always need to backport dfmt-frontend additions just to keep in sync.
Currently dfmt needs almost no changes in case of a compiler change, since it mostly doesn't use AST information for formatting. (It does need it to recognize some things, like the locations of struct initializers and AAs, where libdparse updates can help in case of new syntax preceding them.)

With the new code, every AST node seems to have a format function, so every introduction of, or change to, how code can be represented as AST needs to be covered by the format functions here. With old dfmt, new syntax would at worst break through whitespaces, but you could disable it with

Since the formatting algorithm and procedure itself is so different from what dfmt currently does, I would really say it would be better to have it as a separate project, because there are going to be a bunch of people who will want the more stable old token-based approach instead of something that literally rewrites the AST each time. Maintaining old versions is something we can't do, both because of missing maintainers and because we don't have the branches set up for it. No user of old dfmt wanting to change how it works will try to touch the old dfmt code branch if it's not master.
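To make the maintenance argument above concrete, here is a minimal sketch in Python (not D, and not the actual dfmt or dmd code). Everything in it is hypothetical: the `Import` and `Return` node types, `format_tokens`, `format_node` and `format_sorted_imports` are invented for illustration. It only shows the general shape of the two approaches: a token pass lets unknown syntax through untouched, while a per-node printer has to know every node type, but in exchange can restructure code, for example by sorting imports.

```python
from dataclasses import dataclass


@dataclass
class Import:  # hypothetical AST node
    module: str


@dataclass
class Return:  # hypothetical AST node
    value: str


def format_tokens(tokens: list[str]) -> str:
    """Token-style pass: ignores structure and only decides spacing."""
    out: list[str] = []
    for tok in tokens:
        if tok == ";" and out:
            out[-1] += tok  # glue ';' onto the previous token
        else:
            out.append(tok)
    return " ".join(out)


def format_node(node) -> str:
    """AST-style pass: one printer per node type, anything else is rejected."""
    if isinstance(node, Import):
        return f"import {node.module};"
    if isinstance(node, Return):
        return f"return {node.value};"
    raise NotImplementedError(type(node).__name__)


def format_sorted_imports(nodes: list[Import]) -> str:
    """Something only the AST-style pass can do: reorder whole declarations."""
    ordered = sorted(nodes, key=lambda n: n.module)
    return "\n".join(format_node(n) for n in ordered)
```

Feeding `format_tokens` a keyword it has never seen still produces output, whereas `format_node` raises for a node class it has no printer for. That is the asymmetry the comment describes.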
Is this working well enough to start getting stuff into the main tree?
@RazvanN7 I'd like to try getting the tools into the dmd repo at some point.
It largely works, but there's still some work left. It does not retain comments at the moment, and it also does not allow all the configurations that dfmt supports. I'm working on hitting feature parity. I guess it's @RazvanN7's call to make about how this will be merged (into a different branch, a new repo, or something else).
I think retaining comments is a very important issue to solve first before getting it to users; extending the feature support can be done later as well.
Agreed, I've been working on it over the last couple of days. It requires some changes in DMD that I'll have to make before adding support for it to dmdfmt, so it'll take some time.
Can you put your findings either here or in the D Slack? I'm pretty sure lots of people (including me) can help you with this, so make sure you get in touch. Feels like déjà vu, it's come up in the past.
Here's the situation right now:
One way to store comments is to just change the lexing logic to store all types of comments rather than just doc comments. I don't think this is a feasible solution, since it'll require a lot of changes across the rest of the compiler, especially in the DDoc generation logic. It also bloats the size of the AST during parsing with information that is effectively useless to the compiler.

Another way is to introduce a new field into

The objectively "correct" way to solve this would be to introduce a new stage in the parsing phase where we build a concrete syntax tree (CST). This is the AST enriched with additional information including whitespace, line breaks, comments, offsets in the source file, etc. The formatter can use this CST to generate better context-aware output. The compiler will just have another pass that strips all of this info from the CST and reduces it to the current AST (before the semantic passes are run). Again, I'm not sure if this is the way to go, owing to the complexity of implementing this, the potential for breaking the existing compiler pipeline, and the additional overhead in the parsing phase.
In libdparse we solved this by having each Token carry leading and trailing Trivia[] attached to it. Trivia is just tokens of type whitespace and comment (and some other skipped things). In libdparse each AST node has an array of the tokens it was constructed from, so you can eventually go all the way down to the trivia. This is copied from Roslyn's design and makes it possible to do a 1:1 reconstruction of the input code from the AST, even after modifying parts of it.
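The trivia idea described above can be sketched in a few lines. This is a minimal illustration in Python, not libdparse's or Roslyn's actual API. The `Token` class, `lex` and `unparse` are hypothetical names, and as a simplification all trivia is stored as leading trivia, with an empty end-of-file token holding whatever follows the last real token. The lexer is deliberately naive (for example, it does not understand string literals), but the round-trip property still holds because every character ends up in either a token or a trivia entry.

```python
import re
from dataclasses import dataclass, field

# Whitespace, line comments and block comments count as trivia.
TRIVIA = re.compile(r"\s+|//[^\n]*|/\*.*?\*/", re.DOTALL)
# Everything else: identifiers, numbers, or a single character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|.", re.DOTALL)


@dataclass
class Token:
    text: str
    leading: list[str] = field(default_factory=list)  # trivia before this token


def lex(src: str) -> list[Token]:
    tokens: list[Token] = []
    pending: list[str] = []  # trivia seen since the last real token
    pos = 0
    while pos < len(src):
        m = TRIVIA.match(src, pos)
        if m:
            pending.append(m.group())
        else:
            m = TOKEN.match(src, pos)
            tokens.append(Token(m.group(), pending))
            pending = []
        pos = m.end()
    # Empty end-of-file token so trailing trivia is never lost.
    tokens.append(Token("", pending))
    return tokens


def unparse(tokens: list[Token]) -> str:
    """Rebuild the source text exactly, trivia included."""
    return "".join("".join(t.leading) + t.text for t in tokens)
```

Because comments travel with the tokens, a tool can rewrite a token (say, rename an identifier) and still emit the surrounding comments and whitespace unchanged.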
The way Roslyn does this is the way this should be done indeed, although I'm not sure if it's exactly the same as having some Trivia bolted on (or maybe it is in practice?)
I discussed this with Razvan, and we both feel that the correct way for us to go ahead would be to store this data correctly during the parsing stage and provide it through the AST, similar to having a

On a tangent, with regard to the question about merging the work in this PR, I don't think we've reached a consensus yet on whether it should go into a separate
Since the functionality is so drastically different from how dfmt works, I would suggest creating a separate project. Alternatively, it might be better to have both the libdparse and dmd based formatters in dfmt, and be able to switch between them using a CLI switch like
The non-conditional styles for `dfmt_template_constraint_style` are supported for now. The conditional ones will require line length tracking, which is yet to be implemented. Signed-off-by: Prajwal S N <[email protected]>
@WebFreak001 I brought up the issue of merging dmdlib with dfmt at our Foundation Monthly Meeting, and the general consensus was to integrate this work into the mainline dfmt and have dfmt be part of the release archive. That way dfmt is going to be tied to a compiler version, and every time you update the compiler you also get the latest version of dfmt. How does that sound?
Put it in the compiler tree!
As long as we keep both engines (libdparse and dmd) around for a while, that sounds like a good idea. It's just that we will still want to support the old dfmt users for a while, especially while early issues in the dmd engine are still being sorted out. We don't want to just introduce completely different formatting behavior onto full projects that have relied on dfmt for a long time and are suddenly getting formatted differently (it will also make the git history of all these projects a mess).
src/dfmt/ast.d
import dmd.rootobject;
import dmd.target;
import dmd.root.string : toDString;
import dmd.optimize : optimize;
This is breaking CI for some reason.
import dmd.optimize : optimize;
looks like it can be zapped
It's needed for calling Expression.optimize, which was moved out of expression.d into optimize.d in a recent PR. It breaks CI because I don't think there's been a release with the change yet. I compile dmdfmt with the master branch of dmd locally, so I'm sometimes ahead of the CI pipeline. I'll update dub.json to use v2.107.0-beta.1, which should fix it.
Why do you call .optimize specifically?
and where
It's called here and here. It was introduced to fix https://issues.dlang.org/show_bug.cgi?id=7375, and was taken from hdrgen.d in DMD. Now that I look closer at it, it's a compile time check, so it shouldn't be needed in dfmt. I'll get rid of it.
README.md
* Run ```git submodule update --init --recursive``` in the dfmt directory
* To compile with DMD, run ```make``` in the dfmt directory. To compile with
LDC, run ```make ldc``` instead. The generated binary will be placed in ```dfmt/bin/```.
* Run `git submodule update --init --recursive` in the dfmt directory
Move these changes into a different PR if this is intentional.
.editorconfig
insert_final_newline = true
max_line_length = 120
tab_width = 8
dfmt_single_template_constraint_indent = true
Why are these changes here? Intentional?
Yes, the configs that were removed were redundant, because they're the defaults. The two lines that were added were to maintain consistency between multi-line formatting across the codebase (the default behaviour uses two tabs to indent, which is somewhat unidiomatic, imo). There's also a discussion on the forum about whether to keep this knob or not on dmdfmt.
It's good practice (for good reason) to make PRs as small as possible. If you want to change this, it should be a separate pull request for that reason.
DMD_ROOT_SRC := \
$(shell find dmd/compiler/src/dmd/common -name "*.d")\
$(shell find dmd/compiler/src/dmd/root -name "*.d")
DMD_LEXER_SRC := \
dmd/compiler/src/dmd/console.d \
dmd/compiler/src/dmd/entity.d \
dmd/compiler/src/dmd/errors.d \
dmd/compiler/src/dmd/errorsink.d \
dmd/compiler/src/dmd/location.d \
dmd/compiler/src/dmd/file_manager.d \
dmd/compiler/src/dmd/globals.d \
dmd/compiler/src/dmd/id.d \
dmd/compiler/src/dmd/identifier.d \
dmd/compiler/src/dmd/lexer.d \
dmd/compiler/src/dmd/tokens.d \
dmd/compiler/src/dmd/utils.d \
$(DMD_ROOT_SRC)

DMD_PARSER_SRC := \
dmd/compiler/src/dmd/astbase.d \
dmd/compiler/src/dmd/parse.d \
dmd/compiler/src/dmd/parsetimevisitor.d \
dmd/compiler/src/dmd/transitivevisitor.d \
dmd/compiler/src/dmd/permissivevisitor.d \
dmd/compiler/src/dmd/strictvisitor.d \
dmd/compiler/src/dmd/astenums.d \
$(DMD_LEXER_SRC)
Might dmd -i be useful here? This makes you quite brittle towards upstream changes.
Yeah, I'm not really sure about keeping these changes, because they refused to work once I started using ASTCodegen. I tried a bunch of things to fix it, but didn't manage to make it work. dmdfmt currently works with dub build, but the Makefile itself is pretty broken. Even with every single D file in DMD included, it would die during linking.
import dparse.lexer;
import dparse.parser;
import dparse.rollback_allocator;
import dfmt.ast_info;
keep libdparse around for now, abstract your additions into some kind of API.
You mean libdparse and dmd should be a switchable flag in dfmt?
Yes, both for redundancy and because it will make your diff smaller. It's much easier to merge when you are adding rather than replacing.
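As a rough illustration of what "adding rather than replacing" could look like, here is a minimal sketch in Python. It is not dfmt's code: the `--engine` flag name, the `ENGINES` table and both formatter functions are hypothetical stand-ins that do no real formatting. The point is only that both backends sit behind one function signature and the existing one stays the default.

```python
import argparse
from typing import Callable

# Every backend exposes the same signature: source text in, formatted text out.
Formatter = Callable[[str], str]


def format_with_libdparse(source: str) -> str:
    # Stand-in for the existing token-based engine.
    return source.strip() + "\n"


def format_with_dmd(source: str) -> str:
    # Stand-in for the new AST-based engine.
    return "\n".join(line.rstrip() for line in source.splitlines()) + "\n"


ENGINES: dict[str, Formatter] = {
    "libdparse": format_with_libdparse,
    "dmd": format_with_dmd,
}


def run(argv: list[str], source: str) -> str:
    parser = argparse.ArgumentParser(prog="fmt-sketch")
    # Hypothetical flag; the old engine remains the default.
    parser.add_argument("--engine", choices=sorted(ENGINES), default="libdparse")
    args = parser.parse_args(argv)
    return ENGINES[args.engine](source)
```

With this shape, the new engine is a new entry in the table plus one new module, so the diff against the existing code stays small.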
Some breaking changes in upstream borked the AST walker, it's been fixed in this commit. Signed-off-by: Prajwal S N <[email protected]>
_sigh_ should've done this at the beginning, but better late than never. Signed-off-by: Prajwal S N <[email protected]>
This PR tracks the commits for replacing libdparse with dmd. Unlike with other projects like D-Scanner, dfmt generates the AST only once (in the main function), and uses it across all 5 files. Since these files are tightly coupled together, making this change across multiple PRs is not possible as it will lead to a broken master branch. Instead, this PR will track every change in independent commits, and will be marked ready once dfmt+dmd works exactly how dfmt+libdparse previously did.
cc @RazvanN7