Support ethdebug source locations under EOF #15994

Open: clonker wants to merge 8 commits into develop from eof_source_locations_unoptimized

Conversation

clonker
Member

@clonker clonker commented Apr 10, 2025

  • Adds an EthdebugSchema header with the relevant part of the schema mapped to structs with corresponding to_json methods and validations
  • Skips the context for instructions whose source locations are not valid (i.e., (-1, -1))
  • Adds source location info in evmasm assembly for assembleEOF
  • Refactors legacy assemble to use RAII-style instruction location gathering

Fixes the unoptimized part of #15978.
Fixes #15998.

@clonker clonker force-pushed the eof_source_locations_unoptimized branch 2 times, most recently from 20d6a64 to fb559fb Compare April 11, 2025 09:12
@clonker clonker marked this pull request as ready for review April 11, 2025 09:38
@clonker clonker force-pushed the eof_source_locations_unoptimized branch 9 times, most recently from e6a1055 to 2a91773 Compare April 12, 2025 06:49
@clonker clonker requested review from cameel and aarlt April 12, 2025 09:12
@clonker clonker force-pushed the eof_source_locations_unoptimized branch 5 times, most recently from d82b70c to 0fd1232 Compare April 15, 2025 07:16
@clonker clonker force-pushed the eof_source_locations_unoptimized branch from 0fd1232 to cf6aa71 Compare April 17, 2025 08:18
@clonker clonker requested a review from matheusaaguiar April 17, 2025 09:11
matheusaaguiar
matheusaaguiar previously approved these changes Apr 22, 2025
Collaborator

@matheusaaguiar matheusaaguiar left a comment

lgtm

Member

@cameel cameel left a comment

I want to take a closer look at this, especially the Assembly.cpp part, but for now just a few small annoyances I found while doing a quick initial pass.

Member

The output here is quite long and seems to have little to do with ethdebug itself (the ethdebug JSON gets stripped). Do we need it all? What's the point?

Member Author

@clonker clonker Apr 23, 2025

I think I just added it because there is the same test for non-eof. And looking at it now, I agree. There doesn't seem to be much value to keep it (or: them) around, especially with #16009 around the corner. I have removed this one.

@clonker clonker force-pushed the eof_source_locations_unoptimized branch 3 times, most recently from 3c02df0 to f9a0985 Compare April 23, 2025 06:51
@clonker clonker requested a review from cameel April 28, 2025 15:44
Member

@aarlt aarlt left a comment

LGTM!

Member

@aarlt aarlt left a comment

ok.. at least on the cli I just saw that ethdebug returns null for the interface (e.g. using test/libsolidity/semanticTests/interfaceID/homer.sol) - I guess we want an empty json object there. I think it would be nice to add a small test for that simple interface case.

@clonker clonker force-pushed the eof_source_locations_unoptimized branch 4 times, most recently from 42d5d8f to 7eecf23 Compare May 6, 2025 12:35
"C": {
    "evm": {
        "bytecode": {
            "ethdebug": null
Member

ok, I thought those would get automatically removed.

Member Author

it's even better if they don't. if you request ethdebug output there should be an ethdebug output artifact there that indicates that there is no ethdebug output :)

@aarlt
Member

aarlt commented May 7, 2025

ok.. at least on the cli I just saw that ethdebug returns null for the interface (e.g. using test/libsolidity/semanticTests/interfaceID/homer.sol) - I guess we want an empty json object there. I think it would be nice to add a small test for that simple interface case.

Regarding the null stuff, I changed my mind - let's fix this later if it's really needed.

@clonker clonker force-pushed the eof_source_locations_unoptimized branch from 7eecf23 to e696889 Compare May 7, 2025 09:27
aarlt
aarlt previously approved these changes May 9, 2025
@clonker clonker force-pushed the eof_source_locations_unoptimized branch from e696889 to 667d2e3 Compare May 9, 2025 10:37
aarlt
aarlt previously approved these changes May 9, 2025
Comment on lines 87 to 91
m_instructionLocations.emplace_back(LinkerObject::InstructionLocation{
	.start = m_instructionLocationStart,
	.end = end,
	.assemblyItemIndex = m_assemblyItemIndex
});
Collaborator

Shouldn't this be as follows:

Suggested change
- m_instructionLocations.emplace_back(LinkerObject::InstructionLocation{
- 	.start = m_instructionLocationStart,
- 	.end = end,
- 	.assemblyItemIndex = m_assemblyItemIndex
- });
+ m_instructionLocations.emplace_back(
+ 	m_instructionLocationStart,
+ 	end,
+ 	m_assemblyItemIndex
+ );

Otherwise it's equivalent to a push_back, i.e. copied instead of constructed in place.

Member Author

@clonker clonker May 14, 2025

Yup, but clang (at least in the version we use) doesn't like that :( In the end it's POD though, so the effect should be marginal at best.

Member

I'd suggest using push_back() in that situation though. The whole shtick of emplace_back() is that it will perform an implicit conversion. When you know the exact type, you don't really want that.

Member Author

Generally I'd rather say the whole shtick is that stuff is created in-place (if possible) and we avoid an additional move. This is still the case if I know the precise type.
In our specific case there is exactly zero performance (or copy/move semantic) difference between push_back and emplace_back - both perform a move into an allocated memory region and that is it. For the sake of clarity I have changed it to push_back, though.

Member

@cameel cameel May 15, 2025

But the creation in place happens via an implicit conversion (basically, through choosing a constructor, though it could be one with multiple args; which is why we sometimes use explicit to prevent such conversions).

I'm just pointing this out because for some time I've seen a push to use the emplace-style methods everywhere and wanted to push back against it a little :P. push_back() actually seems to me like a better default, because it's stricter about what it accepts. I'd only use emplace_back() when I know I want such a conversion or construction in-place.

Member Author

@clonker clonker May 15, 2025

I don't want to derail this PR and I think the push-back is great! Although I do have to push back a bit on the push-back. You absolutely can have an explicit constructor and still use emplace_back by virtue of its args. It is simply forwarded into std::construct_at / placement new. So this can perform implicit conversion but doesn't have to :)

For me personally it's more of a performance thing. For small objects in a tight loop or objects that are expensive to move it can make quite a bit of difference. Implicit conversions are (sometimes) mean.
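
To make the thread above concrete, here is a minimal sketch of the two behaviours being discussed. `Location` and `Tracked` are hypothetical stand-ins (not the compiler's types). The clang complaint mentioned earlier is likely because constructing an aggregate from forwarded arguments relies on C++20 parenthesized aggregate initialization, which older clang releases do not implement; that is an assumption, not something stated in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for LinkerObject::InstructionLocation (a plain aggregate).
struct Location
{
	std::size_t start;
	std::size_t end;
	std::size_t assemblyItemIndex;
};

// Hypothetical type with an explicit constructor that counts move constructions.
struct Tracked
{
	static inline int moves = 0;
	int value;
	explicit Tracked(int _value): value(_value) {}
	Tracked(Tracked&& _other) noexcept: value(_other.value) { ++moves; }
	Tracked(Tracked const&) = default;
};

inline std::vector<Location> collectLocations()
{
	std::vector<Location> locations;
	// The exact type is known, so push_back with a braced temporary is the strict spelling.
	locations.push_back(Location{0, 1, 0});
	// Handing a ready-made temporary to emplace_back does the same work: one move into storage.
	locations.emplace_back(Location{1, 3, 1});
	return locations;
}

inline int movesForEmplace()
{
	Tracked::moves = 0;
	std::vector<Tracked> items;
	items.reserve(1); // avoid counting moves caused by reallocation
	// The argument is forwarded to the explicit constructor; no temporary is created.
	items.emplace_back(42);
	return Tracked::moves;
}

inline int movesForPush()
{
	Tracked::moves = 0;
	std::vector<Tracked> items;
	items.reserve(1);
	// items.push_back(42) would not compile because the constructor is explicit.
	items.push_back(Tracked{42});
	return Tracked::moves;
}
```

In this sketch `movesForEmplace()` reports no move and `movesForPush()` reports one, which matches both points made above: `emplace_back` works with explicit constructors, and for a trivially movable aggregate the difference to `push_back` is negligible.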

/// instruction locations vector.
/// If the instruction decomposes into multiple individual evm instructions, `emit` can be
/// called for all but the last one (which will be emitted by the destructor).
class InstructionLocationEmitter
Collaborator

Was this refactor strictly necessary in this PR?

Member Author

no. it does however disentangle the ethdebug metadata stuff a bit from the assembly items themselves and unifies the style.

Member

Well, it would have been better off as a separate PR. Actually wanted to suggest that at some point.

Member Author

Doesn't feel like it is necessary in this case. It is a cosmetic change and not a ton of lines of change either. Not that it matters now.

Member

Well, the main reason is that it touches a critical part of the compiler and is completely independent of the rest of the PR. I was only really interested in this part, and it looks fine so would be best if we could merge it right away (not sure what the state of the rest is, but no reason for it to hold it back).

Member Author

That makes sense. I'll extract it!

Member Author

-> #16052
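
For readers who have not seen the refactor, the RAII idea described in the doc comment of this thread can be sketched roughly as follows. This is a simplified illustration under assumptions, not the code from the PR: `LocationEmitter`, its constructor parameters, and the choice to skip empty ranges are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplification of LinkerObject::InstructionLocation.
struct InstructionLocation
{
	std::size_t start;
	std::size_t end;
	std::size_t assemblyItemIndex;
};

// Records the byte range written between construction and destruction (or an explicit emit()).
class LocationEmitter
{
public:
	LocationEmitter(
		std::vector<InstructionLocation>& _locations,
		std::vector<unsigned char> const& _bytecode,
		std::size_t _assemblyItemIndex
	):
		m_locations(_locations),
		m_bytecode(_bytecode),
		m_assemblyItemIndex(_assemblyItemIndex),
		m_start(_bytecode.size())
	{}
	LocationEmitter(LocationEmitter const&) = delete;
	LocationEmitter& operator=(LocationEmitter const&) = delete;

	// The destructor emits the last (or only) instruction of the assembly item.
	~LocationEmitter() { emit(); }

	// Closes the current range and starts the next one at the current end of the bytecode.
	void emit()
	{
		std::size_t const end = m_bytecode.size();
		if (end > m_start) // assumption of this sketch: empty ranges are skipped
			m_locations.push_back(InstructionLocation{m_start, end, m_assemblyItemIndex});
		m_start = end;
	}

private:
	std::vector<InstructionLocation>& m_locations;
	std::vector<unsigned char> const& m_bytecode;
	std::size_t m_assemblyItemIndex;
	std::size_t m_start;
};

inline std::vector<InstructionLocation> assembleExample()
{
	std::vector<unsigned char> bytecode;
	std::vector<InstructionLocation> locations;
	{
		// Item 0: a single PUSH1 0x2a (opcode plus one byte of immediate data).
		LocationEmitter emitter(locations, bytecode, 0);
		bytecode.push_back(0x60);
		bytecode.push_back(0x2a);
	}
	{
		// Item 1: decomposes into two instructions, JUMPDEST and POP.
		LocationEmitter emitter(locations, bytecode, 1);
		bytecode.push_back(0x5b);
		emitter.emit(); // all but the last instruction are emitted explicitly
		bytecode.push_back(0x50);
	}
	return locations;
}
```

The point of the pattern is that the bookkeeping lives in one place: each code path that writes bytes only has to open a scope, and the location is recorded when the scope ends.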

schema::program::Instruction::Operation operation;
operation.mnemonic = instructionInfo(static_cast<Instruction>(_linkerObject.bytecode[_start]), _assembly.evmVersion()).name;
static size_t constexpr instructionSize = 1;
if (_start + instructionSize < _end)
Collaborator

Might even make sense to replace this with an assert as this could end up with an empty operation, which is not desired?

Member Author

There is an assert that start < end, which leaves start + 1 == end as a possibility, i.e., an op with no argument data - which is desired and possible, I'd think.
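
The case distinction in this reply can be illustrated with a small sketch. `Operation` and `decodeOperation` are hypothetical simplifications (the schema type in the PR stores a mnemonic, not a raw opcode); only the `_start`/`_end` logic mirrors the excerpt above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical simplification of schema::program::Instruction::Operation.
struct Operation
{
	std::uint8_t opcode;
	// Immediate data following the opcode; absent for instructions without arguments.
	std::optional<std::vector<std::uint8_t>> arguments;
};

inline Operation decodeOperation(
	std::vector<std::uint8_t> const& _bytecode,
	std::size_t _start,
	std::size_t _end
)
{
	// A range always contains at least the opcode byte.
	assert(_start < _end && _end <= _bytecode.size());
	Operation operation{_bytecode[_start], std::nullopt};
	static constexpr std::size_t instructionSize = 1;
	// Only ranges longer than one byte carry argument data.
	if (_start + instructionSize < _end)
		operation.arguments = std::vector<std::uint8_t>(
			_bytecode.begin() + static_cast<std::ptrdiff_t>(_start + instructionSize),
			_bytecode.begin() + static_cast<std::ptrdiff_t>(_end)
		);
	return operation;
}

inline std::vector<std::uint8_t> const& exampleBytecode()
{
	// PUSH1 0x2a, POP
	static std::vector<std::uint8_t> const bytecode{0x60, 0x2a, 0x50};
	return bytecode;
}
```

Decoding the range [0, 2) yields an operation with one argument byte, while [2, 3) yields an operation without arguments, which is the start + 1 == end case.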

schema::materials::SourceRange::Range locationRange(langutil::SourceLocation const& _location)
{
return {
.length = schema::data::Unsigned{_location.end - _location.start},
Collaborator

Assert

Member Author

@clonker clonker May 14, 2025

The assert already is in Unsigned.
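
As an illustration of why a check inside the wrapper type covers this call site, here is a rough sketch. Everything in it is an assumption made for the example: the real `schema::data::Unsigned` presumably uses the compiler's assertion macros rather than an exception, and the real `locationRange` returns a range directly, with invalid locations being skipped by the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for schema::data::Unsigned: rejects negative values on construction.
class Unsigned
{
public:
	explicit Unsigned(std::int64_t _value): m_value(checked(_value)) {}
	std::uint64_t value() const { return m_value; }

private:
	static std::uint64_t checked(std::int64_t _value)
	{
		if (_value < 0)
			throw std::invalid_argument("ethdebug unsigned value must be non-negative");
		return static_cast<std::uint64_t>(_value);
	}
	std::uint64_t m_value;
};

// Hypothetical simplification of langutil::SourceLocation; (-1, -1) marks an invalid location.
struct SourceLocation
{
	int start = -1;
	int end = -1;
	bool isValid() const { return start >= 0 && end >= start; }
};

struct Range
{
	Unsigned offset;
	Unsigned length;
};

// Returns no range for invalid locations instead of producing negative offsets.
inline std::optional<Range> locationRange(SourceLocation const& _location)
{
	if (!_location.isValid())
		return std::nullopt;
	return Range{Unsigned{_location.start}, Unsigned{_location.end - _location.start}};
}
```

Because every offset and length passes through the wrapper's constructor, a negative value cannot reach the JSON output unnoticed, so a second check at the call site would be redundant.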

@clonker clonker force-pushed the eof_source_locations_unoptimized branch from 667d2e3 to bcf1c34 Compare May 14, 2025 15:16
Member

@cameel cameel left a comment

I only looked at the Assembly changes so far, but they look good. We could merge that if you extract it into a separate PR.

@@ -4,7 +4,7 @@ Language Features:


Compiler Features:

* ethdebug: Experimental support for instructions and source locations under EOF.
Member

Suggested change
* ethdebug: Experimental support for instructions and source locations under EOF.
* ethdebug: Experimental support for instructions and source locations under EOF.

Member Author

Uhh. Happy to fix this - but why? It's not like there's a blank line between bullet point list and headline anywhere else.

Member

@cameel cameel May 15, 2025

Ah, my bad, should have been the other way around - an extra empty line below :)

but why?

Two empty lines between sections.

Maybe we should just change the convention to a single line? No one seems to be able to keep it straight or see it in the diff and it gets changed by PRs back and forth. I usually ignore it but I got annoyed enough seeing it again to suggest a correction :)

Member Author

Haha yeah, that explains it. Might've also been my pre-coffee confusion that I didn't realize you meant the blank line below. I am absolutely fine with changing it to one line in any case! :)

@clonker clonker force-pushed the eof_source_locations_unoptimized branch 3 times, most recently from af84571 to aad6fca Compare May 15, 2025 09:08
@clonker clonker force-pushed the eof_source_locations_unoptimized branch from aad6fca to 739761a Compare May 15, 2025 14:04
Development

Successfully merging this pull request may close these issues.

EthDebug source range offsets must be non-negative
6 participants