Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some updates for DOM-first code selection #3640

Merged
merged 1 commit into from
Feb 3, 2025

Conversation

mickaelistria
Copy link
Contributor

What it does

Improve DOM-based codeSelection in several ways:

  • Ensure resolution has happened when reconciling model with problems requested
  • Make getOrBuildAST taking focalPosition as input and pass focalPosition when processing long files (improve performance)
  • Support more cases in codeSelect

How to test

Ensure no regression.

Author checklist

Sorry, something went wrong.

* Ensure resolution has happened when reconciling model with problems
requested
* Make getOrBuildAST taking focalPosition as input and pass
focalPosition when processing long files (improve performance)
* Support more cases in codeSelect
@@ -60,7 +60,7 @@ public IJavaElement[] codeSelect(int offset, int length) throws JavaModelExcepti
if (offset + length > this.unit.getSource().length()) {
throw new JavaModelException(new IndexOutOfBoundsException(offset + length), IJavaModelStatusConstants.INDEX_OUT_OF_BOUNDS);
}
org.eclipse.jdt.core.dom.CompilationUnit currentAST = this.unit.getOrBuildAST(this.owner);
org.eclipse.jdt.core.dom.CompilationUnit currentAST = this.unit.getOrBuildAST(this.owner, this.unit.getBuffer().getLength() < 50000 ? -1 : offset);
Copy link
Contributor

@rgrunber rgrunber Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the reason for not applying focal position always (ie. not just when the buffer is >= 50000) ? Is there something that's lost out on large files? Intuitively I'm just thinking that if we need to return a list of IJavaElements that are partly covered by a selection, then do we even need statement nodes completely outside the scope to make that decision ? Maybe if the selection goes across different scopes ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current rationale (that we can discuss as we have a lot of choices here) is to enable some caching. If we always care about focalPosition, then all the doms we get are partial and resolve faster (that is per se not too bad), but in case of codeSelect, we may prefer to build the whole DOM once and reuse it as long as the file is not modified. The size of the buffer was used as a criterion to decide whether to enable caching and ignore focalPosition, or whether to prefer a partial but non-reusable DOM.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Ok, it just hit me. This also caches the AST, and returning a cached AST is more efficient that computing a partial (setFocalPosition) one, but if we always compute a partial, we never get to cache a proper one. So may as well limit ourselves to cases where we know the full parse is particularly bad.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So may as well limit ourselves to cases where we know the full parse is particularly bad.

Yes, that's the idea, but I'm not sure what is the best resolution of "particularly bad" concretely. I went for document size because it's easy and cheap to get, but I guess some better conditions could be identified.

Copy link
Contributor

@rgrunber rgrunber Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe @datho7561 has some ideas from his look into parsing about some "easy" checks other than document size that can indicate parsing will be expensive.

The only other thing I've looked at is the usage of setFocalPosition(-1) in regular cases as a way of saying, "don't set the focal position and do a full AST parse". It still sets the CompilationUnitResolver.PARTIAL bit, and useSearcher to true, but these are only meaningfully used for determining whether we set IGNORE_METHOD_BODIES. In other words we could never set IGNORE_METHOD_BODIES on a full AST parse. Almost identical to what setFocalPosition would do to begin with except even the elements around the focal position would be ignored if in a method body. Are you fine with this ? Assuming I understood the logic, it is a subtle difference to be aware of. Update: I mean at the very least it's unexpected behaviour going forward.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

particularly bad

lambdas and switch expressions (these require some fancy and costly hacks in the parser to work), and deeply nested expressions. Lambdas and switch expressions could be identified by searching for -> in the file text. However, that check itself could take a while for a large file. I don't know how much time this would save in practice though.

@rgrunber rgrunber added this to the 4.35 M3 milestone Feb 3, 2025
Copy link
Contributor

@rgrunber rgrunber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me. Everything seems isolated from basic ECJ operations.

@rgrunber rgrunber merged commit 9827871 into eclipse-jdt:master Feb 3, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants