Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

‎Folders are not properly added to the folder exclusion list when scanning files on Windows. #859

Open
EvanTich opened this issue Nov 23, 2024 · 5 comments
Labels
in:core MLP core module is:bug Bugs to fix

Comments

@EvanTich
Copy link

Version affected

Running on the latest version (4.6) on Windows 10.

Describe the bug

Building the exclusions in Selection::buildExclusions adds all excludes to the list without regard for the system's path-separator. The DirectoryScanner used in Selection::scanIfneeded automatically converts file separators to the proper ones, including the name used in the ScanConductor, so folders that should be excluded are not.

This is largely a benign bug that only shows its fangs when large, auto-generated folders like node_modules or .vite are created.

How to Reproduce

  1. Open any Maven project that uses this plugin.
  2. Run license:format with debug enabled. The bug should also be able to be seen on license:check and license:remove.
  3. Find line stating Starting to visit {basedir}, excluding directories: []. Observe default folders and others are not properly added to folder excludes.
  4. Add an exclude like **\node_modules\** to the plugin settings/parameters.
  5. Re-run the command with debug, and see the directory is excluded correctly.

Hopefully, that should be enough to reproduce the bug...

@EvanTich EvanTich added the is:bug Bugs to fix label Nov 23, 2024
@mathieucarbou mathieucarbou added the in:core MLP core module label Nov 23, 2024
@hazendaz
Copy link
Collaborator

Do you believe it is reading those folders and files? Our exclusion list is pretty large and it doesn't complain about inability to read things like jpg for example. Is it just that its not showing you the default list correctly and that would help for debugging purposes? Since you note its benign, I presume that is the case but wanted to clarify. Of course it should show the list correctly if discovered as that would be important for not only debugging but understanding how the plugin works.

@EvanTich
Copy link
Author

The problem is that it reads every file name, not the contents, in folders that should be excluded. I say that it is benign until you get huge folders like node_modules in the mix which can have hundreds or thousands of files. Execution slows to a halt until each file name in the folder is collected recursively, with all of them being ignored anyway down the line.

The ScanConductor implementation inside Selection::scanIfneeded takes in file names with the appropriate path-separator and checks that name with the folder excludes, see this line in the ScanConductor. Unfortunately, the exclusion file names are improperly formatted with / path-separators, so that method never tells the DirectoryScanner to not recurse through that folder. I haven't had the time to properly debug the plugin, but that's what I think is happening.

@hazendaz
Copy link
Collaborator

ok thank you for description. this indeed would explain why some repos run so slow when they really shouldn't. I'll try to look into this soon but likely after Thanksgiving.

@hazendaz
Copy link
Collaborator

20faa4d introduces the bugs. I had for the longest time searching for why this plugin is so incredibly slow. I used https://github.com/mybatis/mybatis-3/ to test. It is incredibly slow in general. After making adjustments to the class affected by this, mybatis ran quickly. It still properly added license headers as expected. I think this issue is the long nirvana to fixing this plugin speed. I'll send up a draft for fix. Basic gist is that a method used to find the directories to ignore failed to use the correct check and was hard coded to '/' which would fail on windows to add as indicated here. After fixing it up, the list is generated as would be expected and how to be expected (user ignores first then the default list).

@hazendaz
Copy link
Collaborator

ok I don't think this actually solves long standing issue I have with this plugin after further testing. Do I think it would improve thins to not read files we don't need to, yes. See #870 for work I've done on this as a starting point. Its not complete as tests don't pass due to LF hard-coding at least for windows. GHA will confirm the *nix side of things. I agree original logic appears wrong and at worst confusing. However really thinking about it, reading files doesn't necessary end up extremely slow. Writing files would be slower but avoiding any git related actions if any taken would be where performance would degrade IMO. It still warrants review, fix for clarify as its super confusing that excludes for directories exist and it didn't account for any (at least on windows) due to incorrect coding usage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in:core MLP core module is:bug Bugs to fix
Projects
None yet
Development

No branches or pull requests

3 participants