Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add typo checking for *.rst and *.md files #10603

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

philderbeast
Copy link
Collaborator

@philderbeast philderbeast commented Nov 28, 2024

Fix #10601. Adds spell checking (finding typos) and fixing makefile targets with recipes using typos and an GitHub action that uses the makefile target (even though there is a canned action available for this).

Limited at the moment to checking and fixing spelling in *.rst files (and *.md files - requested at review). No configuration is required as yet. The default configuration can be viewed with typos --dump-config -.

Fixes the typos that make typos found with make fix-typos except for the one with an ambiguous fix that I fixed by hand.


Template B: This PR does not modify behaviour or interface

E.g. the PR only touches documentation or tests, does refactorings, etc.

Include the following checklist in your PR:

  • Patches conform to the coding conventions.
  • Is this a PR that fixes CI? If so, it will need to be backported to older cabal release branches (ask maintainers for directions).

Makefile Outdated Show resolved Hide resolved
Copy link
Collaborator

@ulysses4ever ulysses4ever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we only use it for text files (e.g. RST) there's not much profit we get over conventional spell checkers like aspell? I think the killer feature of typos is that it tries to analyze source code and find typos in things like program_idenfitiars? If we don't use this feature, I'd rather we use something like aspell. But I won't block it on these grounds.

Besides RST, could we include MD and, most importantly, the contents of the changelog-d directory? That looks critical to me.

@geekosaur
Copy link
Collaborator

Some of the clean-ups suggested it was finding typos in comments in source code, at least. Identifiers might be a harder problem if they affect API.

@philderbeast
Copy link
Collaborator Author

@ulysses4ever, I'm starting cautiously, doing only *.rst files for now. That is working without having to setup configuration in _typos.toml. I have experimented with typos in *.hs source code and expect it will work but not without configuration.

@ulysses4ever
Copy link
Collaborator

I didn't say a word about .hs files. I just ask to add Markdown and changelog-d-files. Is it completely out of reach for this PR?

I also suggest to add a constant in Makefile for the wildcards like **/*.rst since we will want to add more.

@philderbeast
Copy link
Collaborator Author

Besides RST, could we include MD and, most importantly, the contents of the changelog-d directory? That looks critical to me.

I found typos in *.md files but left that out of this PR. Because of the way we create the release notes I was unsure how we'd tackle fixing typos there, #10604.

@geekosaur
Copy link
Collaborator

FWIW we usually catch them while compiling them into actual release notes. Ideally we identify them within PRs before they're merged.

@ulysses4ever
Copy link
Collaborator

Ideally we identify them [typos] within PRs before they're merged.

This.

@philderbeast philderbeast changed the title Add typo checking for *.rst files Add typo checking for *.rst and *.md files Nov 29, 2024
@philderbeast
Copy link
Collaborator Author

I have experimented with typos in *.hs source code and expect it will work but not without configuration.

I've managed to get this working but want to propose it on a separate PR.

@philderbeast
Copy link
Collaborator Author

This needs another review now that I've changed the way I'm passing files to the typos tool.

Makefile Outdated Show resolved Hide resolved
@ulysses4ever
Copy link
Collaborator

I also disagree with separate targets for .md and .rst. I think one target for spell checking is enough.

@philderbeast
Copy link
Collaborator Author

I also disagree with separate targets for .md and .rst. I think one target for spell checking is enough.

@ulysses4ever, there's one more pair I'd like to add in a separate PR, the hs-* ones:

.PHONY: users-guide-typos
users-guide-typos: ## Find typos in users guide
	find doc -type f -name '*.rst' | $(GREP_EXCLUDE) | xargs typos

.PHONY: users-guide-fix-typos
users-guide-fix-typos: ## Fix typos in users guide
	find doc -type f -name '*.rst' | $(GREP_EXCLUDE) | xargs typos --write-changes

.PHONY: markdown-typos
markdown-typos: ## Find typos in markdown files
	find . -type f -name '*.md' | $(GREP_EXCLUDE) | xargs typos

.PHONY: markdown-fix-typos
markdown-fix-typos: ## Fix typos in markdown files
	find . -type f -name '*.md' | $(GREP_EXCLUDE) | xargs typos --write-changes

.PHONY: hs-typos
hs-typos: ## Find typos in *.hs files
	find . -type f -name '*.hs' | $(GREP_EXCLUDE_HS) | $(ESCAPE_SPACES) | xargs typos

.PHONY: hs-fix-typos
hs-fix-typos: ## Fix typos in *.hs files
	find . -type f -name '*.hs' | $(GREP_EXCLUDE_HS) | $(ESCAPE_SPACES) | xargs typos --write-changes

@philderbeast
Copy link
Collaborator Author

philderbeast commented Nov 30, 2024

I also disagree with separate targets for .md and .rst. I think one target for spell checking is enough.

I wouldn't be able to use the cd doc for the doc/**/*.rst files in that case. The typos tool seems fast enough to scan a lot of files but we have yet to scan *.hs files here on this PR. I do have the *.hs scanning ready on another branch.

Combining the scans into one larger scan will noticeably slow things down; *.rst ≈ 55ms, *.md ≈ 150ms, *.hs ≈ 1.2s:

$ time make hs-typos
find . -type f -name '*.hs' | \
  grep -v -E 'dist-|cabal-testsuite|python-|Cabal-syntax.+Quirks' | \
  sed 's| |\\ |g' | xargs typos
________________________________________________________
Executed in    1.15 secs    fish           external
   usr time    1.00 secs  235.00 micros    1.00 secs
   sys time    0.16 secs   76.00 micros    0.16 secs

$ time make hs-fix-typos
find . -type f -name '*.hs' | \
   grep -v -E 'dist-|cabal-testsuite|python-|Cabal-syntax.+Quirks' | \
   sed 's| |\\ |g' | xargs typos --write-changes
________________________________________________________
Executed in    1.15 secs      fish           external
   usr time  978.16 millis  246.00 micros  977.92 millis
   sys time  178.80 millis   76.00 micros  178.72 millis
$ time make markdown-typos
find . -type f -name '*.md' | grep -v -E 'dist-|cabal-testsuite|python-' | xargs typos
________________________________________________________
Executed in  148.81 millis    fish           external
   usr time  100.42 millis  247.00 micros  100.17 millis
   sys time   50.87 millis   70.00 micros   50.80 millis

$ time make markdown-fix-typos
find . -type f -name '*.md' | grep -v -E 'dist-|cabal-testsuite|python-' | xargs typos --write-changes
________________________________________________________
Executed in  155.61 millis    fish           external
   usr time  104.69 millis    0.00 micros  104.69 millis
   sys time   55.77 millis  354.00 micros   55.42 millis
$ time make users-guide-typos
cd doc && find . -type f -name '*.rst' | xargs typos
________________________________________________________
Executed in   55.00 millis    fish           external
   usr time   39.78 millis  345.00 micros   39.43 millis
   sys time   16.90 millis   85.00 micros   16.82 millis

$ time make users-guide-fix-typos
cd doc && find . -type f -name '*.rst' | xargs typos --write-changes
________________________________________________________
Executed in   55.10 millis    fish           external
   usr time   43.34 millis  361.00 micros   42.98 millis
   sys time   13.08 millis   87.00 micros   12.99 millis

@ulysses4ever
Copy link
Collaborator

I think one rule for all file types should be enough. Please, name the reasons you think it's not. If it's dist-newstyle that trips the tool, can we just blocklist it in the config (given that you are adding a config file anyway).

@ulysses4ever
Copy link
Collaborator

Combining the scans into one larger scan will noticeably slow things down; *.rst ≈ 55ms, *.md ≈ 150ms, *.hs ≈ 1.2s:

I don't see a slowdown from combining. I see run times for three different queries. None of these numbers are sensible in CI (or, indeed, locally).

@philderbeast
Copy link
Collaborator Author

philderbeast commented Nov 30, 2024

The slowness of the larger scans is quite noticable (sensible), locally. Also I don't trust the configuration of typo-cli enough for filtering to have it reliably exclude folders like dist-newstyle or dist-newstyle-validate-ghc-9.8.2. I found it hard to work with, I found issues about globbing on typos issue tracker and with the combination of find+grep+xargs we have complete control over what files typos sees using well known tools.

@ulysses4ever
Copy link
Collaborator

The slowness of the larger scans is quite noticable (sensible)

Can you show the numbers?

with the combination of find+grep+xargs we have complete control over what files typos sees using well known tools.

That's great, so lets use some more bash magic to exclude what we need to exclude, if necessary. Again, please, name reasons to not spell-check all desired files in one go?

@philderbeast
Copy link
Collaborator Author

Can you show the numbers?

You may not have seen those. I edited one of my previous posts in this thread, #10603 (comment), adding the run times.

@philderbeast
Copy link
Collaborator Author

philderbeast commented Nov 30, 2024

so lets use some more bash magic to exclude what we need to exclude

I'm already doing exclusion using a simple inverted grep pattern. Which files are being scanned can be seen by adding the --files option to typos.

$ typos --help
Source Code Spelling Correction

Usage: typos [OPTIONS] [PATH]...

Arguments:
  [PATH]...  Paths to check with `-` for stdin [default: .]

Mode:
      --files                      Debug: Print each file that would be spellchecked
$ git diff
diff --git a/Makefile b/Makefile
index e4749580a..257bd8070 100644
--- a/Makefile
+++ b/Makefile
@@ -278,7 +278,7 @@ FIND_NAMED := find . -type f -name
 
 .PHONY: users-guide-typos
 users-guide-typos: ## Find typos in users guide
-       cd doc && $(FIND_NAMED) '*.rst' | xargs typos
+       cd doc && $(FIND_NAMED) '*.rst' | xargs typos --files
 
 .PHONY: users-guide-fix-typos
 users-guide-fix-typos: ## Fix typos in users guide
$ make users-guide-typos
cd doc && find . -type f -name '*.rst' | xargs typos --files
./package-concepts.rst
./how-to-build-like-nix.rst
./how-to-source-packages.rst
./how-to-report-bugs.rst
./how-to-run-in-windows.rst
./cabal-package-description-file.rst
./cabal-context.rst
./vcs/location.rst
./vcs/tag.rst
./vcs/kind.rst
./vcs/fields.rst
./vcs/subdir.rst
./vcs/branch.rst
./how-to-package-haskell-code.rst
./how-to-use-backpack.rst
./cabal-config-and-commands.rst
./config.rst
./cabal-project-description-file.rst
./external-commands.rst
./cabal-commands.rst
./file-format-changelog.rst
./getting-started.rst
./how-to-analyze-haskell-code-performance.rst
./version-control-fields.rst
./buildinfo-fields-reference.rst
./setup-commands.rst
./nix-local-build.rst
./index.rst
./cabal-interface-stability.rst

@ulysses4ever
Copy link
Collaborator

ulysses4ever commented Nov 30, 2024

No, I did see those because I put the summary into my quote in #10603 (comment) These numbers are negligible.

So far, I don't see a single reason for having separate make-rules for every file type.

@philderbeast
Copy link
Collaborator Author

These numbers are neglible.

I disagree. Right now make users-guide-typos seems instantaneous but make hs-typos drags on, the delay is quite perceptible.

@philderbeast
Copy link
Collaborator Author

I understand your wish to keep the makefile small. I would like to wait on trying to have one typos and another fix-typos target in the makefile until the *.hs spell checking is up as a PR.

doc/README.md Outdated Show resolved Hide resolved
- Add a typos action
- Fix typos in *.rst
- Split version and platform env variables
- Add comments to typo makefile targets
- Add a section on typos to README
- Layout block quotation
- Add typos to checks
- Fix typos in release-notes/*.md
- Fix typos in **/*.md
- Extract typos to its own folder
- Rename to .typos.toml
- Use find, grep and xargs with typos
- We have users-guide-typos and markdown-typos
- Fix markdown typos
- Typo in help comment for markdown-fix-typos
- Trigger action pushing to master only
- Add FIND_NAMED variable
- Update target in doc/README for finding typos
- Update prerequisite targets of checks PHONY target
@philderbeast
Copy link
Collaborator Author

Label merge+no rebase is necessary when the pull request is from an organisation.

@mergify mergify bot added the ready and waiting Mergify is waiting out the cooldown period label Nov 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge+no rebase ready and waiting Mergify is waiting out the cooldown period
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use a spell checker to find typos in *.rst files
3 participants