Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log misspellings as errors #35323

Open
heaths opened this issue Apr 3, 2023 · 15 comments
Open

Log misspellings as errors #35323

heaths opened this issue Apr 3, 2023 · 15 comments
Labels
Client This issue points to a problem in the data-plane of the library. EngSys This issue is impacting the engineering system.
Milestone

Comments

@heaths
Copy link
Member

heaths commented Apr 3, 2023

Currently we log cspell errors as build warnings, which almost no one will ever notice and won't block automatically merging PRs. This could prevent some bad PRs that lead to #35320 , though in that case projectNam (no trailing "e") wasn't caught by cpell as you can see above, perhaps because "Nam" is a word.

This could be part of #24069 so we can turn on spelling mistakes as errors, as Azure/azure-sdk-for-python does.

@heaths heaths added Client This issue points to a problem in the data-plane of the library. EngSys This issue is impacting the engineering system. labels Apr 3, 2023
@heaths
Copy link
Member Author

heaths commented Apr 3, 2023

@pallavit @jsquire @weshaggard @hallipr what do you think? I suppose for the Python repo it's needed more sans compiler, but if we can get clean - or, like some other issues, slowly opt service directories into it - we could prevent some rather embarrassing spelling errors in public APIs. If you declare a parameter, etc., wrong, auto-complete will keep repeating that over and over again unchecked, potentially. Warnings just don't get seen.

@jsquire
Copy link
Member

jsquire commented Apr 4, 2023

I like it, and my vote is that we do it.

@pallavit
Copy link
Contributor

pallavit commented Apr 4, 2023

For my understanding once this is turned on ci\release pipelines will start getting red\blocked any time we have spell check issues and won't be able to check-in\release SDKs until they are fixed. Correct?

If so - do we know how many errors we have today, how disruptive is this switch going to be for our repo?

@heaths
Copy link
Member Author

heaths commented Apr 4, 2023

The goal is to block PRs. We'd first have to get clean, and we normally do that by opting services out of spell check, filing individual bugs, and have them remove the opt-out when the issues are resolved.

@pallavit
Copy link
Contributor

pallavit commented Apr 4, 2023

The second part is what I wanted to understand. As long as we enable the switch after we do the second pass - and slowly enable the switch - I am all onboard :)

One more question - do we do this on all files? Or do we use it against a subset (which is public facing subset?)

@heaths
Copy link
Member Author

heaths commented Apr 4, 2023

According to @weshaggard we do mandatory spell checking on the api/ folder to avoid spelling mistakes in public APIs already. Perhaps that's good enough, but I still think we might want to consider spell checking all our src/**/*.cs files at least. All our repos are public facing.

@jsquire
Copy link
Member

jsquire commented Apr 4, 2023

One more question - do we do this on all files? Or do we use it against a subset (which is public facing subset?)

We're open source. All of our files are public. 😄

@pallavit
Copy link
Contributor

pallavit commented Apr 4, 2023

Yeah but it may slow our builds considerably so we will likely have to pick a subset.

@heaths
Copy link
Member Author

heaths commented Apr 4, 2023

It already spell checks most of the files, but most misspellings are just warnings. cSpell is very fast and ubiquitous, which is why I nominated them for something coming up soon. :)

See https://github.com/Azure/azure-sdk-for-net/blob/main/.vscode/cspell.json for the exemptions and current allowed words. All our language repos are running cSpell (it's part of eng/common). This is a matter of what to treat as errors.

@pallavit
Copy link
Contributor

pallavit commented Apr 4, 2023

Got it. Sounds like a plan :)

@heaths heaths added this to the 2023-06 milestone Apr 4, 2023
@weshaggard
Copy link
Member

When @danieljurek enabled this in the repos we didn't make spelling issues as errors because there was so many places that needed to get fixed. If someone is willing to clean-up the full repo then perhaps enabling it going forward as errors should be fine.

@heaths
Copy link
Member Author

heaths commented Apr 5, 2023

@weshaggard I was considering doing it how we've done other similar endeavours, such that we make it opt-out on a service directory basis, such that the template and any new projects based on that template are opted in. Could go the other way, but can miss new things.

That said, this is low-pri for me at the moment.

@danieljurek
Copy link
Member

How does spell checking work in PRs? -- Spell checking examines files changed in the PR at most. (It won't go checking your README.md if you didn't edit it in any given PR.) there's rarely a performance bottleneck on spell checking.

To enable spell checking for "all" files in the repo, the process we've followed in other repos is:

  1. Find a partner who is knowledgeable about the language domain
  2. They run a script which generates a list of spelling errors and suggests remediations. Make obvious remediations (add exceptions for words that are relevant to the domain, fix obvious spelling errors in comments, etc.)
  3. Make a PR which excludes from spell checking all of the files in the packages (or services) which have many spelling errors that need to be addressed by the package (or service owners).
  4. Open issues for each package (or service) to address their spelling errors

Save time, use a script

I have a script which makes this work easier by spell checking in parallel, proposing remediations, suggesting which directories to exclude, and opening issues when you're at that point.

@heaths
Copy link
Member Author

heaths commented Apr 5, 2023

How does spell checking work in PRs? -- Spell checking examines files changed in the PR at most. (It won't go checking your README.md if you didn't edit it in any given PR.) there's rarely a performance bottleneck on spell checking.

To clarify, I'm not expecting any change in behavior going forward, but I'd think we'd want to get clean - at least per service directory - before we make PR authors responsible for all past mistakes in files they touched. I mean, that's a way to do it, but really feels like it may dissuade at least casual contributors.

@danieljurek
Copy link
Member

If the package or service is excluded because it has unaddressed spelling error detections then future PRs to that excluded package or service directory won't surface spelling errors. It's up to the team which owns the package or service to fix the spelling errors and re-activate spell checking.

See this example spelling error issue in the Python repo -- Azure/azure-sdk-for-python#22717

In the case of .NET we'll need to figure out how to exclude package source but continue checking public API surface area. This will likely take the form of cleverly crafted exclusion glob expressions which do not exclude the folders containing specs of the public API surface.

@jsquire jsquire modified the milestones: 2023-06, 2023-08 Jul 13, 2023
@jsquire jsquire modified the milestones: 2023-08, Backlog Aug 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Client This issue points to a problem in the data-plane of the library. EngSys This issue is impacting the engineering system.
Projects
None yet
Development

No branches or pull requests

5 participants