CHANGELOG.md (8 additions & 0 deletions)
@@ -1,3 +1,11 @@
+[0.3.0]
+
+Breaking changes:
+
+- User defined regex is now run on a file line-by-line instead of word-by-word. This means regex should likely not match the beginning of a line. For example to match DNA, this pattern used to work: `^[ATCG]+$`. This pattern will now need to be something like: `\\b[ATCG]+\\b` (double `\\` is for escaping in TOML)
+
+- Codebook will now ignore text like URLs and color hex codes by default. See README `User-Defined Regex Patterns` for more details.
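
To illustrate the breaking change in the CHANGELOG entry, here is a minimal Python sketch of the two matching models. It is not Codebook's implementation: the helper names `ignored_by_word` and `ignored_by_line` are hypothetical, and splitting on whitespace is an assumed stand-in for however words were tokenized before 0.3.0. The two patterns are simple enough that Python's `re` module should treat them the same way as the engine Codebook uses.

```python
import re

OLD_PATTERN = re.compile(r"^[ATCG]+$")    # relied on each word being matched alone
NEW_PATTERN = re.compile(r"\b[ATCG]+\b")  # replacement suggested in the changelog


def ignored_by_word(pattern, line):
    """Assumed pre-0.3.0 model: test the pattern against each word separately."""
    return [word for word in line.split() if pattern.search(word)]


def ignored_by_line(pattern, line):
    """Assumed 0.3.0 model: run the pattern once over the whole line."""
    return [match.group(0) for match in pattern.finditer(line)]


line = "primer GATTACA binds upstream"

print(ignored_by_word(OLD_PATTERN, line))  # ['GATTACA']
print(ignored_by_line(OLD_PATTERN, line))  # []  (the line is not all A/T/C/G)
print(ignored_by_line(NEW_PATTERN, line))  # ['GATTACA']
```

The anchored pattern stops matching because `^` and `$` now refer to the ends of the whole line rather than the ends of a single word, while `\b` still finds the sequence wherever it sits on the line.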
 # List of regex patterns to ignore when spell checking
+# Patterns are matched against each line of text, not individual words
 # Useful for domain-specific strings or patterns
+# Note: Backslashes must be escaped in TOML (use \\ instead of \)
 # Default: []
 ignore_patterns = [
-    "^[ATCG]+$", # DNA sequences
-    "\\d{3}-\\d{2}-\\d{4}" # Social Security Number format
+    "\\b[ATCG]+\\b", # DNA sequences
+    "\\d{3}-\\d{2}-\\d{4}", # Social Security Number format
+    "^[A-Z]{2,}$", # All caps words like "HTML", "CSS"
+    "https?://[^\\s]+" # URLs
 ]
 
 # Whether to use global configuration (project config only)
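
One way to picture what `ignore_patterns` does to a line is as a mask: every span a pattern matches is skipped, and only the remaining words are spell checked. The Python sketch below works under that assumption. The function `words_to_check`, the word tokenizer, and the overlap rule are all hypothetical and are not taken from Codebook's source; the patterns are a subset of the README example, written as they would look after TOML unescaping.

```python
import re

# Subset of the README patterns, after TOML has turned \\ into \.
IGNORE_PATTERNS = [
    r"\b[ATCG]+\b",        # DNA sequences
    r"\d{3}-\d{2}-\d{4}",  # Social Security Number format
    r"https?://[^\s]+",    # URLs
]


def words_to_check(line, patterns=IGNORE_PATTERNS):
    """Return the words on `line` that no ignore pattern overlaps."""
    # Collect every (start, end) span matched by any pattern on this line.
    skipped = []
    for pattern in patterns:
        skipped.extend(m.span() for m in re.finditer(pattern, line))

    # Keep a word only if it overlaps none of the skipped spans.
    words = []
    for m in re.finditer(r"[A-Za-z']+", line):
        if not any(start < m.end() and m.start() < end for start, end in skipped):
            words.append(m.group(0))
    return words


print(words_to_check("see https://example.com/docs for the GATTACA sequense"))
# ['see', 'for', 'the', 'sequense']
```

Under this model the URL and the DNA sequence never reach the spell checker, while the misspelled `sequense` still does.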
@@ -248,6 +252,39 @@ use_global = true
 - Project settings are saved automatically when words are added
 - Configuration files are automatically reloaded when they change
 
+### User-Defined Regex Patterns
+
+The `ignore_patterns` configuration allows you to define custom regex patterns to skip during spell checking. Here are important details about how they work:
+
+**Default Patterns**: Codebook already includes built-in regex patterns for common technical strings, so you don't need to define these yourself:
+**Line-by-Line Matching**: Regex patterns are applied to each line of text, not individual words. This means your patterns should account for the line context.
+
+**TOML Escaping**: Since configuration files use TOML format, backslashes in regex patterns must be escaped by doubling them:
+- Use `\\b` for word boundaries (not `\b`)
+- Use `\\d` for digits (not `\d`)
+- Use `\\\\` for literal backslashes (not `\\`)
+
+**Examples**:
+```toml
+ignore_patterns = [
+    "\\b[ATCG]+\\b", # DNA sequences with word boundaries
+    "^\\s*//.*$", # Comment lines starting with //
+    "https?://[^\\s]+", # URLs (note the escaped \s)
+    "\\$[a-zA-Z_][a-zA-Z0-9_]*", # Variables starting with $
+]
+```
+
+**Migration Note**: If you're upgrading from an older version, patterns that used `^` and `$` anchors may need adjustment since matching now occurs line-by-line rather than word-by-word.
+
 ## Goals
 
 Spell checking is complicated and opinions about how it should be done, especially with code, differs. This section is about the trade offs that steer decisions.