Fix invalid escape sequences in regex strings #340

Open · wants to merge 1 commit into master
DavidCain

Summary
=======
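This commit fixes the deprecation warnings raised by backslashes in string
literals that are *not* part of a valid escape sequence (for example `\d` or
`\[` inside a regex pattern). Since Python 3.6 these emit a
`DeprecationWarning`, so fixing them helps this library work with newer
versions of Python.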

String values do not change (in current versions of Python)
===========================================================
```python
>>> r'[\[\]]' == '[\[\]]'
True
```
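
As a quick check from inside Python, the sketch below compiles both spellings of the literal and records any warnings the compiler emits. The helper name `escape_warnings` is hypothetical and used only for illustration. Only the non-raw form should warn, while both evaluate to the same string. The warning category is `DeprecationWarning` on Python 3.6 to 3.11 and `SyntaxWarning` from Python 3.12 onward, so the sketch records every category.

```python
import warnings


def escape_warnings(source):
    """Hypothetical helper: compile a Python expression and return the
    text of every warning emitted while compiling it."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even one that was already shown before.
        warnings.simplefilter("always")
        compile(source, "<example>", "eval")
    return [str(w.message) for w in caught]


plain = "'[\\[\\]]'"   # the source text  '[\[\]]'
raw = "r'[\\[\\]]'"    # the source text r'[\[\]]'

print(escape_warnings(plain))  # a list with an "invalid escape sequence" message
print(escape_warnings(raw))    # []

# Both literals still evaluate to the same string.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    print(eval(plain) == eval(raw))  # True
```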

Examples
========
```bash
$ python -Wd -c 'print("\d")'
DeprecationWarning: invalid escape sequence \d
$ python -W error -c 'print("\d")'
SyntaxError: invalid escape sequence \d
```

Explanation
===========
For an explanation of the problem and the recommended solution, see the `re`
module documentation at https://docs.python.org/3/library/re.html, which notes:

> Also, please note that any invalid escape sequences in Python’s usage
> of the backslash in string literals now generate a DeprecationWarning
> and in the future this will become a SyntaxError. This behaviour will
> happen even if it is a valid escape sequence for a regular expression.
>
> The solution is to use Python’s raw string notation for regular
> expression patterns; backslashes are not handled in any special way in
> a string literal prefixed with 'r'.
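
To make the quoted advice concrete, here is a small sketch using the standard `re` module; the patterns and sample text are made up for illustration. A raw-string pattern and the same pattern with every backslash doubled compile from identical pattern text, but the raw form is easier to read. The last two lines show the trap the quote alludes to: `"\b"` is a *valid* Python escape (backspace), so it never warns, yet it silently changes the pattern, whereas `r"\b"` is a regex word boundary.

```python
import re

# Raw string: backslashes reach the regex engine untouched.
raw_pattern = re.compile(r"\[(\d+)\]")

# Equivalent non-raw literal: every backslash has to be doubled.
escaped_pattern = re.compile("\\[(\\d+)\\]")

# Both compile from exactly the same pattern text.
print(raw_pattern.pattern == escaped_pattern.pattern)  # True
print(raw_pattern.search("items[42]").group(1))        # 42

# "\b" in a normal string is a backspace character, so nothing matches;
# r"\b" is the regex word-boundary assertion.
print(re.search("\bcat\b", "a cat sat") is None)    # True
print(re.search(r"\bcat\b", "a cat sat").span())    # (2, 5)
```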

How to keep these errors out of the source code
===============================================
This commit doesn't add any such checks, but there are a few ways to make sure
that *new* invalid escape sequences are not introduced:

- Use a linter!
    - `pylint` has `anomalous-backslash-in-string`
    - `flake8` has `W605`
    - other linters work too!
- Escalate deprecation warnings to full errors at test time
  (e.g. the filter `error:invalid escape sequence:DeprecationWarning`,
  passed to `filterwarnings`, turns these warnings into errors)
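
As a rough sketch of the second approach without relying on any particular test runner, the script below compiles each Python file named on the command line with invalid-escape warnings escalated to errors, much like `python -W error` in the examples above. The script and its `find_invalid_escapes` function are hypothetical and not part of this library. It filters both `DeprecationWarning` (Python 3.6 to 3.11) and `SyntaxWarning` (Python 3.12+), since the category changed in 3.12; a file with a genuine syntax error would be reported the same way.

```python
import sys
import warnings
from pathlib import Path


def find_invalid_escapes(paths):
    """Hypothetical checker: return "file:line: message" strings for every
    source that fails to compile once invalid-escape warnings are errors."""
    problems = []
    for path in paths:
        source = Path(path).read_text(encoding="utf-8")
        with warnings.catch_warnings():
            # 3.6-3.11 warn with DeprecationWarning, 3.12+ with SyntaxWarning;
            # the message regex only has to match the start of the warning text.
            for category in (DeprecationWarning, SyntaxWarning):
                warnings.filterwarnings(
                    "error", message="invalid escape sequence", category=category
                )
            try:
                compile(source, str(path), "exec")
            except SyntaxError as exc:
                # The compiler re-raises the escalated warning as a SyntaxError.
                problems.append("{}:{}: {}".format(path, exc.lineno, exc.msg))
    return problems


if __name__ == "__main__":
    found = find_invalid_escapes(sys.argv[1:])
    for problem in found:
        print(problem)
    if found:
        sys.exit(1)  # non-zero exit status so a CI job fails
```

For example, `python check_escapes.py mylib/*.py` (file name assumed) would print one line per offending file and exit with status 1.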