
BUG: to_latex does not escape % with percent formatter #61478


Open
2 of 3 tasks
stertingen opened this issue May 22, 2025 · 5 comments
Labels
Bug, IO LaTeX, Needs Discussion

Comments

@stertingen

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

print(pd.DataFrame({"x": [0.1, 0.5, 1.0]}).to_latex(formatters={"x": "{:.0%}"}, escape=True))
print(pd.DataFrame({"x": [0.1, 0.5, 1.0]}).style.format("{:.0%}", escape="latex").to_latex())

Issue Description

When using "{:.0%}" to format floating-point values as percentages, the percent signs are not escaped, even when escaping is explicitly requested. This affects both DataFrame.to_latex and Styler.to_latex.

Output:

\begin{tabular}{lr}
\toprule
 & x \\
\midrule
0 & 10% \\
1 & 50% \\
2 & 100% \\
\bottomrule
\end{tabular}

\begin{tabular}{lr}
 & x \\
0 & 10% \\
1 & 50% \\
2 & 100% \\
\end{tabular}

Expected Behavior

\begin{tabular}{lr}
\toprule
 & x \\
\midrule
0 & 10\% \\
1 & 50\% \\
2 & 100\% \\
\bottomrule
\end{tabular}

\begin{tabular}{lr}
 & x \\
0 & 10\% \\
1 & 50\% \\
2 & 100\% \\
\end{tabular}

Installed Versions

INSTALLED VERSIONS
------------------
commit : 0691c5c
python : 3.12.10
python-bits : 64
OS : Windows
OS-release : 11
Version : 10.0.26100
machine : AMD64
processor : Intel64 Family 6 Model 165 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : de_DE.cp1252

pandas : 2.2.3
numpy : 2.0.2
pytz : 2024.2
dateutil : 2.9.0.post0
pip : 25.0.1
Cython : 3.0.11
sphinx : None
IPython : 8.30.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : 1.4.2
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.10.0
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.4
lxml.etree : None
matplotlib : 3.10.0
numba : 0.60.0
numexpr : 2.10.2
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : 2.9.10
pymysql : None
pyarrow : 18.1.0
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.14.1
sqlalchemy : 2.0.36
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
xlsxwriter : None
zstandard : 0.23.0
tzdata : 2024.2
qtpy : None
pyqt5 : None

@stertingen stertingen added the Bug and Needs Triage labels May 22, 2025
@stertingen
Author

I boiled it down to the implementation of _maybe_wrap_formatter.
It applies the escape function before the formatting function (which is what adds the % sign).
Technically, the formatter and the decimal, thousands and na_rep arguments could all introduce characters into the string that should probably be escaped.
I'm not sure why the functions are applied in that precise order; there might be a good reason I overlooked, though.
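A simplified model of the ordering described above shows why the % survives. This is a hypothetical sketch: escape_latex, escape_then_format and format_then_escape are illustrative names, not pandas internals, and only a few LaTeX special characters are handled. The model assumes that escaping only touches string inputs, so a float passes through unescaped and the formatter then appends a bare %.

```python
# Hypothetical, simplified model of "escape, then format" vs. "format, then escape".
# This is NOT pandas' actual _maybe_wrap_formatter implementation.
from typing import Any, Callable

# Illustrative subset of LaTeX special characters.
_LATEX_SPECIALS = {"%": r"\%", "&": r"\&", "#": r"\#", "_": r"\_", "$": r"\$"}


def escape_latex(s: str) -> str:
    """Backslash-escape a few LaTeX special characters, one character at a time."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in s)


def escape_then_format(formatter: Callable[[Any], str]) -> Callable[[Any], str]:
    """Escape string inputs first, then apply the formatter (the documented order)."""
    def wrapped(x: Any) -> str:
        if isinstance(x, str):  # floats are left untouched by escaping
            x = escape_latex(x)
        return formatter(x)
    return wrapped


def format_then_escape(formatter: Callable[[Any], str]) -> Callable[[Any], str]:
    """Apply the formatter first, then escape its output (the order the report expected)."""
    def wrapped(x: Any) -> str:
        return escape_latex(formatter(x))
    return wrapped


pct = "{:.0%}".format
print(escape_then_format(pct)(0.1))  # 10%   (the % added by the formatter is never escaped)
print(format_then_escape(pct)(0.1))  # 10\%
```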

@rhshadrach
Member

Thanks for the report. I believe the intention is that if the user is introducing their own symbols, they can decide to escape them if they desire.

cc @attack68

@rhshadrach rhshadrach added the Needs Discussion and IO LaTeX labels and removed the Needs Triage label May 23, 2025
@attack68
Contributor

attack68 commented May 23, 2025

This is not a bug. It is as documented. "Escaping is done before formatter".
The rationale is that no single order suits all purposes, but under the current design all cases can be covered.
The solution to the above is to apply an adjusted formatter: f"{x * 100: .0f}\%", which is relatively simple for a user to do.
For cases where escaping must be applied first, there would be no easy solution, or no solution at all, if the design were implemented the other way round.

See the documentation example: Using a formatter with HTML escape and Na rep for a case which requires escaping first.
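The rationale for escaping first can be illustrated with a LaTeX analogue of that HTML example. In this hypothetical sketch (the helper names and the character table are illustrative, not pandas' own escape table), a formatter that wraps values in \textbf{...} keeps its markup when the data is escaped first, whereas reversing the order escapes the formatter's own markup as well.

```python
# Hypothetical illustration of why "escape first" lets formatters add LaTeX markup.
# Helper names and the escape table are illustrative, not pandas internals.
from typing import Any, Callable

_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "%": r"\%",
    "&": r"\&",
    "#": r"\#",
    "_": r"\_",
    "$": r"\$",
}


def escape_latex(s: str) -> str:
    """Escape character by character, so inserted escapes are never re-escaped."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in s)


def escape_then_format(formatter: Callable[[Any], str]) -> Callable[[Any], str]:
    return lambda x: formatter(escape_latex(x) if isinstance(x, str) else x)


def format_then_escape(formatter: Callable[[Any], str]) -> Callable[[Any], str]:
    return lambda x: escape_latex(formatter(x))


# User formatter that introduces LaTeX markup: "A" -> \textbf{A}
bold = r"\textbf{{{}}}".format

print(escape_then_format(bold)("A&B"))  # \textbf{A\&B}  (data escaped, markup intact)
print(format_then_escape(bold)("A&B"))  # \textbackslash{}textbf\{A\&B\}  (markup broken)
```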

@stertingen
Author

Yes, it is indeed documented for Styler.format. So I guess this is intended behavior.
The adjusted formatter needs to be lambda x: f"{x * 100: .0f}\\%" in my case, but @attack68 nudged me in the right direction, thanks!
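For reference, a standalone version of that formatter is sketched below; pct_latex is a hypothetical name. One detail worth noting: the space in the format spec ": .0f" is the sign option, which pads non-negative numbers with a leading space (" 10\%"), so the sketch drops it.

```python
def pct_latex(x: float) -> str:
    """Format a fraction as a whole-number percentage with an escaped percent sign."""
    # "\\%" in a regular (non-raw) string is a literal backslash followed by %.
    return f"{x * 100:.0f}\\%"


print(pct_latex(0.1))  # 10\%
print(pct_latex(1.0))  # 100\%

# With the sign option from the original lambda, a leading space is added: " 10\%"
print(repr(f"{0.1 * 100: .0f}\\%"))
```

It could then be passed as formatters={"x": pct_latex} to DataFrame.to_latex or as Styler.format(pct_latex); since escaping runs before the formatter, the \% it emits should be left intact.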

For DataFrame.to_latex, this behavior is not documented. In fact, it only states:

By default, the value will be read from the pandas config module and set to True if the option styler.format.escape is “latex”. When set to False prevents from escaping latex special characters in column names.

It does not explicitly say what happens when it is set to True, and it only mentions the impact on column names.
However, this setting does control the escaping of cell contents:

print(pd.DataFrame({"x": ["%"]}).to_latex(escape=False))
\begin{tabular}{ll}
\toprule
 & x \\
\midrule
0 & % \\
\bottomrule
\end{tabular}

vs.

print(pd.DataFrame({"x": ["%"]}).to_latex(escape=True))
\begin{tabular}{ll}
\toprule
 & x \\
\midrule
0 & \% \\
\bottomrule
\end{tabular}

So, for DataFrame.to_latex it is not entirely clear what the escape parameter is and is not supposed to do; I would suggest refining the docs here.
However, its behavior is consistent with Styler.format, so technically we could be fine with referring to the Styler implementation as well.

@attack68
Contributor

DataFrame.to_latex was re-engineered for version 2.0.0 to use the Styler mechanics (this was to reduce the maintenance burden and avoid dual implementations of the same feature, one of which was much more out of date). Really, it shouldn't exist at all, and all its arguments were monkey-patched to suit Styler.
The docs do state this and advise users to use Styler instead.
