Running LaTeXML on untrusted inputs is dangerous insofar as it will run arbitrary Perl code (by loading .ltxml files) and read or write to arbitrary locations (in different phases: \input in TeX, document() in XSLT, etc.).
TeX has a simple security model: -shell-escape (and the environment variable shell_escape) controls arbitrary code execution; openin_any and openout_any (see footnote 1) control whether access is restricted to the current/output directories or allowed across the whole filesystem.
Some relevant prior discussion is in #606, which led to the secureio plugin.
Generally it's quite hard to improve the safety profile of latexml in a way that can be claimed to be "complete", especially in the command-line use cases.
It is a little more manageable to containerize the conversion, e.g. in a Docker image (related: #1178), and impose restrictions on the source contents being passed in. The two approaches are not mutually exclusive, though.
I think this kind of change is more feasible in light of #2185 - I like to think I have found all the I/O happening in latexml, and adding some hooks to stop with errors when reading or writing outside the boundary set by open(in|out)_any is doable, in principle. But this could just be an intermediate step: first an implementation of -recorder, and once it seems complete, you can bolt on I/O filtering. (Full filtering requires #2053 to also catch I/O from LibXSLT.)
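To make the "record first, filter later" idea concrete, here is a rough sketch in Python (not LaTeXML's Perl, and not an existing LaTeXML API). The class name `GuardedIO`, its parameters, and the exact meaning given to the `a`/`r`/`p` modes are assumptions for illustration; the modes are only a simplified approximation of what Kpathsea does, and I/O done inside LibXSLT would not pass through a hook like this.

```python
from pathlib import Path


class IOPolicyError(Exception):
    """Raised when a file access falls outside the configured boundary."""


class GuardedIO:
    """Hypothetical I/O hook: records every access and optionally enforces a policy."""

    def __init__(self, allowed_dirs, mode="p", enforce=True):
        # Assumed, simplified modes loosely modelled on open(in|out)_any:
        #   "a" = any path, "r" = no dotfiles,
        #   "p" = no dotfiles and only paths inside allowed_dirs.
        if mode not in ("a", "r", "p"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.allowed_dirs = [Path(d).resolve() for d in allowed_dirs]
        self.mode = mode
        self.enforce = enforce
        self.log = []  # (kind, path) pairs, in the spirit of a -recorder file

    def is_allowed(self, path):
        # Resolve symlinks and ".." first so "dir/../../etc" cannot slip through.
        resolved = Path(path).resolve()
        if self.mode == "a":
            return True
        if resolved.name.startswith("."):
            return False
        if self.mode == "r":
            return True
        return any(
            resolved == base or base in resolved.parents
            for base in self.allowed_dirs
        )

    def check(self, kind, path):
        # Step 1 (recorder): always log the access.
        self.log.append((kind, str(path)))
        # Step 2 (filter): only reject once enforcement is switched on.
        if self.enforce and not self.is_allowed(path):
            raise IOPolicyError(f"{kind} not permitted: {path}")
```

With `enforce=False` this behaves as a pure recorder, which is the intermediate step: once the log looks complete, flipping `enforce` to `True` bolts on the filtering without touching the call sites.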
For this to make any sense, one also needs some form of -shell-escape to forbid custom .ltxml bindings, i.e. bindings should be loaded from the default locations, but not from . unless specifically requested with --path=. or --shell-escape.
If the above is workable, latexml could reach the same safety profile as a normal LaTeX run, which is a familiar thing.
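The binding-lookup rule could look roughly like the Python sketch below. The function names, the `cwd` parameter, and the search order are all made up for illustration; this is not how LaTeXML currently resolves bindings. The point is only that the current directory is searched when it was requested explicitly (the `--path=.` case) or when a shell-escape-style switch is on.

```python
from pathlib import Path


def binding_search_dirs(default_dirs, extra_paths=(), shell_escape=False, cwd="."):
    """Build the list of directories that bindings may be loaded from.

    Hypothetical policy: default locations always count, explicitly
    requested paths count, and the current directory is added only
    when shell_escape is set.
    """
    dirs = [Path(d).resolve() for d in default_dirs]
    dirs += [Path(p).resolve() for p in extra_paths]
    if shell_escape:
        here = Path(cwd).resolve()
        if here not in dirs:
            dirs.append(here)
    return dirs


def find_binding(name, search_dirs):
    """Return the path of name.ltxml in the first matching directory, or None."""
    # Refuse names with directory components, so "../evil" cannot escape
    # the permitted directories.
    if Path(name).name != name:
        return None
    for directory in search_dirs:
        candidate = directory / f"{name}.ltxml"
        if candidate.is_file():
            return candidate
    return None
```

In this sketch a `foo.sty.ltxml` sitting next to an untrusted document is simply not found by default, so the package would fall back to whatever the default locations provide.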
Of course I am making big assumptions about the other tools (dvipng, dvisvgm, Ghostscript, and [shudders] ImageMagick) having a similar safety approach, i.e. not reading from/writing to arbitrary locations when fed dodgy inputs.
Maybe LaTeXML could follow the same model as TeX?
Footnotes
1. Documentation at https://tug.org/texinfohtml/kpathsea.html#Calling-sequence. Values: a (all), p (paranoid), r (restricted), plus some backward compatibility aliases.