feat: improved XMLArgs processing #3363
base: v3/master
I don't see this being the same patch as v2. Here you implemented the extra ctl:parseXMLintoArgs: it is not in v2. (Update: my bad, this is indeed implemented.)
bool ParseXmlIntoArgs::init(std::string *error) {
std::string what(m_parser_payload, 17, m_parser_payload.size() - 17);
17? 🤦 It looks like the length of SecParseXMLIntoArgs, but on its own the number is so arbitrary.
You're definitely right, but I just wanted to follow the existing conventions (see other files), even if it's not the prettiest solution. If we want to clean up the code, we should do that as a sub-project.
Does the project want this? If so, it is worth tracking (e.g. by creating a new issue for the sub-project).
> Does the project want this?
Not necessarily. But we can live with it.
Just a few examples:
> It is worth tracking then (e.g. creating a new issue with the subproject?)
Sure, but it's been like this for almost ten years. Nobody has reported it before, you are the first, so I can't say it is a priority.
Sounds perfect. But (simple) tickets like this might get additional people into the dev side ;)
> But (simple) tickets like this might get additional people into the dev side ;)
Amen! :)
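On the hard-coded 17 discussed above: a minimal sketch of deriving the offset from the option name instead, assuming (as with the other ctl actions) that m_parser_payload holds the raw payload in the form `parseXMLintoArgs=<value>`. The helper `ctlValue` is hypothetical and not part of libmodsecurity; it only illustrates that "parseXMLintoArgs=" is exactly 17 characters long.

```cpp
#include <cassert>
#include <string>

// The option name plus '=' is 17 characters, which is where the
// literal 17 in ParseXmlIntoArgs::init() appears to come from
// (assumption: the payload starts with "parseXMLintoArgs=").
static_assert(sizeof("parseXMLintoArgs=") - 1 == 17,
              "prefix length should match the magic number");

// Hypothetical helper: returns the text after "<name>=" in a ctl payload.
// If the payload does not start with that prefix, it returns an empty
// string and, when error is non-null, stores a message in *error.
std::string ctlValue(const std::string &payload, const std::string &name,
                     std::string *error) {
    const std::string prefix = name + "=";
    if (payload.size() < prefix.size() ||
        payload.compare(0, prefix.size(), prefix) != 0) {
        if (error != nullptr) {
            error->assign("unexpected ctl payload: " + payload);
        }
        return "";
    }
    return payload.substr(prefix.size());
}
```

With such a helper, `init()` could call `ctlValue(m_parser_payload, "parseXMLintoArgs", error)` and the intent would be visible without counting characters; whether that is worth doing across all ctl actions is the sub-project question raised above.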
void MSC_startElement(void *userData,
const xmlChar *name,
const xmlChar *prefix,
const xmlChar *URI,
int nb_namespaces,
const xmlChar **namespaces,
int nb_attributes,
int nb_defaulted,
const xmlChar **attributes) {
Same here on this declaration. Is this the new standard?
Sorry, could you show a "reference"? What is the "standard"? What is the "old" standard?
Normally, the whole function signature is on a single line.
As I pointed out in the other PR, there are already several different ways of formatting long function signatures in the code base. This is one of them.
void *userData,
const xmlChar *name,
const xmlChar* prefix,
const xmlChar* URI) {
Same here.
Same as above.
It's there.
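For readers unfamiliar with libxml2's SAX2 interface: MSC_startElement and the end-element callback above are invoked once per opening and closing tag, while character data arrives through a separate callback (the onCharacters handler seen later in the diff). A common way to turn those events into ARGS is to keep a stack of open element names and emit a name/value pair when an element with text closes. The sketch below simulates the three callbacks without libxml2; the class `ArgsBuilder` and the dot-separated naming scheme (e.g. `xml.order.id`) are hypothetical and not necessarily what this patch produces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: turns SAX-style start/characters/end events into
// (name, value) pairs, similar in spirit to flattening JSON into ARGS.
class ArgsBuilder {
 public:
    explicit ArgsBuilder(std::string root = "xml") : m_root(std::move(root)) {}

    void onStartElement(const std::string &name) {
        m_path.push_back(name);
        m_text.clear();  // text seen before a child element is not kept
    }

    // A SAX parser may deliver one text node in several chunks,
    // so chunks are accumulated until the element closes.
    void onCharacters(const char *ch, int len) {
        m_text.append(ch, static_cast<std::size_t>(len));
    }

    void onEndElement() {
        if (!isBlank(m_text)) {
            m_args.emplace_back(currentName(), m_text);
        }
        m_text.clear();
        if (!m_path.empty()) {
            m_path.pop_back();
        }
    }

    const std::vector<std::pair<std::string, std::string>> &args() const {
        return m_args;
    }

 private:
    // Whitespace-only text (indentation between tags) is ignored.
    static bool isBlank(const std::string &s) {
        return s.find_first_not_of(" \t\r\n") == std::string::npos;
    }

    // Joins the root prefix and the open element names with dots.
    std::string currentName() const {
        std::string name = m_root;
        for (const auto &part : m_path) {
            name += "." + part;
        }
        return name;
    }

    std::string m_root;
    std::vector<std::string> m_path;
    std::string m_text;
    std::vector<std::pair<std::string, std::string>> m_args;
};
```

Feeding it the events for `<order><id>42</id><item>book</item></order>` yields the pairs `xml.order.id=42` and `xml.order.item=book`. Note that every element adds work proportional to its nesting depth, which is one reason node count matters more than raw payload size.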
Co-authored-by: Max Leske <[email protected]>
I added a few new commits here, as in #3358:
Just out of curiosity, have you tried enabling this and sending a big XML file?
What do you mean by "a big XML file"? By the way, the size of the file is just one thing... There are a couple of limits in the engine when you're sending an XML payload and want to process it as ARGS, like:
I sent an XML payload larger than 400 kB (I increased the
Actually, this function behaves exactly the same way as the one used for JSON; there is no difference. You could equally ask "have you tried enabling this and sending a big JSON file?". But JSON processing works as is, and this feature is optional. If you tell me what you're curious about, I'll try it with both an XML and a JSON payload.
The sentence starts with
m_secXMLExternalEntity(PropertyNotSetConfigBoolean),
m_secXMLParseXmlIntoArgs(PropertyNotSetConfigXMLParseXmlIntoArgs),
Suggested change:
- m_secXMLExternalEntity(PropertyNotSetConfigBoolean),
- m_secXMLParseXmlIntoArgs(PropertyNotSetConfigXMLParseXmlIntoArgs),
+ m_secXmlExternalEntity(PropertyNotSetConfigBoolean),
+ m_secXmlParseXmlIntoArgs(PropertyNotSetConfigXMLParseXmlIntoArgs),
The m_secXMLExternalEntity property is already a member of the RulesSetProperties class. If we change its format here, we must change it everywhere, e.g. in seclang-parser.yy (and other lines in the same file) and in a few other files.
We could do this, but I'm afraid it wouldn't be fair to those who use the project in a forked form.
(Consider a company that uses libmodsecurity3 with its own modifications and follows the mainline source tree from our repository. If their modified code uses the class members and functions whose names we changed, their build would break. I think a change at this level warrants at least a minor version bump, I mean 3.0.XX to 3.1.XX, but I'm not sure we have any other strong reason to do that.)
Because the existing naming is m_secXML..., I followed that naming for the new members and functions, so I would keep it (unless we change Xml to XML in the second occurrence).
That's fine. I just marked all occurrences for consistency.
* The ConfigXMLParseXmlIntoArgs enumerator defines the states for the configuration
* XMLParseXmlIntoArgs values.
* The default value is PropertyNotSetConfigXMLParseXmlIntoArgs.
*/
enum ConfigXMLParseXmlIntoArgs {
TrueConfigXMLParseXmlIntoArgs,
FalseConfigXMLParseXmlIntoArgs,
OnlyArgsConfigXMLParseXmlIntoArgs,
PropertyNotSetConfigXMLParseXmlIntoArgs
Suggested change:
- * The ConfigXMLParseXmlIntoArgs enumerator defines the states for the configuration
- * XMLParseXmlIntoArgs values.
- * The default value is PropertyNotSetConfigXMLParseXmlIntoArgs.
- */
- enum ConfigXMLParseXmlIntoArgs {
- TrueConfigXMLParseXmlIntoArgs,
- FalseConfigXMLParseXmlIntoArgs,
- OnlyArgsConfigXMLParseXmlIntoArgs,
- PropertyNotSetConfigXMLParseXmlIntoArgs
+ * The ConfigXmlParseXmlIntoArgs enumerator defines the states for the configuration
+ * XmlParseXmlIntoArgs values.
+ * The default value is PropertyNotSetConfigXmlParseXmlIntoArgs.
+ */
+ enum ConfigXmlParseXmlIntoArgs {
+ TrueConfigXmlParseXmlIntoArgs,
+ FalseConfigXmlParseXmlIntoArgs,
+ OnlyArgsConfigXmlParseXmlIntoArgs,
+ PropertyNotSetConfigXmlParseXmlIntoArgs
I can change this because this enum type is new, but then we would deviate from the naming mentioned above (m_secXML...). What do you think?
Personally, I'd use the "new" spelling. I don't think it matters here.
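Whatever spelling is chosen, the fourth enumerator is what lets a configuration distinguish "explicitly set" from "inherited". A minimal sketch of how a directive value might map onto the enum and how a child configuration could fall back to its parent's setting, assuming the directive accepts On, Off and OnlyArgs (as the enumerators suggest, matched case-sensitively for simplicity) and that the effective default is Off. Both function names are hypothetical; the enumerators use the PR's current spelling.

```cpp
#include <cassert>
#include <string>

// Enum as proposed in the PR (current spelling).
enum ConfigXMLParseXmlIntoArgs {
    TrueConfigXMLParseXmlIntoArgs,
    FalseConfigXMLParseXmlIntoArgs,
    OnlyArgsConfigXMLParseXmlIntoArgs,
    PropertyNotSetConfigXMLParseXmlIntoArgs
};

// Hypothetical: maps a directive argument to the enum. Unknown values come
// back as "not set" so the caller can report a configuration error.
ConfigXMLParseXmlIntoArgs parseXmlIntoArgsValue(const std::string &value) {
    if (value == "On") return TrueConfigXMLParseXmlIntoArgs;
    if (value == "Off") return FalseConfigXMLParseXmlIntoArgs;
    if (value == "OnlyArgs") return OnlyArgsConfigXMLParseXmlIntoArgs;
    return PropertyNotSetConfigXMLParseXmlIntoArgs;
}

// Hypothetical merge rule: a child that did not set the option inherits
// the parent's value; if neither set it, the assumed default is Off.
ConfigXMLParseXmlIntoArgs effectiveXmlIntoArgs(
        ConfigXMLParseXmlIntoArgs child, ConfigXMLParseXmlIntoArgs parent) {
    if (child != PropertyNotSetConfigXMLParseXmlIntoArgs) return child;
    if (parent != PropertyNotSetConfigXMLParseXmlIntoArgs) return parent;
    return FalseConfigXMLParseXmlIntoArgs;
}
```

Renaming the enum to the Xml spelling would only touch these identifiers, which is why a new type like this is cheaper to rename than the existing m_secXML... members.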
/*
* XMLNodes for parsing XML into args
*/
XMLNodes::XMLNodes(Transaction *transaction)
Suggested change:
- XMLNodes::XMLNodes(Transaction *transaction)
+ XmlNodes::XmlNodes(Transaction *transaction)
I can change this too because it's new, but the existing code uses XML everywhere, so I think my spelling matches it better.
handler->onCharacters(userData, ch, len);
}
}

XML::XML(Transaction *transaction)
Suggested change:
- XML::XML(Transaction *transaction)
+ Xml::Xml(Transaction *transaction)
Same as I wrote above. This is existing code: the XML class and its constructor already exist. It wouldn't be fair to change them (and if we did, we would have to align every place in the code).
I saw, but honestly, your question is absolutely legitimate. That's why I'm asking you: what should we expect for what size of XML, and/or for how many XML nodes? I wrote a small script that generates different payloads in XML and JSON, but with the same content. I tried them, and the problem is not in the conversion (JSON to Which is clearly visible: the critical value is not the size of the file, but the number of nodes (in both XML and JSON). And of course, without any rules there is no performance issue even with a JSON/XML file containing a huge number of nodes. The aim is not to leave out the rules, but that raises another good question: which rules should we test with?
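The generator script mentioned above is not included in this PR. As a rough, hypothetical stand-in, the following builds an XML payload and a JSON payload carrying the same N leaf values, so the node count can be varied while each value stays short. The function names and the element/key names (`root`, `n0`, `v0`, ...) are arbitrary choices for illustration.

```cpp
#include <cassert>
#include <string>

// Builds <root><n0>v0</n0><n1>v1</n1>...</root> with `count` leaf nodes.
std::string makeXmlPayload(int count) {
    std::string out = "<root>";
    for (int i = 0; i < count; ++i) {
        const std::string tag = "n" + std::to_string(i);
        out += "<" + tag + ">v" + std::to_string(i) + "</" + tag + ">";
    }
    out += "</root>";
    return out;
}

// Builds {"root":{"n0":"v0","n1":"v1",...}} with the same leaf values.
std::string makeJsonPayload(int count) {
    std::string out = "{\"root\":{";
    for (int i = 0; i < count; ++i) {
        if (i > 0) out += ",";
        const std::string idx = std::to_string(i);
        out += "\"n" + idx + "\":\"v" + idx + "\"";
    }
    out += "}}";
    return out;
}
```

Sending pairs of payloads from these functions with, say, 1,000 and then 100,000 nodes keeps the content identical between formats, so any timing difference can be attributed to the node count and the active rules rather than to the format.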
Unfortunately, this depends heavily on the installation. With SOAP, for example, you'll probably process 50-100 nodes per request, with some requests sending much larger XMLs. I don't know whether we can do much more than document the performance given different numbers of nodes and let users decide how to configure the engine.
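If the engine's behavior is documented per node count, users also need a knob to bound the work. A minimal, hypothetical sketch of a cap on how many ARGS the XML flattening may produce: the limit value, the `applyArgsLimit` name and the "stop and flag" behavior are assumptions for illustration, not an existing ModSecurity setting. In a real parser the check would sit inside the element callback so work stops early; here it is applied to an already-built list for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct FlattenResult {
    std::vector<std::pair<std::string, std::string>> args;
    bool limitReached = false;  // true if some pairs were dropped
};

// Hypothetical: copies at most maxArgs name/value pairs and reports whether
// the rest were dropped, so the caller can log or block instead of
// silently truncating.
FlattenResult applyArgsLimit(
        const std::vector<std::pair<std::string, std::string>> &pairs,
        std::size_t maxArgs) {
    FlattenResult result;
    for (const auto &p : pairs) {
        if (result.args.size() >= maxArgs) {
            result.limitReached = true;
            break;
        }
        result.args.push_back(p);
    }
    return result;
}
```

Exposing the flag (rather than just truncating) matters for a WAF: an attacker could otherwise hide a payload past the cap, so a rule should be able to react when the limit is hit.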
You could use rules with
what
This PR adds a new feature to XML processing. It's the same as #3358, but for v3.
why
See the v2 patch.
references
See #3178 and #3358.