-
Notifications
You must be signed in to change notification settings - Fork 455
[rapid7_insightvm] expand documents to map each vulnerability per asset #13878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[rapid7_insightvm] expand documents to map each vulnerability per asset #13878
Conversation
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations) |
🚀 Benchmarks reportPackage
|
Data stream | Previous EPS | New EPS | Diff (%) | Result |
---|---|---|---|---|
asset |
3333.33 | 2551.02 | -782.31 (-23.47%) | 💔 |
vulnerability |
4201.68 | 3021.15 | -1180.53 (-28.1%) | 💔 |
To see the full report comment with /test benchmark fullreport
@@ -4,7 +4,7 @@ rules: | |||
responses: | |||
- status_code: 200 | |||
body: | | |||
{"data":[{"assessed_for_policies":false,"assessed_for_vulnerabilities":true,"critical_vulnerabilities":0,"exploits":0,"id":"452534235-25a7-40a3-9321-28ce0b5cc90e-default-asset-199","ip":"10.1.0.128","last_assessed_for_vulnerabilities":"2020-03-20T19:19:42.611Z","last_scan_end":"2020-03-20T19:19:42.611Z","last_scan_start":"2020-03-20T19:18:13.611Z","malware_kits":0,"moderate_vulnerabilities":2,"os_architecture":"x86_64","os_description":"CentOS Linux 2.6.18","os_family":"Linux","os_name":"Linux","os_system_name":"CentOS Linux","os_type":"General","os_vendor":"CentOS","os_version":"2.6.18","risk_score":0,"severe_vulnerabilities":0,"tags":[{"name":"lab","type":"SITE"}],"total_vulnerabilities":2,"new":[],"remediated":[]},{"assessed_for_policies":false,"assessed_for_vulnerabilities":true,"critical_vulnerabilities":1,"exploits":9,"host_name":"host.domain.com","id":"452534235-25a7-40a3-9321-28ce0b5cc90e-default-asset-198","ip":"10.4.24.164","last_scan_end":"2020-03-20T19:12:39.766Z","last_scan_start":"2020-03-20T19:05:06.766Z","malware_kits":0,"moderate_vulnerabilities":11,"os_architecture":"","os_description":"Ubuntu Linux 12.04","os_family":"Linux","os_name":"Linux","os_system_name":"Ubuntu Linux","os_type":"","os_vendor":"Ubuntu","os_version":"12.04","risk_score":12251.76171875,"severe_vulnerabilities":16,"tags":[{"name":"all_assets2","type":"CUSTOM"},{"name":"all_assets","type":"CUSTOM"},{"name":"Linux","type":"CUSTOM"},{"name":"docker hosts","type":"SITE"},{"name":"lab","type":"SITE"}],"total_vulnerabilities":28,"new":[],"remediated":[],"unique_identifiers":{"id":"4421d73dfe04f594df731e6bcd8156a","source":"R7 Agent"}}],"metadata":{"number":0,"size":2,"totalResources":2195,"totalPages":1098,"cursor":null},"links":[{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=0&size=2","rel":"first"},{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=0&size=2","rel":"self"},{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=1&size=2&cursor=1542252837:::_S:::12474375-34a7-40a3-9821-28db0b5cc90e-default-asset-10","rel":"next"},{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=1097&size=2","rel":"last"}]} | |||
{"data":[{"assessed_for_policies":false,"assessed_for_vulnerabilities":true,"critical_vulnerabilities":0,"exploits":0,"id":"452534235-25a7-40a3-9321-28ce0b5cc90e-default-asset-199","ip":"10.1.0.128","last_assessed_for_vulnerabilities":"2020-03-20T19:19:42.611Z","last_scan_end":"2020-03-20T19:19:42.611Z","last_scan_start":"2020-03-20T19:18:13.611Z","malware_kits":0,"moderate_vulnerabilities":2,"os_architecture":"x86_64","os_description":"CentOS Linux 2.6.18","os_family":"Linux","os_name":"Linux","os_system_name":"CentOS Linux","os_type":"General","os_vendor":"CentOS","os_version":"2.6.18","risk_score":0,"severe_vulnerabilities":0,"tags":[{"name":"lab","type":"SITE"}],"total_vulnerabilities":2,"new":[],"remediated":[]},{"assessed_for_policies":false,"assessed_for_vulnerabilities":true,"credential_assessments":[{"port":22,"protocol":"TCP","status":"NO_CREDS_SUPPLIED"}],"critical_vulnerabilities":1,"exploits":1,"id":"8bcfe121-1234-5678-9012-c4a6abcdabcde-default-asset-4","ip":"175.16.199.1","last_assessed_for_vulnerabilities":"2025-05-08T06:51:31.736Z","last_scan_end":"2025-05-08T06:51:31.736Z","last_scan_start":"2025-05-08T06:51:19.193Z","mac":"00:00:5E:00:53:00","malware_kits":0,"moderate_vulnerabilities":0,"os_architecture":"","os_description":"Ubuntu Linux","os_family":"Linux","os_name":"Linux","os_system_name":"Ubuntu Linux","os_type":"","os_vendor":"Ubuntu","risk_score":1268,"severe_vulnerabilities":1,"tags":[{"name":"test","type":"SITE"}],"total_vulnerabilities":2,"unique_identifiers":[],"new":[],"remediated":[],"same":[{"check_id":null,"first_found":"2025-05-08T06:51:31Z","key":"","last_found":"2025-05-08T06:51:31.736Z","nic":null,"port":22,"proof":"<p><ul><li>Running SSH service</li><li>Product OpenSSH exists -- OpenBSD OpenSSH 8.9p1</li><li>Vulnerable version of product OpenSSH found -- OpenBSD OpenSSH 8.9p1</li></ul><p>Vulnerable version of OpenSSH detected on Ubuntu Linux</p></p>","protocol":"TCP","solution_fix":"<p>Download and apply the upgrade from: <a href=\"https://ftp.openbsd.org/pub/OpenBSD/OpenSSH\">https://ftp.openbsd.org/pub/OpenBSD/OpenSSH</a></p>","solution_id":"openbsd-openssh-upgrade-latest","solution_summary":"Upgrade to the latest version of OpenSSH","solution_type":"rollup","status":"VULNERABLE_VERS","vulnerability_id":"openbsd-openssh-cve-2024-6387"},{"check_id":null,"first_found":"2025-05-08T06:51:31Z","key":"","last_found":"2025-05-08T06:51:31.736Z","nic":null,"port":22,"proof":"<p><ul><li>Running SSH service</li><li>Insecure MAC algorithms in use: hmac-sha1,[email protected]</li></ul></p>","protocol":"TCP","solution_fix":"<p>Consult the product documentation for instructions to disable any insecure MD5 or 96-bit HMAC algorithms within the SSH configuration.</p>","solution_id":"ssh-weak-message-authentication-code-algorithms","solution_summary":"Disable any MD5 or 96-bit HMAC algorithms within the SSH configuration","solution_type":"workaround","status":"VULNERABLE_VERS","vulnerability_id":"ssh-weak-message-authentication-code-algorithms"}]}],"metadata":{"number":0,"size":2,"totalResources":2195,"totalPages":1098,"cursor":null},"links":[{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=0&size=2","rel":"first"},{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=0&size=2","rel":"self"},{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=1&size=2&cursor=1542252837:::_S:::12474375-34a7-40a3-9821-28db0b5cc90e-default-asset-10","rel":"next"},{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=1097&size=2","rel":"last"}]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's pretty print this and use { minify_json …
.
changes: | ||
- description: Expand documents to map each vulnerability per asset. | ||
type: enhancement | ||
link: https://github.com/elastic/integrations/pull/1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
link: https://github.com/elastic/integrations/pull/1 | |
link: https://github.com/elastic/integrations/pull/13878 |
"proof": "\n\n\nRunning SSH service\n\nProduct OpenSSH exists -- OpenBSD OpenSSH 8.9p1\n\nVulnerable version of product OpenSSH found -- OpenBSD OpenSSH 8.9p1\n\n\nVulnerable version of OpenSSH detected on Ubuntu Linux\n\n", | ||
"protocol": "TCP", | ||
"solution": { | ||
"fix": "\nDownload and apply the upgrade from: https://ftp.openbsd.org/pub/OpenBSD/OpenSSH\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want the scars from the html_strip
processor here and in proof above. Suggest adding a trim left and right \n
to these fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth trimming the other cases of html_strip?
/cc @kcreddy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth trimming the other cases of html_strip?
Yes that would make sense 👍🏼. Thanks @efd6.
- set: | ||
target: url.params.comparisonTime | ||
value: '[[formatDate (now (parseDuration "-{{interval}}")) "RFC3339"]]' | ||
response.pagination: | ||
- set: | ||
target: url.params.comparisonTime | ||
value: '[[.last_response.url.params.Get "comparisonTime"]]' | ||
fail_on_template_error: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I know why this part if removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@efd6 @jamiehynds, I would like to get your thoughts on this. Its related to #9354 which requires splitting current documents to make one doc per asset vulnerability.
What we currently have is an incremental ingestion using url.params.comparisonTime
inside httpjson. With this, as per the assets API doc, we are going to get 3 arrays: new[]
, same[]
, and remediated[]
. If we go with incremental ingestion, we might need to split on both new[]
and remediated[]
(because thats where the incremental updates are going to be), which httpjson is not capable of. httpjson cannot split on 2 arrays.
To get around this limitation, following are our options:
- Switching to full sync by removing
url.params.comparisonTime
, this way we always have only 1 array:same[]
where split works using httpjson. This is what current PR is doing. I think if we are okay with this, we should atleast increment major version because of drastic increase in volume for users. - Rewrite in CEL and continue incremental ingestion by splitting on both
new[]
andremediated[]
. A future enhancement (for CDR work) is going to happen which joins our current asset API data onto vulnerability API data. This join is also going to be easier to implement inside CEL instead of httpjson chain calls. Because of this, I think it is better to rewrite in CEL now itself.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding more points to consider after discussion with @efd6 on the above 2 options:
- Both options will lead to increase in volume, the first permanently and the second until the collection reaches where the cursor was up to.
- Both options leave users with duplicates as we don't have fingerprint processor.
- Both options need a bump in major version because there's going to be duplicates and volume increase.
I think CEL option (both inputs cohabiting) might be lesser evil of the two options. We could leave the httpjson input for users that still want to use it.
@jamiehynds let me know your thoughts if we can go ahead with adding new CEL input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kcreddy If CEL is the better long-term technical fit, I think it makes sense to rewrite in CEL now. That said, we should avoid keeping both input types side-by-side long-term, as it inevitably creates user confusion — we’ve already seen this play out with the Google Workspace integration.
As part of the major version bump, could we mark the httpjson data stream within the integration as legacy (or something equivalent) and will be removed in a future update. We also need to clearly document what’s included in the update. We can surface the breaking changes to users through the new notification mechanism and required their acknowledgement before updating.
While it's not possible to hide inputs for users not currently using the integration, maybe this could be a forcing function and could bring to the Fleet team. i.e. for any user who doesn't currently use the Rapid7 integration, they'll only get presented with the CEL input.
On the topic of increased data volume — do we have a rough estimate of the scale (2x, 10x, etc.)? Even if it’s a temporary spike with CEL, we’ll need to give users a heads-up.
And regarding duplicates — are we getting a duplicate for every vulnerability, or is it certain events or records being duplicated? Just want to be clear on what exactly users will see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamiehynds When @kcreddy and I were talking, we were thinking of the model that we've used in the mimecast integration, where the HTTPJSON input data stream is labelled legacy (without reference to the input) and the CEL input data stream is labelled v2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main prerequisite for this PR was to analyse existing data properly for CDR work item: https://github.com/elastic/security-team/issues/12534, but it looks like we need not merge this PR, as I was able to perform the analysis based on this PR code itself (without merging). I was able to unblock myself by testing this PR change and doing the analysis.
For now, I think it makes sense to mark this PR as Draft
and work on adding CDR changes to this PR itself. That way we only have 1 breaking-change
i.e., all changes for split docs + rewrite in CEL (deprecate httpjson) + CDR mappings (and transform) instead of having 2 breaking-changes
. It would thus make more sense now to rewrite this in CEL.
cc: @brijesh-elastic
@efd6, with mimecast both httpjson and CEL co-existing would make more sense because CEL is for v2 API while httpjson is for older v1 API and also this older v1 API is not deprecated.
In Rapid7 case, both httpjson and CEL will be relying on same API. CEL will have more enriched data (because it calls vulnerability API as well) but essentially it ingests same asset vulnerabilities. In Rapid7, I believe there could be slightly more user confusion than mimecast if we make both inputs coexist. I think we should mark httpjson as (deprecated)
by clearly documenting the update and hope our Fleet team can handle presenting only CEL input to new users. WDYT?
@jamiehynds, I think it makes sense to me to add a deprecation to the title suffix such as (deprecated)
for httpjson input so that users become aware when they see it. If Fleet team can base their input off of this title suffix, it will be great because we don't have a proper way to deprecate input.
Regarding the questions:
On the topic of increased data volume — do we have a rough estimate of the scale (2x, 10x, etc.)? Even if it’s a temporary spike with CEL, we’ll need to give users a heads-up.
And regarding duplicates — are we getting a duplicate for every vulnerability, or is it certain events or records being duplicated? Just want to be clear on what exactly users will see.
The estimated increase in volume is directly proportional to number vulnerabilities in user's environment. If the user's environment has N
vulnerabilities in their assets, the increase is going to be ~Nx
.
This is because existing code was ingesting 1 document per asset (all vulnerabilities are bundled together in this document), but after this change we are going to ingest 1 document per asset per vulnerability.
The duplication is difficult to categorise. The initial pull of CEL will ingest all existing assets along with their vulnerabilities (historical) before it reaches where the previous cursor was for httpjson. One could say that an asset information, which was earlier in 1 document, will now be in 1 (httpjson)+ N (CEL)
documents . The vulnerability information, which is in 1 document, will now be in 1 (httpjson)+ 1 (CEL)
documents.
If we want to avoid all the fuss, we also have another option: We could add a new data_stream called asset_vulnerabilities
on top of the existing data_streams asset
or vulnerabilities
. There won't be any duplicates if we take this approach. The existing data_streams can still continue to work.
asset
data_stream.- Existing data_stream based on httpjson.
- Relies on Search Assets API
- Has details on assets belonging to user's environment along with top-level in information on all vulnerabilities per asset inside each document (proof, solution, fix, etc.)
vulnerability
data_stream:- Existing data_stream based on httpjson.
- Relies on Search Vulnerabilities API.
- Has detailed vulnerability information of all known Rapid7 vulnerabilities (cve, score, severity, etc.)
asset_vulnerabilities
data_stream:- New data_stream based on CEL.
- Relies on both Search Assets API and Search Vulnerabilities API to enrich asset vulnerability data.
- Has details on assets belonging to user's environment along with detailed information of all vulnerabilities belonging to these assets with 1 document per asset per vulnerability (proof, solution, fix, cve, score, severity etc.)
Let me know your thoughts @jamiehynds @efd6.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should mark httpjson as (deprecated) by clearly documenting the update and hope our Fleet team can handle presenting only CEL input to new users. WDYT?
This is what I was thinking.
Leaving the asset_vulnerabilities
query to @jamiehynds, but it does look appealing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving the httpjson input version of the asset_data_stream
makes sense to avoid significant breaking changes for existing users. It's far from ideal, but the only solution we have right now is to mark it as deprecated. This situation is going to come up again I'm sure and reiterates the need for seamless method to move from httpjson to CEL and the ability to hide deprecated data streams for new users of an integration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the input @jamiehynds. Just to confirm it, are we going with:
- Create new data_stream
asset_vulnerabilities
in CEL and then also deprecateasset
data_stream (based on httpjson) completely?
OR
- Use existing
asset
data_stream, add CEL input, and deprecatehttpjson
input?
Option 1 will avoid significant breaking changes, because its a new data_stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just got clarification from @jamiehynds below:
Going with the new data stream seems like the best path, to avoid major breaking changes and hopefully new users to the integration will use that integration when they see the old one is deprecated.
@brijesh-elastic, please go with option 1 from above comment, i.e., creating new data_stream asset_vulnerability
and deprecate existing asset
data_stream.
I think we should close this PR and start on new one because the changes here are no longer valid.
Please create 1 PR adding this new data_stream (incrementing major version) that closes: #9354, #13775, and #13776
"dataset": "rapid7_insightvm.asset", | ||
"ingested": "2023-05-23T16:17:08Z", | ||
"ingested": "2025-05-12T09:23:16Z", | ||
"kind": "state", | ||
"original": "{\"assessed_for_policies\":false,\"assessed_for_vulnerabilities\":true,\"critical_vulnerabilities\":0,\"exploits\":0,\"id\":\"452534235-25a7-40a3-9321-28ce0b5cc90e-default-asset-199\",\"ip\":\"10.1.0.128\",\"last_assessed_for_vulnerabilities\":\"2020-03-20T19:19:42.611Z\",\"last_scan_end\":\"2020-03-20T19:19:42.611Z\",\"last_scan_start\":\"2020-03-20T19:18:13.611Z\",\"malware_kits\":0,\"moderate_vulnerabilities\":2,\"new\":[],\"os_architecture\":\"x86_64\",\"os_description\":\"CentOS Linux 2.6.18\",\"os_family\":\"Linux\",\"os_name\":\"Linux\",\"os_system_name\":\"CentOS Linux\",\"os_type\":\"General\",\"os_vendor\":\"CentOS\",\"os_version\":\"2.6.18\",\"remediated\":[],\"risk_score\":0,\"severe_vulnerabilities\":0,\"tags\":[{\"name\":\"lab\",\"type\":\"SITE\"}],\"total_vulnerabilities\":2}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update config.yml
so that one of the documents containing same
is populated instead of this one (doesn't have same
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brijesh-elastic don't close #9354 as the later part of the issue description indicates a requirement for preserving only 1 asset vulnerability inside the index. I think this can only be truly closed after implementing latest transform which stores latest state of each asset vulnerability inside destination index.
💚 Build Succeeded
History
|
|
Proposed commit message
Checklist
changelog.yml
file.How to test this PR locally
Related issues