[rapid7_insightvm] expand documents to map each vulnerability per asset #13878

Draft · wants to merge 4 commits into base: main
386 changes: 382 additions & 4 deletions packages/rapid7_insightvm/_dev/deploy/docker/files/config.yml

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions packages/rapid7_insightvm/changelog.yml
@@ -1,4 +1,9 @@
# newer versions go on top
- version: "1.17.0"
  changes:
    - description: Expand documents to map each vulnerability per asset.
      type: enhancement
      link: https://github.com/elastic/integrations/pull/13878
- version: "1.16.0"
  changes:
    - description: Update Kibana constraint to support 9.0.0.
@@ -1,3 +1,4 @@
{"assessed_for_policies":false,"assessed_for_vulnerabilities":true,"critical_vulnerabilities":0,"exploits":0,"id":"452534235-25a7-40a3-9321-28ce0b5cc90e-default-asset-199","ip":"10.1.0.128","last_assessed_for_vulnerabilities":"2020-03-20T19:19:42.611Z","last_scan_end":"2020-03-20T19:19:42.611Z","last_scan_start":"2020-03-20T19:18:13.611Z","malware_kits":0,"moderate_vulnerabilities":2,"os_architecture":"x86_64","os_description":"CentOS Linux 2.6.18","os_family":"Linux","os_name":"Linux","os_system_name":"CentOS Linux","os_type":"General","os_vendor":"CentOS","os_version":"2.6.18","risk_score":0,"severe_vulnerabilities":0,"tags":[{"name":"lab","type":"SITE"}],"total_vulnerabilities":2,"new":[],"remediated":[]}
{"assessed_for_policies":false,"assessed_for_vulnerabilities":true,"critical_vulnerabilities":1,"exploits":9,"host_name":"HOST.domain.com","id":"452534235-25a7-40a3-9321-28ce0b5cc90e-default-asset-198","ip":"10.4.24.164","last_scan_end":"2020-03-20T19:12:39.766Z","last_scan_start":"2020-03-20T19:05:06.766Z","malware_kits":0,"moderate_vulnerabilities":11,"os_architecture":"","os_description":"Ubuntu Linux 12.04","os_family":"Linux","os_name":"Linux","os_system_name":"Ubuntu Linux","os_type":"","os_vendor":"Ubuntu","os_version":"12.04","risk_score":12251.76171875,"severe_vulnerabilities":16,"tags":[{"name":"all_assets2","type":"CUSTOM"},{"name":"all_assets","type":"CUSTOM"},{"name":"Linux","type":"CUSTOM"},{"name":"docker hosts","type":"SITE"},{"name":"lab","type":"SITE"}],"total_vulnerabilities":28,"new":[],"remediated":[],"unique_identifiers":{"id":"4421d73dfe04f594df731e6bcd8156a","source":"R7 Agent"}}
{"data":[],"metadata":{"number":0,"size":0,"totalResources":2195,"totalPages":2195,"cursor":null},"links":[{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=0&size=2","rel":"first"},{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=0&size=2","rel":"self"},{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=1097&size=2","rel":"last"}]}
{"assessed_for_policies":false,"assessed_for_vulnerabilities":true,"credential_assessments":[{"port":22,"protocol":"TCP","status":"NO_CREDS_SUPPLIED"}],"critical_vulnerabilities":1,"exploits":1,"id":"8bcfe121-1234-5678-9012-c4a6abcdabcde-default-asset-4","ip":"175.16.199.1","last_assessed_for_vulnerabilities":"2025-05-08T06:51:31.736Z","last_scan_end":"2025-05-08T06:51:31.736Z","last_scan_start":"2025-05-08T06:51:19.193Z","mac":"00:00:5E:00:53:00","malware_kits":0,"moderate_vulnerabilities":0,"os_architecture":"","os_description":"Ubuntu Linux","os_family":"Linux","os_name":"Linux","os_system_name":"Ubuntu Linux","os_type":"","os_vendor":"Ubuntu","risk_score":1268,"severe_vulnerabilities":1,"tags":[{"name":"test","type":"SITE"}],"total_vulnerabilities":2,"unique_identifiers":[],"new":[],"remediated":[],"same":{"check_id":null,"first_found":"2025-05-08T06:51:31Z","key":"","last_found":"2025-05-08T06:51:31.736Z","nic":null,"port":22,"proof":"<p><ul><li>Running SSH service</li><li>Product OpenSSH exists -- OpenBSD OpenSSH 8.9p1</li><li>Vulnerable version of product OpenSSH found -- OpenBSD OpenSSH 8.9p1</li></ul><p>Vulnerable version of OpenSSH detected on Ubuntu Linux</p></p>","protocol":"TCP","solution_fix":"<p>Download and apply the upgrade from: <a href=\"https://ftp.openbsd.org/pub/OpenBSD/OpenSSH\">https://ftp.openbsd.org/pub/OpenBSD/OpenSSH</a></p>","solution_id":"openbsd-openssh-upgrade-latest","solution_summary":"Upgrade to the latest version of OpenSSH","solution_type":"rollup","status":"VULNERABLE_VERS","vulnerability_id":"openbsd-openssh-cve-2024-6387"}}
@@ -175,6 +175,113 @@
"preserve_duplicate_custom_fields"
]
},
null
null,
{
"ecs": {
"version": "8.11.0"
},
"event": {
"category": [
"host"
],
"kind": "state",
"original": "{\"assessed_for_policies\":false,\"assessed_for_vulnerabilities\":true,\"credential_assessments\":[{\"port\":22,\"protocol\":\"TCP\",\"status\":\"NO_CREDS_SUPPLIED\"}],\"critical_vulnerabilities\":1,\"exploits\":1,\"id\":\"8bcfe121-1234-5678-9012-c4a6abcdabcde-default-asset-4\",\"ip\":\"175.16.199.1\",\"last_assessed_for_vulnerabilities\":\"2025-05-08T06:51:31.736Z\",\"last_scan_end\":\"2025-05-08T06:51:31.736Z\",\"last_scan_start\":\"2025-05-08T06:51:19.193Z\",\"mac\":\"00:00:5E:00:53:00\",\"malware_kits\":0,\"moderate_vulnerabilities\":0,\"os_architecture\":\"\",\"os_description\":\"Ubuntu Linux\",\"os_family\":\"Linux\",\"os_name\":\"Linux\",\"os_system_name\":\"Ubuntu Linux\",\"os_type\":\"\",\"os_vendor\":\"Ubuntu\",\"risk_score\":1268,\"severe_vulnerabilities\":1,\"tags\":[{\"name\":\"test\",\"type\":\"SITE\"}],\"total_vulnerabilities\":2,\"unique_identifiers\":[],\"new\":[],\"remediated\":[],\"same\":{\"check_id\":null,\"first_found\":\"2025-05-08T06:51:31Z\",\"key\":\"\",\"last_found\":\"2025-05-08T06:51:31.736Z\",\"nic\":null,\"port\":22,\"proof\":\"<p><ul><li>Running SSH service</li><li>Product OpenSSH exists -- OpenBSD OpenSSH 8.9p1</li><li>Vulnerable version of product OpenSSH found -- OpenBSD OpenSSH 8.9p1</li></ul><p>Vulnerable version of OpenSSH detected on Ubuntu Linux</p></p>\",\"protocol\":\"TCP\",\"solution_fix\":\"<p>Download and apply the upgrade from: <a href=\\\"https://ftp.openbsd.org/pub/OpenBSD/OpenSSH\\\">https://ftp.openbsd.org/pub/OpenBSD/OpenSSH</a></p>\",\"solution_id\":\"openbsd-openssh-upgrade-latest\",\"solution_summary\":\"Upgrade to the latest version of OpenSSH\",\"solution_type\":\"rollup\",\"status\":\"VULNERABLE_VERS\",\"vulnerability_id\":\"openbsd-openssh-cve-2024-6387\"}}",
"type": [
"info"
]
},
"host": {
"id": "8bcfe121-1234-5678-9012-c4a6abcdabcde-default-asset-4",
"ip": [
"175.16.199.1"
],
"mac": [
"00-00-5E-00-53-00"
],
"os": {
"family": "Linux",
"full": "Ubuntu Linux",
"name": "Linux"
},
"risk": {
"static_score": 1268.0
}
},
"network": {
"transport": [
"tcp"
]
},
"rapid7": {
"insightvm": {
"asset": {
"assessed_for_policies": false,
"assessed_for_vulnerabilities": true,
"credential_assessments": [
{
"port": 22,
"protocol": "TCP",
"status": "NO_CREDS_SUPPLIED"
}
],
"critical_vulnerabilities": 1,
"exploits": 1,
"id": "8bcfe121-1234-5678-9012-c4a6abcdabcde-default-asset-4",
"ip": "175.16.199.1",
"last_assessed_for_vulnerabilities": "2025-05-08T06:51:31.736Z",
"last_scan_end": "2025-05-08T06:51:31.736Z",
"last_scan_start": "2025-05-08T06:51:19.193Z",
"mac": "00-00-5E-00-53-00",
"malware_kits": 0,
"moderate_vulnerabilities": 0,
"os": {
"description": "Ubuntu Linux",
"family": "Linux",
"name": "Linux",
"system_name": "Ubuntu Linux",
"vendor": "Ubuntu"
},
"risk_score": 1268.0,
"same": {
"first_found": "2025-05-08T06:51:31.000Z",
"last_found": "2025-05-08T06:51:31.736Z",
"port": 22,
"proof": "Running SSH service\n\nProduct OpenSSH exists -- OpenBSD OpenSSH 8.9p1\n\nVulnerable version of product OpenSSH found -- OpenBSD OpenSSH 8.9p1\n\n\nVulnerable version of OpenSSH detected on Ubuntu Linux",
"protocol": "TCP",
"solution": {
"fix": "Download and apply the upgrade from: https://ftp.openbsd.org/pub/OpenBSD/OpenSSH",
"id": "openbsd-openssh-upgrade-latest",
"summary": "Upgrade to the latest version of OpenSSH",
"type": "rollup"
},
"status": "VULNERABLE_VERS",
"vulnerability_id": "openbsd-openssh-cve-2024-6387"
},
"severe_vulnerabilities": 1,
"tags": [
{
"name": "test",
"type": "SITE"
}
],
"total_vulnerabilities": 2
}
}
},
"related": {
"ip": [
"175.16.199.1"
]
},
"tags": [
"preserve_original_event",
"preserve_duplicate_custom_fields"
],
"vulnerability": {
"id": [
"openbsd-openssh-cve-2024-6387"
]
}
}
]
}
}
@@ -8,3 +8,5 @@ data_stream:
preserve_original_event: true
preserve_duplicate_custom_fields: true
enable_request_tracer: true
assert:
hit_count: 3
@@ -28,21 +28,18 @@ request.transforms:
- set:
target: url.params.size
value: {{batch_size}}
- set:
target: url.params.comparisonTime
value: '[[formatDate (now (parseDuration "-{{interval}}")) "RFC3339"]]'
response.pagination:
- set:
target: url.params.comparisonTime
value: '[[.last_response.url.params.Get "comparisonTime"]]'
fail_on_template_error: true
Comment on lines -31 to -38
May I know why this part is removed?

@kcreddy (Contributor), May 13, 2025:
@efd6 @jamiehynds, I would like to get your thoughts on this. It's related to #9354, which requires splitting the current documents so that there is one document per asset vulnerability.

What we currently have is incremental ingestion using url.params.comparisonTime inside httpjson. With this, as per the assets API doc, we are going to get 3 arrays: new[], same[], and remediated[]. If we stay with incremental ingestion, we would need to split on both new[] and remediated[] (because that's where the incremental updates are going to be), which httpjson is not capable of: httpjson cannot split on 2 arrays.

To get around this limitation, these are our options:

  1. Switch to a full sync by removing url.params.comparisonTime; this way we always have only 1 array, same[], where split works using httpjson. This is what the current PR does. If we are okay with this, I think we should at least increment the major version because of the drastic increase in volume for users.
  2. Rewrite in CEL and continue incremental ingestion by splitting on both new[] and remediated[]. A future enhancement (for the CDR work) will join our current asset API data onto the vulnerability API data. This join will also be easier to implement in CEL than with httpjson chain calls. Because of this, I think it is better to rewrite in CEL now.

WDYT?
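
For context, a minimal sketch of the split limitation described above, modelled on the full-sync configuration added further down in this diff: a single httpjson response.split can fan out the per-asset records and one nested vulnerability array (same[]), but there is no way to also split a sibling array such as new[] or remediated[].

```yaml
# Sketch of the full-sync split (option 1); mirrors the config in this PR.
# Each response.split accepts at most one nested `split`, so only one
# sibling array (`same`) can be expanded per asset document.
response.split:
  target: body.data          # one record per asset
  ignore_empty_value: true
  split:
    target: body.same        # one record per asset vulnerability
    keep_parent: true
    ignore_empty_value: true
# A second sibling split on `new` or `remediated` is not expressible here,
# which is what pushes incremental ingestion towards CEL.
```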

Adding more points to consider after discussion with @efd6 on the above 2 options:

  • Both options will lead to an increase in volume: the first permanently, and the second until the collection catches up to where the cursor was.
  • Both options leave users with duplicates, as we don't have a fingerprint processor (a sketch of what one could look like follows this comment).
  • Both options need a major version bump because there are going to be duplicates and a volume increase.

I think the CEL option (both inputs cohabiting) might be the lesser evil of the two. We could leave the httpjson input for users who still want to use it.

@jamiehynds, let me know your thoughts on whether we can go ahead with adding a new CEL input.
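
Purely for illustration (not part of this PR): if deduplication were added later, an ingest pipeline fingerprint processor along these lines could derive a stable _id from the asset and vulnerability identifiers already present in the documents. The exact field choice is an assumption.

```yaml
# Hypothetical dedup sketch, not in this PR: derive a deterministic _id from
# asset id + vulnerability id so re-ingested documents overwrite rather than
# duplicate. Field names follow the mappings in this PR but are illustrative.
- fingerprint:
    fields:
      - rapid7.insightvm.asset.id
      - rapid7.insightvm.asset.same.vulnerability_id
    target_field: _id
    method: SHA-256
    ignore_missing: true
```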

@kcreddy If CEL is the better long-term technical fit, I think it makes sense to rewrite in CEL now. That said, we should avoid keeping both input types side-by-side long-term, as it inevitably creates user confusion — we’ve already seen this play out with the Google Workspace integration.

As part of the major version bump, could we mark the httpjson data stream within the integration as legacy (or something equivalent) and note that it will be removed in a future update? We also need to clearly document what’s included in the update. We can surface the breaking changes to users through the new notification mechanism and require their acknowledgement before updating.

While it's not currently possible to hide inputs even from users who aren't yet using the integration, maybe this could be a forcing function we bring to the Fleet team: for any user who doesn't currently use the Rapid7 integration, only the CEL input would be presented.

On the topic of increased data volume — do we have a rough estimate of the scale (2x, 10x, etc.)? Even if it’s a temporary spike with CEL, we’ll need to give users a heads-up.

And regarding duplicates — are we getting a duplicate for every vulnerability, or is it certain events or records being duplicated? Just want to be clear on what exactly users will see.

@jamiehynds When @kcreddy and I were talking, we were thinking of the model that we've used in the mimecast integration, where the HTTPJSON input data stream is labelled legacy (without reference to the input) and the CEL input data stream is labelled v2.

The main prerequisite for this PR was to properly analyse the existing data for the CDR work item: https://github.com/elastic/security-team/issues/12534, but it looks like we don't need to merge this PR, as I was able to perform the analysis from this PR's code itself (without merging). I unblocked myself by testing this PR change and doing the analysis.

For now, I think it makes sense to mark this PR as Draft and work on adding the CDR changes to this PR itself. That way we have only 1 breaking change, i.e. all changes for split docs + rewrite in CEL (deprecate httpjson) + CDR mappings (and transform), instead of having 2 breaking changes. It thus makes more sense now to rewrite this in CEL.
cc: @brijesh-elastic


@efd6, with mimecast, having both httpjson and CEL co-exist makes more sense because CEL is for the v2 API while httpjson is for the older v1 API, and that older v1 API is not deprecated.
In the Rapid7 case, both httpjson and CEL will rely on the same API. CEL will have more enriched data (because it also calls the vulnerability API), but it essentially ingests the same asset vulnerabilities. I believe there could be slightly more user confusion here than with mimecast if we make both inputs coexist. I think we should mark httpjson as (deprecated), clearly document the update, and hope our Fleet team can handle presenting only the CEL input to new users. WDYT?

@jamiehynds, it makes sense to me to add a deprecation suffix such as (deprecated) to the httpjson input's title so that users become aware when they see it. If the Fleet team can base their behaviour on this title suffix, that would be great, because we don't have a proper way to deprecate an input.
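
As an illustration only (not something decided in this thread), the deprecation could surface as a title suffix in the data stream manifest, roughly like this; the wording and structure are assumptions:

```yaml
# Hypothetical fragment of the asset data stream's manifest.yml; the suffix
# wording is illustrative, not final.
streams:
  - input: httpjson
    title: Rapid7 InsightVM asset logs (deprecated)
    description: >-
      Collect asset data via the httpjson input. Superseded by the CEL-based
      data stream and planned for removal in a future version.
```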

Regarding the questions:

> On the topic of increased data volume — do we have a rough estimate of the scale (2x, 10x, etc.)? Even if it’s a temporary spike with CEL, we’ll need to give users a heads-up.

> And regarding duplicates — are we getting a duplicate for every vulnerability, or is it certain events or records being duplicated? Just want to be clear on what exactly users will see.

The estimated increase in volume is directly proportional to the number of vulnerabilities in the user's environment. If the user's environment has N vulnerabilities across their assets, the increase is going to be roughly Nx.
This is because the existing code ingests 1 document per asset (with all of that asset's vulnerabilities bundled together in the document), but after this change we are going to ingest 1 document per asset per vulnerability. For example, the sample asset in this PR with total_vulnerabilities: 28 would go from 1 document to 28 documents.

The duplication is difficult to categorise. The initial pull of CEL will ingest all existing assets along with their vulnerabilities (historical data) before it reaches where the previous httpjson cursor was. One could say that an asset's information, which was earlier in 1 document, will now be in 1 (httpjson) + N (CEL) documents. The vulnerability information, which was in 1 document, will now be in 1 (httpjson) + 1 (CEL) documents.


If we want to avoid all the fuss, we also have another option: we could add a new data_stream called asset_vulnerabilities on top of the existing asset and vulnerability data_streams. There won't be any duplicates if we take this approach, and the existing data_streams can continue to work.

  1. asset data_stream:
    • Existing data_stream based on httpjson.
    • Relies on the Search Assets API.
    • Has details on the assets in the user's environment, along with top-level information on all vulnerabilities per asset inside each document (proof, solution, fix, etc.).
  2. vulnerability data_stream:
    • Existing data_stream based on httpjson.
    • Relies on the Search Vulnerabilities API.
    • Has detailed vulnerability information for all known Rapid7 vulnerabilities (CVE, score, severity, etc.).
  3. asset_vulnerabilities data_stream:
    • New data_stream based on CEL.
    • Relies on both the Search Assets API and the Search Vulnerabilities API to enrich asset vulnerability data.
    • Has details on the assets in the user's environment, along with detailed information on all vulnerabilities belonging to these assets, with 1 document per asset per vulnerability (proof, solution, fix, CVE, score, severity, etc.). A rough sketch of one such document follows below.

Let me know your thoughts @jamiehynds @efd6.
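
To make option 3 concrete, here is a rough, hypothetical sketch of the kind of per-asset-per-vulnerability document the asset_vulnerabilities data_stream could emit. Field names loosely follow the existing rapid7.insightvm.asset.* fields in this PR; the enrichment fields and all values are illustrative, not final.

```yaml
# Hypothetical asset_vulnerabilities document (one per asset per vulnerability).
# All names and values are illustrative.
rapid7:
  insightvm:
    asset_vulnerability:
      asset:
        id: 8bcfe121-1234-5678-9012-c4a6abcdabcde-default-asset-4
        ip: 175.16.199.1
        os:
          family: Linux
          description: Ubuntu Linux
      vulnerability:
        id: openbsd-openssh-cve-2024-6387
        status: VULNERABLE_VERS
        port: 22
        protocol: TCP
        solution:
          id: openbsd-openssh-upgrade-latest
          type: rollup
        # Enrichment joined from the Search Vulnerabilities API (illustrative):
        severity: Severe
        cvss_v3_score: 8.1
```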


> I think we should mark httpjson as (deprecated), clearly document the update, and hope our Fleet team can handle presenting only the CEL input to new users. WDYT?

This is what I was thinking.

Leaving the asset_vulnerabilities query to @jamiehynds, but it does look appealing.

Leaving the httpjson input version of the asset data_stream makes sense to avoid significant breaking changes for existing users. It's far from ideal, but the only solution we have right now is to mark it as deprecated. This situation is going to come up again, I'm sure, and it reiterates the need for a seamless method to move from httpjson to CEL and the ability to hide deprecated data streams from new users of an integration.

Thanks for the input, @jamiehynds. Just to confirm, are we going with:

  1. Creating a new asset_vulnerabilities data_stream in CEL and also deprecating the httpjson-based asset data_stream completely?
    OR
  2. Using the existing asset data_stream, adding a CEL input, and deprecating the httpjson input?

Option 1 avoids significant breaking changes because it's a new data_stream.

Just got clarification from @jamiehynds below:

> Going with the new data stream seems like the best path, to avoid major breaking changes and hopefully new users to the integration will use that integration when they see the old one is deprecated.

@brijesh-elastic, please go with option 1 from the above comment, i.e. create a new asset_vulnerability data_stream and deprecate the existing asset data_stream.
I think we should close this PR and start a new one because the changes here are no longer valid.
Please create 1 PR adding this new data_stream (incrementing the major version) that closes #9354, #13775, and #13776.
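
For reference only: following the changelog format already shown earlier in this diff, the eventual major version bump for that follow-up PR could look roughly like this. The version number, description, and link are placeholders, not decided here.

```yaml
# Hypothetical changelog.yml entry for the follow-up PR; version, wording,
# and link are placeholders only.
- version: "2.0.0"
  changes:
    - description: Add asset_vulnerability data stream and deprecate the httpjson-based asset data stream.
      type: breaking-change
      link: https://github.com/elastic/integrations/pull/NNNNN  # placeholder
```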

- set:
target: url.params.cursor
value: '[[if index .last_response.body.metadata "cursor"]][[.last_response.body.metadata.cursor]][[end]]'
fail_on_template_error: true
response.split:
target: body.data
ignore_empty_value: true
split:
target: body.same
keep_parent: true
ignore_empty_value: true
tags:
{{#if preserve_original_event}}
- preserve_original_event