[rapid7_insightvm] expand documents to map each vulnerability per asset #13878

brijesh-elastic · 2025-05-12T03:47:15Z

Proposed commit message

rapid7_insightvm: expand documents to map each vulnerability per asset.

Checklist

I have reviewed tips for building integrations and this pull request is aligned with them.
I have verified that all data streams collect metrics or logs.
I have added an entry to my package's changelog.yml file.
I have verified that Kibana version constraints are current according to guidelines.
I have verified that any added dashboard complies with Kibana's Dashboard good practices

How to test this PR locally

Clone integrations repo.
Install elastic package locally.
Start elastic stack using elastic-package.
Move to integrations/packages/rapid7_insightvm directory.
Run the following command to run tests.

elastic-package test

Related issues

Relates Change Rapid7 InsightVM integration to one doc per vulnerability #9354

elasticmachine · 2025-05-12T03:47:19Z

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

elastic-vault-github-plugin-prod · 2025-05-12T04:09:31Z

🚀 Benchmarks report

Package `rapid7_insightvm` 👍(0) 💚(0) 💔(2)

Expand to view

Data stream	Previous EPS	New EPS	Diff (%)	Result
`asset`	3333.33	2551.02	-782.31 (-23.47%)	💔
`vulnerability`	4201.68	3021.15	-1180.53 (-28.1%)	💔

To see the full report comment with /test benchmark fullreport

efd6 · 2025-05-12T04:07:00Z

packages/rapid7_insightvm/_dev/deploy/docker/files/config.yml

@@ -4,7 +4,7 @@ rules:
    responses:
      - status_code: 200
        body: |
-          {"data":[{"assessed_for_policies":false,"assessed_for_vulnerabilities":true,"critical_vulnerabilities":0,"exploits":0,"id":"452534235-25a7-40a3-9321-28ce0b5cc90e-default-asset-199","ip":"10.1.0.128","last_assessed_for_vulnerabilities":"2020-03-20T19:19:42.611Z","last_scan_end":"2020-03-20T19:19:42.611Z","last_scan_start":"2020-03-20T19:18:13.611Z","malware_kits":0,"moderate_vulnerabilities":2,"os_architecture":"x86_64","os_description":"CentOS Linux 2.6.18","os_family":"Linux","os_name":"Linux","os_system_name":"CentOS Linux","os_type":"General","os_vendor":"CentOS","os_version":"2.6.18","risk_score":0,"severe_vulnerabilities":0,"tags":[{"name":"lab","type":"SITE"}],"total_vulnerabilities":2,"new":[],"remediated":[]},{"assessed_for_policies":false,"assessed_for_vulnerabilities":true,"critical_vulnerabilities":1,"exploits":9,"host_name":"host.domain.com","id":"452534235-25a7-40a3-9321-28ce0b5cc90e-default-asset-198","ip":"10.4.24.164","last_scan_end":"2020-03-20T19:12:39.766Z","last_scan_start":"2020-03-20T19:05:06.766Z","malware_kits":0,"moderate_vulnerabilities":11,"os_architecture":"","os_description":"Ubuntu Linux 12.04","os_family":"Linux","os_name":"Linux","os_system_name":"Ubuntu Linux","os_type":"","os_vendor":"Ubuntu","os_version":"12.04","risk_score":12251.76171875,"severe_vulnerabilities":16,"tags":[{"name":"all_assets2","type":"CUSTOM"},{"name":"all_assets","type":"CUSTOM"},{"name":"Linux","type":"CUSTOM"},{"name":"docker hosts","type":"SITE"},{"name":"lab","type":"SITE"}],"total_vulnerabilities":28,"new":[],"remediated":[],"unique_identifiers":{"id":"4421d73dfe04f594df731e6bcd8156a","source":"R7 Agent"}}],"metadata":{"number":0,"size":2,"totalResources":2195,"totalPages":1098,"cursor":null},"links":[{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=0&size=2","rel":"first"},{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=0&size=2","rel":"self"},{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=1&size=2&cursor=1542252837:::_S:::12474375-34a7-40a3-9821-28db0b5cc90e-default-asset-10","rel":"next"},{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=1097&size=2","rel":"last"}]}
+          {"data":[{"assessed_for_policies":false,"assessed_for_vulnerabilities":true,"critical_vulnerabilities":0,"exploits":0,"id":"452534235-25a7-40a3-9321-28ce0b5cc90e-default-asset-199","ip":"10.1.0.128","last_assessed_for_vulnerabilities":"2020-03-20T19:19:42.611Z","last_scan_end":"2020-03-20T19:19:42.611Z","last_scan_start":"2020-03-20T19:18:13.611Z","malware_kits":0,"moderate_vulnerabilities":2,"os_architecture":"x86_64","os_description":"CentOS Linux 2.6.18","os_family":"Linux","os_name":"Linux","os_system_name":"CentOS Linux","os_type":"General","os_vendor":"CentOS","os_version":"2.6.18","risk_score":0,"severe_vulnerabilities":0,"tags":[{"name":"lab","type":"SITE"}],"total_vulnerabilities":2,"new":[],"remediated":[]},{"assessed_for_policies":false,"assessed_for_vulnerabilities":true,"credential_assessments":[{"port":22,"protocol":"TCP","status":"NO_CREDS_SUPPLIED"}],"critical_vulnerabilities":1,"exploits":1,"id":"8bcfe121-1234-5678-9012-c4a6abcdabcde-default-asset-4","ip":"175.16.199.1","last_assessed_for_vulnerabilities":"2025-05-08T06:51:31.736Z","last_scan_end":"2025-05-08T06:51:31.736Z","last_scan_start":"2025-05-08T06:51:19.193Z","mac":"00:00:5E:00:53:00","malware_kits":0,"moderate_vulnerabilities":0,"os_architecture":"","os_description":"Ubuntu Linux","os_family":"Linux","os_name":"Linux","os_system_name":"Ubuntu Linux","os_type":"","os_vendor":"Ubuntu","risk_score":1268,"severe_vulnerabilities":1,"tags":[{"name":"test","type":"SITE"}],"total_vulnerabilities":2,"unique_identifiers":[],"new":[],"remediated":[],"same":[{"check_id":null,"first_found":"2025-05-08T06:51:31Z","key":"","last_found":"2025-05-08T06:51:31.736Z","nic":null,"port":22,"proof":"<p><ul><li>Running SSH service</li><li>Product OpenSSH exists -- OpenBSD OpenSSH 8.9p1</li><li>Vulnerable version of product OpenSSH found -- OpenBSD OpenSSH 8.9p1</li></ul><p>Vulnerable version of OpenSSH detected on Ubuntu Linux</p></p>","protocol":"TCP","solution_fix":"<p>Download and apply the upgrade from: <a href=\"https://ftp.openbsd.org/pub/OpenBSD/OpenSSH\">https://ftp.openbsd.org/pub/OpenBSD/OpenSSH</a></p>","solution_id":"openbsd-openssh-upgrade-latest","solution_summary":"Upgrade to the latest version of OpenSSH","solution_type":"rollup","status":"VULNERABLE_VERS","vulnerability_id":"openbsd-openssh-cve-2024-6387"},{"check_id":null,"first_found":"2025-05-08T06:51:31Z","key":"","last_found":"2025-05-08T06:51:31.736Z","nic":null,"port":22,"proof":"<p><ul><li>Running SSH service</li><li>Insecure MAC algorithms in use: hmac-sha1,[email protected]</li></ul></p>","protocol":"TCP","solution_fix":"<p>Consult the product documentation for instructions to disable any insecure MD5 or 96-bit HMAC algorithms within the SSH configuration.</p>","solution_id":"ssh-weak-message-authentication-code-algorithms","solution_summary":"Disable any MD5 or 96-bit HMAC algorithms within the SSH configuration","solution_type":"workaround","status":"VULNERABLE_VERS","vulnerability_id":"ssh-weak-message-authentication-code-algorithms"}]}],"metadata":{"number":0,"size":2,"totalResources":2195,"totalPages":1098,"cursor":null},"links":[{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=0&size=2","rel":"first"},{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=0&size=2","rel":"self"},{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=1&size=2&cursor=1542252837:::_S:::12474375-34a7-40a3-9821-28db0b5cc90e-default-asset-10","rel":"next"},{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=1097&size=2","rel":"last"}]}


Let's pretty print this and use { minify_json ….

efd6 · 2025-05-12T04:07:14Z

packages/rapid7_insightvm/changelog.yml

+  changes:
+    - description: Expand documents to map each vulnerability per asset.
+      type: enhancement
+      link: https://github.com/elastic/integrations/pull/1


Suggested change

link: https://github.com/elastic/integrations/pull/1

link: https://github.com/elastic/integrations/pull/13878

efd6 · 2025-05-12T04:10:11Z

packages/rapid7_insightvm/data_stream/asset/_dev/test/pipeline/test-asset.log-expected.json

+                            "proof": "\n\n\nRunning SSH service\n\nProduct OpenSSH exists -- OpenBSD OpenSSH 8.9p1\n\nVulnerable version of product OpenSSH found -- OpenBSD OpenSSH 8.9p1\n\n\nVulnerable version of OpenSSH detected on Ubuntu Linux\n\n",
+                            "protocol": "TCP",
+                            "solution": {
+                                "fix": "\nDownload and apply the upgrade from: https://ftp.openbsd.org/pub/OpenBSD/OpenSSH\n",


I don't think we want the scars from the html_strip processor here and in proof above. Suggest adding a trim left and right \n to these fields.

Is it worth trimming the other cases of html_strip?

/cc @kcreddy

Is it worth trimming the other cases of html_strip?

Yes that would make sense 👍🏼. Thanks @efd6.

kcreddy · 2025-05-12T09:57:16Z

packages/rapid7_insightvm/data_stream/asset/agent/stream/httpjson.yml.hbs

-  - set:
-      target: url.params.comparisonTime
-      value: '[[formatDate (now (parseDuration "-{{interval}}")) "RFC3339"]]'
 response.pagination:
-  - set:
-      target: url.params.comparisonTime
-      value: '[[.last_response.url.params.Get "comparisonTime"]]'
-      fail_on_template_error: true


May I know why this part if removed?

@efd6 @jamiehynds, I would like to get your thoughts on this. Its related to #9354 which requires splitting current documents to make one doc per asset vulnerability.

What we currently have is an incremental ingestion using url.params.comparisonTime inside httpjson. With this, as per the assets API doc, we are going to get 3 arrays: new[], same[], and remediated[] . If we go with incremental ingestion, we might need to split on both new[] and remediated[] (because thats where the incremental updates are going to be), which httpjson is not capable of. httpjson cannot split on 2 arrays.

To get around this limitation, following are our options:

Switching to full sync by removing url.params.comparisonTime, this way we always have only 1 array: same[] where split works using httpjson. This is what current PR is doing. I think if we are okay with this, we should atleast increment major version because of drastic increase in volume for users.

Rewrite in CEL and continue incremental ingestion by splitting on both new[] and remediated[]. A future enhancement (for CDR work) is going to happen which joins our current asset API data onto vulnerability API data. This join is also going to be easier to implement inside CEL instead of httpjson chain calls. Because of this, I think it is better to rewrite in CEL now itself.

WDYT?

Adding more points to consider after discussion with @efd6 on the above 2 options:

Both options will lead to increase in volume, the first permanently and the second until the collection reaches where the cursor was up to.

Both options leave users with duplicates as we don't have fingerprint processor.

Both options need a bump in major version because there's going to be duplicates and volume increase.

I think CEL option (both inputs cohabiting) might be lesser evil of the two options. We could leave the httpjson input for users that still want to use it.

@jamiehynds let me know your thoughts if we can go ahead with adding new CEL input.

@kcreddy If CEL is the better long-term technical fit, I think it makes sense to rewrite in CEL now. That said, we should avoid keeping both input types side-by-side long-term, as it inevitably creates user confusion — we’ve already seen this play out with the Google Workspace integration.

As part of the major version bump, could we mark the httpjson data stream within the integration as legacy (or something equivalent) and will be removed in a future update. We also need to clearly document what’s included in the update. We can surface the breaking changes to users through the new notification mechanism and required their acknowledgement before updating.

While it's not possible to hide inputs for users not currently using the integration, maybe this could be a forcing function and could bring to the Fleet team. i.e. for any user who doesn't currently use the Rapid7 integration, they'll only get presented with the CEL input.

On the topic of increased data volume — do we have a rough estimate of the scale (2x, 10x, etc.)? Even if it’s a temporary spike with CEL, we’ll need to give users a heads-up.

And regarding duplicates — are we getting a duplicate for every vulnerability, or is it certain events or records being duplicated? Just want to be clear on what exactly users will see.

@jamiehynds When @kcreddy and I were talking, we were thinking of the model that we've used in the mimecast integration, where the HTTPJSON input data stream is labelled legacy (without reference to the input) and the CEL input data stream is labelled v2.

The main prerequisite for this PR was to analyse existing data properly for CDR work item: https://github.com/elastic/security-team/issues/12534, but it looks like we need not merge this PR, as I was able to perform the analysis based on this PR code itself (without merging). I was able to unblock myself by testing this PR change and doing the analysis.

For now, I think it makes sense to mark this PR as Draft and work on adding CDR changes to this PR itself. That way we only have 1 breaking-change i.e., all changes for split docs + rewrite in CEL (deprecate httpjson) + CDR mappings (and transform) instead of having 2 breaking-changes. It would thus make more sense now to rewrite this in CEL.
cc: @brijesh-elastic

@efd6, with mimecast both httpjson and CEL co-existing would make more sense because CEL is for v2 API while httpjson is for older v1 API and also this older v1 API is not deprecated.
In Rapid7 case, both httpjson and CEL will be relying on same API. CEL will have more enriched data (because it calls vulnerability API as well) but essentially it ingests same asset vulnerabilities. In Rapid7, I believe there could be slightly more user confusion than mimecast if we make both inputs coexist. I think we should mark httpjson as (deprecated) by clearly documenting the update and hope our Fleet team can handle presenting only CEL input to new users. WDYT?

@jamiehynds, I think it makes sense to me to add a deprecation to the title suffix such as (deprecated) for httpjson input so that users become aware when they see it. If Fleet team can base their input off of this title suffix, it will be great because we don't have a proper way to deprecate input.

Regarding the questions:

On the topic of increased data volume — do we have a rough estimate of the scale (2x, 10x, etc.)? Even if it’s a temporary spike with CEL, we’ll need to give users a heads-up.

And regarding duplicates — are we getting a duplicate for every vulnerability, or is it certain events or records being duplicated? Just want to be clear on what exactly users will see.

The estimated increase in volume is directly proportional to number vulnerabilities in user's environment. If the user's environment has N vulnerabilities in their assets, the increase is going to be ~Nx.
This is because existing code was ingesting 1 document per asset (all vulnerabilities are bundled together in this document), but after this change we are going to ingest 1 document per asset per vulnerability.

The duplication is difficult to categorise. The initial pull of CEL will ingest all existing assets along with their vulnerabilities (historical) before it reaches where the previous cursor was for httpjson. One could say that an asset information, which was earlier in 1 document, will now be in 1 (httpjson)+ N (CEL) documents . The vulnerability information, which is in 1 document, will now be in 1 (httpjson)+ 1 (CEL) documents.

If we want to avoid all the fuss, we also have another option: We could add a new data_stream called asset_vulnerabilities on top of the existing data_streams asset or vulnerabilities. There won't be any duplicates if we take this approach. The existing data_streams can still continue to work.

asset data_stream.

Existing data_stream based on httpjson.

Relies on Search Assets API

Has details on assets belonging to user's environment along with top-level in information on all vulnerabilities per asset inside each document (proof, solution, fix, etc.)

vulnerability data_stream:

Existing data_stream based on httpjson.

Relies on Search Vulnerabilities API.

Has detailed vulnerability information of all known Rapid7 vulnerabilities (cve, score, severity, etc.)

asset_vulnerabilities data_stream:

New data_stream based on CEL.

Relies on both Search Assets API and Search Vulnerabilities API to enrich asset vulnerability data.

Has details on assets belonging to user's environment along with detailed information of all vulnerabilities belonging to these assets with 1 document per asset per vulnerability (proof, solution, fix, cve, score, severity etc.)

Let me know your thoughts @jamiehynds @efd6.

I think we should mark httpjson as (deprecated) by clearly documenting the update and hope our Fleet team can handle presenting only CEL input to new users. WDYT?

This is what I was thinking.

Leaving the asset_vulnerabilities query to @jamiehynds, but it does look appealing.

Leaving the httpjson input version of the asset_data_stream makes sense to avoid significant breaking changes for existing users. It's far from ideal, but the only solution we have right now is to mark it as deprecated. This situation is going to come up again I'm sure and reiterates the need for seamless method to move from httpjson to CEL and the ability to hide deprecated data streams for new users of an integration.

Thanks for the input @jamiehynds. Just to confirm it, are we going with:

Create new data_stream asset_vulnerabilities in CEL and then also deprecate asset data_stream (based on httpjson) completely?
OR

Use existing asset data_stream, add CEL input, and deprecate httpjson input?

Option 1 will avoid significant breaking changes, because its a new data_stream.

Just got clarification from @jamiehynds below:

Going with the new data stream seems like the best path, to avoid major breaking changes and hopefully new users to the integration will use that integration when they see the old one is deprecated.

@brijesh-elastic, please go with option 1 from above comment, i.e., creating new data_stream asset_vulnerability and deprecate existing asset data_stream.
I think we should close this PR and start on new one because the changes here are no longer valid.
Please create 1 PR adding this new data_stream (incrementing major version) that closes: #9354, #13775, and #13776

kcreddy · 2025-05-12T10:00:51Z

packages/rapid7_insightvm/data_stream/asset/sample_event.json

        "dataset": "rapid7_insightvm.asset",
-        "ingested": "2023-05-23T16:17:08Z",
+        "ingested": "2025-05-12T09:23:16Z",
        "kind": "state",
        "original": "{\"assessed_for_policies\":false,\"assessed_for_vulnerabilities\":true,\"critical_vulnerabilities\":0,\"exploits\":0,\"id\":\"452534235-25a7-40a3-9321-28ce0b5cc90e-default-asset-199\",\"ip\":\"10.1.0.128\",\"last_assessed_for_vulnerabilities\":\"2020-03-20T19:19:42.611Z\",\"last_scan_end\":\"2020-03-20T19:19:42.611Z\",\"last_scan_start\":\"2020-03-20T19:18:13.611Z\",\"malware_kits\":0,\"moderate_vulnerabilities\":2,\"new\":[],\"os_architecture\":\"x86_64\",\"os_description\":\"CentOS Linux 2.6.18\",\"os_family\":\"Linux\",\"os_name\":\"Linux\",\"os_system_name\":\"CentOS Linux\",\"os_type\":\"General\",\"os_vendor\":\"CentOS\",\"os_version\":\"2.6.18\",\"remediated\":[],\"risk_score\":0,\"severe_vulnerabilities\":0,\"tags\":[{\"name\":\"lab\",\"type\":\"SITE\"}],\"total_vulnerabilities\":2}",


Can you update config.yml so that one of the documents containing same is populated instead of this one (doesn't have same)

kcreddy

@brijesh-elastic don't close #9354 as the later part of the issue description indicates a requirement for preserving only 1 asset vulnerability inside the index. I think this can only be truly closed after implementing latest transform which stores latest state of each asset vulnerability inside destination index.

elasticmachine · 2025-05-13T12:42:35Z

💚 Build Succeeded

Buildkite Build
Commit: fa357a4

History

💚 Build #25854 succeeded 46f0cac
💚 Build #25836 succeeded 57e4f49

cc @brijesh-elastic

elastic-sonarqube · 2025-05-13T12:42:41Z

Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
98.9% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube

expand documents to map each vulnerability per asset

57e4f49

brijesh-elastic self-assigned this May 12, 2025

brijesh-elastic requested a review from a team as a code owner May 12, 2025 03:47

efd6 reviewed May 12, 2025

View reviewed changes

Resovle comment

46f0cac

brijesh-elastic requested a review from efd6 May 12, 2025 09:29

kcreddy reviewed May 12, 2025

View reviewed changes

kcreddy reviewed May 13, 2025

View reviewed changes

brijesh-elastic and others added 2 commits May 13, 2025 17:50

Resolve comment

fbe6ca4

Merge branch 'main' into rapid7_insightvm-1.17.0

fa357a4

brijesh-elastic marked this pull request as draft May 19, 2025 08:45

	link: https://github.com/elastic/integrations/pull/1
	link: https://github.com/elastic/integrations/pull/13878

[rapid7_insightvm] expand documents to map each vulnerability per asset #13878

Are you sure you want to change the base?

[rapid7_insightvm] expand documents to map each vulnerability per asset #13878

Conversation

brijesh-elastic commented May 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Proposed commit message

Checklist

How to test this PR locally

Related issues

Uh oh!

elasticmachine commented May 12, 2025

Uh oh!

elastic-vault-github-plugin-prod bot commented May 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🚀 Benchmarks report

Package rapid7_insightvm 👍(0) 💚(0) 💔(2)

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kcreddy May 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kcreddy left a comment

Choose a reason for hiding this comment

Uh oh!

elasticmachine commented May 13, 2025

💚 Build Succeeded

History

Uh oh!

elastic-sonarqube bot commented May 13, 2025

Quality Gate passed

Uh oh!

Uh oh!

brijesh-elastic commented May 12, 2025 •

edited

Loading

elastic-vault-github-plugin-prod bot commented May 12, 2025 •

edited

Loading

Package `rapid7_insightvm` 👍(0) 💚(0) 💔(2)

kcreddy May 13, 2025 •

edited

Loading