-
Notifications
You must be signed in to change notification settings - Fork 455
[rapid7_insightvm] expand documents to map each vulnerability per asset #13878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
brijesh-elastic
wants to merge
4
commits into
elastic:main
Choose a base branch
from
brijesh-elastic:rapid7_insightvm-1.17.0
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+770
−209
Draft
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
386 changes: 382 additions & 4 deletions
386
packages/rapid7_insightvm/_dev/deploy/docker/files/config.yml
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
1 change: 1 addition & 0 deletions
1
packages/rapid7_insightvm/data_stream/asset/_dev/test/pipeline/test-asset.log
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,4 @@ | ||
{"assessed_for_policies":false,"assessed_for_vulnerabilities":true,"critical_vulnerabilities":0,"exploits":0,"id":"452534235-25a7-40a3-9321-28ce0b5cc90e-default-asset-199","ip":"10.1.0.128","last_assessed_for_vulnerabilities":"2020-03-20T19:19:42.611Z","last_scan_end":"2020-03-20T19:19:42.611Z","last_scan_start":"2020-03-20T19:18:13.611Z","malware_kits":0,"moderate_vulnerabilities":2,"os_architecture":"x86_64","os_description":"CentOS Linux 2.6.18","os_family":"Linux","os_name":"Linux","os_system_name":"CentOS Linux","os_type":"General","os_vendor":"CentOS","os_version":"2.6.18","risk_score":0,"severe_vulnerabilities":0,"tags":[{"name":"lab","type":"SITE"}],"total_vulnerabilities":2,"new":[],"remediated":[]} | ||
{"assessed_for_policies":false,"assessed_for_vulnerabilities":true,"critical_vulnerabilities":1,"exploits":9,"host_name":"HOST.domain.com","id":"452534235-25a7-40a3-9321-28ce0b5cc90e-default-asset-198","ip":"10.4.24.164","last_scan_end":"2020-03-20T19:12:39.766Z","last_scan_start":"2020-03-20T19:05:06.766Z","malware_kits":0,"moderate_vulnerabilities":11,"os_architecture":"","os_description":"Ubuntu Linux 12.04","os_family":"Linux","os_name":"Linux","os_system_name":"Ubuntu Linux","os_type":"","os_vendor":"Ubuntu","os_version":"12.04","risk_score":12251.76171875,"severe_vulnerabilities":16,"tags":[{"name":"all_assets2","type":"CUSTOM"},{"name":"all_assets","type":"CUSTOM"},{"name":"Linux","type":"CUSTOM"},{"name":"docker hosts","type":"SITE"},{"name":"lab","type":"SITE"}],"total_vulnerabilities":28,"new":[],"remediated":[],"unique_identifiers":{"id":"4421d73dfe04f594df731e6bcd8156a","source":"R7 Agent"}} | ||
{"data":[],"metadata":{"number":0,"size":0,"totalResources":2195,"totalPages":2195,"cursor":null},"links":[{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=0&size=2","rel":"first"},{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=0&size=2","rel":"self"},{"href":"https://us.api.insight.rapid7.com:443/vm/v4/integration/assets?page=1097&size=2","rel":"last"}]} | ||
{"assessed_for_policies":false,"assessed_for_vulnerabilities":true,"credential_assessments":[{"port":22,"protocol":"TCP","status":"NO_CREDS_SUPPLIED"}],"critical_vulnerabilities":1,"exploits":1,"id":"8bcfe121-1234-5678-9012-c4a6abcdabcde-default-asset-4","ip":"175.16.199.1","last_assessed_for_vulnerabilities":"2025-05-08T06:51:31.736Z","last_scan_end":"2025-05-08T06:51:31.736Z","last_scan_start":"2025-05-08T06:51:19.193Z","mac":"00:00:5E:00:53:00","malware_kits":0,"moderate_vulnerabilities":0,"os_architecture":"","os_description":"Ubuntu Linux","os_family":"Linux","os_name":"Linux","os_system_name":"Ubuntu Linux","os_type":"","os_vendor":"Ubuntu","risk_score":1268,"severe_vulnerabilities":1,"tags":[{"name":"test","type":"SITE"}],"total_vulnerabilities":2,"unique_identifiers":[],"new":[],"remediated":[],"same":{"check_id":null,"first_found":"2025-05-08T06:51:31Z","key":"","last_found":"2025-05-08T06:51:31.736Z","nic":null,"port":22,"proof":"<p><ul><li>Running SSH service</li><li>Product OpenSSH exists -- OpenBSD OpenSSH 8.9p1</li><li>Vulnerable version of product OpenSSH found -- OpenBSD OpenSSH 8.9p1</li></ul><p>Vulnerable version of OpenSSH detected on Ubuntu Linux</p></p>","protocol":"TCP","solution_fix":"<p>Download and apply the upgrade from: <a href=\"https://ftp.openbsd.org/pub/OpenBSD/OpenSSH\">https://ftp.openbsd.org/pub/OpenBSD/OpenSSH</a></p>","solution_id":"openbsd-openssh-upgrade-latest","solution_summary":"Upgrade to the latest version of OpenSSH","solution_type":"rollup","status":"VULNERABLE_VERS","vulnerability_id":"openbsd-openssh-cve-2024-6387"}} |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I know why this part if removed?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@efd6 @jamiehynds, I would like to get your thoughts on this. Its related to #9354 which requires splitting current documents to make one doc per asset vulnerability.
What we currently have is an incremental ingestion using
url.params.comparisonTime
inside httpjson. With this, as per the assets API doc, we are going to get 3 arrays:new[]
,same[]
, andremediated[]
. If we go with incremental ingestion, we might need to split on bothnew[]
andremediated[]
(because thats where the incremental updates are going to be), which httpjson is not capable of. httpjson cannot split on 2 arrays.To get around this limitation, following are our options:
url.params.comparisonTime
, this way we always have only 1 array:same[]
where split works using httpjson. This is what current PR is doing. I think if we are okay with this, we should atleast increment major version because of drastic increase in volume for users.new[]
andremediated[]
. A future enhancement (for CDR work) is going to happen which joins our current asset API data onto vulnerability API data. This join is also going to be easier to implement inside CEL instead of httpjson chain calls. Because of this, I think it is better to rewrite in CEL now itself.WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding more points to consider after discussion with @efd6 on the above 2 options:
I think CEL option (both inputs cohabiting) might be lesser evil of the two options. We could leave the httpjson input for users that still want to use it.
@jamiehynds let me know your thoughts if we can go ahead with adding new CEL input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kcreddy If CEL is the better long-term technical fit, I think it makes sense to rewrite in CEL now. That said, we should avoid keeping both input types side-by-side long-term, as it inevitably creates user confusion — we’ve already seen this play out with the Google Workspace integration.
As part of the major version bump, could we mark the httpjson data stream within the integration as legacy (or something equivalent) and will be removed in a future update. We also need to clearly document what’s included in the update. We can surface the breaking changes to users through the new notification mechanism and required their acknowledgement before updating.
While it's not possible to hide inputs for users not currently using the integration, maybe this could be a forcing function and could bring to the Fleet team. i.e. for any user who doesn't currently use the Rapid7 integration, they'll only get presented with the CEL input.
On the topic of increased data volume — do we have a rough estimate of the scale (2x, 10x, etc.)? Even if it’s a temporary spike with CEL, we’ll need to give users a heads-up.
And regarding duplicates — are we getting a duplicate for every vulnerability, or is it certain events or records being duplicated? Just want to be clear on what exactly users will see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamiehynds When @kcreddy and I were talking, we were thinking of the model that we've used in the mimecast integration, where the HTTPJSON input data stream is labelled legacy (without reference to the input) and the CEL input data stream is labelled v2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main prerequisite for this PR was to analyse existing data properly for CDR work item: https://github.com/elastic/security-team/issues/12534, but it looks like we need not merge this PR, as I was able to perform the analysis based on this PR code itself (without merging). I was able to unblock myself by testing this PR change and doing the analysis.
For now, I think it makes sense to mark this PR as
Draft
and work on adding CDR changes to this PR itself. That way we only have 1breaking-change
i.e., all changes for split docs + rewrite in CEL (deprecate httpjson) + CDR mappings (and transform) instead of having 2breaking-changes
. It would thus make more sense now to rewrite this in CEL.cc: @brijesh-elastic
@efd6, with mimecast both httpjson and CEL co-existing would make more sense because CEL is for v2 API while httpjson is for older v1 API and also this older v1 API is not deprecated.
In Rapid7 case, both httpjson and CEL will be relying on same API. CEL will have more enriched data (because it calls vulnerability API as well) but essentially it ingests same asset vulnerabilities. In Rapid7, I believe there could be slightly more user confusion than mimecast if we make both inputs coexist. I think we should mark httpjson as
(deprecated)
by clearly documenting the update and hope our Fleet team can handle presenting only CEL input to new users. WDYT?@jamiehynds, I think it makes sense to me to add a deprecation to the title suffix such as
(deprecated)
for httpjson input so that users become aware when they see it. If Fleet team can base their input off of this title suffix, it will be great because we don't have a proper way to deprecate input.Regarding the questions:
The estimated increase in volume is directly proportional to number vulnerabilities in user's environment. If the user's environment has
N
vulnerabilities in their assets, the increase is going to be~Nx
.This is because existing code was ingesting 1 document per asset (all vulnerabilities are bundled together in this document), but after this change we are going to ingest 1 document per asset per vulnerability.
The duplication is difficult to categorise. The initial pull of CEL will ingest all existing assets along with their vulnerabilities (historical) before it reaches where the previous cursor was for httpjson. One could say that an asset information, which was earlier in 1 document, will now be in
1 (httpjson)+ N (CEL)
documents . The vulnerability information, which is in 1 document, will now be in1 (httpjson)+ 1 (CEL)
documents.If we want to avoid all the fuss, we also have another option: We could add a new data_stream called
asset_vulnerabilities
on top of the existing data_streamsasset
orvulnerabilities
. There won't be any duplicates if we take this approach. The existing data_streams can still continue to work.asset
data_stream.vulnerability
data_stream:asset_vulnerabilities
data_stream:Let me know your thoughts @jamiehynds @efd6.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I was thinking.
Leaving the
asset_vulnerabilities
query to @jamiehynds, but it does look appealing.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving the httpjson input version of the
asset_data_stream
makes sense to avoid significant breaking changes for existing users. It's far from ideal, but the only solution we have right now is to mark it as deprecated. This situation is going to come up again I'm sure and reiterates the need for seamless method to move from httpjson to CEL and the ability to hide deprecated data streams for new users of an integration.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the input @jamiehynds. Just to confirm it, are we going with:
asset_vulnerabilities
in CEL and then also deprecateasset
data_stream (based on httpjson) completely?OR
asset
data_stream, add CEL input, and deprecatehttpjson
input?Option 1 will avoid significant breaking changes, because its a new data_stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just got clarification from @jamiehynds below:
@brijesh-elastic, please go with option 1 from above comment, i.e., creating new data_stream
asset_vulnerability
and deprecate existingasset
data_stream.I think we should close this PR and start on new one because the changes here are no longer valid.
Please create 1 PR adding this new data_stream (incrementing major version) that closes: #9354, #13775, and #13776