Returning List of Applications #1026

mtobias-getty · 2023-08-22T17:45:34Z

mtobias-getty
Aug 22, 2023

Hello! I'm looking to pull a list of application details via the API, but I'm running into some issues. My initial thought was to use the query_applications endpoint in FalconPy to pull all of the application IDs, dedupe that list (since there will be duplicates), and then pass the deduped IDs to the get_applications endpoint in FalconPy.

I'm familiar with pagination in the CrowdStrike APIs, but since applications are returned (seemingly) in a deviceId_applicationId format, I'm having to pull a LOT of data, in the low millions of records. Deduping should make that a much more manageable list to run against the get_applications endpoint, but even with pagination I quickly run into the 10K total record return limit. Should I be going about pagination a different way? I'm not opposed to cutting down the data with an FQL filter, but most filters I've tried, such as last_seen_within, don't apply to the query_applications endpoint.

Should I be thinking about this in a different way? Happy to provide any specifics that might be more helpful in understanding what I'm doing.

Specific error message the API is returning:

[400] offset 10000 and limit 100 are invalid; offset + limit must be less than or equal to 10000

Based on some code examples I've found in this repo, this seems to be expected: I'm correctly paginating out to 10K, but that appears to be the hard limit.
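The 400 above follows from the window arithmetic: with a limit of 100, the last request the API accepts is offset 9900. Here is a minimal sketch of offset pagination that stops at that ceiling instead of erroring, and reports whether results were cut off. `query_page` is a hypothetical callable standing in for something like `Discover.query_applications`, and the response layout (`body.resources`, `body.meta.pagination.total`) is an assumption based on typical FalconPy responses.

```python
from typing import Callable, Dict, List, Tuple

MAX_WINDOW = 10_000  # ceiling taken from the error message: offset + limit <= 10000


def fetch_ids(query_page: Callable[..., Dict], fql: str = "", limit: int = 100) -> Tuple[List[str], bool]:
    """Page through results by offset; return (ids, truncated).

    query_page is a hypothetical stand-in for a FalconPy call such as
    Discover.query_applications(filter=..., limit=..., offset=...).
    """
    ids: List[str] = []
    offset = 0
    total = None
    extra = {"filter": fql} if fql else {}  # only send a filter when one is given
    while offset + limit <= MAX_WINDOW:
        body = query_page(limit=limit, offset=offset, **extra).get("body", {})
        page = body.get("resources") or []
        ids.extend(page)
        # Assumed location of the overall match count in the response.
        total = body.get("meta", {}).get("pagination", {}).get("total", total)
        offset += limit
        if len(page) < limit:
            break  # short page: nothing left to read
    truncated = total is not None and total > len(ids)
    return ids, truncated
```

If `truncated` comes back true, the filter matched more than the API will page through, so the query needs to be narrowed rather than paged further.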

jshcodes · 2023-08-23T21:09:38Z

jshcodes
Aug 23, 2023
Maintainer

Hi @mtobias-getty -

I checked the Spyglass sample we have posted, and it approaches this problem in a similar fashion. (The example does support filtering, but the default behavior is to pull everything available which will hit the limit in large environments.)

Digging into the display_applications method, this example dedupes right before the sort.

Thinking through this, it may make more sense to move this logic to right before enrichment happens so we potentially reduce API calls for extended detail.

The query result maximum is a hard limit in this specific API, so filtering may be our only option if we're going the API route at that scale.

This points out a gap in our documentation on falconpy.io regarding these filters. We'll get this updated. In the interim, here are some of them:
- id
- cid
- name
- vendor
- version
- name_vendor
- name_vendor_version
- first_seen_timestamp
- last_seen_timestamp
- last_updated_timestamp

You can also consider putting together a scheduled report and then pulling the report results from the API.
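As a rough illustration of how the filter fields listed above might be combined, here is a small sketch that builds FQL strings. The helper names are hypothetical, and the value formats (single-quoted strings, ISO 8601 timestamps) are assumptions to verify against the API. The one FQL rule relied on is that `+` joins clauses with a logical AND.

```python
def fql_eq(field: str, value: str) -> str:
    """Exact-match clause, e.g. vendor:'Microsoft' (quoting style is assumed)."""
    return f"{field}:'{value}'"


def fql_since(field: str, iso_timestamp: str) -> str:
    """Clause matching records at or after the given timestamp."""
    return f"{field}:>='{iso_timestamp}'"


def fql_and(*clauses: str) -> str:
    """Join non-empty clauses with '+', which FQL treats as logical AND."""
    return "+".join(clause for clause in clauses if clause)


example = fql_and(
    fql_eq("vendor", "Microsoft"),
    fql_since("last_seen_timestamp", "2023-08-20T00:00:00Z"),
)
```

The resulting string would be passed as the `filter` argument to query_applications. Values containing a single quote are not escaped in this sketch.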

2 replies

mtobias-getty Aug 23, 2023
Author

Really appreciate the feedback. I'll try to take a look at some of those options tomorrow and will reply back here as soon as I can with the direction I went (for help in the future for anyone else running into this).

mtobias-getty Aug 24, 2023
Author

Taking a look at this today. Would I be correct in thinking that the dedupe before the sort you mentioned happens in API code, or is it something I can do myself? I may be missing where in that example I could do that. It would be ideal, since I'm mostly looking to understand all of the applications in my environment; I can get details such as which applications are on which hosts in other ways.

Looking at the scheduled report route, it appears I can only add 50 applications to a scheduled report at a time, so I would need to create multiple scheduled reports, all tied to static application groups. That's a little less dynamic than I would hope.

I'm poking around at some of the filters, and I think there is some hope in filtering on date to find recently used applications and then looking at one application at a time. The struggle is that the API for getting the list of applications in my environment is this one. I'm not familiar with another API that returns just a list of the applications I have, which I could then use as a filter for the query_applications API before passing those IDs to the get_applications endpoint.

I feel like you might be giving me gold with the thoughts around dedupes but I'm not picking up on the specifics! Again, really appreciate your time helping me out!

jshcodes · 2023-08-24T23:12:26Z

jshcodes
Aug 24, 2023
Maintainer

> Taking a look at this today. Would I be correct in thinking that the dedupe before the sort you mentioned happens in API code, or is it something I can do myself? I may be missing where in that example I could do that. It would be ideal, since I'm mostly looking to understand all of the applications in my environment; I can get details such as which applications are on which hosts in other ways.

Nothing super elegant I'm afraid. As we prepare the terminal display output in the display_applications method, we loop through the list applications_list. Each iteration, another list called app_list is populated with a dictionary containing application details, but only if there isn't a dictionary like the current iteration in the list. This newly created list is then sent to be sorted and used for the tabular output display. This happens after the entire list is enriched, so there may be an optimization for this example to try and dedupe just the IDs before sending the list to get_applications to get the extended application detail.
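A minimal sketch of the optimization described above: dedupe the IDs first, then request extended detail in batches, so each unique ID is enriched only once. `get_details` is a hypothetical callable standing in for something like `Discover.get_applications`, the batch size of 100 is taken from elsewhere in this thread, and the `body.resources` response layout is an assumption.

```python
from typing import Callable, Dict, Iterable, List


def enrich_unique(app_ids: Iterable[str],
                  get_details: Callable[..., Dict],
                  batch_size: int = 100) -> List[Dict]:
    """Dedupe IDs, then fetch details batch by batch."""
    # dict.fromkeys drops repeats while keeping first-seen order.
    unique_ids = list(dict.fromkeys(app_ids))
    details: List[Dict] = []
    for start in range(0, len(unique_ids), batch_size):
        batch = unique_ids[start:start + batch_size]
        response = get_details(ids=batch)  # one API call per batch
        details.extend(response.get("body", {}).get("resources") or [])
    return details
```

Deduping before the call rather than after means the number of detail requests scales with the number of unique IDs, not with the raw length of the list.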

> I'm poking around at some of the filters, and I think there is some hope in filtering on date to find recently used applications and then looking at one application at a time. The struggle is that the API for getting the list of applications in my environment is this one. I'm not familiar with another API that returns just a list of the applications I have, which I could then use as a filter for the query_applications API before passing those IDs to the get_applications endpoint.

Depending on the size of your environment, this may take some experimentation. There should be a combination of filters that can help us get this dataset size down. (Don't forget about FQL complex expressions.)
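One way to experiment with this is to split a time range until every slice fits under the 10,000-record ceiling, then query each slice separately. The sketch below assumes a hypothetical `count_matches(start, end)` callable that returns how many records fall in a window (for example, by running a filtered query with a small limit and reading the reported total). The timestamp format and the use of `last_seen_timestamp` are also assumptions.

```python
from datetime import datetime, timedelta
from typing import Callable, List, Tuple

MAX_WINDOW = 10_000  # query ceiling reported earlier in this thread
FMT = "%Y-%m-%dT%H:%M:%SZ"
Window = Tuple[datetime, datetime]


def window_filter(start: datetime, end: datetime,
                  field: str = "last_seen_timestamp") -> str:
    """FQL for the half-open range [start, end); '+' is FQL's logical AND."""
    return f"{field}:>='{start.strftime(FMT)}'+{field}:<'{end.strftime(FMT)}'"


def split_windows(start: datetime, end: datetime,
                  count_matches: Callable[[datetime, datetime], int]) -> List[Window]:
    """Bisect [start, end) until each window holds at most MAX_WINDOW records."""
    total = count_matches(start, end)
    # Stop splitting at one second; such a window needs an additional filter.
    if total <= MAX_WINDOW or end - start <= timedelta(seconds=1):
        return [(start, end)]
    mid = (start + (end - start) / 2).replace(microsecond=0)
    return (split_windows(start, mid, count_matches)
            + split_windows(mid, end, count_matches))
```

Each returned window can be turned into a filter with `window_filter` and paged normally. Because the windows are half-open, a record that sits exactly on a boundary lands in only one of them.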

> Again, really appreciate your time helping me out!

Happy to assist! 😃

1 reply

mtobias-getty Aug 25, 2023
Author

Ah! I think I understand what you are saying about the dedupe before going to get_applications. I was definitely expecting to do that; my struggle, of course, is getting the full list from the query_applications API!

For what it's worth, I currently pull Spotlight data out of the API. Much like with applications, the IDs the API returns are a combination of the AID and the vulnerability ID. I've found great success using the Pandas Python library to move that data around, including the dedupe.

import pandas as pd

def dedup_cve_list():
    # Keep the first row seen for each unique 'id' and drop the rest.
    print("De-Duplicating CSVs...\n")
    df = pd.read_csv('spotlight_cve_duplicated.csv')
    df = df.drop_duplicates(subset='id')
    df.to_csv('spotlight_cve.csv', index=False)

mtobias-getty · 2023-08-29T15:50:51Z

mtobias-getty
Aug 29, 2023
Author

Alright! I have my solution.

This was complicated to get my head around initially, but I've done something similar for pulling Spotlight data. I think the CrowdStrike APIs excel when you have specific data you are looking for; they are a bit more restrictive when you want to pull broad sets of information that you don't know in advance. (Per the earlier suggestions, if I knew the specific applications I wanted to report on, creating a report in CrowdStrike and pulling that data would be a much better solution, but I didn't want to assume any knowledge of what was in my environment.)

One of the BEST things about the CrowdStrike API is that the rate limit is very forgiving. I have ~6,000 calls to play with. Throughout all of this, which results in about 500K total application IDs (since application IDs are per application per host), I never dipped below ~5,500 available API calls at any time because they replenish so quickly.
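For anyone who wants to watch that budget while a long run is in progress, here is a small sketch that reads the remaining-call count from a response and pauses when it gets low. It assumes the response dictionary carries a `headers` entry with an `X-Ratelimit-Remaining` value; the header name, the floor of 500, and the pause length are all assumptions to adjust.

```python
import time
from typing import Callable, Dict, Optional


def remaining_calls(response: Dict) -> Optional[int]:
    """Return the remaining API call budget, or None if the header is absent."""
    # Header names are compared case-insensitively since casing may vary.
    headers = {str(k).lower(): v for k, v in (response.get("headers") or {}).items()}
    value = headers.get("x-ratelimit-remaining")  # assumed header name
    return int(value) if value is not None else None


def pause_if_low(response: Dict, floor: int = 500, pause_seconds: float = 5.0,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Sleep briefly when the remaining budget drops below floor."""
    remaining = remaining_calls(response)
    if remaining is not None and remaining < floor:
        sleep(pause_seconds)
        return True
    return False
```

Calling `pause_if_low(response)` after each request keeps the loop from exhausting the budget, although with the replenishment rate described above it may rarely trigger.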

I'm positive there is optimization I can add to my process as well. I'm approaching this from a very conservative list-size perspective and doing far more file write operations than I likely need to, which drastically slows down the run. In my testing I can get a full listing of every application installed on every host in about 30 minutes, which is really not bad for a repository of data I'll probably pull once per day. Thank you @jshcodes, as always, for your very quick and very helpful feedback! We can consider this question closed from my perspective.

Sort of accurate breakdown of how I'm approaching this problem

Query hosts via filter
- Apply a filter for systems seen in the last 3 days and loop by platform, collecting data from Windows, then Mac, then Linux
- Loop 100 hosts at a time and pull back their metadata, specifically looking for their IDs
- query_hosts Discovery API
Use host metadata ID to pull AID list
- get_hosts Discovery API
Use host AID list to individually pull application IDs, 100 at a time
- Saves application IDs to a temp file (~50MB)
- query_applications Discovery API
Use application IDs to pull application metadata, 100 lines at a time
- Read application IDs, 100 at a time, and use the get_applications API to return application metadata

And a terrible Mermaid doc (I think) illustrating my flow

graph TD
    A[Loop list of CIDs] --> B[Auth to Falcon API]
    B --> C[Loop Platform List Windows,Mac,Linux]
    C --> D[Query Hosts API with recently seen filter and Platform, return Host IDs]
    D --> E[Host IDs to get host AID via get_hosts Discovery API]
    E --> F[AIDs to get List of Application IDs via query_applications Discovery API]
    F --> G[Write App IDs to CSV]
    G --> A
    
    H[Read Application IDs from App ID file]
    H --> I[Use App IDs to query get_applications Discovery API to get app metadata]

0 replies
