abnormal_security.ai_security_mailbox: items in scanning state are not properly updated

The integration's data source may include items that have not yet been classified; these are returned with a `judgementStatus` of "Scanning".

The current algorithm for collecting documents from the API is (glossing heavily over the details):
```
start ← currently stored last cursor value or value calculated from user configuration if there is no last cursor
end ← now
while we can paginate
    get data from endpoint between start and end
    publish data
cursor ← end
```

Items in the "Scanning" state advance the cursor even though we gain no useful information from them, so the integration never revisits an item to collect it once its `judgementStatus` _has_ resolved to a useful value. We could pin the cursor timestamp to just before the earliest item with `judgementStatus`: "Scanning" so that we always retry collecting these documents (falling back to the current time, as we do now, if there are none), but this would re-ingest all documents after that time, including documents that were already in a resolved state. We could instead add a look-back time, which would improve the situation probabilistically; we could get arbitrarily close to 100%, _also_ at the cost of potential re-ingestion.
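To make the trade-off concrete, here is a minimal Python sketch of the current algorithm with an optional look-back. It is an illustration only, not the integration's code: `fetch_window`, the `ts` field, the in-memory `store` and the "Resolved" status value are all hypothetical, and it assumes an item keeps its original timestamp when its judgement resolves.

```python
def collect_with_lookback(cursor_end, now, fetch_window, publish, lookback=0.0):
    """One collection cycle; lookback=0 matches the current algorithm."""
    start = cursor_end - lookback
    for item in fetch_window(start, now):
        publish(item)
    return now  # the cursor always advances to the end of the window


def demo(lookback):
    """Run two cycles against a hypothetical in-memory API."""
    store = {
        "a": {"id": "a", "ts": 1, "judgementStatus": "Scanning"},
        "b": {"id": "b", "ts": 2, "judgementStatus": "Resolved"},  # placeholder status
    }

    def fetch_window(start, end):
        # Hypothetical endpoint: items with start < ts <= end.
        return [dict(i) for i in store.values() if start < i["ts"] <= end]

    published = []
    cursor = collect_with_lookback(0, 5, fetch_window, published.append, lookback)
    # The upstream classifier finishes with item "a" between cycles.
    store["a"]["judgementStatus"] = "Resolved"
    collect_with_lookback(cursor, 10, fetch_window, published.append, lookback)
    return [(p["id"], p["judgementStatus"]) for p in published]
```

With `demo(0)` item "a" is only ever published in the "Scanning" state. With `demo(5)` the resolved version of "a" is collected, but "b" is published a second time, which is the re-ingestion cost described above.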

Instead of these approaches, I propose that we change the algorithm to maintain a list of work items that are in the scanning state:
```
start ← currently stored last cursor value or value calculated from user configuration if there is no last cursor
end ← now
scanning list ← [] if not present in cursor, otherwise cursor scanning list
while we can paginate including items in scanning list
    get data from endpoint between start and end
    if data is in scanning state
        add to scanning list
    else
        publish data
cursor ← {end, scanning list}
```
This has the potential for unbounded growth of the scanning list in pathological cases (i.e. where Abnormal Security persistently fails to move items out of scanning), so we may want a check to guard against this, publishing an error if the list exceeds some high-water mark. This would require dumping all events in the scanning list and emitting an error to ingest, but the situation could be improved to dump only the oldest (or newest) items on the scanning list once the two-parameter `tail` and `front` list functions become available.
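The proposal, including the high-water check, could be sketched in Python roughly as follows. This is a sketch under assumptions rather than the integration's implementation: `fetch_window`, `fetch_by_id`, `publish`, `report_error`, the `id` field and the `HIGH_WATER` value are hypothetical names, and "dumping" is interpreted here as discarding the tracked IDs.

```python
from dataclasses import dataclass, field

SCANNING = "Scanning"
HIGH_WATER = 1000  # hypothetical bound on the scanning list


@dataclass
class Cursor:
    end: float
    scanning: list = field(default_factory=list)  # IDs still being scanned


def collect(cursor, now, fetch_window, fetch_by_id, publish, report_error,
            high_water=HIGH_WATER):
    """One collection cycle that defers items still in the scanning state."""
    still_scanning = []

    def handle(item):
        if item["judgementStatus"] == SCANNING:
            still_scanning.append(item["id"])  # hold back until resolved
        else:
            publish(item)

    # Revisit items that were deferred by earlier cycles.
    for item_id in cursor.scanning:
        handle(fetch_by_id(item_id))
    # Collect the new window exactly as the current algorithm does.
    for item in fetch_window(cursor.end, now):
        handle(item)

    if len(still_scanning) > high_water:
        # Pathological case: give up on everything we were tracking.
        report_error(f"scanning list exceeded {high_water} items; dropping {len(still_scanning)}")
        still_scanning = []

    return Cursor(end=now, scanning=still_scanning)
```

Only resolved items are published, so nothing is re-ingested; the cost is one extra lookup per deferred item per cycle and the additional state carried in the cursor.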


abnormal_security.ai_security_mailbox: items in scanning state are not properly updated #12932
