RFC 11532 - 2022-05-17 - Google Chronicle Sink

This RFC proposes two new sinks that send data into Google Chronicle. Chronicle is a Security Information and Event Management (SIEM) platform to store, aggregate and search security telemetry.

Context

Cross cutting concerns

  • The schema support required for the UDM events relies on this PR being merged.

Scope

In scope

  • Two sinks that send data into Google Chronicle: one for UDM events and one for unstructured log events.

Out of scope

  • Adding any explicit knowledge of SIEM events as a separate event type into Vector will not be considered.

Pain

Proposal

User Experience

This adds two new sinks to Vector to send data into Google Chronicle.

UDM (Unified Data Model) events

The gcp_chronicle_udm sink.

https://cloud.google.com/chronicle/docs/reference/ingestion-api#udmevents

https://cloud.google.com/chronicle/docs/unified-data-model/udm-usage

UDM events are events that conform to a strict schema. An example event:

{
  "metadata": {
    "event_timestamp": "2019-10-23T04:00:00.000Z",
    "event_type": "NETWORK_HTTP",
    "product_name": "Acme Proxy",
    "vendor_name": "Acme"
  },
  "network": {
    "http": {
      "method": "GET",
      "response_code": 200
    }
  },
  "principal": {
    "hostname": "host0",
    "ip": "10.1.2.3",
    "port": 60000
  },
  "target": {
    "hostname": "www.altostrat.com",
    "ip": "198.51.100.68",
    "port": 443,
    "url": "www.altostrat.com/images/logo.png"
  }
}

Events can be batched and sent in a JSON object that also includes the customer_id:

{
  "customer_id": "c8c65bfa-5f2c-42d4-9189-64bb7b939f2c",
  "events": [...]
}

UDM events can be submitted as either JSON or Proto3. We will submit using JSON, as this fits more easily with Vector's data model. The newly developed schema support for Vector will be used to ensure the JSON is formed according to the Chronicle requirements.

Vector has no internal knowledge of SIEM events, so it will be up to the user to ensure the event is well-formed: either the source will receive the data in the correct structure, or it will be transformed into that structure before reaching the sink.
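As an illustration of the batch envelope described above, the following is a minimal sketch in Python (not the sink's actual implementation, which would live in Vector itself). The function name `build_udm_batch` is hypothetical; only the `customer_id` and `events` keys come from the examples in this RFC.

```python
import json


def build_udm_batch(customer_id, events):
    """Wrap UDM event dicts in the batch envelope shown above.

    Hypothetical helper for illustration only. The events are assumed to
    already conform to the UDM schema; no validation happens here.
    """
    return json.dumps({"customer_id": customer_id, "events": list(events)})
```

The events are passed through untouched, which mirrors the point above that it is up to the user to supply well-formed UDM events.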

Unstructured log data events

The gcp_chronicle_unstructured sink.

Unstructured log events are events in plain text with no strict schema attached.

{
  "log_text": "26-Feb-2019 13:35:02.187 client 10.120.20.32#4238: query: altostrat.com IN A + (203.0.113.102)",
  "ts_epoch_microseconds": 1551188102187000
}

Events are batched and sent in a JSON object that includes the customer_id and a log_type:

{
  "customer_id": "c8c65bfa-5f2c-42d4-9189-64bb7b939f2c",
  "log_type": "BIND_DNS",
  "entries": [...]
}

log_type has to be one of the values returned by the logtypes endpoint.

For unstructured events, an additional configuration field, log_type, must be set. This field will be templatable. The field can be removed from the final event via encoding.exclude_fields.

The encoding of the message (text or json) can be set via an encoding field.
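The entry and batch shapes above could be produced along these lines. This is a sketch under stated assumptions: the helper names are hypothetical, and the timestamp is assumed to be a timezone-aware `datetime` (a naive one would raise a `TypeError` on subtraction).

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_microseconds(ts):
    """Convert a timezone-aware datetime to integer microseconds since the epoch.

    Integer timedelta division avoids float rounding on the microsecond part.
    """
    return (ts - EPOCH) // timedelta(microseconds=1)


def build_unstructured_entry(log_text, ts):
    """Build one entry with the two keys shown in the example above."""
    return {"log_text": log_text, "ts_epoch_microseconds": to_epoch_microseconds(ts)}


def build_unstructured_batch(customer_id, log_type, entries):
    """Wrap entries that share a single log_type in the batch envelope."""
    return {"customer_id": customer_id, "log_type": log_type, "entries": list(entries)}
```

Taking the timestamp in the sample log line as UTC, `to_epoch_microseconds(datetime(2019, 2, 26, 13, 35, 2, 187000, tzinfo=timezone.utc))` gives `1551188102187000`, matching the example entry.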

Schema

A UDM message has the following sections. (To keep this RFC succinct, this is not a comprehensive list; the full list can be found here.)

  • Metadata: Contains metadata about the event. The fields available for metadata can be found here. Mandatory fields are event_type and event_timestamp.

    The value of event_type determines which fields are mandatory in the rest of the event. See here for the list. For example, if event_type == "FILE_COPY" then src.file becomes mandatory. Currently it is not possible to validate a field based on the value of another field within Vector schema, so this will need to be left for future improvement.

  • Principal: The principal is the entity that originates the event. This field is mandatory and can contain any of the fields defined here.

  • Src: The source entity being acted upon. Depending on the event_type this field is optional. Can contain any of the fields defined here.

  • Target: The target entity being acted upon. Depending on the event_type this field is optional. Can contain any of the fields defined here.
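To make the unconditional requirements in the list above concrete, here is an illustrative check in Python. It is not Vector's schema support; the function name is hypothetical and it only covers the fields this RFC names as always mandatory (metadata.event_type, metadata.event_timestamp and principal). Requirements that depend on the value of event_type are deliberately left out, as discussed above.

```python
def missing_mandatory_fields(event):
    """Return the always-mandatory UDM fields that are absent from `event`.

    Illustrative only: requirements that depend on event_type (for example
    src.file for FILE_COPY) are not checked here.
    """
    missing = []
    metadata = event.get("metadata") or {}
    for field in ("event_type", "event_timestamp"):
        if field not in metadata:
            missing.append(f"metadata.{field}")
    # An absent or empty principal is treated as missing.
    if not event.get("principal"):
        missing.append("principal")
    return missing
```

An empty result means only that these three fields are present, not that the event is a valid UDM event.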

Implementation

The two event types require different batching strategies. Unstructured log events are partitioned by log_type, since each request carries a single log_type for all of its entries. UDM events do not require partitioning.
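The partitioning step for unstructured events can be sketched as follows. This is a simplified Python illustration, not Vector's batching machinery: the function name is hypothetical, and the input is assumed to be pairs of an already rendered log_type and its entry.

```python
from collections import defaultdict


def partition_by_log_type(events):
    """Group (log_type, entry) pairs so each batch holds a single log_type.

    Entries keep their arrival order within each partition.
    """
    batches = defaultdict(list)
    for log_type, entry in events:
        batches[log_type].append(entry)
    return dict(batches)
```

Each resulting partition would then be sent as its own request, with the partition key used as the log_type of the batch.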

Error handling

According to the Chronicle docs:

Clients should retry on 500 and 503 errors with exponential backoff. The minimum delay should be 1s unless it is documented otherwise. For 429 errors, the client may retry with minimum 30s delay. For all other errors, retry may not be applicable.

I believe the current GCP sinks implement this behaviour, so we can reuse the logic from these sinks.
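The retry guidance quoted above could translate into a delay policy along these lines. This Python sketch is an assumption-laden illustration, not the logic of the existing GCP sinks: the function name, the doubling factor and the 60 second cap are all hypothetical choices; only the status codes and the 1s and 30s minimums come from the quoted text.

```python
MAX_DELAY_SECONDS = 60.0  # hypothetical cap, not taken from the Chronicle docs


def retry_delay_seconds(status, attempt):
    """Return the delay before the next retry, or None if no retry applies.

    `attempt` is zero-based. 500 and 503 back off exponentially from 1s;
    429 waits at least 30s; every other status is not retried.
    """
    backoff = min(MAX_DELAY_SECONDS, float(2 ** attempt))
    if status in (500, 503):
        return backoff
    if status == 429:
        return max(30.0, backoff)
    return None
```

For example, a 503 on the first attempt waits 1 second, while a 429 waits 30 seconds regardless of how early in the sequence it occurs.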

Rationale

There is significant user demand for sending data into Google Chronicle.

Drawbacks

Adding a new sink places an additional maintenance burden on the team. In particular, since Chronicle is a third-party service, developing integration tests to ensure the sink continues to work correctly is burdensome.

Prior Art

Alternatives

One option considered was to use the HTTP sink and enhance it to allow GCP authentication. This would nearly work for UDM events, but since the payload is a JSON object with a customer_id field and an array of events, the structure is slightly different to what is possible with the HTTP sink.

Outstanding Questions

Plan Of Attack

Incremental steps to execute this change. These will be converted to issues after the RFC is approved:

  • Create the unstructured log events sink.
  • Create the UDM sink (without schema support).
  • Add schema support for UDM events.

Future Improvements

  • Consider sending the event data as Proto3 rather than JSON. We would likely need definitive profiling data to assure ourselves that the performance enhancements would be worth the extra complexity of protobufs.
  • Enhance Vector schema to enforce mandatory fields based on the value of metadata.event_type.