chore(docs): developer docs on plugin system communication
Signed-off-by: jlanson <[email protected]>
j-lanson committed Dec 19, 2024
1 parent a02ceea commit c062616
Showing 7 changed files with 286 additions and 0 deletions.
4 changes: 4 additions & 0 deletions site/content/docs/contributing/_index.md
@@ -30,4 +30,8 @@ A walkthrough of running Hipcheck for the first time.
A walkthrough of running Hipcheck for the first time.
{% end %}

{% waypoint(title="Developer Docs", path="@/docs/contributing/developer-docs/_index.md") %}
Documentation for Hipcheck developers.
{% end %}

</div>
21 changes: 21 additions & 0 deletions site/content/docs/contributing/developer-docs/_index.md
@@ -0,0 +1,21 @@
---
title: Developer Docs
template: docs.html
sort_by: weight
page_template: docs_page.html
weight: 3
---

# Hipcheck Developer Docs

<div class="grid grid-cols-2 gap-8 mt-8">

{% waypoint(title="Architecture", path="@/docs/contributing/developer-docs/architecture.md") %}
Hipcheck's distributed architecture and how plugins get started.
{% end %}

{% waypoint(title="Query System", path="@/docs/contributing/developer-docs/plugin-query-system.md") %}
The life of a plugin query from inception, through gRPC, to SDK, and back.
{% end %}

</div>
56 changes: 56 additions & 0 deletions site/content/docs/contributing/developer-docs/architecture.md
@@ -0,0 +1,56 @@
---
title: The Hipcheck Architecture
weight: 2
---

# The Hipcheck Architecture and Plugin Startup

This document describes the distributed architecture of Hipcheck and how plugins
get started.

Hipcheck is a relatively simple multiprocessed tool that follows a star
topology. Users invoke the main Hipcheck binary, often referred to as "Hipcheck
core" or `hc`, on the command line, and provide a [policy file][policy_file]
which specifies the set of top-level plugins to use during analysis. Once
Hipcheck resolves these plugins and their dependencies, it starts each plugin in
a separate child process. Once all plugins are started and initialized, Hipcheck
core enters the analysis phase. During this phase it acts as a simple hub for
querying top-level plugins and relaying queries between plugins, as plugins are
intended to only communicate with each other through the core.

## Plugin Startup

Hipcheck core uses the `plugins/manager.rs::PluginExecutor` struct to start
plugins. The `PluginExecutor` has fields like `max_spawn_attempts` and
`backoff_interval` for controlling the startup process. These fields can be
configured using the `Exec.kdl` file.

The main function in `PluginExecutor` is `start_plugin()`, which takes a
description of a plugin on file and returns a `Result` containing a handle to
the plugin process, called `PluginContext`.

In `start_plugin()`, once the `PluginExecutor` has done the work of locating the
plugin entrypoint binary on disk, it moves into a loop of attempting to start
the plugin, at most `max_spawn_attempts` times. For each spawn attempt, it will
call `PluginExecutor::get_available_port()` to get a valid local port to tell the
plugin to listen on. The executor creates a
`std::process::Command` object for the child process, with `stdout/stderr`
forwarded to Hipcheck core.

Within each spawn attempt, `PluginExecutor` will try to connect to the port on
which the plugin should be listening. Since process startup and port
initialization can take differing amounts of time, the executor does a series of
up to `max_conn_attempts` connection attempts. After each failed connection,
the executor waits for a period based on `backoff_interval` that increases
linearly with the number of failed connections. The calculated backoff is
modulated by a random `jitter` between 0 and `jitter_percent`.

Overall, the sleep duration between failed connections is equal to

(backoff * conn_attempts) * (1.0 +/- jitter)
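
The formula can be sketched in Rust as follows. This is a minimal illustration,
not the executor's actual code: the function name `sleep_duration` is
hypothetical, and the jitter is passed in as an already-sampled signed fraction
so that the example needs no random-number crate.

```rust
use std::time::Duration;

/// Sleep time before the next connection attempt (hypothetical helper).
///
/// `backoff` is the base interval, `conn_attempts` is the number of failed
/// connections so far, and `signed_jitter` is a fraction such as `0.1` or
/// `-0.1` that the caller has already sampled.
fn sleep_duration(backoff: Duration, conn_attempts: u32, signed_jitter: f64) -> Duration {
    // Linear growth: backoff * conn_attempts.
    let base = backoff * conn_attempts;
    // Modulate by (1.0 +/- jitter), never letting the factor go negative.
    base.mul_f64((1.0 + signed_jitter).max(0.0))
}
```

With a 100 ms backoff and three failed connections, a jitter of `0.1` gives a
sleep of roughly 330 ms, and a jitter of `-0.1` gives roughly 270 ms.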

As soon as `PluginExecutor::start_plugin()` successfully starts and connects to the child
process, it stores the process and plugin information in a `PluginContext` and
returns it to the caller. If all `max_spawn_attempts` spawn attempts fail, it returns an error instead.
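
The nested retry structure described in this section might look roughly like
the sketch below. It is an assumption-laden simplification: `RetryLimits`,
`start_with_retries`, and the three closures are hypothetical stand-ins for
spawning the child process, connecting to its port, and sleeping between
failed connections. They are not part of `PluginExecutor`.

```rust
/// Hypothetical, simplified view of the retry limits described above.
struct RetryLimits {
    max_spawn_attempts: usize,
    max_conn_attempts: usize,
}

/// Try to spawn a process and connect to it, retrying as described above.
///
/// `spawn` stands in for starting the child on a fresh port, `connect` for
/// opening the connection, and `wait` for the backoff sleep. `wait` receives
/// the number of failed connections so far in the current spawn attempt.
fn start_with_retries<P, C>(
    limits: &RetryLimits,
    mut spawn: impl FnMut() -> Option<P>,
    mut connect: impl FnMut(&P) -> Option<C>,
    mut wait: impl FnMut(usize),
) -> Result<(P, C), String> {
    for _ in 0..limits.max_spawn_attempts {
        // A failed spawn consumes one spawn attempt.
        let process = match spawn() {
            Some(process) => process,
            None => continue,
        };
        for failed in 1..=limits.max_conn_attempts {
            if let Some(conn) = connect(&process) {
                return Ok((process, conn));
            }
            // Back off before the next connection attempt.
            wait(failed);
        }
        // Every connection attempt failed: drop this process and respawn.
    }
    Err(format!("gave up after {} spawn attempts", limits.max_spawn_attempts))
}
```

Passing the side effects in as closures keeps the control flow easy to test
without starting real processes.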

[policy_file]: @/docs/guide/config/policy-file.md
202 changes: 202 additions & 0 deletions site/content/docs/contributing/developer-docs/plugin-query-system.md
@@ -0,0 +1,202 @@
---
title: The Hipcheck Plugin Query System
weight: 2
---

# The Hipcheck Plugin Query System

This document describes the control flow through [Hipcheck core][hc_core], down
to gRPC, into the [Rust SDK][rust_sdk], and back to the core during a plugin
system query. This document assumes the plugins are already started and
configured, and that we have established a gRPC stream with them over which to
send and receive messages defined by our `hipcheck-common/proto` protobuf
schema.

{{ image(path="images/developer_docs_plugin_grpc.png") }}

## Overview and Design Requirements

Hipcheck plugins are child processes of the Hipcheck core process, which
communicates with each of them over a distinct gRPC channel. Each Hipcheck
plugin defines a set of query endpoints that act as remote functions. Each
endpoint receives a JSON-serialized key and returns a JSON-serialized result.
During a query endpoint's execution, it may need to invoke another plugin's
query endpoint(s). All communication between plugins goes through Hipcheck
core, so just as Hipcheck core issues a request to a given endpoint, the
endpoint can ask Hipcheck core to issue another request to a different endpoint
and report back the response so that the original endpoint can complete its own
work.

A "session" describes the series of messages between Hipcheck core and a query
endpoint needed to complete a single query. This includes those queries made by
the endpoint to other plugins as part of answering the original query. Hipcheck
expects each plugin to be able to handle multiple active sessions, such that
if a queried endpoint is waiting for Hipcheck core to respond to its own
request, the plugin process is not blocked from receiving and handling new
queries, including queries to the same endpoint. Thus, each query object sent
to and from the plugin has a session ID field to associate it with a particular
session.

In Rust, the gRPC channel is accessed using an `mpsc::{Sender, Receiver}`
pair. The `Sender` can be cloned many times, meaning that many threads can
`send` messages on the channel without needing exclusive access to any resource.
However, a `Receiver` cannot be shared. The Rust SDK addresses this restriction
and the above multiple-session, single-channel requirement by having a
`HcSessionSocket` object that holds the exclusive `Receiver` for the plugin's
gRPC channel with Hipcheck core. The `HcSessionSocket` is responsible for
tracking the set of live sessions, and for detecting whether a new message from
the gRPC channel should be forwarded to a live session or constitutes an
entirely new session. Each session is represented by a `PluginEngine` instance.
The `HcSessionSocket` sets up its own `mpsc` channel with the new `PluginEngine`
and gives it a clone of the gRPC channel `Sender`. In summary, all gRPC
messages received by the Rust SDK must go through `HcSessionSocket` for
demultiplexing, but each `PluginEngine` can send messages on the gRPC channel
directly.
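
The demultiplexing idea can be illustrated with a small, self-contained
sketch. Everything in it is a simplified assumption: it uses
`std::sync::mpsc` rather than whatever channel type the SDK actually uses, and
`Msg`, `Session`, `SessionSocket`, and `route_next()` are hypothetical
stand-ins for `PluginQuery`, `PluginEngine`, `HcSessionSocket`, and its
listening logic.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, RecvError, Sender};

/// Simplified stand-in for a `PluginQuery` message.
#[derive(Debug, Clone, PartialEq)]
struct Msg {
    session_id: u32,
    body: String,
}

/// Per-session handle, playing the role of `PluginEngine`.
struct Session {
    /// Messages for this session only, forwarded by the socket.
    rx: Receiver<Msg>,
    /// Clone of the outgoing sender, so the session can reply directly.
    tx: Sender<Msg>,
}

/// Owns the single incoming receiver, playing the role of `HcSessionSocket`.
struct SessionSocket {
    incoming: Receiver<Msg>,
    outgoing: Sender<Msg>,
    sessions: HashMap<u32, Sender<Msg>>,
}

impl SessionSocket {
    /// Route one incoming message; returns a new `Session` if it started one.
    fn route_next(&mut self) -> Result<Option<Session>, RecvError> {
        let msg = self.incoming.recv()?;
        if let Some(session_tx) = self.sessions.get(&msg.session_id) {
            // Known session: forward the message. A send error only means the
            // session has already finished, so it is ignored here.
            let _ = session_tx.send(msg);
            return Ok(None);
        }
        // Unknown session ID: create a channel and a session for it.
        let (session_tx, session_rx) = channel();
        let id = msg.session_id;
        session_tx.send(msg).expect("session receiver is still alive");
        self.sessions.insert(id, session_tx);
        Ok(Some(Session { rx: session_rx, tx: self.outgoing.clone() }))
    }
}
```

Only `SessionSocket` ever reads from `incoming`, while every `Session` holds
its own clone of the outgoing `Sender`, mirroring the asymmetry described
above.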

### `hipcheck-common` and Chunking

The actual type that we can send over our gRPC channel to the live plugin is
called `PluginQuery`, and is automatically defined by the Rust code generated
from the protobuf definitions in `hipcheck-common/proto`. Rather than using
`PluginQuery` directly, we define a higher-level `Query` object, which lets us
control the Hipcheck-facing struct definition. For instance, `PluginQuery`'s
`state` field is an `i32`, but for `Query` we can make `state` a custom `enum`
and translate from `PluginQuery.state` to improve readability.
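
As an illustration of that translation, a raw `i32` can be converted into an
`enum` with `TryFrom`. The variant names and numeric values below are
hypothetical and do not reflect the real protobuf schema; the sketch only
shows the pattern.

```rust
use std::convert::TryFrom;

/// Readable state for a high-level query type. The variants and the numbers
/// they map from are made up for this example.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QueryState {
    Submit,
    Reply,
}

impl TryFrom<i32> for QueryState {
    type Error = String;

    fn try_from(raw: i32) -> Result<Self, Self::Error> {
        match raw {
            1 => Ok(QueryState::Submit),
            2 => Ok(QueryState::Reply),
            other => Err(format!("unrecognized query state: {}", other)),
        }
    }
}
```

Code that consumes the high-level type can then `match` on named variants
instead of comparing against bare integers.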

An additional complexity is that gRPC has a maximum per-message size of 4 MB by
default. To hide this limit from users, the `hipcheck-common` crate defines a
chunking algorithm used by both Hipcheck core and the Rust SDK. Each code-facing
`Query` object is chunked into one or more `PluginQuery` objects before being
sent on the wire, and on the listening side the message is de-fragmented with a
`hipcheck-common::QuerySynthesizer`.
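
To make the idea concrete, here is a minimal sketch of chunking and
reassembly. It is not the `hipcheck-common` algorithm: the `Chunk` and
`ChunkState` types, the `chunk()` and `reassemble()` functions, and the choice
to split only the payload string are all assumptions made for illustration.

```rust
/// Marks whether more chunks follow (hypothetical simplification).
#[derive(Debug, Clone, PartialEq)]
enum ChunkState {
    InProgress,
    Complete,
}

/// Simplified stand-in for one on-the-wire message.
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    session_id: u32,
    state: ChunkState,
    payload: String,
}

/// Split `payload` into chunks of at most `max_bytes` bytes each.
fn chunk(session_id: u32, payload: &str, max_bytes: usize) -> Vec<Chunk> {
    assert!(max_bytes >= 4, "a chunk must fit at least one UTF-8 character");
    let mut chunks = Vec::new();
    let mut rest = payload;
    loop {
        if rest.len() <= max_bytes {
            let state = ChunkState::Complete;
            chunks.push(Chunk { session_id, state, payload: rest.to_owned() });
            return chunks;
        }
        // Never split in the middle of a multi-byte character.
        let mut end = max_bytes;
        while !rest.is_char_boundary(end) {
            end -= 1;
        }
        let (head, tail) = rest.split_at(end);
        let state = ChunkState::InProgress;
        chunks.push(Chunk { session_id, state, payload: head.to_owned() });
        rest = tail;
    }
}

/// Rebuild the payload, or return `None` if the chunk sequence is malformed.
fn reassemble(chunks: &[Chunk]) -> Option<String> {
    let (last, init) = chunks.split_last()?;
    let well_formed = last.state == ChunkState::Complete
        && init.iter().all(|c| c.state == ChunkState::InProgress);
    if !well_formed {
        return None;
    }
    Some(chunks.iter().map(|c| c.payload.as_str()).collect())
}
```

For example, `chunk(7, "abcdefghij", 4)` yields three chunks with payloads
`abcd`, `efgh`, and `ij`, and `reassemble()` on that vector returns the
original string.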

## Part 1 - Sending a Request to a Plugin

The plugin query system begins with a call to `score_results()`, which
iterates through all the policy file's top-level analyses one by one.
For each, `score_results()` calls `HcEngine::query()`, which is the
entrypoint for all queries to plugins. `HcEngine::query()` is memoized
using the `salsa` crate, so the running `hc` core binary caches all
queries and responses sent through `HcEngine::query()`. If later in
execution `HcEngine::query()` is called again with the same set of
parameters, it will return the cached output value without involving
the plugin process.
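
The effect of memoization can be shown with a plain `HashMap` cache. This
sketch deliberately does not use `salsa`, whose API is quite different. The
`MemoEngine` type, the `QueryKey` tuple, and `fake_plugin()` are hypothetical
names chosen for the example.

```rust
use std::collections::HashMap;

/// (publisher, plugin, endpoint, JSON key): a hypothetical cache key.
type QueryKey = (String, String, String, String);

struct MemoEngine {
    cache: HashMap<QueryKey, String>,
    /// Stand-in for the expensive round trip to the plugin process.
    run_plugin: fn(&QueryKey) -> String,
    /// How many times the plugin was actually invoked.
    plugin_calls: usize,
}

impl MemoEngine {
    fn query(&mut self, key: QueryKey) -> String {
        // Cache hit: answer without involving the plugin.
        if let Some(cached) = self.cache.get(&key) {
            return cached.clone();
        }
        // Cache miss: run the plugin once and remember the result.
        self.plugin_calls += 1;
        let output = (self.run_plugin)(&key);
        self.cache.insert(key, output.clone());
        output
    }
}

/// Pretend plugin that echoes the JSON key back in its result.
fn fake_plugin(key: &QueryKey) -> String {
    format!("result for {}", key.3)
}
```

Calling `query()` twice with the same key invokes `fake_plugin()` only once;
the second call is served from the cache.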

As described in the Overview, Hipcheck core has a unique gRPC channel with each
running plugin, so the first thing `HcEngine::query()` must do is find the
appropriate channel handle for the target plugin. The `HcPluginCore` object that
powers `HcEngine` under the hood (set with `HcEngine::set_core()`) has a map
containing all the plugin handles. `HcEngine::query()` keys this map using the
target publisher/plugin pair to get the appropriate plugin handle, which is an
object of type `ActivePlugin`. It then forwards the target query endpoint and
key to `ActivePlugin::query()`.

Now that we have the active plugin handle, and therefore the right gRPC channel
for this query, we can formulate a query message. `ActivePlugin::query()`
formulates the high-level `Query` object and forwards it to the `query()`
function of the contained `PluginTransport` type. `ActivePlugin` is merely a
thin wrapper around `PluginTransport` with some additional state tracking
the next session ID to use.

Inside `PluginTransport::query()` is where the `Query` object gets chunked into
a `Vec<PluginQuery>` and each one gets sent over the gRPC channel. We have now
successfully sent out a query.

## Part 2 - Receiving Queries from gRPC

Meanwhile, the plugin process (if using the Rust SDK) has been
listening on the gRPC channel with `HcSessionSocket.rx::recv()`. As mentioned in
the Overview, there is one `HcSessionSocket` instance that receives
all `PluginQuery` messages off the wire. Each message is returned
to the `HcSessionSocket::listen()` function, which checks whether the message's
session ID matches one of its active sessions. If not, this newly-received
`PluginQuery` object marks the start of a new session, so the `HcSessionSocket`
creates and initializes a `PluginEngine` instance to handle it, along with a
one-way `mpsc` channel over which it forwards `PluginQuery` objects with the
appropriate session ID to this `PluginEngine`. Thus, when a `PluginEngine`
calls `recv()` on the channel it shares with `HcSessionSocket`, it can be sure
that all messages have the same session ID. Finally, `HcSessionSocket::listen()`
forwards the `PluginQuery` over this channel and goes back to listening
for gRPC messages.

The `PluginQuery` travels up through `PluginEngine::recv_raw()` into
`PluginEngine::recv()`, where it is de-fragmented together with zero or more
other messages to produce a software-facing `Query` object.

If this is the first `Query` to a new `PluginEngine`, the object is
received by `PluginEngine::handle_session_fallible()`. The `PluginEngine`
doesn't yet know which query endpoint to call, so it has to match
`Query.name` against the output of `Plugin.queries()` to find the right
one. Once we have the right endpoint, we take the key (the argument) from
`Query.key` and call the endpoint with it.
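
The name-based lookup can be pictured as follows. This is a simplified sketch
under stated assumptions: `NamedQuery`, `queries()`, `dispatch()`, and the two
sample endpoints are hypothetical, and keys and results are plain strings
rather than JSON values.

```rust
/// A named endpoint that takes a key and returns a result (both as strings).
struct NamedQuery {
    name: &'static str,
    run: fn(&str) -> String,
}

/// Sample endpoint: returns the key unchanged.
fn echo(key: &str) -> String {
    key.to_string()
}

/// Sample endpoint: returns the key's length in bytes.
fn count(key: &str) -> String {
    key.len().to_string()
}

/// Stand-in for the list of endpoints a plugin exposes.
fn queries() -> Vec<NamedQuery> {
    vec![
        NamedQuery { name: "echo", run: echo },
        NamedQuery { name: "count", run: count },
    ]
}

/// Find the endpoint matching `query_name` and call it with `key`.
fn dispatch(query_name: &str, key: &str) -> Result<String, String> {
    let endpoint = queries()
        .into_iter()
        .find(|q| q.name == query_name)
        .ok_or_else(|| format!("unknown query endpoint: {}", query_name))?;
    Ok((endpoint.run)(key))
}
```

An unknown name produces an error rather than a panic, which lets the caller
report the problem back over the session.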

## Part 3 - Querying Other Plugins

Now we are actually executing query endpoint code. Over the course of its
execution, the endpoint may need information from another plugin. To enable
the query endpoint to do so, each query endpoint is provided a handle to
its associated `PluginEngine` along with the query key. The endpoint can then
call `PluginEngine::query()` with the plugin publisher and name, the target
query endpoint name, and the query key. Within `PluginEngine::query()`, these
parameters are formulated into a `Query` object and forwarded to
`PluginEngine::send()`. The `send()` function uses the chunking algorithm from
`hipcheck-common` to produce a `Vec<PluginQuery>` and send them out over the
gRPC channel `Sender` with `PluginEngine.tx::send()`. As a reminder, this does
not go back through the `HcSessionSocket`; the `PluginEngine` can send messages
to Hipcheck core directly.

## Part 4 - Receiving and Interpreting Messages from Plugins

When we last left the Hipcheck core, it had just sent its `Vec<PluginQuery>`
over gRPC with `PluginTransport.tx::send()`. Note that this is just one thread
of execution in Hipcheck core. Just as a plugin process must be able to handle
multiple live sessions, the Hipcheck core may have multiple tasks each executing
independent queries. Thus, Hipcheck has the same issue of ensuring messages
received from the gRPC channel make it to the correct `PluginTransport` objects,
but it solves this problem differently than the Rust SDK does.

Each `PluginTransport` object shares a `Mutex` that guards the
`MultiplexedQueryReceiver` object. While the `PluginTransport` waits for a
message from the `PluginEngine` session that was spawned remotely to handle its
request, it enters a loop. In each iteration of the loop, it blocks until it can
acquire the `MultiplexedQueryReceiver`. Once it has acquired the receiver, it
checks the receiver's backlog for any messages matching its target session ID.
If none are found, it listens on the gRPC wire directly for the next message. If
the next message matches our session, we take the message, otherwise we put it
in the backlog to save it for the `PluginTransport` that does want that message.
After this, we drop our lock on the `Mutex<MultiplexedQueryReceiver>` and
restart the loop. The reason we drop and re-acquire the lock is so that one
`PluginTransport` that spends a very long time waiting for its message(s) does
not prevent other `PluginTransport` instances from receiving their messages. By
dropping and trying to re-acquire the `Mutex` lock, we give other
`PluginTransport` instances a chance to acquire the receiver.
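
A stripped-down version of this loop, using only the standard library, might
look like the sketch below. The types and the `recv_for_session()` function
are hypothetical simplifications: the real code waits for gRPC messages
asynchronously, whereas this example blocks on a `std::sync::mpsc` receiver.

```rust
use std::sync::mpsc::Receiver;
use std::sync::Mutex;

/// Simplified stand-in for a `PluginQuery` message.
#[derive(Debug)]
struct Msg {
    session_id: u32,
    body: String,
}

/// Shared receiver plus a backlog of messages meant for other sessions.
struct MultiplexedReceiver {
    rx: Receiver<Msg>,
    backlog: Vec<Msg>,
}

/// Wait for the next message belonging to `session_id`.
/// Returns `None` once the channel is closed.
fn recv_for_session(shared: &Mutex<MultiplexedReceiver>, session_id: u32) -> Option<Msg> {
    loop {
        // Acquire the receiver; the guard is dropped at the end of each
        // iteration, giving other waiters a chance to take the lock.
        let mut guard = shared.lock().expect("receiver lock poisoned");

        // First check whether someone else already parked our message.
        let parked = guard.backlog.iter().position(|m| m.session_id == session_id);
        if let Some(pos) = parked {
            return Some(guard.backlog.remove(pos));
        }

        // Otherwise read the next message off the wire.
        let next = guard.rx.recv();
        match next {
            Ok(msg) if msg.session_id == session_id => return Some(msg),
            // Not ours: save it for the session that wants it.
            Ok(other) => guard.backlog.push(other),
            Err(_) => return None,
        }
    }
}
```

If messages for sessions 2 and 1 arrive in that order, a caller waiting on
session 1 parks the first message in the backlog and returns the second; a
later caller waiting on session 2 is served straight from the backlog.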

The `PluginTransport` continues this loop until it has received all the
`PluginQuery` objects it needs to de-fragment into a `Query` object. It then
returns the `Query` to the caller, which is `ActivePlugin::query()`. This
function does the job of converting `Query` into a Hipcheck core-specific type
called `PluginResponse`. Until now, the Hipcheck core has not really checked the
content of the `Query`, but now it needs to decide whether the `Query` is the
query endpoint returning a value or requesting additional information. The
`PluginResponse` enum separates these two possibilities, plus an additional
error variant.

`ActivePlugin::query()` returns the `PluginResponse` up to the caller, namely
`HcEngine::query()`. Here, if the `PluginResponse` was `Completed`, we have
finished the query and return its output value that was stored as a field in
`Completed`. Otherwise, we have to recursively call `HcEngine::query()` with the
query information stored in `PluginResponse::AwaitingResult`.

Once this recursive call completes, we must forward the output of that query to
the original query endpoint that asked for it. We do this by passing
that output to `ActivePlugin::resume_query()`. The main difference between this
function and `ActivePlugin::query()` is that the generated `Query` object uses
an existing session ID instead of a newly-generated one, since this `Query` is
part of an ongoing session.

The original query endpoint may return a `PluginResponse::AwaitingResult` zero
or more times, but eventually we will get a `PluginResponse::Completed`, and by
passing the contained output up to the calling function, we have completed a
query using the plugin system!
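
The overall request/resume cycle can be summarized in a short sketch. It is a
simplification built on assumptions: the `Endpoint` trait, `drive_query()`,
the `resolve` callback (standing in for the recursive `HcEngine::query()`
call), the `Failed` variant name, and the `NeedsTwo` demo endpoint are all
hypothetical.

```rust
/// Simplified stand-in for Hipcheck core's `PluginResponse`.
enum PluginResponse {
    Completed(String),
    AwaitingResult { endpoint: String, key: String },
    /// Hypothetical name for the error variant.
    Failed(String),
}

/// Stand-in for the `query()` / `resume_query()` pair.
trait Endpoint {
    fn query(&mut self, key: &str) -> PluginResponse;
    fn resume_query(&mut self, answer: &str) -> PluginResponse;
}

/// Drive one query to completion, answering every intermediate request.
fn drive_query(
    plugin: &mut dyn Endpoint,
    key: &str,
    resolve: &mut dyn FnMut(&str, &str) -> Result<String, String>,
) -> Result<String, String> {
    let mut response = plugin.query(key);
    loop {
        match response {
            PluginResponse::Completed(output) => return Ok(output),
            PluginResponse::Failed(reason) => return Err(reason),
            PluginResponse::AwaitingResult { endpoint, key } => {
                // Answer the nested request, then resume the same session.
                let answer = resolve(&endpoint, &key)?;
                response = plugin.resume_query(&answer);
            }
        }
    }
}

/// Demo endpoint that needs two answers from elsewhere before completing.
struct NeedsTwo {
    answers: Vec<String>,
}

impl NeedsTwo {
    fn step(&self) -> PluginResponse {
        if self.answers.len() < 2 {
            let key = self.answers.len().to_string();
            PluginResponse::AwaitingResult { endpoint: "other".to_string(), key }
        } else {
            PluginResponse::Completed(self.answers.join("+"))
        }
    }
}

impl Endpoint for NeedsTwo {
    fn query(&mut self, _key: &str) -> PluginResponse {
        self.answers.clear();
        self.step()
    }

    fn resume_query(&mut self, answer: &str) -> PluginResponse {
        self.answers.push(answer.to_string());
        self.step()
    }
}
```

Here `NeedsTwo` returns `AwaitingResult` twice before it returns `Completed`,
so `drive_query()` calls `resolve` twice and `resume_query()` twice.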

[hc_core]: @/docs/contributing/developer-docs/architecture.md
[rust_sdk]: @/docs/guide/making-plugins/rust-sdk.md
1 change: 1 addition & 0 deletions site/static/assets/developer_docs_plugin_grpc.drawio
@@ -0,0 +1 @@
<mxfile host="charts.mitre.org" modified="2024-12-16T18:24:52.210Z" agent="5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36 Edg/131.0.0.0" version="20.8.5" etag="K-q-bwHREtJwMxqPlC3R" type="device"><diagram id="kli-Tog3HTVBQzar_-YL" name="Page-1">7Vxrc+K2Gv41zLQfyPgCBj4mJDl7prd0t+e0/ZQRtsDqGotIdoD99dXVNxnigsHebHdmJ7Ysy+LV+7x3aeDO17v/ELAJf8IBjAaOFewG7v3AcRxrPGN/eMtetkwd1bAiKJBNdt7wCX2BqtFSrSkKIC11TDCOErQpN/o4jqGflNoAIXhb7rbEUfmrG7CCRsMnH0Rm6+8oSEL1K8ZW3v4BolWov2xb6ska6M6qgYYgwNtCk/swcOcE40RerXdzGHHiabrI9x4PPM0mRmCcNHnh6Zffv4DlyPdmT788v/rYI3QxHI/kMK8gStUvVrNN9poEBKdxAPko1sC924YogZ82wOdPt2zRWVuYrCN2Z7PLJYqiOY4wEe+6wRhOgxFrpwnBn2HhydRZuJ7HnqgJQJLA3cGfZmcEY5wG8RomZM+6aDbzFI0Vk9kjdb/Nl2yi1yEsLNdYvwgUm6yysXNKsgtFzH9AWHt6YcICOF36dYT1/ClcLNsh7DDj3iOUHU2vSlnPIOz/KOS/iuBXxMnJLtNFhGgICf9+LFqidIViPnS8xEKGDPgv8yI23bsF6+etEkFk0dvHhHBhwn4Sez2C4j3+VsgvKeRP8JIP7ifoFdYOJb9IsxksMdkCwq8GnvWSQkYJTiL1lF1/hqqFzc29NZgFBkwuqVtMkhCvcAyih7z1rsxOeZ8fMd4oJvoLJsleCVmQ8g8VWUx+k3/oOLuweeGU+PAY+ytRDcgKJsfkTz37ERgBQdmSOG8dowYnwZitGbwhlK+Ae/vBfxAN8k4s2nffnwfjFkA5KkPSmZqQdKwaSI4uhUhTh/w3gQQkHBj4VaAQAj8UrL0ZRvBV2AiAseaeIlqLngAyKAosF4C3wRHy9xlmmJKOOkIKWymy/4O/fzPWt3+q4cTN/a50t1d3LSLMaYiwA7x0HYQ5BmdQJltzgIm7ZwJpGiX0XGhVNORs9vg4m10EcrbXNeRmBmF/S0ksfxNgn4CECuwIPQfY/1+FxmF4XPwlNFsVcFUlJV5EzKYmAmIeWHNqxwu6ySj6zUFu3BByky4hNzY4Q9s+j2scZMi7FZbLk3zUZ/U2srrGmnZFiySFhEFlzZVQmMafUbzi0IprcCaBxGzTIPXrDUWOzv9DX5uUcknKwzAszzO1R6H4IzVqdTAcQ/1NqTOZHQvRq5xgZgSz6fiQUq2ev1kdOmmqQ89VourVJ8wkcs7pdoXVvSoLy4mptypcnE3jdMaeHBYVyX4DaSYsTD6TXPobATHdsLXvtQQZO51LENPTOJXSN8lOmU5MDPSP1l5NfODKtG4Q0aIh2PDLZQR3tzxIKIRMoC7v/QhQimRwBZBEN8dcuJboVyY23KGkIM/Y3Z9a1LHrXJrxGy3MijYr83lAKlb/kGi03xCNYrqPiBOsbVFpN/XonWk9vxQYwq5jiKzxTKFajQWOZxVOkz/VEKrGQCO3PNDEbiadGbuAfaHbhnegzbXAZGQdnZc7PdqfXcgZtKoqbNOsXH18mgsDCDAvIaq1IH4EC/akBBkQoVXMMcY4kJsed1z6IObK36oHaxQE0sCAFH0BCzEeZ2ZFRzb4+G4wvj8mvlQyQL08yIKYRcY/IjwOh0OtG2tmTUrkd1vhWrs86FDf6xHwcinCju0bAbbpSjLV85LytMTdB7TxQ+h/Zs/nzFPPH5h6Ksy6PlLiG+zAlii5HCdw65
JKQ9M2YuJKbhdFrWpqQf8Nq6GBGnfFtq+pAB0zQI5kOI4Kr4AprRWPxQ085RDccLsNQR5/0fFnx5KR8HyRC2upeWCBSQDJ0Jc0vRWfJN8Nh8X27yWv5FwjeGUNEj+UPsnPYA0Dw2liztXFPq4/Jj0jEWNcEsx9udAXsoPzufK1SCoi+CipFW8XdpDqbYMWdbqWXW/r9E7D9HqahvX8XAnXS1O5GLKvEVMiofNMmfuLcPy8BFGEmGTpgSXtVPR6D+L6JuERD1G8KASh9SaCa/bzGBOI4AeBLykiQs4EIAE1+TUJNBDjRGbntBs0F6SiVLybxynyPB7vYI4XM+khn+k55S0SwTyr1mFK7dQgx+Wx3zSY6XpHuda68Txvcp4NdHkjZzT6d/FPWvxZl4J/ZAaotiGMC2hn1qKfRlr8+JgLJGHnzMUkk5Rw/V2VGh9FvmnIpfhG6H/5gWNi4qvzZEbHPRkG25kzKvsc7XgybmlQ172WH+OYHirT7JltIK0+ccmMuh6q++5zio5ZslRvZx0KUj6obrsbZlu/9pDG3UcnXdOkysRUMQsk8zjcC7LW0iEp5YWMtI9OEzH0goJHI3NADSI1HWnCtwKal9eEs6aa0OlSEzpmkOYUaCr5x8D5TMC2hwDtPlXjmqmaHKA8VUtRvIoMQOahCwXAg75PXyB5IPNQm8PI0ha1OYxTwd0mjKcNYeyca9DWGz0TU9HcWIV/TnnApimI6rBj7+iwBxIS13S0XNMKC+BQFEnkpbB0HzMU8HhtffGf0IOq4ijKZZjWffLBSmoCXeNUftjDOqXOVd3ZGqxZOsp1rEZc2RrHOS2rxh6qxe5roEa2QZRvOYDiek3Nxk5rAUemR8fL1qSf8JJCmhgCtFmBKH1nwZI3cnw8yGlVoiXqm+fmfUuDXi3p65o5wtOFZl/KvjLnoT8JlBrddGrl6HXLRoup0F5Ui15epDctBnUPFDhdR6S7h4o2T0FuT2oIq8DtQRSgnVCo7JPsbnpK6O7joXrgf+3KQr7tGnblSaWK2bZ37ZdUt5xX+nvO0f6XKVUcmyH23OilGxxT2MTqbbXm6F0bz1Y9K+bG89Stljcqu+jcUt/SoFfLNGqQvvN6pInXN3N6bHotOdQ+psKb/XT/w7EqVRrwAlUi+lbo+14rVMeTcWkd6ypU605wuNgyeg2OxrjMDo22d1sU1zAfr7oL45R9IW3aFE1zI7bl1jPStbZuTCsSx5tWWLBp3qRqdxgDtbR1o/odna85NK9hRqn6Fy5jEHmmvnqPeze8A375e967oTOARraatS14kb3IsJkBosP1I6Lynk2dYh+BRFTHZeW+QW64dp9nu7gvpsRhI8HZ7TE21qEC+RO2l5LdszrzqE8JsaFj9S1yYeuEQPFwkyyxXQVcHpEtHMphGKe6tMQIBavdKBravStn7zy5rSDYCKyd1jRnE313u+4NjHYfxrWtmr1wXBkOPAXPjzoiJHa+oThg5kxyoCxFlZeLWvN5hsVC7bg8korvVumLOuxBaZZtNT9l40Ak6Uz7zLC6Mya8UomKrTfPX5cljvqnFZe56rNeQWA3ZotuxbWZb/sqDgk0pHH34TvbMgM/zNvjP7PO5smOLqsX1lXprGS3TKhvQ6g2GKqTA/WOwhDQzJDizo0xijDR9EYiYXnRrRhI+EWoUDojvlR9X0xaO0zmNse+6IVWgdw0Z6Ww1B2UD8Xtv5qj0QxQd18YaNecjvbGGY8W/6KBnMtL1cr5kPx0yLbOhzQW5pK7uXgCIzs1XNoo+dnr7sPf</diagram></mxfile>
2 changes: 2 additions & 0 deletions site/templates/shortcodes/image.html
@@ -0,0 +1,2 @@
{% set image_url = get_url(path=path) %}
<img src="{{ image_url }}">
