This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Commit 1898611

Adan Urban Reyes authored and committed Dec 6, 2023
feat(scale-agent): doc scale agent horizontal scaling feature
Jira REQ: https://armory.atlassian.net/browse/CDSH-801
1 parent 91dd0ac commit 1898611

3 files changed, +177 -4 lines changed

@@ -0,0 +1,122 @@
---
title: Horizontal Scaling Architecture and Features
linkTitle: Horizontal Scaling
description: >
  Learn how the Horizontal Scaling feature distributes operations across Armory Scale Agent replicas in your Armory Continuous Deployment or Spinnaker environment.
aliases:
  - /scale-agent/tasks/horizontal-scaling/
---

## Overview of Horizontal Scaling

Rather than sending each operation to the first Scale Agent instance that can handle it, Horizontal Scaling distributes operations across all the Scale Agent replicas that can handle them.

### How to enable and use Horizontal Scaling

First, familiarize yourself with the architecture and features in this guide. Then you can:

1. {{< linkWithTitle "plugins/scale-agent/tasks/horizontal-scaling/operations-enable.md" >}}

## Horizontal Scaling glossary

- **K8s Operation**: an abstraction of a K8s operation, such as Get, List, Add, Delete, or Patch.
- **Dynamic account Operation**: an abstraction of a dynamic account operation: Add or Unregister accounts.
- **Endpoint**: the URL segment after the Clouddriver root.
- **Request**: an instruction that isn’t fulfilled immediately and can have different outcomes. A request can be made over HTTP by the admin or internally by one of the services.

## Architecture

First, it is important to understand the main differences between K8s operations and dynamic account operations.

| K8s operations                               | Dynamic account operations                             |
|----------------------------------------------|--------------------------------------------------------|
| Are handled by a single Scale Agent instance | Can be handled by more than one Scale Agent instance   |
| Are processed on every polling cycle         | Are processed on demand                                |

The Scale Agent stores K8s and dynamic account operation data in dedicated tables that act like a queue:

- `clouddriver.kubesvc_operation`: holds newly received operations
- `clouddriver.kubesvc_operation_single_assign`: holds K8s operations, each of which can be assigned to only one Scale Agent instance
- `clouddriver.kubesvc_operation_multiple_assign`: holds dynamic account operations, which can be assigned to multiple Scale Agent instances
- `clouddriver.kubesvc_operation_history`: holds the responses to K8s and dynamic account operations

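The roles of the four tables can be sketched with a small in-memory model. This is only an illustration, not the plugin's implementation: the `OperationQueue` class, its method names, and the `kind` strings are hypothetical, and plain dicts and lists stand in for the database tables.

```python
from dataclasses import dataclass, field


@dataclass
class OperationQueue:
    """Hypothetical in-memory stand-in for the four kubesvc_operation* tables."""

    operation: dict = field(default_factory=dict)        # kubesvc_operation
    single_assign: dict = field(default_factory=dict)    # op id -> one instance
    multiple_assign: dict = field(default_factory=dict)  # op id -> set of instances
    history: list = field(default_factory=list)          # (op id, instance, response)

    def receive(self, op_id: str, kind: str) -> None:
        """Record a newly received operation; kind is 'k8s' or 'dynamic-account'."""
        self.operation[op_id] = kind

    def assign(self, op_id: str, instances: list) -> None:
        """Route the assignment to the table that matches the operation kind."""
        if self.operation[op_id] == "k8s":
            if len(instances) != 1:
                raise ValueError("a K8s operation goes to exactly one instance")
            self.single_assign[op_id] = instances[0]
        else:
            self.multiple_assign[op_id] = set(instances)

    def record_response(self, op_id: str, instance: str, response: str) -> None:
        """Keep every response, whichever kind of operation produced it."""
        self.history.append((op_id, instance, response))


queue = OperationQueue()
queue.receive("op-1", "k8s")
queue.receive("op-2", "dynamic-account")
queue.assign("op-1", ["agent-0"])
queue.assign("op-2", ["agent-0", "agent-1"])
queue.record_response("op-1", "agent-0", "ok")
```

The only point of the sketch is the routing: a K8s operation ends up with exactly one owner, while a dynamic account operation can have several.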
### K8s Operations

The Scale Agent Plugin creates one job for each Scale Agent instance registration. Each job is in charge of:

1. Fetching pending K8s operations from the `clouddriver.kubesvc_operation` table
2. Assigning pending K8s operations in the `clouddriver.kubesvc_operation_single_assign` table
3. Fetching assigned K8s operations from the `clouddriver.kubesvc_operation_single_assign` table and sending them to the Scale Agent

When an operation returns a bad response and there is still time to retry (based on the `kubesvc.cache.operationWaitMs` property), the Scale Agent Plugin does the following:

1. Stores the response in the `clouddriver.kubesvc_operation_history` table
2. Unassigns the operation from the `clouddriver.kubesvc_operation_single_assign` table, so that the same or another Scale Agent instance can pick it up again

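The job cycle and the retry behavior described above can be sketched as follows. This is a simplified model under stated assumptions, not the plugin's code: `run_cycle`, its parameters, and the `send` callback are hypothetical names, in-memory collections stand in for the tables, and `deadlines` stands in for the time limit derived from `kubesvc.cache.operationWaitMs`.

```python
def run_cycle(instance, pending, assigned, history, send, deadlines, now, batch_size=5):
    """One cycle of the per-registration job (hypothetical sketch).

    pending   -- list of operation ids (stands in for kubesvc_operation)
    assigned  -- dict op id -> instance (stands in for ..._single_assign)
    history   -- list of responses (stands in for ..._history)
    send      -- callback returning (ok, response) for an operation id
    deadlines -- dict op id -> time after which no retry is attempted
    """
    # Steps 1 and 2: fetch pending operations and assign up to batch_size of them.
    free = [op for op in pending if op not in assigned][:batch_size]
    for op in free:
        assigned[op] = instance

    # Step 3: fetch the operations assigned to this instance and send them.
    mine = [op for op, owner in assigned.items() if owner == instance]
    for op in mine:
        ok, response = send(op)
        history.append((op, instance, response))  # every response is stored
        del assigned[op]                          # release the assignment
        if ok or now >= deadlines[op]:
            pending.remove(op)                    # done, or no time left to retry
        # Otherwise the operation stays pending, so the same or another
        # instance can pick it up in a later cycle.


def flaky_send(op):
    """Pretend that operation 'b' fails and everything else succeeds."""
    ok = op != "b"
    return ok, "ok" if ok else "error"


pending = ["a", "b", "c"]
assigned = {}
history = []
run_cycle("agent-0", pending, assigned, history, flaky_send,
          deadlines={"a": 10, "b": 10, "c": 10}, now=0, batch_size=2)
```

After this cycle, `a` is finished, `b` is unassigned but still pending because its deadline has not passed, and `c` was never picked up because the batch size is 2.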
```mermaid
C4Deployment
    title Scale Agent Horizontal Scaling Registration Jobs
    Boundary(spin, "Armory Continuous Deployment or Spinnaker", "Instance", $borderColor="#0FC2C0") {
        Boundary(cd, "Clouddriver", "Service", $borderColor="orange") {
            System(sap, "Scale Agent Plugin<br/>", "For each registration creates a job to assign and send<br/>every N milliseconds the maximum number of K8s operations.<br/><br/>N = kubesvc.operations.database.scan.initialDelay | maxDelay<br/>maximum number = kubesvc.operations.database.scan.batchSize")
            System(saj0, "Scale Agent Job 0", "")
            System(saj1, "Scale Agent Job 1", "")
            System(saj2, "Scale Agent Job 2", "")
            UpdateElementStyle(saj0, $bgColor="#04AA6D", $borderColor="none")
            UpdateElementStyle(saj1, $bgColor="#f44336", $borderColor="none")
            UpdateElementStyle(saj2, $bgColor="#555555", $borderColor="none")
        }
        Boundary(sa, "Armory Scale Agent", "Service", $borderColor="purple") {
            System(sar0, "Replica 0", "")
            System(sar1, "Replica 1", "")
            System(sar2, "Replica 2", "")
            UpdateElementStyle(sar0, $bgColor="#04AA6D", $borderColor="none")
            UpdateElementStyle(sar1, $bgColor="#f44336", $borderColor="none")
            UpdateElementStyle(sar2, $bgColor="#555555", $borderColor="none")
        }
        Rel(sar0, sap, "Registration", "")
        UpdateRelStyle(sar0, sap, $textColor="black", $lineColor="#04AA6D")
        Rel(sar1, sap, "Registration", "")
        UpdateRelStyle(sar1, sap, $textColor="black", $lineColor="#f44336")
        Rel(sar2, sap, "Registration", "")
        UpdateRelStyle(sar2, sap, $textColor="black", $lineColor="#555555")
        Rel(sap, saj0, "Create")
        UpdateRelStyle(sap, saj0, $textColor="black", $lineColor="#04AA6D")
        Rel(sap, saj1, "Create")
        UpdateRelStyle(sap, saj1, $textColor="black", $lineColor="#f44336", $offsetX="-30", $offsetY="55")
        Rel(sap, saj2, "Create")
        UpdateRelStyle(sap, saj2, $textColor="black", $lineColor="#555555", $offsetX="-60", $offsetY="155")
        BiRel(sar0, saj0, "HandleOp", "request/response")
        UpdateRelStyle(sar0, saj0, $textColor="black", $lineColor="#04AA6D", $offsetX="-100", $offsetY="30")
        BiRel(sar1, saj1, "HandleOp", "request/response")
        UpdateRelStyle(sar1, saj1, $textColor="black", $lineColor="#f44336")
        BiRel(sar2, saj2, "HandleOp", "request/response")
        UpdateRelStyle(sar2, saj2, $textColor="black", $lineColor="#555555")
    }
    UpdateLayoutConfig($c4ShapeInRow="1", $c4BoundaryInRow="2")
```

### Dynamic account Operations

Because dynamic account operation requests are less frequent, the Scale Agent Plugin processes them on demand, as follows:

1. Receive and store the new dynamic account operation in the `clouddriver.kubesvc_operation` table
2. Assign the dynamic account operation in the `clouddriver.kubesvc_operation_multiple_assign` table; it can be assigned to all connected Scale Agent instances or only to instances with the received zoneId
3. Notify all instances to fetch pending dynamic account operations from the `clouddriver.kubesvc_operation_multiple_assign` table
4. Each instance reads pending dynamic account operations and sends them to the Scale Agent
5. Wait for the response and send it back

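Step 2 of the flow above, choosing which instances receive a dynamic account operation, can be sketched like this. The function names, the `connected` mapping of instance name to zoneId, and the `send` callback are all hypothetical; the notification in steps 3 and 4 is reduced to a direct call for brevity.

```python
def select_targets(connected, zone_id=None):
    """Return the instances that should receive a dynamic account operation.

    connected -- dict mapping instance name -> zoneId (hypothetical shape)
    zone_id   -- zoneId received with the request, or None for "all instances"
    """
    if zone_id is None:
        return sorted(connected)
    return sorted(name for name, zone in connected.items() if zone == zone_id)


def handle_dynamic_account_op(op, connected, send, zone_id=None):
    """Assign the operation, send it to each target, and collect the responses."""
    targets = select_targets(connected, zone_id)
    return {name: send(name, op) for name in targets}


connected = {"agent-0": "zone-a", "agent-1": "zone-b", "agent-2": "zone-a"}
responses = handle_dynamic_account_op(
    "add-account", connected, send=lambda name, op: f"{name} handled {op}",
    zone_id="zone-a",
)
```

With `zone_id="zone-a"`, only `agent-0` and `agent-2` receive the operation; without a zoneId, all three connected instances do.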
```mermaid
sequenceDiagram
    actor User
    participant Plugin
    participant Service

    User->>Plugin: Send dynamic account operation
    Plugin->>Plugin: Store in clouddriver.kubesvc_operation
    Plugin->>Plugin: Assign on clouddriver.kubesvc_operation_multiple_assign
    Plugin->>Plugin: Notify all to read and send pending operations
    Plugin->>Service: gRPC HandleOp
    Service-->>Plugin: return
    Plugin->>Plugin: Store response in clouddriver.kubesvc_operation_history
    Plugin-->>User: return
```
@@ -0,0 +1,51 @@
---
title: Enable and Configure Operations Horizontal Scaling in the Armory Scale Agent
linkTitle: Enable Operations Horizontal Scaling
description: >
  Learn how to enable and configure the Operations Horizontal Scaling feature in Armory Scale Agent for Spinnaker and Kubernetes.
---

## {{% heading "prereq" %}}

* You are familiar with {{< linkWithTitle "plugins/scale-agent/concepts/horizontal-scaling" >}}.

## Scale Agent plugin

> Operations Horizontal Scaling was introduced in plugin versions v0.13.20/0.12.21/0.11.56.

Enable Operations Horizontal Scaling by setting `kubesvc.cluster: database` in your plugin configuration. For example:

{{< highlight yaml "linenos=table,hl_lines=27-28">}}
spec:
  spinnakerConfig:
    profiles:
      clouddriver:
        spinnaker:
          extensibility:
            repositories:
              armory-agent-k8s-spinplug-releases:
                enabled: true
                url: https://raw.githubusercontent.com/armory-io/agent-k8s-spinplug-releases/master/repositories.json
            plugins:
              Armory.Kubesvc:
                enabled: true
                version: 0.13.20 # Replace with a version compatible with your Armory CD version
                extensions:
                  armory.kubesvc:
                    enabled: true
        # Plugin config
        kubesvc:
          cluster: database
          operations:
            database:
              scan:
                batchSize: <int> # (Optional) Requires kubesvc.cluster: database
                initialDelay: <int> # (Optional) Requires kubesvc.cluster: database
                maxDelay: <int> # (Optional) Requires kubesvc.cluster: database
{{< /highlight >}}

`operations.database.scan`:

* **batchSize**: (Optional) default: 5; the maximum number of operations that can be assigned to a Scale Agent instance per cycle
* **initialDelay**: (Optional) default: 250; milliseconds to wait per cycle when there are pending operations
* **maxDelay**: (Optional) default: 2000; milliseconds to wait per cycle when there are no pending operations
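
How the three settings interact can be sketched with a small polling loop. This is an illustration under assumptions: `next_delay_ms`, `poll`, and `fetch` are hypothetical names, and the sketch uses only the two cases described above (pending operations or none), so the real plugin may choose intermediate delays.

```python
import time


def next_delay_ms(had_pending, initial_delay=250, max_delay=2000):
    """Pick the wait before the next cycle from the two documented cases."""
    return initial_delay if had_pending else max_delay


def poll(fetch_batch, cycles, sleep=time.sleep,
         batch_size=5, initial_delay=250, max_delay=2000):
    """Run a fixed number of cycles and return the delay chosen after each one."""
    delays = []
    for _ in range(cycles):
        batch = fetch_batch(batch_size)  # at most batch_size operations per cycle
        delay = next_delay_ms(bool(batch), initial_delay, max_delay)
        delays.append(delay)
        sleep(delay / 1000)              # the settings are in milliseconds
    return delays


backlog = list(range(7))  # seven pending operations


def fetch(limit):
    """Remove and return up to `limit` operations from the backlog."""
    batch = backlog[:limit]
    del backlog[:limit]
    return batch


# Skip real sleeping so the example finishes immediately.
delays = poll(fetch, cycles=3, sleep=lambda seconds: None)
```

With the defaults, seven pending operations are drained in two cycles of five and two operations, each followed by the short delay; the third cycle finds nothing and waits for the long delay.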

static/csv/agent/agent-plugin-config-options.csv

+4 -4
@@ -8,7 +8,7 @@ Setting|Type|Default|Description
 <code>kubesvc.cache.namespaceExpiryMinutes</code>|integer|0|Disabled by default, set it to a value greater than 0 to enable. Specifies minutes to keep namespace definitions in memory to reduce calls to the database.
 <code>kubesvc.cache.onDemandQuickWaitMs</code>|integer|10000|How long to wait for a recache operation.
 <code>kubesvc.cache.operationWaitMs</code>|integer|30000|How long to wait for a Kubernetes operation like deploy, scale, delete, or others
-<code>kubesvc.cluster</code>|string|none|Type of clustering.<br><code>local</code>: for development only; don’t try to coordinate with other Clouddriver instances<br><code>redis</code>: use Redis to coordinate via pubsub. Redis will be deprecated in a future release.<br><span class='badge badge-primary'>0.10.24+</span><span class='badge badge-primary'>0.9.40</span><span class='badge badge-primary'>0.8.48</span> <code>kubernetes</code>:(Recommended) Requires additional <code>cluster-kubernetes</code> configuration.
+<code>kubesvc.cluster</code>|string|none|Type of clustering.<br><code>local</code>: for development only; don’t try to coordinate with other Clouddriver instances<br><code>redis</code>: use Redis to coordinate via pubsub. Redis will be deprecated in a future release.<br><span class='badge badge-primary'>0.10.24+</span><span class='badge badge-primary'>0.9.40</span><span class='badge badge-primary'>0.8.48</span> <code>kubernetes</code>:(Recommended) Requires additional <code>cluster-kubernetes</code> configuration.<br><span class='badge badge-primary'>0.13.19+</span><span class='badge badge-primary'>0.12.20+</span><span class='badge badge-primary'>0.11.56+</span> <code>database</code>: Makes database act like a queue to coordinate, improves operations distribution, requires additional <code>operations.database.scan</code> configuration.
 <code>kubesvc.cluster-kubernetes.kubeconfigFile</code><br><code>kubesvc.cluster-kubernetes.verifySsl</code><br><code>kubesvc.cluster-kubernetes.namespace</code><br><code>kubesvc.cluster-kubernetes.httpPortName</code><br><code>kubesvc.cluster-kubernetes.clouddriverServiceNamePrefix</code>|string<br>boolean<br>string<br>string<br>string<br>|null<br>true<br>null<br>http<br>spin-clouddriver|(Optional) If configured, the plugin uses this file to discover Endpoints. If not configured, it will use the service account mounted to the pod.<br>(Optional) Whether to verify the Kubernetes API cert or not.<br>(Optional) If configured, the plugin watches Endpoints in this namespace. If null, it watches endpoints in the namespace indicated in the file <code>/var/run/secrets/kubernetes.io/serviceaccount/namespace</code><br>(Optional) Name of the port configured in clouddriver Service that forwards traffic to clouddriver http port for REST requests.<br>(Optional) Name prefix of the Kubernetes Service pointing to the Clouddriver standard HTTP port.
 <code>kubesvc.credentials.poller.reloadFrequencyMs</code>|long|30000|<span class='badge badge-primary'>2.23.0+</span> <span class='badge badge-primary'>1.23.0+</span> How often the plugin will refresh account credentials to clouddriver in case <code>credentials.poller.enabled</code> is disabled. Otherwise the standard properties of <code>credentials.poller.enabled</code> and <code>credentials.poller.types.kubernetes.reloadFrequencyMs</code> are respected
 <code>kubesvc.disableV2Provider</code>|boolean|false|If you don’t need the V2 provider account, set that to true to speed up caching deserialization.
@@ -41,6 +41,6 @@ Setting|Type|Default|Description
 <code>kubesvc.v2-cache-eviction.batch-size</code>|integer|5|<span class='badge badge-primary'>0.10.3+</span> How many Kubernetes kinds to evict for each eviction event.
 <code>kubesvc.v2-cache-eviction.millis</code>|integer|200|<span class='badge badge-primary'>0.10.3+</span> The time between evictions in milliseconds. Using a low value can lead to a spike in resource usage when migration occurs.
 <code>kubesvc.ops.processTime.metric.result.maxLength</code>|integer|255|How many characters as a maximum could have the <code>kubesvc.ops.processTime.result</code> attribute metric
-
-
-
+<code>kubesvc.operations.database.scan.batchSize</code>|integer|5|The max number of operations that could be assigned to a Scale Agent instance per cycle
+<code>kubesvc.operations.database.scan.initialDelay</code>|integer|250|Milliseconds to wait per cycle, when there are pending operations
+<code>kubesvc.operations.database.scan.maxDelay</code>|integer|2000|Milliseconds to wait per cycle, when there are not pending operations
