feat: add network delay scenario and automated test run by tag #66

Open
wants to merge 5 commits into
base: main
1 change: 0 additions & 1 deletion .gitignore
@@ -2,7 +2,6 @@ target/
*.parquet
.vscode
/analyze
*.yaml
*.yml
.env*
/book
80 changes: 80 additions & 0 deletions scenarios/README.md
@@ -0,0 +1,80 @@
# Custom cluster testing

These scenarios require a gcloud cluster that is already set up with Istio installed. Check first whether such a cluster is already in place. If your cluster is not set up yet, follow the one-time setup section at the end of this document.

## Once your cluster is set up

Option 1: Run the script on a public tag

`python exer_image.py [js-ceramic tag]`
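
The script derives the network label and the namespace from the tag you pass in. A minimal sketch of that derivation, mirroring the `re.sub` call in `exer_image.py`; the function names `label_for_tag` and `namespace_for_tag` are hypothetical and not part of the script:

```python
import re


def label_for_tag(img_tag: str) -> str:
    """Build the network label, replacing dots so the name is valid in Kubernetes."""
    label = "load-with-network-errors-for-" + img_tag
    return re.sub(r"\.", "-", label)


def namespace_for_tag(img_tag: str) -> str:
    """The scenario files use a namespace of the form keramik-<label>."""
    return "keramik-" + label_for_tag(img_tag)


print(namespace_for_tag("2.34.0"))
```

For the tag `2.34.0` this yields the namespace used in `write-only.yaml` and `delay-cas.yaml`.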

Option 2: Run the workflow manually

```
kubectl config set-context --current --namespace=keramik

# edit network-with-cas.yaml to specify the desired image
# edit the metadata name (the label shown in datadog) accordingly
kubectl apply -f network-with-cas.yaml # defines the ceramic version

kubectl config set-context --current --namespace=keramik-[your label]

kubectl edit statefulsets cas

# add these entries to the env list of the cas container:
- name: SQS_QUEUE_URL
  value: ""
- name: MERKLE_CAR_STORAGE_MODE
  value: disabled
# end of entries to add

kubectl label namespace keramik-[your label] istio-injection=enabled

# edit delay-cas.yaml so the cas host matches the namespace
kubectl apply -f delay-cas.yaml

# edit write-only.yaml to match the namespace
kubectl apply -f write-only.yaml # runs the simulation

```
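
The interactive `edit statefulsets cas` step can also be done without an editor. The sketch below builds the same JSON patch that `exer_image.py` applies and prints the resulting `kubectl patch` command instead of running it. The helper names `build_cas_env_patch` and `patch_command` are hypothetical, and the patch assumes the first container already has an `env` list to append to:

```python
import json
import shlex


def build_cas_env_patch() -> list:
    """JSON Patch operations that append the two env vars to the first container."""
    env = {"SQS_QUEUE_URL": "", "MERKLE_CAR_STORAGE_MODE": "disabled"}
    return [
        {
            "op": "add",
            "path": "/spec/template/spec/containers/0/env/-",
            "value": {"name": name, "value": value},
        }
        for name, value in env.items()
    ]


def patch_command(namespace: str) -> str:
    """Render the kubectl command as a string; nothing is executed here."""
    patch = json.dumps(build_cas_env_patch())
    return "kubectl patch statefulset cas -n {} --type=json -p {}".format(
        shlex.quote(namespace), shlex.quote(patch)
    )


print(patch_command("keramik-[your label]"))
```

Printing the command first makes it easy to review the patch before applying it to the cluster.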

To see the results, go to https://us3.datadoghq.com/apm/home

Datadog -> APM -> (pick the name and click it) -> Service Overview


## One-time setup of your testing cluster

```
gcloud config set project box-benchmarking-ipfs-testing

gcloud config set compute/zone us-central1-c

gcloud container clusters create [your cluster]

gcloud container node-pools create e2-standard-4 --cluster [your cluster] \
--machine-type=e2-standard-4 --num-nodes=3

# one time get credentials into kubectl
gcloud container clusters get-credentials [your cluster]

# if not already installed, see https://istio.io/
# curl -L https://istio.io/downloadIstio | sh -

# install istio virtual network overlay
istioctl install --set profile=demo

# set up namespace for datadog
kubectl create ns datadog-operator
# assumes the Datadog Helm repo was added: helm repo add datadog https://helm.datadoghq.com
helm install -n datadog-operator datadog-operator datadog/datadog-operator
# add the creds to the datadog namespace
kubectl create secret generic datadog-secret --from-literal=api-key=<YOUR API-KEY> \
--from-literal=app-key=<YOUR APP-KEY> -n datadog-operator

kubectl apply -f datadogAgent.yaml -n datadog-operator

# set up keramik
kubectl create ns keramik
cargo run --bin crdgen | kubectl create -f -
kubectl apply -k k8s/operator/

```
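
The setup commands above rely on several command-line tools. As a quick pre-flight check, the following sketch reports which of them are missing from your `PATH`. The tool list is read off the commands in this section, and the helper name `missing_tools` is hypothetical:

```python
import shutil

# CLI tools invoked by the one-time setup commands.
REQUIRED_TOOLS = ["gcloud", "kubectl", "istioctl", "helm", "cargo"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing tools: " + ", ".join(missing))
else:
    print("All required tools found.")
```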
11 changes: 11 additions & 0 deletions scenarios/basic.yaml
@@ -0,0 +1,11 @@
# basic.yaml
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Simulation
metadata:
  name: basic
  namespace: keramik-db
spec:
  scenario: ceramic-write-only
  users: 1000
  run_time: 30
34 changes: 34 additions & 0 deletions scenarios/datadogAgent.yaml
@@ -0,0 +1,34 @@
kind: DatadogAgent
apiVersion: datadoghq.com/v2alpha1
metadata:
name: datadog
spec:
global:
kubelet:
tlsVerify: false
clusterName: gke_ipfs-ceramic-service-headless
kubelet:
tlsVerify: false
site: us3.datadoghq.com
credentials:
apiSecret:
secretName: datadog-secret
keyName: api-key
appSecret:
secretName: datadog-secret
keyName: app-key
features:
npm:
enabled: true
apm:
enabled: true
hostPortConfig:
enabled: true
admissionController:
enabled: true
mutateUnlabelled: false
otlp:
receiver:
protocols:
grpc:
endpoint: 0.0.0.0:4317
21 changes: 21 additions & 0 deletions scenarios/delay-cas.yaml
@@ -0,0 +1,21 @@
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: delay-cas
spec:
  hosts:
  - cas.keramik-load-with-network-errors-for-2-34-0.svc.cluster.local
  http:
  - match:
    - port: 8081
    fault:
      delay:
        percent: 100
        fixedDelay: 10s
      abort:
        httpStatus: 500
        percentage:
          value: 10
    route:
    - destination:
        host: cas.keramik-load-with-network-errors-for-2-34-0.svc.cluster.local
104 changes: 104 additions & 0 deletions scenarios/exer_image.py
@@ -0,0 +1,104 @@
import os
import subprocess
import sys
import re
from time import sleep


try:
    img_tag = sys.argv[1]
except IndexError:
    print("Image tag is required.")
    print("Choose an available tag from https://hub.docker.com/r/ceramicnetwork/js-ceramic/tags")
    # exit with a non-zero status so callers can detect the failure
    sys.exit(1)

label = 'load-with-network-errors-for-' + img_tag

# replace for valid chars only
label = re.sub(r'\.', '-', label)

os.system('kubectl config set-context --current --namespace=keramik')

# set the image tag
os.system("perl -pi -e 's/js-ceramic:.*$/js-ceramic:{}/g' network-with-cas.yaml".format(img_tag))

# apply the label to the network config
os.system("perl -pi -e 's/^ name:.*$/ name: {}/g' network-with-cas.yaml".format(label))

# apply the label to the simulation
os.system("perl -pi -e 's/^ namespace:.*$/ namespace: keramik-{}/g' write-only.yaml".format(label))

# apply the label to the delay config
# (raw string so the backslashes reach perl without invalid-escape warnings)
os.system(r"perl -pi -e 's/cas\..*\.svc\.cluster\.local/cas.keramik-{}.svc.cluster.local/g' delay-cas.yaml".format(label))

# create the network
os.system('kubectl apply -f network-with-cas.yaml')

# switch to the network namespace
os.system('kubectl config set-context --current --namespace=keramik-{}'.format(label))

do_edit = """

kubectl patch statefulset cas --type='json' -p='[
{
"op": "add",
"path": "/spec/template/spec/containers/0/env/-",
"value": {"name": "SQS_QUEUE_URL", "value": ""}
},
{
"op": "add",
"path": "/spec/template/spec/containers/0/env/-",
"value": {"name": "MERKLE_CAR_STORAGE_MODE", "value": "disabled"}
}
]'

"""

os.system(do_edit)

os.system('kubectl label namespace keramik-{} istio-injection=enabled'.format(label))

os.system('kubectl apply -f delay-cas.yaml')

# restart the pods to make sure the delays are applied
os.system('kubectl delete pod ceramic-0 -n keramik-{}'.format(label))
os.system('kubectl delete pod cas-0 -n keramik-{}'.format(label))

# sleep after pods start to avoid initialization issues
sleep(90)

os.system('kubectl apply -f write-only.yaml')

sleep(60)

# check for errors, restart if needed
def get_good_pod():
    # name of a simulate-manager pod that is not in the Error state, or '' if none
    command = "kubectl get pods | grep 'simulate-manager' | grep -v 'Error' | awk '{print $1}'"
    pod_name = subprocess.check_output(command, shell=True).decode('utf-8').strip()
    return pod_name

pod_name = get_good_pod()
num_errors = 0
while not pod_name:
    num_errors += 1
    print("Simulation failed to start, restarting")
    os.system('kubectl delete -f write-only.yaml')
    sleep(30)
    os.system('kubectl apply -f write-only.yaml')
    sleep(60)
    pod_name = get_good_pod()

    if not pod_name and num_errors > 3:
        print("Too many errors, terminating run for " + label)
        # non-zero status signals that the run did not start
        sys.exit(1)

print("Running simulation for " + label)

print("See https://us3.datadoghq.com/apm/home for results")

print("To clean up in 15 minutes, run `kubectl delete -f network-with-cas.yaml`")




34 changes: 34 additions & 0 deletions scenarios/network-with-cas.yaml
@@ -0,0 +1,34 @@
# network configuration
---
apiVersion: "keramik.3box.io/v1alpha1"
kind: Network
metadata:
# this is what the label will be in datadog
# add the version of ceramic being tested here
name: load-with-network-errors-for-2-34-0
spec:
replicas: 3
datadog:
enabled: true
version: "normal-cas"
profilingEnabled: true
cas:
image: ceramicnetwork/ceramic-anchor-service:latest
casResourceLimits:
cpu: "2000m"
memory: "1Gi"
ceramic:
image: ceramicnetwork/js-ceramic:2.34.0
imagePullPolicy: Always
resourceLimits:
cpu: "2000m"
memory: "2Gi"
ipfs:
go:
image: ceramicnetwork/go-ipfs-daemon:develop
imagePullPolicy: IfNotPresent
resourceLimits:
cpu: "1000m"
memory: "512M"
#commands:
# - ipfs config Routing.Type none
9 changes: 9 additions & 0 deletions scenarios/write-only.yaml
@@ -0,0 +1,9 @@
apiVersion: "keramik.3box.io/v1alpha1"
kind: Simulation
metadata:
  name: ceramic-write-only-simulation
  namespace: keramik-load-with-network-errors-for-2-34-0
spec:
  scenario: ceramic-write-only
  users: 3000
  run_time: 75