For the EIDA Technical Committee and the EIDA Management Board, which need to improve the quality of their services, Oculus is a central monitoring and alerting system that tests all the services at EIDA nodes. Unlike the previous situation, in which monitoring was scattered and uneven, Oculus will provide a global view of service status, along with indicators for tracking how service quality evolves.
- How to monitor a new thing
- Deploying Oculus Zabbix and Grafana on Kubernetes using Helm
- Zabbix configuration
- Deploying Oculus Grafana
So you would like to monitor something related to the EIDA federation?
Please create a new issue using the template "New Monitoring".
To edit the values for a node, see this procedure.
- Kubernetes cluster (version 1.20 or later) configured and running
- kubectl installed and configured
- git installed and configured
- Helm CLI (version 3 or later) installed https://helm.sh/docs/intro/install
- helm-secrets plugin https://github.com/jkroepke/helm-secrets
- SOPS https://github.com/getsops/sops
- Sufficient resources in the cluster to run Zabbix components
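The command-line prerequisites above can be checked in one pass before starting. The following is a minimal sketch for a POSIX shell; `missing_tools` is a hypothetical helper name and is not part of the Oculus repository.

```shell
# Print the name of every command in the argument list that is not on PATH.
# missing_tools is a hypothetical helper, not part of the Oculus repository.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage (prints nothing when everything is installed):
#   missing_tools kubectl git helm sops
# The helm-secrets plugin is not a standalone command; list Helm plugins with:
#   helm plugin list
```

This only confirms that the binaries are present. It does not check versions or cluster access.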
git clone https://github.com/EIDA/oculus-monitoring-backend
cd zabbix_server/helm_values
helm repo add zabbix-community https://zabbix-community.github.io/helm-zabbix
helm repo update
kubectl create namespace eida-monitoring
CREATE USER oculus WITH PASSWORD '{password}';
CREATE DATABASE oculus_zabbix OWNER oculus;
We recommend using pgcli.
Usage:
pgcli postgres://{user}@{netloc}/{dbname}
Example:
pgcli postgres://[email protected]/oculus_zabbix
cd oculus-monitoring-backend/zabbix_server/helm_values
sops decrypt values.yaml
/!\ TODO
Apply Helm Chart
export ZABBIX_CHART_VERSION='7.0.6'
helm secrets upgrade --install oculus-zabbix zabbix-community/zabbix \
--dependency-update \
--version $ZABBIX_CHART_VERSION \
-f values.yaml -n eida-monitoring --debug
- Port forward
kubectl port-forward service/oculus-zabbix-zabbix-web 8888:80 -n eida-monitoring
- Default credentials:
- Username: Admin
- Password: zabbix
Create a configuration file for each agent in oculus-zbx-agent-deployments (for instance epos-france.yaml).
Set the content according to this template:
Complete template example here
---
node: Epos-France # Will be used as identifier for the agent
endpoint: ws.resif.fr # The endpoint to test
routingFile: routing/eida_routing.xml
onlineCheck: # Set default test parameters for each service
net: FR
sta: CIEL
loc: "00"
cha: HHZ
start: 2025-02-01T00:00:00
end: 2025-02-01T00:00:05
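To see what the onlineCheck values describe, they can be turned into a request by hand. The sketch below assumes the agent's online check corresponds to a standard FDSN dataselect query served over HTTPS at the endpoint. That is an assumption made for illustration, not something stated in the Oculus sources, and `dataselect_url` is a hypothetical helper.

```shell
# Build an FDSN dataselect query URL from an endpoint and the onlineCheck values.
# Assumption: the endpoint serves the standard /fdsnws/dataselect/1/query path
# over HTTPS. dataselect_url is a hypothetical helper, not part of Oculus.
dataselect_url() {
    endpoint=$1 net=$2 sta=$3 loc=$4 cha=$5 start=$6 end=$7
    printf 'https://%s/fdsnws/dataselect/1/query?net=%s&sta=%s&loc=%s&cha=%s&start=%s&end=%s\n' \
        "$endpoint" "$net" "$sta" "$loc" "$cha" "$start" "$end"
}

# Usage with the values from the template above:
#   url=$(dataselect_url ws.resif.fr FR CIEL 00 HHZ 2025-02-01T00:00:00 2025-02-01T00:00:05)
#   curl -sf -o /dev/null "$url" && echo "data available"
```

If the request returns no data for the chosen station and time window, pick values that are known to be archived at the node.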
Then deploy (or update) the agent using helm:
helm upgrade -i epos-france oculus-zbx-agent --set-file zbx_lld=oculus-zbx-agent-deployments/epos-france.yaml -n eida-monitoring
for f in $(find oculus-zbx-agent-deployments -type f); do name=$(basename $f|cut -f1 -d'.'); echo $name; echo $f; helm upgrade -i $name oculus-zbx-agent --set-file zbx_lld=$f -n eida-monitoring; done
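When many agents are deployed at once, it can help to preview the Helm commands first. The sketch below is a variant of the loop above, under two assumptions: configuration files end in .yaml and sit directly in the directory (the original loop uses find and picks up every file recursively). The function names `release_name` and `deploy_agents` and the `DRY_RUN` variable are hypothetical.

```shell
# Derive the Helm release name from a config path: the file name up to the
# first dot, like `basename | cut -f1 -d.` in the loop above.
release_name() {
    base=${1##*/}
    printf '%s\n' "${base%%.*}"
}

# Deploy every *.yaml agent config found directly in a directory.
# With DRY_RUN=1 the helm commands are printed instead of executed.
deploy_agents() {
    dir=${1:-oculus-zbx-agent-deployments}
    for f in "$dir"/*.yaml; do
        [ -f "$f" ] || continue
        name=$(release_name "$f")
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "helm upgrade -i $name oculus-zbx-agent --set-file zbx_lld=$f -n eida-monitoring"
        else
            helm upgrade -i "$name" oculus-zbx-agent --set-file "zbx_lld=$f" -n eida-monitoring
        fi
    done
}

# Usage:
#   DRY_RUN=1 deploy_agents    # preview the commands
#   deploy_agents              # deploy or update all agents
```

Quoting the paths keeps the loop safe for file names that contain spaces.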
Go to "Data collection > Templates"
- Select "Import" in the top right corner and select the file "zbx_export_templates.yaml" (or "zbx_export_templates_discovery.xml" and "zbx_export_web_templates.yaml"), located in zabbix_server/templates
- Rules: all checked
- Click on "Import"
Go to "Alerts > Actions > Autoregistration actions" and create a new action with the following parameters:
- Action:
- Name: EIDA nodes autoregistration
- Enabled: checked
- Operations:
- Add host
- Add to host groups: Discovered hosts
- Link templates: Template discovery (Templates/EIDA)
- Link templates: Linux by Zabbix agent (Templates/Operating Systems)
- Enable host
- Click "Add"
To activate email notifications for triggers:
- Go to "Alerts > Media types", click "Email", and configure it with your SMTP server, username, password, etc.
- Enabled: checked
- Click "Update"
To deploy the playbook, you need to have Ansible installed.
Groups must be created for each EIDA node, as well as users.
cd ansible/playbooks
ansible-playbook create_users.yaml
- Go to "Alerts > Actions > Trigger actions"
- Click "Create action"
- Name: Reports problems
- Type of calculation: And/or
- Conditions, click "Add"
- Type: Trigger
- Operator: equals
- Trigger source: Template
- Triggers: click "Select"
- Select "Template/EIDA > Template Webservice", select all
- Click "Add"
- Enabled: checked
- Click "Operations"
- Default operations step duration: 1h
- Operations (create a step for each EIDA node): click "Add"
- Steps: 1 - 1
- Step duration: 0
- Send to user groups > Select {EIDA_nodes_name}
- Send only to: Email
- Click "Add"
- Update operations (create a step for each EIDA node): click "Add"
- Operation: Send message
- Send to user groups > Select {EIDA_nodes_name}
- Send only to: Email
- Click "Add"
- Pause operations for symptom problems: checked
- Pause operations for suppressed problems: checked
- Notify about canceled escalations: checked
- Click "Add"
- Check if status is "Enabled"
- Kubernetes cluster (version 1.20 or later) configured and running
- kubectl installed and configured
- git installed and configured
- Helm CLI (version 3 or later) installed https://helm.sh/docs/intro/install
- helm-secrets plugin https://github.com/jkroepke/helm-secrets
- SOPS https://github.com/getsops/sops
- Sufficient resources in the cluster to run Grafana components
git clone https://github.com/EIDA/oculus-monitoring-backend
cd grafana_server/helm_values
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
cd oculus-monitoring-backend/grafana_server/helm_values
sops decrypt values.yaml
helm secrets upgrade --install oculus-grafana grafana/grafana \
-f values.yaml -n eida-monitoring
- Port forward
kubectl port-forward service/oculus-grafana 3000:3000 -n eida-monitoring
- localhost:3000
- Default credentials:
- Username: /
- Password: /
You must first create a new user and user groups for Grafana in Zabbix.
Go to "Users > User groups"
- Click "Create user group"
- Group name: API-RO
- Enabled: checked
- Click "Template permissions"
- Click "Add"
- Click "Select"
- Select: All Template groups
- Click "Select"
- Permissions: Read
- Click "Host permissions"
- Click "Add"
- Click "Select"
- Select: All Host groups
- Click "Select"
- Click "Problem tag filter"
- Click "Add"
- Click "Select"
- Select: All Host groups EXCEPT "Application", "Databases", "Hypervisors", "Linux servers", "Virtual machines" and "Zabbix servers"
- Click "Select"
- Click "Add"
- Click "Update"
Go to "Users > Users"
- Click "Create User"
- Username: grafana
- Groups: API-RO and No access to the frontend
- Password: {passwd_user_grafana}
- Click "Permissions"
- Role: Select "User role"
- Click "Add"
Go to "Users > API token"
- Click "Create API token"
- Name: grafana
- User: grafana
- Set expiration date and time: unchecked
- Enabled: checked
- Click "Add"
- Copy the {auth_token}
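Before configuring Grafana, the token can be tested against the Zabbix API. The sketch below assumes the Zabbix web port-forward on local port 8888 is still running and that the Zabbix version accepts API tokens in an `Authorization: Bearer` header (Zabbix 6.4 or later). The helper names and the `ZABBIX_URL` and `ZABBIX_TOKEN` variables are hypothetical.

```shell
# Build a JSON-RPC 2.0 request body for a Zabbix API method.
# $1 = method name, $2 = params as a JSON object (defaults to {}).
zabbix_payload() {
    params=${2:-}
    [ -n "$params" ] || params='{}'
    printf '{"jsonrpc":"2.0","method":"%s","params":%s,"id":1}\n' "$1" "$params"
}

# Send the request, authenticating with the API token.
# Assumes ZABBIX_TOKEN holds the {auth_token} copied above and that the
# Zabbix frontend is reachable through the port-forward on localhost:8888.
zabbix_call() {
    curl -sS -X POST "${ZABBIX_URL:-http://localhost:8888/api_jsonrpc.php}" \
        -H 'Content-Type: application/json-rpc' \
        -H "Authorization: Bearer $ZABBIX_TOKEN" \
        -d "$(zabbix_payload "$@")"
}

# Usage: list the hosts visible to the grafana user.
#   ZABBIX_TOKEN='{auth_token}' zabbix_call host.get '{"output":["host"]}'
```

A response containing a "result" array indicates that the token and the API-RO permissions work. An "error" object usually points to a wrong token or missing host group permissions.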
Normally, the Zabbix plugin is installed, but if this is not the case, install it manually:
Go to "Administration > General > Plugins and data"
- Plugins:
- Search "Zabbix"
- Click "Install"
Go to "Connections > Data sources"
- Click "+ Add new data source"
- Select "Zabbix"
- Rename it to "oculus-zabbix-datasource"
- Connection:
- URL: http://oculus-zabbix-zabbix-web:8888/api_jsonrpc.php
- Authentication
- Select "Basic authentication"
- User: grafana
- Password: {passwd_user_grafana}
- Zabbix Connection
- API token: {auth_token}
- Trends: Enable
- Click "Save & test"