Collector remains in failed state after disk full #308
Hi,
Thank you very much for your super-fast reply.
Hi, I wrote a shell script that tries to restart failing collectors through the Graylog API. Before use
#!/usr/bin/env bash
set -o errexit
set -o nounset
#set -o xtrace
###
## Redirect ALL output to a file
## (named LOG_FILE so it does not shadow the LOG function below)
## LOG_FILE="[log file location]"
## exec >> "${LOG_FILE}" 2>&1
###
GL_USER='janos.vincze'
GL_PASS=''
GL_HOST=''
GL_PORT='80'
function LOG() {
  echo "[ $(date '+%F %T') ] - ${1}"
}
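TMPFILE_COLLECTORS=$( mktemp /tmp/gl-tmp.XXXXXXXXX )
TMPFILE_FAILED=$( mktemp /tmp/gl-tmp.XXXXXXXXXX )
# Remove the temporary files even if errexit aborts the script early
trap 'rm -f "${TMPFILE_COLLECTORS}" "${TMPFILE_FAILED}"' EXIT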
i=0
LOG "========================= SCRIPT STARTED ========================="
LOG "Query All Collectors And Status"
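# -sS: no progress meter, but still report errors; --fail: exit non-zero on HTTP errors (e.g. 401)
curl -sS --fail "http://${GL_USER}:${GL_PASS}@${GL_HOST}:${GL_PORT}/api/plugins/org.graylog.plugins.collector/collectors" > "${TMPFILE_COLLECTORS}"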
LOG "Collecting collectors whose filebeat backend status is non-zero although the collector itself is ACTIVE"
jq -r '.collectors[] | select(.active == true and .node_details.status.backends.filebeat.status != 0) | .id' "${TMPFILE_COLLECTORS}" > "${TMPFILE_FAILED}"
while IFS='' read -r ID || [[ -n "$ID" ]]; do
  # Look up the node name of this collector; --arg passes the ID safely into jq
  NODE_NAME=$( jq -r --arg id "${ID}" '.collectors[] | select(.id == $id) | .node_id' "${TMPFILE_COLLECTORS}" )
  LOG "Restarting collector sidecar on node: ${NODE_NAME} (ID: ${ID})"
  LOG "###### RESPONSE ######"
  # Ask Graylog to restart the filebeat backend of this collector and prefix each response line
  curl -sS -i -X PUT "http://${GL_USER}:${GL_PASS}@${GL_HOST}:${GL_PORT}/api/plugins/org.graylog.plugins.collector/collectors/${ID}/action" \
    -H 'Content-Type: application/json' \
    -d '[{"backend": "filebeat", "properties": {"restart": true}}]' \
    | while IFS= read -r line; do echo "--------------------------> ${line}"; done
  LOG "######################"
  i=$((i+1))
done < "${TMPFILE_FAILED}"
rm -f "${TMPFILE_COLLECTORS}" "${TMPFILE_FAILED}"
if [ "${i}" -eq 0 ]; then
  LOG "Yuuupi, there are no failing collectors"
else
  LOG "There were ${i} failed collectors"
fi
LOG "========================= SCRIPT FINISHED ========================="
Best Regards,
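The request body in the script hard-codes the `filebeat` backend. As a minimal sketch, a helper (the name `restart_payload` is hypothetical) can print the same JSON body for any backend name, which also keeps the `-d` argument on a single line. It assumes the backend name needs no JSON escaping, and that the action endpoint accepts other backend names in the same shape, which is an assumption rather than something confirmed here; the jq filter in the script is filebeat-specific as well and would need the same change.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the JSON body the script PUTs to
# .../collectors/${ID}/action, with the backend name as a parameter.
# Assumption: the backend name contains no characters that need JSON escaping.
restart_payload() {
  local backend="$1"
  printf '[{"backend":"%s","properties":{"restart":true}}]' "$backend"
}

# Usage inside the loop, in place of the inline JSON literal ($URL is a placeholder):
#   curl -sS -i -X PUT "$URL" -H 'Content-Type: application/json' -d "$(restart_payload filebeat)"
restart_payload filebeat
echo
```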
Hi,
Problem description
If the host runs out of disk space ("no space left on device"), the collector stays in a failed state even after some space has been freed, until it is manually restarted from the Graylog web interface.
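A restart triggered while the disk is still full may simply fail again, so any automated restart could be gated on free space. A minimal sketch, assuming the relevant filesystem is the one holding the given path: the function name `has_free_space` and the 100 MiB threshold are illustrative, and it relies only on POSIX `df -P -k`, whose second output line is the data row and whose fourth column is the available space in 1024-byte blocks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the filesystem holding PATH has at
# least MIN_KB kilobytes available (POSIX `df -P -k`: line 2, column 4).
has_free_space() {
  local path="$1" min_kb="$2" avail_kb
  avail_kb=$(df -P -k "$path" | awk 'NR == 2 { print $4 }')
  # Fails if df printed no data row (e.g. the path does not exist)
  [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$min_kb" ]
}

# Example: only attempt a restart once the root filesystem has >= 100 MiB free
if has_free_space / 102400; then
  echo "enough free space, a restart can be attempted"
else
  echo "still low on disk space, skipping restart"
fi
```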
Steps to reproduce the problem
Environment
Do you have any suggestion?
How can I configure the sidecar and/or Filebeat so that they do not give up restarting after 3 tries?
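Independently of any sidecar or Filebeat setting, the retry policy can be kept outside the sidecar: wrap whatever triggers the restart in a loop that retries with exponential backoff and, if asked to, never gives up. This is a generic Bash sketch, not a sidecar configuration option; `retry_with_backoff` and its arguments are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds, doubling the delay
# between attempts up to MAX_DELAY seconds. MAX_ATTEMPTS=0 means retry forever.
# Usage: retry_with_backoff MAX_ATTEMPTS MAX_DELAY command [args...]
retry_with_backoff() {
  local max_attempts="$1" max_delay="$2"
  shift 2
  local attempt=1
  local delay=$(( max_delay < 1 ? max_delay : 1 ))
  until "$@"; do
    if [ "$max_attempts" -ne 0 ] && [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
    # Exponential backoff, capped at max_delay
    delay=$(( delay * 2 > max_delay ? max_delay : delay * 2 ))
  done
  return 0
}
```

For example, `retry_with_backoff 0 300 some_restart_command` (a placeholder command) would keep retrying indefinitely with the delay capped at five minutes. Note that the restart script above exits 0 even when collectors failed, so the wrapped command would need to signal failure through its exit status for this to help.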
Thank you in advance,
Janos Vincze