Collector remains in failed state after disk full #308

Open
jvincze84 opened this issue Nov 9, 2018 · 3 comments

Comments

@jvincze84

Hi,

Problem description

If the host runs out of disk space (no space left on device), the collector goes into the Failed state and stays there even after space is freed up, until it is manually restarted from the Graylog web interface.

Steps to reproduce the problem

  1. Wait until the disk is full.
  2. Free up some space.
  3. Check the collector status in the Graylog web interface. It should be in the Failed state.
  4. After restarting the collector, it will be in the Running state.

Environment

  • Sidecar Version: 0.1.5
  • Graylog Version: Graylog 2.4.4+4659dbe
  • Operating System: Red Hat Enterprise Linux Server release 7.1 (Maipo)
  • Elasticsearch Version: 5.6.5
  • MongoDB Version: mongodb-linux-x86_64-rhel70-3.6.2

Do you have any suggestions?
How can I configure Sidecar and/or Filebeat so that it does not give up after 3 restart attempts?

Thank you in advance,
Janos Vincze

@mariussturm
Contributor

Hi,
thanks for the feedback!
Currently there is no way to tell the Sidecar to keep trying to restart the collector indefinitely. But we could consider adding this in the next major release.

@jvincze84
Author

Thank you very much for your super fast reply.

@jvincze84
Author

jvincze84 commented Nov 10, 2018

Hi,

I wrote a shell script that tries to restart failing collectors through the Graylog API.
It may not be the best solution, but I hope it helps someone else as well.
Be aware that this script is not fully tested.

The GL_* variables must be set before use.

#!/usr/bin/env bash
set -o errexit
set -o nounset
#set -o xtrace

###
## Redirect ALL output to a FILE
## LOG="[LOG file location]"  
## exec >> $LOG 2>&1  
###

GL_USER='janos.vincze'
GL_PASS=''
GL_HOST=''
GL_PORT='80'


function LOG() {
echo "[ $(date +%F\ %T) ]  - ${1}"
}


TMPFILE_COLLECTORS=$( mktemp /tmp/gl-tmp.XXXXXXXXX )
TMPFILE_FAILED=$( mktemp /tmp/gl-tmp.XXXXXXXXXX )
i=0

LOG "========================= SCRIPT STARTED ========================="
LOG "Query All Collectors And Status"
curl -s "http://${GL_USER}:${GL_PASS}@${GL_HOST}:${GL_PORT}/api/plugins/org.graylog.plugins.collector/collectors" > "${TMPFILE_COLLECTORS}"
LOG "Selecting active collectors whose filebeat backend status is not 0"
jq -r '.collectors[] | select(.active == true and .node_details.status.backends.filebeat.status != 0) | .id' "${TMPFILE_COLLECTORS}" > "${TMPFILE_FAILED}"

while IFS='' read -r ID || [[ -n "$ID" ]]; do
  # Look up the node name that belongs to this collector ID.
  NODE_NAME=$( jq -r --arg id "${ID}" '.collectors[] | select(.id == $id) | .node_id' "${TMPFILE_COLLECTORS}" )
  LOG "Restarting collector sidecar on node: ${NODE_NAME} (ID: ${ID})"
  LOG "###### RESPONSE ######"
  # Ask Graylog to restart the filebeat backend and print the HTTP response, prefixed for readability.
  curl -s -i -X PUT "http://${GL_USER}:${GL_PASS}@${GL_HOST}:${GL_PORT}/api/plugins/org.graylog.plugins.collector/collectors/${ID}/action" -H 'Content-Type: application/json' -d'
[
  {
    "backend": "filebeat",
    "properties": {
      "restart": true
    }
  }
]' | while IFS='' read -r line; do echo "--------------------------> $line"; done
  LOG "######################"

  i=$((i+1))

done < "${TMPFILE_FAILED}"



rm -f "${TMPFILE_COLLECTORS}" "${TMPFILE_FAILED}"
if [ "$i" -eq 0 ]; then
  LOG "Yuuupi, there are no failing collectors"
else
  LOG "There were $i failed collectors"
fi
LOG "========================= SCRIPT FINISHED ========================="
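
Since the script depends on the GL_* variables being filled in, a small guard placed right after the assignments could stop it before any API call is made with empty credentials. The following is only a sketch, assuming bash; `require_vars` is a hypothetical helper name and is not part of the script above.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in the original script):
# returns non-zero if any of the named variables is unset or empty.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion with an empty default,
    # so it does not trip `set -o nounset` for unset variables.
    if [[ -z "${!name:-}" ]]; then
      echo "Missing required variable: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

It could be called as `require_vars GL_USER GL_PASS GL_HOST GL_PORT || exit 1` before the first curl request. With the values shown above, where GL_PASS and GL_HOST are empty, the script would then exit immediately and name the missing variables on stderr.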

Best Regards,
Janos Vincze
