[Bug]: Cronicle service not starting after it has stopped #877

Open
Sederfo opened this issue Mar 3, 2025 · 7 comments

Sederfo commented Mar 3, 2025

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

The Cronicle service failed 2 days ago:

# systemctl status cronicle
× cronicle.service - Node Cronicle
     Loaded: loaded (/etc/systemd/system/cronicle.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Sat 2025-03-01 00:00:07 CET; 2 days ago
   Main PID: 1759 (code=exited, status=7)
        CPU: 10h 37min 3.109s

Mar 01 00:00:07 server systemd[1]: cronicle.service: Main process exited, code=exited, status=7/NOTRUNNING


I cannot start it at all:

[root@server:/opt/cronicle/bin$]
# ./control.sh status
status: Cronicle Server not running (pid 2891013?)
[root@server:/opt/cronicle/bin$]
# ./control.sh start
./control.sh start: Starting up Cronicle Server...
./control.sh start: Cronicle Server started
[root@server:/opt/cronicle/bin$]
# ./control.sh status
status: Cronicle Server not running (pid 2891862?)

This was the error I got when running:

# /opt/cronicle/bin/debug.sh
[1740989556.66][2025-03-03 09:12:36][server][2891955][Storage][debug][1][Beginning database recovery, see logs/recovery.log for details][]
Error: ENOSPC: no space left on device, open 'logs/recovery.log'
    at Object.writeFileSync (node:fs:2367:20)
    at Object.appendFileSync (node:fs:2448:6)
    at Logger.print (/opt/cronicle/node_modules/pixl-logger/logger.js:242:26)
    at constructor.logDebug (/opt/cronicle/node_modules/pixl-server/component.js:71:16)
    at /opt/cronicle/node_modules/pixl-server-storage/transaction.js:323:9
    at FSReqCallback.oncomplete (node:fs:187:23) {
  errno: -28,
  code: 'ENOSPC',
  syscall: 'open',
  path: 'logs/recovery.log'
}

The file it tried to open was logs/Storage.log, and it was really small (77 KB), but even nano could not open it; it gave me the same error.

Space is not a problem; we have enough free space on disk. What may be the issue? The service stopped on Saturday, and we lost a lot of important logs and actions.
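
For reference, ENOSPC can also be raised when the filesystem runs out of inodes rather than bytes, even if df -h shows plenty of free space. A quick check (mount point assumed to be wherever /opt/cronicle lives):

# Compare block usage with inode usage on the Cronicle volume
df -h /opt/cronicle     # free bytes
df -i /opt/cronicle     # free inodes; ENOSPC also fires when IFree reaches 0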

Operating System

Ubuntu 22.04

Node.js Version

v20.18.0

Cronicle Version

0.9.71

Server Setup

Single Server

Storage Setup

Local Filesystem

Relevant log output

Job logs at time of failure:
node:events:510
    throw err; // Unhandled 'error' event
    ^

Error [ERR_UNHANDLED_ERROR]: Unhandled error. ('Error in output stream: write EPIPE')
    at __construct.emit (node:events:508:17)
    at Socket.<anonymous> (/opt/cronicle/node_modules/pixl-json-stream/json-stream.js:99:10)
    at Socket.emit (node:events:519:28)
    at emitErrorNT (node:internal/streams/destroy:169:8)
    at emitErrorCloseNT (node:internal/streams/destroy:128:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  code: 'ERR_UNHANDLED_ERROR',
  context: 'Error in output stream: write EPIPE'
}

Node.js v20.18.0

Code of Conduct

  • I agree to follow this project's Code of Conduct
Sederfo added the bug label Mar 3, 2025
Sederfo (Author) commented Mar 3, 2025

After looking at previous issues, I found one similar to mine: #570. Apologies for the duplicate.

I would still need some assistance on this matter, though.

Image

As you can see, it's trying to write to a file that gives nano an error message.

This is the output (last 2 entries) of debug.sh:

[1740994458.568][2025-03-03 10:34:18][fr10204vmx][2926429][crash][debug][1][Emergency shutdown: Could not rollback transaction: _cleanup/2025/04/02: Failed to restore record: _cleanup/2025/04/02/0: Failed to write file: _cleanup/2025/04/02/0: data/_temp/fe4dc2dc998e9c47295cf9bf980b1761.json.tmp.1: ENOSPC: no space left on device, open 'data/_temp/fe4dc2dc998e9c47295cf9bf980b1761.json.tmp.1'][]
[1740994458.568][2025-03-03 10:34:18][fr10204vmx][2926429][Storage][debug][1][Exiting][]

Could we maybe convert this issue from a bug to a feature request? The request would be a configurable maximum number of log files that Cronicle can hold at any time, plus an option for each job to not write logs (or perhaps to stop writing once the file size exceeds X KB).
The current configuration completely eats away at the available inodes when the local filesystem is used as storage.
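
A rough way to see which parts of the data folder hold the most files (default install path assumed):

# Count files per top-level subfolder of the data dir to find the inode hogs
for d in /opt/cronicle/data/*/; do
  printf '%8d  %s\n' "$(find "$d" -type f | wc -l)" "$d"
done | sort -rn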

jhuckaby (Owner) commented Mar 3, 2025

It sounds like you're running a LOT of jobs, or you have a VERY small number of INODES on your disk. Either way, I highly suggest you decrease this configuration parameter:

https://github.com/jhuckaby/Cronicle/blob/master/docs/Configuration.md#job_data_expire_days

Set it very low, like 30 days or even lower, if you don't need long data retention for your jobs.
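
For example, something like this (default install paths assumed; the value is just an example):

# Check the current retention setting
grep job_data_expire_days /opt/cronicle/conf/config.json
# lower it in conf/config.json, e.g.:  "job_data_expire_days": 30,
# then restart so it takes effect
/opt/cronicle/bin/control.sh restart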

However, note that this parameter does not retroactively affect existing jobs, only new ones, so you may need to do an export, wipe (delete), then import:

https://github.com/jhuckaby/Cronicle/blob/master/docs/CommandLine.md#data-import-and-export

The data export / import does not include historical job data.
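
Roughly, the whole cycle looks like this (a sketch assuming the default install path; the setup step re-creates the base storage records after the wipe):

# Export schedule/users/config, wipe the storage, re-init, import, restart
/opt/cronicle/bin/control.sh stop
/opt/cronicle/bin/control.sh export /root/cronicle-backup.txt --verbose
rm -rf /opt/cronicle/data
/opt/cronicle/bin/control.sh setup
/opt/cronicle/bin/control.sh import /root/cronicle-backup.txt
/opt/cronicle/bin/control.sh start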

I'd highly recommend you do this anyway, because running out of disk space (or INODES) leaves Cronicle's "database" in an indeterminate / corrupted state.

Also, for a high job volume setup, please consider using something other than the local Filesystem. Cronicle can use S3 or any S3-compatible service (MinIO, etc.).

Good luck, and I'm very sorry you ran into this.

Sederfo (Author) commented Mar 4, 2025

I ended up removing files from /data and messed up the instance badly (haha), but I reinstalled it fresh. Indeed, we are running a lot of jobs, at least 5 every 5 minutes, and more will come, some at 2-minute intervals. I have set it up with Couchbase now; do you think it's a reliable solution for storing the logs? From what I have researched, Couchbase would not eat up the inodes as badly as the filesystem, and the only constraint would be space on the partition.
Thank you for the response and thank you for taking your time to write this amazing tool!

jhuckaby (Owner) commented Mar 4, 2025

Couchbase is very reliable in my experience. Just note that they have a 20 MB limit on object size (or at least they did the last time I checked). So make sure your job logs are smaller than 20 MB each 😊
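
A rough pre-migration check, assuming the old FS storage still lives under the default data path:

# List any stored objects (e.g. old job logs) larger than 20 MB
find /opt/cronicle/data -type f -size +20M -exec ls -lh {} +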

tkimmm commented Mar 13, 2025

Hi Joseph,

> However, note that this parameter does not retroactively affect existing jobs, only new ones, so you may need to do an export, wipe (delete), then import:

Are there any documented steps for a wipe (delete)? We've also had a crash due to inode exhaustion and would like to reset to reduce the inode usage, but we're hesitant to touch the data folder.

There also seem to be a lot of files in _cleanup; are these safe to remove?

Thanks for your continued efforts on this project

tkimmm commented Mar 13, 2025

I thought I would loop back, as I ended up testing and running the process myself (a rough command sketch of the steps follows the list):

  • Took an export of the Cronicle data using
    /opt/cronicle/bin/control.sh export /path/to/cronicle-data-backup.txt --verbose
  • I'm using FS storage, so I followed the migration instructions and pointed to a different folder in config.json
  • Stopped the Cronicle server and ran /opt/cronicle/bin/control.sh migrate
  • Renamed the NewStorage key in the JSON to Storage and removed the old storage reference
  • Checked functionality after the migrate; it was working fine for me with no loss of data
  • Started manually removing files in the old data folder. The main culprit for us was the jobs folder with all its nested data; once I started removing data from this folder, my inode usage decreased considerably
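
Roughly, the commands looked like this (example paths; the NewStorage block points the Filesystem engine at a fresh base_dir):

# Rough reconstruction of the steps above (example paths)
/opt/cronicle/bin/control.sh export /path/to/cronicle-data-backup.txt --verbose
/opt/cronicle/bin/control.sh stop
# add a "NewStorage" block in conf/config.json with the Filesystem engine
# pointing at a new base_dir (e.g. /opt/cronicle/data2), then migrate:
/opt/cronicle/bin/control.sh migrate
# rename "NewStorage" to "Storage" in config.json, drop the old block, restart:
/opt/cronicle/bin/control.sh start
# once everything checks out, reclaim inodes from the old data folder:
rm -rf /opt/cronicle/data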

jhuckaby (Owner) commented:

Very glad you were able to figure it out, and complete a successful migration!

> Are there any documented steps for a wipe (delete)?

Not really, it's just rm -rf /opt/cronicle/data after you export all the essential data.

> There also seem to be a lot of files in _cleanup; are these safe to remove?

Those files are part of a database table that tracks expirations of the job logs, so it knows when to delete them all. I would not remove those directly, unless you are wiping everything.
