[Bug]: Cronicle service not starting after it has stopped #877

Open
Sederfo opened this issue Mar 3, 2025 · 7 comments

Sederfo commented Mar 3, 2025

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

The Cronicle service failed 2 days ago:

# systemctl status cronicle
× cronicle.service - Node Cronicle
     Loaded: loaded (/etc/systemd/system/cronicle.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Sat 2025-03-01 00:00:07 CET; 2 days ago
   Main PID: 1759 (code=exited, status=7)
        CPU: 10h 37min 3.109s

Mar 01 00:00:07 server systemd[1]: cronicle.service: Main process exited, code=exited, status=7/NOTRUNNING


I cannot start it at all:

[root@server:/opt/cronicle/bin$]
# ./control.sh status
status: Cronicle Server not running (pid 2891013?)
[root@server:/opt/cronicle/bin$]
# ./control.sh start
./control.sh start: Starting up Cronicle Server...
./control.sh start: Cronicle Server started
[root@server:/opt/cronicle/bin$]
# ./control.sh status
status: Cronicle Server not running (pid 2891862?)

This was the error I got when running:

# /opt/cronicle/bin/debug.sh
[1740989556.66][2025-03-03 09:12:36][server][2891955][Storage][debug][1][Beginning database recovery, see logs/recovery.log for details][]
Error: ENOSPC: no space left on device, open 'logs/recovery.log'
    at Object.writeFileSync (node:fs:2367:20)
    at Object.appendFileSync (node:fs:2448:6)
    at Logger.print (/opt/cronicle/node_modules/pixl-logger/logger.js:242:26)
    at constructor.logDebug (/opt/cronicle/node_modules/pixl-server/component.js:71:16)
    at /opt/cronicle/node_modules/pixl-server-storage/transaction.js:323:9
    at FSReqCallback.oncomplete (node:fs:187:23) {
  errno: -28,
  code: 'ENOSPC',
  syscall: 'open',
  path: 'logs/recovery.log'
}

The file it tried to open was logs/Storage.log, and it was really small (77 KB), but even nano could not open it; it gave me the same error.

Space is not a problem; we have enough free space on disk. What may be the issue? The service stopped on Saturday, and we lost a lot of important logs and actions.
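
For reference, ENOSPC can also be raised when the filesystem runs out of inodes rather than bytes, even if df -h shows plenty of free space. A quick check (mount point assumed to be wherever /opt/cronicle lives):

# Compare block usage with inode usage on the Cronicle volume
df -h /opt/cronicle     # free bytes
df -i /opt/cronicle     # free inodes; ENOSPC also fires when IFree reaches 0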

Operating System

Ubuntu 22.04

Node.js Version

v20.18.0

Cronicle Version

0.9.71

Server Setup

Single Server

Storage Setup

Local Filesystem

Relevant log output

Job logs at time of failure:
node:events:510
    throw err; // Unhandled 'error' event
    ^

Error [ERR_UNHANDLED_ERROR]: Unhandled error. ('Error in output stream: write EPIPE')
    at __construct.emit (node:events:508:17)
    at Socket.<anonymous> (/opt/cronicle/node_modules/pixl-json-stream/json-stream.js:99:10)
    at Socket.emit (node:events:519:28)
    at emitErrorNT (node:internal/streams/destroy:169:8)
    at emitErrorCloseNT (node:internal/streams/destroy:128:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  code: 'ERR_UNHANDLED_ERROR',
  context: 'Error in output stream: write EPIPE'
}

Node.js v20.18.0

Code of Conduct

  • I agree to follow this project's Code of Conduct
Sederfo added the bug label Mar 3, 2025
Sederfo (Author) commented Mar 3, 2025

After looking at previous issues, I found one similar to mine: #570. Apologies for the duplicate.

I would still need some assistance on this matter, though.

Image

As you can see, it's trying to write to a file that gives nano an error message.

This is the output (last 2 entries) of debug.sh:

[1740994458.568][2025-03-03 10:34:18][fr10204vmx][2926429][crash][debug][1][Emergency shutdown: Could not rollback transaction: _cleanup/2025/04/02: Failed to restore record: _cleanup/2025/04/02/0: Failed to write file: _cleanup/2025/04/02/0: data/_temp/fe4dc2dc998e9c47295cf9bf980b1761.json.tmp.1: ENOSPC: no space left on device, open 'data/_temp/fe4dc2dc998e9c47295cf9bf980b1761.json.tmp.1'][]
[1740994458.568][2025-03-03 10:34:18][fr10204vmx][2926429][Storage][debug][1][Exiting][]

Could we maybe convert this issue from a bug to a feature request? The request would be a configurable maximum number of log files that Cronicle can hold at any time, plus an option for each job to not write logs (or perhaps to stop writing once the file size exceeds X KB).
The current configuration completely eats away at the available inodes when the local filesystem is used as storage.
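
A rough way to see which parts of the data folder hold the most files (default install path assumed):

# Count files per top-level subfolder of the data dir to find the inode hogs
for d in /opt/cronicle/data/*/; do
  printf '%8d  %s\n' "$(find "$d" -type f | wc -l)" "$d"
done | sort -rn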

jhuckaby (Owner) commented Mar 3, 2025

It sounds like you're running a LOT of jobs, or you have a VERY small number of INODES on your disk. Either way, I highly suggest you decrease this configuration parameter:

https://github.com/jhuckaby/Cronicle/blob/master/docs/Configuration.md#job_data_expire_days

Set it very low, like 30 days or even lower, if you don't need long data retention for your jobs.
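
For example, something like this (default install paths assumed; the value is just an example):

# Check the current retention setting
grep job_data_expire_days /opt/cronicle/conf/config.json
# lower it in conf/config.json, e.g.:  "job_data_expire_days": 30,
# then restart so it takes effect
/opt/cronicle/bin/control.sh restart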

However, note that this parameter does not retroactively affect existing jobs, only new ones, so you may need to do an export, wipe (delete), then import:

https://github.com/jhuckaby/Cronicle/blob/master/docs/CommandLine.md#data-import-and-export

The data export / import does not include historical job data.
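
Roughly, the whole cycle looks like this (a sketch assuming the default install path; the setup step re-creates the base storage records after the wipe):

# Export schedule/users/config, wipe the storage, re-init, import, restart
/opt/cronicle/bin/control.sh stop
/opt/cronicle/bin/control.sh export /root/cronicle-backup.txt --verbose
rm -rf /opt/cronicle/data
/opt/cronicle/bin/control.sh setup
/opt/cronicle/bin/control.sh import /root/cronicle-backup.txt
/opt/cronicle/bin/control.sh start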

I'd highly recommend you do this anyway, because running out of disk space (or INODES) leaves Cronicle's "database" in an indeterminate / corrupted state.

Also, for a high job volume setup, please consider using something other than the local Filesystem. Cronicle can use S3 or any S3-compatible service (MinIO, etc.).

Good luck, and I'm very sorry you ran into this.

Sederfo (Author) commented Mar 4, 2025

I ended up removing files from /data and messed up the instance badly (haha), but I reinstalled it fresh. Indeed, we are running a lot of jobs, at least 5 every 5 minutes, and more will come, some at 2-minute intervals. I have set it up with Couchbase now; do you think it's a reliable solution for storing the logs? From what I have researched, Couchbase would not eat up the inodes as badly as the filesystem, and the only constraint would be space on the partition.
Thank you for the response and thank you for taking your time to write this amazing tool!

jhuckaby (Owner) commented Mar 4, 2025

Couchbase is very reliable in my experience. Just note that they have a 20 MB limit on object size (or at least they did the last time I checked). So make sure your job logs are smaller than 20 MB each 😊
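
A rough pre-migration check, assuming the old FS storage still lives under the default data path:

# List any stored objects (e.g. old job logs) larger than 20 MB
find /opt/cronicle/data -type f -size +20M -exec ls -lh {} +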

tkimmm commented Mar 13, 2025

Hi Joseph,

> However, note that this parameter does not retroactively affect existing jobs, only new ones, so you may need to do an export, wipe (delete), then import:

Are there any documented steps for a wipe (delete)? We've also had a crash due to inode exhaustion and would like to reset to reduce the inode usage, but we're hesitant to touch the data folder.

There also seem to be a lot of files in _cleanup; are these safe to remove?

Thanks for your continued efforts on this project

tkimmm commented Mar 13, 2025

I thought I would loop back, as I ended up testing and running the process myself (a rough command sketch of the steps follows the list):

  • Took an export of the Cronicle data using
    /opt/cronicle/bin/control.sh export /path/to/cronicle-data-backup.txt --verbose
  • I'm using FS storage, so I followed the migration instructions and pointed to a different folder in config.json
  • Stopped the Cronicle server and ran /opt/cronicle/bin/control.sh migrate
  • Renamed the NewStorage key in the JSON to Storage and removed the old storage reference
  • Checked functionality after the migrate; it was working fine for me with no loss of data
  • Started manually removing files in the old data folder. The main culprit for us was the jobs folder with all its nested data; once I started removing data from this folder, my inode usage decreased considerably
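
Roughly, the commands looked like this (example paths; the NewStorage block points the Filesystem engine at a fresh base_dir):

# Rough reconstruction of the steps above (example paths)
/opt/cronicle/bin/control.sh export /path/to/cronicle-data-backup.txt --verbose
/opt/cronicle/bin/control.sh stop
# add a "NewStorage" block in conf/config.json with the Filesystem engine
# pointing at a new base_dir (e.g. /opt/cronicle/data2), then migrate:
/opt/cronicle/bin/control.sh migrate
# rename "NewStorage" to "Storage" in config.json, drop the old block, restart:
/opt/cronicle/bin/control.sh start
# once everything checks out, reclaim inodes from the old data folder:
rm -rf /opt/cronicle/data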

jhuckaby (Owner) commented:

Very glad you were able to figure it out, and complete a successful migration!

> Are there any documented steps for a wipe (delete)?

Not really, it's just rm -rf /opt/cronicle/data after you export all the essential data.

> There also seem to be a lot of files in _cleanup; are these safe to remove?

Those files are part of a database table that tracks expirations of the job logs, so it knows when to delete them all. I would not remove those directly, unless you are wiping everything.
