launch command worker earlier #470

sni · 2024-07-04T13:51:06Z

since the command worker forks from the main naemon process, it inherits all open files like ex.: pidfile, logfiles, etc... It will keep those references open, even if the main process rotates and reopens those files.

This patch closes query handler and pid file references after starting the command worker and also moves starting the command worker before initializing the neb modules, so it won't inherit open logfiles from neb modules.

references:

naemon's logrotate leaves open file: livestatus.log.1 ConSol-Monitoring/omd#146

since the command worker forks the main naemon process, it inherits all open files like ex.: pidfile, logfiles, etc... It will keep those references open, even if the main process rotates and reopens those files. This patch closes query handler and pid file references after starting the command worker and also moves starting the command worker before initializing the neb modules, so it won't inherit open logfiles from neb modules. references: - ConSol-Monitoring/omd#146 Signed-off-by: Sven Nierlein <[email protected]>

nook24 · 2024-07-07T12:46:09Z

The only issue I see with this is that the command file worker completely loses the ability to communicate. Currently, the command file worker is able to write to the naemon.log until this line:

naemon-core/src/naemon/commands.c

Line 392 in 22f0fb6

free_memory(get_global_macros());

After that, it can only write to stdout to log any errors. Not optimal, but better than nothing, I guess.

Due to the change, the command file worker can't even write to stdout anymore, so any error messages are completely gone. The static int command_file_worker method makes some calls to nm_log, which now get completely blackholed.

This change also introduces a sort of "breaking change". Currently, whenever Naemon logs the message
Successfully launched command file worker with pid 12135, we know that Naemon has done all initial processing and is up and running.

I personal only use this when working or debugging Naemon directly, but it turns out that some scripts rely on this. For example, this one which is part of our repository:

naemon-core/features/steps/naemon.py

Lines 138 to 155 in 22f0fb6

    
           def naemon_started_and_ready(context, timeout_s): 
        
               timeout_s = int(timeout_s or 30) 
        
               timeout = time.monotonic() + timeout_s 
        
               ready = False 
        
               while not ready and time.monotonic() < timeout: 
        
                   try: 
        
                       log = slurp_file('naemon.log') 
        
                       # When we see this line in the log, we'll wait 1 more second and 
        
                       # then Naemon should be ready, with signal handlers setup so that a 
        
                       # test can SIGTERM it. 
        
                       if 'Successfully launched command file worker with pid' in log: 
        
                           ready = True 
        
                           time.sleep(1) 
        
                           break 
        
                   except OSError: 
        
                       pass 
        
                   time.sleep(1) 
        
               assert ready, "Naemon did not start within %d seconds" % timeout_s

I think we should add a new message to the log that clearly indicates that Naemon is ready. Something like

Naemon initialization complete. Monitoring initiated.

sni · 2024-07-07T16:34:10Z

i am not to happy yet either, maybe we keep the order as it was but send a USR1 signal to the command worker so it can rotate/reopen the logfile as well.

sni · 2024-07-08T08:13:19Z

btw, the message Successfully launched command file worker with is logged from the main naemon process and should still be printed.

nook24 · 2024-07-08T08:40:59Z

maybe we keep the order as it was but send a USR1 signal to the command worker so it can rotate/reopen the logfile as well.

Currently the command file worker can only write to stdout. I guess this is due to it's a fork() and could otherwise break the log file if multiple processes wring to the log at the same time.

I think this is also the reason why the nm_core_worker is using printf instead of nm_log

btw, the message Successfully launched command file worker with is logged from the main naemon process and should still be printed.

Yes, this message gets still logged as before, but due due it is now before the broker modules get loaded, it does not indicate that Naemon is ready anymore.
To be fair, this message should never indicate that Naemon is ready so I think we should not be too concerned about this. But I would still recommend to add a new log message to make it possible to detect if Naemon is ready.

sni mentioned this pull request Jul 4, 2024

naemon's logrotate leaves open file: livestatus.log.1 ConSol-Monitoring/omd#146

Open

sni force-pushed the launch_command_worker_earlier branch from ddc4d76 to e842ca7 Compare July 5, 2024 09:09

sni added the WIP label Jul 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

launch command worker earlier #470

launch command worker earlier #470

sni commented Jul 4, 2024

nook24 commented Jul 7, 2024

sni commented Jul 7, 2024

sni commented Jul 8, 2024

nook24 commented Jul 8, 2024

launch command worker earlier #470

Are you sure you want to change the base?

launch command worker earlier #470

Conversation

sni commented Jul 4, 2024

nook24 commented Jul 7, 2024

sni commented Jul 7, 2024

sni commented Jul 8, 2024

nook24 commented Jul 8, 2024