Send an email when a file transfer fails #1353

andreleblanc11 · 2024-12-20T16:00:02Z

We have a client that would like to have an email be sent when a transfer fails.

I've been thinking this might be possible with message reports.

Report messages have an added "report" field, and if the original message contained embedded data, that is removed too (no embedded content). So a report is mostly the same as a normal message.

 "report": {
     "code": 999,                            - HTTP-style response code.
     "timeCompleted": "YYYYMMDDTHHMMSS.ss",  - UTC date/timestamp.
     "message": ...                          - status report message documented in `Report Messages`_
 }

I'm thinking that if the report message can include the transfer error, we could feed that message to an email sender and this might work. I haven't checked the code to confirm this.
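A minimal sketch of that idea, using only the Python standard library. It assumes the "report" field has the three keys shown above and that, since the codes are HTTP style, anything at 400 or above means the transfer failed. The function name `report_to_email` and the addresses are hypothetical, not part of sarracenia.

```python
from email.message import EmailMessage


def report_to_email(msg, sender, recipient):
    """Build an EmailMessage for a failed transfer, or return None otherwise."""
    report = msg.get("report")
    if not report:
        return None  # not a report message
    # Assumption: HTTP-style codes, so anything >= 400 counts as a failure.
    if int(report["code"]) < 400:
        return None
    mail = EmailMessage()
    mail["From"] = sender
    mail["To"] = recipient
    mail["Subject"] = f"Transfer failed: {msg.get('relPath', '(unknown file)')}"
    mail.set_content(
        f"code: {report['code']}\n"
        f"timeCompleted: {report.get('timeCompleted', '')}\n"
        f"message: {report.get('message', '')}\n"
    )
    return mail
```

The returned object could then be handed to whatever actually delivers mail, for example `smtplib.SMTP.send_message`.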

I'm not sure of another way to do this. Possibly another option could be to introduce a new flowCB entry point?

This could also be good for our team to have this implemented if critical data feeds start having transfer problems. Could send an email to NetOps.


andreleblanc11 · 2024-12-20T16:06:15Z

> This could also be good for our team to have this implemented if critical data feeds start having transfer problems. Could send an email to NetOps.

#1350 should also already help in a similar way.

petersilva · 2024-12-20T19:04:13Z

We don't need reports or any special entry_points ... there are multiple worklists for exactly this purpose...

worklist.incoming ... messages received but not transferred.
worklist.rejected ... messages for which transfers will not be attempted.
worklist.ok ... messages for files which were successfully transferred.
worklist.failed ... messages for files where the transfers failed.

when a send fails, the corresponding message should be in worklist.failed.
you can write an after_work plugin that sends an email for each message in that worklist.

The original message and all its fields are available at that point.
It will only generate the mail after it has tried 3 times (based on the attempts setting.)

But that message will go into the retry queue and be retried five minutes later... so if it fails again, there will be an email every five minutes ... for about 3 days (based on default settings.)
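A rough sketch of such an after_work callback follows. To keep it runnable without sarracenia installed, `MailOnFailure` is a plain class; a real plugin would subclass `sarracenia.flowcb.FlowCB` and take its settings from the options object. The `send_mail` callable and the addresses are hypothetical placeholders for the actual mail delivery.

```python
from email.message import EmailMessage


class MailOnFailure:
    """Stand-in for a flowcb plugin that mails once per failed message."""

    def __init__(self, send_mail, sender, recipient):
        self.send_mail = send_mail  # hypothetical: any callable taking an EmailMessage
        self.sender = sender
        self.recipient = recipient

    def after_work(self, worklist):
        for msg in worklist.failed:
            mail = EmailMessage()
            mail["From"] = self.sender
            mail["To"] = self.recipient
            mail["Subject"] = f"Transfer failed: {msg.get('relPath', '(unknown file)')}"
            # Include the public message fields; skip internal ones such as _deleteOnPost.
            mail.set_content(
                "\n".join(f"{k}: {v}" for k, v in msg.items() if not k.startswith("_"))
            )
            self.send_mail(mail)
        # worklist.failed is left untouched, so the normal retry logic still applies.
```

As written, this sends a new mail on every failed retry, which is the repeated-email behaviour described above.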

andreleblanc11 · 2024-12-20T19:11:19Z

I didn't even think about the fact that worklist.failed stuff will go through an after_work entry point. That's definitely a better option than what I was thinking.

andreleblanc11 · 2024-12-20T19:51:16Z

To avoid multiple emails being sent, I think we could probably leverage msg_get_from_file in diskqueue.py to check whether the file's message is already in the retry queue.

If we also add callback_prepend work.my-plugin in the config, I think we should be able to run the plugin before the retry queue gets appended.

petersilva · 2024-12-20T22:33:50Z

When I said the message and "all its fields" ... that includes the "report" field you mentioned... so that could be leveraged in writing the mail message.

andreleblanc11 · 2024-12-27T21:24:33Z

I was able to get an email to send when a transfer failed in my test plugin.

However, I had to work around the sendTo option to do this. Both the email sender and a regular sender use sendTo.

This is what I did to work around the problem (in the __init__).

        self.o.add_option('email_server', 'str', default_value='')
        # Hacky way of having a correct mail server
        self.sendTo = self.o.sendTo
        self.o.sendTo = self.o.email_server

        self.email = sarracenia.flowcb.send.email.Email(self.o)
        self.o.sendTo = self.sendTo

A workaround in the email plugin could be to add a new option that uses self.o.sendTo as its default.

self.o.add_option('email_server', 'str', default_value=f"{self.o.sendTo}")
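One weakness of the swap-and-restore hack above is that sendTo stays overwritten if the Email constructor raises. A small context manager makes the restore unconditional. This is only a sketch: `temporary_option` is a hypothetical helper, and it assumes the options object allows plain attribute assignment, as the snippet above already relies on.

```python
from contextlib import contextmanager


@contextmanager
def temporary_option(options, name, value):
    """Set options.<name> to value inside the with-block, then restore it."""
    original = getattr(options, name)
    setattr(options, name, value)
    try:
        yield options
    finally:
        # Runs even if the body raises, so the original value always comes back.
        setattr(options, name, original)
```

In the plugin's `__init__` this might be used as `with temporary_option(self.o, 'sendTo', self.o.email_server): self.email = sarracenia.flowcb.send.email.Email(self.o)`.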

andreleblanc11 · 2024-12-30T15:26:34Z

I've been trying to integrate the diskqueue in the plugin (to avoid multiple sendings of emails) and have gotten unsatisfactory results.

During housekeeping, before files get retried, I'm not able to find the diskqueue file within the config's cache directory. This is what is seen from the running process.

# Housekeeping runs, gets the files from the diskqueue
2024-12-30 14:38:25,561 [DEBUG] sarracenia.diskqueue on_housekeeping work_retry_01 on_housekeeping, 0 msgs in queue file, 1 in new file
2024-12-30 14:38:25,563 [DEBUG] sarracenia.diskqueue on_housekeeping has queue False
2024-12-30 14:38:25,564 [DEBUG] sarracenia.diskqueue msg_get_from_file DEBUG /net/local/home/leblanca/.cache/sr3/sender/test_email_on_failure/diskqueue_work_retry_01.new open read
2024-12-30 14:38:25,565 [DEBUG] sarracenia.diskqueue on_housekeeping retrieved 1 from the 1 retry
2024-12-30 14:38:25,566 [INFO] sarracenia.diskqueue on_housekeeping work_retry_01 Number of messages in retry list 1
2024-12-30 14:38:25,567 [DEBUG] sarracenia.diskqueue on_housekeeping on_housekeeping elapse 0.004850
2024-12-30 14:38:25,567 [DEBUG] sarracenia.diskqueue on_housekeeping post_retry_001 on_housekeeping, 0 msgs in queue file, 0 in new file
2024-12-30 14:38:25,568 [DEBUG] sarracenia.diskqueue on_housekeeping has queue False
2024-12-30 14:38:25,568 [DEBUG] sarracenia.diskqueue msg_get_from_file DEBUG /net/local/home/leblanca/.cache/sr3/sender/test_email_on_failure/diskqueue_post_retry_001.new open read
2024-12-30 14:38:25,569 [DEBUG] sarracenia.diskqueue on_housekeeping retrieved 0 from the 0 retry
2024-12-30 14:38:25,569 [DEBUG] sarracenia.diskqueue on_housekeeping post_retry_001 No retry in list
2024-12-30 14:38:25,570 [DEBUG] sarracenia.diskqueue on_housekeeping on_housekeeping elapse 0.003351

# The retry files are now in diskqueue_work_retry_01
2024-12-30 14:38:39,919 [DEBUG] sarracenia.diskqueue msg_get_from_file DEBUG /net/local/home/leblanca/.cache/sr3/sender/test_email_on_failure/diskqueue_work_retry_01 open read


# It retries to send the file.
# When my after_work plugin is called, the retry file doesn't exist, so we can't compare with what we are trying to filter out.
2024-12-30 14:38:25,580 [DEBUG] work.send_email_on_failure after_work Checking if message in retry queue
# os.listdir of the configs' cache directory
2024-12-30 14:38:25,580 [CRITICAL] work.send_email_on_failure after_work Files ['subscriptions.json', 'sender_test_email_on_failure_01.pid', 'sender.test_email_on_failure.tfeed.qname']
# Checking if the file pointer exists. It doesn't because we can't find the file.
2024-12-30 14:38:25,580 [DEBUG] work.send_email_on_failure after_work FP : None . queue file /net/local/home/leblanca/.cache/sr3/sender/test_email_on_failure/diskqueue_work_retry_01

I checked back the diskqueue logic, and see that when a get is made after the housekeeping runs, the retry file gets discarded.
😢

sarracenia/sarracenia/diskqueue.py

Lines 271 to 279 in 5c78ad8

        # after getting the last message from the file, close it
        if self.msg_count == 0:
            try:
                os.unlink(self.queue_file)
            except:
                pass
            self.queue_fp = None
        return ml

andreleblanc11 · 2024-12-30T15:42:47Z

A band-aid workaround for this is to append the retried messages to a list, and check whether a message is already in the list every time the plugin is called.

This is not a good long-term workaround though. There's no way to clear the list if the retried file eventually gets sent. Also, if the process gets restarted, the emails will get sent again.

(in the __init__)
        # Messages we have already sent an email about
        self.accumulated_msgs = []

(in the after_work)
            # Skip messages we have already sent an email about
            if msg in self.accumulated_msgs:
                continue
            self.accumulated_msgs.append(msg)
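One of the two limitations might be softened by also watching worklist.ok. The sketch below assumes that a retried file which finally succeeds shows up in worklist.ok during the same after_work call, and that relPath is enough to recognise the same file across retries; neither has been checked against the sarracenia code. `NotifyOnce` and `notify` are hypothetical names, and the restart problem remains because the set lives only in memory.

```python
class NotifyOnce:
    """Notify once per failing file, and forget the file once it succeeds."""

    def __init__(self, notify):
        self.notify = notify  # hypothetical: any callable taking a message
        self.already_notified = set()

    @staticmethod
    def key(msg):
        # Assumption: relPath identifies the same file across retries.
        return msg["relPath"]

    def after_work(self, worklist):
        # A success clears the entry, so a later failure notifies again.
        for msg in worklist.ok:
            self.already_notified.discard(self.key(msg))
        for msg in worklist.failed:
            k = self.key(msg)
            if k in self.already_notified:
                continue
            self.already_notified.add(k)
            self.notify(msg)
```

Using a set of keys rather than a list of whole messages also avoids comparing full message dictionaries on every call.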

petersilva · 2024-12-30T16:43:58Z

I might not understand what you are trying to do. If you want to prevent retries after you have sent the email... all you need to do is remove the messages from the worklist.failed.
That should be it.

so the loop should be something like:

to_mail = worklist.failed
worklist.failed = []

for m in to_mail:
    pass  # whatever the mail logic is

petersilva · 2024-12-30T16:50:18Z

Do you want to suppress retrying of sending the file, or just prevent multiple emails (but keep retrying so it gets sent eventually)?

I guess the problem here is that you are trying to use the unmodified email sender... sounded like a great idea at first... but it probably doesn't quite match (you need to do different things with the worklists than the built-in email send does).

I think you might need a custom callback that re-implements mail logic.

andreleblanc11 · 2024-12-30T17:16:17Z

> I might not understand what you are trying to do. If you want to prevent retries after you have sent the email... all you need to do is remove the messages from the worklist.failed.

That won't work because we want the file to keep retrying. The email sent would just be to notify the client that a transfer failed, that's why we would only want to send it once. We don't want to spam the client, but we want to try to resend the file normally.

petersilva · 2024-12-30T17:51:09Z

OK then look at the fields in the message... I think there is a field set when a message is a retry... something like msg['retry'] or msg['isRetry'] and you don't send the mail if that field is set.

andreleblanc11 · 2024-12-30T18:18:57Z

There is no retry field available in the message when it retries, even with report True set.

{'_format': 'v02', '_deleteOnPost': {'exchange', 'new_dir', 'new_baseUrl', '_mask_index', 'post_format', '_format', 'local_offset', 'topic', 'new_subtopic', 'new_file', 'new_relPath', 'subtopic'}, 'to_clusters': 'ALL', 'mtime': '20241108T204642.551517248', 'atime': '20241108T204802.322180033', 'mode': '755', 'pubTime': '20241230T143100.155074835', 'baseUrl': 'file:', 'relPath': '/net/local/home/leblanca/kill_orphaned_children.sh', 'subtopic': ['net', 'local', 'home', 'leblanca'], 'identity': {'method': 'md5', 'value': 'HHIlcIoBRG1Y5GYS4unzgw=='}, 'size': 3270, 'exchange': 'xs_tfeed', 'source': 'tsource', 'topic': 'v02.post.net.local.home.leblanca', 'local_offset': 0, '_mask_index': 0, 'new_dir': '/net/local/home/leblanca/test', 'new_file': 'kill_orphaned_children.sh', 'post_format': 'v03', 'new_baseUrl': 'file:', 'new_relPath': 'net/local/home/leblanca/test/kill_orphaned_children.sh', 'new_subtopic': ['net', 'local', 'home', 'leblanca', 'test'], 'contentType': 'text/x-shellscript'}

However, we can still add the field manually in the message. Adding the below works in my plugin.

# When checking if the field is there

            # We don't want to resend emails for messages that have already passed through
            if msg.get('isRetry'):
                continue

# When setting the field

            # The message will get retried. Add a field to the message so that we can check for future occurrences
            msg['isRetry'] = True
            # We still want to delete the field if it ever gets posted
            msg['_deleteOnPost'] |= {'isRetry'}
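Put together, the check and the tagging could look like the helper below. It is a sketch that treats messages as plain dictionaries; `messages_to_notify` is a hypothetical name, and it relies on the observation above that a field added in the plugin is still present when the message comes back from the retry queue.

```python
def messages_to_notify(failed):
    """Return first-time failures and tag them so later retries are skipped."""
    first_time = []
    for msg in failed:
        if msg.get("isRetry"):
            continue  # already notified on an earlier attempt
        first_time.append(msg)
        msg["isRetry"] = True
        # Make sure the marker is stripped if the message is ever posted.
        msg.setdefault("_deleteOnPost", set()).add("isRetry")
    return first_time
```

Inside after_work, the mail logic would then loop over `messages_to_notify(worklist.failed)` and leave worklist.failed itself unchanged so the file keeps being retried.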

andreleblanc11 added the following labels on Dec 20, 2024: enhancement (New feature or request), NewUseCase (needed to address a use case, we can't yet support.), UserStory (interesting to read to consider improving), Sugar (nice to have features... not super important.)

andreleblanc11 added the likely-fixed likely fix is in the repository, success not confirmed yet. label Jan 16, 2025
