Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Smart Remove behavior to really "keep one snapshot per week for n weeks" #1819

Closed
wants to merge 7 commits into from

Conversation

emtiu
Copy link
Member

@emtiu emtiu commented Jul 29, 2024

Smart Remove doesn't actually "keep one snapshot per week for the last n weeks", as the setting indicates. Instead, it only works for n–1 weeks.

The deeper reason is that at one point, the Smart Remove code considers a "week" to span 8 days:

backintime/common/snapshots.py

Lines 1689 to 1697 in 22f468c

for i in range(0, keep_one_per_week):
keep |= self.smartRemoveKeepFirst(
snapshots,
d,
d + datetime.timedelta(days=8),
keep_healthy=True)
d -= datetime.timedelta(days=7)

This leads to the loss of the oldest snapshot in the last of the weeks to be considered. For details, see the discussion in #1094.

Git blame shows that this code has been unchanged since the "keep one snapshot per week for the last n weeks" function was first introduced in 2010 (version 1.0.6). It has probably had this bug since the beginning.

While here, improve debug logging of some Smart Remove functions to print SIDs .withoutTag, because they represent date boundaries only (and the randomly assigned tags contain no information related to any real snapshots). Most of the Smart Remove code already does this.

Fixes #1094.

@emtiu emtiu changed the title Smartremove Fix Smart Remove behavior to really "keep one snapshot per week for n weeks" Jul 29, 2024
@emtiu
Copy link
Member Author

emtiu commented Jul 29, 2024

Okay, I see the failing unittest. I need to wrap my head around that :)

@buhtz
Copy link
Member

buhtz commented Jul 30, 2024

Very cool. Also won't block this because of missing unit tests. I can take the refactoring and improving of the unit tests in a later step.

Okay, I see the failing unittest. I need to wrap my head around that :)

Must not be because of the Python version. To be sure please edit the .travis.yml file and uncomment the exclude: section to enable all Python versions.

@buhtz buhtz added this to the Upcoming release milestone Jul 30, 2024
@emtiu emtiu marked this pull request as draft July 30, 2024 11:28
@emtiu
Copy link
Member Author

emtiu commented Jul 30, 2024

I will also need to consider the impact of this change to existing archives with many snapshots.

At first glance, no additional snapshots should be deleted as long as the duration considered by smartRemoveKeepFirst is shortened instead of lengthened. But I should do some tests to make sure.

@buhtz
Copy link
Member

buhtz commented Sep 16, 2024

FYI: There is a new related issue #1874 . It is about smart remove with in leap years having a snapshot at 29th february.

@buhtz buhtz changed the title Fix Smart Remove behavior to really "keep one snapshot per week for n weeks" fix: Smart Remove behavior to really "keep one snapshot per week for n weeks" Nov 19, 2024
@buhtz
Copy link
Member

buhtz commented Nov 24, 2024

Hello,
I just want to inform. I tried to investigate and understand the related parts of that code base. I created some unit tests to support my investigation. My intention was also to provide you with some unit test to support your PR: But I am not finished yet and I still lack of understanding. The behavior is not predictable for me.

Can be found in draft PR #1944

@buhtz
Copy link
Member

buhtz commented Nov 26, 2024

It might be that the reported issue #1094 and your PR do focus on something on something else than I do report now.

I was able to reproduce a problem. Based on that I tend to say that your PR won't fix that problem.

Steps to reproduce:

  • Create 4 snapshots on Saturdays: 2nd, 9th, 16th and 23th November 2024
  • Keep one snapshots per week for the last 3 weeks
  • Run auto/smart remove on Thusday date 26th November 2024

Expected:
Keep the following snapshots: 9th, 16th and 23th November 2024

Real/problematic behavior:
Only 16th and 23th is kept. The 9th is lost and not kept.

I added your fix but the problematic behavior did not change.

@buhtz
Copy link
Member

buhtz commented Nov 26, 2024

Short: This PR is ready merge by Michael.

Long:
I try to summarize to make this PR able to merge.

  • The "issue" Weekly snapshots not being retained #1094 lacks of details and it is unclear what the problem is. So I tread it as "invalid".
  • Michael's PR here really fix a minor bug. I would like to keep it as it is and merge it.

There is much more to discuss and decide in context of auto/smart remove. I will open a new meta issue for this.

About the reproducible example I described myself:

  • Create 4 snapshots on Saturdays: 2nd, 9th, 16th and 23th November 2024
  • Keep one snapshots per week for the last 3 weeks
  • Run auto/smart remove on Tuesday date 26th November 2024

What happens:
Only 16th and 23th is kept. The 9th is lost and not kept.

Looking into the code that behavior seems to be intended by the original author. The algorithm starts in the current running (not yet complete) week, not the previous week. In my example there is no backup in the current week, so nothing to keep, but one iteration done. The next two iterations result in two backups to keep. That it. 3 iterations but only 2 backups to keep because in the first iteration (the current week) there was no backup available.

Of course that behavior should be discussed. But not in this PR. Because all the other smart remove rules are affected. The entire system must be thoroughly researched, documented, and then rethought. If necessary, the behavior or wording in the GUI must be adjusted accordingly.

@buhtz buhtz marked this pull request as ready for review December 16, 2024 14:30
@@ -1683,7 +1683,7 @@ def smartRemoveList(self,
keep |= self.smartRemoveKeepFirst(
snapshots,
d,
d + datetime.timedelta(days=8),
d + datetime.timedelta(days=7),
Copy link
Member

@buhtz buhtz Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reviewing the original code (especially smartRemoveKeepFirst()) and its behavior I am quit sure this wont work as intended. The two parameters in smartRemoveKeepFirst(), min_date and max_date are used like this:

d >= min_date and d < max_date

Because of the latest operator < a range of 8 is IMHO fine.

I will improve the test situation in near future. But I also have to do some refactoring first. So this topic won't be lost.

@buhtz buhtz added the PR: Unlikely to merge PR is unlikely to be merged due to project goals, technical limitations, or other constraints. label Dec 16, 2024
@buhtz buhtz marked this pull request as draft December 16, 2024 14:43
@buhtz
Copy link
Member

buhtz commented Dec 29, 2024

Closing this based on the comment above.

@buhtz buhtz closed this Dec 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: Unlikely to merge PR is unlikely to be merged due to project goals, technical limitations, or other constraints.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Weekly snapshots not being retained
2 participants