Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ami,gce]: reduce snapshot size to minimum allowed #491

Closed
wants to merge 3 commits into from

Conversation

yaronkaikov
Copy link
Collaborator

@yaronkaikov yaronkaikov commented Dec 5, 2023

Every AMI we create today holds a 30Gb EBS snapshot, if we take into account the fact that we copy those images to other regions and have multiple images (dev, debug, releases) it is adding up to our cost.

This commit reduces the size of the snapshot to 8Gb (it's the minimum value allowed for a snapshot)

In addition, it's reducing also the time for the build process from 30 minutes to ~12 minutes (which is also good)

Refs: https://github.com/scylladb/scylla-pkg/issues/3712

@yaronkaikov yaronkaikov force-pushed the reduce-ami-snapshot branch 2 times, most recently from 861e66e to a8b5b64 Compare December 5, 2023 08:59
@yaronkaikov
Copy link
Collaborator Author

fixed typos in commit message :-)

@benipeled
Copy link
Contributor

This change might cause disk-space issues due to grow-log-files etc. on the OS level,
Please make sure this change is fully verified with QA tests (both tier1 and tier2)
Other than that, lgtm

@yaronkaikov
Copy link
Collaborator Author

yaronkaikov commented Dec 5, 2023

This change might cause disk-space issues due to grow-log-files etc. on the OS level, Please make sure this change is fully verified with QA tests (both tier1 and tier2) Other than that, lgtm

Running https://jenkins.scylladb.com/job/scylla-master/job/releng-testing/job/longevity/job/longevity-100gb-4h-test/32/console, the log shows that the instance is with 30Gb root disk

12:11:00  root_disk_size_db: 30

So i assume it's good, will wait for @fruch to review as well to make sure

@fruch
Copy link
Collaborator

fruch commented Dec 5, 2023

what about the swap ? isn't that create on build time ? or it's not anymore ?

@fruch
Copy link
Collaborator

fruch commented Dec 5, 2023

Also, I think SCT would be o.k. with that, but places like cloudformation and such, might have usability issue and should extend to pick bigger root disks.

@@ -4,40 +4,6 @@
"name": "aws",
"type": "amazon-ebs",
"access_key": "{{user `access_key`}}",
"ami_block_device_mappings": [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how this is related to the snapshot size ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not been used. so I removed it. I can update the commit message or split it to different commit

@fruch
Copy link
Collaborator

fruch commented Dec 5, 2023

Also what about GCP/AZure ? there we have smaller disks ?

@yaronkaikov
Copy link
Collaborator Author

what about the swap ? isn't that create on build time ? or it's not anymore ?

Swap creation including the calculation is been handled in https://github.com/scylladb/scylla-machine-image/blob/next/packer/scylla_install_image

@yaronkaikov
Copy link
Collaborator Author

Also what about GCP/AZure ? there we have smaller disks ?

No, they still using 30Gb. I will adjust it in a follow up patch

@yaronkaikov
Copy link
Collaborator Author

yaronkaikov commented Dec 5, 2023

Also, I think SCT would be o.k. with that, but places like cloudformation and such, might have usability issue and should extend to pick bigger root disks.

Not sure if anyone is using the cloudformation, and even if someone is using it, shouldn't it be for POC or testing/evaluating Scylla?

@yaronkaikov yaronkaikov requested a review from fruch December 5, 2023 11:24
@yaronkaikov
Copy link
Collaborator Author

@fruch Added another commit with the clenaup

@yaronkaikov
Copy link
Collaborator Author

yaronkaikov commented Dec 5, 2023

Also what about GCP/AZure ? there we have smaller disks ?

No, they still using 30Gb. I will adjust it in a follow up patch

Only GCE has this configuration, trying to do it in the PR to see if it's working

@yaronkaikov
Copy link
Collaborator Author

@yaronkaikov yaronkaikov changed the title [ami]: reduce snapshot size to minimum allowed [ami,gce]: reduce snapshot size to minimum allowed Dec 5, 2023
@fruch
Copy link
Collaborator

fruch commented Dec 5, 2023

what about the swap ? isn't that create on build time ? or it's not anymore ?

Swap creation including the calculation is been handled in https://github.com/scylladb/scylla-machine-image/blob/next/packer/scylla_install_image

that was the reason for enlarging the root disk in the fist place, so we can create the swap upfront.
if we don't do that anymore, should be an issue to resize it back to minimum size

@syuu1228
Copy link
Contributor

syuu1228 commented Dec 6, 2023

Every AMI we create today holds a 30Gb EBS snapshot, if we take into account the fact that we copy those images to other regions and have multiple images (dev, debug, releases) it is adding up to our cost.

This commit reduces the size of the snapshot to 8Gb (it's the minimum value allowed for a snapshot)

Why we incleased rootfs size from 10GB to 30GB was because there was not enough disk space for the swapfile (c22807f).
Reducing image size may work, but our setup script automatically shrink swap size to 'half of diskfree'.

(Recommended swap size for Scylla is either total_mem/3 or 16GB - lower of the two, so 10GB of rootfs is almost always not enough)

I think we probably should move swapfile to /var/lib/scylla (data volume) instead of rootfs, then we will have enough space without enlarge rootfs .
We already doing similar thing in Azure, Azure has "Resouce Disk" which is the disk just for swapfile, so we allocate swapfile there.

@roydahan
Copy link
Contributor

roydahan commented Dec 6, 2023

We can have the AMI set to 10GB and in the documentation or anywhere it's used (terraform?) we ask to set with bigger disk.
However, this must be coordinated with product and in all other places that uses it.

syuu1228 added a commit to syuu1228/scylla-machine-image that referenced this pull request Dec 11, 2023
This reverts commit 069318b.

Related with scylladb#491, we need to reduce snapshot size of the rootfs, we
should not pre-allocate swapfile while building image.
syuu1228 added a commit to syuu1228/scylla-machine-image that referenced this pull request Dec 11, 2023
This reverts commit 069318b.

Related with scylladb#491, we need to reduce snapshot size of the rootfs, we
should not pre-allocate swapfile while building image.
syuu1228 added a commit to syuu1228/scylla-machine-image that referenced this pull request Dec 11, 2023
This reverts commit 069318b.

Related with scylladb#491, we need to reduce snapshot size of the rootfs, we
should not pre-allocate swapfile while building image.
syuu1228 added a commit to syuu1228/scylla-machine-image that referenced this pull request Dec 18, 2023
This reverts commit 069318b.

Related with scylladb#491, we need to reduce snapshot size of the rootfs, we
should not pre-allocate swapfile while building image.
yaronkaikov pushed a commit that referenced this pull request Dec 20, 2023
This reverts commit 069318b.

Related with #491, we need to reduce snapshot size of the rootfs, we
should not pre-allocate swapfile while building image.
@yaronkaikov
Copy link
Collaborator Author

@yaronkaikov
Copy link
Collaborator Author

We can have the AMI set to 10GB and in the documentation or anywhere it's used (terraform?) we ask to set with bigger disk. However, this must be coordinated with product and in all other places that uses it.

@roydahan When someone wants to set an instance based on our images, we must set the size of the root disk anyway. So I don't think we should have any impact on that.

@tzach Any reason not to reduce the size to the minimum?

@yaronkaikov
Copy link
Collaborator Author

@benipeled @Annamikhlin Please review

@benipeled
Copy link
Contributor

@benipeled @Annamikhlin Please review

Nothing changed since my last review - #491 (comment) - if it's verified by SCT and the swap change handled, go ahead

@roydahan
Copy link
Contributor

I tried to look in our documentation and I couldn't find anyware in AMI launch for scylladb, explaination on how much to configure for root FS (swap but also logs and other things).
This is what I found:
https://opensource.docs.scylladb.com/stable/operating-scylla/procedures/cluster-management/ec2-dc.html
https://opensource.docs.scylladb.com/stable/getting-started/install-scylla/launch-on-aws.html
https://www.scylladb.com/product/release-notes/scylla-ami-for-aws/

@yaronkaikov
Copy link
Collaborator Author

I tried to look in our documentation and I couldn't find anyware in AMI launch for scylladb, explaination on how much to configure for root FS (swap but also logs and other things). This is what I found: https://opensource.docs.scylladb.com/stable/operating-scylla/procedures/cluster-management/ec2-dc.html https://opensource.docs.scylladb.com/stable/getting-started/install-scylla/launch-on-aws.html https://www.scylladb.com/product/release-notes/scylla-ami-for-aws/

I am not sure we have such

we do have https://opensource.docs.scylladb.com/stable/kb/set-up-swap.html . so i am not sure why not set it during scylla_image_setup. it shouldn't take long

@tzach
Copy link
Contributor

tzach commented Dec 24, 2023

@tzach Any reason not to reduce the size to the minimum?

Not that I'm aware of. Primarily, we need to run all standard tests with this image.

@roydahan
Copy link
Contributor

Not that I'm aware of. Primarily, we need to run all standard tests with this image.

It won't matter because the tests are configuring it with the required root FS size that will have enough space for swap.
The product question here is about the fact that we won't have the minimum size by default and one needs to set it correctly when deploying from our AMI.

TBH, I think it's a change that confuse users and may break some users automations.

Every AMI we create today holds a 30Gb EBS snapshot, if we take into account the fact that we copy those images to other regions and have multiple images (dev, debug, releases) it is adding up to our cost.

This commit reduces the size of the snapshot to 8Gb (it's the minimum value allowed for a snapshot)

In addition, it's reducing also the time for the build process from 30 minutes to ~12 minutes (which is also good)

Refs: scylladb/scylla-pkg#3712
@tzach
Copy link
Contributor

tzach commented Dec 28, 2023

It won't matter because the tests are configuring it with the required root FS size that will have enough space for swap.

@roydahan Does it make sense to do this automatically for all users?

I'm not comfortable updating production image settings without testing!

@fruch
Copy link
Collaborator

fruch commented Dec 28, 2023

It won't matter because the tests are configuring it with the required root FS size that will have enough space for swap.

@roydahan Does it make sense to do this automatically for all users?

the only why you might force people to set bigger root disks, if the image root disk is bigger
so if they set it too small, it would fails to create an instance

you can automate in any other way, if you are giving them ami-id.

now it's 30Gb, and no on can create something smaller
and the by product of it, that all of our images created take 30Gb of storage, and that's why yaron is trying to make it smaller

it's a UX question, cause now if user were counting someone on their process on the default size to o.k. for them and with the swap size that comes default,
now on the next release they might get something else out of the box, that won't have enough swap.

all other users that specific the sizes of root-disks regardless, won't notice it. (they already use >=30Gb now)
I think also scylla-cloud those that, and doesn't rely on the default

I'm not comfortable updating production image settings without testing!

It would be test once it hit master, but with tools that all specify the root disk size with >=30Gb

@roydahan
Copy link
Contributor

TBH I think this change doesn't worth the pain and bad UX for fresh install.
I suggest we won't do this for now.

@mykaul
Copy link

mykaul commented Dec 28, 2023

TBH I think this change doesn't worth the pain and bad UX for fresh install. I suggest we won't do this for now.

It is thousands of dollars on EBS costs, btw.
If not now - when?

@roydahan
Copy link
Contributor

TBH I think this change doesn't worth the pain and bad UX for fresh install. I suggest we won't do this for now.

It is thousands of dollars on EBS costs, btw. If not now - when?

It's thousands of dollars if you don't delete the snapshots, but AFAIU we do now, and won't save more than 180 days.
So, these extra 22GB will cost us 1.76$ per month, per AMI (0.08$ * 22GB).
Assuming we have around 1 AMI per day in avg, it shouldn't cost thousands of dollars, probably few hundreds.

I think it's worth the user experience of launching our AMI with the defaults value and scylla fails to setup since there is not enough disk space to configure swap.

We're spending much much more money on instances that people don't terminate and live for months...

@mykaul
Copy link

mykaul commented Dec 31, 2023

TBH I think this change doesn't worth the pain and bad UX for fresh install. I suggest we won't do this for now.

It is thousands of dollars on EBS costs, btw. If not now - when?

It's thousands of dollars if you don't delete the snapshots, but AFAIU we do now, and won't save more than 180 days. So, these extra 22GB will cost us 1.76$ per month, per AMI (0.08$ * 22GB). Assuming we have around 1 AMI per day in avg, it shouldn't cost thousands of dollars, probably few hundreds.

Just 1 AMI per day would be great - but it's per region, no?

I think it's worth the user experience of launching our AMI with the defaults value and scylla fails to setup since there is not enough disk space to configure swap.

We can work on improving the user experience too.
Those 30GB may or may not be enough, depending on the instance type anyway, no?

We're spending much much more money on instances that people don't terminate and live for months...

We'll get there too. The fact we have instances running >1 week that are not production ones - should be one of our next item.
We'll also reduce container images (SCT's for example). Every minute run counts. Eventually, things add up.

@yaronkaikov
Copy link
Collaborator Author

TBH I think this change doesn't worth the pain and bad UX for fresh install. I suggest we won't do this for now.

It is thousands of dollars on EBS costs, btw. If not now - when?

It's thousands of dollars if you don't delete the snapshots, but AFAIU we do now, and won't save more than 180 days. So, these extra 22GB will cost us 1.76$ per month, per AMI (0.08$ * 22GB). Assuming we have around 1 AMI per day in avg, it shouldn't cost thousands of dollars, probably few hundreds.

Just 1 AMI per day would be great - but it's per region, no?

Actually, we are creating many more AMI's , during December we created 173 AMI's in us-east-1 only , in other testing regions such as eu-west- we have 111 AMIs

Some of them for debug and some are official releases/master

I think it's worth the user experience of launching our AMI with the defaults value and scylla fails to setup since there is not enough disk space to configure swap.

We can work on improving the user experience too. Those 30GB may or may not be enough, depending on the instance type anyway, no?

We're spending much much more money on instances that people don't terminate and live for months...

We'll get there too. The fact we have instances running >1 week that are not production ones - should be one of our next item. We'll also reduce container images (SCT's for example). Every minute run counts. Eventually, things add up.

@roydahan
Copy link
Contributor

roydahan commented Jan 7, 2024

Reducing the AMI size saves only 500$ in a course of 6 months, calculating 2 AMIs for master and 2 AMIs for enterprise.
It's less than 0.4% of our costs in 6 moths period.

There are other places where we can save for snapshots, taking them in this task: https://github.com/scylladb/scylla-pkg/issues/3712

@roydahan roydahan closed this Jan 7, 2024
@yaronkaikov yaronkaikov deleted the reduce-ami-snapshot branch January 7, 2024 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants