lxc copy --refresh deletes latest snapshot and resends it even when trees are already in sync. ZFS storage backend #14472

Open
6 tasks
manfromafar opened this issue Nov 15, 2024 · 9 comments
Labels
Improvement Improve to current situation

@manfromafar

manfromafar commented Nov 15, 2024

Required information

  • Distribution: Ubuntu
  • Distribution version: 24.04.1
  • The output of "snap list --all lxd core20 core22 core24 snapd":
Name    Version      Rev    Tracking       Publisher   Notes
core22  20241001     1663   latest/stable  canonical✓  base
lxd     6.1-78a3d8f  30130  latest/stable  canonical✓  -
snapd   2.63         21759  latest/stable  canonical✓  snapd
  • The output of "lxc info" or if that fails:
    • Kernel version:
    • LXC version:
    • LXD version:
    • Storage backend in use:
config:
  core.https_address: '[::]:8443'
  images.auto_update_interval: "0"
api_extensions:
- storage_zfs_remove_snapshots
- container_host_shutdown_timeout
- container_stop_priority
- container_syscall_filtering
- auth_pki
- container_last_used_at
- etag
- patch
- usb_devices
- https_allowed_credentials
- image_compression_algorithm
- directory_manipulation
- container_cpu_time
- storage_zfs_use_refquota
- storage_lvm_mount_options
- network
- profile_usedby
- container_push
- container_exec_recording
- certificate_update
- container_exec_signal_handling
- gpu_devices
- container_image_properties
- migration_progress
- id_map
- network_firewall_filtering
- network_routes
- storage
- file_delete
- file_append
- network_dhcp_expiry
- storage_lvm_vg_rename
- storage_lvm_thinpool_rename
- network_vlan
- image_create_aliases
- container_stateless_copy
- container_only_migration
- storage_zfs_clone_copy
- unix_device_rename
- storage_lvm_use_thinpool
- storage_rsync_bwlimit
- network_vxlan_interface
- storage_btrfs_mount_options
- entity_description
- image_force_refresh
- storage_lvm_lv_resizing
- id_map_base
- file_symlinks
- container_push_target
- network_vlan_physical
- storage_images_delete
- container_edit_metadata
- container_snapshot_stateful_migration
- storage_driver_ceph
- storage_ceph_user_name
- resource_limits
- storage_volatile_initial_source
- storage_ceph_force_osd_reuse
- storage_block_filesystem_btrfs
- resources
- kernel_limits
- storage_api_volume_rename
- network_sriov
- console
- restrict_devlxd
- migration_pre_copy
- infiniband
- maas_network
- devlxd_events
- proxy
- network_dhcp_gateway
- file_get_symlink
- network_leases
- unix_device_hotplug
- storage_api_local_volume_handling
- operation_description
- clustering
- event_lifecycle
- storage_api_remote_volume_handling
- nvidia_runtime
- container_mount_propagation
- container_backup
- devlxd_images
- container_local_cross_pool_handling
- proxy_unix
- proxy_udp
- clustering_join
- proxy_tcp_udp_multi_port_handling
- network_state
- proxy_unix_dac_properties
- container_protection_delete
- unix_priv_drop
- pprof_http
- proxy_haproxy_protocol
- network_hwaddr
- proxy_nat
- network_nat_order
- container_full
- backup_compression
- nvidia_runtime_config
- storage_api_volume_snapshots
- storage_unmapped
- projects
- network_vxlan_ttl
- container_incremental_copy
- usb_optional_vendorid
- snapshot_scheduling
- snapshot_schedule_aliases
- container_copy_project
- clustering_server_address
- clustering_image_replication
- container_protection_shift
- snapshot_expiry
- container_backup_override_pool
- snapshot_expiry_creation
- network_leases_location
- resources_cpu_socket
- resources_gpu
- resources_numa
- kernel_features
- id_map_current
- event_location
- storage_api_remote_volume_snapshots
- network_nat_address
- container_nic_routes
- cluster_internal_copy
- seccomp_notify
- lxc_features
- container_nic_ipvlan
- network_vlan_sriov
- storage_cephfs
- container_nic_ipfilter
- resources_v2
- container_exec_user_group_cwd
- container_syscall_intercept
- container_disk_shift
- storage_shifted
- resources_infiniband
- daemon_storage
- instances
- image_types
- resources_disk_sata
- clustering_roles
- images_expiry
- resources_network_firmware
- backup_compression_algorithm
- ceph_data_pool_name
- container_syscall_intercept_mount
- compression_squashfs
- container_raw_mount
- container_nic_routed
- container_syscall_intercept_mount_fuse
- container_disk_ceph
- virtual-machines
- image_profiles
- clustering_architecture
- resources_disk_id
- storage_lvm_stripes
- vm_boot_priority
- unix_hotplug_devices
- api_filtering
- instance_nic_network
- clustering_sizing
- firewall_driver
- projects_limits
- container_syscall_intercept_hugetlbfs
- limits_hugepages
- container_nic_routed_gateway
- projects_restrictions
- custom_volume_snapshot_expiry
- volume_snapshot_scheduling
- trust_ca_certificates
- snapshot_disk_usage
- clustering_edit_roles
- container_nic_routed_host_address
- container_nic_ipvlan_gateway
- resources_usb_pci
- resources_cpu_threads_numa
- resources_cpu_core_die
- api_os
- container_nic_routed_host_table
- container_nic_ipvlan_host_table
- container_nic_ipvlan_mode
- resources_system
- images_push_relay
- network_dns_search
- container_nic_routed_limits
- instance_nic_bridged_vlan
- network_state_bond_bridge
- usedby_consistency
- custom_block_volumes
- clustering_failure_domains
- resources_gpu_mdev
- console_vga_type
- projects_limits_disk
- network_type_macvlan
- network_type_sriov
- container_syscall_intercept_bpf_devices
- network_type_ovn
- projects_networks
- projects_networks_restricted_uplinks
- custom_volume_backup
- backup_override_name
- storage_rsync_compression
- network_type_physical
- network_ovn_external_subnets
- network_ovn_nat
- network_ovn_external_routes_remove
- tpm_device_type
- storage_zfs_clone_copy_rebase
- gpu_mdev
- resources_pci_iommu
- resources_network_usb
- resources_disk_address
- network_physical_ovn_ingress_mode
- network_ovn_dhcp
- network_physical_routes_anycast
- projects_limits_instances
- network_state_vlan
- instance_nic_bridged_port_isolation
- instance_bulk_state_change
- network_gvrp
- instance_pool_move
- gpu_sriov
- pci_device_type
- storage_volume_state
- network_acl
- migration_stateful
- disk_state_quota
- storage_ceph_features
- projects_compression
- projects_images_remote_cache_expiry
- certificate_project
- network_ovn_acl
- projects_images_auto_update
- projects_restricted_cluster_target
- images_default_architecture
- network_ovn_acl_defaults
- gpu_mig
- project_usage
- network_bridge_acl
- warnings
- projects_restricted_backups_and_snapshots
- clustering_join_token
- clustering_description
- server_trusted_proxy
- clustering_update_cert
- storage_api_project
- server_instance_driver_operational
- server_supported_storage_drivers
- event_lifecycle_requestor_address
- resources_gpu_usb
- clustering_evacuation
- network_ovn_nat_address
- network_bgp
- network_forward
- custom_volume_refresh
- network_counters_errors_dropped
- metrics
- image_source_project
- clustering_config
- network_peer
- linux_sysctl
- network_dns
- ovn_nic_acceleration
- certificate_self_renewal
- instance_project_move
- storage_volume_project_move
- cloud_init
- network_dns_nat
- database_leader
- instance_all_projects
- clustering_groups
- ceph_rbd_du
- instance_get_full
- qemu_metrics
- gpu_mig_uuid
- event_project
- clustering_evacuation_live
- instance_allow_inconsistent_copy
- network_state_ovn
- storage_volume_api_filtering
- image_restrictions
- storage_zfs_export
- network_dns_records
- storage_zfs_reserve_space
- network_acl_log
- storage_zfs_blocksize
- metrics_cpu_seconds
- instance_snapshot_never
- certificate_token
- instance_nic_routed_neighbor_probe
- event_hub
- agent_nic_config
- projects_restricted_intercept
- metrics_authentication
- images_target_project
- cluster_migration_inconsistent_copy
- cluster_ovn_chassis
- container_syscall_intercept_sched_setscheduler
- storage_lvm_thinpool_metadata_size
- storage_volume_state_total
- instance_file_head
- instances_nic_host_name
- image_copy_profile
- container_syscall_intercept_sysinfo
- clustering_evacuation_mode
- resources_pci_vpd
- qemu_raw_conf
- storage_cephfs_fscache
- network_load_balancer
- vsock_api
- instance_ready_state
- network_bgp_holdtime
- storage_volumes_all_projects
- metrics_memory_oom_total
- storage_buckets
- storage_buckets_create_credentials
- metrics_cpu_effective_total
- projects_networks_restricted_access
- storage_buckets_local
- loki
- acme
- internal_metrics
- cluster_join_token_expiry
- remote_token_expiry
- init_preseed
- storage_volumes_created_at
- cpu_hotplug
- projects_networks_zones
- network_txqueuelen
- cluster_member_state
- instances_placement_scriptlet
- storage_pool_source_wipe
- zfs_block_mode
- instance_generation_id
- disk_io_cache
- amd_sev
- storage_pool_loop_resize
- migration_vm_live
- ovn_nic_nesting
- oidc
- network_ovn_l3only
- ovn_nic_acceleration_vdpa
- cluster_healing
- instances_state_total
- auth_user
- security_csm
- instances_rebuild
- numa_cpu_placement
- custom_volume_iso
- network_allocations
- storage_api_remote_volume_snapshot_copy
- zfs_delegate
- operations_get_query_all_projects
- metadata_configuration
- syslog_socket
- event_lifecycle_name_and_project
- instances_nic_limits_priority
- disk_initial_volume_configuration
- operation_wait
- cluster_internal_custom_volume_copy
- disk_io_bus
- storage_cephfs_create_missing
- instance_move_config
- ovn_ssl_config
- init_preseed_storage_volumes
- metrics_instances_count
- server_instance_type_info
- resources_disk_mounted
- server_version_lts
- oidc_groups_claim
- loki_config_instance
- storage_volatile_uuid
- import_instance_devices
- instances_uefi_vars
- instances_migration_stateful
- container_syscall_filtering_allow_deny_syntax
- access_management
- vm_disk_io_limits
- storage_volumes_all
- instances_files_modify_permissions
- image_restriction_nesting
- container_syscall_intercept_finit_module
- device_usb_serial
- network_allocate_external_ips
- explicit_trust_token
api_status: stable
api_version: "1.0"
auth: trusted
public: false
auth_methods:
- tls
auth_user_name: root
auth_user_method: unix
environment:
  addresses:
  - 129.128.63.104:8443
  - 10.207.217.1:8443
  - '[fd42:44b3:9d9c:bc59::1]:8443'
  architectures:
  - x86_64
  - i686
  certificate: |
    -----BEGIN CERTIFICATE-----
    MIIB6TCCAW+gAwIBAgIQUydsvgyGdahYAGKXrTmUyjAKBggqhkjOPQQDAzAmMQww
    CgYDVQQKEwNMWEQxFjAUBgNVBAMMDXJvb3RAbHhkaG9zdDMwHhcNMjQxMTEzMTgw
    NDQ4WhcNMzQxMTExMTgwNDQ4WjAmMQwwCgYDVQQKEwNMWEQxFjAUBgNVBAMMDXJv
    b3RAbHhkaG9zdDMwdjAQBgcqhkjOPQIBBgUrgQQAIgNiAAQoeduq4xq8hMWYKa4w
    UOCJwluP/1trtPskPZgovF6HCAKLO4FI1QH7rAfrdodMozMkws/je1YObXrA2N4Z
    BA0iF6lvIYr5m9yacg5qGNNEXL69BufdsFmaM9CVsluQQ22jYjBgMA4GA1UdDwEB
    /wQEAwIFoDATBgNVHSUEDDAKBggrBgEFBQcDATAMBgNVHRMBAf8EAjAAMCsGA1Ud
    EQQkMCKCCGx4ZGhvc3QzhwR/AAABhxAAAAAAAAAAAAAAAAAAAAABMAoGCCqGSM49
    BAMDA2gAMGUCMQDHxrgKsHy05WE57pXdnBnw1C3hPvnMJgCuL0rxZ5R9spYvXoCd
    Asx4bLpqs9VvZ4UCMFViD+nGXEtWYmRiCDkA4WLnIubqfbWzgn5pz8cVkkSfBcp0
    5QelufmGpKInEPNXHg==
    -----END CERTIFICATE-----
  certificate_fingerprint: 6cdea8362789ff1605312c91891334afd8740b1e0d7e8ad59187e2d5e8b8a73c
  driver: lxc
  driver_version: 6.0.0
  instance_types:
  - container
  firewall: nftables
  kernel: Linux
  kernel_architecture: x86_64
  kernel_features:
    idmapped_mounts: "true"
    netnsid_getifaddrs: "true"
    seccomp_listener: "true"
    seccomp_listener_continue: "true"
    uevent_injection: "true"
    unpriv_fscaps: "true"
  kernel_version: 6.8.0-48-generic
  lxc_features:
    cgroup2: "true"
    core_scheduling: "true"
    devpts_fd: "true"
    idmapped_mounts_v2: "true"
    mount_injection_file: "true"
    network_gateway_device_route: "true"
    network_ipvlan: "true"
    network_l2proxy: "true"
    network_phys_macvlan_mtu: "true"
    network_veth_router: "true"
    pidfd: "true"
    seccomp_allow_deny_syntax: "true"
    seccomp_notify: "true"
    seccomp_proxy_send_notify_fd: "true"
  os_name: Ubuntu
  os_version: "24.04"
  project: default
  server: lxd
  server_clustered: false
  server_event_mode: full-mesh
  server_name: lxdhost3
  server_pid: 3149
  server_version: "6.1"
  server_lts: false
  storage: zfs
  storage_version: 2.2.6-1
  storage_supported_drivers:
  - name: dir
    version: "1"
    remote: false
  - name: lvm
    version: 2.03.11(2) (2021-01-08) / 1.02.175 (2021-01-08) / 4.48.0
    remote: false
  - name: powerflex
    version: 1.16 (nvme-cli)
    remote: true
  - name: zfs
    version: 2.2.6-1
    remote: false
  - name: btrfs
    version: 5.16.2
    remote: false
  - name: ceph
    version: 17.2.7
    remote: true
  - name: cephfs
    version: 17.2.7
    remote: true
  - name: cephobject
    version: 17.2.7
    remote: true

Issue description

When copying a container to another host with lxc copy --refresh, with a ZFS backend on both systems, the top-level snapshot is deleted on the remote and resent even if the snapshots are identical.

This can cause large amounts of unnecessary network traffic, disk io, etc.

One way to resolve this would be to check the ZFS guid property of the snapshots on both hosts and compare them before sending any snapshots.
ZFS snapshots are read-only, so their GUIDs can't change. If the leading snapshots on both sides match, no operations are needed.
This might need a new flag to specify that you only want to check snapshot consistency, not the live data as well.
Something like --snaps-only, or a better name.
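
A minimal Go sketch of the comparison proposed above. The Snapshot type and the latestSnapshotsMatch helper are hypothetical names used only for illustration, not LXD's API. The GUID values stand in for what zfs get guid would report on each host.

```go
package main

import "fmt"

// Snapshot is a hypothetical, simplified record of a ZFS snapshot:
// its short name and the value of its "guid" property.
type Snapshot struct {
	Name string
	GUID string
}

// latestSnapshotsMatch reports whether the newest snapshot on the source
// and the newest snapshot on the target carry the same GUID.
// Both slices are assumed to be ordered oldest to newest.
func latestSnapshotsMatch(src, dst []Snapshot) bool {
	if len(src) == 0 || len(dst) == 0 {
		return false
	}

	return src[len(src)-1].GUID == dst[len(dst)-1].GUID
}

func main() {
	// Illustrative values only; real GUIDs come from the guid property.
	src := []Snapshot{{"1", "111"}, {"2", "222"}, {"3", "333"}}
	dst := []Snapshot{{"1", "111"}, {"2", "222"}, {"3", "333"}}

	if latestSnapshotsMatch(src, dst) {
		fmt.Println("latest snapshots match: no snapshot transfer needed")
	} else {
		fmt.Println("latest snapshots differ: a transfer is needed")
	}
}
```

This only compares the leading snapshots. Whether the live data has changed since that snapshot is a separate question.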

Steps to reproduce

  1. create a container (c1) on one system (sys1)
  2. snapshot c1 a couple of times with names like 1,2
    a. lxc snapshot c1 1
    b. lxc snapshot c1 2
  3. Write arbitrary data into c1. This is done to be able to see the transfer
    a. lxc exec c1 -- dd if=/dev/urandom of=/root/dd.img bs=1M count=1000
  4. Snapshot c1 as snapshot 3
    a. lxc snapshot c1 3
  5. Copy c1 to sys2 (this assumes you have the remotes set up)
    a. lxc copy c1 sys2:c1
  6. Verify the snapshots exist on sys2
    a. lxc info c1 should show snapshots 1,2,3
    b. zfs list -t snapshot should show the snapshots for c1 (assumes your pool isn't managed by lxd)
  7. Now that the trees are synced, do an lxc copy --refresh from sys1 to sys2 for c1
    a. lxc copy --refresh c1 sys2:c1
  8. If you monitor zfs on sys2 you'll notice that:
    1. snap 3 is deleted off sys2
    2. sys1 zfs sends snap3 back to sys2

Information to attach

  • Any relevant kernel output (dmesg)
  • Container log (lxc info NAME --show-log)
  • Container configuration (lxc config show NAME --expanded)
  • Main daemon log (at /var/log/lxd/lxd.log or /var/snap/lxd/common/lxd/logs/lxd.log)
  • Output of the client with --debug
  • Output of the daemon with --debug (alternatively output of lxc monitor while reproducing the issue)
@tomponline
Member

Please may you update your reproducer steps with the exact lxc snapshot command you are using.

Also, please can you confirm this issue isn't fixed in the latest/edge channel? Thanks

@tomponline tomponline added the Incomplete Waiting on more information from reporter label Nov 15, 2024
@manfromafar
Author

manfromafar commented Nov 15, 2024

OK, just tried it with edge
lxd git-ba31c87 31205 latest/edge canonical✓ -
and same behavior occurs.
Snap 3 is deleted then resent

Updated the reproduction steps to include the snapshot commands. It's using the plain lxc snapshot commands.

@tomponline tomponline added Bug Confirmed to be a bug and removed Incomplete Waiting on more information from reporter labels Nov 18, 2024
@tomponline tomponline added this to the lxd-6.2 milestone Nov 18, 2024
@tomponline tomponline modified the milestones: lxd-6.2, lxd-6.3 Nov 27, 2024
@kadinsayani
Contributor

Indeed, unnecessary I/O is occurring when copying containers with identical snapshots:

                                                capacity     operations     bandwidth
pool                                          alloc   free   read  write   read  write
--------------------------------------------  -----  -----  -----  -----  -----  -----
default                                       2.90G  26.6G     64  1.06K   224K  3.60M
  /var/snap/lxd/common/lxd/disks/default.img  2.90G  26.6G     64  1.06K   224K  3.60M
--------------------------------------------  -----  -----  -----  -----  -----  -----
                                                capacity     operations     bandwidth
pool                                          alloc   free   read  write   read  write
--------------------------------------------  -----  -----  -----  -----  -----  -----
default                                       2.90G  26.6G      4    260  5.49K   649K
  /var/snap/lxd/common/lxd/disks/default.img  2.90G  26.6G      4    260  5.49K   649K
--------------------------------------------  -----  -----  -----  -----  -----  -----

@kadinsayani
Contributor

And the GUIDs are identical:

❯ zfs get guid default/containers/c1@snapshot-3
NAME                              PROPERTY  VALUE                SOURCE
default/containers/c1@snapshot-3  guid      6664040801001024508  -

❯ zfs get guid default/containers/c2@snapshot-3
NAME                              PROPERTY  VALUE                SOURCE
default/containers/c2@snapshot-3  guid      6664040801001024508  -

@kadinsayani
Contributor

kadinsayani commented Nov 27, 2024

After further investigation, I've found that we already have logic to prevent snapshots with identical GUIDs from being copied:

// Generate list of snapshots which need to be synced, i.e. are available on the source but not on the target.
for _, srcSnapshot := range migrationHeader.SnapshotDatasets {
	found := false

	for _, dstSnapshot := range respSnapshots {
		if srcSnapshot.GUID == dstSnapshot.GUID {
			found = true
			break
		}
	}

	if !found {
		syncSnapshotNames = append(syncSnapshotNames, srcSnapshot.Name)
	}
}
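
To make the effect of that loop concrete, here is a standalone sketch under stated assumptions. The snapshotDataset type and missingOnTarget function are hypothetical stand-ins, not LXD's types, and a map lookup is swapped in for the nested loop. With identical trees the resulting list is empty, so no regular snapshot is queued for sending.

```go
package main

import "fmt"

// snapshotDataset is a hypothetical stand-in for the entries compared
// in the excerpt: a snapshot name plus its ZFS GUID.
type snapshotDataset struct {
	Name string
	GUID string
}

// missingOnTarget returns the names of source snapshots whose GUID
// does not appear among the target snapshots.
func missingOnTarget(src, dst []snapshotDataset) []string {
	seen := make(map[string]struct{}, len(dst))
	for _, s := range dst {
		seen[s.GUID] = struct{}{}
	}

	var names []string
	for _, s := range src {
		if _, ok := seen[s.GUID]; !ok {
			names = append(names, s.Name)
		}
	}

	return names
}

func main() {
	// Illustrative GUIDs only.
	src := []snapshotDataset{{"snapshot-1", "g1"}, {"snapshot-2", "g2"}, {"snapshot-3", "g3"}}
	dst := []snapshotDataset{{"snapshot-1", "g1"}, {"snapshot-2", "g2"}, {"snapshot-3", "g3"}}

	fmt.Printf("snapshots to sync: %d\n", len(missingOnTarget(src, dst)))
}
```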

Furthermore, lxc copy --refresh always rolls back to the latest identical snapshot. Therefore, the zfs send and zfs recv calls for the latest identical snapshot are necessary:

// Rollback to the latest identical snapshot if performing a refresh.
if volTargetArgs.Refresh {
	snapshots, err = vol.Snapshots(op)
	if err != nil {
		return err
	}

	if len(snapshots) > 0 {
		lastIdenticalSnapshot := snapshots[len(snapshots)-1]

		err = d.restoreVolume(vol, lastIdenticalSnapshot, true, op)
		if err != nil {
			return err
		}
	}
}

if volTargetArgs.Refresh {
	// Only delete the latest migration snapshot.
	_, err := shared.RunCommand("zfs", "destroy", "-r", fmt.Sprintf("%s%s", d.dataset(vol, false), entries[len(entries)-1]))
	if err != nil {
		return err
	}

I think lxc copy --instance-only might be a better fit for what you’re trying to do. It copies the instance itself, not its snapshots.

@manfromafar
Author

manfromafar commented Nov 27, 2024

I think lxc copy --instance-only might be a better fit for what you’re trying to do. It copies the instance itself, not its snapshots.

But I do want the snapshots to be copied if they are missing, rather than only applying the "active" data outside the snapshots.
I would assume it is more appropriate (at least on ZFS) to treat snapshots as immutable: if the snapshots on the two systems are the same, there is no reason to delete the top-level snapshot and send it over again.

In this example that's only 1 GB, but on a production system it may be 100 GB to 10 TB depending on what the system is doing.

This just happens to be a case where both trees are identical, so nothing should happen.
Maybe a --mirror flag would be better, denoting that I want to make sure tree X matches on system Y.
Then the normal --refresh would find the last in-sync snapshot in the trees and start there.

But for the current implementation, I think resending the top-level snapshot when it already matches doesn't make sense.

@kadinsayani
Contributor

kadinsayani commented Nov 28, 2024

After following the reproducer steps outlined in the issue, here are the zpool events that occur:

Nov 28 2024 08:00:02.108390124 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "default"
pool_guid = 0xdd325a595ddcd173
pool_state = 0x0
pool_context = 0x0
history_hostname = "devbox"
history_dsname = "default/containers/c2/%rollback"
history_internal_str = "parent=c2"
history_internal_name = "clone swap"
history_dsid = 0x1077
history_txg = 0x5fe
history_time = 0x67488572
time = 0x67488572 0x675e6ec
eid = 0xd47

Nov 28 2024 08:00:02.109390127 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "default"
pool_guid = 0xdd325a595ddcd173
pool_state = 0x0
pool_context = 0x0
history_hostname = "devbox"
history_dsname = "default/containers/c2/%rollback"
history_internal_str = "(bptree, mintxg=1508)"
history_internal_name = "destroy"
history_dsid = 0x1077
history_txg = 0x5fe
history_time = 0x67488572
time = 0x67488572 0x685292f
eid = 0xd48

Nov 28 2024 08:00:02.123390164 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "default"
pool_guid = 0xdd325a595ddcd173
pool_state = 0x0
pool_context = 0x0
history_hostname = "devbox"
history_dsname = "default/containers/c1@snapshot-c0ffe7bb-b7b2-4589-b0b7-d56eefe3454a"
history_internal_str = " "
history_internal_name = "snapshot"
history_dsid = 0x140a
history_txg = 0x5ff
history_time = 0x67488572
time = 0x67488572 0x75ac8d4
eid = 0xd49

Nov 28 2024 08:00:02.147390227 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "default"
pool_guid = 0xdd325a595ddcd173
pool_state = 0x0
pool_context = 0x0
history_hostname = "devbox"
history_dsname = "default/containers/c2/%recv"
history_internal_str = " "
history_internal_name = "receive"
history_dsid = 0x1410
history_txg = 0x600
history_time = 0x67488572
time = 0x67488572 0x8c8ff13
eid = 0xd4a

Nov 28 2024 08:00:02.160390262 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "default"
pool_guid = 0xdd325a595ddcd173
pool_state = 0x0
pool_context = 0x0
history_hostname = "devbox"
history_dsname = "default/containers/c2/%recv"
history_internal_str = "snap=snapshot-c0ffe7bb-b7b2-4589-b0b7-d56eefe3454a"
history_internal_name = "finish receiving"
history_dsid = 0x1410
history_txg = 0x601
history_time = 0x67488572
time = 0x67488572 0x98f5c76
eid = 0xd4b

Nov 28 2024 08:00:02.160390262 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "default"
pool_guid = 0xdd325a595ddcd173
pool_state = 0x0
pool_context = 0x0
history_hostname = "devbox"
history_dsname = "default/containers/c2/%recv"
history_internal_str = "parent=c2"
history_internal_name = "clone swap"
history_dsid = 0x1410
history_txg = 0x601
history_time = 0x67488572
time = 0x67488572 0x98f5c76
eid = 0xd4c

Nov 28 2024 08:00:02.160390262 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "default"
pool_guid = 0xdd325a595ddcd173
pool_state = 0x0
pool_context = 0x0
history_hostname = "devbox"
history_dsname = "default/containers/c2@snapshot-c0ffe7bb-b7b2-4589-b0b7-d56eefe3454a"
history_internal_str = " "
history_internal_name = "snapshot"
history_dsid = 0x1489
history_txg = 0x601
history_time = 0x67488572
time = 0x67488572 0x98f5c76
eid = 0xd4d

Nov 28 2024 08:00:02.161390264 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "default"
pool_guid = 0xdd325a595ddcd173
pool_state = 0x0
pool_context = 0x0
history_hostname = "devbox"
history_dsname = "default/containers/c2/%recv"
history_internal_str = "(bptree, mintxg=1508)"
history_internal_name = "destroy"
history_dsid = 0x1410
history_txg = 0x601
history_time = 0x67488572
time = 0x67488572 0x99e9eb8
eid = 0xd4e

Nov 28 2024 08:00:02.230390447 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "default"
pool_guid = 0xdd325a595ddcd173
pool_state = 0x0
pool_context = 0x0
history_hostname = "devbox"
history_dsname = "default/containers/c2/%rollback"
history_internal_str = "parent=c2"
history_internal_name = "clone swap"
history_dsid = 0x1490
history_txg = 0x602
history_time = 0x67488572
time = 0x67488572 0xdbb7aaf
eid = 0xd4f

Nov 28 2024 08:00:02.231390449 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "default"
pool_guid = 0xdd325a595ddcd173
pool_state = 0x0
pool_context = 0x0
history_hostname = "devbox"
history_dsname = "default/containers/c2/%rollback"
history_internal_str = "(bptree, mintxg=1537)"
history_internal_name = "destroy"
history_dsid = 0x1490
history_txg = 0x602
history_time = 0x67488572
time = 0x67488572 0xdcabcf1
eid = 0xd50

Nov 28 2024 08:00:02.256390515 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "default"
pool_guid = 0xdd325a595ddcd173
pool_state = 0x0
pool_context = 0x0
history_hostname = "devbox"
history_dsname = "default/containers/c1@snapshot-c0ffe7bb-b7b2-4589-b0b7-d56eefe3454a"
history_internal_str = " "
history_internal_name = "destroy"
history_dsid = 0x140a
history_txg = 0x603
history_time = 0x67488572
time = 0x67488572 0xf483573
eid = 0xd51

Nov 28 2024 08:00:02.284390589 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "default"
pool_guid = 0xdd325a595ddcd173
pool_state = 0x0
pool_context = 0x0
history_hostname = "devbox"
history_dsname = "default/containers/c2@snapshot-c0ffe7bb-b7b2-4589-b0b7-d56eefe3454a"
history_internal_str = " "
history_internal_name = "destroy"
history_dsid = 0x1489
history_txg = 0x604
history_time = 0x67488572
time = 0x67488572 0x10f374bd
eid = 0xd52

Nov 28 2024 08:00:02.334390721 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "default"
pool_guid = 0xdd325a595ddcd173
pool_state = 0x0
pool_context = 0x0
history_hostname = "devbox"
history_dsname = "default/containers/c2"
history_internal_str = "mountpoint=legacy"
history_internal_name = "set"
history_dsid = 0x1273
history_txg = 0x605
history_time = 0x67488572
time = 0x67488572 0x13ee65c1
eid = 0xd53

Nov 28 2024 08:00:02.334390721 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "default"
pool_guid = 0xdd325a595ddcd173
pool_state = 0x0
pool_context = 0x0
history_hostname = "devbox"
history_dsname = "default/containers/c2"
history_internal_str = "canmount=2"
history_internal_name = "set"
history_dsid = 0x1273
history_txg = 0x605
history_time = 0x67488572
time = 0x67488572 0x13ee65c1
eid = 0xd54

Nov 28 2024 08:00:02.472391086 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "default"
pool_guid = 0xdd325a595ddcd173
pool_state = 0x0
pool_context = 0x0
history_hostname = "devbox"
history_dsname = "default/containers/c2"
history_internal_str = "mountpoint=legacy"
history_internal_name = "set"
history_dsid = 0x1273
history_txg = 0x608
history_time = 0x67488572
time = 0x67488572 0x1c281dae
eid = 0xd55

Nov 28 2024 08:00:02.472391086 sysevent.fs.zfs.history_event
version = 0x0
class = "sysevent.fs.zfs.history_event"
pool = "default"
pool_guid = 0xdd325a595ddcd173
pool_state = 0x0
pool_context = 0x0
history_hostname = "devbox"
history_dsname = "default/containers/c2"
history_internal_str = "canmount=2"
history_internal_name = "set"
history_dsid = 0x1273
history_txg = 0x608
history_time = 0x67488572
time = 0x67488572 0x1c281dae
eid = 0xd56

@manfromafar
Author

Ah, looking more closely I see what's going on.
A temporary snapshot called migration-GUID is created during the refresh.
Then the last non-ephemeral matching snapshot, in this case snapshot-5, is used as the base for sending the migration snapshot.

ZFS send during refresh:

zfs send -c -L -i tank/virt/containers/test@snapshot-5 tank/virt/containers/test@migration-7503c1fe-afcc-4933-b627-680b95de8f35

Once the send is complete, the migration snapshot is deleted on the original host.
From zpool history on the sending system:

zfs snapshot -r tank/virt/containers/test@migration-7503c1fe-afcc-4933-b627-680b95de8f35
zfs destroy -r -d tank/virt/containers/test@migration-7503c1fe-afcc-4933-b627-680b95de8f35

On the receiver side, the container is rolled back to the last matching snapshot, snapshot-5 in this case.
Then the incremental from snapshot-5 to migration-GUID is received.
Once the receive has finished, the migration snapshot is deleted.

From zpool history on the receiving system when doing a refresh:

zfs rollback tank/virt/containers/test@snapshot-5
zfs receive -x mountpoint -F -u tank/virt/containers/test
zfs destroy -r tank/virt/containers/test@migration-7503c1fe-afcc-4933-b627-680b95de8f35

So my original conclusion was slightly wrong, since I was only watching the sizes of the datasets as the migration occurred.
But it seems the effect is still the same.
We're sending an ephemeral snapshot that is deleted later on, leaving the refreshed container on the remote system in the same state as if we had never sent the migration snapshot in the first place.

This behaviour still ends up wasting disk io and network bandwidth in the end since we delete the temporary snapshot which deletes all the data that was sent with it.

This could effectively be shortcut to just the zfs rollback to snapshot-5 without sending the migration snapshot.
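
The shortcut above only holds when the source has not changed since the last matching snapshot. One possible way to check that, as a sketch and not what LXD does today, is the ZFS written property, which reports the space written to a dataset since its previous snapshot. The example below assumes the value was obtained with something like zfs get -Hp -o value written tank/virt/containers/test. The helper name hasChangesSinceLastSnapshot is hypothetical, and the sample strings are made up for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// hasChangesSinceLastSnapshot interprets the raw value of the ZFS
// "written" property (bytes, as printed in parsable form) and reports
// whether anything was written since the most recent snapshot.
func hasChangesSinceLastSnapshot(writtenOutput string) (bool, error) {
	written, err := strconv.ParseUint(strings.TrimSpace(writtenOutput), 10, 64)
	if err != nil {
		return false, fmt.Errorf("parsing written value %q: %w", writtenOutput, err)
	}

	return written > 0, nil
}

func main() {
	// Made-up sample values standing in for real command output.
	for _, sample := range []string{"0\n", "1048576\n"} {
		changed, err := hasChangesSinceLastSnapshot(sample)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}

		fmt.Printf("written=%s changed=%t\n", strings.TrimSpace(sample), changed)
	}
}
```

If the check reports no changes, a refresh could in principle stop after the rollback. If it reports changes, the incremental send of the migration snapshot still carries real data.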

@tomponline
Member

tomponline commented Nov 28, 2024

OK thanks for the analysis both.

So this isn't a "bug" per se, because the --refresh mechanism is supposed to refresh between LXD snapshots (those that are in the LXD database), rather than any temporary storage-driver-level snapshots used to facilitate the actual synchronization mechanism.

@simondeziel mentioned the same hypothesis in an earlier discussion.

So really the solution here is to take an LXD snapshot using lxc snapshot before doing a refresh.

This could also be an "improvement" task to add an option to the lxc copy command to enable taking a proper LXD snapshot automatically when doing a refresh (although how that works with the various storage drivers LXD supports would need some thought).
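
Until such an option exists, the two-step workflow can be scripted by hand. The sketch below only builds the two command lines already used in this thread (lxc snapshot, then lxc copy --refresh). The refreshCommands helper and the snapshot name "4" are hypothetical, and the program prints the commands rather than running them.

```go
package main

import (
	"fmt"
	"strings"
)

// refreshCommands returns the lxc invocations for taking an LXD snapshot
// of the instance and then refreshing the copy on the target.
func refreshCommands(instance, snapshot, target string) [][]string {
	return [][]string{
		{"lxc", "snapshot", instance, snapshot},
		{"lxc", "copy", "--refresh", instance, target},
	}
}

func main() {
	// "4" is an arbitrary example snapshot name.
	for _, argv := range refreshCommands("c1", "4", "sys2:c1") {
		fmt.Println(strings.Join(argv, " "))
	}
}
```

To actually execute the commands, each argv slice could be passed to exec.Command from os/exec, stopping at the first error.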

@tomponline tomponline added Improvement Improve to current situation and removed Bug Confirmed to be a bug labels Nov 28, 2024
@kadinsayani kadinsayani modified the milestones: lxd-6.3, later Nov 29, 2024
@kadinsayani kadinsayani removed their assignment Nov 29, 2024