Deleting larger datastores with vsphere_vmfs_datastore times out #2249

Open

ianc769 opened this issue Aug 19, 2024 · 7 comments

Labels
bug Type: Bug needs-triage Status: Issue Needs Triage

Comments

@ianc769 commented Aug 19, 2024

Community Guidelines

  • I have read and agree to the HashiCorp Community Guidelines.
  • Vote on this issue by adding a 👍 reaction to the initial issue description to help the maintainers prioritize.
  • Do not leave "+1" or other comments that do not add relevant information or questions.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Terraform

1.8.5

Terraform Provider

2.8.2

VMware vSphere

8.0.2.00300

Description

Possibly related to #417.

When deleting VMs that have larger disks (in our case 2 x 600 GB), the operation times out because vSphere takes longer than 30 seconds to delete them.

Perhaps a customizable wait timeout would be a good route for this resource, as vSphere seems to be a little inconsistent on timing.

Affected Resources or Data Sources

resources/vsphere_vmfs_datastore

Possibly this code:

waitForDelete := &resource.StateChangeConf{
    Pending:        []string{waitForDeletePending},
    Target:         []string{waitForDeleteCompleted},
    Refresh:        waitForDeleteFunc,
    Timeout:        defaultAPITimeout,
    MinTimeout:     2 * time.Second,
    Delay:          1 * time.Second,
    NotFoundChecks: 35,
}
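
One direction this could take (a sketch only, not the provider's current implementation): expose the SDK's per-resource timeouts so the delete wait honors a value the user can raise, instead of the fixed defaultAPITimeout. The Timeouts wiring and the 30-minute default below are assumptions for illustration; the wait helpers (waitForDeleteFunc, waitForDeletePending, waitForDeleteCompleted) are reused from the snippet above.

package vsphere

import (
    "time"

    "github.com/hashicorp/terraform-plugin-sdk/v2/helper/resource"
    "github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// Sketch: declare a configurable delete timeout on the resource.
func resourceVSphereVmfsDatastore() *schema.Resource {
    return &schema.Resource{
        // ... existing Schema and CRUD functions ...
        Timeouts: &schema.ResourceTimeout{
            // Hypothetical default; a timeouts { delete = "..." } block in the
            // configuration would override it per resource.
            Delete: schema.DefaultTimeout(30 * time.Minute),
        },
    }
}

// Sketch: in the delete path, use the per-resource value rather than the
// provider-wide defaultAPITimeout.
func resourceVSphereVmfsDatastoreDelete(d *schema.ResourceData, meta interface{}) error {
    // ... issue the datastore removal call to vSphere ...

    waitForDelete := &resource.StateChangeConf{
        Pending:        []string{waitForDeletePending},
        Target:         []string{waitForDeleteCompleted},
        Refresh:        waitForDeleteFunc,
        Timeout:        d.Timeout(schema.TimeoutDelete), // configurable per resource
        MinTimeout:     2 * time.Second,
        Delay:          1 * time.Second,
        NotFoundChecks: 35,
    }
    _, err := waitForDelete.WaitForState()
    return err
}

If the resource exposed that, a configuration could raise the limit per datastore (e.g. a delete timeout of "45m"), which is essentially the request later in this thread.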

Terraform Configuration

resource "vsphere_virtual_machine" "pg_db" {
  for_each                = var.pg_db_servers
  name                    = each.key
  annotation              = each.value.annotation
  num_cpus                = each.value.num_cpus
  memory                  = each.value.memory
  folder                  = vsphere_folder.pg_db_folder.path
  datastore_id            = vsphere_vmfs_datastore.pg_data_datastore[each.key].id
  resource_pool_id        = data.terraform_remote_state.core_state.outputs.vsphere_compute_cluster.resource_pool_id
  guest_id                = data.vsphere_virtual_machine.rhel8.guest_id
  scsi_type               = data.vsphere_virtual_machine.rhel8.scsi_type
  tags                    = toset([data.terraform_remote_state.core_state.outputs.rhel_tag.id, data.terraform_remote_state.core_state.outputs.tf_tag.id])
  cpu_hot_add_enabled     = true
  memory_hot_add_enabled  = true
  firmware                = "efi"
  efi_secure_boot_enabled = true

  network_interface {
    network_id = each.value.network_id
  }
  disk {
    label       = "${each.key}.vmdk"
    size        = local.os_volume_size
    unit_number = 0
  }
  disk {
    label       = "${each.key}-data.vmdk"
    size        = each.value.pgdatasize
    unit_number = 1
  }
  disk {
    label        = "${each.key}-pg.vmdk"
    size         = each.value.pgpgsize
    unit_number  = 2
    datastore_id = vsphere_vmfs_datastore.pg_pg_datastore[each.key].id
  }

  clone {
    template_uuid = data.vsphere_virtual_machine.rhel8.id

    customize {
      linux_options {
        host_name = each.key
        domain    = each.value.linux_domain
      }
      dns_server_list = each.value.dns_server_list
      network_interface {
        ipv4_address = each.value.ipv4_address
        ipv4_netmask = each.value.ipv4_netmask
      }
      ipv4_gateway = each.value.ipv4_gateway

    }
  }
  connection {
    type     = "ssh"
    host     = self.default_ip_address
    user     = var.packer_user
    password = var.packer_pass
  }

  lifecycle {
    ignore_changes = [
      disk,
      resource_pool_id,
      clone[0],
      ept_rvi_mode,
      hv_mode
    ]
  }

  depends_on = [
    vsphere_vmfs_datastore.pg_data_datastore,
    vsphere_vmfs_datastore.pg_pg_datastore
  ]

}

Debug Output

https://gist.github.com/ianc769/b6fa08135736fa5db09ff49e5c85bd11

Panic Output

No response

Expected Behavior

Full deletion with no errors.

Actual Behavior

An error is presented if the operation hits the 30-second mark.

Steps to Reproduce

Add a VM with a few large disks. Then delete them via Terraform.

Environment Details

No response

Screenshots

(screenshot attached)

References

#417

ianc769 added the bug (Type: Bug) and needs-triage (Status: Issue Needs Triage) labels on Aug 19, 2024

Hello, ianc769! 🖐

Thank you for submitting an issue for this provider. The issue will now enter into the issue lifecycle.

If you want to contribute to this project, please review the contributing guidelines and information on submitting pull requests.

@ianc769 (Author) commented Aug 19, 2024

Workaround:

  • Remove the vsphere_virtual_machine resource from the configuration.
  • Run terraform apply.
  • Encounter the timeout above.
  • Run terraform state rm on the failed objects (they actually do get deleted from vSphere).
  • Run Terraform again to finish the deletion.

@tenthirtyam (Collaborator) commented
Have you tried setting the api_timeout for the provider?

provider "vsphere" {
  user                 = var.vsphere_user
  password             = var.vsphere_password
  vsphere_server       = var.vsphere_server
  allow_unverified_ssl = true
  api_timeout          = 10
}

This will override the default timeout.

func providerConfigure(d *schema.ResourceData) (interface{}, error) {
    timeoutMins := time.Duration(d.Get("api_timeout").(int))
    defaultAPITimeout = timeoutMins * time.Minute
    c, err := NewConfig(d)
    if err != nil {
        return nil, err
    }
    return c.Client()
}
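
For reference, a minimal sketch of how the api_timeout attribute and the VSPHERE_API_TIMEOUT environment variable (used later in this thread) could feed the same value, assuming the attribute is declared with the SDK's EnvDefaultFunc; the exact declaration in the provider may differ. The value is in minutes, so api_timeout = 10 or VSPHERE_API_TIMEOUT=10 becomes a 10-minute defaultAPITimeout via the conversion shown above.

// Sketch (not a verbatim copy of the provider): a provider-level attribute
// that reads its default from an environment variable, in minutes.
func providerSchemaSketch() map[string]*schema.Schema {
    return map[string]*schema.Schema{
        "api_timeout": {
            Type:        schema.TypeInt,
            Optional:    true,
            DefaultFunc: schema.EnvDefaultFunc("VSPHERE_API_TIMEOUT", 5), // docs list 5 minutes as the default
            Description: "API timeout in minutes.",
        },
    }
}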

Ryan Johnson
Distinguished Engineer, VMware by Broadcom

@ianc769 (Author) commented Aug 19, 2024

Hey @tenthirtyam, does this affect the resource itself? I see that the default is 5 minutes according to the docs: https://registry.terraform.io/providers/hashicorp/vsphere/latest/docs#api_timeout

It looks like fully deleting the datastore takes about 1 minute.
(screenshot attached)

I can try the build/destroy again with it doubled to 10 minutes, but I suspect it will not work.

@ianc769 (Author) commented Aug 29, 2024

I upped the timeout via the environment variable VSPHERE_API_TIMEOUT=10. Sadly, it looks like the provider still gives up before vSphere finishes.

(screenshot attached)

The datastores do actually delete.

(screenshot attached)

The current workaround is to run the destroy, clean up the state objects that are listed as failed, then run the apply again.

@hatakashi commented

Also noticing similar behaviour when deleting multiple datastores, even when no VM is being destroyed.

In our case, we are attempting to destroy 8 datastores and receive a timeout on any that are not deleted within 30 seconds or 1 minute and 5 seconds (we've seen both, though currently it is only 30 seconds). The workaround mentioned here is the same one we found ourselves; however, because our pipelines are triggered by users unfamiliar with Terraform, it doesn't work for us, since the timeout causes the pipeline to consider itself failed.

Behaviour is noticeable even on relatively small, unused datastores (50-100GB).

Changing the API Timeout seems to have no effect on this.

Terraform
1.8.5

Terraform Provider
2.8.3 & 2.9.2

VMware vSphere
8.0.2.00300

@hatakashi commented

Is there any news on this? The issue has been around since 2018 - even a workaround to manually set the timeout in the resource block would be greatly appreciated.
