
Reclaim free space broken in 7.0


Florin Baca

Question

We have 3 XenServers in our infrastructure: the oldest one running a fresh install of 6.5, the newest running a fresh install of 7.0, and one which I upgraded yesterday from 6.5 to 7.0. There were no upgrade issues, aside from the guest tools not installing on the Windows guests. Storage is local on all servers.

 

While cleaning up the upgraded server, I tried to 'reclaim freed space', which failed with:

 

blkdiscard: /dev/VG_XenStorage-54fcd1f8-ef26-d32f-1a18-db9e16169231/54fcd1f8-ef26-d32f-1a18-db9e16169231_trim_lv: BLKDISCARD ioctl failed: Operation not supported

 

I looked in SMlog and I can see that blkdiscard runs after the lvcreate command:

 

Sep  5 07:20:57 xen2 SM: [11766] do_trim: {'sr_uuid': '54fcd1f8-ef26-d32f-1a18-db9e16169231'}
Sep  5 07:20:57 xen2 SM: [11766] lock: opening lock file /var/lock/sm/54fcd1f8-ef26-d32f-1a18-db9e16169231/sr
Sep  5 07:20:57 xen2 SM: [11766] lock: tried lock /var/lock/sm/54fcd1f8-ef26-d32f-1a18-db9e16169231/sr, acquired: True (exists: True)
Sep  5 07:20:57 xen2 SM: [11766] ['/sbin/lvs', '--noheadings', '/dev/VG_XenStorage-54fcd1f8-ef26-d32f-1a18-db9e16169231/54fcd1f8-ef26-d32f-1a18-db9e16169231_trim_lv']
Sep  5 07:20:57 xen2 SM: [11766] FAILED in util.pread: (rc 5) stdout: '', stderr: '  Failed to find logical volume "VG_XenStorage-54fcd1f8-ef26-d32f-1a18-db9e16169231/54fcd1f8-ef26-d32f-1a18-db9e16169231_trim_lv"
Sep  5 07:20:57 xen2 SM: [11766] '
Sep  5 07:20:57 xen2 SM: [11766] Ignoring exception for LV check: /dev/VG_XenStorage-54fcd1f8-ef26-d32f-1a18-db9e16169231/54fcd1f8-ef26-d32f-1a18-db9e16169231_trim_lv !
Sep  5 07:20:57 xen2 SM: [11766] ['/sbin/lvcreate', '-n', '54fcd1f8-ef26-d32f-1a18-db9e16169231_trim_lv', '-l', '100%F', 'VG_XenStorage-54fcd1f8-ef26-d32f-1a18-db9e16169231']
Sep  5 07:20:58 xen2 SM: [11766]   pread SUCCESS
Sep  5 07:20:58 xen2 SM: [11766] ['/usr/sbin/blkdiscard', '-v', '/dev/VG_XenStorage-54fcd1f8-ef26-d32f-1a18-db9e16169231/54fcd1f8-ef26-d32f-1a18-db9e16169231_trim_lv']
Sep  5 07:20:58 xen2 SM: [11766] FAILED in util.pread: (rc 1) stdout: '', stderr: 'blkdiscard: /dev/VG_XenStorage-54fcd1f8-ef26-d32f-1a18-db9e16169231/54fcd1f8-ef26-d32f-1a18-db9e16169231_trim_lv: BLKDISCARD ioctl failed: Operation not supported
Sep  5 07:20:58 xen2 SM: [11766] '
Sep  5 07:20:58 xen2 SM: [11766] ['/sbin/lvs', '--noheadings', '/dev/VG_XenStorage-54fcd1f8-ef26-d32f-1a18-db9e16169231/54fcd1f8-ef26-d32f-1a18-db9e16169231_trim_lv']
Sep  5 07:20:58 xen2 SM: [11766]   pread SUCCESS
Sep  5 07:20:58 xen2 SM: [11766] ['/sbin/lvremove', '-f', '/dev/VG_XenStorage-54fcd1f8-ef26-d32f-1a18-db9e16169231/54fcd1f8-ef26-d32f-1a18-db9e16169231_trim_lv']
Sep  5 07:20:58 xen2 SM: [11766]   pread SUCCESS
Sep  5 07:20:58 xen2 SM: [11766] ['/sbin/dmsetup', 'status', 'VG_XenStorage--54fcd1f8--ef26--d32f--1a18--db9e16169231-54fcd1f8--ef26--d32f--1a18--db9e16169231_trim_lv']

 

For testing, I ran the same operation on the freshly installed XenServer 7.0 and I am seeing the exact same error, so it's not caused by the upgrade. I don't know if this can be ignored (since it shows up in XenCenter as well), but the virtual allocation reported by XenCenter is larger than the actual size of the disk.
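
A quick way to check whether the underlying block device even advertises TRIM/discard support is lsblk's discard columns (a rough sketch; I'm assuming the local SR sits on /dev/sda here, so adjust the device name for your host). If the columns show 0, the disk or RAID controller simply rejects BLKDISCARD, which would match the ioctl error above.

lsblk -D /dev/sda
# DISC-GRAN / DISC-MAX of 0B means the device does not support discard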

 

For reference, I also reclaimed the free space on the old 6.5 server. The operation succeeds and there is no blkdiscard command in its SMlog.

 

Has anyone else seen these issues with 7.0?

 

Thanks,

Florin


Recommended Posts

On 25/01/2018 at 7:12 AM, Mark Syms said:

No, the recovery of space giving no error is not the same thing at all. blkdiscard has absolutely no meaning for a local SAS drive. The "reclaim space" operation in XenServer is there to allow thinly provisioned remote network storage to deprovision no-longer-used space back to the SAN free pool; it is not going to do anything for you on a local SAS drive and arguably should not be an option in XenCenter in this case.

 

My definitive test:

 

I installed XenServer 7.4 on a Dell R730 server with RAID 6 SAS disks.
I created a virtual machine with a 500 GB disk and took a snapshot, then removed the snapshot, and the space was not recovered. I got an error while performing the space recovery procedure.

 

I formatted, installed XenServer 6.5, recreated a virtual machine with a 500 GB disk, created the snapshot, removed the snapshot... and when I clicked the button to recover the space, the space was recovered. Whether XenServer 6.5 simply does not display the TRIM error because it has no code to report it does not matter to me; what matters is that it actually recovers the space of the removed snapshot, while with version 7.x I get the error and no space is recovered.

On 11/3/2018 at 5:46 PM, Brian Rummel said:

I too am having this issue with 7.1 CU1. I only upgraded to 7.1 because I had to, as 6.5 is no longer supported.

I want to add that, in my situation, I have an EMC storage array that I am connecting to via iSCSI. Before upgrading to 7.1, when I would migrate or move VMs, the coalesce process would eventually reclaim the space left over from the migration. Now it does not. I have my storage array set up with multiple LUNs and attach them to my pool as separate SRs. So even though they show as different SRs, they are on the same array using the same connection, hardware, etc. I have tried rescanning the SR to no avail. I have also tried Reclaim freed space and get the BLKDISCARD error on some of the SRs, but I can also run it successfully on other SRs, keeping in mind they are the same physical hardware and connection. So the issue cannot be related to some sort of hardware incompatibility, or it would never work on any of the SRs. I also do not use thin provisioning on my SRs. So far, no matter what I have tried, I cannot reclaim the lost space. I am working on migrating VMs off the SR, but because of the size of some of my VDIs, it is not an easy process. Is there any word on a long-term solution for this?

You might want to try this; in my case it freed up a ton of space. I still plan on retiring XenServer, hopefully by year's end. It's too unreliable, there are too many known issues, and the removal of features in new versions is not sustainable. Proceed at your own risk.

 

Remove Orphaned VDIs (Virtual Disk Image)

 

If a live migration fails, XenServer will often leave behind the VDI image of the VM, and it will not show up under Storage in XenCenter. First, list the VDIs with 'xe vdi-list':

 

xe vdi-list

 

...
uuid ( RO)                : e629cbe0-32c9-473e-80cf-9a6426d1111e  <<<This is the UUID of the orphaned VM.
          name-label ( RW): AA-REMOTE-01 0
    name-description ( RW): Created by template provisioner
             sr-uuid ( RO): 5e3bb883-fe1b-e4d0-401c-4006fe1efeb2  <<<NOT THIS!! THIS IS WHERE THE VM IS STORED.
        virtual-size ( RO): 75161927680
            sharable ( RO): false
           read-only ( RO): false
           
           
uuid ( RO)                : 19604cdd-4add-4dce-a671-02bd04ac48c2
          name-label ( RW): BB-REMOTE-01
    name-description ( RW): Created by XenCenter Disk Image Import <<<THIS IS ANOTHER GOOD INDICATOR OF A FAILED XEN EXPORT TAKING UP SPACE.
             sr-uuid ( RO): 5e3bb883-fe1b-e4d0-401c-4006fe1efeb2
        virtual-size ( RO): 84351647744
            sharable ( RO): false
           read-only ( RO): false
...

 

This will list a lot of VDIs, and in this case I see the disk 'AA-REMOTE-01 0' listed on the system, but I know I am not actively running that VM on this server. You can also run 'lvscan' to list all logical volumes and show whether each one is ACTIVE or inactive:

 

lvscan

...
inactive          '/dev/VG_XenStorage-5e3bb883-fe1b-e4d0-401c-4006fe1efeb2/VHD-e629cbe0-32c9-473e-80cf-9a6426d1111e' [70.14 GiB] inherit
...

 

It will show a longer list, but I can see the UUID matches the orphaned VDI I want to remove, and it is taking up 70 GB. Next we delete (destroy) the VDI with 'xe vdi-destroy uuid=UUIDofSystemToDelete':

 

xe vdi-destroy uuid=e629cbe0-32c9-473e-80cf-9a6426d1111e

 

You can run another 'xe vdi-list' to see that the orphaned VDI is gone and also check the storage repository to see if space has been freed.
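
If you have many VDIs to go through, a rough way to shortlist orphan candidates is to find VDIs that have no VBD attached to any VM (a minimal sketch, not an official tool; snapshot and base-copy VDIs will also appear in the list, so double-check every UUID before destroying anything):

for vdi in $(xe vdi-list managed=true params=uuid --minimal | tr ',' ' '); do
  # a VDI with no VBD is not plugged into any VM and may be orphaned
  if [ -z "$(xe vbd-list vdi-uuid=$vdi --minimal)" ]; then
    echo "candidate orphan VDI: $vdi"
    xe vdi-list uuid=$vdi params=name-label,virtual-size
  fi
done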

 

 

4 hours ago, Chandrika Srinivasan said:

Hi Jason and Daniel, 

 

As Mark has pointed out above, snapshots not getting coalesced and trim not working are two different issues. For "space reclamation" to work, the VDI should've been deleted from XenServer (shouldn't be visible with vdi-list) and the underlying storage should be thinly-provisioned.

 

 

Call it what you will, there are multiple major, known, repeatable storage issues with XenServer 7.x that have been going on for years, affect many people, and are not being addressed. Using my method above, I went from 98% storage used to 28% by manually removing everything that XenServer didn't remove but should have automatically. Citrix fixing the issue would be more productive than Citrix arguing semantics with the users experiencing it.


Hi Jason and Daniel, 

 

As Mark has pointed out above, snapshots not getting coalesced and trim not working are two different issues. For "space reclamation" to work, the VDI should've been deleted from XenServer (shouldn't be visible with vdi-list) and the underlying storage should be thinly-provisioned. 

 

I am not saying that snapshots not getting coalesced on your system is not an issue, but space reclamation will not fix it and would never have fixed it (even in 6.5). One change after 6.5 is that snapshot coalescing is slightly delayed and doesn't happen immediately after snapshot deletion. If you notice that your snapshots are still not getting coalesced, let's open a bug for that and we will definitely look into it.
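
To check whether the delayed coalesce has actually run, you can rescan the SR (which should kick off the garbage collector) and then watch SMlog for coalesce activity; roughly (the SR UUID here is a placeholder):

xe sr-scan uuid=<sr-uuid>
grep -i coalesc /var/log/SMlog | tail -20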

 

Thanks,

Chandrika

 

4 minutes ago, Daniel Andrade said:

 

My definitive test:

 

I installed XenServer 7.4 on a Dell R730 server with RAID 6 SAS disks.
I created a virtual machine with a 500 GB disk and took a snapshot, then removed the snapshot, and the space was not recovered. I got an error while performing the space recovery procedure.

 

I formatted, installed XenServer 6.5, recreated a virtual machine with a 500 GB disk, created the snapshot, removed the snapshot... and when I clicked the button to recover the space, the space was recovered. Whether XenServer 6.5 simply does not display the TRIM error because it has no code to report it does not matter to me; what matters is that it actually recovers the space of the removed snapshot, while with version 7.x I get the error and no space is recovered.

Yup. There are known serious errors with XenServer 7.x which remain unfixed even after literally years of reports. I submitted a bug on this; they said it wasn't a bug and closed it. Yeah, it is a repeatable bug, as you have proven. See my manual workaround above to remove abandoned snapshots and failed VM migrations. In a few months I should be off XenServer.

On 07/11/2018 at 2:32 PM, Mark Syms said:

This entire thread is getting out of control and isn't helping anyone.

 

The fact that "Reclaim space" reports an unsupported ioctl error has no bearing on whether or not the space associated with deleted snapshots will be freed and that just muddies the waters. We need to make this function more accurately report things and not produce errors on hardware that doesn't support the requested function.

 

In the specific case of XSO-824, this had nothing to do with unfreed deleted snapshots or reclaim space and was entirely down to failed migrations leaving orphaned disks behind. This is very definitely an issue and one that is complicated to fix, especially if the migration failed because of a problem communicating with the storage. If that happens, any cleanup can't occur, as the storage has just disappeared.

 

Daniel - if you want to email me the logs from your XS 7.x server that you think isn't freeing up the space, please DM me and I'll give you my email address.

 

Thanks,

 

Mark

Hypervisor Storage Engineering

 

I sent a DM.

 

Thank you


I have moved all my XenServer VMs to VMware vSAN. Migrating from XenServer was a pain due to the non-standard 'open' format of XenCenter exports, and mainly due to the problematic XenTools, which embed themselves in Windows, don't uninstall cleanly, and conflict with other VM environments, but it can be done (reply if you want a how-to). Things just work in VMware and are much more reliable. I have retired all my XenServers. This is the solution.

1 hour ago, Jason Rasmussen said:

I have moved all my XenServer VMs to VMware vSAN. Migrating from XenServer was a pain due to the non-standard 'open' format of XenCenter exports, and mainly due to the problematic XenTools, which embed themselves in Windows, don't uninstall cleanly, and conflict with other VM environments, but it can be done (reply if you want a how-to). Things just work in VMware and are much more reliable. I have retired all my XenServers. This is the solution.

 

What is the licensing cost of the solution you moved to?

I use Dell R720 and R730 servers with 256 GB of RAM, 20 TB of SAS HDD, and two Intel Xeon processors, for a total of 20 physical cores (40 with HT).

Do you know of a free solution that could replace XenServer? If not, can you tell me roughly how much I would spend with VMware?


This entire thread is getting out of control and isn't helping anyone.

 

The fact that "Reclaim space" reports an unsupported ioctl error has no bearing on whether or not the space associated with deleted snapshots will be freed and that just muddies the waters. We need to make this function more accurately report things and not produce errors on hardware that doesn't support the requested function.

 

In the specific case of XSO-824, this had nothing to do with unfreed deleted snapshots or reclaim space and was entirely down to failed migrations leaving orphaned disks behind. This is very definitely an issue and one that is complicated to fix, especially if the migration failed because of a problem communicating with the storage. If that happens, any cleanup can't occur, as the storage has just disappeared.

 

Daniel - if you want to email me the logs from your XS 7.x server that you think isn't freeing up the space, please DM me and I'll give you my email address.

 

Thanks,

 

Mark

Hypervisor Storage Engineering

On 04/11/2018 at 0:46 AM, Brian Rummel said:

I too am having this issue with 7.1 CU1. I only upgraded to 7.1 because I had to, as 6.5 is no longer supported.

I want to add that, in my situation, I have an EMC storage array that I am connecting to via iSCSI. Before upgrading to 7.1, when I would migrate or move VMs, the coalesce process would eventually reclaim the space left over from the migration. Now it does not. I have my storage array set up with multiple LUNs and attach them to my pool as separate SRs. So even though they show as different SRs, they are on the same array using the same connection, hardware, etc. I have tried rescanning the SR to no avail. I have also tried Reclaim freed space and get the BLKDISCARD error on some of the SRs, but I can also run it successfully on other SRs, keeping in mind they are the same physical hardware and connection. So the issue cannot be related to some sort of hardware incompatibility, or it would never work on any of the SRs. I also do not use thin provisioning on my SRs. So far, no matter what I have tried, I cannot reclaim the lost space. I am working on migrating VMs off the SR, but because of the size of some of my VDIs, it is not an easy process. Is there any word on a long-term solution for this?

You are confusing two different things.

 

The reclaim space button on the LVM-based SRs sends a BLKDISCARD request to the SAN for all the storage that XenServer has previously used but is no longer using (i.e. the deleted space), which allows a thinly provisioned SAN to release that allocation and use it for a different client. If you are getting an error issuing this request, it means that either your LUN is fully provisioned or your SAN does not support BLKDISCARD. XenServer 6.5 would silently swallow the error in these situations, whereas XenServer 7.x will report it.
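
For reference, the failing sequence in the SMlog at the top of this thread boils down to the following (a rough sketch of what the trim plugin does, with the VG name as a placeholder; it temporarily allocates all free extents, so don't run it while the SR needs that space):

VG=VG_XenStorage-<sr-uuid>
lvcreate -n test_trim_lv -l 100%FREE $VG    # temporary LV covering all free extents
blkdiscard -v /dev/$VG/test_trim_lv         # ask the underlying device to discard them
lvremove -f /dev/$VG/test_trim_lv           # return the extents to the VG

If the device underneath (local RAID volume, fully provisioned LUN, etc.) doesn't support discard, the blkdiscard step fails with exactly the 'Operation not supported' error reported above.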

 

The freeing of space when VMs and snapshots are deleted is automatic and unrelated to the reclaim space button. There are a few reasons why it might not occur, however. If you have deleted snapshots and the space has not been freed, it may be that you have insufficient space to allow the data to be collapsed. The system temporarily needs more space, as data has to be copied from one part of the delta tree to its parent until the parent becomes a complete superset of the child node, and only then can the child be deleted. As an example, the following diagram shows how the data is stored for a VM with three snapshots:

      A
     / \
    B   S1
   / \
  C   S2
 / \
D   S3

 

D is the current node into which the VM is writing; A is the original base node that was created when the VM was first created. When snapshot 1 was taken, A was made read-only and nodes B and S1 (snapshot 1) were created sharing a common parent; the VM then wrote to B, and the process was repeated as snapshots 2 and 3 were created, until we get to D. If we now delete S2, the separation between nodes B and C is no longer required, as both B and C are hidden read-only nodes and cannot be copied or cloned. But C might very well contain some of the same data blocks as B, which means they are unnecessarily being stored in two places, so to resolve this all the data in C is copied to B, which might or might not overwrite data already in B. Any data that doesn't overwrite existing data in B means that, in the short term, more space is required, as those data blocks need to be allocated in B and then written to. Once B contains all the data in C, D and S3 are updated to have B as their parent, and C is deleted.

 

If you are really tight on space, then it is possible, even likely, that the garbage collection process will be unable to resolve this. You will see errors in /var/log/SMlog to this effect, where it will flag candidate nodes and report that there is insufficient free space.
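
A quick way to check both things (a minimal sketch; the VG name is a placeholder):

vgs VG_XenStorage-<sr-uuid>                                  # the VFree column shows how much headroom the GC has
grep -iE 'coalesc|insufficient' /var/log/SMlog | tail -20    # recent coalesce decisions and errors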


I too am having this issue with 7.1 CU1. I only upgraded to 7.1 because I had to, as 6.5 is no longer supported.

I want to add that, in my situation, I have an EMC storage array that I am connecting to via iSCSI. Before upgrading to 7.1, when I would migrate or move VMs, the coalesce process would eventually reclaim the space left over from the migration. Now it does not. I have my storage array set up with multiple LUNs and attach them to my pool as separate SRs. So even though they show as different SRs, they are on the same array using the same connection, hardware, etc. I have tried rescanning the SR to no avail. I have also tried Reclaim freed space and get the BLKDISCARD error on some of the SRs, but I can also run it successfully on other SRs, keeping in mind they are the same physical hardware and connection. So the issue cannot be related to some sort of hardware incompatibility, or it would never work on any of the SRs. I also do not use thin provisioning on my SRs. So far, no matter what I have tried, I cannot reclaim the lost space. I am working on migrating VMs off the SR, but because of the size of some of my VDIs, it is not an easy process. Is there any word on a long-term solution for this?

14 hours ago, Daniel Andrade said:

 

XenServer 6.5 does not display an error because it successfully reclaims space. In XenServer 7.X the error occurs and the space is not recovered.

I bought a new HP server and saw the error in version 7.2; I formatted it, installed version 6.5, and the recovery of space works. The server is using a local repository on SAS HDDs.

No, the recovery of space giving no error is not the same thing at all. blkdiscard has absolutely no meaning for a local SAS drive. The "reclaim space" operation in XenServer is there to allow thinly provisioned remote network storage to deprovision no-longer-used space back to the SAN free pool; it is not going to do anything for you on a local SAS drive and arguably should not be an option in XenCenter in this case.

On 11/01/2018 at 9:41 AM, Mark Syms said:

The difference between XenServer 6.5 and XenServer 7.0 is that 6.5 would not report it if it got errors from running TRIM on the storage, whereas 7.0 does. This is quite clearly the remote storage reporting that it does not support blkdiscard.

 

XenServer 6.5 does not display an error because it successfully reclaims space. In XenServer 7.X the error occurs and the space is not recovered.

I bought a new HP server and saw the error in version 7.2; I formatted it, installed version 6.5, and the recovery of space works. The server is using a local repository on SAS HDDs.


Hi Tobias,

 

I have performed an 'xe sr-scan uuid=5e3bb883-fe1b-e4d0-401c-4006fe1efeb2'.

So far no change, with local storage at 91%, and clicking Reclaim freed space still fails with the same error:

Reclaiming freed space on SR 'Local storage'
blkdiscard: /dev/VG_XenStorage-5e3bb883-fe1b-e4d0-401c-4006fe1efeb2/5e3bb883-fe1b-e4d0-401c-4006fe1efeb2_trim_lv: BLKDISCARD ioctl failed: Operation not supported
(myserver-06.domain.com, Jan 9, 2018 3:50 PM)

 

I can report back tomorrow.

1 hour ago, Jason Rasmussen said:

Hello,

 

The submitted bug remains unresolved. I cannot migrate any VMs to/from this server, and I am concerned that upgrading my functional 6.5 to 7.2 will break 'Reclaim freed space' on the SSD.

 

Thank you

 

I have 2 more servers upgraded to version 7.3 and I still have the space reclamation problem.

One is a Dell R720 with an H710 controller and 6 SAS 4 TB drives in RAID 6, and the other is an HP ProLiant DL380p Gen8 with a P420i controller and 8 SATA 4 TB disks. Both were updated to version 7.3 and exhibit the same error.

Other Dell R720 servers with SAS disks running version 6.5 do not display the error message and can reclaim the space.


Archived

This topic is now archived and is closed to further replies.

