IBM FlashSystem V9000 and VMware vSphere ESXi Guidelines

I’m really glad I had the opportunity to participate in a Redbooks residency on IBM FlashSystem V9000 and VMware “Best Practices” (it was actually one of the last things I did as an IBMer). The Redpaper is not publicly released yet, so I decided to write a sneak peek with some guidelines to help you get the best out of it even now.

For those who don’t know it, IBM FlashSystem V9000 is an all-flash array from IBM, and I can tell you it is most probably the best-performing array out there right now. To give you a better idea, V9000 is basically an IBM FlashSystem 900 packaged and sold together with two IBM SAN Volume Controller (SVC) nodes ;). And if that still hasn’t rung a bell: FlashSystem 900 is an absolute beast among all-flash arrays in terms of performance. However, it has one small issue: it is kinda “dumb”, because it offers very few advanced data services of its own.

That is why FlashSystem V9000 comes with additional out-of-the-box features such as:

  • Thin Provisioning
  • Data Migration
  • EasyTier
  • FlashCopy
  • Real-Time Compression (RTC)
  • Remote Mirroring

Additionally, you can buy these features for existing external storage that you virtualize behind the V9000 😉

Some of you may wonder where deduplication is. Unfortunately, it is not there. Nothing is perfect, so you will have to be satisfied with better performance compared to the vendors who do offer it.

IBM FlashSystem V9000 General design guidelines for performance

  • Use one MDisk group (storage pool) per flash storage enclosure.
  • For optimum performance, use four (redundant) paths to each LUN.
  • Define one host object per ESXi host on the storage. Use more than one only if you need to reduce the number of paths, for example when your server has more than two HBA ports.
  • To get the best Real-Time Compression performance, use at least 8 compressed volumes (LUNs) per V9000. Regardless of what sales people tell you, creating one big volume is not a good idea from a performance point of view (not to mention from a VMware point of view). There are 8 threads dedicated to RTC, and each volume can be handled by only one thread.
  • Use Round-Robin as the multipathing policy.
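The RTC guideline can be made concrete with a tiny shell sketch of the simplified model it describes: 8 RTC threads, with each compressed volume served by exactly one thread. This is only an illustration under that assumption, not IBM’s actual scheduling logic. The function names are hypothetical.

```shell
#!/bin/sh
# Simplified model (assumption taken from the guideline above, not IBM's
# real scheduler): 8 RTC threads, each compressed volume served by one thread.
RTC_THREADS=8

# Number of RTC threads that can be busy with the given number of volumes.
rtc_busy_threads() {
  volumes=$1
  if [ "$volumes" -lt "$RTC_THREADS" ]; then
    echo "$volumes"
  else
    echo "$RTC_THREADS"
  fi
}

# One-line summary of busy versus idle RTC threads.
rtc_report() {
  busy=$(rtc_busy_threads "$1")
  echo "$1 compressed volume(s): $busy of $RTC_THREADS RTC threads busy, $((RTC_THREADS - busy)) idle"
}

# Example: one big compressed datastore leaves 7 of 8 threads idle.
rtc_report 1
```

Under this model, anything below 8 volumes leaves compression threads idle, and anything above 8 only adds volumes that share threads.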

If you want to know more, I definitely recommend checking out our paper once it is published.

VMware specific

ESXi obviously comes with preconfigured defaults that work great most of the time in standard environments. I’m not a huge fan of changing defaults, but sometimes it is necessary if you want to get the best out of your storage.

Consistent LUN numbering

I believe that since ESXi 5.0 a LUN shared across the whole ESXi cluster no longer needs the same LUN number on every host. Even so, keeping the numbering consistent is still recommended. It is required if you are using RDMs with MSCS clustering.
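To verify consistency across a cluster, here is a minimal sketch. It assumes you have already collected “host device LUN” triples (one per line) into a text file, for example from “esxcli storage core path list” on each host. Both that input format and the function name are hypothetical.

```shell
#!/bin/sh
# Reads "host device lun" triples (hypothetical format, one per line) and
# reports every device whose LUN number differs between hosts.
lun_mismatches() {
  awk '
    NF < 3 { next }                      # skip blank or malformed lines
    !($2 in lun) { lun[$2] = $3; host[$2] = $1; next }
    lun[$2] != $3 && !($2 in seen) {     # report each device only once
      seen[$2] = 1
      printf "%s: LUN %s on %s but LUN %s on %s\n", $2, lun[$2], host[$2], $3, $1
    }'
}

# Example usage with a file collected from all hosts in the cluster:
#   lun_mismatches < cluster-luns.txt
```

The first host that reports a device becomes the reference, and every device is reported at most once.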

Round-Robin

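ESXi 5.5 and later select Round-Robin automatically for V9000 volumes. If you are, for whatever reason, using an ESXi version prior to 5.5, you will have to change the path selection policy manually.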
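On ESXi 5.0/5.1, the sketch below finds devices that are not on Round-Robin yet. It assumes the usual layout of “esxcli storage nmp device list” output: the device ID at column 0, with indented “Key: Value” lines below it. The function name is hypothetical. Check the result before changing anything.

```shell
#!/bin/sh
# Prints the ID of every device whose Path Selection Policy is not
# VMW_PSP_RR. Assumes the usual "esxcli storage nmp device list" layout:
# the device ID starts at column 0, its properties are indented below it.
non_rr_devices() {
  awk '
    /^[^ ]/ { dev = $1 }                     # start of a new device block
    /^ *Path Selection Policy:/ {            # not the "Device Config" line
      if ($NF != "VMW_PSP_RR") print dev
    }'
}

# Usage on an ESXi 5.0/5.1 host:
#   esxcli storage nmp device list | non_rr_devices
# Then switch a device (after review) with:
#   esxcli storage nmp device set --device naa.xxxx --psp VMW_PSP_RR
```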

Round-Robin path switching

By default, ESXi switches paths after every 1,000 I/Os. This generally works fine in big environments with lots of LUNs and VMs. However, for some workloads, especially when you are dealing with a single volume, you can drastically improve storage latency and throughput by decreasing this value.

http://kb.vmware.com/kb/2069356

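You can do this for all volumes presented from the V9000 with a SATP claim rule. You will have to reboot ESXi for the rule to apply to volumes that are already present. Note that this also changes the setting for other IBM SVC/Storwize-based systems, because they report the same vendor and model (IBM 2145):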

esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -V "IBM" -M "2145" -P "VMW_PSP_RR" -O "iops=1"

Or per LUN:

esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=1 --device naa.xxxx
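If you prefer the per-LUN approach for many volumes, the sketch below prints (but does not run) the command above for every matching device. It assumes “naa.60050768” is the default ID prefix for SVC/Storwize-based volumes, including V9000. Compare it with your own device IDs first. The function name is hypothetical.

```shell
#!/bin/sh
# Prints (does not execute) the per-LUN Round-Robin IOPS command for each
# device ID read on stdin that starts with the given prefix.
# ASSUMPTION: naa.60050768 is the ID prefix of SVC/Storwize-based volumes
# (including V9000); check it against your own device IDs.
rr_iops_cmds() {
  prefix=${1:-naa.60050768}
  iops=${2:-1}
  while read -r dev; do
    case "$dev" in
      "$prefix"*)
        echo "esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=$iops --device $dev"
        ;;
    esac
  done
}

# Review first, then pipe to sh to apply:
#   esxcli storage nmp device list | grep '^naa\.' | rr_iops_cmds
#   esxcli storage nmp device list | grep '^naa\.' | rr_iops_cmds | sh
```

Printing the commands instead of running them directly lets you review exactly which LUNs will be touched.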

Adapter queue depth

Change this only if you already know it is your bottleneck. Increasing the queue depth can increase throughput, but it can also have a negative impact on latency.

To check your queues and their utilization:

http://kb.vmware.com/kb/1027901

to change them:

http://kb.vmware.com/kb/1267

and to understand them:

https://blogs.vmware.com/vsphere/2012/07/troubleshooting-storage-performance-in-vsphere-part-5-storage-queues.html
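Why is queue depth a trade-off between throughput and latency? Little’s law says that outstanding I/Os = IOPS x latency. The shell sketch below uses awk for the floating-point math, and its helper names are hypothetical. It shows the IOPS ceiling that a given queue depth imposes, and the latency a full queue implies at a given IOPS rate.

```shell
#!/bin/sh
# Little's law: outstanding I/Os = IOPS x latency.

# Upper bound on IOPS when at most QD I/Os are outstanding and each takes
# LATENCY_MS milliseconds: QD * 1000 / LATENCY_MS.
max_iops() {
  awk -v qd="$1" -v lat="$2" 'BEGIN { printf "%.0f\n", qd * 1000 / lat }'
}

# Average latency (ms) implied when QD I/Os are always queued and the
# device completes IOPS I/Os per second: QD * 1000 / IOPS.
queue_latency_ms() {
  awk -v qd="$1" -v iops="$2" 'BEGIN { printf "%.2f\n", qd * 1000 / iops }'
}

# Example: doubling the queue depth at a device that is already saturated
# at 64000 IOPS only doubles the latency (1.00 ms -> 2.00 ms).
queue_latency_ms 64 64000
queue_latency_ms 128 64000
```

A deeper queue helps only while the array can still complete more I/Os in parallel. Once it is saturated, the extra depth just turns into waiting time.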

vStorage APIs for Array Integration (VAAI)

VAAI is enabled by default, but if it has been disabled in your environment, make sure you enable it again.

Atomic Test & Set (ATS) locking in particular is a must, but hardware-accelerated init (zeroing) and copy will not hurt you either.

http://kb.vmware.com/kb/1021976
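To check which primitives each device actually reports, the sketch below filters the output of “esxcli storage core device vaai status get”. It assumes the usual layout: the device ID at column 0, with indented “ATS/Clone/Zero/Delete Status” lines below it. Delete (UNMAP) is skipped on purpose, because V9000 does not support it anyway (see Dead Space Reclamation). The function name is hypothetical.

```shell
#!/bin/sh
# Prints "device: primitive status" for every ATS, Clone or Zero primitive
# that a device does not report as supported. Delete (UNMAP) is ignored on
# purpose. Assumes the usual "esxcli storage core device vaai status get"
# layout: device ID at column 0, indented "<Primitive> Status: <value>" lines.
vaai_gaps() {
  awk '
    /^[^ ]/ { dev = $1 }
    /^ *(ATS|Clone|Zero) Status:/ {
      if ($NF != "supported") print dev ": " $1 " " $NF
    }'
}

# Usage:
#   esxcli storage core device vaai status get | vaai_gaps
# Host-wide switches (1 = enabled) can be checked with, for example:
#   esxcli system settings advanced list -o /VMFS3/HardwareAcceleratedLocking
#   esxcli system settings advanced list -o /DataMover/HardwareAcceleratedMove
#   esxcli system settings advanced list -o /DataMover/HardwareAcceleratedInit
```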

Important: do not forget to disable the ATS heartbeat feature if you are running vSphere 5.5 Update 2 or later.

http://www.thevirtualist.org/alert-application-outages-using-vaai-ats-on-vsphere-5-5-update2-vsphere-6-0/
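Here is a sketch to check the current ATS heartbeat setting on a host. The VMware advanced option behind this feature is /VMFS3/UseATSForHBOnVMFS5. The parsing assumes the usual “Int Value:” line in “esxcli system settings advanced list” output, and the helper name is hypothetical.

```shell
#!/bin/sh
# Reports whether ATS heartbeating is on, based on the "Int Value:" line of
# "esxcli system settings advanced list -o /VMFS3/UseATSForHBOnVMFS5"
# (1 = ATS heartbeat enabled, 0 = disabled). "Default Int Value:" is ignored.
ats_hb_state() {
  awk '/^ *Int Value:/ { if ($NF == 1) print "enabled"; else print "disabled" }'
}

# Usage:
#   esxcli system settings advanced list -o /VMFS3/UseATSForHBOnVMFS5 | ats_hb_state
# To disable it (read the article linked above first):
#   esxcli system settings advanced set -i 0 -o /VMFS3/UseATSForHBOnVMFS5
```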

HyperSwap

Make sure that all VMs on a given HyperSwap volume run on hosts at one site only. To do this, create and maintain DRS “should run” rules for your VMs based on their datastores. HyperSwap has an active-active architecture that dynamically switches the preferred site based on where I/Os are issued from. You will therefore suffer performance issues when I/Os are issued to a single volume from both sites at the same time.

Dead Space Reclamation

Unfortunately, FlashSystem V9000 does not support SCSI UNMAP for dead space reclamation when using thin provisioning. However, there are still pretty easy ways to reclaim the space.

As always, the first step is to zero out all the dead space you want to reclaim.

  • To do this from within a Windows guest operating system, you can use a Microsoft tool called “sdelete”.
  • To do this on a VMware datastore, you can simply create and then delete a new eager-zeroed thick virtual disk (VMDK) the size of the free space you want to reclaim. You can do this from the GUI by creating a new virtual machine or adding a new disk to an existing one. Alternatively, you can use vmkfstools from the console.
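For the vmkfstools route, the sketch below prints the create and delete commands for a given datastore. Take the free space in MB from the datastore summary. The file name zerofill.vmdk and the default 1 GB safety reserve are hypothetical choices. The reserve keeps the datastore from filling up completely.

```shell
#!/bin/sh
# Prints the vmkfstools commands that zero out free space on a datastore by
# creating and then deleting an eager-zeroed thick VMDK.
# Arguments: datastore path, free space in MB, optional reserve in MB.
# The file name "zerofill.vmdk" and the 1024 MB default reserve are hypothetical.
zero_fill_cmds() {
  ds=$1
  free_mb=$2
  reserve_mb=${3:-1024}
  size_mb=$((free_mb - reserve_mb))
  if [ "$size_mb" -le 0 ]; then
    echo "not enough free space on $ds" >&2
    return 1
  fi
  disk="$ds/zerofill.vmdk"
  echo "vmkfstools -c ${size_mb}M -d eagerzeroedthick $disk"
  echo "vmkfstools -U $disk"
}

# Example: ~100 GB free on DS01, keep 1 GB in reserve.
zero_fill_cmds /vmfs/volumes/DS01 102400
```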

If you are using Real-Time Compression on your volumes, your work is done, because RTC reclaims the zeroed space automatically!

For volumes that are only thin-provisioned (not compressed), you have to create a thin-provisioned mirror copy of the volume on the FlashSystem V9000. After synchronization finishes, delete the original (source) copy.
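On the V9000 side, the mirror approach roughly follows the commands printed by the sketch below (SVC-style CLI over SSH). Several parts are illustrative assumptions: a 2% real size with autoexpand, the original copy being copy 0, and the names ESX_DS01 and Pool0. Confirm the copy IDs with lsvdiskcopy, and wait until lsvdisksyncprogress reports 100 before removing anything.

```shell
#!/bin/sh
# Prints the V9000 (SVC-style) CLI steps for reclaiming zeroed space on a
# thin-provisioned volume by mirroring it to a new thin-provisioned copy.
# ASSUMPTIONS: the original copy has ID 0 (check with lsvdiskcopy) and a
# 2% real size with autoexpand suits the new copy.
thin_mirror_cmds() {
  vdisk=$1
  pool=$2
  echo "svctask addvdiskcopy -mdiskgrp $pool -rsize 2% -autoexpand $vdisk"
  echo "svcinfo lsvdisksyncprogress $vdisk"
  echo "svctask rmvdiskcopy -copy 0 $vdisk"
}

# Example with hypothetical names (run the printed lines one by one; remove
# copy 0 only after lsvdisksyncprogress reports 100 for the new copy):
thin_mirror_cmds ESX_DS01 Pool0
```

The reclaim works because the new thin-provisioned copy does not need to allocate the zeroed grains during synchronization.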

That’s all for now. I hope it was helpful, and don’t forget to share 😉


Update: added the ATS heartbeat note to the VAAI section. Thanks to Pavol for pointing that out.

Dusan has over 6 years of experience in the virtualization field. He currently works as a Senior VMware Platform Architect at one of the biggest retail banks in Slovakia. He has a background in closely related technologies, including server operating systems, networking and storage. He used to be a member of the VMware Center of Excellence at IBM and is a co-author of several Redpapers. His main scope of work is designing and optimizing the performance of business-critical virtualized solutions on vSphere, including, but not limited to, Oracle WebLogic, MSSQL and others. He holds several industry-leading IT certifications, such as VCAP-DCD, VCAP-DCA and MCITP. He was honored with the #vExpert2015 and 2016 awards by VMware for his contribution to the community. Opinions are my own!

About Dusan Tekeljak


3 Comments

  1. Hi Dusan,

    Looking forward to the V9000 + ESX Redbook. Although I’m not that interested in VMware stuff, it will surely be interesting. When I think of the missing deduplication, I keep a smile on my face, as I do not see many reasons for having it; RtC should be enough. The first competitor of RtC (as an online compression engine) seems to be the new XtremIO, but the compression ratios achieved there are not amazing at the moment. RtC is not perfect either and obviously has some issues with rotational disks, but when used with FlashSystem 840 or the new 900 it is brilliant.
    I’m interested in what the real use case for V9000 and VMware could be. No doubt V9000 is one of the fastest arrays nowadays, but is VMware able to utilize it efficiently? ESX obviously does not like big VMs with many vCPUs, due to its quite unusual hypervisor scheduling policy. ESX does not have true NPIV; from the guest OS perspective it is still virtual SCSI, with quite a big latency overhead.

    Wouldn’t V9000 be overkill for most ESX implementations in the world?

    +1 for the dead space reclamation, round-robin path switching and the note that RtC needs at least 8 volumes.

    Two little additional notes. Shouldn’t we still disable AtsHeartBeat even for V9000? There is no news about this in SVC 7.5 (and I still think AtsHeartBeat is badly designed by VMware).
    The second concerns RDM: I believe ESX 5.5 RDM does not require consistent LUN numbering across the ESX cluster. However, keeping it tidy is no doubt still best practice.

  2. Pingback: Interesting info on the IBM FlashSystem V9000 and VMware by Dusan Tekeljak | Finnzi!
