
    Storage vMotion and RDM

    August 16th, 2011

    I just conducted a Storage vMotion of a virtual machine that had multiple very small RDMs.  To my surprise, each RDM became a thick-provisioned vmdk.  VMware KB 1005241 discusses this situation in particular; however, I am left scratching my head because I do believe the RDMs were configured for physical mode, which goes against the KB article.  VMware Communities has a nice little discussion as well.


    VAAI and supportability

    June 28th, 2011

    VAAI, or vStorage APIs for Array Integration, is an industry answer for offloading specific storage requests from the ESX(i) servers to the storage array.

    VAAI implements only three functions, as described in VMware KB1021976:

    • Atomic Test & Set
    • Clone Blocks/Full Copy/XCOPY
    • Zero Blocks/Write Same

    As it turns out, only certain arrays support VAAI.  Since VAAI is on by default, at least in 4.1, the hosts will send VAAI commands to the array.  The array will usually respond that the commands are not supported, and ESX will then fall back to regular SCSI commands.  However, there seem to be occasions when SAN devices that do not support VAAI can apparently cause failure states.  I’ve seen it.  It’s true.

    Now, how do we determine if the SAN handles VAAI?  SSH to your servers and run the following command, via KB102197:

    esxcfg-scsidevs -l | egrep "Display Name:|VAAI Status:"

    If the array does not support VAAI, you will most likely see results such as:

    Display Name: <Vendor> Fibre Channel Disk (naa.600*)
    VAAI Status: unknown
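
    If you have a lot of LUNs, eyeballing that output gets old quickly.  Here is a minimal sketch of a filter that prints only the devices whose status is something other than "supported".  The function name is my own invention, and it assumes awk is available wherever you run it and that the output keeps the two-line "Display Name:" / "VAAI Status:" shape shown above.

```shell
# Print "<display name> -> <status>" for every device whose VAAI status
# is not "supported". Reads "Display Name:" / "VAAI Status:" lines on stdin.
# vaai_not_supported is a hypothetical helper name, not a VMware tool.
vaai_not_supported() {
    awk -F': ' '
        /Display Name:/ { name = $2 }
        /VAAI Status:/  {
            status = $2
            sub(/[ \t\r]+$/, "", status)   # trim trailing whitespace
            if (status != "supported") print name " -> " status
        }
    '
}

# On a host (not run here):
#   esxcfg-scsidevs -l | egrep "Display Name:|VAAI Status:" | vaai_not_supported
```

    Anything it prints is a candidate for a closer look before you trust VAAI on that host.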

    VMware KB1033665 tells us how to disable VAAI from both the ESX(i) command line and the vSphere Remote CLI.

    ESX(i) command line:

    esxcfg-advcfg -s 0 /DataMover/HardwareAcceleratedMove

    esxcfg-advcfg -s 0 /DataMover/HardwareAcceleratedInit

    esxcfg-advcfg -s 0 /VMFS3/HardwareAcceleratedLocking

    -s 0 will disable, -s 1 will enable. You can also substitute -s 0 with -g to see the current setting.
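
    Since it is the same command three times over, a small helper saves some typing.  This is only a sketch: the function and variable names are made up for illustration, and it prints the commands instead of running them so you can review them first.

```shell
# The three VAAI advanced options used in the commands above.
VAAI_OPTIONS="/DataMover/HardwareAcceleratedMove
/DataMover/HardwareAcceleratedInit
/VMFS3/HardwareAcceleratedLocking"

# Print (do not run) one esxcfg-advcfg command per option.
# $1 is "get" (show current value), "0" (disable) or "1" (enable).
# vaai_commands is a hypothetical helper name.
vaai_commands() {
    case "$1" in
        get) flag="-g" ;;
        0|1) flag="-s $1" ;;
        *)   echo "usage: vaai_commands get|0|1" >&2; return 1 ;;
    esac
    for opt in $VAAI_OPTIONS; do
        echo "esxcfg-advcfg $flag $opt"
    done
}
```

    Run vaai_commands get to see what would be executed, and pipe the output to sh on the host once you are happy with it.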

    Remote CLI:

    vicfg-advcfg.pl --server <servername> --username <possibly root> --password <password> -s 0 /DataMover/HardwareAcceleratedMove

    vicfg-advcfg.pl --server <servername> --username <possibly root> --password <password> -s 0 /DataMover/HardwareAcceleratedInit

    vicfg-advcfg.pl --server <servername> --username <possibly root> --password <password> -s 0 /VMFS3/HardwareAcceleratedLocking

    where items in < > are dependent on your specific configuration.  You can again substitute -s 0 with -g to see the current setting.

     


    Emulex LPe1200x card and A3 firmware nightmare

    June 20th, 2011

    If you have Emulex LPe1200x HBA cards in your servers, and have firmware 2.0.0A3, do yourself a favor and update the firmware to at least 2.0.0A4 as soon as you can.  It’ll save you time and trouble in the event that your SAN fabric is unstable and paths drop (and hopefully you have more than one path to the fabric).  Grab the update from the Emulex Support Page.


    How available is VMware’s Round Robin Path Selection Plugin?

    June 10th, 2011

    VMware wisely introduced Round Robin (RR) as a supported path selection plugin (PSP) as part of their Native Multipathing (NMP) suite.  We originally dealt with Fixed Path (FP) and Most Recently Used (MRU), which left administrators to manage I/O paths directly if their servers had multiple host bus adapters (HBAs) and/or storage targets.  I’m sure most admins stuck with fixed path, which provided an active/passive configuration, and went on with their business.  I know I have.

    Along comes Round Robin to many cheers and hoorays, and we merrily go about our business and configure our hosts to use it.  We extolled the virtues to our management saying how available our connection to the SAN fabric and storage has become now that we are using one path per I/O.  I know I did.

    Here is the rub.  Round Robin is not designed to provide High Availability.  As it turns out, VMware has designed and built RR (and the two other PSPs for that matter) to only mark a path dead when it receives certain SCSI sense codes.  You can find the full list on the VMware KB.  They have followed the specifications, and I completely understand why they made their choices.

    But what happens when we are having issues with the SAN fabric and do NOT see those specific sense codes?  If you guessed nothing, you would be correct.  The implementation of RR will only move to the next path upon a SUCCESSFUL (sense code 0x0 or 0x00) I/O.  If we receive “soft” errors such as 0x2 (Host_Bus_Busy) or 0x28 (Task Set Full), RR will not move to the next path, because it has not received a 0x0 code.  VMware’s definition of SCSI error conditions can be found at VMware KB #1030381.  This means we will be STUCK on a path that is having problems, which results in failed I/Os.  You may notice the ESX(i) server disconnect from vCenter, the ability to run df from the command line diminish, and no way to enumerate the storage.  You will also notice that virtual machines stop responding to pings, and that the applications on them fail.  This is a result of failed I/Os in the virtual machine.  SCSI errors will be clearly visible in the guest OS logs.

    You can find the error messages on ESX(i) in the system log from the command line:

    • do: cd /var/log
    • do: grep naa messages

    Or look through the old logs:

    • do: zcat messages*.gz |grep naa
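
    A raw grep can return a wall of text, so it helps to boil it down to which devices are complaining the most.  The sketch below counts how often each naa identifier shows up in whatever log text you feed it.  The function name is hypothetical, and it assumes your grep supports -o.

```shell
# Count how often each naa device identifier appears in the log text
# on stdin, busiest device first.
# naa_hits is a hypothetical helper name.
naa_hits() {
    grep -o 'naa\.[0-9a-f]*' | sort | uniq -c | sort -rn
}

# On a host (not run here):
#   naa_hits < /var/log/messages
#   zcat /var/log/messages*.gz | naa_hits
```

    A device that towers over the rest of the list is a good place to start digging.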

    Due to ESXi’s very quick log rotation, you may be out of luck by the time you respond to an event.  You should take the time to export syslog from your ESXi hosts to a central server such as the vMA appliance.  If you need help setting up syslog, see Kanuj Behl’s blog post on vmwise.com.


    Thoughts on Facebook’s ability to bring on new datacenters

    May 17th, 2011

    GigaOM has a very interesting article detailing how Facebook spun up a virtual data center inside of one of their production data centers in the Virginia (I assume Northern) area.  The direct link to the engineering blog can be found here.

    I found the content regarding their provisioning system named Kobold the most interesting.  Their ability to provision (tens of) thousands of servers in a 30 day window is impressive.  When I worked in RSA Security’s SaaS group, we had a few different infrastructure silos that made up our hosting solution.  We created, for the most part, a build system that allowed us to get servers available in a matter of minutes after basic infrastructure was configured (power, cabling and kickstart VLAN).  While we were able to stand up infrastructure on the quicker side of things, we may have been able to scale to a few hundred (if it ever came to that) in 30 days, but nowhere near thousands or tens of thousands.  We just weren’t able to get the full boot-strap of the app layer in play.

    The promise of cloud to allow for quick scaling is based on the premise of boot-strapping.  Each machine needs to have logic to check in and get marching orders.  Without that, you are stuck with a whole lot of machines, and a whole lot of work to do to get them in to production.

    Two other interesting sites about data center boot-strapping:

    Twitter created Murder to conduct code deployments via BitTorrent.

    SmugMug created SkyNet to scale and boot-strap their infrastructure.


    vSphere Round Robin MultiPathing

    March 29th, 2011

    There are a number of blog posts describing the configuration of Round Robin (RR) multipathing on vSphere.  *Note: Content on this page has been distilled from the sources referenced below, as well as my colleague vmwise.com.  Check those sites for a deeper dive in to the content.  I’ve also removed some identifiers from the output.

    http://www.boche.net/blog/index.php/2010/02/04/configure-vmware-esxi-round-robin-on-emc-storage/

    http://www.yellow-bricks.com/2009/03/19/pluggable-storage-architecture-exploring-the-next-version-of-esxvcenter/

    http://www.ivobeerens.nl/?p=465

    The three commands that are your friends throughout this post:

    esxcli nmp satp list <- Storage Array Type Plugin (SATP)

    esxcli nmp psp list <- Path Selection Plugin (PSP)

    esxcli nmp device list <- List the LUNs from the SAN represented as their device names

    1) SSH in to the server (assuming you enabled remote tech support from the console).

    2) Display the current pathing configuration:

    esxcli nmp device list

    naa.60
    Device Display Name: Fibre Channel Disk (naa.60)
    Storage Array Type: VMW_SATP_DEFAULT_AA
    Storage Array Type Device Config: SATP VMW_SATP_DEFAULT_AA does not support device configuration.
    Path Selection Policy: VMW_PSP_FIXED
    Path Selection Policy Device Config: {preferred=vmhba:C:T:L;current=vmhba:C:T:L}

    3.1) If you have storage from NetApp, do (note: there are two dashes before “psp” and “satp”):

    esxcli nmp satp setdefaultpsp --psp VMW_PSP_RR --satp VMW_SATP_DEFAULT_AA

    3.2) If you have certain storage from an EMC DMX, do:

    esxcli nmp satp setdefaultpsp --psp VMW_PSP_RR --satp VMW_SATP_SYMM

    These commands will change the default pathing to round robin (PSP or Path Selection Plugin) for the specific SATP (Storage Array Type Plugin).

    3.3) At this point, you can reboot the host if LUNs were already presented.  If no SAN storage is attached, scan in the new devices, and they will be automagically set to round robin.  Or, run the following command to set the Path Selection Policy (note: there are two dashes before “device”):

    for i in `ls /vmfs/devices/disks/ | grep naa.60` ; do esxcli nmp device setpolicy --device $i -P VMW_PSP_RR ; done

    4) Check the current config, post reboot:

    esxcli nmp device list

    naa.60
    Device Display Name: Fibre Channel Disk (naa.60)
    Storage Array Type: VMW_SATP_DEFAULT_AA
    Storage Array Type Device Config: SATP VMW_SATP_DEFAULT_AA does not support device configuration.
    Path Selection Policy: VMW_PSP_RR
    Path Selection Policy Device Config: {policy=rr,iops=1000,bytes=10485760,useANO=0;lastPathIndex=1: NumIOsPending=0,numBytesPending=0}

    Look at Path Selection Policy.  It now says VMW_PSP_RR instead of VMW_PSP_FIXED.  We are getting closer to our goal.

    5) Now we want to configure the round robin policy to send 1 IO down a path, and then round robin to the next path (note: there are two dashes before “type”).

    for i in `ls /vmfs/devices/disks/ | grep naa.60` ; do echo $i ; esxcli nmp roundrobin setconfig --type "iops" --iops=1 --device $i ;done

    This command will look in the /vmfs/devices/disks/ directory, grab anything that starts with naa.60 (which should pick up SAN storage), and then set the round robin policy to 1 IO per path.
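
    One wrinkle: as far as I know, /vmfs/devices/disks/ also lists partitions as the device name followed by a colon and a number, so the grep picks those up too and esxcli will complain about them.  A minimal sketch of a tighter filter, with a function name I made up for illustration:

```shell
# Keep only whole-device names that start with naa.60 and drop
# partition entries (assumed here to look like <device>:<number>).
# san_devices is a hypothetical helper name.
san_devices() {
    grep '^naa\.60' | grep -v ':'
}

# On a host (not run here):
#   for i in $(ls /vmfs/devices/disks/ | san_devices); do
#       esxcli nmp roundrobin setconfig --type "iops" --iops=1 --device "$i"
#   done
```

    The end result should be the same as the loop above, just with less noise on the console.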

    6) Verify the new configuration:

    esxcli nmp device list

    naa.60
    Device Display Name: Fibre Channel Disk (naa.60)
    Storage Array Type: VMW_SATP_DEFAULT_AA
    Storage Array Type Device Config: SATP VMW_SATP_DEFAULT_AA does not support device configuration.
    Path Selection Policy: VMW_PSP_RR
    Path Selection Policy Device Config: {policy=iops,iops=1,bytes=10485760,useANO=0;lastPathIndex=5: NumIOsPending=0,numBytesPending=0}

    Validating our output, we now have our policy=iops, and iops=1.
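
    With more than a handful of LUNs, it is easier to let the shell do the validating.  This sketch pulls the policy and iops values out of each "Path Selection Policy Device Config" line.  The function name is hypothetical, and it assumes the line keeps the layout shown in the output above.

```shell
# Print "policy=<p> iops=<n>" for each round robin
# "Path Selection Policy Device Config:" line on stdin.
# Lines for other policies (for example FIXED) produce no output.
# rr_settings is a hypothetical helper name.
rr_settings() {
    sed -n 's/.*Path Selection Policy Device Config: {policy=\([^,]*\),iops=\([0-9]*\),.*/policy=\1 iops=\2/p'
}

# On a host (not run here):
#   esxcli nmp device list | rr_settings | sort | uniq -c
```

    If every line does not read policy=iops iops=1, one of the devices was missed.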


    ThinApp Howto

    March 19th, 2011

    I attended a brain dump session by Travis Sales (@thinappguru on twitter), one of the guys that built the original Thinstall prior to VMware’s purchase and re-branding.

    I decided to put together a step by step Howto ThinApp a program like Firefox.  The setup:

    Windows 7 64b, VMware Workstation 7.1.3, Windows XP SP2 guest VM and ThinApp Enterprise 4.6.1.  The VM is configured for Host Only networking.  A share has been configured on Windows 7 to hold content that will be used during the ThinApp process (*).

    Per Travis’ suggestion to “Know Thy App,” I have gone with XP SP2 with the following packages installed:

    • VMware Tools
    • Windows Installer 3.1
    • SP2

    I took a snapshot (#1) of the Virtual Machine for rollback.  I then installed ThinApp Enterprise, verified it worked, and took another snapshot (#2).  This will be my “gold” image.

    We are now ready to conduct a ThinApp Capture.   First fire up ThinApp -> Start -> Programs -> VMware ->ThinApp Setup Capture.

    You will see the Welcome Screen, hit Next.

    Select Prescan, in most cases.

    Once the scan is complete, we can install the application.  In our case it will be Firefox.

    Install Firefox as usual, and then verify it works.  At that point, click Postscan in the ThinApp window.

    I selected Mozilla Firefox.exe as the only entry point for our ThinApp.  In short, Entry Points are the Windows executables that allow the launch of the ThinApp.  For a more detailed description, check out this ThinApp team blog entry.

    Select the user groups that have permissions to run this ThinApp.  If this machine was connected to a Windows domain, AD groups can be selected here.  ThinApp permissions could then be managed via AD.  Very cool!

    I told our ThinApp to run in WriteCopy mode for security purposes.

    Place the sandbox on a Windows share as described with (*) above.  We do this to allow for rollback of the VM to test the Firefox ThinApp, and keep our data intact.

    On the next few screens, select “No, Do not send info to VMware,” and Next on the plugin section.

    Change the Inventory Name from Mozilla Firefox 3.y.z to Mozilla Firefox 3.  This way we can easily upgrade .y.z versions, and have separate trees for x. versions.

    Create the Package Settings with both the EXE entry point we selected above, as well as an MSI file.

    If you want to poke around on the build screen, go ahead.  I hit Build.

    The build is complete!

    Copy the Captures directory back to your file share.  The EXE and MSI will be found in the bin directory.

    Roll back your VM, re-mount the file share, and test the EXE.  Congrats, you just built your first ThinApp!


    VCAP-DCD Round-Up

    January 12th, 2011

    I sat for the VMware Certified Advanced Professional 4 – Datacenter Design (VCAP4-DCD) recently.  This was my first crack at a “design” test, and I didn’t really know what to expect beyond the few public documents VMware has distributed.  So I put together this round-up.

    I do suggest sitting for the vSphere Design Workshop  http://mylearn.vmware.com/mgrreg/courses.cfm?ui=www&a=one&id_subject=13754.  It is a good review for the test.  My review can be found here: http://philthevirtualizer.com/2010/07/12/vmware-vsphere-design-workshop/

    Review the VCAP-DCD blueprint: http://mylearn.vmware.com/register.cfm?course=76644

    Watch the Exam UI demo: http://mylearn.vmware.com/courseware/82525/VCAPDCD_Tutorial.swf

    Look at questions by fellow VCAP candidates on the VCAP communities site: http://communities.vmware.com/community/vmtn/certedu/certification/vcap

    Another round-up page:

    http://www.seancrookston.com/2011/01/12/vmware-vsphere-design-workshop/

    And be patient.  You have 4 hours to work through the exam.  Best of luck!

    Update:  I have been assigned VCAPDCD-120!


    The Great Road Trip to the Cloud

    December 22nd, 2010

    Cloud computing is one of the new buzz words of the tech industry.  Everyone is jumping on the bandwagon.  The adoption of virtualization in the Enterprise has led to the rise of Cloud.  Cloud has even gone mainstream with Microsoft’s “To the Cloud” ad campaign.

    I became interested in Cloud when I worked at a SaaS company.  At the time we had to graft three different environments together due to acquisition.  I started to think of a better way to standardize on an application server, Operating System and platform.  In effect we were dealing with a 3x3x3x3x3x3x3 syndrome.  We had 3 different web servers, 3 application servers, 3 operating systems, 3 database platforms, 3 SAN’s, 3 networks and 3 sets of hardware.  It was painful.

    I stumbled upon the blog of Don MacAskill from a service called SmugMug (http://www.smugmug.com).  He wrote about his version of SkyNet to elastically extend his environment to Amazon AWS.  Needless to say it was a turning point, sort of like when I heard Led Zeppelin I for the first time.

    A Short History

    Virtualization is not new technology.  In fact, it has its roots in Mainframes.  The tech industry is a circular beast.  Central computing with dumb terminals gave way to distributed computing, client/server, and now a hybrid where data can be found on multiple hubs, and a combination of smart and dumb spokes.  The industry also realized that running a data center is not an easy task.  Running multiple data centers incurs huge expense.  Thus, the rise of co-location.  Business realized it could be a cheaper proposition to pay someone else to do some of the dirty work (space, power, cooling, physical security), all the way up to a managed service.

    Business then realized it was still booking Capital Expense (CapEx) and Operational Expense (OpEx) in dealing with co-lo.  Servers were not being used as much as expected.  When growth hit unexpectedly, giant road blocks presented themselves: acquiring gear fast enough, finding space, and still staying within the original co-lo agreement.

    Virtualization nudges itself in to the equation because people realized that everything shouldn’t be focused on the application and an infrastructure that is a) expensive and b) underutilized.  If you want to focus solely on your application stack, you can now do that.  If you don’t want to go through CapEx to buy infrastructure, you can easily lease CPU time, in effect, from the cloud.

    Cloud

    So now you may ask yourself “what is cloud computing?”  Good question.  A good answer: It all depends on who you ask.

    I’ll give you my opinion on the state of Cloud.

    • Public
    • Private
    • Hybrid
    • SaaS
    • IaaS
    • PaaS
    • AaaS

    Public: The cloud is hosted by a third-party, somewhere on the Internet.

    Private: The cloud is hosted inside the firewalls of the business.

    Hybrid: A grafting of resources from Public and Private clouds, used to augment the infrastructure.  In short, if Public and Private are two circles in a venn diagram, their intersection is Hybrid.

    SaaS: It could be argued that Software as a Service (SaaS) was the first of the new generation of infrastructure that begat cloud.  A person or business consumes a resource that is hosted, and possibly sold, by a third-party.  Twitter and Facebook and World of Warcraft all fall in to this category.  The SaaS provider usually built their own web, application and database servers, storage and network, most likely at great cost.  The environment may have been self-hosted, or in a co-lo.

    IaaS: I believe technology developed by VMware has led to Infrastructure as a Service (IaaS).  I know IBM, Sun and HP have been doing virtualization for years, but only on high-end gear.  VMware was the mainstream player that rammed it down everyone’s throats, turning cheap x86-based servers in to powerhouses.  Servers went from scale out, to scale up/scale out configurations.  We need bigger, but fewer.  Short provision cycles and chargeback models all help to turn IaaS in to a business generator, and less of a budget black hole.  Amazon AWS is probably the biggest player in Public Cloud IaaS.

    PaaS: PaaS provides an infrastructure as a bundled stack, where infrastructure is abstracted and is presented as a consumable resource.  It seems to me that VMware’s vCloud Director is going to allow business to provision the private cloud, and sell resources to its internal, and external customers.

    AaaS: I count the App as a Service to be a power-play by vendors.  They give application developers a fully abstracted platform, and expose certain pieces by API calls.  The users and developers on top of this platform do not care at all how the plumbing works, only that it does.  Google App Engine, Microsoft Azure and Salesforce are big players in this arena.  VMware and Red Hat are making in-roads with their latest purchases.

    Conclusion

    The race to the cloud includes a tipping-point for business when consuming public-cloud resources becomes more expensive than building a private-cloud.  There are always use-cases for all the current cloud types I have listed.  Industry is trying to build partnerships to allow private cloud application stacks to migrate to public, and vice-versa.  The technology is not ready as of the end of 2010, but by mid-2011 I do believe we will see the beginnings of true migration paths for building Hybrid clouds with active-active infrastructure.

    This blog post will be a living document as things change.  Stay tuned!


    Thoughts on RHEV

    December 7th, 2010

    I recently attended a Red Hat Enterprise Virtualization (RHEV) http://www.redhat.com/virtualization/rhev/server/ session, and walked away impressed by the amount of improvement that has gone in to the RHEV management suite since I last saw a demo roughly 8 months ago.  In a recent past life I was a (RHEL) Linux zealot, but never really got the warm and fuzzies with Red Hat’s push in to the virtualization space, and their play between Xen and then KVM.

    Things have changed…

    Red Hat has gone the way of VMware with both a full-blown distribution footprint and a stripped-down footprint (rhev-h).  RHEV supports most of the feature sets that vSphere has supported for years now: LiveMigration (vMotion), a scheduler (DRS), and HA (HA).  RHEV’s management suite, however, is functionally different in that it defines a datacenter by the storage that will be used.  NFS seems to be a core choice, as well as iSCSI and FC.  The storage combinations were laid out as NFS+NFS, NFS+iSCSI and NFS+FC.  It would seem that planning on storage, and sticking to it, is “the way” with RHEV.  I would suspect Red Hat will move to a more open approach to storage.

    Red Hat also provides PowerShell cmdlets to interact with RHEV-m.  I’m a huge fan of PowerShell, and am glad that more vendors are allowing customers to extend the product scriptomagically.

    Look for more posts as I begin to play with RHEV!