Tuesday, March 31, 2009

VMware ESX storage: How to get local storage to act as a raw disk for VMs

http://itknowledgeexchange.techtarget.com/virtualization-pro/tag/vmware-esx/

I assume this would be helpful if you want to configure a cluster inside VMs. I'm not sure yet; I just found this article and figured I'd save it for later.

Wednesday, March 25, 2009

Questions I have about VMWare / virtualization

1. Q How does P2V work?
Answer - This is a very general question that would take a full post to explain; however, there are explanations on VMware's site. I believe there is an accreditation you can test for to show you know the product. I don't know whether you'd get this on top of getting certified or whether being certified makes it redundant.
a. Q If I want to test or find compatibility what do I need to do?
Answer - Go to the software manufacturer's site and you should easily find compatibility information for VMware. A lot of the large and small companies I researched had full articles explaining their compatibility and their willingness to support their product running on virtual servers. This year Microsoft even gave in and announced its decision to support VMware. http://support.microsoft.com/default.aspx/kb/957006
b. Q What is hardware mapping?
c. If I convert a server to a virtual machine, is it possible to keep the original physical machine
intact and ready to start back up once I've finished testing the virtual version?
i. This is more of a schematics / planning question. Whether it's virtual or not, the issue is with
the MAC address and how it correlates or interacts w/ the rest of the network.
2. VMWare virtual center
a. Q How can I supply contingency with virtual center? Meaning... it's currently installed and
operating on one server, what happens to the cluster and everything else if the server is
down for any reason?
Answer
This is now included in VMware vSphere 4, so I feel the information I've posted below is obsolete. The quick answer is that VMware's Virtual Center is very difficult to recover in the event of a hard disk or server failure, and I strongly suggest you use the newer version instead of an old one if this is important to you. It's not that you can't recover; it's that recovery is terribly difficult and manual, vs. automated and simple with the new version. Before the new version came out I even noticed Citrix pointing out this issue as a "single point of failure" (even though you stay up and running when a virtual center server goes down). Sorry Citrix :).

All the information I posted below is old and only meant to show archived documentation. I wrote this disclaimer 07-21-09

Here's the most recent info, and I'm pasting it here in case the author removes the article at some point. My only concern with what's quoted is that I don't see any mention of being able to move clusters; I only see that you can "steal" hosts. I would prefer an option that ensures HA, DRS and VMotion keep working properly, and I'll have to test whether that's the case.

03-27-09 update - Sure enough when I try to add one of the hosts that's already managed by the other virtual center I get this message

"The host is already being managed by IP Address:
Only one VirtualCenter Server may manage this host. If you succeed in adding the host to this VirtualCenter Server, the host will lose its connection to its original management server. Are you sure you want to continue?"

This happens only after creating a datacenter in the new virtual center, which isn't the same "datacenter" that's already set up, so I'm not able to move or "steal" the cluster and its settings. That means I'd have to reconfigure them manually. I wouldn't think this is a huge deal, but it is a concern. I'm going to see if I can figure out how to move the cluster and everything else from one machine to another, because I need to understand whether it's worth building an actual cluster for redundancy or whether a simple VM is enough, which according to the article below is okay. If it's a VM then I can move it to any other hardware I'd like, and the redundancy is somewhat built in because of the server contingency w/ the cluster. It just seems a little strange that I'm having to manage something from within itself.

"I get questions from customer who want to setup some kind of redundancy for Virtual Center Server. Some run Virtual Center Server right in a VM but want a physical standby in case they loose the host, others want a cluster solution. This post hopefully answers a few questions you might have about doing this.
You can install another instance of Virtual Center, no problem. If the current VC server is unavailable just add the hosts to the new VC server. You will get a prompt that they are already managed by a VC but you can click OK and “steal” them. This prevents two VCs from managing one host and causing issues with the database.
You can also setup VC in a cluster. Virtual Center Server (the windows service) can be clustered using industry standard solutions, and only 1 license is required when only one instance is active at any given time. Active / Passive clustered configurations can be installed and configured to point to the same Virtual Center database (but only one instance should be active at any given time).
Active / Passive instances of the Virtual Center Management server will also require the following configuration settings to be equivalent-
Both should point to the same database (same ODBC connection setup)
Both should be set to the same “Virtual Center Server ID” (configured through the File->VC Settings menu).
Both should use the same public/private SSL keys (contained in the “C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\SSL” directory)
If VC WebService is enabled, both should use the same configuration file (located at “C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\VMA\vmaConfig.xml”) "


Below is the old info that I found. It proved to be terribly difficult w/ regard to detaching and attaching the database; however, I was using the SQL Express version, so it's possible the procedure just doesn't work so well in that scenario.
Answer - Contingency is only available through third party tools. The linked item is an interesting comparison, or bitch session, from Citrix people about this "single point of failure". I'm not sure which third party tools you'd use, but I'm assuming Windows clustering would suffice. Possibly just running Virtual Center as a VM would work? I'm guessing there are better and worse ways to do that. I'll look further. In the meantime I found how you can move your Virtual Center from one server to another.

Here's how per this post

"



  • Take backup of Server A sql database
  • Stop all VMware services on Server A
  • Detach Virtual Center database on Server A
  • Stop all VMware services on Server B
  • Delete the Virtual Center database on Server B (database was empty and was created for the installation of Virtual Center on the new server).
  • Copy the database files from Server A to Server B
  • Attach the database on Server B
  • On your Virtual Center user account grant them DBO access to the newly attached database
  • Start the VMware services on Server B
  • Launch the VI Client from Server B
  • You will notice that after a few minutes the ESX hosts will show disconnected because they still think they are being managed by the old Virtual Center Server
  • Right-click and remove the ESX hosts from the cluster
  • Add the ESX hosts back to the cluster
  • Adding the ESX hosts back to the cluster does not put the VM's into any Resource Pools (Hosts and Clusters View) or Folders (VM and Templates View). Move VM's back to the correct Resource Pools and Folders
  • On each ESX host ensure that the licensing information looks correct
  • Test vMotion
  • Add templates back to inventory
  • Move SysPrep files from old Virtual Center Server to new Virtual Center Server
  • Test deploying VM from template. This did not work for me. I received the error message "The virtual center server is unable to decrypt passwords stored in the customization specification" I had to export the customizations (did this before I moved the server) edit the XML file in a text editor and search for the phrase "

Thursday, March 12, 2009

Offline antivirus tools

So we got a virus on one of the servers in our DMZ, and it's generating so much traffic on our firewall that the firewall restarts, killing all access to every other server in the DMZ. This led me to look for an offline virus scanner. I used this article to ultimately conclude that the first two items below would fit my needs nicely.

1. Kaspersky AVP tool - download here - this one you just install and use.
2. Trend Micro Sysclean - download here - for this one you have to extract the most recent pattern file, which can be downloaded from here. It also appears to be a good spyware scanner; there is a separate pattern file for that, which you can download from the same place I linked for the virus pattern file.
3. Looks like Dr.Web CureIt! is another one, and a lot of people on that link liked it most. - Download here.

More roadblocks with VMWare

Trying to test migration of VMs from one host to the other, I get this error:

1. Unable to migrate from to : The VMotion interface is not configured (or is misconfigured) on the source host (IP).

I've been looking around for how to configure VMotion; however, I haven't found anything yet. I'll keep updating this blog as I find the solution.

2. I noticed on the configuration tab VMotion shows - Not Used

My first attempt at a fix was to follow these instructions, which worked for a lot of people. However, the post doesn't explain exactly where to configure the VMkernel network or VMkernel port, and I'm not seeing anything about VMotion in these settings.

"Have you properly configured the VMKernel network? Is VMotion enabled on the VMKernel port? The option is somewhat buried in the config dialogs. It's in -> Host configuration / Networking -> Properties of the vSwitch that has the VMKernel port attached to it -> Select VMKernel from the port list and click Edit..."

The second attempt, and the solution, was found here

It appears I had to set up the VMkernel network configuration, which I could do by following pages 30-33 of this guide. I got the pointer to this guide, as well as further diagnostic tests that can be run to troubleshoot this error, from here.

Now that I appear to be able to move VMs, I want to note that I'm getting this warning:

Migration from to : Reverting to snapshot might generate errors (warnings) on the destination host.

The warning didn't interfere with migrating VMs to different hosts.

PS: I forgot to mention in my previous article that I had to enable Intel VT and No Execute Memory in order to use 64-bit operating system VMs.

on the HP DL380 G5
F10 - Advanced / Processor options / enable both Intel VT and No Execute

Wednesday, March 11, 2009

Roadblocks encountered while configuring VMWare ESX Server, Connecting it to a SAN and configuring HA

This is the first time I've configured ESX Server to connect to a SAN, and it's also the first time I've managed to get HA and DRS working, so I figured I'd add notes on what I ran into before it all worked.

First of all, we have 2 DL385 G5 servers w/ FC connections to an MSA 2012FC SAN via 2 SAN fiber switches. We configured everything for load balancing on the fiber connections: each server has two fiber ports connecting to the switches (1 to each), and the SAN also has dual ports on each controller, which are connected across both switches.

The main topics I'm going to talk about are the following.

1. Getting the servers to talk to the SAN.
a. pointing to the SAN inside the VMware software
b. resolving a path error after pointing to the SAN.
c. multipathing is something I haven't addressed and still need to look at.

A. First of all, I used this reference tool heavily while figuring out how to configure the SAN with virtual disks, volumes and LUNs so I could add the storage in VMware. (You need to point to external storage if you are going to use high availability (HA), DRS, VMotion, etc.)

B. The problem I encountered is that I received multiple errors inside VMware after pointing to the storage. The exact error was:

SCSI: 4506: Cannot find a path to device Vmhba:0:1:2 in a good state. Trying path vmhba0:1:2 (the SAN ID was here).

The other, related error was an I/O error every time I browsed the datastore and tried to upload a file to it from my computer. I looked everywhere and couldn't find an answer that directly solved my problem. However, through a lot of troubleshooting and a process of elimination, I found that the issue was caused by host port failures on the SAN. Below are the errors I saw in the SAN's event log. You'll see the link going up and then down, and I could watch the errors occur in the host port status on the SAN because random ports went from green to red.

03-10 14:45:02 111A14416 Host link up Chan1: 2 Loop IDs, Fabric
03-10 14:44:59 112A14415 Host link down Chan1
03-10 14:44:47 111B14430 Host link up Chan0: 2 Loop IDs, Fabric
03-10 14:44:47 111A14414 Host link up Chan1: 1 Loop ID
03-10 14:44:47 111A14413 Host link up Chan0: 2 Loop IDs, Fabric
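
To make this kind of flapping easier to spot, here is a rough sketch of how the link events could be tallied from a saved copy of the event log. It assumes one event per line in the format shown above, and it assumes the letter inside the event ID (the A in 111A14416) identifies the controller, which is my guess and not something I confirmed in the manual. The function name count_link_events is made up for this example.

```shell
#!/bin/sh
# count_link_events LOGFILE
# Hypothetical helper: tallies "Host link up" / "Host link down" events
# per controller letter and channel from a saved MSA event log.
# Assumes one event per line: DATE TIME EVENTID Host link up|down ChanN[: ...]
count_link_events() {
    awk '
        $4 == "Host" && $5 == "link" && ($6 == "up" || $6 == "down") {
            # Assumption: the first capital letter in the event ID is the controller.
            ctrl = "?"
            if (match($3, /[A-Z]/)) ctrl = substr($3, RSTART, 1)

            # Channel name is field 7, minus any trailing colon.
            chan = $7
            sub(/:$/, "", chan)

            key = ctrl " " chan
            seen[key] = 1
            if ($6 == "up") up[key]++; else down[key]++
        }
        END {
            for (key in seen)
                printf "%s up=%d down=%d\n", key, up[key], down[key]
        }
    ' "$1" | sort
}

# Usage (the file name is only an example):
#   count_link_events san-events.log
```

Any channel that shows "down" events, or far more "up" events than you have restarts to explain, is a candidate for the kind of port flapping described here.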

The reason this was happening was simple: a configuration error on our part. We knew we had to do this, but I assumed the engineers who plugged everything in had already done it, so I didn't think to look for it. The way to resolve it was by disabling interconnect.

The information in the manual that I linked to in this article is what resolved the issue: pages 41-42, Configuring FC Host Port Interconnects.
"For a dual-controller FC system in a switch attach configuration, host port interconnects are always disabled."
"3. Set Internal Host Port Interconnect to Interconnected (enabled) or Straight-through (disabled). The default is Straight-through. This setting affects all host ports on both controllers."
After making this small change the SAN storage started working incredibly well.

2. Setting up HA and DRS.
a. Enable the root user to log in via SSH. (This is required unless you want to go to the server physically to configure what's in step (b).)
Go to the service console on the physical server & log in
  • vi /etc/ssh/sshd_config
  • Change the line that says PermitRootLogin from “no” to “yes”
  • do service sshd restart
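
If you'd rather script the edit than do it in vi, something like the following could work. This is only a sketch: the function name enable_root_login is made up, it assumes a POSIX shell with sed and grep on the service console, and it only edits the file you point it at. Restarting sshd is still a separate step.

```shell
#!/bin/sh
# enable_root_login FILE
# Hypothetical helper: sets "PermitRootLogin yes" in the sshd config at FILE,
# keeping a backup copy at FILE.bak. It does not restart sshd.
enable_root_login() {
    file=$1
    tmp="$file.tmp.$$"

    # Keep a backup before touching anything.
    cp "$file" "$file.bak" || return 1

    # Rewrite any active (uncommented) PermitRootLogin line.
    sed 's/^[[:space:]]*PermitRootLogin[[:space:]].*$/PermitRootLogin yes/' \
        "$file.bak" > "$tmp" || return 1

    # If the file had no active PermitRootLogin line, append one.
    grep -q '^PermitRootLogin yes$' "$tmp" || echo 'PermitRootLogin yes' >> "$tmp"

    # Write back in place so the original file keeps its owner and mode.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage on the host, followed by the restart from the steps above:
#   enable_root_login /etc/ssh/sshd_config
#   service sshd restart
```

Trying it on a copy of sshd_config first is the safer route, since a broken SSH config on a remote host means a trip to the physical console.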

b. I configured the cluster and added the hosts; however, I received this error.
"Configuration of host IP address is inconsistent on host (IP was here): address resolved to Host misconfigured. IP address of localhost 127.0.0.1 not found on local interfaces and interfaces. "

This is a known issue: some additional configuration of the hosts file is needed if you want to use HA / clustering. Here's how to fix it.

PS... I got this from here.

"Login to your ESX hosts with your favorite secure shell program and look at your /etc/hosts file.
The file should look something like this:
# Do not remove the following line, or various
# programs that require network functionality
# will fail.
127.0.0.1 localhost.localdomain localhost
192.168.14.2 myesxserver.foo.org
See the last line with the fully qualified domain name (FQDN) of the ESX server beside the IP address of that server? What you want to do is append to that line the shortname of the host as well. What you end up with looks like this:
# Do not remove the following line, or various
# programs that require network functionality
# will fail.
127.0.0.1 localhost.localdomain localhost
192.168.14.2 myesxserver.foo.org myesxserver
The typical way to do this is to insert a tab, then the name you chose for your server, up to (but not including) the first dot. You want to add every ESX host machine that is in your cluster to each other’s hosts file. Not only does this make HA much more robust, it makes DNS lookups redundant, and that’s a good thing. Ask yourself, if my DNS has an outage for just 12 seconds, do I really want all of my HA nodes going into isolation mode?
That’s it! Save your changes and exit.
Why do we need to do this? I’m not sure why it helps with VMotion, but HA needs it. HA you see was not written by the same developers as ESX. HA was developed by Legato, which is owned by EMC, as is VMware. It’s a marriage made in heaven, but the devil’s in the details!"
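
The short-name edit described in that quote can be sketched as a small filter. This is my own illustration, not something from the quoted post: the function name add_short_names is made up, and it assumes the FQDN is the first name after the IP address on each line. It prints the result instead of editing the file, so you can look it over before replacing anything.

```shell
#!/bin/sh
# add_short_names FILE
# Hypothetical helper: prints FILE (a hosts file) with the short host name
# appended to every entry whose first name is an FQDN and that lacks it.
add_short_names() {
    awk '
        # Pass through blank lines, comments, and loopback entries untouched.
        NF == 0 || $1 ~ /^#/ || $1 ~ /^127\./ || $1 == "::1" { print; next }
        {
            # Short name = first hostname on the line, up to the first dot.
            n = split($2, parts, ".")
            short = parts[1]
            found = 0
            for (i = 2; i <= NF; i++)
                if ($i == short) found = 1
            # Append only when the first name is an FQDN and the short name is missing.
            if (n > 1 && !found)
                print $0 "\t" short
            else
                print
        }
    ' "$1"
}

# Usage: review the output, then copy it over the original by hand.
#   add_short_names /etc/hosts > /tmp/hosts.new
```

It only fixes entries that are already in the file. Adding the other ESX hosts in the cluster to each host's file, as the quote recommends, is still a manual step.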

also...

I used this information to assist me as well. However it's not directly related to this error.

1. Run "hostname -v -f"; this will show full details on the host. To correct it, simply run "hostname server.domain.com"
2. Run "vi /etc/hosts". Press "i" to enter insert mode and fix whatever may be incorrect
3. Run "service network restart" and then "service mgmt-vmware restart"
4. Re-enable HA on the clusterhost
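
Before re-enabling HA, a quick sanity check of the hosts file might save a round trip. The sketch below looks up a name in a hosts file and complains if it is missing or mapped to a loopback address, which is the situation the HA error message above describes. The function name check_host_entry is made up, and this only reads the file; it does not query DNS or change anything.

```shell
#!/bin/sh
# check_host_entry HOSTS_FILE NAME
# Hypothetical helper: prints the address NAME maps to in HOSTS_FILE.
# Returns non-zero if NAME is missing or maps to a 127.x loopback address.
check_host_entry() {
    hosts_file=$1
    name=$2
    ip=$(awk -v name="$name" '
        $1 ~ /^#/ { next }
        {
            for (i = 2; i <= NF; i++) {
                # Stop at an inline comment.
                if ($i ~ /^#/) break
                if ($i == name) { print $1; exit }
            }
        }
    ' "$hosts_file")

    case $ip in
        "")    echo "$name: no entry in $hosts_file"; return 1 ;;
        127.*) echo "$name: maps to loopback address $ip"; return 1 ;;
        *)     echo "$name: $ip"; return 0 ;;
    esac
}

# Usage on each host, once for the FQDN and once for the short name:
#   check_host_entry /etc/hosts myesxserver.foo.org
#   check_host_entry /etc/hosts myesxserver
```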

After I did all this I managed to get HA operational and I'm currently working with multiple VM's pulling from multiple hosts.




