Wednesday, March 11, 2009

Roadblocks encountered while configuring VMware ESX Server, connecting it to a SAN and configuring HA

This was the first time I configured ESX Server to connect to a SAN, and also the first time I managed to get HA and DRS working, so I figured I'd write down the problems I ran into before everything worked.

First, the setup: we have two DL385 G5 servers with FC connections to an MSA 2012FC SAN through two SAN fiber switches. We configured everything so the fiber connections are load balanced. Each server has two fiber ports, one connected to each switch, and the SAN has dual ports on each controller, with each controller connected to both switches.

The main topics I'm going to cover are:

1. Getting the servers to talk to the SAN.
a. Pointing to the SAN inside the VMware software.
b. Resolving a path error after pointing to the SAN.
c. Multipathing, which I haven't addressed yet and still need to look at.

A. I relied heavily on this reference tool while figuring out how to configure the SAN with virtual disks, volumes and LUNs so I could add the storage in VMware. (You need to point to external storage if you are going to use High Availability (HA), DRS, VMotion, etc.)

B. After pointing to the storage, I received multiple errors inside VMware. The exact error was: SCSI: 4506: Cannot find a path to device Vmhba:0:1:2 in a good state. Trying path vmhba0:1:2 (the SAN ID was here). The related error was an I/O error: whenever I browsed the datastore and tried to upload a file to it from my computer, the upload failed with an I/O error. I looked everywhere and couldn't find an answer that directly solved the problem, but after a lot of troubleshooting and a process of elimination I found that it was caused by host port failures on the SAN. These are the errors I saw in the SAN's event log. As you can see below, the host links kept going down and coming back up. I could also watch it happen in the SAN's host port status view, where random ports went from green to red.

03-10 14:45:02 111A14416 Host link up Chan1: 2 Loop IDs, Fabric
03-10 14:44:59 112A14415 Host link down Chan1
03-10 14:44:47 111B14430 Host link up Chan0: 2 Loop IDs, Fabric
03-10 14:44:47 111A14414 Host link up Chan1: 1 Loop ID
03-10 14:44:47 111A14413 Host link up Chan0: 2 Loop IDs, Fabric
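
If you can get the SAN's event log into a text file, one quick way to see which ports are flapping is to count the link events per channel. Below is a minimal sketch, assuming the exported lines look like the entries above; count_link_events is a hypothetical name I'm using for illustration, not part of the MSA tools.

```shell
# Hypothetical helper: count "Host link up/down ChanN" events in a text
# export of the MSA event log (format assumed to match the entries above).
# A channel that keeps logging "down" events is a sign its link is flapping.
count_link_events() {
    log_file="$1"
    # -o prints each match on its own line, so two entries that ended up
    # on the same line are still counted separately
    grep -oE 'Host link (up|down) Chan[0-9]+' "$log_file" | sort | uniq -c
}
```

Run against the five entries above, it would count "Host link up" twice for Chan0 and twice for Chan1, and "Host link down" once for Chan1. The channel that keeps racking up down events is the one to look at.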

The cause was simple: a configuration error on our part. We knew this setting had to be changed, but I assumed the engineers who plugged everything in had already done it, so I didn't think to check. The fix was to disable the host port interconnect.

This information from the manual that I linked to in this article is what resolved the issue (pages 41-42, "Configuring FC Host Port Interconnects"):
"For a dual-controller FC system in a switch attach configuration, host port interconnects are always disabled." "3. Set Internal Host Port Interconnect to Interconnected (enabled) or Straight-through (disabled). The default is Straight-through. This setting affects all host ports on both controllers."
After making this small change, the SAN storage started working incredibly well.

2. Setting up HA and DRS.
a. Enable the root user to log in via SSH. (This is required unless you want to go to the server physically to make the changes in step b.)
b. Resolving the host IP address error when configuring HA.

a. To enable root login over SSH, go to the service console on the physical server and log in, then:
  • vi /etc/ssh/sshd_config
  • Change the line that says PermitRootLogin from “no” to “yes”
  • Run service sshd restart
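
The same change can be made without opening vi, which is handy if you need to repeat it on several hosts. This is a minimal sketch that assumes your service console's sed supports the GNU -i (in-place) option; enable_root_ssh is a hypothetical helper name, and it saves a .bak copy of the file before editing.

```shell
# Hypothetical helper: set PermitRootLogin to "yes" in an sshd_config file.
# Defaults to /etc/ssh/sshd_config; assumes sed supports -i (GNU sed).
enable_root_ssh() {
    conf="${1:-/etc/ssh/sshd_config}"
    cp "$conf" "$conf.bak" || return 1   # keep a backup of the original
    # Rewrite any active (uncommented) PermitRootLogin line to "yes"
    sed -i 's/^PermitRootLogin[[:space:]].*/PermitRootLogin yes/' "$conf"
    # If the file had no active PermitRootLogin line, append one
    grep -q '^PermitRootLogin yes$' "$conf" || echo 'PermitRootLogin yes' >> "$conf"
}
```

On the host you would run enable_root_ssh and then service sshd restart, just like the last step above.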

b. I configured the cluster and added the hosts, but then received this error:
"Configuration of host IP address is inconsistent on host (IP was here): address resolved to Host misconfigured. IP address of localhost 127.0.0.1 not found on local interfaces and interfaces. "

This is a known issue: the hosts file needs some additional configuration if you want to use HA / clustering. Here's how to fix it.

PS: I got the following from here.

"Login to your ESX hosts with your favorite secure shell program and look at your /etc/hosts file.
The file should look something like this:

# Do not remove the following line, or various
# programs that require network functionality
# will fail.
127.0.0.1 localhost.localdomain localhost
192.168.14.2 myesxserver.foo.org

See the last line with the fully qualified domain name (FQDN) of the ESX server beside the IP address of that server? What you want to do is append to that line the shortname of the host as well. What you end up with looks like this:

# Do not remove the following line, or various
# programs that require network functionality
# will fail.
127.0.0.1 localhost.localdomain localhost
192.168.14.2 myesxserver.foo.org myesxserver
The typical way to do this is to insert a tab, then the name you chose for your server, up to (but not including) the first dot. You want to add every ESX host machine that is in your cluster to each other’s hosts file. Not only does this make HA much more robust, it makes DNS lookups redundant, and that’s a good thing. Ask yourself, if my DNS has an outage for just 12 seconds, do I really want all of my HA nodes going into isolation mode?
That’s it! Save your changes and exit.
Why do we need to do this? I’m not sure why it helps with VMotion, but HA needs it. HA you see was not written by the same developers as ESX. HA was developed by Legato, which is owned by EMC, as is VMware. It’s a marriage made in heaven, but the devil’s in the details!"
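
Because every ESX host in the cluster should be in every other host's hosts file, editing each file by hand gets tedious as the cluster grows. Here's a minimal sketch of a shell function that writes an IP, FQDN and short-name line for each host, replacing any existing line for that IP. The function name is hypothetical, and it assumes plain IP=FQDN arguments.

```shell
# Hypothetical helper: for each IP=FQDN argument, write the line
#   <IP><TAB><FQDN><TAB><short name>
# to the hosts file, replacing any existing line for that IP.
add_cluster_hosts() {
    hosts_file="$1"; shift
    cp "$hosts_file" "$hosts_file.bak" || return 1   # backup first
    work_file="$hosts_file.tmp"
    for entry in "$@"; do
        ip="${entry%%=*}"      # text before the first '='
        fqdn="${entry#*=}"     # text after the first '='
        short="${fqdn%%.*}"    # name up to (but not including) the first dot
        # Keep every line whose first field is not this IP, then add ours
        awk -v ip="$ip" '$1 != ip' "$hosts_file" > "$work_file"
        printf '%s\t%s\t%s\n' "$ip" "$fqdn" "$short" >> "$work_file"
        # Copy back over the original so its ownership and permissions stay
        cat "$work_file" > "$hosts_file"
    done
    rm -f "$work_file"
}
```

For example, running add_cluster_hosts /etc/hosts 192.168.14.2=myesxserver.foo.org 192.168.14.3=myesxserver2.foo.org on every host would give each one both entries; the second host there is made up for illustration. Copying the result back with cat rather than mv keeps the original file's ownership and permissions, and comment lines and the 127.0.0.1 line are left alone.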

Also, I used the following information to help, although it isn't directly related to this error:

1. Run "hostname -v -f" to show the full details for the host. To correct the name, simply run "hostname server.domain.com".
2. Run "vi /etc/hosts", press "i" to enter insert mode, and fix whatever is incorrect. Press Esc and type ":wq" to save and exit.
3. Run "service network restart" and then "service mgmt-vmware restart"
4. Re-enable HA on the cluster host.
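
Before re-enabling HA, it doesn't hurt to double-check that the hosts file really ended up the way the quoted fix describes. A minimal sketch, using a hypothetical function name: it succeeds only if the FQDN and the short name appear together on one line whose address isn't a 127.x loopback address.

```shell
# Hypothetical check: succeed if some non-comment, non-loopback line in the
# hosts file lists both the FQDN and its short name. Pass the FQDN
# explicitly, e.g. check_hosts_entry "$(hostname -f)".
check_hosts_entry() {
    fqdn="$1"
    hosts_file="${2:-/etc/hosts}"
    short="${fqdn%%.*}"    # name up to (but not including) the first dot
    awk -v f="$fqdn" -v s="$short" '
        $1 !~ /^#/ && $1 !~ /^127\./ {
            has_f = 0; has_s = 0
            for (i = 2; i <= NF; i++) {
                if ($i == f) has_f = 1
                if ($i == s) has_s = 1
            }
            if (has_f && has_s) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$hosts_file"
}
```

On the host you could run check_hosts_entry "$(hostname -f)" && echo OK; if it fails, go back to step 2.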

After all this, I got HA operational, and I'm currently running multiple VMs across multiple hosts.




