Sunday, December 29, 2019

Horizon View - "Failed to connect to Connection Server" when accessed via LB WIP or DNS alias

Scenario
Horizon View clients fail with "Failed to connect to Connection Server" when they connect through the LB WIP or a DNS alias.
It works fine when accessed with the server FQDN.



Solution
If you are facing this issue after upgrading to View 7.x, you are not alone! And it is not a bug.
It's a new security feature (origin checking) in 7.x, and it can be disabled with the steps mentioned in this KB.


All you need to do is
> Create a file named locked.properties
> Add the line "checkOrigin=false" (without quotes)
> Save it and copy it to the C:\Program Files\VMware\VMware View\Server\sslgateway\conf folder on all your Connection Servers.
> Reboot them, or restart the Connection Server service on them one by one like you normally do
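
If you have many Connection Servers, the file edit itself can be scripted. Below is a minimal Python sketch of that one step, assuming Python is available where you prepare the file. The function name ensure_check_origin_disabled is my own (hypothetical), and the sketch only creates or updates the file; it does not restart any service.

```python
from pathlib import Path

# Property line taken from the steps above.
CHECK_ORIGIN_LINE = "checkOrigin=false"


def ensure_check_origin_disabled(conf_dir: Path) -> Path:
    """Create or update locked.properties so it contains checkOrigin=false.

    Hypothetical helper: it keeps any other lines already in the file and
    replaces an existing checkOrigin entry instead of duplicating it.
    """
    target = conf_dir / "locked.properties"
    lines = target.read_text().splitlines() if target.exists() else []
    # Drop any existing checkOrigin entry (case-insensitive match on the key).
    lines = [ln for ln in lines if not ln.strip().lower().startswith("checkorigin")]
    lines.append(CHECK_ORIGIN_LINE)
    target.write_text("\n".join(lines) + "\n")
    return target


# Example call on a Connection Server (path as given in the steps above):
# ensure_check_origin_disabled(
#     Path(r"C:\Program Files\VMware\VMware View\Server\sslgateway\conf"))
```

The helper keeps existing lines because locked.properties may already hold other settings on some servers.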




Thursday, December 12, 2019

VMware Horizon View 7.X desktop “Agent unreachable” status

Scenario
We had a VDI user reporting issues connecting to his VDI machine.

The View Admin page showed this VDI machine with the status "Agent unreachable".


First things first

1) Checked vCenter and made sure that the VM was up and running, not powered off or suspended.
2) I could remote desktop to it, so I checked the services.
3) Restarted the Agent service. No luck.
4) Rebooted the VM. No luck there either.

At this point I started to look at the logs

C:\ProgramData\VMware\VDM\logs

debug-2019-12-12-150326.txt

2019-12-12T15:03:35.940+10:00 DEBUG (25A0-26C4) <Thread-4> [AgentMessageSecurityHandler] Configuring message security (ON).
2019-12-12T15:03:36.033+10:00 DEBUG (25A0-26C4) <Thread-4> [BrokerUpdateUtility] Published CHANGEKEY request
2019-12-12T15:03:51.035+10:00 DEBUG (25A0-26C4) <Thread-4> [BrokerUpdateUtility] Timeout waiting for success response

So it looks like the agent was trying to change the key but wasn't successful. I decided to push it from the Connection Server instead.

1) Logged in to one of our View Connection Servers
2) Opened a CMD prompt as Admin
3) Ran the commands below

cd C:\Program Files\VMware\VMware View\Server\tools\bin
vdmadmin -A -d <Name of the Desktop Pool> -m <Machine Name> -resetkey

4) You should see the Agent Public Key listed in the output, and that means all is good.
5) Wait a few minutes and you can see the status reporting again :)

It was reporting as "Unassigned User Connected" because the machine was assigned to someone else and I had logged in to it with an admin ID. So I rebooted the VM, and all was good afterwards.
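
The log pattern from this post (a CHANGEKEY request followed by a timeout) is easy to check for across many agent logs. Here is a small Python sketch, assuming the debug lines look like the three lines quoted above; the function name changekey_timed_out is hypothetical.

```python
def changekey_timed_out(log_lines):
    """Return True if a published CHANGEKEY request is later followed by a timeout.

    Only lines tagged [BrokerUpdateUtility] are considered, matching the
    excerpt from the agent debug log quoted in this post.
    """
    pending = False
    for line in log_lines:
        if "[BrokerUpdateUtility]" not in line:
            continue
        if "Published CHANGEKEY request" in line:
            pending = True
        elif pending and "Timeout waiting for success response" in line:
            return True
    return False


# Example: read one agent debug log and report the result.
# with open(r"C:\ProgramData\VMware\VDM\logs\debug-2019-12-12-150326.txt") as fh:
#     print(changekey_timed_out(fh))
```

A True result only says the same pattern is present; it does not prove the same root cause.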


Update - 20/03/2020

We have seen the same issue when a few users installed Docker and, as part of that, the Hyper-V feature was enabled!

Removed the Hyper-V feature from Add/Remove Programs -> Turn Windows features on or off

Rebooted the VDI machine, and that brought the Agent back online








Tuesday, December 10, 2019

vCenter SSO User password Expired

We had a vCenter SSO user created for SRM, and its password expired. Here is how you can check it and fix it.

The user name is srm@vsphere.local

1) Log in to the VCSA over SSH and run the commands below

root@vcenterserver [ ~ ]# cd /usr/lib/vmware-vmafd/bin/

root@vcenterserver [ /usr/lib/vmware-vmafd/bin ]# ./dir-cli user find-by-name --account srm --level 2
Enter password for administrator@vsphere.local:
Account: srm
UPN: srm@VSPHERE.LOCAL
Account disabled: FALSE
Account locked: FALSE
Password never expires: FALSE
Password expired: TRUE

root@vcenterserver [ /usr/lib/vmware-vmafd/bin ]# ./dir-cli user modify --account srm --password-never-expires
Enter password for administrator@vsphere.local:
Password set to never expire for [srm].


root@vcenterserver [ /usr/lib/vmware-vmafd/bin ]# ./dir-cli password reset --account srm --password XXXXXXXX
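
If you want to check several SSO accounts, the find-by-name output above is simple "Key: value" text and can be parsed. Below is a minimal Python sketch, assuming the output format shown in this post; both function names are hypothetical, and the sketch only reads text, it does not call dir-cli.

```python
def parse_dir_cli_user(output: str) -> dict:
    """Turn the 'Key: value' lines of dir-cli user find-by-name into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a colon
        fields[key.strip()] = value.strip()
    return fields


def needs_attention(fields: dict) -> bool:
    """True if the account is flagged as expired, locked or disabled."""
    flags = ("Password expired", "Account locked", "Account disabled")
    return any(fields.get(flag) == "TRUE" for flag in flags)
```

With the output from this post, needs_attention returns True because of the "Password expired: TRUE" line.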





Tuesday, November 19, 2019

applmgmt service won't start on PSC Appliance post converge operation

Scenario

We had a vCenter with an external PSC. We converged them, and the converge job was successful except for a cert warning.
After a week we tried to decommission the old PSC appliance and found that its status was shown in the WebClient as "Unknown"

Upon checking, we found applmgmt in a stopped state. We tried to start it, but it failed with the below error

[ ~ ]# service-control --status
Running:
 lwsmd pschealth vmafdd vmcad vmdird vmdnsd vmonapi vmware-analytics vmware-certificatemanagement vmware-cis-license vmware-cm vmware-rhttpproxy vmware-sca vmware-sts-idmd vmware-stsd vmware-vapi-endpoint vmware-vmon
Stopped:
 applmgmt vmware-statsmonitor


[ ~ ]# service-control --start applmgmt
Operation not cancellable. Please wait for it to finish...
Performing start operation on service applmgmt...
Error executing start on service applmgmt. Details {
    "detail": [
        {
            "translatable": "An error occurred while starting service '%(0)s'",
            "id": "install.ciscommon.service.failstart",
            "args": [
                "applmgmt"
            ],
            "localized": "An error occurred while starting service 'applmgmt'"
        }
    ],
    "componentKey": null,
    "resolution": null,
    "problemId": null
}
Service-control failed. Error: {
    "detail": [
        {
            "translatable": "An error occurred while starting service '%(0)s'",
            "id": "install.ciscommon.service.failstart",
            "args": [
                "applmgmt"
            ],
            "localized": "An error occurred while starting service 'applmgmt'"
        }
    ],
    "componentKey": null,
    "resolution": null,
    "problemId": null
}



We had this issue on two infrastructures, and the fix below resolved it on one of them

Fix that worked on the first PSC

# List all disabled (masked) services, i.e. unit files symlinked to /dev/null
find /etc/systemd/system/ -lname '/dev/null' -exec ls {} \;

# Automatically remove them (or rm each file)
find /etc/systemd/system/ -lname '/dev/null' -exec rm {} \;

# Reload the systemctl daemon
systemctl daemon-reload

# Start services or reboot
service-control --start --all
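
Before running the rm variant, it can help to do a dry run and look at what would be deleted. Below is a Python sketch of the same "symlink to /dev/null" search, assuming Python is usable on the appliance; the function name masked_units is hypothetical, and the sketch only lists files, it removes nothing.

```python
import os
from pathlib import Path


def masked_units(unit_dir):
    """Return files under unit_dir that are symlinks pointing at /dev/null.

    Mirrors: find <unit_dir> -lname '/dev/null'
    """
    found = []
    for root, _dirs, files in os.walk(unit_dir):
        for name in files:
            path = Path(root) / name
            if path.is_symlink() and os.readlink(path) == "/dev/null":
                found.append(path)
    return sorted(found)


# Dry run on the appliance:
# for unit in masked_units("/etc/systemd/system"):
#     print(unit)
```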


However, the second PSC was still not happy, so we had to remove the replication manually

Manual removal of the replication

1) Shut down both the PSC and the vCenter and take an offline snapshot
2) Power on only the vCenter. Do not start the PSC
3) SSH to the vCenter and run the commands below

a) List all PSCs connected (vdcrepadmin lives in /usr/lib/vmware-vmdir/bin)
# ./vdcrepadmin -f showservers -h localhost -u administrator -w XXXX
cn=oldpscappliance.mydomain.com,cn=Servers,cn=Sites,cn=Configuration,dc=vsphere,dc=local
cn=vcenter.mydomain.com,cn=Servers,cn=Sites,cn=Configuration,dc=vsphere,dc=local


Note -- XXXX is the SSO password for administrator@vsphere.local 
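
If you only want the host names out of the showservers output, the first cn= component of each line is enough. A small Python sketch, assuming the DN format shown above; the function name server_names is hypothetical.

```python
def server_names(showservers_output: str) -> list:
    """Extract the first cn= value from each DN line of showservers output."""
    names = []
    for line in showservers_output.splitlines():
        first = line.strip().split(",", 1)[0]
        if first.lower().startswith("cn="):
            names.append(first[3:])
    return names
```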

I could now see two servers: the old PSC appliance and the vCenter with the PSC converged into it.
Ran the below command to make sure vCenter is pointing to the converged PSC and not the old appliance

# /usr/lib/vmware-vmafd/bin/vmafd-cli get-ls-location --server-name localhost
https://vcenter.mydomain.com:443/lookupservice/sdk


The output confirmed that the old PSC appliance is not in use, so I decided to manually remove the association.

# /bin/cmsso-util unregister --node-pnid oldpscappliance.mydomain.com --username administrator --passwd XXXX

Watch the output; it should end like this

2019-11-12T08:29:24.939Z  Running command: ['/usr/lib/vmware-vmafd/bin/dir-cli', 'service', 'list', '--login', 'administrator']
2019-11-12T08:29:25.059Z  Done running command
Stopping all the services ...
All services stopped.
Starting all the services ...
Started all the services.
Success

2019-11-12T08:33:13.071Z  Running command: ['/usr/bin/sed', '-i', '-e', 's/cmsso-util.*/cmsso-util/g', '/var/log/vmware/procstate']
2019-11-12T08:33:13.829Z  Done running command

Log in to the vCenter via the WebClient and, under Administration -> System Configuration, make sure that the old PSC is not listed anymore.


You may keep the old PSC appliance for a few days and delete it once it's all good. 


Thursday, November 14, 2019

vMotion Failing at 21% with error "The vMotion failed because the destination host did not receive data from the source host on the vMotion network. Please check your vMotion network settings and physical network configuration and ensure they are correct."

Scenario

We built a new ESXi 6.7 cluster and couldn't get vMotion to work on it

Troubleshooting steps

1) Make sure there is no IP address conflict.

2) SSH to ESXi and do a VMK Ping check

For Default TCP/IP Stack

vmkping 10.11.7.188

Or, if you are using Multi-NIC vMotion, you might want to specify which VMK interface to use
vmkping -I vmk<vmkinterfacenumber> <Destination VMK IP to Ping>

Eg -: vmkping -I vmk3 192.168.1.1

For vMotion Stack

If you try the above vmkping command on an ESXi host whose VMK interfaces are on the vMotion stack, it will fail with the error
Unknown interface 'vmk': Invalid argument
This is because the command looks at the default TCP/IP stack, and this VMK won't be listed there. So you need to specify the stack.

vmkping -I vmk<vmkinterfacenumber> -S vmotion <Destination VMK IP to Ping>

Eg-: vmkping -I vmk3 -S vmotion 192.168.1.1

If the ping test fails, check
a) vMotion port group settings
b) ESXi Host Configuration -> Networking -> VMkernel Adapters, and make sure vMotion is enabled only on the correct VMK and not on any other one.
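
When you have to repeat the ping test for several hosts and interfaces, it helps to generate the command lines instead of typing them. Below is a Python sketch that only builds the vmkping command from the -I and -S options used above; the function name build_vmkping is hypothetical, and you still run the printed command on the ESXi host yourself.

```python
import shlex


def build_vmkping(dest_ip, interface=None, netstack=None):
    """Build a vmkping command line using only the options shown in this post.

    interface -> -I <vmkN>, netstack -> -S <stack name> (e.g. "vmotion").
    """
    cmd = ["vmkping"]
    if interface:
        cmd += ["-I", interface]
    if netstack:
        cmd += ["-S", netstack]
    cmd.append(dest_ip)
    return cmd


# Print one command per destination for a host using the vMotion stack.
for ip in ["192.168.1.1", "192.168.1.2"]:
    print(shlex.join(build_vmkping(ip, interface="vmk3", netstack="vmotion")))
```

The two destination IPs in the loop are placeholders; replace them with your own vMotion VMK addresses.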

3) Check whether the vMotion network port is listening on the ESXi hosts
We can use the netcat utility for this test, similar to the telnet test we do in Windows.

nc -z <Destination VMK IP> 8000

If it takes you to the next line immediately, as in my first try, the connection was not established.
If it connects, as in my second try, it stays running for a while before it closes the connection.


 

Another way to verify this is to check the network port status. The command below is the equivalent of netstat.
While the above nc command is still running, open a new SSH session to the same ESXi host and also to the destination ESXi host, and look at the connections on that port

esxcli network ip connection list | grep -i 8000


If the connection test fails, revisit the ESXi firewall rules using the web client
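
The same kind of port check can be done from Python with a plain TCP connect, which is roughly what nc -z does. This is a sketch only: the function name port_is_open is hypothetical, and when run from a workstation it tests the path from that machine, not from the source host's vMotion VMK, so treat it as a complement to the nc test above.

```python
import socket


def port_is_open(host, port=8000, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


# Example: check the vMotion port (8000, as used in the nc test above).
# print(port_is_open("192.168.1.1", 8000))
```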










Thursday, September 5, 2019

CISCO ACI + vCenter Integration Error "Create a vSphere Distributed Switch" "Status: The operation is not supported on the object"

"Create a vSphere Distributed Switch"  
"Status: The operation is not supported on the object"

We had been using CISCO ACI + vSphere for a while. We now have a new datacentre coming up, and there were conversations on ACI vs NSX. But we finally decided to give ACI a second chance :(

Current datacenters are on vSphere 6.0 and were linked to ACI with no issues.

We decided to go with the latest vSphere 6.7 and built the hosts. The ESXi hosts were added to vCenter, a management DVS was created manually (we always had the management DVS outside ACI!) and the ESXi VMK0 was moved there. Basically, all went well and as per plan.

Then came the ACI integration part... We had 3 vCenters and they were all under the same VMM domain. So we decided to add the new VC there as well.

An ACI role was created in vCenter for permissions, and the service account was configured there and mapped to the new role. Time came for the integration, and it wasn't happy about something.. After adding the new VC, ACI could see the ESXi inventory, but the DVS creation was failing with the below error in vCenter



Operation is not supported! Come on VMware what operation! 

Anyway, I had to dig through the logs to find out. Finally, below is what I found in the vpxd log


2019-09-06T01:33:50.907Z error vpxd[04740] [Originator@6876 sub=DvsUtils opID=7a4ad61f] Non-VMware DVS [Cisco Systems Inc.: ] is not supported
2019-09-06T01:33:51.003Z warning vpxd[04740] [Originator@6876 sub=dvsKeeper opID=7a4ad61f] DVS name [virtual] not in reserved map of DvsManager instance
2019-09-06T01:33:51.003Z info vpxd[04740] [Originator@6876 sub=vpxLro opID=7a4ad61f] [VpxLRO] -- FINISH task-3903
2019-09-06T01:33:51.003Z info vpxd[04740] [Originator@6876 sub=Default opID=7a4ad61f] [VpxLRO] -- ERROR task-3903 -- group-n41 -- vim.Folder.createDistributedVirtualSwitch: vmodl.fault.NotSupported:
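
To spot this quickly in a large vpxd log, you can search for the "Non-VMware DVS" message and pull out the vendor string. Below is a Python sketch, assuming the message format in the excerpt above; the function name unsupported_dvs_vendors is hypothetical.

```python
import re

# Matches e.g.: Non-VMware DVS [Cisco Systems Inc.: ] is not supported
VENDOR_RE = re.compile(r"Non-VMware DVS \[(?P<vendor>[^:\]]*):[^\]]*\] is not supported")


def unsupported_dvs_vendors(lines):
    """Return the vendor names from 'Non-VMware DVS [...] is not supported' lines."""
    vendors = []
    for line in lines:
        match = VENDOR_RE.search(line)
        if match:
            vendors.append(match.group("vendor").strip())
    return vendors
```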


So, the issue was.....
The old VMM domains were created a long time back and were set to use Cisco Systems Inc. as the vendor. vSphere 6.0 just doesn't care (for now! But an upgrade to 6.5 will fail, and if you look at the KBs there is a way to modify it with a SQL command. At least we haven't got that far, and we will be migrating all VMs to the new DC once it is built, and the old ones will be decommissioned!). But starting from 6.5 U1, VMware stopped supporting third-party DVS switches. Instead they've opened up APIs, and these vendors can now use the APIs to create and consume VDS (vSphere Distributed Switches).

Now getting back to how we fixed it. Our ACI was well maintained and updated to the latest version, so we could just create a new VMM domain and specify VMware as the vendor there. All worked well!



PowerCLI Install error on powershell

Today I was trying to install PowerCLI on my Windows 10 machine using the below command

Install-Module -Name VMware.PowerCLI -RequiredVersion 11.1.0.11289667

Ended up with an error "PackageManagement\Install-Package : The following commands are already available on this system:'Get-Cluster,New-Cluster,Remove-Cluster'. This module 'VMware.VimAutomation.Core' may override the existing commands. If you still want to install this module"




It looked like some of the commands conflict with a module that is already on the system.
As per MS document here "If the module being installed has the same name or version, or contains commands in an existing module, warning messages are displayed. After you confirm that you want to install the module and override the warnings, use the -Force and -AllowClobber"

So I decided to use -AllowClobber, and it worked!

Install-Module -Name VMware.PowerCLI -RequiredVersion 11.1.0.11289667 -AllowClobber


Wednesday, April 3, 2019

vCenter Appliance WebClient down - Error: Service name is invalid

vCenter Appliance - WebClient down with "Error: Service name is invalid"


Last week I had to shut down my lab vCenter 6.7. As it's just a lab, I simply powered it off instead of shutting it down cleanly. After a few days I powered it back on, and it booted without issues. But then the WebClient wouldn't come up. It just said the page cannot be displayed.

I logged in to the appliance via SSH and tried to list service status

root@vcenter [ ~ ]#  service-control --status --all
2019-04-04T10:43:41.796Z  Error: Service name "applmgmt" is invalid.
2019-04-04T10:43:41.816Z  Error: Service name "content-library" is invalid.
2019-04-04T10:43:41.837Z  Error: Service name "vcha" is invalid.
2019-04-04T10:43:41.857Z  Error: Service name "pschealth" is invalid.
2019-04-04T10:43:41.878Z  Error: Service name "sps" is invalid.
2019-04-04T10:43:41.898Z  Error: Service name "sca" is invalid.
2019-04-04T10:43:41.936Z  Error: Service name "vmcam" is invalid.
2019-04-04T10:43:41.956Z  Error: Service name "vmware-vpostgres" is invalid.
2019-04-04T10:43:41.976Z  Error: Service name "vsphere-client" is invalid.
2019-04-04T10:43:41.997Z  Error: Service name "vapi-endpoint" is invalid.
2019-04-04T10:43:42.017Z  Error: Service name "vmware-postgres-archiver" is invalid.
2019-04-04T10:43:42.037Z  Error: Service name "vsm" is invalid.
2019-04-04T10:43:42.057Z  Error: Service name "updatemgr" is invalid.
2019-04-04T10:43:42.078Z  Error: Service name "vmonapi" is invalid.
2019-04-04T10:43:42.098Z  Error: Service name "vsan-health" is invalid.
2019-04-04T10:43:42.119Z  Error: Service name "rbd" is invalid.
2019-04-04T10:43:42.156Z  Error: Service name "vpxd-svcs" is invalid.
2019-04-04T10:43:42.176Z  Error: Service name "statsmonitor" is invalid.
2019-04-04T10:43:42.264Z  Error: Service name "imagebuilder" is invalid.
2019-04-04T10:43:42.284Z  Error: Service name "vsan-dps" is invalid.
2019-04-04T10:43:42.304Z  Error: Service name "cm" is invalid.
2019-04-04T10:43:42.342Z  Error: Service name "mbcs" is invalid.
2019-04-04T10:43:42.362Z  Error: Service name "vpxd" is invalid.
2019-04-04T10:43:42.382Z  Error: Service name "netdumper" is invalid.
2019-04-04T10:43:42.403Z  Error: Service name "rhttpproxy" is invalid.
2019-04-04T10:43:42.423Z  Error: Service name "vsphere-ui" is invalid.
2019-04-04T10:43:42.444Z  Error: Service name "eam" is invalid.
2019-04-04T10:43:42.464Z  Error: Service name "perfcharts" is invalid.
2019-04-04T10:43:42.519Z  Error: Service name "analytics" is invalid.
2019-04-04T10:43:42.539Z  Error: Service name "cis-license" is invalid.
Running:
lwsmd vmafdd vmcad vmdird vmdnsd vmware-pod vmware-sts-idmd vmware-stsd vmware-vmon

Tried to start them using "service-control --start --all", but it didn't work. 
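
If you want to keep a record of which services were reported as invalid (for example to compare before and after the fix), the names can be pulled out of the status output. A short Python sketch, assuming the error format shown above; the function name invalid_services is hypothetical.

```python
import re

# Matches e.g.: Error: Service name "applmgmt" is invalid.
INVALID_RE = re.compile(r'Error: Service name "([^"]+)" is invalid\.')


def invalid_services(status_output: str) -> list:
    """Return service names reported as invalid, in the order they appear."""
    return INVALID_RE.findall(status_output)
```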

I followed below steps to get it back running.. 

Make sure that you take a backup of your vCenter before trying this (Snapshot or Clone)

1) Stop all services

root@vcenter [ ~ ]# service-control --stop --all

2) Cd to the below dir

root@vcenter [ ~ ]# cd /storage/vmware-vmon/.svcStats

3) Create a new directory and move all .json files there

root@vcenter [ /storage/vmware-vmon/.svcStats ]# ls
stats_analytics.json    stats_cm.json            stats_netdumper.json   stats_sps.json            stats_vcha.json     stats_vmware-postgres-archiver.json  stats_vpxd-svcs.json    stats_vsphere-client.json
stats_applmgmt.json     stats_eam.json           stats_perfcharts.json  stats_statsmonitor.json   stats_vdcs.json     stats_vmware-sca.json                stats_vsan-dps.json     stats_vsphere-ui.json
stats_autodeploy.json   stats_imagebuilder.json  stats_pschealth.json   stats_updatemgr.json      stats_vmcam.json    stats_vmware-vpostgres.json          stats_vsan-health.json
stats_cis-license.json  stats_mbcs.json          stats_rhttpproxy.json  stats_vapi-endpoint.json  stats_vmonapi.json  stats_vpxd.json                      stats_vsm.json



root@vcenter [ /storage/vmware-vmon/.svcStats ]# mkdir StatsjsonBKP
root@vcenter [ /storage/vmware-vmon/.svcStats ]# mv *.json /storage/vmware-vmon/.svcStats/StatsjsonBKP/

4) Do another ls to confirm

root@vcenter [ /storage/vmware-vmon/.svcStats ]# ls
StatsjsonBKP

5) Now it's time to start all services

root@vcenter [ ~ ]# service-control --start --all
Operation not cancellable. Please wait for it to finish...
Performing start operation on service lwsmd...
Successfully started service lwsmd
Performing start operation on service vmafdd...
Successfully started service vmafdd
Performing start operation on service vmdird...
Successfully started service vmdird
Performing start operation on service vmcad...
Successfully started service vmcad
Performing start operation on service vmware-sts-idmd...
Successfully started service vmware-sts-idmd
Performing start operation on service vmware-stsd...
Successfully started service vmware-stsd
Performing start operation on service vmdnsd...
Successfully started service vmdnsd
Performing start operation on profile: ALL...
Successfully started profile: ALL.
Performing start operation on service vmware-pod...
Successfully started service vmware-pod

Bang!!
Gave it a few minutes for the web client to start, and all good!
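
Steps 2 to 4 above (create the backup directory, move the .json files, confirm) can also be sketched in Python, which makes it easy to log exactly which files were moved. This assumes Python is usable on the appliance and that the services are already stopped; the function name backup_json_stats is hypothetical.

```python
import shutil
from pathlib import Path


def backup_json_stats(stats_dir, backup_name="StatsjsonBKP"):
    """Move every *.json file in stats_dir into stats_dir/backup_name.

    Returns the names of the files that were moved.
    """
    stats_dir = Path(stats_dir)
    backup_dir = stats_dir / backup_name
    backup_dir.mkdir(exist_ok=True)
    moved = []
    # glob is not recursive, so files already in the backup dir are untouched.
    for path in sorted(stats_dir.glob("*.json")):
        shutil.move(str(path), str(backup_dir / path.name))
        moved.append(path.name)
    return moved


# Example on the appliance (directory as used in step 2):
# print(backup_json_stats("/storage/vmware-vmon/.svcStats"))
```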











Thursday, March 28, 2019

P2V Conversion of a Windows 2000 Server to vSphere 6.X

It's a shame but I recently had to P2V a Windows 2000 server, part of our legacy services, in order to free up and decomm an old physical server. It was not a pleasure at all :(

Planning

1) Download and keep VMware Converter 4.0.1 (VMware-converter-4.0.1-161434.exe) - to P2V Windows 2000
Converter 4.0.1 doesn’t support any vCenter/ESXi later than 4.x, so I ended up installing ESXi 4.0 on a VM running on top of my vSphere 6.5 infra!
2) So, the ESXi 4.0 ISO
I read articles about people using VMware Workstation for this, but in my case I didn’t want the export job to run outside the datacentre, for security reasons.
The Windows 2000 server that I needed to virtualize is 200 GB, so I made sure that the disk of the ESXi VM is bigger than that.

 
 

3) Check the patch level of the Windows 2000 server and make sure
> It’s updated to SP4 Rollup 1 (KB891861)
If not, install it from here and reboot the server before running the P2V job
4) Download and keep VMware Converter 6.0.0 (VMware-converter-en-6.0.0-2716716.exe) - to move the virtual machine from ESXi 4.0 to vCenter 6.X
5) vmscsi-1.2.0.4.flp, for troubleshooting - https://kb.vmware.com/s/article/100520

Steps I followed

1) Installed ESXi 4.0 on the VM that I created and configured it with an IP that is accessible from the Windows 2000 server and also from my PROD 6.x vCenter
2) Installed Converter 4.0.1 on the Windows 2000 machine
3) Started the P2V conversion of the Windows 2000 machine to the virtual ESXi 4.0
Make sure you select “SCSI” as the Disk Controller and not “Preserve Source”

4) Once the conversion is complete, we have the VM inside ESXi 4.0. However, as this is a nested setup, we can’t power it on and test it.
5) Install VMware Converter 6.0.0 on a machine that has access to the ESXi 4.0 VM and your PROD vCenter. We will use V2V in this Converter version to move the VM to our PROD 6.5
Not tested with 6.7, but worst case, we can also download the files from the datastore and move them to the PROD vCenter!
6) Once done, power on and test.


You can then take a backup and do R&D stuff to keep VMware Tools and the virtual hardware up to date :)

Issues that I’ve seen

BSOD with “KMODE_EXCEPTION_NOT_HANDLED” error while loading Windows.
This usually happens if you haven’t installed the SP4 Rollup 1 (KB891861) on the source server.

I could fix it, without having to re-run P2V. Steps below

1) Detach the C:\ disk from the Windows 2000 VM where you see this issue after P2V
2) Connect this disk to any of your existing Windows VMs.
3) Go to Disk Management, bring the disk online and assign a drive letter
4) Use 7-Zip to extract KB891861 and vmscsi-1.2.0.4.flp that we downloaded.
Look for scsiport.sys under the KB891861 extract - version 5.00.2195.7059
And for vmscsi.sys under the vmscsi-1.2.0.4.flp extract - version 1.2.0.4

Make sure the versions are correct.

5) Copy these two files to the VM where we have the disk attached.
6) Navigate to WINNT\System32\drivers on the attached disk and rename the existing scsiport.sys and vmscsi.sys by adding “-old”
7) Copy the two new files there.
8) Bring the disk offline and move it back to the Windows 2000 VM
9) Double check and make sure that the SCSI controller type is “BusLogic Parallel”
10) Power on the VM to test!