“Any sufficiently advanced technology is indistinguishable from magic.” – Arthur C. Clarke
Our server team recently approached me about slow transfer rates on some of our VMs. They noticed it first in their application, but were able to run a dd command to show that certain VMs were only getting about 4 MB/s when they should have been getting up to 1 GB/s. This only happened when the command was run against our datastore.
dd if=/dev/sda of=/dev/null bs=1M count=100 status=progress
52428800 bytes (52 MB) copied, 10.988102 s, 4.8 MB/s
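For reference, the rate dd prints is just bytes copied divided by elapsed seconds, in decimal megabytes (1 MB = 1,000,000 bytes). Here is a small sketch of that arithmetic, handy for comparing runs across hosts. The rate_mb_s helper name is my own invention, not part of dd.

```shell
# Throughput in decimal MB/s (1 MB = 1,000,000 bytes), to one decimal place.
# $1 = bytes copied, $2 = elapsed seconds (both taken from dd's summary line).
# rate_mb_s is a hypothetical helper name, not a dd feature.
rate_mb_s() {
    awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

# Numbers from the dd run above.
rate_mb_s 52428800 10.988102   # prints 4.8
```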
The problem showed up on multiple ESXi hosts but not on others in the same cluster. That narrowed it down to something on this rack, which probably meant an issue at the top-of-rack switch, but I couldn’t figure out what. I decided to run a packet capture on one of the offending hosts.
pktcap-uw --uplink vmnic1 -o /tmp/packetcapture.pcap
I noticed enough malformed packets and retransmits that I decided to check errors on our vmnic ports.
esxcli network nic stats get -n vmnic0 | egrep "Total receive errors|Receive CRC errors|Receive missed errors"
esxcli network nic stats get -n vmnic1 | egrep "Total receive errors|Receive CRC errors|Receive missed errors"
I checked multiple hosts and saw the same pattern across the board: a high level of CRC errors on one port but not the other.
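When comparing ports across several hosts, it can help to boil those counters down to one line per NIC. This is a minimal sketch, assuming the stats output is laid out as indented "Name: value" lines. nic_counter and rx_error_summary are hypothetical helper names of my own, not esxcli features.

```shell
# Print the value of one counter from `esxcli network nic stats get` output
# read on stdin. Assumes each counter is an indented "Name: value" line.
# $1 = exact counter name, e.g. "Receive CRC errors".
nic_counter() {
    awk -F': *' -v name="$1" '
        { sub(/^[ \t]+/, "", $1) }        # strip leading indentation
        $1 == name { print $2; exit }     # print the first matching value
    '
}

# Print a one-line receive-error summary for a NIC.
# $1 = label to print (the NIC name); stats text is read from stdin.
rx_error_summary() {
    stats=$(cat)
    printf '%s total=%s crc=%s missed=%s\n' "$1" \
        "$(printf '%s\n' "$stats" | nic_counter 'Total receive errors')" \
        "$(printf '%s\n' "$stats" | nic_counter 'Receive CRC errors')" \
        "$(printf '%s\n' "$stats" | nic_counter 'Receive missed errors')"
}
```

On a host, the summary would be fed from the same command used above, for example: esxcli network nic stats get -n vmnic1 | rx_error_summary vmnic1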
Lots of Total receive errors!
Multicast bad actor?
…. to be continued