IB on VMware and Windows
After struggling with ESXi 6.5 and the newest 6.7, which work properly only in Ethernet mode with the Mellanox ConnectX family of cards:
- The 1.8.x driver supports IB/iSER/SRP, but does not support SR-IOV
- The 2.4 driver doesn't work at all in 6.5/6.7: mlx4_core fails to load
- The default ESXi driver supports SR-IOV, but works only in Ethernet mode
and after learning a bunch of esxcli commands, I decided to look at how things are on the Windows side.
[Spoiler: Things are really good, almost as good as on Linux]
WinOF
Compared to just plain drivers for ESXi, Windows gets the whole OpenFabrics Distribution:
- VPI drivers - switch between IB and ETH anytime
- OpenSM - installs Windows OpenSM that is required to manage the subnet from a host. Even the command to run it as a service with auto-start is specified!
- Performance tools - installs the performance tools that are used to measure the InfiniBand performance in user environment
- Analyze tools - installs the tools that can be used either to diagnose or analyze the InfiniBand environment
- SDK - contains the libraries and DLLs for developing InfiniBand application over IBAL
- Documentation - contains the User Manual and Installation Guide
- Firmware update - for Mellanox genuine adapters, firmware update is performed automatically
- Performance tuning - there are a few tuning scenarios available: Single-port traffic, Dual-port traffic, Forwarding traffic, Multicast traffic
- Failover teaming - provides redundancy through automatic fail-over from an active adapter to a standby adapter in case of switch port, cable, or adapter failure.
Finally, for the ConnectX-2/ConnectX-3 family, all of these features are supported on:
Windows Server: 2012, 2012 R2, 2016, including newest build 1803 (RS4)
Windows Client: 8.1, 10, including newest build 1803
RDMA in Windows
RDMA is power. It should be in every storage and network protocol, because of:
- Increased throughput: leverages the full throughput of high speed networks in which the network adapters coordinate the transfer of large amounts of data at line speed.
- Low latency: provides extremely fast responses to network requests, and, as a result, makes remote file storage feel as if it is directly attached block storage.
- Low CPU utilization: uses fewer CPU cycles when transferring data over the network, which leaves more power available to server applications.
SMB Direct in Windows Server 2016
Nothing to do here. If the adapter is RDMA-capable and you haven't disabled it, it'll just work, especially on an InfiniBand fabric. With RoCE, well, let's see the official statement:
Microsoft Recommendation: While the Microsoft RDMA interface is RDMA-technology agnostic, in our experience with customers and partners we find that RoCE/RoCEv2 installations are difficult to get configured correctly and are problematic at any scale above a single rack. If you intend to deploy RoCE/RoCEv2, you should a) have a small scale (single rack) installation, and b) have an expert network administrator who is intimately familiar with Data Center Bridging (DCB), especially the Enhanced Transmission Service (ETS) and Priority Flow Control (PFC) components of DCB. If you are deploying in any other context iWarp is the safer alternative. iWarp does not require any configuration of DCB on network hosts or network switches and can operate over the same distances as any other TCP connection. RoCE, even when enhanced with Explicit Congestion Notification (ECN) detection, requires network configuration to configure DCB/ETS/PFC and/or ECN especially if the scale of deployment exceeds a single rack. Tuning of these settings, i.e., the settings required to make DCB and/or ECN work, is an art not mastered by every network engineer.
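To confirm the "it'll just work" claim on a given server, the adapter and SMB server state can be checked from PowerShell. This is a minimal sketch rather than an official procedure: the adapter name 'Ethernet 2' is a hypothetical placeholder, and the commented-out commands are only needed if RDMA was switched off earlier.

```powershell
# Hypothetical adapter name; list the real ones with Get-NetAdapter.
$adapter = 'Ethernet 2'

# Is RDMA (Network Direct) enabled on this adapter?
Get-NetAdapterRdma -Name $adapter

# Is Network Direct enabled globally?
Get-NetOffloadGlobalSetting | Select-Object NetworkDirect

# Which interfaces does the SMB server consider RDMA-capable?
Get-SmbServerNetworkInterface

# Only if RDMA was disabled earlier (requires an elevated session):
# Enable-NetAdapterRdma -Name $adapter
# Set-NetOffloadGlobalSetting -NetworkDirect Enabled
```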
Here's a couple of good guides:
SMB Direct in Windows 10
Previously, RDMA was not available on Windows client systems. Thanks to Microsoft, with the Fall Creators Update it became available in a specific high-end edition, Windows 10 Pro for Workstations.
This is the usual Windows 10 Pro:
The same ConnectX-3 adapter that worked fine in Windows Server will return Enabled/True on all of these checks:
Get-NetOffloadGlobalSetting | Select NetworkDirect
Get-NetAdapterRDMA
Get-NetAdapterHardwareInfo
Get-SmbClientConfiguration | Select EnableMultichannel
And only
Get-SmbClientNetworkInterface
will show that RDMA is not working.
The solution is simple - get Windows 10 Pro for Workstations. Here are the upgrade paths:
https://docs.microsoft.com/en-us/windows/deployment/upgrade/windows-10-edition-upgrades
You could use the Microsoft Store app and search for Windows 10 Pro for Workstations, or install the product key from an elevated prompt:
slmgr /ipk DXG7C-N36C4-C4HTG-X4T3X-2YV77
More detailed instructions are available at https://www.tenforums.com/tutorials/95822-upgrade-windows-10-pro-windows-10-pro-workstations.html
* Keep in mind that the Windows 10 Pro N version includes all the base features of the operating system, but without Windows Media Player, Music, Video, Voice Recorder and Skype.
After activation, check for updates in Windows Update. The download of KB 4100403 starts.
Now, run the same command:
Get-SmbClientNetworkInterface
If you have an active share already, check that it's using RDMA:
Get-SmbMultichannelConnection
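Get-SmbMultichannelConnection lists every channel, so with several shares open it can help to keep only the ones where both ends report RDMA. A small sketch, assuming the ClientRdmaCapable, ServerRdmaCapable, ClientIpAddress and ServerIpAddress properties are present in the cmdlet's output on your build:

```powershell
# Keep only the channels where both client and server report RDMA capability.
Get-SmbMultichannelConnection |
    Where-Object { $_.ClientRdmaCapable -and $_.ServerRdmaCapable } |
    Format-Table ServerName, ClientIpAddress, ServerIpAddress, Selected -AutoSize
```

An empty result while a share is active suggests the traffic is going over plain TCP.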
Last but not least - you will NOT see any adapter utilization with RDMA in Task Manager:
Instead, check Performance Monitor counters - you'll see them under RDMA activity:
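The same counters can be read from PowerShell without opening Performance Monitor. This is a rough sketch; it assumes the counter set is named 'RDMA Activity' and exposes inbound and outbound bytes/sec counters, which the first command lets you verify on your own machine before sampling:

```powershell
# List the counters that actually exist in the set on this machine.
(Get-Counter -ListSet 'RDMA Activity').Counter

# Assumed counter paths; adjust them to match the list printed above.
$counters = @(
    '\RDMA Activity(*)\RDMA Inbound Bytes/sec',
    '\RDMA Activity(*)\RDMA Outbound Bytes/sec'
)

# Sample once per second, five times, while a file copy is running.
Get-Counter -Counter $counters -SampleInterval 1 -MaxSamples 5
```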
If that helped, please endorse: https://goo.gl/RfjbnG