Yet Another Network Blog
Tuesday, April 14, 2020
Fixing OpenSM service not running
So, we've set up OpenSM as a service in Windows, with Start Type = Automatic, as described in the earlier post: https://wchukov.blogspot.com/2019/06/fixing-this-configuration-exceeds-mtu.html
Sometimes (depending on OS updates) the service just won't start, even if recovery options (Restart the service) are set. If we start it manually from an elevated command line, it starts normally.
However, if the service is set to Automatic (Delayed Start), it will only start after 240 seconds.
The problem, I believe, is an unsatisfied dependency: OpenSM needs the mlx4_bus, ibbus, or ipoib6x drivers to finish initialization before the SM is invoked.
The simplest workaround so far is to set the OpenSM service to Automatic (Delayed Start) and shorten the delayed-start timer.
To do that, open Registry Editor and, under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control, create a new DWORD value named AutoStartDelay. Set it to 25000 decimal (the value is in milliseconds, i.e. 25 seconds).
Feel free to adjust the value - hopefully, you won't have to wait 240 seconds to be online.
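For convenience, here is a minimal PowerShell sketch of the same change; it assumes the service is registered under the name OpenSM (check yours with Get-Service):
# Set the global delayed-start timer to 25 seconds (the value is in milliseconds)
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control" -Name AutoStartDelay -PropertyType DWord -Value 25000 -Force
# Switch the OpenSM service to Automatic (Delayed Start)
sc.exe config OpenSM start= delayed-auto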
Monday, July 8, 2019
Fixing "Error from osm_opensm_bind (0x2B) Perhaps another instance of OpenSM is already running"
Sometimes the OpenSM service won't start automatically, or restarting it will fail:
OpenSM 3.3.11 UMAD
Entering DISCOVERING state
Error from osm_opensm_bind (0x2B)
Perhaps another instance of OpenSM is already running
Exiting SM
The first thing we want to confirm is that the card is currently in IB mode.
Then, opensm.conf has to be adjusted. By default its first option, the port GUID, is empty. Take the GUID of an active port (port 1 or port 2; 0x0002c903000232db in this example) and put it into the guid line, as sketched below. Once that's done, the service should start as expected.
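In a stock opensm.conf the guid option is all zeros; a sketch of the edit (use your own active port's GUID instead of the one from this example):
# default - the SM is not bound to a specific port:
guid 0x0000000000000000
# replace with the GUID of the active port:
guid 0x0002c903000232db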
Sunday, June 9, 2019
Fixing "This configuration exceeds the MTU reported by OpenSM, which is 2048"
I got tired of the following EventLog message:
According to the configuration under the "Jumbo Packets" advanced property, the MTU configured for device Mellanox ConnectX-3 IPoIB Adapter is 4092. The effective MTU is the supplied value + 4 bytes (for the IPoIB header). This configuration exceeds the MTU reported by OpenSM, which is 2048. This inconsistency may result in communication failures. Please change the MTU of IPoIB or OpenSM, and restart the driver.
The first thing we want to adjust is the partitions.conf file - set mtu to 5 (4092):
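A sketch of the relevant partitions.conf entry, assuming the stock default partition (0x7fff) with full membership for all ports - adjust to your fabric:
Default=0x7fff, ipoib, mtu=5 : ALL=full;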
By default, the ancient OpenSM 3.3.11 for some reason will not read the partitions.conf file specified in the global opensm.conf. The easiest way to force it to read the file is to recreate the service with an explicit -P parameter. Here's how to do it in default PowerShell:
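A sketch of the recreation; the install path, the service name OpenSM and the --service flag are assumptions based on a default WinOF setup - check how your service is currently registered with sc.exe qc OpenSM and reuse that binPath with -P appended:
# stop and remove the existing service
Stop-Service OpenSM
sc.exe delete OpenSM
# recreate it, pointing opensm at the partitions file via -P
New-Service -Name OpenSM -BinaryPathName '"C:\Program Files\Mellanox\MLNX_VPI\IB\Tools\opensm.exe" --service -P "C:\Program Files\Mellanox\MLNX_VPI\IB\Tools\partitions.conf"' -StartupType Automatic
Start-Service OpenSM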
Done, check System log in Event Viewer.
Sunday, May 27, 2018
Infiniband/RDMA on Windows - now on Windows 10 too
IB on VMware and Windows
After struggling with ESXi 6.5 and the newest 6.7, which work properly only in Ethernet mode with the Mellanox ConnectX card family:
- 1.8.x driver supports IB/iSER/SRP, but does not support SR-IOV
- 2.4 driver doesn't work at all in 6.5/6.7 - mlx4_core is not loading
- Default ESXi driver supports SR-IOV but works only in Ethernet mode
and after learning a bunch of esxcli commands, I've decided to look at how things are on the Windows side.
[Spoiler: Things are really good and almost as good as in Linux]
WinOF
Compared to just plain drivers for ESXi, Windows gets the whole OpenFabrics Distribution:
- VPI drivers - switch between IB and ETH anytime
- OpenSM - installs Windows OpenSM that is required to manage the subnet from a host. Even the command to run it as a service with auto-start is specified!
- Performance tools - installs the performance tools that are used to measure InfiniBand performance in a user environment
- Analyze tools - installs the tools that can be used either to diagnose or analyze the InfiniBand environment
- SDK - contains the libraries and DLLs for developing InfiniBand application over IBAL
- Documentation - contains the User Manual and Installation Guide
- Firmware update - for Mellanox genuine adapters, firmware update is performed automatically
- Performance tuning - there are a few tuning scenarios available: Single-port traffic, Dual-port traffic, Forwarding traffic, Multicast traffic
- Failover teaming - provides redundancy through automatic fail-over from an active adapter to a standby adapter in case of switch port, cable, or adapter failure.
Finally, for the ConnectX-2/ConnectX-3 family, all of these features are supported in:
Windows Server: 2012, 2012 R2, 2016, including newest build 1803 (RS4)
Windows Client: 8.1, 10, including newest build 1803
RDMA in Windows
RDMA is power. It should be in every storage and network protocol, because of:
- Increased throughput: leverages the full throughput of high speed networks in which the network adapters coordinate the transfer of large amounts of data at line speed.
- Low latency: provides extremely fast responses to network requests, and, as a result, makes remote file storage feel as if it is directly attached block storage.
- Low CPU utilization: uses fewer CPU cycles when transferring data over the network, which leaves more power available to server applications.
SMB Direct in Windows Server 2016
Nothing to do here. If the adapter is RDMA-capable and you haven't disabled it, it'll just work, especially in an InfiniBand fabric. With RoCE - well, let's see the official statement:
Microsoft Recommendation: While the Microsoft RDMA interface is RDMA-technology agnostic, in our experience with customers and partners we find that RoCE/RoCEv2 installations are difficult to get configured correctly and are problematic at any scale above a single rack. If you intend to deploy RoCE/RoCEv2, you should a) have a small scale (single rack) installation, and b) have an expert network administrator who is intimately familiar with Data Center Bridging (DCB), especially the Enhanced Transmission Service (ETS) and Priority Flow Control (PFC) components of DCB. If you are deploying in any other context iWarp is the safer alternative. iWarp does not require any configuration of DCB on network hosts or network switches and can operate over the same distances as any other TCP connection. RoCE, even when enhanced with Explicit Congestion Notification (ECN) detection, requires network configuration to configure DCB/ETS/PFC and/or ECN especially if the scale of deployment exceeds a single rack. Tuning of these settings, i.e., the settings required to make DCB and/or ECN work, is an art not mastered by every network engineer.
Here's a couple of good guides:
SMB Direct in Windows 10
Previously, RDMA was not available on Windows client systems. Thanks to Microsoft, with the Fall Creators Update it became available in a specific high-end edition, Windows 10 Pro for Workstations.
This is the usual Windows 10 Pro:
The same ConnectX-3 adapter which worked fine in Windows Server will return Enabled/True on all of the checks:
Get-NetOffloadGlobalSetting | Select NetworkDirect
Get-NetAdapterRDMA
Get-NetAdapterHardwareInfo
Get-SmbClientConfiguration | Select EnableMultichannel
And only Get-SmbClientNetworkInterface will show that RDMA is not working.
The solution is simple - get Windows 10 Pro for Workstations. Here are upgrade paths:
https://docs.microsoft.com/en-us/windows/deployment/upgrade/windows-10-edition-upgrades
You could use Microsoft Store App, search for Windows 10 pro for Workstations:
Or change the product key with slmgr:
slmgr /ipk DXG7C-N36C4-C4HTG-X4T3X-2YV77
More detailed instructions are available at https://www.tenforums.com/tutorials/95822-upgrade-windows-10-pro-windows-10-pro-workstations.html
* Keep in mind that the Windows 10 Pro N version includes all the base features of the operating system, but without Windows Media Player, Music, Video, Voice Recorder and Skype.
Check for updates in Windows Update after activation. Download of KB 4100403 starts.
Now, run the same command:
Get-SmbClientNetworkInterface
If you have an active share already, check that it's using RDMA:
Get-SmbMultichannelConnection
Last but not least - you will NOT see any adapter utilization with RDMA in Task Manager:
Instead, check Performance Monitor counters - you'll see them under RDMA activity:
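You can also sample those counters from PowerShell; a sketch using the RDMA Activity counter set (instance names depend on the adapter):
Get-Counter -Counter "\RDMA Activity(*)\RDMA Inbound Bytes/sec","\RDMA Activity(*)\RDMA Outbound Bytes/sec" -SampleInterval 2 -MaxSamples 5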
If that helped, please endorse: https://goo.gl/RfjbnG
Friday, April 6, 2018
Automating Mellanox ONYX and MLNX-OS using Ansible: quick guide
For those who have already evolved to NetOps, manual command-by-command execution and config-template copy-pasting are things of the past. Netconf and REST-based APIs are becoming a must; if you don't have them, there are still options to automate. Among other orchestration solutions, Ansible stands out - it's simple, SSH-based, and already has modules to rule the ocean of your network devices.
MLNX-OS CLI
By creating a Cisco-like CLI, Mellanox inherited the classic IOS problems: lack of rollback, configuration management, filtering options and other things (100 things why we love Juniper). Later, an XML API was introduced - but it still looks like a wrapper around the old CLI commands (why not do Netconf?). To this day, the underlying interfaces in ONYX and MLNX-OS are still the same - yes, that means we still have to feed the old commands to these systems. In ONYX a new JSON API has become available (for x86 platforms) - but, again, it took just a little markup magic to create it. We can only wish for the features present in NX-OS and Junos, like shell access, wildcards, Linux-based filters and commands, and, additionally, for the ability to integrate into a proper SDN one day.
Ansible 2.5
Before Ansible 2.5, automating Mellanox devices was pretty sad. Thankfully, it was at least possible to send multiple commands over SSH: https://community.mellanox.com/docs/DOC-2092.
With the release of 2.5, new network modules were introduced:
- onyx_bgp – Configures BGP on Mellanox ONYX network devices
- onyx_command – Run commands on remote devices running Mellanox ONYX
- onyx_config – Manage Mellanox ONYX configuration sections
- onyx_facts – Collect facts from Mellanox ONYX network devices
- onyx_interface – Manage Interfaces on Mellanox ONYX network devices
- onyx_l2_interface – Manage Layer-2 interface on Mellanox ONYX network devices
- onyx_l3_interface – Manage L3 interfaces on Mellanox ONYX network devices
- onyx_linkagg – Manage link aggregation groups on Mellanox ONYX network devices
- onyx_lldp – Manage LLDP configuration on Mellanox ONYX network devices
- onyx_lldp_interface – Manage LLDP interfaces configuration on Mellanox ONYX network devices
- onyx_magp – Manage MAGP protocol on Mellanox ONYX network devices
- onyx_mlag_ipl – Manage IPL (inter-peer link) on Mellanox ONYX network devices
- onyx_mlag_vip – Configures MLAG VIP on Mellanox ONYX network devices
- onyx_ospf – Manage OSPF protocol on Mellanox ONYX network devices
- onyx_pfc_interface – Manage priority flow control on ONYX network devices
- onyx_protocol – Enables/Disables protocols on Mellanox ONYX network devices
- onyx_vlan – Manage VLANs on Mellanox ONYX network devices
As you can see, the majority of those manage Ethernet-specific protocols, but a couple can be used to manage both Infiniband (MLNX-OS VPI) and Ethernet (MLNX-OS/ONYX) devices.
Editing Ansible configuration
We need to change /etc/ansible/ansible.cfg
There are a couple of things to do to prevent headaches while using Ansible: uncomment the following lines and change the options as needed. These parameters are not final, however, and you can freely adjust them to your liking.
gathering = explicit
Fact gathering doesn't work properly with network equipment, so we tell Ansible not to do it by default.
host_key_checking = False
Crucial for device management: SSH key checking sometimes causes timeouts and failed plays. Either you live with the security drawback of disabling it, or you maintain a known_hosts list on your master host.
timeout = 30
Increasing SSH timeout from 10 to 30 for remote connections.
look_for_keys = False
Out of scope, because we do not use paramiko to connect to Mellanox devices, but still important for other devices.
host_key_auto_add = True
Same here, paramiko parameter to add new ssh host keys automatically.
connect_timeout = 60
Increasing persistent connection timeout from 30 to 60 seconds.
connect_retry_timeout = 45
Increasing the retry timeout from 15 to 45 seconds.
command_timeout = 30
Increasing the amount of time to wait for a command before timing out from 10 to 30 seconds (hello to the slow old PowerPC board inside).
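Pulled together, the options above end up in the following sections of /etc/ansible/ansible.cfg (section names as of Ansible 2.5; double-check against the comments in your own file):
[defaults]
gathering = explicit
host_key_checking = False
timeout = 30

[paramiko_connection]
look_for_keys = False
host_key_auto_add = True

[persistent_connection]
connect_timeout = 60
connect_retry_timeout = 45
command_timeout = 30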
Setting up inventory
Let's use standard Ansible hosts file /etc/ansible/hosts
[test_cluster]
192.168.0.[10:25]
[test_cluster:vars]
ansible_network_os=onyx
So, our Mellanox switches have IP addresses from 192.168.0.10 to 192.168.0.25, the default user admin with password admin is configured, SSH access is enabled, and no enable password is configured. We also specify the network OS so that Ansible handles the CLI properly.
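If you'd rather not pass -u and --ask-pass on every run (as shown later), the credentials can also be set as inventory variables - a sketch, with the plain-text password suitable for lab use only:
[test_cluster:vars]
ansible_network_os=onyx
ansible_user=admin
ansible_ssh_pass=admin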
Creating playbooks
Create a .yml file with the following content:
- hosts: test_cluster
gather_facts: false
connection: network_cli
tasks:
- name: run command on MLNX-OS/Onyx device
onyx_command:
commands:
- enable
- show version
- show ntp
- show usernames
Here we specify that we're connecting to our test_cluster group using the network_cli connection to execute a series of commands.
However, if you add some configuration commands,
- conf t
- show version
- xml-gw enable
the module will fail:
"msg": "onyx_command does not support running config mode commands. Please use onyx_config instead"
Okay, let's run our playbook:
ansible-playbook <filename>.yml -u admin --ask-pass -vvv
We will be prompted to enter the SSH password (by default, admin) and pretty soon will see the output lines:
"stdout_lines": [
[
""
],
[
"Product name: MLNX-OS",
"Product release: 3.6.6000",
"Build ID: #1-dev",
"Build date: 2018-03-04 16:48:04",
"Target arch: ppc",
"Target hw: m460ex",
"Built by: jenkins@2811f8c7d517",
"Version summary: PPC_M460EX 3.6.6000 2018-03-04 16:48:04 ppc",
"",
"Product model: ppc",
"Host ID: EC0D9ACED572",
"",
"Uptime: 14d 17h 28m 13.056s",
"CPU load averages: 1.37 / 1.18 / 1.09",
"Number of CPUs: 1",
"System memory: 268 MB used / 1759 MB free / 2027 MB total",
"Swap: 0 MB used / 0 MB free / 0 MB total"
],
[
"NTP is administratively : enabled",
"NTP Authentication administratively: disabled",
"",
"Clock is unsynchronized.",
"",
"Active servers and peers:",
" No NTP associations present."
],
[
"USERNAME FULL NAME CAPABILITY ACCOUNT STATUS",
"admin System Administrator admin Password set (SHA512)",
"monitor System Monitor monitor Password set (SHA512)",
"xmladmin XML Admin User admin Password set (SHA512)",
"xmluser XML Monitor User monitor Password set (SHA512)"
]
]
}
Of course, you can run Infiniband-specific commands just as easily:
- show ib ha
- show ib smnodes
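A sketch of such a task, reusing the structure of the playbook above:
- name: run Infiniband show commands on MLNX-OS/Onyx device
  onyx_command:
    commands:
      - enable
      - show ib ha
      - show ib smnodes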
Using onyx_config module
Here's an example of using onyx_config to back up running configuration:
- hosts: test_cluster
gather_facts: false
connection: network_cli
become: yes
become_method: enable
tasks:
- name: change config on MLNX-OS device
onyx_config:
backup: yes
The only change is adding the "become" and "become_method" parameters, which are required for enable mode on Mellanox switches - without them, we cannot read the running configuration.
Run it with:
ansible-playbook <new_filename>.yml -u admin --ask-pass
Configuration files will be saved to ./backup directory.
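Beyond backups, onyx_config can also push configuration lines; a minimal sketch (the NTP server address is just a placeholder):
- hosts: test_cluster
  gather_facts: false
  connection: network_cli
  become: yes
  become_method: enable
  tasks:
    - name: point the switch at an NTP server
      onyx_config:
        lines:
          - ntp server 192.168.0.1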
Troubleshooting
First of all, ensure that you have the correct versions of Ansible and Python installed.
user@somewhere:$ ansible --version
ansible 2.5.0
config file = /etc/ansible/ansible.cfg
configured module search path = [u'/home/user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/dist-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.14+ (default, Feb 6 2018, 19:12:18) [GCC 7.3.0]
Currently, ONYX modules work best with python2.7.
If something doesn't work, first try to connect to the target host over SSH and execute the commands manually to see if that works. Then increase the verbosity of the Ansible run by adding -vvvv.
Then, use this great troubleshooting guide:
http://docs.ansible.com/ansible/latest/network/user_guide/network_debug_troubleshooting.html
See the best practices here (for example, use SSH keys for authentication instead of the default admin user):
http://docs.ansible.com/ansible/latest/network/user_guide/network_best_practices_2.5.html
If that helped, please endorse: https://goo.gl/RfjbnG