Tuesday, April 14, 2020

Fixing OpenSM service not running

So, we've set up OpenSM as a service in Windows, with Start Type = Automatic. https://wchukov.blogspot.com/2019/06/fixing-this-configuration-exceeds-mtu.html

Sometimes (depending on the OS updates) the service just won't start, even if recovery options (restart the service) are set. If we start it manually from an elevated command line, it starts normally. However, if the service is set to Automatic (Delayed Start), it will only start after 240 seconds.

The problem, I believe, is an unsatisfied dependency: OpenSM needs the mlx4_bus, ibbus, or ipoib6x drivers to finish initialization before the SM is invoked. The simplest workaround so far is to set the OpenSM service to Automatic (Delayed Start) and adjust the delayed-start timer.

To do that, in Registry Editor, under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control, create a new DWORD value named AutoStartDelay and set it to 25000 decimal (the value is in milliseconds, i.e. 25 seconds).
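
The same can be done from an elevated PowerShell prompt; a minimal sketch, assuming the service is registered under the name OpenSM as in the posts below:

# Create (or overwrite) the delay value: 25000 ms = 25 s - adjust to taste
New-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control' -Name 'AutoStartDelay' -PropertyType DWord -Value 25000 -Force
# Switch the service to Automatic (Delayed Start); Set-Service has no delayed option, so sc.exe is used
sc.exe config OpenSM start= delayed-auto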

Feel free to adjust the value - hopefully, you won't have to wait for 240 seconds to be online.

Monday, July 8, 2019

Fixing "Error from osm_opensm_bind (0x2B) Perhaps another instance of OpenSM is already running"

Sometimes the OpenSM service won't start automatically or won't restart; it fails with:
OpenSM 3.3.11 UMAD 
Entering DISCOVERING state 


Error from osm_opensm_bind (0x2B)
Perhaps another instance of OpenSM is already running
Exiting SM 

The first thing we want to confirm is that the card is currently in IB mode.
Then opensm.conf has to be adjusted. By default its first option, guid, is empty. Take the GUID of an active port (port 1 or port 2; 0x0002c903000232db in this example) and put it into that line. Done, the service should start as expected.
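
For reference, a minimal sketch of what the adjusted opensm.conf line ends up as, using the GUID from this example (substitute your own port GUID, e.g. as reported by the ibstat tool shipped with WinOF):

guid 0x0002c903000232db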

Sunday, June 9, 2019

Fixing "This configuration exceeds the MTU reported by OpenSM, which is 2048"

I got tired of the following EventLog message:

According to the configuration under the "Jumbo Packets" advanced property, the MTU configured for device Mellanox ConnectX-3 IPoIB Adapter is 4092. The effective MTU is the supplied value + 4 bytes (for the IPoIB header). This configuration exceeds the MTU reported by OpenSM, which is 2048. This inconsistency may result in communication failures. Please change the MTU of IPoIB or OpenSM, and restart the driver.

The first thing we want to adjust is the partitions.conf file: set mtu to 5, which is the 4K IB MTU that matches the 4092-byte IPoIB setting plus the 4-byte header.
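
A minimal sketch of the default partition line with that option (the partition name and membership are assumptions - adjust them to your own fabric):

Default=0x7fff, ipoib, mtu=5 : ALL=full;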

By default, ancient OpenSM 3.3.11 for some reason will not read the partitions.conf file specified in the global opensm.conf. The easiest way to force it to read the file is to re-create the service with the -P parameter. Here's how to do it in default PowerShell:
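
A sketch of the commands, to be run from an elevated prompt; the paths, and the --service flag carried over from the original service definition, are assumptions from a default WinOF install, so adjust them to your own layout:

# Re-create the OpenSM service with an explicit -P <partitions file>
$bin = '"C:\Program Files\Mellanox\MLNX_VPI\IB\Tools\opensm.exe" --service -P "C:\Program Files\Mellanox\MLNX_VPI\IB\Tools\partitions.conf"'
sc.exe stop OpenSM
sc.exe delete OpenSM
New-Service -Name OpenSM -BinaryPathName $bin -StartupType Automatic
Start-Service OpenSM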

Done, check System log in Event Viewer.

Sunday, May 27, 2018

Infiniband/RDMA on Windows - now on Windows 10 too


IB on VMware and Windows

After struggling with ESXi 6.5 and the newest 6.7, which work properly only in Ethernet mode with the Mellanox ConnectX card family:
  • the 1.8.x driver supports IB/iSER/SRP, but does not support SR-IOV
  • the 2.4 driver doesn't work at all in 6.5/6.7 - mlx4_core is not loading
  • the default ESXi driver supports SR-IOV but works only in Ethernet mode
and learning a bunch of esxcli commands, I've decided to look at how things are on the Windows side.
[Spoiler: things are really good - almost as good as on Linux]

WinOF

Compared to just plain drivers for ESXi, Windows gets the whole OpenFabrics Distribution:

  • VPI drivers - switch between IB and ETH anytime
  • OpenSM - installs Windows OpenSM that is required to manage the subnet from a host. Even the command to run it as a service with auto-start is specified!
  • Performance tools - installs the performance tools that are used to measure InfiniBand performance in a user environment
  • Analyze tools - installs the tools that can be used either to diagnose or analyze the InfiniBand environment
  • SDK - contains the libraries and DLLs for developing InfiniBand application over IBAL
  • Documentation - contains the User Manual and Installation Guide
  • Firmware update - for Mellanox genuine adapters, firmware update is performed automatically
  • Performance tuning - there are a few tuning scenarios available: Single-port traffic, Dual-port traffic, Forwarding traffic, Multicast traffic
  • Failover teaming - provides redundancy through automatic fail-over from an active adapter to a standby adapter in case of switch port, cable, or adapter failure.
Finally, for the ConnectX-2/ConnectX-3 family, all of these features are supported on:

Windows Server: 2012, 2012 R2, 2016, including newest build 1803 (RS4)

Windows Client: 8.1, 10, including newest build 1803


RDMA in Windows

RDMA is power. It should be in every storage and network protocol, because of:
  • Increased throughput: leverages the full throughput of high speed networks in which the network adapters coordinate the transfer of large amounts of data at line speed.
  • Low latency: provides extremely fast responses to network requests, and, as a result, makes remote file storage feel as if it is directly attached block storage.
  • Low CPU utilization: uses fewer CPU cycles when transferring data over the network, which leaves more power available to server applications.
Still not convinced? See https://blogs.technet.microsoft.com/filecab/2017/03/27/to-rdma-or-not-to-rdma-that-is-the-question/ - a 28% IOPS increase and 27% more IOPS per % of kernel CPU.

The Windows implementation of RDMA is called Network Direct (ND), and its biggest consumer is SMB, the Windows network file sharing protocol. Here are some use cases: file storage for virtualization (Hyper-V™ over SMB), Microsoft SQL Server over SMB, and traditional file sharing. SMB Direct, therefore, is RDMA-enabled SMB; in the next steps we'll see how to configure and test it.

SMB Direct in Windows Server 2016

Nothing to do here. If the adapter is RDMA-capable and you haven't disabled it, it'll just work, especially in an InfiniBand fabric. With RoCE - well, let's see the official statement:
Microsoft Recommendation: While the Microsoft RDMA interface is RDMA-technology agnostic, in our experience with customers and partners we find that RoCE/RoCEv2 installations are difficult to get configured correctly and are problematic at any scale above a single rack.  If you intend to deploy RoCE/RoCEv2, you should a) have a small scale (single rack) installation, and b) have an expert network administrator who is intimately familiar with Data Center Bridging (DCB), especially the Enhanced Transmission Service (ETS) and Priority Flow Control (PFC) components of DCB.  If you are deploying in any other context iWarp is the safer alternative.  iWarp does not require any configuration of DCB on network hosts or network switches and can operate over the same distances as any other TCP connection. RoCE, even when enhanced with Explicit Congestion Notification (ECN) detection, requires network configuration to configure DCB/ETS/PFC and/or ECN especially if the scale of deployment exceeds a single rack.  Tuning of these settings, i.e., the settings required to make DCB and/or ECN work, is an art not mastered by every network engineer.   
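
For a sense of what that host-side DCB configuration involves, here is a minimal sketch (not from this post): it assumes SMB Direct traffic is tagged with priority 3 and that the RoCE adapter is named "Ethernet"; the switch side still has to be configured to match:

# Windows Server: install the DCB feature
Install-WindowsFeature Data-Center-Bridging
# Tag SMB Direct (port 445) traffic with priority 3
New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
# Enable PFC only for that priority
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7
# Apply DCB settings on the RoCE adapter (adapter name is an assumption)
Enable-NetAdapterQos -Name "Ethernet"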

Here are a couple of good guides:


SMB Direct in Windows 10


Previously, RDMA was not available on Windows client systems. Thanks to Microsoft, with the Fall Creators Update it became available in a specific high-end edition, Windows 10 Pro for Workstations.

This is the usual Windows 10 Pro:


The same ConnectX-3 adapter that worked fine in Windows Server will return Enabled/True on all of these checks:
Get-NetOffloadGlobalSetting | Select NetworkDirect
Get-NetAdapterRDMA
Get-NetAdapterHardwareInfo
Get-SmbClientConfiguration | Select EnableMultichannel  
And only
Get-SmbClientNetworkInterface
will show that RDMA is not working.

The solution is simple - get Windows 10 Pro for Workstations. Here are upgrade paths:
https://docs.microsoft.com/en-us/windows/deployment/upgrade/windows-10-edition-upgrades

You could use the Microsoft Store app and search for Windows 10 Pro for Workstations:


or just use the simple slmgr activation command with a generic key (get a genuine key and activate it later):
slmgr /ipk DXG7C-N36C4-C4HTG-X4T3X-2YV77
More detailed instructions are available at https://www.tenforums.com/tutorials/95822-upgrade-windows-10-pro-windows-10-pro-workstations.html
* Keep in mind that the Windows 10 Pro N version includes all the base features of the operating system, but without Windows Media Player, Music, Video, Voice Recorder and Skype.

Check for updates in Windows Update after activation; the download of KB 4100403 starts.


Additionally, you need to enable SMB Direct in Control Panel > Programs and Features > Turn Windows features on or off > check SMB Direct. This is another KB and it requires a reboot. Check the system edition after the reboot.
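
The same feature can be enabled from an elevated PowerShell prompt; a sketch, assuming the optional-feature name is SmbDirect (verify with Get-WindowsOptionalFeature -Online):

Enable-WindowsOptionalFeature -Online -FeatureName SmbDirect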



Now, run the same command:
Get-SmbClientNetworkInterface

If you have an active share already, check that it's using RDMA:
Get-SmbMultichannelConnection

Last but not least - you will NOT see any adapter utilization with RDMA in Task Manager:


Instead, check Performance Monitor counters - you'll find them under RDMA Activity:
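
They can also be queried from PowerShell; a sketch, assuming the counter set is named RDMA Activity as on my machines:

# List the available RDMA counters, then sample the byte counters for all instances
Get-Counter -ListSet 'RDMA Activity' | Select-Object -ExpandProperty Counter
Get-Counter '\RDMA Activity(*)\RDMA Inbound Bytes/sec','\RDMA Activity(*)\RDMA Outbound Bytes/sec'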


If that helped, please endorse: https://goo.gl/RfjbnG


Friday, April 6, 2018

Automating Mellanox ONYX and MLNX-OS using Ansible: quick guide

For those who have already evolved to NetOps, manual command-by-command execution and config-template copy-pasting are things of the past. Netconf and REST-based APIs are becoming a must; if you don't have them, there are still options to automate. Among other orchestration solutions, Ansible stands out - it's simple, SSH-based, and already has modules to rule the ocean of your network devices.
 
MLNX-OS CLI

By creating a Cisco-like CLI, Mellanox inherited the classic IOS problems: lack of rollback, configuration management, filtering options and other things (100 things why we love Juniper). Later, an XML API was introduced - but it still looks like a wrapper around the old CLI commands (why not do Netconf?). To this day, the underlying interfaces in Onyx and MLNX-OS are still the same - which means we still have to feed old commands to these systems. In ONYX a new JSON API has become available (for x86 platforms) - but, again, only a little markup magic was required to create it. We can only wish for the features present in NX-OS and Junos - shell access, wildcards, Linux-based filters and commands - and for an ability to integrate into a proper SDN one day.

Ansible 2.5

Before Ansible 2.5, the automation of Mellanox was pretty sad. Thankfully, it was possible to send multiple commands by ssh: https://community.mellanox.com/docs/DOC-2092.
With the release of 2.5, new network modules were introduced:

As you can see, the majority of those manage Ethernet-specific protocols, but a couple can be used to manage both Infiniband (MLNX-OS VPI) and Ethernet (MLNX-OS/ONYX) devices.

Editing Ansible configuration
We need to change /etc/ansible/ansible.cfg

There are a couple of things to do to prevent headaches while using Ansible: uncomment the lines listed below and change the options as needed (a consolidated snippet follows the list). These parameters are not final, and you can freely adjust them to your liking.

gathering = explicit
Fact gathering doesn't work properly with network equipment, so we state that we don't do it by default.

host_key_checking = False
Crucial for device management: SSH key checking sometimes causes timeouts and failed plays. Either you live with the security drawback of disabling it, or you maintain the known_hosts list on your master host.

timeout = 30
Increases the SSH timeout for remote connections from 10 to 30 seconds.

look_for_keys = False
Out of scope here, because we do not use paramiko to connect to Mellanox devices, but still important for other devices.

host_key_auto_add = True
Same here - a paramiko parameter to add new SSH host keys automatically.

connect_timeout = 60
Increases the persistent connection timeout from 30 to 60 seconds.

connect_retry_timeout = 45
Increases the retry timeout from 15 to 45 seconds.

command_timeout = 30
Increases the amount of time to wait for a command before timing out from 10 to 30 seconds (hello to the slow and old PowerPC board inside).
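
Put together, the relevant pieces of /etc/ansible/ansible.cfg end up looking roughly like this (section names as in the stock Ansible 2.5 config file; everything else stays as shipped):

[defaults]
gathering = explicit
host_key_checking = False
timeout = 30

[paramiko_connection]
look_for_keys = False
host_key_auto_add = True

[persistent_connection]
connect_timeout = 60
connect_retry_timeout = 45
command_timeout = 30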

Setting up inventory
Let's use the standard Ansible hosts file /etc/ansible/hosts

[test_cluster]
192.168.0.[10:25]
[test_cluster:vars]
ansible_network_os=onyx

So, our Mellanox switches have IP addresses from 192.168.0.10 to 192.168.0.25, they have the default user admin with password admin configured, SSH access is enabled, and no enable password is set. We also specify the network OS so that Ansible handles the CLI properly.
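
If you'd rather not type credentials on every run, they can also live next to ansible_network_os in the inventory; a sketch (plain-text passwords are for the lab only - see the best-practices link at the end):

[test_cluster:vars]
ansible_network_os=onyx
ansible_user=admin
ansible_password=admin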

Creating playbooks
Create a .yml file with the following content:

-  hosts: test_cluster
   gather_facts: false
   connection: network_cli
   tasks:
   - name: run command on MLNX-OS/Onyx device
     onyx_command:
      commands:
       - enable
       - show version
       - show ntp
       - show usernames


Here we specify that we're connecting to our test_cluster group using the network_cli connection to execute a series of commands.
However, if you add some configuration commands,
       - conf t
       - show version
       - xml-gw enable


the module will fail:

"msg": "onyx_command does not support running config mode commands.  Please use onyx_config instead"

Okay, let's run our playbook:

ansible-playbook <filename>.yml -u admin --ask-pass -vvv

We will be prompted for the SSH password (admin by default) and pretty soon will see the output lines:

    "stdout_lines": [
        [
            ""
        ],
        [
            "Product name:      MLNX-OS",
            "Product release:   3.6.6000",
            "Build ID:          #1-dev",
            "Build date:        2018-03-04 16:48:04",
            "Target arch:       ppc",
            "Target hw:         m460ex",
            "Built by:          jenkins@2811f8c7d517",
            "Version summary:   PPC_M460EX 3.6.6000 2018-03-04 16:48:04 ppc",
            "",
            "Product model:     ppc",
            "Host ID:           EC0D9ACED572",
            "",
            "Uptime:            14d 17h 28m 13.056s",
            "CPU load averages: 1.37 / 1.18 / 1.09",
            "Number of CPUs:    1",
            "System memory:     268 MB used / 1759 MB free / 2027 MB total",
            "Swap:              0 MB used / 0 MB free / 0 MB total"
        ],
        [
            "NTP is administratively            : enabled",
            "NTP Authentication administratively: disabled",
            "",
            "Clock is unsynchronized.",
            "",
            "Active servers and peers:",
            "  No NTP associations present."
        ],
        [
            "USERNAME    FULL NAME               CAPABILITY  ACCOUNT STATUS",
            "admin       System Administrator    admin       Password set (SHA512)",
            "monitor     System Monitor          monitor     Password set (SHA512)",
            "xmladmin    XML Admin User          admin       Password set (SHA512)",
            "xmluser     XML Monitor User        monitor     Password set (SHA512)"
        ]
    ]
}


Of course, you can run Infiniband-specific commands just as easily: 
- show ib ha
- show ib smnodes
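
Dropped into the same playbook structure, that would be (the task name is mine):

   - name: run Infiniband commands on MLNX-OS/Onyx device
     onyx_command:
      commands:
       - enable
       - show ib ha
       - show ib smnodes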

 
Using onyx_config module


Here's an example of using onyx_config to back up the running configuration:

-  hosts: test_cluster
   gather_facts: false
   connection: network_cli
   become: yes
   become_method: enable
   tasks:
   - name: change config on MLNX-OS device
     onyx_config:
       backup: yes


The only change is adding the "become" and "become_method" parameters, which are required for enable mode on Mellanox switches - without them, we cannot read the running configuration.

Run it with:
ansible-playbook <new_filename>.yml -u admin --ask-pass
Configuration files will be saved to the ./backup directory.

Troubleshooting

First of all, ensure that you have correct Ansible and Python installed.

user@somewhere:$ ansible --version

ansible 2.5.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/dist-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.14+ (default, Feb  6 2018, 19:12:18) [GCC 7.3.0]


Currently, ONYX modules work best with python2.7.

If something doesn't work, first try to connect to the target host over SSH and execute the commands manually to see if that works. Then increase the verbosity of the Ansible commands by adding -vvvv.

Then, use this great troubleshooting guide:

http://docs.ansible.com/ansible/latest/network/user_guide/network_debug_troubleshooting.html

See the best practices here (for example, use SSH keys for authentication instead of the default admin user):
http://docs.ansible.com/ansible/latest/network/user_guide/network_best_practices_2.5.html

If that helped, please endorse: https://goo.gl/RfjbnG
