For those who have already evolved to NetOps, manual command-by-command execution and copy-pasting of config templates are things of the past. NETCONF and REST-based APIs are becoming a must; if your devices don't have them, there are still ways to automate. Among orchestration tools, Ansible stands out: it's simple, SSH-based, and already has modules to rule the ocean of your network devices.
MLNX-OS CLI
By creating a Cisco-like CLI, Mellanox inherited the classic IOS problems: no rollback, no configuration management, few filtering options, and other things (100 things why we love Juniper). Later, an XML API was introduced, but it still looks like a wrapper around the old CLI commands (why not do NETCONF?). To this day, the underlying interfaces in Onyx and MLNX-OS are the same, which means we still have to feed the old commands to these systems. A new JSON API has become available in ONYX (for x86 platforms), but again, it took just a little markup magic to create it. We can only wish for features present in NX-OS and Junos, such as shell access, wildcards, and Linux-based filters and commands, plus the ability to integrate into a proper SDN one day.
Ansible 2.5
Before Ansible 2.5, automating Mellanox devices was pretty sad. Thankfully, it was possible to send multiple commands over SSH: https://community.mellanox.com/docs/DOC-2092.
With the release of 2.5, new network modules were introduced:
- onyx_bgp – Configures BGP on Mellanox ONYX network devices
- onyx_command – Run commands on remote devices running Mellanox ONYX
- onyx_config – Manage Mellanox ONYX configuration sections
- onyx_facts – Collect facts from Mellanox ONYX network devices
- onyx_interface – Manage Interfaces on Mellanox ONYX network devices
- onyx_l2_interface – Manage Layer-2 interface on Mellanox ONYX network devices
- onyx_l3_interface – Manage L3 interfaces on Mellanox ONYX network devices
- onyx_linkagg – Manage link aggregation groups on Mellanox ONYX network devices
- onyx_lldp – Manage LLDP configuration on Mellanox ONYX network devices
- onyx_lldp_interface – Manage LLDP interfaces configuration on Mellanox ONYX network devices
- onyx_magp – Manage MAGP protocol on Mellanox ONYX network devices
- onyx_mlag_ipl – Manage IPL (inter-peer link) on Mellanox ONYX network devices
- onyx_mlag_vip – Configures MLAG VIP on Mellanox ONYX network devices
- onyx_ospf – Manage OSPF protocol on Mellanox ONYX network devices
- onyx_pfc_interface – Manage priority flow control on ONYX network devices
- onyx_protocol – Enables/Disables protocols on Mellanox ONYX network devices
- onyx_vlan – Manage VLANs on Mellanox ONYX network devices
As you can see, most of these modules manage Ethernet-specific protocols, but a couple of them can manage both InfiniBand (MLNX-OS VPI) and Ethernet (MLNX-OS/ONYX) devices.
Editing Ansible configuration
We need to edit /etc/ansible/ansible.cfg.
A few changes prevent headaches when using Ansible with network devices: uncomment the lines below and change the options as needed. These values are not final, and you can freely adjust them to your liking.
gathering = explicit
Fact gathering doesn't work properly with network equipment, so we turn it off by default.
host_key_checking = False
Crucial for device management: SSH host key checking sometimes causes timeouts and failed plays. Either you accept the security drawback of disabling it, or you maintain a known_hosts list on your master host.
timeout = 30
Increasing the SSH timeout for remote connections from 10 to 30 seconds.
look_for_keys = False
Out of scope, because we do not use paramiko to connect to Mellanox devices, but still important for other devices.
host_key_auto_add = True
Same here: a paramiko parameter that adds new SSH host keys automatically.
connect_timeout = 60
Increasing persistent connection timeout from 30 to 60 seconds.
connect_retry_timeout = 45
Increasing the retry timeout from 15 to 45 seconds.
command_timeout = 30
Increasing the time to wait for a command before timing out from 10 to 30 seconds (hello to the slow, old PowerPC board inside).
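The text above lists the options but not the ansible.cfg sections they belong to. The sketch below groups them the way the stock Ansible 2.5 ansible.cfg does ([defaults], [paramiko_connection], [persistent_connection]); treat that grouping as an assumption and check it against your own file. The name render_ansible_cfg is made up for this illustration, and the script only prints the result instead of touching /etc/ansible/ansible.cfg.

```python
import configparser
import io

# Values are the ones listed above. The section placement is an assumption
# based on the stock Ansible 2.5 ansible.cfg layout.
SETTINGS = {
    "defaults": {
        "gathering": "explicit",
        "host_key_checking": "False",
        "timeout": "30",
    },
    "paramiko_connection": {
        "look_for_keys": "False",
        "host_key_auto_add": "True",
    },
    "persistent_connection": {
        "connect_timeout": "60",
        "connect_retry_timeout": "45",
        "command_timeout": "30",
    },
}


def render_ansible_cfg(settings=SETTINGS):
    """Return the settings as INI text in ansible.cfg layout."""
    parser = configparser.ConfigParser()
    # "defaults" is an ordinary section here; only the upper-case
    # "DEFAULT" name is special to configparser.
    parser.read_dict(settings)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the fragment so it can be reviewed before merging it by hand.
    print(render_ansible_cfg())
```

Printing rather than writing the file keeps the sketch harmless: you can compare the output with the commented-out lines already present in your ansible.cfg.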
Setting up inventory
Let's use the standard Ansible hosts file, /etc/ansible/hosts:
[test_cluster]
192.168.0.[10:25]
[test_cluster:vars]
ansible_network_os=onyx
So, our Mellanox switches have IP addresses from 192.168.0.10 to 192.168.0.25, the default user admin with password admin is configured, SSH access is enabled, and no enable password is set. We also specify the network OS so that Ansible handles the CLI properly.
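To see which hosts the 192.168.0.[10:25] pattern covers, here is a small sketch that expands it. It is a simplified re-implementation for illustration only, not Ansible's own inventory parser: it handles a single inclusive numeric range and ignores steps, zero padding, and alphabetic ranges. The function name expand_host_range is hypothetical.

```python
import re

# Matches a single numeric range such as "[10:25]" inside a host pattern.
_RANGE = re.compile(r"\[(\d+):(\d+)\]")


def expand_host_range(pattern):
    """Expand one inclusive numeric range, e.g. '192.168.0.[10:25]'.

    Simplified sketch: no step, no alphabetic ranges, one range per pattern.
    """
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]  # nothing to expand
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    # Both ends are included, as in Ansible inventory ranges.
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]


if __name__ == "__main__":
    hosts = expand_host_range("192.168.0.[10:25]")
    print(len(hosts), hosts[0], hosts[-1])
```

For the inventory above this yields 16 addresses, 192.168.0.10 through 192.168.0.25.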
Creating playbooks
Create a .yml file with the following content:
- hosts: test_cluster
  gather_facts: false
  connection: network_cli
  tasks:
    - name: run command on MLNX-OS/Onyx device
      onyx_command:
        commands:
          - enable
          - show version
          - show ntp
          - show usernames
Here we specify that we're connecting to our test_cluster group over a network_cli connection to execute a series of commands.
However, if you add configuration commands such as
- conf t
- show version
- xml-gw enable
the module will fail:
"msg": "onyx_command does not support running config mode commands. Please use onyx_config instead"
Okay, let's run our playbook:
ansible-playbook <filename>.yml -u admin --ask-pass -vvv
You will be prompted for the SSH password (admin by default) and will soon see the output lines:
    "stdout_lines": [
        [
            ""
        ],
        [
            "Product name: MLNX-OS",
            "Product release: 3.6.6000",
            "Build ID: #1-dev",
            "Build date: 2018-03-04 16:48:04",
            "Target arch: ppc",
            "Target hw: m460ex",
            "Built by: jenkins@2811f8c7d517",
            "Version summary: PPC_M460EX 3.6.6000 2018-03-04 16:48:04 ppc",
            "",
            "Product model: ppc",
            "Host ID: EC0D9ACED572",
            "",
            "Uptime: 14d 17h 28m 13.056s",
            "CPU load averages: 1.37 / 1.18 / 1.09",
            "Number of CPUs: 1",
            "System memory: 268 MB used / 1759 MB free / 2027 MB total",
            "Swap: 0 MB used / 0 MB free / 0 MB total"
        ],
        [
            "NTP is administratively : enabled",
            "NTP Authentication administratively: disabled",
            "",
            "Clock is unsynchronized.",
            "",
            "Active servers and peers:",
            " No NTP associations present."
        ],
        [
            "USERNAME FULL NAME CAPABILITY ACCOUNT STATUS",
            "admin System Administrator admin Password set (SHA512)",
            "monitor System Monitor monitor Password set (SHA512)",
            "xmladmin XML Admin User admin Password set (SHA512)",
            "xmluser XML Monitor User monitor Password set (SHA512)"
        ]
    ]
}
Of course, you can run InfiniBand-specific commands just as easily:
- show ib ha
- show ib smnodes
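In the output shown earlier, stdout_lines holds one list of lines per command, and the "show version" block is a series of "Key: value" lines. If you want to post-process such a block outside Ansible, a minimal sketch could look like the one below. The helper name parse_key_value_lines is made up here, and the approach assumes the colon-separated layout seen in this particular output; tabular output such as "show usernames" would need different handling.

```python
def parse_key_value_lines(lines):
    """Turn 'Key: value' lines into a dict, skipping blank or colon-less lines."""
    parsed = {}
    for line in lines:
        # Split on the first colon only, so timestamps keep their own colons.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            parsed[key.strip()] = value.strip()
    return parsed


if __name__ == "__main__":
    # A few lines copied from the "show version" block in the output above.
    show_version = [
        "Product name: MLNX-OS",
        "Product release: 3.6.6000",
        "Build date: 2018-03-04 16:48:04",
        "",
        "Uptime: 14d 17h 28m 13.056s",
    ]
    info = parse_key_value_lines(show_version)
    print(info["Product release"], "|", info["Build date"])
```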
Using onyx_config module
Here's an example of using onyx_config to back up the running configuration:
- hosts: test_cluster
  gather_facts: false
  connection: network_cli
  become: yes
  become_method: enable
  tasks:
    - name: change config on MLNX-OS device
      onyx_config:
        backup: yes
The only change is the addition of the "become" and "become_method" parameters, which are required for enable mode on Mellanox switches; without them, we cannot read the running configuration.
Run it with:
ansible-playbook <new_filename>.yml -u admin --ask-pass
Configuration files will be saved to the ./backup directory.
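Since the CLI itself offers no rollback or configuration management, one thing these backups make possible is comparing two snapshots of the same switch. The sketch below does that with Python's standard difflib. The function name diff_configs and the sample configuration lines are hypothetical; in practice you would read two files from ./backup and pass in their contents.

```python
import difflib


def diff_configs(old_text, new_text, old_name="old", new_name="new"):
    """Return a unified diff between two configuration dumps as one string."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",
    )
    return "\n".join(diff)


if __name__ == "__main__":
    # Hypothetical snippets standing in for two backup files of one switch.
    before = "hostname sw1\nntp enable\n"
    after = "hostname sw1\nntp disable\n"
    print(diff_configs(before, after, "backup/before", "backup/after"))
```

An empty result means the two snapshots are identical.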
Troubleshooting
First of all, ensure that you have the correct Ansible and Python versions installed.
user@somewhere:$ ansible --version
ansible 2.5.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/dist-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.14+ (default, Feb 6 2018, 19:12:18) [GCC 7.3.0]
Currently, the ONYX modules work best with Python 2.7.
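If you want to script this check, for example across several control hosts, the first line of the "ansible --version" output can be parsed as sketched below. The function names are made up for this illustration, and the pattern assumes the "ansible X.Y.Z" format shown above; other output formats raise an error instead of being guessed at.

```python
import re


def parse_ansible_version(version_output):
    """Extract (major, minor, patch) from the first line of `ansible --version`."""
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    match = re.match(r"ansible\s+(\d+)\.(\d+)\.(\d+)", first_line)
    if match is None:
        raise ValueError(f"unrecognised version line: {first_line!r}")
    return tuple(int(part) for part in match.groups())


def is_at_least_2_5(version_output):
    """True if the reported version is 2.5 or newer (the release that added onyx_* modules)."""
    return parse_ansible_version(version_output)[:2] >= (2, 5)


if __name__ == "__main__":
    # First two lines of the output shown above.
    sample = "ansible 2.5.0\n  config file = /etc/ansible/ansible.cfg"
    print(parse_ansible_version(sample), is_at_least_2_5(sample))
```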
If something doesn't work, first try to connect to the target host over SSH and execute the commands manually to see if that works. Then increase the verbosity of the Ansible commands by adding -vvvv.
Then, use this great troubleshooting guide:
http://docs.ansible.com/ansible/latest/network/user_guide/network_debug_troubleshooting.html
See best practices here (use SSH keys for authentication instead of the default admin user, for example):
http://docs.ansible.com/ansible/latest/network/user_guide/network_best_practices_2.5.html
If that helped, please endorse: https://goo.gl/RfjbnG