Sometimes you have to deal with servers that you don’t know anything about:
- You are a short-term IT consultant with no previous knowledge of the environment.
- The CMDB is out of order.
- You are in a DR (disaster recovery) situation.
- Or simply the main administrator is not there.
And you need to:
- Run commands in parallel
- Get info from many servers at a time
- Troubleshoot DNS problems
- Check how many servers are up and running
Basic Orchestrator architecture
Both MCollective and SaltStack have a “reverse” client/server architecture, where the nodes connect to the server that launches the commands.
SaltStack can also run in an agentless way, using the salt-ssh package:
This particular package provides the salt ssh controller. It is able to run salt modules and states on remote hosts via ssh. No minion or other salt specific software needs to be installed on the remote host.
What can an orchestrator do?
See how many of your servers are up and running right now.
With both tools you launch a ping-like command.
% mco ping
server1 time=126.19 ms
server2 time=132.79 ms
server3 time=133.57 ms
---- ping statistics ----
25 replies max: 305.58 min: 57.50 avg: 113.16

salt '*' test.ping
server1: True
server2: True
server3: True
server4: Minion did not return. [No response]
Stop/Start/Status services on many servers:
mco service status ssh
 * [ ==========================================> ] 1 / 3
server1: running
server2: running
server3: running
Summary of Service Status:
running = 3
Finished processing 3 / 3 hosts in 116.32 ms

salt '*' cmd.run '/etc/init.d/ssh status'
server1: sshd is running.
server2: sshd is running.
server3: sshd is running.

MCollective plugins (list from Bluemalkin’s blog):
- puppet: manage Puppet agents (run a test, enable / disable, get statistics etc…)
- package: install, uninstall a package
- apt: upgrade packages, list number of available upgrades
- service: start, stop, restart a service
- nettest: check ping and telnet connectivity
- filemgr: touch, delete files
- process: list, kill process
- nrpe: run nrpe commands (check_load, check_disks, check_swap)
With SaltStack you can run almost every command-line tool:
salt '*' cmd.run 'uname -an'
server1: Linux server1 3.2.0-4-amd64 #1 SMP Debian 3.2.65-1+deb7u2 x86_64 GNU/Linux
server2: Linux server2 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt9-3 (2015-04-23) x86_64 GNU/Linux
server3: Linux server3 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt9-3 (2015-04-23) x86_64 GNU/Linux
But you don’t have an orchestrator configured right now
And you need one. Let’s have a look at GNU parallel:
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel.
It is usually used to spread resource-intensive tasks (CPU, memory, etc.) across multiple servers. But GNU parallel can also be used as an orchestrator, much like MCollective and SaltStack.
It is usually available in the official repositories of the main Linux distributions. If it isn’t available for yours, it can be downloaded from http://ftp.gnu.org/gnu/parallel/
apt-get install parallel
To launch passwordless commands on remote servers using SSH public-key authentication, you need the private key in your user’s home directory (under ~/.ssh) and the public key in .ssh/authorized_keys on each remote server. In a normal infrastructure this should already be in place.
As there is a lot of literature about it, I am not going to cover it again: passwordless ssh setup
You also need a list of servers to launch commands on:
cat > server.list
server1
server2
server3
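The same file can be written without typing it interactively. The sketch below uses a here-document and adds two small helpers to sanity-check the list. The helper names (count_servers, duplicate_servers) are made up for this example, and skipping blank lines and '#' comments assumes GNU parallel ignores them in the server list file too, which is how I understand its documentation.

```shell
# Write the server list with a here-document (no Ctrl-D needed).
# server1..server3 are the placeholder names used in this article.
cat > server.list <<'EOF'
server1
server2
server3
EOF

# count_servers FILE: number of usable entries.
# Blank lines and '#' comments are skipped (assumption, see above).
count_servers() {
    grep -Ev '^[[:space:]]*(#|$)' "$1" | wc -l | tr -d ' '
}

# duplicate_servers FILE: names listed more than once,
# usually a copy-paste mistake.
duplicate_servers() {
    grep -Ev '^[[:space:]]*(#|$)' "$1" | sort | uniq -d
}

echo "servers listed: $(count_servers server.list)"
duplicate_servers server.list
```

With the three servers above, the duplicate check prints nothing.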
This is the magic command that will do the trick:
parallel --tag --nonall -j2 -k --slf server.list 'host google.es'
This command launches ‘host google.es’ on the servers listed in the server.list file created previously.
--tag tags each output line with the name of the server it belongs to.
--nonall runs the given command, with no arguments, on every server in the list. This is useful for running the same command (e.g. uptime) on a list of servers.
-j2 sets the number of jobs to be launched in parallel.
-k keeps the output in the same order as the input. Normally the output of a job is printed as soon as the job completes.
--slf gives the name of the file with the list of servers.
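If you are going to type this often, it may be worth wrapping it. Here is a minimal sketch that spells out the same options in their long form (--jobs, --keep-order, --sshloginfile) inside a small function. The function name run_on_all and the SERVER_LIST, JOBS and DRY_RUN variables are inventions of this example, not part of GNU parallel.

```shell
# Hypothetical knobs for this sketch; override them in the environment.
SERVER_LIST="${SERVER_LIST:-server.list}"
JOBS="${JOBS:-2}"

# run_on_all 'COMMAND': run COMMAND on every server in $SERVER_LIST.
run_on_all() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print what would be executed.
        echo "parallel --tag --nonall --jobs $JOBS --keep-order --sshloginfile $SERVER_LIST '$1'"
        return 0
    fi
    parallel --tag --nonall --jobs "$JOBS" --keep-order \
        --sshloginfile "$SERVER_LIST" "$1"
}

# Preview the command line first; set DRY_RUN=0 to run it for real.
DRY_RUN=1
run_on_all 'host google.es'
```

The preview mode is there on purpose: seeing the full command line before it touches every server costs nothing.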
And finally the output is:
server1 google.es has address 126.96.36.199
server1 google.es has IPv6 address 2a00:1450:4013:c01::5e
server1 google.es mail is handled by 30 alt2.aspmx.l.google.com.
server1 google.es mail is handled by 10 aspmx.l.google.com.
server1 google.es mail is handled by 40 alt3.aspmx.l.google.com.
server1 google.es mail is handled by 50 alt4.aspmx.l.google.com.
server1 google.es mail is handled by 20 alt1.aspmx.l.google.com.
server2 google.es has address 188.8.131.52
server2 google.es has IPv6 address 2a00:1450:4003:805::2003
server2 google.es mail is handled by 50 alt4.aspmx.l.google.com.
server2 google.es mail is handled by 10 aspmx.l.google.com.
server2 google.es mail is handled by 20 alt1.aspmx.l.google.com.
server2 google.es mail is handled by 40 alt3.aspmx.l.google.com.
server2 google.es mail is handled by 30 alt2.aspmx.l.google.com.
server3 google.es has address 184.108.40.206
server3 google.es has IPv6 address 2a00:1450:4003:805::2003
server3 google.es mail is handled by 10 aspmx.l.google.com.
server3 google.es mail is handled by 50 alt4.aspmx.l.google.com.
server3 google.es mail is handled by 20 alt1.aspmx.l.google.com.
server3 google.es mail is handled by 40 alt3.aspmx.l.google.com.
server3 google.es mail is handled by 30 alt2.aspmx.l.google.com.
As server1 is in a different location, with a different DNS configuration from server2 and server3, its output is different.
This command can be used, for example, to troubleshoot DNS replication problems between servers.
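With three servers you can compare the answers by eye. With a few dozen, a small filter helps. The following sketch groups the servers by the IPv4 address they got back, reading the tagged output on standard input. The function name group_a_records is made up, and the sample addresses in the demo are documentation-range placeholders, not real results.

```shell
# group_a_records: read `parallel --tag` output of `host NAME` on stdin
# and print each IPv4 answer followed by the servers that returned it.
group_a_records() {
    awk '
        # Matches lines like: server1 google.es has address 203.0.113.10
        $3 == "has" && $4 == "address" {
            if (!($5 in servers)) order[++n] = $5   # remember first-seen order
            servers[$5] = servers[$5] " " $1
        }
        END {
            for (i = 1; i <= n; i++) print order[i] ":" servers[order[i]]
        }
    '
}

# Real usage (needs ssh access to the servers):
#   parallel --tag --nonall -j2 -k --slf server.list 'host google.es' | group_a_records

# Demo with canned, made-up input:
printf 'server1\tgoogle.es has address 203.0.113.10\nserver2\tgoogle.es has address 203.0.113.20\nserver3\tgoogle.es has address 203.0.113.20\n' |
    group_a_records
```

A server that ends up alone on its line is a good place to start looking.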
Other use cases:
You can launch more or less the same commands we have seen before with SaltStack and MCollective, and any other that you can imagine. Yeah! Here your imagination is the limit:
- Populate your CMDB with the content of the /etc/resolv.conf of all your servers in a matter of seconds
- Check the LDAP configuration of all your servers
- An “orchestrated grep” or “orchestrated log parser” for all your servers
- Install packages
- Patch servers
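As a taste of the first idea, here is a rough sketch that turns the tagged output of a grep on /etc/resolv.conf into CSV rows that a CMDB import could consume. The function name resolvers_to_csv and the CSV layout are assumptions of this example. The demo input uses placeholder addresses.

```shell
# resolvers_to_csv: read tagged `grep ^nameserver /etc/resolv.conf` output
# on stdin and print one "server,nameserver" row per entry.
resolvers_to_csv() {
    echo 'server,nameserver'
    awk '$2 == "nameserver" { print $1 "," $3 }'
}

# Real usage (needs ssh access to the servers):
#   parallel --tag --nonall -j2 -k --slf server.list \
#       'grep ^nameserver /etc/resolv.conf' | resolvers_to_csv > resolvers.csv

# Demo with canned, made-up input:
printf 'server1\tnameserver 192.0.2.53\nserver2\tnameserver 198.51.100.53\n' |
    resolvers_to_csv
```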
I will add more examples in the future. Here are some for now:
Check connectivity with a local MySQL database
parallel --tag --nonall -j2 -k --slf server.list 'telnet databaseserver 3306'
server1 Trying 127.0.0.1...
server1 Connected to localhost.
server1 Escape character is '^]'.
server1 S
server1 5.5.43-0+deb7u1Ȉ%#}o5YA�Sy*a*M(zRi9!mysql_native_passwordConnection closed by foreign host.
server2 Trying ::1...
server2 Trying 127.0.0.1...
server2 telnet: Unable to connect to remote host: Connection refused
server3 Trying ::1...
server3 Connected to localhost.
server3 Escape character is '^]'.
server3 S
server3 5.5.43-0+deb7u1ǈzsS$XnB�Q\Z0|Dz\@hmgmysql_native_passwordConnection closed by foreign host.
In this case we can see that servers 1 and 3 have a MySQL database installed, with the MySQL daemon listening on the standard port 3306, while server2 refuses the connection.
parallel --tag --nonall -j2 -k --slf server.list 'ping -c 1 google.es'
server1 PING google.es (220.127.116.11) 56(84) bytes of data.
server1 64 bytes from ea-in-f94.1e100.net (18.104.22.168): icmp_req=1 ttl=47 time=5.68 ms
server1
server1 --- google.es ping statistics ---
server1 1 packets transmitted, 1 received, 0% packet loss, time 0ms
server1 rtt min/avg/max/mdev = 5.684/5.684/5.684/0.000 ms
server2 PING google.es (22.214.171.124) 56(84) bytes of data.
server2 64 bytes from mad01s24-in-f3.1e100.net (126.96.36.199): icmp_seq=1 ttl=45 time=156 ms
server2
server2 --- google.es ping statistics ---
server2 1 packets transmitted, 1 received, 0% packet loss, time 0ms
server2 rtt min/avg/max/mdev = 156.197/156.197/156.197/0.000 ms
server3 PING google.es (188.8.131.52) 56(84) bytes of data.
server3 64 bytes from mad01s24-in-f3.1e100.net (184.108.40.206): icmp_seq=1 ttl=45 time=36.0 ms
server3
server3 --- google.es ping statistics ---
server3 1 packets transmitted, 1 received, 0% packet loss, time 0ms
server3 rtt min/avg/max/mdev = 36.018/36.018/36.018/0.000 ms
All servers can ping google.es.
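Reading the output works for three servers, but the interesting ones are usually those that said nothing at all. A possible way to spot them is to compare the tags in the output with the server list. The sketch below does that with sort and comm. The function name silent_servers is hypothetical, and it assumes mktemp is available, as it is on common Linux distributions.

```shell
# silent_servers LIST: read `parallel --tag` output on stdin and print
# the servers from LIST that produced no output line at all.
silent_servers() {
    responders=$(mktemp)
    # The first field of every tagged line is the server name.
    awk '{ print $1 }' | sort -u > "$responders"
    # Keep only the names that are in LIST but not among the responders.
    grep -Ev '^[[:space:]]*(#|$)' "$1" | sort -u | comm -23 - "$responders"
    rm -f "$responders"
}

# Real usage (needs ssh access to the servers):
#   parallel --tag --nonall -j2 -k --slf server.list 'echo up' | silent_servers server.list

# Demo with canned input in which server2 never answered:
list=$(mktemp)
printf 'server1\nserver2\nserver3\n' > "$list"
printf 'server1\tup\nserver3\tup\n' | silent_servers "$list"
rm -f "$list"
```

In the demo only server2 is printed, since it is the one missing from the canned output.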
A word of caution about orchestrators:
Remember what Uncle Ben says:
With great power comes great responsibility
Orchestrators are very useful tools, but they can do a lot of harm if they are not handled carefully, as you can launch an inappropriate command on many servers in parallel 🙁
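One cheap safety net is to make yourself confirm before anything runs everywhere. This is only a sketch of the idea, not a feature of GNU parallel: the function name confirm_targets and the prompt text are invented for this example.

```shell
# confirm_targets LIST 'COMMAND': show what is about to happen and ask first.
# Returns 0 only if the operator types exactly "yes".
confirm_targets() {
    # Count usable entries, skipping blank lines and '#' comments.
    count=$(grep -Evc '^[[:space:]]*(#|$)' "$1")
    printf 'About to run on %s servers: %s\n' "$count" "$2" >&2
    printf 'Type "yes" to continue: ' >&2
    read -r answer
    [ "$answer" = "yes" ]
}

# Usage (not executed here):
#   cmd='uptime'
#   confirm_targets server.list "$cmd" &&
#       parallel --tag --nonall -j2 -k --slf server.list "$cmd"
```

Anything other than "yes", including just pressing Enter, makes the function fail, so the parallel command after && never starts.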