A bold new Ruby world

It’s something different

For a long time now I’ve been put off by Ruby. My interactions with it have been limited, and most of my understanding of it comes from Puppet. I’ve found it a bit of a pain, but the truth is that has nothing to do with Ruby as a language; it was more the packaging of gems and so forth. I really like the idea of yum repos and packaging systems, but Ruby uses gems, and I still have no idea what they really are other than libraries to be used by your Ruby program. Either way, because of the quirks of yum repos and the lack of maintenance, you aren’t always able to get the right version of the rubygems you need for the application you are running. This alone was enough for me to avoid looking at Ruby as a go-to programming language.

In the past I’ve traditionally done my scripting in Bash. If things got difficult in Bash or it wasn’t quite suitable I’d fall back to Perl or PHP (PHP is definitely my go-to language), but with that said I can count the number of scripts I’ve had to write in Perl on my hands, and where possible I always go with Bash. Why? Well, why not? It’s easier for most sysadmins without programming backgrounds to follow, as in most cases you are using system commands combined within a framework of programming.

Which leads onto an interesting side note: why are there sysadmins that can’t program? I guess it happens, and to be honest I was once described as having “no natural programming ability” by one of the college tutors, so I’m not saying I’m good. I do think that every sysadmin needs a fundamental understanding of conditionals, operators, looping and scoping… Again, I’m not saying I’m brilliant, but I’ve had to learn, and I also force myself to learn by writing scripts for things. A sysadmin who can’t write a script is as good as a lithium coated paper umbrella in a thunderstorm.

Moving along

So what was my first venture into Ruby? A simple monitoring script for Solr indexing. I thought about doing it in Bash, then quickly changed my mind; in short, I was dealing with JSON and needed an easy way to fetch it and a slightly better way of handling the output.

This is something I’ve done in the past in PHP, so I thought it’d be a good comparison. I can honestly say I was rather surprised at how easy it was to get it working. I managed to Google for some code that fetched the JSON data, and I understood how to use it really easily; it wasn’t all obfuscated like some Perl stuff can be.

From what I can see thus far it is quite a reasonable language. It’s got some useful features and some flexibility, but rather than being like Perl, with hundreds of ways to do the same thing, it has a small selection of ways to do each thing, so you can choose whichever is appropriate or just one that suits your coding style.

I’m tempted to start writing something a little more complicated to see how it copes with that. I have no doubt it’ll be okay, but until I try I will not know.

So what did my first adventure into Ruby look like:

require 'rubygems'
require 'json'
require 'net/http'

def get_metric(query, base_url)
	url = base_url + "?" + query
	resp = Net::HTTP.get_response(URI.parse(url))
	data = resp.body
	# we convert the returned JSON data to native Ruby
	# data structure - a hash
	result = JSON.parse(data)

	# if the hash has 'Error' as a key, we raise an error
	if result.has_key? 'Error'
		raise "web service error"
	end
	return result
end

#	Get Arguments or default
url = nil
index = nil
# assumed default query; the original default value is not shown
query = "action=REPORT&wt=json"

if ARGV.length == 0
	print "You must specify one of the following options\n\n-u\thttp://example.com/path\tREQUIRED\n\n-q\taction=REPORT&wt=xml\n\n-i\tindexname\tREQUIRED\n"
	exit 1
else
	# a while loop, so that count += 1 really does skip the option's value
	count = 0
	while count < ARGV.length
		case ARGV[count]
		when "-i"
			if ARGV[count+1] != nil
				count += 1
				index = ARGV[count]
			else
				print "No argument for option -i\n"
				exit 1
			end
		when "-q"
			if ARGV[count+1] != nil
				count += 1
				query = ARGV[count]
			end
		when "-u"
			if ARGV[count+1] != nil
				count += 1
				url = ARGV[count]
			else
				print "No argument for option -u\n"
				exit 1
			end
		end
		count += 1
	end
	if (url == nil || index == nil)
		print "You must specify a URL with -u <url> and -i index\n"
		exit 1
	end
end

begin
	rs = get_metric(query, url)
	lag = rs["Summary"][index]["Lag"]
rescue
	# an unknown index makes the lookup above fail on nil
	print "Invalid Index\n"
	exit 1
end
# \d+ rather than \d*, so an empty match is never printed
regex = Regexp.new(/\d+/)
lag_number = regex.match(lag.to_s)
print lag_number, "\n" if lag_number !=nil

Something like that. I know it’s not brill, but it’s a starting point; the next thing I write is probably going to make this look rather small by comparison.
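For what it’s worth, the hand-rolled argument loop could probably be replaced with OptionParser from Ruby’s standard library. This is only a minimal sketch, not what the script actually uses: the parse_options name, the long option names, the solr_lag.rb banner and the default query are all assumptions made for illustration.

```ruby
require 'optparse'

# Parse the same -u, -q and -i switches as the script above.
# Returns a hash of options; raises OptionParser::ParseError when a
# switch has no value or a required switch is absent.
def parse_options(argv)
  # Hypothetical default query, chosen for this sketch only.
  options = { query: 'action=REPORT&wt=json' }

  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: solr_lag.rb -u URL -i INDEX [-q QUERY]'
    opts.on('-u', '--url URL', 'Base URL (required)') { |v| options[:url] = v }
    opts.on('-q', '--query QUERY', 'Query string') { |v| options[:query] = v }
    opts.on('-i', '--index NAME', 'Index name (required)') { |v| options[:index] = v }
  end

  # parse (without the bang) leaves the caller's array untouched
  parser.parse(argv)

  # OptionParser does not enforce required switches itself, so check by hand
  missing = [:url, :index].reject { |key| options[key] }
  raise OptionParser::MissingArgument, missing.join(', ') unless missing.empty?

  options
end
```

Calling parse_options(ARGV) would then give back a hash with :url, :index and :query, and the script could rescue OptionParser::ParseError to print its usage message instead of checking each switch by hand.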

Anywho, that’s all on Ruby, wonder if it’ll catch on.

Understanding Risk

The short version

Stuff happens, move on.

The long version

Risk management is a really interesting topic. I know there will be lots of people out there falling asleep at just the thought of risk management; well, to you I say Hah! If you find risk management dull you’ve probably never had the fun of thinking through 101 different ways in which something could fail, and that requires a great use of imagination!

When considering risk there is a tendency, from a sysadmin point of view, to get stuck in the technical detail, i.e. if Node X dies we lose service Y. That is fine, it is a valid risk, but moving past this is kinda vital, predominantly because most technical risks can be avoided with change processes or redundancy and high availability. After the technical risks you end up in environmental risks, the “what if…” risks, for example “What if a power failure occurs?” Great, these are environmental and you’ve chosen a provider that has UPSs. Wonderful. Do they have generators? Diesel stored on site? In multiple containers? With a delivery schedule with multiple suppliers in the event of an emergency? Divergent power sources?

Okay, nothing to panic about here, these are just common sense issues. Regardless of all of the mitigations that are in place, you could just run 2 sites, 30 miles apart. But what if you are using the same provider for both sites? What about the financial collapse of your hosting provider?

Okay, so being totally paranoid, you have 2 providers each 30 miles apart, each with UPS, redundant generators, divergent power sources, SLAs with fuel providers, and a free-air-cooled data centre with backup air conditioners. Great, good job… Wrong! Where are the backups? Are they both in the same country? Same planet?

I guess the laboured point is you can’t mitigate everything; even if you think you can, you can’t.

So what do you do?

Kick back and relax, the problems will solve themselves! Not quite, but not far from the truth either. You have to be pragmatic; you have to consider what level of risk is affordable and justifiable. Remember that mitigating risk often costs money, and that it is very easy for senior management bods to pull you over hot coals when something fails. They will ask “How did this happen?”, and it’s probably worth noting at this point that you do not want to reply with “We didn’t have a suitable DR plan”. That’s not going to wash.

Luckily for you, you just have to come up with all the risks you can and a number of solutions that mitigate varying numbers of those risks, then let someone else make the call about what is an acceptable amount of risk and what the business can live with.

It may also help to plot your risk management strategy against your one-year or three-year strategy, or against the growth of the solution, so there are known points at which a certain amount of resilience is needed.

For example, you launch a new website. You don’t know if it will be popular or not, and you don’t know if it will be profitable. So for this solution, what is wrong with just ensuring you have a decent backup? Even if it is to local disk and not “offsite”, that’s better than nothing. NB: I would highly recommend you at least make a regular local copy, or better yet store the website in SVN as well and back that up…

This solution has a cheap and reasonable risk management policy. It may occasionally go down for an unknown period of time. Worst case scenario, you have to apologise to all the users, promise to make it better and actually make it better (always do what you say you are going to do…).
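The regular local copy mentioned above doesn’t need to be clever. As a rough sketch, assuming the site lives in a single directory and that a timestamped folder per copy is good enough, something like the following would do. The local_backup name and both paths are hypothetical, and it does nothing about databases or pruning old copies.

```ruby
require 'fileutils'

# Copy site_dir into a new timestamped folder under backup_root and
# return the path of that copy. The paths are placeholders for this sketch.
def local_backup(site_dir, backup_root, now = Time.now)
  target = File.join(backup_root, now.strftime('%Y%m%d-%H%M%S'))
  FileUtils.mkdir_p(backup_root)
  # target does not exist yet, so cp_r creates it as a copy of site_dir
  FileUtils.cp_r(site_dir, target)
  target
end
```

Run from cron with something like local_backup('/var/www/site', '/var/backups/site'), it gives you a copy to fall back on, which is most of the point at this stage.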

As time goes on you can always add in additional sites and better backups. Always go for the solution that gives you the best bang for your buck. For example, if you need off-site backups, why not run two sites in high availability and do local backups in each? You get more throughput and better resilience.


You cannot mitigate everything, so don’t try. Look at what really is important and make sure you can recover. Have a plan so that if customers hit number X or the solution’s profitability reaches y% you’ll add in the additional risk mitigation.
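If it helps to make that plan concrete, the trigger points can be written down as data and checked against the current numbers. This is just an illustrative sketch: the metrics, the thresholds and the actions are made-up examples, not recommendations.

```ruby
# Hypothetical triggers: each names a metric, the level at which it fires,
# and the mitigation to add once that level is reached.
TRIGGERS = [
  { metric: :customers,     threshold: 10_000, action: 'add off-site backups' },
  { metric: :profit_margin, threshold: 15,     action: 'add a second site' }
].freeze

# Return the actions whose threshold the current figures have reached.
# Metrics missing from the current figures count as zero.
def due_mitigations(current, triggers = TRIGGERS)
  triggers.select { |t| current.fetch(t[:metric], 0) >= t[:threshold] }
          .map { |t| t[:action] }
end
```

Feeding it the current figures, for example due_mitigations(customers: 12_000, profit_margin: 5), tells you which of the agreed mitigations are now due, so the decision was made calmly in advance rather than in the middle of an outage.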