Why is the UK Government struggling with IT?

Recently I’ve had an insight into how the government is conducting it’s IT. It’s eye opening. If I had to sum it up, it’s like applying 1990’s approaches to current practices, but what does that mean, let me explain.

Many moon’s ago, the UK government decided it was not an IT specialist and that it should not be in the business of running IT. This lead to it outsourcing a lot of its core IT to often large organisations who on the whole I imagine did make it much more cost effective, most likely at the cost of efficiency. Fast forward and over the years there have been many changes to encourage open source technologies and agile methodologies no doubt to address these inefficiencies that have crept in. It was probably not predictable that if you out source it there’s no incentive for them to do it efficiently. Even now with the new technologies there’s mindset issues holding back on unlocking the benefits to a DevOps style approach.

devops

What do I mean by a DevOps style approach? I mean that people should be encourage to use evidence to drive decisions, Automate the process, be encouraged to fail and measure the results against expectations. Based on what I’ve been able to observe over 2 months there’s a number of issues stopping the deliver of true efficiency and agility.

  1. Lack of technical management
  2. Too many consultant / IT outsourcing companies
  3. Mentality and approach barriers

Let’s tackle the causes of these and how to fix them one at a time.

1, Due to the outsourcing approach and the ideal that government does not do IT the technical decisions are being┬ámade by people in the government who are not adequately having the ┬áproblem explained to them. This is happening because the consultancies do not wish to harm their business by being seen as being ‘difficult’ or by saying “No” to release schedules. This also does not allow for a clear strategy to be defined for the entire department to work towards as there is not central guidance for the consultancy groups to adhere too. In 2 months I have not met anyone who is in technology within the department I’m working that is a position above me.

2, Due to the preference these days to use small consultancies and to bring in subject matter experts in various fields there is a lack of cohesion between styles and approaches to delivery. Everyone is pulling in slightly different directions, this is made worse by the lack of central technical leadership which is why it is the biggest problem that needs fixing.

Stop trying to change things

3, I am not lying when I say I have been told every week for 8 weeks “Stop trying to change things”. There is a culture that has developed where if you speak out you are seen as difficult. If you suggest improvements that require sign off or budget it does not happen because people are scared to approach the subject as they will have to justify how it speeds up delivery, no one cares about cost savings or efficiencies. People can and will change their outlook when it can be evidenced that change is benefiting the process but that requires point 1 again, good leadership and the ability to explain back reasons why fixing things up is worth while.

I’m not saying this is easy, there are payment deadlines that need to happen. It was pointed out to me that when it comes to government if it says it is going to do it it really must, no if’s or buts. Any failure is political canon fodder and the general public are entirely scathing when things don’t happen on time. This makes it harder than it would typically be, however, not impossible.

As a UK taxpayer I’m utterly shocked at how bad it really is, I had suspicions based on what I’d heard or read but I wasn’t expecting people to be so worn down and unwilling to change or for the consultancies to be so scared of speaking up but I guess when you have a revenue stream dependant on the government you wouldn’t want to cut that off. I guess ultimately my concern is that new technology will be implemented without fixing the root cause of the issue, the culture and organisation. No doubt if anyone was to read this of any power they would insist they have checks and measures are in place to ensure this can not happen, the reality is the people deciding are not qualified to know if it is a good deal or not.

How to do DevOps in an enterprise

Overview

Over the years we have seen fads come and go and trends ebb and flow as business requirements and drives change. DevOps has however been rather resilient so much so that all of the enterprises believe they now need it, the interesting point here is that they still don’t know why, they just know everyone else has it and if they don’t their behind the times.

In some ways this in its self is worrying because the enterprises need help and guidance. They really need to be reading through this book: Next Gen DevOps: Creating the DevOps organisation

There’s an increasing trend to have the DevOps teams literally sit between Dev and Ops, companies doing this should stop. Here’s a couple of thing not to do and examples.

What Not to do

Sky: full post assuming that is not there see this pic:
Screen Shot 2015-03-25 at 22.48.49

Defra: I know from people that work there and various jobs that they implement the same mechanism, A team makes the app, a team builds the app and packages it and another team deploys the app, so many broken processes I feel for them but it is ultimately what has lead to these – 1 & 2

What both of these enterprises have done is jump on a band wagon and some how understood DevOps to be the thing that makes deliveries smoother and therefore sit them between the dev and the ops, mainly as they have an existing Dev team that works and an existing Ops team that works but they don’t have a DevOps team.

WallOfConfusion_Release

Lets examine this, Dev one side Ops the other. DevOps is meant to, and I’ll spell this out simply for Sky and Defra…. break down the wall of confusion. Not as these guys have done which is to implement DevOps between a Dev Team and an Ops team to help make it more agile.

If Anyone, Anyone… can explain how adding a team in the middle of your existing Dev and Ops team magically brings you improvements in agility I’d be interested in knowing how. Before you say something like… “They make all the tooling and stuff to make it all quick and to deploy the continuously” You should refer to the book from before and go get a job in DevOps (you’re not in it by definition, sorry). We’ve always had Build engineers for that so your argument is to basically do what we always I’d but with more people who are slightly less code focused, alright…

Then there’s the people that think simply by making Developers do operations they get DevOps, Infrastructure as code they say, yay. I’m not saying developers can not learn operations, likewise this rant applies to ops doing dev… anyway the point is forcing one skil set to do the other is hard.

What to do

Firstly realise there is no right answer. Secondly realise experimenting and trying new things is necessary.

Start with a good foundation. Get a Senior Dev and a Senior DevOps guy in the team, for the DevOps guy I consider these the 3 most important things.

1, Can program in an Object Orientated language and understands Classes, modules, inheritance, recursion etc etc
2, Has a fundamental understanding of how the OS works at the filesystem and process level
3, Appreciates pragmatism over perfection

The hard thing for anyone going from Dev to Ops or Ops to Dev is that they have a massive learning curve, and they may not realise what it is until many screw ups have happened but having these people in the team is massively important because they will bridge the gap between the Dev and the DevOps and trust me there is one.

Once you get these 3 or 4 people in a team the next most important thing to do is to make that team fully accountable and responsible for the end to end service. If anyone in the team is not okay with supporting the service they wrote and designed in production, fire them and find someone new. it’s paramount to have the buy in within the team of accountability of the product and services in production, it forms a fundamental pillar of continuous feedback. I did something bad, it bit me / my team, they told me I did it bad / I noticed I did something bad, I made it better, we all benefited.

Summary

By having the right people working in a close nit group and empowering them to own the whole solution front to back is the only way you can realistically implement DevOps and have meaningful gains.

Foundation building is important

The man who built his house on sand

You are probably familiar with the proverb about the man who built his house on sand, if not read this. It’s important to have a solid foundation to work from when you want to start considering Continuous Delivery (CD) or Continuous Integration (CI).

From an IT perspective this would be like a CTO dictating that CD is the only way to do things; which when poorly managed leads to something that is poorly tested, poorly structured and hard to innovate on. By the time the pessimistic IT bod mentioned it to his boss and it was turned into management speak, then translated to senior management speak it ended up being mistranslated into something completely different.

IT bod “It’s taking ages because the puppet manifests are a complete mess where we had to keep rushing stuff”
IT Bods’ Manager “It’s taking longer than expected as the work is more complicated but it will be done soon”
IT Director “We are spending our time making sure we do this right, we don’t cut corners”
CTO “We have a really stable well produced system”

Yay. I’m 90% sure this is how it works… People become afraid to say how bad it is, but from experience I can honestly say when you start telling people bluntly they stop hassling you, they also stop talking to you so it is a hard thing to make better, it’s harder when the whole train of people desperately want to come across as having done an awesome job.

Imagining that situation, and adding in people that are brought in to deliver just that, while being asked to do lots of other stuff that isn’t in scope, you can end up with something that with lots of careful hand holding produces a build, maybe it even builds an environment with only 2 or 3 hours hand holding, maybe it’s good enough for production using virtual box Who knows.

Typically these nightmarish situations exist only because someone wasn’t clear in defining what the problem was, or when they did they allowed themselves to be pushed over. Well I’m saying it’s not good enough, everyone in the chain has a responsibility to make sure they communicate in clear and uncertain terms what the problem is so there is no ambiguity about how bad a situation is.

foundations

The latest trend at the moment is all towards Continuous Delivery (CD) and Continuous Integration (CI) and all these over wonderful DevOps words. Although it is possible for you to take code and deploy it automatically it is stupid to do so without a sufficient understanding of what the consequences could be. As such it is important to identify what you need to be able to deliver effectively before working out what you need to do to achieve CI or CD.

So before considering CD or CI you need to be able to do the following things, minimum:

  • Easily differentiate between each configuration release
  • Easily differentiate between each infrastructure release
  • Easily differentiate between each application release
  • Be able to build each application server from scratch
  • Be able to build the infrastructure from scratch
  • Be able to track work through a process i.e. request to release for new Infrastructure, Application code or configuration
  • Have an agreed process for peer review of changes
  • Have an agreed release process
  • Be able to manually follow the processes that are in place
  • Adequate test coverage of infrastructure
  • Adequate test coverage of Configuration
  • Adequate test coverage of Application

Once you have those basics in place you can start to look at automating each step, Skip the list at your peril. Let’s touch on a few for clarities sake. “Easily differentiate between XXX” The reason for these is that at some point someone will say “it’s not working and you broke it” and you want to turn that from an opinion based approach into a factual one, and the easiest way to do that is a simple diff between the previous and the current release, no ambiguity, only facts.

Lets look at the “Be able to build XXX from scratch” This is really important, the only way to guarantee that your box is in the state you know it to be in is to build from scratch, use an golden disk, AMI or plain OS, it doesn’t matter as long as you bring that box up from scratch and build it through to a working state (ands off). I’ve had conversations with people that don’t get it, some times the arguments go like this… “We don’t need to because everything is is puppet” well, Lies… No one puts everything in puppet and even if you did, I logged on and stopped the process or I installed a package that wasn’t in puppet or I started a service or I changed a file that was’t etc etc etc… No excuses, build from scratch; it’s really important for the message this sends to the rest of the business which is consistency through process.

Processes are important, they describe the things you will and won’t do, they need to be public, they need to be really simple and they can then be automated, Starting without a process is just going to mean re-working steps as others in the business have different opinions about how it should be done so it’s good practice to sort that out as soon as possible.

The last set “Adequate test coverage of XXX” This needs to be in place beforehand, these tests will become your computerised approver so at the very least it should do everything the human counterpart does to check the system and they need to evolve as time goes on to include more and more tests, when the confidence is in the testing it shouldn’t matter when you release or ho often as you have a set of tests that you and the business trusts.

Summary

It’s important to try and not rush into the final solution, everybody wants it, it’s everyone responsibility to check and cross check that the process is being done sensibly and to call foul if anyone tries to change the process or the requirements. The only way to do this is with some sort of consistency and that should be the driving force, the business needs to accept that if the pipeline is broken the releases don’t happen. but when the pipe line is fixed they should all go fine. This turns the whole release cycle into a maintenance process rather than an active involvement in each release and that will over time be more and more stable and beneficial to the business as a whole. So before trying to do CD or CI, make sure you can put ticks next to the bulleted list above else you’re just wasting time.

Releasing your first Devops Application

First the worry

When it comes to releasing the first version of an application it’s always worth weighing up the constraints of your environment and the time frame in which the task was delivered versus the skill set available. Inevitably as a skilled DevOps professional you want to do a good job, well done you; however you have to be strong and realise it is not about delivering perfection from day one but about the journey you must take to get there.

I recall the first deployment I did for a version 1 and every time I do one since then I get better, be it a bit more focused or a better starting point. The very first one I did was all over the place, no real configuration management, quite a few manual steps but a well written process, unfortunately that project remained in the depths of secrecy and I ended up moving on.

Constantly I see over engineering and complication added to projects and the root cause of this is worry, I know, I use to be there doing it, it is difficult to step back and be objective to what the business needs, but as a DevOps professional that is your job. When delivering a solution try and remember these things to help you worry less and focus more:

  1. Before being perfect you must first just “be”
  2. When in doubt, do less
  3. If you do not know when the site is down you will not have a job
  4. Always have a backup

Then the delivery

The above list is rather quite useful, use it as a bit of guidance. Starting with point 1, some elaboration; when delivering a solution the most important thing is to deliver the solution, so many people forget this part and focus on the technicalities or whether or not it is the “best” way to deliver the solution. In reality, who cares, no one will care when you are in that meeting explaining why you’re late and have not got a working solution.

Getting stuck in the detail is a horrible place to be and sometimes it gets too involved or too complicated leading to much discussion and inevitably the solution comes out complicated and will take a while to deliver, in these situations point 2 comes in, just do less. It sounds silly but if you’re rushing around struggling to meet a deadline then you need to take things out of scope, and focus on what the actual solution needs to be, maybe you have to have a manual step, then at a later point you can automate it.

The last two points are along the same lines, and those lines are things that get you fired. If your site is down and you don’t know that it is totally down, that’s a bad thing; likewise loosing data is considered pretty poor. However do not get stuck in the trap of assuming you must have full monitoring of every server or that the backup needs to be anything more than a cron job for now.

The “trick” is always around identifying what needs to be done and could be done, by focusing on what needs to be done first you can then come back to improve the rest.

build, improve, rinse, repeat

As touched on earlier You are allowed to cut corners and focus on what is necessary, failure to do this will just lead to delays and a business that is getting rapidly turned off of DevOps. The first release you do can be complete and utter crap, it can be all manual, with nothing more than a simple web check on port 80, that is okay. The important thing is you deliver to the deadline, You have mitigated the main risks of not knowing when the site is down or the potential loss of data, heck even having single points failure are allowed as long as you can clearly identify what the risk is and a solution if that were to happen. In fact, I’d almost go as far to say this is expected.

The key is as always to improve, little and often. Step 1, Manual, Step 2, automate what is easy, Step 3, automate the rest. It has never been and will never be about perfection from version 0.1 onwards you just need to improve a little each time in line with that golden view of what perfection is. As long as you know what the end goal is you can work towards it, just don’t get carried away by trying to deliver it all for the first version.

Sinatra – partial templates

Singing a different song

Firstly apologies, it’s been over a month since my last blog post but unfortunately with holidays, illness and change of jobs I’ve been struggling to find any time to write about what I’ve been doing.

A few months back I did a little research into micro web frameworks, did quite a bit of reading around Sinatra, Bottle and Flask. To be honest they all seem good, and I want to play with Flask or bottle at some point too, but Sinatra is the one I’ve gone for so far as he documentation was the best and it seemed the easiest to use and he easiest to extend, not that I’ll ever be doing that!

Either way I thought I’d have a bit of a play with it and see if I could get something up and working and, locally for now due to a lack of documentation from my hosting provider… I have re-created my website Practical DevOps within Sinatra using rackup, bundler and ERB templates.

Now there’s a few reasons I did this, one, I wanted to stop updating every page when ever I needed to update the header or footer of a page, two, I want to implement Split testing (also know as A/B Testing) using something like Split. With all of this in mind and the necessity of having a bit more programmatic control over the website it seemed like a good idea to go with Sinatra.

The Basics of Sinatra

By default Sinatra looks for a static content in a directory called “public” and will look for templates in a folder called “views” which if needed can be configured within the app. So for basic sites this works fine and would have worked fine for me, but I really wanted partial templates to save having to enter the same details on multiple pages, and this can be done with Sinatra partial.

!/usr/bin/ruby

require 'sinatra'
require 'sinatra/partial'
require 'erb'


module Sinatra
  class App < Sinatra::Base

   register Sinatra::Partial
   set :partial_template_engine, :erb

    #Index page
    ['/?', '/index.html'].each do |path|
      get path do
        erb :index, :locals => {:js_plugins => ["assets/plugins/parallax-slider/js/modernizr.js", "assets/plugins/parallax-slider/js/jquery.cslider.js", "assets/js/pages/index.js"], :js_init => '<script type="text/javascript">
        jQuery(document).ready(function() {
            App.init();
            App.initSliders();
            Index.initParallaxSlider();
        });
        </script>', :css_plugins => ['assets/plugins/parallax-slider/css/parallax-slider.css'], :home_active => true}
      end
    end
  end
end

So let’s look at the above which is simply to serve the index of the site.

   register Sinatra::Partial
   set :partial_template_engine, :erb

The register command is how you extend Sinatra, so the sinatra-partial gem when it is installed simply drops it’s code in the sinatra area and when you call register all of the public methods are registered, this allows you to do stuff like this with magic, or you can use it in the ERB template like this. The next line simple tells sinatra to use ERB rather than haml, I chose this because of puppet and chef all using erb and as a result i’m a lot more familiar with that.

 #Index page
    ['/?', '/index.html'].each do |path|
      get path do
        erb :index, :locals => {:js_plugins => ["assets/plugins/parallax-slider/js/modernizr.js", "assets/plugins/parallax-slider/js/jquery.cslider.js", "assets/js/pages/index.js"], :js_init => '<script type="text/javascript">
        jQuery(document).ready(function() {
            App.init();
            App.initSliders();
            Index.initParallaxSlider();
        });
        </script>', :css_plugins => ['assets/plugins/parallax-slider/css/parallax-slider.css'], :home_active => true}
      end
    end

One of the nice things with sinatra is it’s simple to use the same provider for multiple routes, and the easiest way of doing this is to define an array and simply iterate over it for each path. Sinatra uses the http methods to define it’s own functions of what should happen, so a http get requires a route to be defined using the “get [path] block” style syntax and likewise the same for post, delete etc, see the Routes section of the sinatra docs for more info.

The last section is calling the template, so typically the syntax could just be “erb :page_minus_extension” which would load the erb template from the “views” directory created earlier. If you wanted to pass in variables to this you would define a signal ‘:locals’ which takes a hash of variables. All of these variables are only available to the the template that was called at the beginning, so to get the variables to the partial requires some work within the template.

Now within the the views/index.erb file I have the following:

<%= #include header
partial :"partials/header", :locals => {:css_plugins => css_plugins, :home_active => home_active}
%>

Partial calls another template within the views directory, so as I have a partial called header.erb in views/partials/ it loads that, and by defining the locals again from within the template I am able to pass the variables from index into the header or any other partial as needed.

Okay, that’s all folks, Hopefully that’s enough to get people up and running, have a good look at the examples in the git projects they’re useful, and be sure to read the entire Intro to sinatra, very useful!

What challenges you?

Over the last few weeks

I have been wondering what most people find challenging in the “modern” IT world. There’s been a recent upsurge in tools and technology that address most problems which only leaves me to wonder what is filling that gap? What is the current big annoying problem, maybe it’s not being able to push your architecture into multiple clouds, or having to live with the constraints of small root disk volumes; Who knows? Hence the poll :)

Configuration management alone is not the answer

Everything in one place

Normally when businesses start out building s product, especially those that don’t have the pre-existing knowledge of configuration management, tend to just throw the config on the server and then forget what it is. This is all fine, it’s a way of life and progression and sometime just bashing it out could prove very valuable indeed, but typically this becomes a nightmare to manage. Very quickly when there is then 100 servers all manually built it’s a pain in the arse so then everyone jumps into configuration management.

This is sort of phase 1, everything has become too complicated to manage, no one knows what settings are on what boxes and more time is spent working out if box 1 is the same as box 2. This leads to the need to have some consistency which leads to configuration management, the sensible approach is to move an application at a time into configuration management fully, not just the configuration files.

During this phase of execution it is critical to be pedantic and get as much as possible into configuration management, if you only do certain components there will always be the question of does X affect Y which isn’t in configuration management? and quite frankly, every time you have that conversation a sysadmin dies due to embarrassment.

Reduce & Reuse

After getting to Phase 1, probably in a hack and slash way, the same problems that caused the need for Phase 1 happen. 100 servers in configuration management lots of environments with variables set in them, and servers, and in the manifests themselves and the question starts to be come well is that variable overriding that one, why is there settings for var X in 5 places, which one wins? Granted in configuration management systems there are hierarchies that determine what takes precedence but that requires someone to always look through multiple definitions. On top of having the variables set in multiple locations, it is probably becoming clear that more variables are needed, more logic is needed, what was once a sensible default is now crazy.

This is where phase 2 comes in, aim to move 80%+ of each configuration into variables, have chunks of configuration turned on or off through key variables being set and set sensible defaults inside a module/cookbook. This is half of phase 2, the second half and probably the more important side is to reduce the definitions of the systems down to as few as possible. Back in the day, we use to have a server manifest, an environment manifest and a role manifest each of these set different variables in different places, how do you make sure that your 5 web servers in prod have the same config as the 5 in staging? that’s 14 manifests! why not have 1? just define a role and set the variables appropriately, this can then contain the sensible defaults for that role, all other variables would need to be externalised in something like hiera, or you would need to push them into Facter / ohai.

By taking this approach to minimising the definitions of what a server should be and reducing it down to one you are able to reuse the same configuration so all of your roleX servers are now identical except what ever variables are set in your external data store which can now easily be diff’d.

build, don’t configure

By this point, phase 1 & 2 are done, all is well with the world but still there’s some oddities Box X has a patch level y and box A has a patch level z, or there’s some left over hack to solve a prod issue which causes a problem on one of the servers. Well treat your servers as configurable and throw-away-able, There’s many technologies to help with this be it cloud based with Amazon and OpenStack or maybe VMWare, even physical servers with cobbler. This is Phase 3, build everything from scratch every time, at this point the consistency of the environment is pretty good leaving only the data in each environment to contend with.

Summary

Try and treat configuration management as something more than just config files on servers and be persistent about making everything as simple as possible while trying to get everything into it. If you’re only going to manage the files you might as well use tar’s and if that sounds crazy it’s the same level as phase 1 which is why you have to get everything in and I realise it can seem a massive task but start with the application stack you’re running and then cherry pick the modules/cookbooks that already exist for the main OS components like ntp, ssh etc