Cloud deployment 101 – Part 3

The final instalment

Over the last couple of weeks I have posted a Foundation to what the cloud really is and How to make the best use of your cloud. This week is about tying off loose ends, better ways of working, dispelling a few myths and setting some things straight.

Infrastructure as code

DevOps is not a silver bullet, but it is a framework that encourages teamwork across departments so that code can be deployed to a production environment rapidly.

  • Agile development
    • Frequent releases of minor changes
    • Often the changes are simpler as they are broken down into smaller pieces
  • Configuration management
    • This allows a server (or hundreds) to be managed by a single sysadmin and produce reliable results
    • No need to debug 1 faulty server of 100; re-build it and move on
  • Close, co-ordinated partnership with engineering
    • Mitigates “over the wall” mentality
    • Encourages a team mentality to solving issues
    • Better utilises the skills of everyone to solve complex issues

Infrastructure as code is the foundation of rapid deployment. Why hand-build 20 systems when you can create an automated way of doing it? Using the API tools provided by cloud providers, it is possible to build entire infrastructures automatically and rapidly.
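To make that a little more concrete, here is a minimal Python sketch of the idea: describe the servers you want as data, then let code work out what needs building. Everything in it is made up for illustration (the ServerSpec fields, the desired_fleet and plan helpers, the server names and image name), and it deliberately stops short of calling any real cloud provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSpec:
    """Desired state for one server (all field names are illustrative)."""
    name: str
    size: str
    image: str


def desired_fleet(role: str, count: int, size: str, image: str) -> list[ServerSpec]:
    """Describe `count` identical servers instead of building each by hand."""
    return [ServerSpec(f"{role}-{i:02d}", size, image) for i in range(1, count + 1)]


def plan(desired: list[ServerSpec], existing: set[str]) -> dict[str, list[str]]:
    """Compare the desired fleet with what already exists and list the work."""
    wanted = {spec.name for spec in desired}
    return {
        "create": sorted(wanted - existing),
        "destroy": sorted(existing - wanted),
    }


if __name__ == "__main__":
    fleet = desired_fleet("web", 20, size="small", image="base-image-v1")
    # Pretend two servers already exist and one stray box needs removing.
    actions = plan(fleet, existing={"web-01", "web-02", "old-test-box"})
    print(len(actions["create"]), "to create;", actions["destroy"], "to destroy")
```

In a real setup the "create" and "destroy" lists would be handed to whatever tooling your provider offers; the point is only that the fleet is defined once, in code, and the same run gives the same result every time.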

Automation through code is not a new concept. Sysadmins have been doing it for a long time using Bash, Perl, Ruby and other such languages. The ability to program and understand complicated object-oriented code is becoming more and more important within a sysadmin role; traditionally this was the domain of the developer, and a sysadmin just needed to “hack” together a few commands. Likewise, in this new world sysadmins are drawing on development teams to fine-tune the configuration of application platforms such as Tomcat, or to make specific code changes that benefit the operation of the service.

Using an agile delivery method makes frequent changes possible. At first this can seem crazy: why would you make frequent changes to a stable system? Well, for one, when changes are made they are smaller, so a total outage is less likely between iterations. It also means that if an update does have a negative impact it can be identified and fixed very quickly, again minimising the total outage of the system.

In an ideal world you’d be rolling out every individual feature rather than a bunch of features together. This is a difficult concept for development teams and sysadmins to get used to, especially as they are more used to the on-premise way of doing things.

Automation is not everything

I know I said automation is key: the more we automate, the more stable things become. However, automating everything is not practical and can be very time consuming, and it can also lead to large-scale disaster.

  • Automation, although handy, can make life difficult
    • Solutions become more complex
    • When something fails, it fails in style
  • Understand what should be automated
    • Yes, you can automate everything, but ask yourself: should you?
    • Automate boring, repetitive tasks
    • Don’t automate large, complex tasks; simplify the tasks and then automate

We need to make sure we automate the things that need to be automated: deployments, updates, DR.
We do not want to spend time automating a solution that is complex; it needs to be simplified first and then automated. The whole point of automation is to free up time, and if you are spending all of your time automating you are no longer saving any.
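A quick back-of-the-envelope check helps with the "should you?" question. The sketch below is only an illustration: break_even_runs is a made-up helper and the hours are invented figures, so plug in your own numbers.

```python
def break_even_runs(build_hours: float, manual_hours: float, automated_hours: float) -> float:
    """Number of runs after which the automation has paid for itself.

    Returns infinity when the automated run is no quicker than the manual one.
    """
    saving_per_run = manual_hours - automated_hours
    if saving_per_run <= 0:
        return float("inf")
    return build_hours / saving_per_run


if __name__ == "__main__":
    # Invented figures: a boring 2 hour task, automated in 15 hours,
    # that then takes half an hour per run.
    print(f"simple task pays back after {break_even_runs(15, 2, 0.5):.0f} runs")
    # Invented figures: a complex task that takes 80 hours to automate
    # and only saves half an hour per run.
    print(f"complex task pays back after {break_even_runs(80, 1, 0.5):.0f} runs")
```

If the second number is bigger than the number of times you will ever run the task, simplify the task first and run the sums again.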

Failure is not an option!

Anyone who thinks things won’t fail is being rather naïve. The most important thing to understand about failures is what you will do when there is one.

  • Things will fail
    • Data will be lost
    • A server will crash
    • An update that reduces functionality will make it through QA and into production
    • A sysadmin will remove data by accident
    • The users will crash the system
  • Plan for failures
    • If we know things will fail we can think about how we should deal with them when they happen.
    • Create alerts for the failure situations you know could happen
    • Ensure that the fixes for the common ones are well understood
  • You cannot plan for everything
    • Accept this; have good processes in place for DR, backup and partial failures
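As a rough illustration of planning for known failures, the Python sketch below pairs each alert with the run book that explains the fix. The metric names, thresholds and run book paths are all assumptions invented for the example; a real monitoring system will have its own way of expressing them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Alert:
    name: str
    is_failing: Callable[[dict], bool]  # check run against the current metrics
    runbook: str                        # where the documented fix lives (hypothetical path)


# Known failure situations, each tied to a documented fix.
ALERTS = [
    Alert("disk nearly full", lambda m: m["disk_used_pct"] >= 90, "runbooks/disk-full.txt"),
    Alert("backup overdue", lambda m: m["hours_since_backup"] > 24, "runbooks/backup-overdue.txt"),
    Alert("service down", lambda m: not m["service_up"], "runbooks/service-restart.txt"),
]


def triggered(metrics: dict) -> list[tuple[str, str]]:
    """Return (alert name, run book) for every alert that is currently firing."""
    return [(a.name, a.runbook) for a in ALERTS if a.is_failing(metrics)]


if __name__ == "__main__":
    sample = {"disk_used_pct": 95, "hours_since_backup": 30, "service_up": True}
    for name, runbook in triggered(sample):
        print(f"ALERT: {name} -> see {runbook}")
```

The useful part is the pairing: an alert with no run book behind it is a failure you know about but have not yet planned for.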

Following a process makes it quicker to resolve an issue, so creating run books and DR plans is a good thing. Having a wash-up after a failure, to make sure you understand what happened, why, and how you can prevent it in the future, ensures mitigations are put in place to stop it happening again.
Regularly review operational issues to ensure that the important ones are being dealt with; there’s little point in logging all of the issues if they are not being prioritised appropriately.

DR, backup and restoration of service are the most important elements of an operational service, although no one cares about them until there is a failure. Get these sorted first.
Deploying new code and making updates are nice-to-haves. People want new features, but they pay for uptime and availability of the service. This is kind of counter-intuitive for DevOps, where you want to allow the most rapid of changes to happen, but change still needs control, testing and gatekeeping.

Summary

Concentrate on the things that no one cares about unless there’s a failure. Make sure that your DR and backup plan is good, test regularly that it works, and ensure your monitoring is relevant and timely. If you have any issues with any of these, fix them quickly and put controls in place to ensure they stay up to date.
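Testing that a backup works means restoring it, not just checking that the file exists. The Python sketch below shows one simple way to do that using only the standard library: archive a directory, restore it into a scratch directory and compare checksums. The helper names (checksums, backup, restore_matches) and the sample files are invented for the example, and a real backup tool will have its own restore procedure.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def checksums(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def backup(source: Path, archive: Path) -> None:
    """Write every file under source into a gzipped tar archive."""
    with tarfile.open(archive, "w:gz") as tar:
        for p in sorted(source.rglob("*")):
            if p.is_file():
                tar.add(p, arcname=p.relative_to(source).as_posix())


def restore_matches(source: Path, archive: Path) -> bool:
    """Restore the archive into a scratch directory and compare it with source."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(scratch)
        return checksums(Path(scratch)) == checksums(source)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        data = Path(work) / "data"
        (data / "conf").mkdir(parents=True)
        (data / "conf" / "app.cfg").write_text("workers=4\n")
        (data / "notes.txt").write_text("remember to test restores\n")
        archive = Path(work) / "backup.tar.gz"
        backup(data, archive)
        print("restore verified:", restore_matches(data, archive))
```

Run on a schedule, a check like this turns "we think the backups are fine" into something you have actually proved recently.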

With regard to automation, just be sensible about what you are trying to do: if it needs automating and is complicated, find a better way.

Developers in a sysadmin world

Where to start

As I spend more time engrossed in the world of DevOps, there have been a number of occasions where something has not felt right with the relationship between operations and development. Within the team at work we will be hiring developers to work on the integration back with the development team, improving the build process and refactoring the code to make it quicker to code / build / deploy. All of which is good stuff.

However, first and foremost we run a service; that is the main reason for the existence of the group. Without the correct support framework we will not be offering a service but a really fancy technology exercise, just so we can say we do X or Y or Z, and I worry that is the route we are destined for.

Can Developers do Sysadmin tasks?

Of course they can, why couldn’t they? In the same way a mechanic can paint a car. We are not in the business of stopping people achieving their full potential, so a have-a-go Joe is welcome here. The bigger question is whether they should be doing it. In much the same way as I am not the right person to code a large enterprise product, developers are not the right people to be making decisions about service restarts or process niceness.

So I believe that a developer can do the tasks of a sysadmin, and I believe that with enough training they can get to a point where they are not making random changes to a system to fix a specific problem without understanding the consequences. However, I also believe a good graduate would bring the same level of risk and knowledge to the table, so having an understanding of programming is a plus, but sysadmins aren’t in the business of random changes.

Can a Sysadmin be a Developer?

Sure, why not? Same role in reverse? Almost. I can program, or have programmed, in a number of languages, which I tend not to bring up, so…

  • Pascal
  • Delphi
  • C/C++
  • Java
  • PHP
  • JavaScript
  • Perl
  • Ruby (as of a couple of weeks ago)
  • Bash
  • Awk

So I have quite a few languages in which, within about 10 mins and with a few nudges from Google, I can write something reasonable. I have made a lot of different applications (another list I hear you cry out for!):

  • Maze solvers
  • Text based adventure games
  • Arkanoid
  • Web based route planner; granted, I only drew the maps from 6 million data points with zoom functionality
  • Content management system…
  • Geoblog with Google Maps and email updates / geo tagged pics
  • Web shops
  • System monitor with averages and weekly summaries
  • Bit stream cipher
  • CD to MP3 encoder with CDDB lookups

Just 10 things I’ve written, so I would say I know enough about programming, probably more than necessary for my role.
And yet I still have no interest in being a programmer. So, shoe on the opposite foot, maybe I should be doing some more developer-focused work. It’s been a while, but it could be just what I’m after.

Based on that I already am a part-time developer, in much the same way that a developer is a part-time sysadmin. I mean, their programs run on systems, right…

Who should do what?

Well, developers code, sysadmins admin. I don’t think it gets harder than that. I think it is easy for everyone to agree that a developer’s time is best spent writing code and helping out with specific system scripts or Puppet manifests / Capistrano. It is also very easy for everyone to say that the sysadmin should check the RAM utilisation, RAID configuration, disk layout etc.

If all of that was correct this blog would end here. However, over the past few months something has been niggling at me: every now and then I’m involved in a conversation with a developer which ultimately comes down to “It’s not that hard, just do X or Y”, and it’s this that is my biggest issue with developers.

Let’s take rolling out a new Amazon AMI:

Developer’s approach

  • Deploy new server with AWS tools
  • Login
  • Done

Sysadmin’s approach

  • Start deploying new server with AWS tools
  • Pause, because deploying keys isn’t a good idea or secure if everyone is using the same one…
  • Continue but with a generic “emergency key” configured
  • Check file system layout
  • Realise it’s all on one 6 GB volume, fix issue
  • Create individual users
  • and so on…

Different skill sets. Any monkey in a suit can click a few buttons in a web UI; knowing to split the Linux file system out onto different partitions, or at least understanding the impact of doing so, is the important part.
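The "check file system layout" step is exactly the sort of boring, repeatable check that is worth scripting. Here is a minimal Python sketch that takes mount table text in the /proc/mounts format and reports which directories do not have a file system of their own. The list of wanted mount points (/var, /tmp, /home) and the sample device names are assumptions for the example, not a recommendation for every system.

```python
# Directories we would like on their own file system (an illustrative choice).
WANTED_MOUNTS = ("/var", "/tmp", "/home")


def mount_points(mounts_text: str) -> set[str]:
    """Extract the mount point (second field) from each line of a mount table."""
    points = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            points.add(fields[1])
    return points


def missing_mounts(mounts_text: str, wanted: tuple[str, ...] = WANTED_MOUNTS) -> list[str]:
    """List the wanted directories that are not separate mount points."""
    present = mount_points(mounts_text)
    return [path for path in wanted if path not in present]


if __name__ == "__main__":
    # Sample mount table for a freshly launched server with everything on one volume.
    sample = "/dev/xvda1 / ext4 rw 0 0\nproc /proc proc rw 0 0\n"
    for path in missing_mounts(sample):
        print(f"{path} is not on its own file system")
```

On a Linux box you could feed it the contents of /proc/mounts instead of the sample string.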

Summary

I think the two skill sets can work harmoniously, but there is still a boundary, caused by experience and expertise, and for DevOps it’s about using each other’s strengths and avoiding the weaknesses. There have been times when I’ve been doing OOP PHP, working with inheritance or writing a very complicated script, where it would have been good to have someone I could ask “is this or that better?” I imagine it works both ways, especially when it comes to configuring the system.

Developers tend to be very focused, in my experience, and because sysadmins are more generalist they are looked down on. I hope that changes as DevOps becomes more commonplace and the realisation of harmony comes about. Let’s see if in 6 months there’s another post about how disastrous or successful this integrated approach becomes. It is new ground, and it will be interesting to find out what happens, if nothing else.