Oh no, not Java

How strange

Over the past few months I’ve been writing more and more applications to help maintain and deliver the services we run, from metric gathering to regional DR and anything in between. For a while now one of the developers at Alfresco has been writing a framework that makes it easier to write Selenium tests for Alfresco Share. It takes a lot of the hassle out of looking for certain elements or class IDs, or updating everything if the UI changes. We have been talking about it for a couple of months, and today I decided to get some time to look at it and ask loads of silly questions about Eclipse and Maven and so on and so forth.

It took about 3 hours to get everything set up and working. Most of the time was spent learning to use Eclipse and Maven, with a walk through of what the framework can do, how to extend it and how to do stuff with it. Considering I hadn’t done any Java for 6 years it wasn’t that bad, and within 15 mins of being left to it I had made a class that logged in and searched for content inside the repo.

One of the reasons we’re so interested in it is that, as DevOps, we like simple things and it takes a lot of the hassle out. It also means we get to do some complicated things with Share while only having to worry about what we want to test or measure. All of this got me thinking about the languages we use and the problems they solve.

Right tool, Right job

Currently in our team we are using Bash, Ruby, Python and Java. Bash is simple and can achieve some good results, although it is typically quite slow. If it is a short script it will usually end up in Bash, although we also do our orchestration in Bash: it takes us from bare metal to a working OS by triggering whatever apps we need or setting config.
Ruby is the language of choice for me when I need to do something that requires data to be manipulated, actions to be retried or anything that is more than procedural, and you can rely on it to do a good job in a reasonable time.
Python is new to the team. It fills a gap in that it’s as easy to write as Ruby but more scalable at size. I haven’t done any Python yet so I can’t really comment, but the web app that has been built with it in a couple of weeks is quite impressive. Java is more complicated and harder to write but can deliver more complex apps, although I’m not sure you typically need to make apps that complex.

So I’m not a fan of Java, mainly because I think it takes a long time to get anything of any value out of it, especially on a small task. If I had to write an application to manage backups I would not go to Java, as it’s like using a bazooka to hit a fly. Likewise, using Bash is like using a feather duster, whereas Ruby and Python fit nicely in the middle. Well, after today’s experience I’m glad I’m doing this in Java. I would have spent weeks making something half as good in Ruby just to avoid using Java, and I guess it’s not really that bad.

I could have wasted time doing it all from scratch or just taken what’s already written, so I stole, like any good DevOps guy would.

Summary

I’m probably going to spend some more time in Java over the next week writing something a bit more useful than today’s experiment, so hopefully I will still be optimistic about it all. Maybe I’ll remember why I don’t like Java or maybe I’ll change my opinion, who knows!

DevOps team DNA

Hi, this is my first post on Matt’s blog. I’ve been an avid supporter of his blogging for a while and today got an invite to contribute. So here’s my post (created very quickly before he changes his mind).

My job has always been within an operations department of software product companies. I started at a small company as ‘everything’ support and slowly drifted towards a specialisation in the more recently branded DevOps-y areas as I made my way through various acquisitions and mergers. Over the past couple of years I’ve found myself building DevOps teams. During that time I’ve discovered some of the things that work and almost everything that doesn’t work (or it feels like that :) ).

Some of the things that have worked (for me anyway)

Obviously these are going to be quite subjective and I doubt they will work for everyone. I’ll focus mostly on what I think are the key ingredients of a successful team. Maybe some people will find it interesting. Bear in mind that this only really applies to an operations team that supports a Cloud service.

I’m not a big football fan but I can draw some parallels between football managers and DevOps teams. You don’t see Arsenal winning and losing games based on their process redesigns. I may be simplifying, and I’m sure tactics play a large part, but I believe you get quite a bit more out of a team when you have excellent players. Players who excel in different areas. My teams tend to be 5 to 7 players nowadays and between all of us we need to cover a few areas.

The first is product knowledge. If you have a product guru in your team then you’ve got a productivity catalyst. So many aspects of our work involve investigating whether issues are product vs config, and whether we can improve things from an operational perspective that will result in the product running better. The most recent team has a Product Architect and he’s awesome. He’s on the cutting edge of ideas for the product, for AWS and for all of the supporting technologies. Having a dedicated resource to do all of this in the background is great: it means that when we automate his prototypes and release them we get the maximum benefit. Recent examples include our Public API work and the work being done on our Amazon architecture to improve speed (CDNs etc).

The second role I’ve always tried to fill is an engineer (at least one person, preferably two). Get the most senior developer(s) that you can, who know the language and build system of the product that you are supporting. You can now write the high level instrumentation that every DevOps team needs, as is true with any automation project. There is only so far you can go with Bash (I tend to take things beyond where they are supposed to be with Bash as it is). Ultimately having a senior developer or two buys you a massive amount of flexibility. Need a web service for something like externalised Puppet variables? You can write your own. Backup scripts not fast enough? A senior developer will make those scripts look very feeble in comparison when rewritten in their preferred language and multithreaded. I’m careful about not reinventing the wheel and will usually go off and clone something from GitHub before starting from scratch myself. But having some people who can write stuff from scratch is a major advantage. One caveat I would add for this role: hire from outside. Developers hired internally usually end up getting pulled back to work on stuff they did at the company at some stage. If you can, hire a new person and liven things up. Obviously tell the engineering teams that the hire(s) are for instrumentation in case they get worried that you want to start adding buttons to the product :)

Lastly, the sysadmins. I’d actually consider myself one of these at heart. Getting a good sysadmin can be tricky. It’s not uncommon to read 100 CVs before finding someone even remotely eligible. For a DevOps team you need a reasonably rare mix of skills: people who know Linux inside out, who can script and get excited by the latest batch of tools, and nowadays you need to throw Puppet / Chef into the mix. I have a couple of these currently and consider myself extremely blessed. Everything that we do is checked into source control (we use AWS as our data center) and this buys us a lot of things, like the ability to automate everything, reduce costs by deleting and recreating at whim, and disaster recovery. However, you pay for those things by hiring really good people, which is a cost saving in the long run once the benefits of the team start to show.

Now if you add in all of those types of role, what I’ve found works quite nicely is running the team without being too focussed on the separation of responsibility. Everyone is on call 24/7. Everyone is expected to know the product inside out (although nobody will get near the level of expertise of the Product Architect), everyone scripts (even me) and ultimately everyone will end up doing some programming tasks. You can probably see from Matt’s previous blog posts about the Metrics project that he got the chance to learn some Ruby. I think it’s important that everyone knows a bit of everyone else’s job, although when under pressure everyone naturally drops back into doing what they are good at to speed things along.

This probably looks a little odd from the outside. But it makes things fun, everyone stays engaged and ultimately we all share the same goal: scale to 1 million users :D

Sysadmins in a Developers world

It’s all back to front

Well, it was about 9 months back that I touched on Developers in a sysadmin world. My initial thoughts were along the lines of: we are better at different tasks. After spending a week doing only development I am still of the same opinion.

Over the last 6 months we have had our solitary developer coding away making great things happen, predominantly developing a portal that allows us to deploy environments in 15 mins vs the 2 days it took before. The whole thing is very pretty; it even has its own Favicon.ico, which we are all pleased about. In addition to just deploying, it also allows us to scale the environments it creates up and down. Despite constant interruptions it is coming along really well, and in the next month we will be providing it as a service for the engineering teams to self serve.

As more and more of our tools develop, we are also bringing more and more of them in-house. As the regular readers know, I do dabble with the odd program that is slightly more complex than the average sysadmin might tackle. When we were faced with a situation such as monitoring the operations, by which I mean the growth in user numbers week on week and the cost of running the environment(s), it just made more sense to do it ourselves. There are tools out there that provide various dashboards, like Geckoboard, which can all do approximately 80% of the job, but it’s that last 20% that adds the usefulness. As such we are trying to develop tools that are pluggable and extensible and support multiple outputs. For example, the Metrics report we have will also support Geckoboard, Graphite, Email and probably have its own web interface.

For us it is becoming more about having the flexibility to add and remove components. This introduces challenges, as whatever is being written needs to be pluggable and easy to maintain, which often makes it complicated.
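To give a rough idea of what "pluggable outputs" can look like, here is a minimal sketch in Python. It is not the Metrics project's code: the registry, the function names and the "metrics." prefix are all made up for illustration, and only the email and Graphite outputs are shown. The Graphite output just formats lines in the shape its plaintext protocol expects (path, value, timestamp); nothing is actually sent anywhere.

```python
from typing import Callable, Dict, List

# Hypothetical registry: output name -> function that renders a report.
OUTPUTS: Dict[str, Callable[[Dict[str, float], int], str]] = {}


def output(name: str):
    """Decorator that registers a function as a named output plugin."""
    def register(func):
        OUTPUTS[name] = func
        return func
    return register


@output("email")
def email_output(report, timestamp):
    # Plain-text body only; sending the mail is left out of this sketch.
    return "\n".join(f"{key}: {value}" for key, value in sorted(report.items()))


@output("graphite")
def graphite_output(report, timestamp):
    # Graphite's plaintext protocol takes "<path> <value> <timestamp>" lines.
    return "\n".join(
        f"metrics.{key} {value} {timestamp}" for key, value in sorted(report.items())
    )


def render(report, timestamp, enabled: List[str]):
    """Run only the outputs that are switched on; the others are not touched."""
    return {name: OUTPUTS[name](report, timestamp) for name in enabled}


if __name__ == "__main__":
    report = {"users": 120, "cost": 260.0}
    for name, text in render(report, 1700000000, ["email", "graphite"]).items():
        print(f"--- {name} ---\n{text}")
```

Adding another output is then a case of writing one more decorated function, and removing one means deleting it, without changing the code that gathers the numbers.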

I used classes, as a necessity

Typically when I program there is not much need for classes or even objects for that matter; a simple var and some nice loops and conditional statements would be plenty. Well, not so much anymore. The last project was metrics and, as with other projects, I got it working within a day or two, and I hated it. It took over 30 seconds to run, it generated the report I needed but not in the right format, and the level of detail in the metrics was not high enough: it could manage weekly, but that was not good enough.

I decided that I’d have a chat with a few developers to help with the structure of the application. At first I was dubious, but it turned out well. The key step, which I wouldn’t have made until it was a real problem, was to separate out the tasks that gather the raw data, the tasks that manipulate the data into useful numbers, the bit that stores the data and finally the bit that outputs the pretty data.

This was an evolutionary step. I would have got to the point of understanding the need to separate each step out, but not until it had become a real big pain many months later. Another advantage of splitting it out was how much simpler each step was: there were classes defining methods for getting data that were being used in classes to format the data that were being used… you get the idea. Rather than one class to connect to Amazon, manipulate the data and return an object that could be used to generate the metrics, everything was done in much smaller steps. As a result it was a lot easier to write small chunks of code “that just worked” and it made debugging a lot easier. I feel like I progressed my understanding, and this is always a good thing.
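The split between gathering, manipulating, storing and outputting can be sketched in a few lines of Python. This is only an illustration of the shape, not the real application: the function names are invented, the raw data is hard-coded where the real thing would query Amazon, and the "store" is just a dictionary.

```python
from typing import Dict, List


def gather() -> List[dict]:
    """Collect raw data. Hard-coded here; assume the real step calls a provider."""
    return [
        {"week": 1, "users": 100, "cost": 250.0},
        {"week": 2, "users": 120, "cost": 260.0},
    ]


def summarise(raw: List[dict]) -> List[dict]:
    """Turn raw rows into useful numbers: user growth and cost per user."""
    summary = []
    for previous, current in zip(raw, raw[1:]):
        summary.append({
            "week": current["week"],
            "user_growth": current["users"] - previous["users"],
            "cost_per_user": round(current["cost"] / current["users"], 2),
        })
    return summary


def store(summary: List[dict], database: Dict[int, dict]) -> None:
    """Keep the summarised rows, keyed by week (a dict stands in for a database)."""
    for row in summary:
        database[row["week"]] = row


def render(database: Dict[int, dict]) -> str:
    """Output the pretty data as one line per week."""
    return "\n".join(
        f"week {week}: {row['user_growth']:+d} users, {row['cost_per_user']:.2f} per user"
        for week, row in sorted(database.items())
    )


if __name__ == "__main__":
    database: Dict[int, dict] = {}
    store(summarise(gather()), database)
    print(render(database))
```

Each function can be run and checked on its own, which is where the "small chunks that just worked" feeling comes from.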

Who should do what

I touched on this in my other post, but I want to amend it based on a better understanding. To summarise, I pretty much said it as it is: Developers develop, sysadmins admin. They do, and certainly that should be their focus, but I think there is a lot to be gained on both sides when each is pushed to work in the other’s world.

Before our developer joined, the focus was on making the build, test and release process better. After forcing the developer to do sysadmin work for a month or so while the team was trying to grow and cope with the loss of a team member, it became clear that the time wasted for us all was not in getting a build through but in not being able to parallelise the testing, or not being agile enough to re-deploy an environment if it was not quite right. These steps and understandings would not have happened if we didn’t encroach on each other’s work and gain an understanding of the other person’s perspective.

Summary

This is what DevOps is really about, forget sysadmins doing code, forget about developers doing sysadmin work, it is about us meeting in the middle and understanding the issues we each face and working together to solve bigger problems.

Cloud deployment 101 – Part3

The final instalment

Over the last couple of weeks I have posted a Foundation to what the cloud really is and How to make the best use of your cloud. This week is about tying off loose ends, better ways of working, dispelling a few myths and setting some things straight.

Infrastructure as code

DevOps is not the silver bullet, but it is a framework that encourages teamwork across departments to have rapid agility on deployment of code to a production environment.

  • Agile development
    • Frequent releases of minor changes
    • Often the changes are simpler as they are broken down into smaller pieces
  • Configuration management
    • This allows a server (or hundreds) to be managed by a single sysadmin and produce reliable results
    • No need to debug 1 faulty server of 100, re-build and move on
  • Close, co-ordinated partnership with engineering
    • Mitigates “over the wall” mentality
    • Encourages a team mentality to solving issues
    • Better utilises the skills of everyone to solve complex issues

Infrastructure as code is fundamental to rapid deployment. Why hand build 20 systems when you can create an automated way of doing it? Utilising the API tools provided by cloud providers it is possible to build entire infrastructures automatically and rapidly.
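As a very small illustration of the idea, the sketch below (Python) describes the infrastructure as data and works out what needs to change to get there. Everything in it is an assumption for the sake of the example: the role names, the counts and the `plan` function are made up, and the step that would actually call a cloud provider's API to create or delete servers is deliberately left out.

```python
from typing import Dict

# The infrastructure described as data: role -> number of servers wanted.
# Role names and counts are invented for this example.
DESIRED = {"web": 20, "db": 2}


def plan(desired: Dict[str, int], running: Dict[str, int]) -> Dict[str, int]:
    """Compare what we want with what is running.

    Returns role -> change needed: positive means build more servers,
    negative means delete some. Roles that already match are left out.
    """
    actions = {}
    for role in sorted(set(desired) | set(running)):
        delta = desired.get(role, 0) - running.get(role, 0)
        if delta:
            actions[role] = delta
    return actions


if __name__ == "__main__":
    running = {"web": 18, "db": 2, "cache": 1}
    for role, delta in plan(DESIRED, running).items():
        verb = "build" if delta > 0 else "delete"
        # A real tool would call the provider's API here instead of printing.
        print(f"{verb} {abs(delta)} x {role}")
```

Because the description lives in a file, it can be checked into source control and re-run to rebuild the whole lot, rather than relying on anyone remembering how the 20 systems were built by hand.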

Automation through code is not a new concept. Sysadmins have been doing this for a long time through the use of Bash, Perl, Ruby and other such languages. As a result, the ability to program and understand complicated object-oriented code is becoming more and more important within a sysadmin role; typically this was the domain of the developer, and a sysadmin just needed to “hack” together a few commands. Likewise in this new world, development teams are being utilised by the sysadmins to fine tune the configuration of application platforms such as Tomcat, or to make specific code changes that benefit the operation of the service.

Through using an agile delivery method frequent changes are possible. At first this can seem crazy: why would you make frequent changes to a stable system? Well, for one, when the changes are made they are smaller, so with each iteration a total outage is less likely. This also means that if an update does have a negative impact it can be very quickly identified and fixed, again minimising the total outage of a system.

In an ideal world you’d be rolling out every individual feature rather than a bunch of features together. This is a difficult concept for development teams and sysadmins to get used to, especially as they are more used to the on-premise way of doing things.

Automation is not everything

I know I said automation is key: the more we automate, the more stable things become. However, automating everything is not practical and can be very time consuming, and it can also lead to large scale disaster.

  • Automation, although handy can make life difficult
    • Solutions become more complex
    • When something fails, it fails in style
  • Understand what should be automated
    • Yes, you can automate everything, but ask yourself: should you?
    • Automate boring, repetitive tasks
    • Don’t automate largely complex tasks, simplify the tasks and then automate

We need to make sure we automate the things that need to be automated: deployments, updates, DR.
We do not want to spend time automating a solution that is complex; it needs to be simplified first and then automated. The whole point of automation is to free up more time, so if you are spending all of your time automating you are no longer saving the time.
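One way to sanity check whether something is worth automating is a quick break-even sum: how many times does the task have to run before the time spent automating it is paid back? The Python sketch below is just that arithmetic; the function name and the numbers in the example are made up.

```python
import math
from typing import Optional


def break_even_runs(hours_to_automate: float,
                    manual_minutes: float,
                    automated_minutes: float) -> Optional[int]:
    """Number of runs needed before the automation has paid for itself.

    Returns None when the automated run is no quicker than doing it by hand,
    because in that case it never pays back.
    """
    saved_per_run = manual_minutes - automated_minutes
    if saved_per_run <= 0:
        return None
    return math.ceil(hours_to_automate * 60 / saved_per_run)


if __name__ == "__main__":
    # Example figures: a day (8 hours) to automate a 30 minute job down to 5 minutes.
    print(break_even_runs(8, 30, 5))
```

If the answer is more runs than the task will ever see, that is a hint to simplify the task first or leave it manual.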

Failure is not an option!

Anyone who thinks things won’t fail is being rather naïve. The most important thing to understand about failures is what you will do when there is one.

  • Things will fail
    • Data will be lost
    • A server will crash
    • An update will make it through QA and then into production that reduces functionality
    • A sysadmin will remove data by accident
    • The users will crash the system
  • Plan for failures
    • If we know things will fail we can think about how we should deal with them when they happen.
    • Create alerts for the failure situations you know could happen
    • Ensure that the fixes for the common ones are well understood
  • You cannot plan for everything
    • Accept this, have good processes in place for DR, Backup and partial failures
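The "alert on the failures you know about, and know how to fix them" idea from the list above can be sketched as a table of checks, each paired with the run book that explains the fix. This is a Python illustration only: the metric names, thresholds and run book paths are invented, and a real setup would feed the alerts into whatever monitoring tool is in use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Check:
    name: str
    is_failing: Callable[[Dict[str, float]], bool]
    runbook: str  # where the agreed fix is written down (hypothetical paths)


# Known failure situations; thresholds here are example values only.
CHECKS = [
    Check("disk nearly full", lambda m: m["disk_used_percent"] >= 90, "runbooks/disk-full"),
    Check("backup too old", lambda m: m["hours_since_backup"] > 24, "runbooks/backup"),
]


def alerts(metrics: Dict[str, float]) -> List[str]:
    """Return one alert per failing check, pointing at its run book."""
    return [
        f"{check.name}: see {check.runbook}"
        for check in CHECKS
        if check.is_failing(metrics)
    ]


if __name__ == "__main__":
    for line in alerts({"disk_used_percent": 95, "hours_since_backup": 3}):
        print(line)
```

Keeping the run book next to the check means the person who gets the alert also gets the fix.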

Following a process makes it quick to resolve an issue, so creating run books and DR plans is a good thing. Having a wash up after a failure, to ensure you understand what happened, why, and how you can prevent it in the future, will ensure the mitigations are put in place to stop it happening again.
Regularly review operational issues to ensure that the important ones are being dealt with, there’s little point in logging all of the issues if they are not being prioritised appropriately.

DR, Backup and Restoration of service are the most important elements of an operational service, although no one cares about them until there is a failure. Get these sorted first.
Deploying new code and making updates are nice-to-haves. People want new features, but they pay for uptime and availability of the service. This is kinda counter-intuitive for DevOps, as you want to allow the most rapid of changes to happen, but it still needs control, testing and gatekeeping.

Summary

Concentrate on the things that no one cares about unless there’s a failure. Make sure that your DR and backup plan is good, test regularly that it works, and ensure your monitoring is relevant and timely. If you have any issues with any of these, fix them quickly and put the controls in place to ensure they stay up to date.

With regard to automation, just be sensible about what you are trying to do. If it needs automating and is complicated, find a better way.

Developers in a sysadmin world

Where to start

As I spend more time engrossed in the world of DevOps, there have been a number of occasions where something has not felt right about the relationship between operations and development. Within the team at work we will be hiring developers to work on the integration back with the development team, improving the build process and re-factoring the code to make it quicker to code / build / deploy, all of which is good stuff.

However, first and foremost we run a service; that is the main reason for the existence of the group. Without the correct support framework we will not be offering a service but instead a really fancy technology exercise so we can say we do X or Y or Z, and I worry that is the route we are destined for.

Can Developers do Sysadmin tasks?

Of course they can, why couldn’t they? In the same way a mechanic can paint a car. We are not in the business of stopping people achieving their full potential, so a have-a-go Joe is welcome here. The bigger question is whether they should be doing it. In much the same way as I am not the right person to code a large enterprise product, Developers are not the right people to be making decisions about service restarts or process niceness.

So I believe that a Developer can do the tasks of a Sysadmin. I believe that with enough training they can get to a point where they are not making random changes to a system to fix a specific problem without understanding the consequences. However, I also believe a good graduate would bring the same level of risk and knowledge to the table, so having an understanding of programming is a plus, but Sysadmins aren’t in the business of random changes.

Can a Sysadmin be a Developer?

Sure, why not? Same role in reverse? Almost. I can program, or have programmed, in a number of languages, which I tend not to bring up, so…

  • Pascal
  • Delphi
  • C/C++
  • Java
  • PHP
  • JavaScript
  • Perl
  • Ruby (as of a couple of weeks ago)
  • Bash
  • Awk

So I have quite a few languages in which, within about 10 mins and with a few nudges from Google, I can write something reasonable. I have made a lot of different applications (another list, I hear you cry out for!)

  • Maze solvers
  • Text based adventure games
  • Arkanoid
  • Web based route planner. Granted, I only drew the maps from 6 million data points with zoom functionality
  • Content management system…
  • Geoblog with google maps and email updates / geo tagged pics
  • Web shops
  • System monitor with averages and weekly summaries
  • Bit stream cipher
  • CD to MP3 encoder with CDDB lookups

Those are just 10 things I’ve written, so I would say I know enough about programming, probably more than necessary for my role.
And yet I still have no interest in being a programmer. So, shoe on the opposite foot, maybe I should be doing some more developer focused work. It’s been a while but it could be just what I’m after.

Based on that I already am a part time developer, in much the same way that a Developer is a part time sysadmin. I mean, their programs run on systems, right…

Who should do what?

Well, Developers code, sysadmins admin. I don’t think it gets harder than that. I think it is easy for everyone to agree that the Developer’s time will be best spent writing code and helping out with specific system scripts or Puppet manifests / Capistrano. It is also very easy for everyone to say that the sysadmin should check the RAM utilisation, RAID configuration, disk layout etc.

If all of that was correct this blog would end here. However, over the past few months something has been niggling at me: every now and then I’m involved in a conversation with a Developer which ultimately comes down to “It’s not that hard, just do X or Y”, and it’s this that is my biggest issue with developers.

Let’s take rolling out a new Amazon AMI:

Developer’s approach

  • Deploy new server with AWS tools
  • Login
  • Done

Sysadmin’s approach

  • Start Deploy new server with AWS tools
  • Pause, because deploying keys isn’t a good idea or secure if everyone is using the same one…
  • Continue but with a generic “emergency key” configured
  • Check file system layout
  • Realise it’s all on one 6 GB volume, fix issue
  • Create individual users
  • and so on…

Different skill sets. Any monkey in a suit can click on a few buttons in a web UI. Knowing to split out the Linux file system into different partitions, or at least understanding the impact of that, is what’s important.

Summary

I think the two skill sets can work harmoniously, but there is still a boundary, caused by experience and expertise, and for DevOps it’s about using each other’s strengths and avoiding the weaknesses. There have been times when I’ve been doing OOP PHP, working with inheritance or writing a very complicated script where I wanted someone I could ask “is this better than that?”, so having someone around would be good. I imagine it works both ways, especially when it comes to configuring the system.

Developers tend to be very focused, from my experience, and because sysadmins are more generalists they are looked down on. I hope that changes as DevOps becomes more commonplace and the realisation of harmony comes about. Let’s see if in 6 months there’s another post about how disastrous or successful this integrated approach becomes. It is new ground and, if nothing else, it will be interesting to find out what happens.