Configuration management alone is not the answer

Everything in one place

Normally, businesses starting out building a product, especially those without pre-existing knowledge of configuration management, tend to just throw the config on the server and then forget what it is. This is all fine, it's a way of life and progression, and sometimes just bashing it out can prove very valuable indeed, but typically this becomes a nightmare to manage. Very quickly, when there are 100 manually built servers, it's a pain in the arse, so everyone jumps into configuration management.

This is sort of phase 1: everything has become too complicated to manage, no one knows what settings are on what boxes, and more time is spent working out whether box 1 is the same as box 2. This leads to the need for some consistency, which leads to configuration management. The sensible approach is to move one application at a time into configuration management fully, not just the configuration files.

During this phase of execution it is critical to be pedantic and get as much as possible into configuration management. If you only do certain components there will always be the question of "does X affect Y, which isn't in configuration management?" and quite frankly, every time you have that conversation a sysadmin dies of embarrassment.

Reduce & Reuse

After getting to phase 1, probably in a hack-and-slash way, the same problems that caused the need for phase 1 reappear: 100 servers in configuration management, lots of environments with variables set in them, on servers, and in the manifests themselves, and the questions start to become "is that variable overriding that one?" and "why are there settings for var X in 5 places, and which one wins?". Granted, configuration management systems have hierarchies that determine what takes precedence, but that requires someone to always look through multiple definitions. On top of having variables set in multiple locations, it is probably becoming clear that more variables are needed, more logic is needed, and what was once a sensible default is now crazy.

This is where phase 2 comes in: aim to move 80%+ of each configuration into variables, have chunks of configuration turned on or off through key variables being set, and set sensible defaults inside a module/cookbook. That is half of phase 2; the second half, and probably the more important side, is to reduce the definitions of the systems down to as few as possible. Back in the day we used to have a server manifest, an environment manifest and a role manifest, each of which set different variables in different places. How do you make sure that your 5 web servers in prod have the same config as the 5 in staging? That's 14 manifests! Why not have 1? Just define a role and set the variables appropriately; the role can then contain the sensible defaults, and all other variables would need to be externalised in something like hiera, or pushed into Facter/Ohai.
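To make the precedence idea concrete, here is a minimal Ruby sketch (the variable names and values are made up for illustration, not from any real setup): one role holds the sensible defaults, and a single external data store supplies per-environment overrides, so only the differences are ever written down.

```ruby
# One role definition with sensible defaults; per-environment overrides live
# in a single external data store (an illustrative stand-in for hiera).
role_defaults = { "worker_processes" => 4, "listen_port" => 80 }

external_data = {
  "prod"    => { "worker_processes" => 16 },  # only the differences are stored
  "staging" => {}                             # staging uses the defaults as-is
}

# Overrides win; anything unset falls back to the role's defaults.
def resolve(defaults, overrides)
  defaults.merge(overrides)
end

prod    = resolve(role_defaults, external_data["prod"])
staging = resolve(role_defaults, external_data["staging"])
```

Comparing prod and staging is then just a diff of `external_data`, rather than a trawl through 14 manifests.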

By taking this approach, minimising the definitions of what a server should be and reducing them down to one, you are able to reuse the same configuration, so all of your roleX servers are now identical except for whatever variables are set in your external data store, which can now easily be diff'd.

build, don’t configure

By this point phases 1 & 2 are done and all is well with the world, but still there are some oddities: box X is at patch level y and box A is at patch level z, or there's some left-over hack to solve a prod issue which causes a problem on one of the servers. Well, treat your servers as configurable and throw-away-able. There are many technologies to help with this, be it cloud based with Amazon and OpenStack, or maybe VMware, or even physical servers with Cobbler. This is phase 3: build everything from scratch every time. At this point the consistency of the environment is pretty good, leaving only the data in each environment to contend with.

Summary

Try to treat configuration management as something more than just config files on servers, and be persistent about making everything as simple as possible while trying to get everything into it. If you're only going to manage the files, you might as well use tars; and if that sounds crazy, it's the same level as phase 1, which is why you have to get everything in. I realise it can seem a massive task, but start with the application stack you're running and then cherry-pick the modules/cookbooks that already exist for the main OS components like ntp, ssh etc.

Vagrant & Chef

Sooo……

Last week I was starting to play with Chef, and as part of that I was convinced by Tom (one of our sysadmins) to stop building servers in Amazon and using them as development boxes. Traditionally I like to build things from the ground up and in stages: get the OS right, get the app right, get the tools right, roughly in that order. By building the box from scratch, making a few changes in Puppet, re-running the scripts and continuing to iterate over the deployment, you gradually build up to something that works. The downside is that if you don't re-deploy, you never know for sure whether it will work from scratch in case a step was missed; it can be costly, and you are dependent on the network, be it local or internet, being available.

So over the last few months, on and off, there had been various attempts to get Vagrant to work with VirtualBox on my Mac (10.6.8); for whatever reason it never quite worked and was therefore pointless. However, a swift reset of the laptop and an upgrade to Lion (10.7.5) seems to have resolved my issues, so, now being unblocked on the Vagrant front, it was worth giving it a go.

Set up

It's really quite straightforward. Firstly you need VirtualBox, secondly you need Vagrant, and then you need a well constructed getting started guide like This.

That's pretty much it to get up and running. You may want to get a few more boxes for CentOS or something else, but other than that, getting a simple box up and working is easy, and to log in you just type "vagrant ssh". Simple.

But what about making it useful? With the help of Tom we were able to hook Vagrant into Chef. To do this we set a number of Chef options that define where the various cookbooks and roles are, which enables the virtual guest to access the files so you can run them locally; try following This.
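For reference, the hookup is only a few lines in the Vagrantfile. This is a sketch using the Vagrant 1.0-era chef_solo provisioner; the box name, repo paths and role are assumptions for illustration, not from the setup described here.

```ruby
# Vagrantfile sketch: point the chef_solo provisioner at a local chef repo
# (box name, paths and role name are illustrative).
Vagrant::Config.run do |config|
  config.vm.box = "centos-6"

  config.vm.provision :chef_solo do |chef|
    chef.cookbooks_path = "../chef-repo/cookbooks"
    chef.roles_path     = "../chef-repo/roles"
    chef.add_role "web"
    # or skip roles and list recipes directly:
    # chef.add_recipe "ntp"
  end
end
```

`vagrant up` then builds and provisions the box, and `vagrant provision` re-runs Chef against it.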

I was using this with roles, but just defining a list of recipes also works well. Best of all, you can now open up 3 terminals: one logged in to the Vagrant box, one in the Vagrant box's directory, and one in your Chef repo. I found this worked well, as I was able to make changes to my recipes in the Chef repo, run "vagrant provision" or "vagrant reload" as needed in the second, and tail any logs or watch services start in the Vagrant image. All in all it works quite well: you have a disposable box, so if it all goes wrong you just start again, and an easy way to update and test the configuration before committing it or pushing it anywhere near production.

Gotcha

So overall, after set-up all is good. Unless, for example, your role file has something in it that works perfectly locally and not remotely. I'm not talking about recipes not working; I'm talking about the role file itself having slight differences, which was annoying, particularly when you're new to it.

In particular I had

"json_class": "Chef::Role",

in my role file, and this worked fine locally and then failed remotely. I'm not sure what was causing this to be an issue, but at least it is easy to resolve by just removing it.

According to This it's needed (as below):

json_class
This should always be set to Chef::Role.
This is used internally by Chef to auto-inflate this type of object. It should be ignored if you are re-building objects outside of Ruby, and its value may change in the future.
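One way to sidestep the field altogether is to write the role in Chef's Ruby DSL rather than JSON, since the Ruby format has no json_class entry at all. A sketch, with an illustrative name and run list:

```ruby
# roles/web.rb -- the same kind of role, expressed in Chef's Ruby DSL,
# which avoids the json_class field entirely (name and run list are made up).
name "web"
description "Web server role"
run_list "recipe[rssh]"
```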

But it did cause me problems, though in some ways much less annoying than the other issue I had. I had run Chef, but it seemed to fail to update my yum repos; they already existed beforehand and it just ignored them. I kept re-provisioning and nothing, which was very confusing. So I stopped playing around and went for a vagrant reload. Still nothing. It turns out Chef has a stupid setting in its provider for yum repos: in short, for files in yum.repos.d it will not replace a file if it already exists, which for a configuration management tool is pretty poor. Every other type of file seems fine, but they are "protecting" yum repo files for an unknown reason; I can only assume it's to stop people nuking their box, but that's not Opscode's call.

You can see a bit more detail Here

That was an annoying chefism, but at least it is possible to easily disable it, as mentioned in the link above, or simply to remove the file first.
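The "remove the file first" workaround is a couple of resources in the recipe; a sketch, where the repo filename and template name are illustrative:

```ruby
# Delete any pre-existing repo file so the template is always laid down fresh.
file "/etc/yum.repos.d/example.repo" do
  action :delete
end

template "/etc/yum.repos.d/example.repo" do
  source "example.repo.erb"
  owner  "root"
  group  "root"
  mode   0644
end
```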

Summary

All in all, I'm quite pleased with the local development. I hit some issues when I deployed to an actual box, which is to be expected, but other than that it's all been quite good. Going forward I'm going to carry on, and I will also see how I get on trying Vagrant with Puppet, seeing as it can do that too, so it should help a lot with development. Unfortunately, because of the laptop re-build, I am yet to reap the rewards of this new efficiency, but I can definitely see it helping in the longer term, particularly when I'm without internet access. I'd recommend everyone at least spends an hour or two having a play with this, as it could simplify your life, especially if you are not able to build servers in 5 minutes, or if you just want to work "off grid".

Playing with Chef

It was bound to happen somewhen

Over the last few months we have gradually been building up a Chef installation alongside our Puppet configuration. Totally insane, you may think; well, it is. In our team I am pretty good with Puppet and Tom is pretty good with Chef, so the only way to come up with a good solution is to use both and evaluate, and that is where we are. Currently we are using Puppet for all of our application-based configuration and Chef for our infrastructure, and over the last few months I've poked it a couple of times but not too much; well, today I've been poking it a lot more and doing stuff of use.

So I have to learn how to use Chef, else I can't decide between the two; so far, so good. It has some issues, but nothing major, or at least nothing that's bitten me yet. I should probably clarify that we run both our Puppet and Chef distributed, so we can't make use of the more powerful/useful server side; either way it's handy.

Things I like so far

One of the things I instantly like about Chef is the fact that it is just Ruby. It sounds silly, but it means I can do things like…

# Hard-link each file listed in the node attributes into the chroot jail
node["rssh"]["chroot_files"].each do |path|
  link "#{node["rssh"]["chroot_jail"]}#{path}" do
    to path
    link_type :hard
  end
end

Now in Puppet you would have to create a define and call it with an array of "names" to do the same thing; the Chef code is more readable, especially as I do know some Ruby.

In addition to this, inside cookbooks you have an attributes directory, which is exactly what I try to do within Puppet with my params stuff (Here), but because it's a well-known structure it's used more. This means people can write cookbooks in a standard-ish way, or at least it seems that way, and it is also a lot easier to maintain.
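As a sketch, the attributes file for the rssh cookbook used above might look like this; the values are made up for illustration.

```ruby
# cookbooks/rssh/attributes/default.rb -- sensible defaults, overridable
# per role, environment or node without touching the recipe itself.
default["rssh"]["chroot_jail"]  = "/var/chroot"
default["rssh"]["chroot_files"] = ["/bin/sh", "/usr/bin/scp"]
```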

Things I don’t like

At the moment I'm not too sure about having to have a recipes directory with everything in that one place. Sometimes the cookbooks I write have many recipes, and they look messy with so many in one directory. I don't know if you can put them in folders, but it doesn't look like it; at least Puppet will recursively go through folders to load the files.

Error messages: Chef's are seemingly pointless; it may as well just say "error in file at line". There is basically almost no interpretation of the actual error, but lots of information, as below.

[2012-11-14T16:39:05+00:00] INFO: Processing template[/etc/rssh.conf] action create (rssh::default line 14)

================================================================================

Error executing action `create` on resource 'template[/etc/rssh.conf]'

================================================================================


Chef::Mixin::Template::TemplateError
------------------------------------
undefined local variable or method `nodes' for #<Erubis::Context:0x0000000458f9b8 @node=node[localhost]>

Resource Declaration:
---------------------
# In /tmp/vagrant-chef-1/chef-solo-2/cookbooks/rssh/recipes/default.rb

 14: template "/etc/rssh.conf" do
 15:   source "rssh.conf.erb"
 16:   owner  "root"
 17:   group  "root"
 18:   mode   0644
 19: end
 20: 

For those confused, the actual error is the "undefined local variable or method `nodes'" line (the template referenced nodes where node was meant), and that is only half the message, so it's easy to miss one line out of the 40-odd.

Now, to be fair, Puppet's errors are sometimes stupid and typically don't contain enough information to actually be useful, but they are mostly straightforward. Chef's errors are not as clear, though they do at least have some more value. I guess the real failing here is me not understanding how to interpret the error messages; either way it was a little annoying.

Summary

All in all, I quite like it, i’ve not seen anything in my limited play with it to say it’s going to be impossible to use and it has some nice features. Hopefully over the next few months I’ll sart using it a bit more so I can understand which one is best for us to use longer term, or we keep them both, and develop a nice hybrid solution… hopefully with clearly defined boundaries :)