Distributed Puppet

Some might say…

Some might say that running puppet with a central server is the right way to go. It certainly provides some advantages, like PuppetDB, stored configs, exported resources and so on, but is that really what you want?

If you have a centralised puppet server with 200 or so clients, there are some fancy things that can be done to ensure that not all nodes hit the server at the same time, but that requires setting up and configuring additional tools.

What if you just didn’t need any of that? What if you just needed a git repo containing your manifests and modules, and puppet installed?
Have a script download and install puppet, pull down the git repo and then run puppet locally. This method puts more overhead on each node, but not much (it had to run puppet anyway), and it can still provide the same level of configuration as the server/client method; you just need to think outside of the server.
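A minimal sketch of that approach, with a made-up repo URL and paths; the run helper only echoes the commands so the flow is visible without touching the system — swap its body for "$@" to actually execute:

```shell
#!/bin/bash
# Hypothetical masterless bootstrap -- repo URL and paths are placeholders.
set -e

REPO="git@github.com:example/puppet.git"   # assumption: your manifests/modules repo
CHECKOUT="/opt/puppet-repo"

run() { echo "+ $*"; }    # dry-run helper; replace the body with "$@" to execute

# Install puppet if it is missing
command -v puppet >/dev/null 2>&1 || run yum install -y puppet

# Clone the repo the first time round, pull thereafter
if [ -d "$CHECKOUT/.git" ]; then
    run git -C "$CHECKOUT" pull
else
    run git clone "$REPO" "$CHECKOUT"
fi

# Run puppet locally -- no puppet master involved
run puppet apply --modulepath="$CHECKOUT/modules" "$CHECKOUT/manifests/site.pp"
```

Cron the same script and every node converges on its own, no server ever involved.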

Don’t do it, it’s stupid!

My response to my boss some 10 months ago when he said we should ditch puppet servers and per-server manifests, and outlaw all variables. Our mission was to scale our service to over 1 million users, and we realised that manually adding extra node manifests to puppet was not sustainable, so we started on a journey to get rid of the puppet server and redo our entire puppet infrastructure.

Step 1 Set up a git repo – You should already be using one; if you aren’t, shame on you! We chose GitHub: why do something yourself when there are people out there dedicated to doing just that one thing, and doing a better job of it? Spend your time looking after your service, not your infrastructure!

Step 2 Remove all manifests based on nodes – Replace them with a manifest per tier / role. For us this meant consolidating our prod web role with our qa, test and dev roles, so there was just one role file regardless of environment. This forces the environment-specific bits into variables.

Step 3 Implement Hiera – Hiera gives puppet the ability to externalise variables into configuration files, so we end up with one configuration file per environment and only one role manifest. This, as my boss would say, “removes the madness”. Now if someone asks “what’s the difference between prod and test?” you diff two files, regardless of how complicated your manifests are, inherited or not. It’s probably worth noting you can set default values for Hiera lookups: hiera("my_var", "default value")
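As a sketch (the variable name and values here are invented), the lookup with a default and the per-environment override live side by side:

```puppet
# /etc/puppet/extdata/prod.yaml (hypothetical) would contain:
#   app_memory: "2048m"

# In the role manifest: prod gets 2048m, and any environment
# without the key silently falls back to the default
$app_memory = hiera("app_memory", "512m")
```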

Step 4 Parameterise everything – We had lengthy talks about parameterising modules vs just using hiera, but to keep the modules transparent to whatever is calling them (and because I was writing them) we kept parameters. I did, however, move the parameters for all manifests in a module into a “params.pp” file and inherit that everywhere to re-use the variables; each manifest then defaults to the params.pp value, or leaves the parameter blank to make it mandatory. This means that if you have sensible defaults you can set them there and reduce the size of your hiera files, which in turn makes it easier to see what is happening. Remember, most people don’t care about the underlying technology, just the top-level settings, and trust that the rest is magic. For the lower-level bits see these: Puppet with out barriers part one for a generic overview, Puppet with out barriers part two for manifest consolidation, and Puppet with out barriers part three for params & hiera.

This is all good, but what if you’re in Amazon and you don’t know what your box is? It’s in a security group, but that is not enough information, especially if your security groups are dynamic. You can also tag your boxes, and you should make use, where possible, of the AWS CLI tools to do this. We decided a long time ago to set a few details on a per-node basis: Env, Role & Name. From these we know what to set the hostname to, which puppet manifests to apply and which set of hiera variables to apply, as follows…

Step 5 Facts are cool – Write your own custom facts for facter. We did this in two ways. The first was simply to pull the tags down from Amazon (where we host) and return them as ec2_&lt;tag&gt;; this works, but AWS has issues so it fails occasionally. Version 2 was to get the tags, cache them locally in files, and have facter read them from those files… something like this:

#!/bin/bash
# Load the AWS config
source /tmp/awsconfig.inc

# Grab all tags locally
mkdir -p /opt/facts/tags

IFS=$'\n'
for i in $($EC2_HOME/bin/ec2-describe-tags --filter "resource-type=instance" --filter "resource-id=`facter ec2_instance_id`" | grep -v cloudformation | cut -f 4-)
do
        key=$(echo "$i" | cut -f1)
        value=$(echo "$i" | cut -f2-)

        if [ -n "$value" ]
        then
                echo "$value" > "/opt/facts/tags/$key"
                /usr/bin/logger "set fact $key to $value"
        fi
done

The AWS config file just contains the same info you would use to set up any of the CLI tools on Linux, and you can turn the cached files into facts with this:

tags = `ls /opt/facts/tags/`.split("\n")

tags.each do |key|
        value = `cat /opt/facts/tags/#{key}`
        fact = "ec2_#{key.chomp}"
        Facter.add(fact) { setcode { value.chomp } }
end

Also see: Simple facts with puppet

Step 6 Write your own boot scripts – This is a good one; scripts make the world go round. Make a script that installs puppet, make a script that pulls down your git repo, and then run puppet at the end.
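A sketch of the tail end of such a boot script, with placeholder paths and a dry-run helper; the fact name ec2_Role matches the tags cached in step 5, but adjust for your own set up:

```shell
#!/bin/bash
# Hypothetical boot script tail -- paths and fact names are placeholders.
set -e

run() { echo "+ $*"; }    # dry-run helper; replace the body with "$@" to execute

# Pull the latest manifests and modules
run git -C /opt/puppet-repo pull

# Apply locally; --node_name_fact makes puppet match the node definition
# against the value of the ec2_Role fact rather than the hostname
run puppet apply --modulepath=/opt/puppet-repo/modules \
    --node_name_fact=ec2_Role /opt/puppet-repo/manifests/site.pp
```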

The node_name_fact setting is awesome, as it kicks everything into gear: any box deployed into a security group with the appropriate tags hooks itself in and becomes a fully built server.

Summary

So now puppet is on each box; every box, from the time it’s built, knows what it is (thanks to tags) and bootstraps itself into a fully working box thanks to your own boot script and puppet. With some well-written scripts you can cron the pulling of git and a re-run of puppet if so desired. The main advantage of this method is the distribution: as long as a box manages to pull that git repo it will build itself, and if something changes on the box, puppet will put it back, because everything is local so there are no network issues to worry about.

This time, we survived the AWS outage

Another minor bump

Anyone based in the US East region of AWS knows that yet again there were issues with EBS volumes, although you wouldn’t know it from their website. It’s a bit of a joke when you see headlines like “Amazon outage takes down Reddit, Foursquare, others”, yet on their status page a tiny little note icon appears stating there’s a slight issue, extremely minor, don’t worry about it. Yeah, right.

The main culprits were EC2 and the API, both of which were EBS related.

“Degraded EBS performance in a single Availability Zone
10:38 AM PDT We are currently investigating degraded performance for a small number of EBS volumes in a single Availability Zone in the US-EAST-1 Region.
11:11 AM PDT We can confirm degraded performance for a small number of EBS volumes in a single Availability Zone in the US-EAST-1 Region. Instances using affected EBS volumes will also experience degraded performance.
11:26 AM PDT We are currently experiencing degraded performance for EBS volumes in a single Availability Zone in the US-EAST-1 Region. New launches for EBS backed instances are failing and instances using affected EBS volumes will experience degraded performance.
12:32 PM PDT We are working on recovering the impacted EBS volumes in a single Availability Zone in the US-EAST-1 Region.
1:02 PM PDT We continue to work to resolve the issue affecting EBS volumes in a single availability zone in the US-EAST-1 region. The AWS Management Console for EC2 indicates which availability zone is impaired. “

The actual message is much, much longer, but you get the gist: a small number of people were affected. Yet most of the major websites that use Amazon were affected; how can that be considered small?

Either way, this time we survived, and we survived because we learnt. Back in June and July we experienced these EBS issues, so we did something about it; now why didn’t everyone else?

How Alfresco Cloud Survived

So back in June and July we were heavily reliant on EBS, just like everyone else: we had an EBS-backed AMI on top of which we used puppet to build out the OS, which is pretty much what everyone does, and that is why everyone was affected. Back then we probably had 100–150 EBS volumes, so the likelihood of one of them going funny was quite high; now we have about 18, and as soon as we can we will ditch those as well.

After being hit twice in relatively quick succession we realised we had a choice: be lazy or be crazy. We went for crazy, and now it has paid off. We could have been lazy and said that Amazon had issues, that it wasn’t that frequent and was not likely to happen again; or we could be crazy and reduce our EBS usage as much as possible. We did that.

Over the last few months I’ve added a number of posts about The Cloud, Amazon and Architecting for the cloud, along with a few funky Abnormal puppet set ups and oddities in the middle. All of this was spawned by the EBS outages; we had to be crazy. Amazon tell us all the time: don’t have state, don’t rely on anything other than failure, use multiple AZs, and so on. All of the big players that were affected will have been told they should use multiple availability zones, but as I pointed out here, the AZs can’t be fully independent, and yet again this outage proves it.

Now, up until those outages we had done all of that, but we still trusted Amazon to remain operational. Since July we have made a concerted effort to move our infrastructure onto elements within Amazon that are more stable, hence the removal of EBS. We now only deploy instance-store backed EC2 nodes, which means we have no ability to restart a server, but it also means we can build them quickly and consistently.

We possibly took it to the extreme: our base AMI, now instance backed, consists of a single file that does a git checkout; once it has done that, it simply builds itself to the point that chef and puppet can take over and run. The tools used to do this are many, but needless to say it is many hundreds of lines of bash, supported by Ruby, Java, Go and any number of other languages or tools.

We combined this with fully distributing puppet so it runs locally; in theory, once a box is built it is there for the long run. We externalised all of the configuration so puppet was simpler and easier to maintain. Puppet, its config, the base OS, and the tools to manage and maintain the systems are all pulled from remote services, including our DNS, which automatically updates itself based on a set of tags.

Summary

So, how did we survive? We decided that no individual box was important: if a crazy person couldn’t randomly delete a box or service with the system carrying on working, then we had failed. I can only imagine that the bigger companies, with a lot more money, people and time to look at this, are still treating Amazon as a datacentre rather than as a collection of web services that may or may not work. With distributed puppet and config, once our servers are built they run happily on a local copy of the data, needing no network, and that is important because AWS’s network is not always reliable and nor is their data access. If a box no longer works, delete it; if an environment stops working, rebuild it; if Amazon has a glitch, keep working. Simple.

Release consistency

It’s gotta be… perfect

This is an odd one that the world of devops sits awkwardly on, but it is vital to the operational success of a product. As an operations team we want to ensure the product is performing well, that there are no issues, and that there is some control over the release process. Sometimes this leads to long-winded, bloated procedures, common in service providers, where people stop thinking and start following; that is a good sign a sysadmin has died inside. From the development side we want to be quick and efficient and to reuse tooling. Consistency is important, and a known release state is important; the process side of it should barely exist, as it should be automated with minimal interaction.

So as operations guys, we may have to make ad-hoc changes to ensure the service continues to work; as a result, a retrospective change is made to ensure the config is good for the long term. Often this stays untested, as you’re not going to re-build the system from scratch just to re-test it, are you?

The development teams want it all: rapid, agile change, quick testing suites and an infallible release process with plenty of resilience, double checks and catches for all sorts of error codes. The important thing is that the code makes it out, but it has to be perfect, which leads on to QA.

The QA team want to test the features, the product, the environment, combinations of each, the impact of going from X to Y, and the performance of the environment in between. All QA want is an environment that never changes, with a code base that never changes, so they can test everything in every combination; those bugs must be squashed.

Obviously, everyone’s right, but with so many contradicting opinions it is easy to end up in a blame circle, which isn’t all that productive. The good news is we know what we want…

  • Quick release
  • Quick accurate testing
  • Known Product build
  • Known configuration
  • No ad-hoc changes to systems
  • Ability to make ad-hoc changes to systems
  • Infallible release process

All in all, not that hard; there are two sticking points here. Point one: infallible releases are a thing of wonder and can only ever be iterated towards perfection; in time they will get better. Day 1 will be crap, day 101 will be better, day 1000 better still. Point two: you can’t have both no ad-hoc changes and the ability to make ad-hoc changes, can you? Well, you can.

If you love it, let it go

As a sysadmin, if there’s an issue in production I will in most cases fix it in production. If it is risky I will use our staging environment and test it there, but this is usually no good, as staging typically won’t show the issues production does; for example, all of its servers will be working while production is missing one. This means I have to make ad-hoc changes to production, which causes challenges for the QA team, as now the test environments aren’t always the same; it then screws up the release process, because we made a change in one place and didn’t manually add it to all the other environments.

So, what if we throw away production, staging or any other environment with every release? This is typically a no-no for traditional operations: why would you throw away a working server? Well, it provides several useful functions:

  1. Removes ad-hoc changes
  2. Tests documentation / configuration management
  3. Enhances DR strategy and familiarisation with it
  4. Clean environment

The typical reason the throw-away approach isn’t used is a lack of confidence in the tools. Well, bollocks. What good is having configuration management and DR policies if you aren’t using them? If, operationally, you are making changes in puppet and rolling them forward, you achieve some of this, but it’s not good enough; you still have the risk of never having tested the configuration on a new system.

With every environment being built from scratch for every release, we can version-control a release to a specific build number and a specific git commit, which is better for QA as it’s always a known state; if there’s any doubt, delete and re-build.

The release process can now start being baked into the startup of a server and the puppet run, so consistency is increasing and hopefully the infallibility of the release is improving. Add to this a set of system-wide checks for basic health, a set of checks for unit testing the environment, and a quick user-level test before handing over for testing, and it is more likely, more often, that the environments are in a known, consistent state.

All of this goodness starts with deleting it and building fresh. Some of it will happen organically, but by at least making sure all of the configuration and deployment is managed at startup, you can re-build, and you have some DR. Writing some simple smoke tests and some basic automation is a starting point; from there you can build upon the basics and start making it fully bullet-proof.
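Those smoke tests really can start tiny; a sketch like this (the individual checks are invented) already catches a dead build:

```shell
#!/bin/bash
# Minimal post-build smoke test -- the checks are illustrative only.
fail=0

check() {
    local desc="$1"; shift
    if "$@" >/dev/null 2>&1; then
        echo "ok   - $desc"
    else
        echo "FAIL - $desc"
        fail=1
    fi
}

check "puppet installed"        command -v puppet
check "scratch space writable"  touch "/tmp/.smoke.$$"
rm -f "/tmp/.smoke.$$"
# check "app answers on :8080"  curl -sf http://localhost:8080/health

echo "failures: $fail"          # a real script would exit $fail here
```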

Summary

Be radical, delete something in production today and see if you survive, I know I will.

Simple facts with Puppet

It’s not that hard

When I first started looking at facter it was magic: things just happened, and when I typed facter a list of variables appeared, all of which are available within puppet modules / manifests to help make life easier. After approximately two years of thinking how good they were and how nice it would be to have my own, I finally took the time to look at it and work it out…

For those of you that don’t know, facter is a framework for providing facts about a host system that puppet can use to make intelligent decisions about what to do; it can determine the operating system, its release, local IPs, and so on. This gives you the flexibility in puppet to do things like choose which packages to install based on the Linux distribution, or insert the local IP address into a template.

Writing Facts

So, let’s look at a standard fact that ships with facter, so you can see the complexity involved and understand why, after glancing at it, I never went much further.

# Fact: interfaces
#
# Purpose:
#
# Resolution:
#
# Caveats:
#

# interfaces.rb
# Try to get additional Facts about the machine's network interfaces
#
# Original concept Copyright (C) 2007 psychedelys <psychedelys@gmail.com>
# Update and *BSD support (C) 2007 James Turnbull <james@lovedthanlost.net>
#

require 'facter/util/ip'

# Note that most of this only works on a fixed list of platforms; notably, Darwin
# is missing.

Facter.add(:interfaces) do
  confine :kernel => Facter::Util::IP.supported_platforms
  setcode do
    Facter::Util::IP.get_interfaces.collect { |iface| Facter::Util::IP.alphafy(iface) }.join(",")
  end
end

Facter::Util::IP.get_interfaces.each do |interface|

  # Make a fact for each detail of each interface.  Yay.
  #   There's no point in confining these facts, since we wouldn't be able to create
  # them if we weren't running on a supported platform.
  %w{ipaddress ipaddress6 macaddress netmask}.each do |label|
    Facter.add(label + "_" + Facter::Util::IP.alphafy(interface)) do
      setcode do
        Facter::Util::IP.get_interface_value(interface, label)
      end
    end
  end
end

This is lifted straight from facter; the first block simply provides a comma-separated list of interfaces (eth0,eth1 and so on), and the loop below it adds a fact for each detail of each interface.

Now, when I started looking at facter I knew no ruby and it was a bit daunting, but alas I learnt some, and I never bothered looking at facter again until my boss managed to simplify a fact down to its bare essentials, which is the one line…

Facter.add("bob") { setcode { "bob" } }

From this point onwards, all you need to do is learn enough ruby to populate that appropriately, or use bash to get the details and populate the fact. In the next example I just grab the pid of apache from ps:

apachepid=`ps -fu apache | grep apache | awk '{ print $2}'`

Facter.add(:apachepid) {
	setcode { apachepid }
}

So if you know bash, and you can copy and paste, you can do something like the above. Now, this is ruby, so you can do much more complex things, but that’s not for today.

Okay, so now something more complex is needed… What if you’re in Amazon, you use tags on your EC2 instances, and you want to use them in puppet? Well, you could just query Amazon and use the result, although that would take forever and a day to run, as AWS is not the quickest. This is an issue we had to overcome, so we decided to run a script that queries Amazon in its own time and writes the tags onto the file system, at which point facter can read them quickly.

So first, a shell script.

#!/bin/bash
source /path/to/aws/config.inc

# Grab all tags
mkdir -p /opt/facts/tags

IFS=$'\n'
for i in $($EC2_HOME/bin/ec2-describe-tags --filter "resource-type=instance" --filter "resource-id=`facter ec2_instance_id`" | grep -v cloudformation | cut -f 4-)
do
        key=$(echo "$i" | cut -f1)
        value=$(echo "$i" | cut -f2-)

        if [ -n "$value" ]
        then
                echo "$value" > "/opt/facts/tags/$key"
                /usr/bin/logger "set fact $key to $value"
        fi
done

So this isn’t the best script in the world, but it is simple: it pulls a set of tags out of Amazon and stores them in a directory, where the file name is the tag name and the content is the tag value.
Now that we have the facts locally, gathered with bash (something we’re all a bit more familiar with), we can take facter, which is alien ruby, force some bash inside it, and still generate facts that provide value:

tags = `ls /opt/facts/tags/`.split("\n")

tags.each do |key|
        value = `cat /opt/facts/tags/#{key}`
        fact = "ec2_#{key.chomp}"
        Facter.add(fact) { setcode { value.chomp } }
end

The first thing we do is produce a list of tags (a directory listing) and then we use some ruby to loop through it, and yet more bash to get the values.
None of this is complicated, and hopefully these few examples are enough to encourage people to start writing facts; even if they are an abomination to the ruby language, at least you get value without needing to spend time understanding or learning ruby.

Summary

Facts aren’t that hard to write, and because they are ruby you can make them as complicated or as simple as you like, and you can even break into bash as needed. Now a caveat: although you can write facts quickly with this half-bash/half-ruby mix, just learning ruby will by far make life easier in the long run; you can then start to incorporate more complex logic into the facts to provide more value within puppet.

A useful link for facter, in case you feel like reading more.

Puppet with out barriers -part three

The examples

Over the last two weeks (part one & part two) I have slowly been going into detail about module set up and some architecture; well, now it’s time for the real world.

To save me writing loads of puppet code I am going to abbreviate and leave some bits out. First things first, a simple module.

init.pp

class javaapp ( $conf_dir = $javaapp::params::conf_dir) inherits javaapp::params {

  class {
    "javaapp::install":
    conf_dir => $conf_dir
  }

}

install.pp

class javaapp::install ( $conf_dir = $javaapp::params::conf_dir ) inherits javaapp::params {

 package {
    "javaapp":
    name => "$javaapp::params::package",
    ensure => installed,
    before => Class["javaapp::config"],
  }

  file {
    "/var/lib/tomcat6/shared":
    ensure => directory,
  }

}

config.pp

class javaapp::config ( $app_var1 = $javaapp::params::app_var1,
                        $app_var2 = $javaapp::params::app_var2 ) inherits javaapp::params {

  file {
    "/etc/javaapp/javaapp.conf":
    content => template("javaapp/javaapp.conf"),
    owner   => 'tomcat',
    group   => 'tomcat'
  }
}

params.pp

class javaapp::params ( ) {

$conf_dir = "/etc/javaapp"
$app_var1 = "1.2.3.4/32"
$app_var2 = "host.domain.com"

}

One simple module in the module directory. As you can see, I have put all the parameters into one file; it used to be that you’d specify the same defaults in every file, so in init and config you would duplicate the same variables. That is just insane, and if you have complicated modules with several manifests in each, it gets difficult to maintain all the defaults. This way they are all in one file and are easy to identify and maintain. It is by far not perfect, but it does work; I’m not even sure puppet officially supports it (if it doesn’t, that is a failing of puppet), but it works with the latest 2.7.18 and I’m sure I’ve had it on all 2.7 variants at some point.

You should be aiming to set a sensible default for every parameter regardless, but make sure it really is sensible; if you want to enforce that a variable is set, you can still leave it out of params and specify the parameter without a default.
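As a sketch using the javaapp module above (db_password is an invented parameter), one with no default and no params.pp entry becomes mandatory:

```puppet
# "conf_dir" falls back to the params.pp default; "db_password" has
# no default, so catalog compilation fails unless the caller supplies it
class javaapp::db ( $db_password,
                    $conf_dir = $javaapp::params::conf_dir ) inherits javaapp::params {
  # ...
}
```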

Now the /etc/puppet directory

Matthew-Smiths-MacBook-Pro-2:puppet soimafreak$ ls
auth.conf	extdata		hiera.yaml	modules
autosign.conf	fileserver.conf	manifests	puppet.conf

The auth, autosign and fileserver configs will depend on your infrastructure, but the two important configurations here are puppet.conf and hiera.yaml

puppet.conf

[master]
certname=server.domain.com
modulepath = /etc/puppet/modules 
[main]
    # The Puppet log directory.
    # The default value is '$vardir/log'.
    logdir = /var/log/puppet

    # Where Puppet PID files are kept.
    # The default value is '$vardir/run'.
    rundir = /var/run/puppet

    # Where SSL certificates are kept.
    # The default value is '$confdir/ssl'.
    ssldir = $vardir/ssl

    autosign = /etc/puppet/autosign.conf

[agent]
    # The file in which puppetd stores a list of the classes
    # associated with the retrieved configuration.  Can be loaded in
    # the separate ``puppet`` executable using the ``--loadclasses``
    # option.
    # The default value is '$confdir/classes.txt'.
    classfile = $vardir/classes.txt

    # Where puppetd caches the local configuration.  An
    # extension indicating the cache format is added automatically.
    # The default value is '$confdir/localconfig'.
    localconfig = $vardir/localconfig
    pluginsync = true

The only real change worth making here is in the agent section: pluginsync ensures that any plugins you install in puppet (like Firewall, VCSRepo or hiera) are loaded by the agent. Obviously, on the agent you do not want all of the master config at the top.

Now the hiera.yaml file

hiera.yaml

---
:hierarchy:
      - %{env}
:backends:
    - yaml
:yaml:
    :datadir: '/etc/puppet/extdata'

Okay, to the trained eye this is sort of pointless: it tells puppet to look in a directory called /etc/puppet/extdata for a file called %{env}.yaml, so in this case if env were to equal bob it would look for /etc/puppet/extdata/bob.yaml. The advantage is that at some point, if needed, the hierarchy could be changed to, for example:

hiera.yaml

---
:hierarchy:
      - common
      - %{location}
      - %{env}
      - %{hostname}
:backends:
    - yaml
:yaml:
    :datadir: '/etc/puppet/extdata'

This basically provides a location for all of the variables that you are not able to tie down to a role (roles being defined by the manifests).

Matthew-Smiths-MacBook-Pro-2:puppet soimafreak$ ls manifests/roles/
tomcat.pp	default.pp	app.pp	apache.pp

So we’ll look at the default node and tomcat to get a full picture of the variables being passed around.

default.pp

node default {
	
	#
	# Default node - base packages for all systems
	#

  # Define stages
  class {
    "sshd":     stage =>  first;
    "ntp":      stage =>  first;
  }
	
  # Needed for Facter to generate OS related information
  package {
    "redhat-lsb":
    ensure => "installed"
  }

  # mcollective
  class {
    "mcollective":
    mc_password   => "bobbypassword6",
    puppet_server => "puppet.domain.com",
    activemq_host => "mq.domain.com",
  }

  # Manage puppet
  include puppet
}

As you can see, this default node sets up some classes that must be on every box and ensures that vital packages are installed. If you feel the need to extrapolate this further, you could have the default node inherit another node; for example, you may have a company manifest as follows:

company.pp

node company {
$mc_password = "bobbypassword6"
$activemq_host = "mq.domain.com"

$puppet_server = $env ? {
    "bob" => 'bobpuppet.domain.com',
    default => 'puppet.domain.com',
  }
}

This company node manifest could be inherited by the default node, and then instead of puppet_server => “puppet.domain.com” you could have puppet_server => $puppet_server, which I think is nice and clear. The only recommendation is to keep your default and role manifests as simple as possible, and try to keep if statements out of them. Can you push the decision into hiera? Do you have a company.pp where some env logic would sensibly live? Are you able to take some existing logic and turn it into a fact?

Be ruthless and push as much logic out as possible, use the tools to do the leg work and keep puppet manifests and modules simple to maintain.

Now finally the role,

tomcat.pp

node /tomcat.*/ inherits default {

  include tomcat6

  # Installs the java app using the init/install classes and default params
  include javaapp

  class {
    "javaapp::config":
    app_var1 => hiera('app_var1'),
    app_var2 => $fqdn,
  }
}

The role should be “simple”, but it also needs to make clear what it is setting. If you notice that several roles use the same class and in most cases the params are the same, change the params file and remove the options from the roles; try to keep the roles to overrides only, and as minimal as possible. The hiera vars, and any set in the default node or other node inheritance, can all be referenced here.

Summary

Hopefully that helps some of you understand the options for placing variables in different locations within puppet. As I mentioned in the other posts, this method has three files: the params, the role and the hiera file. That’s it; all variables are in one of those three, so there’s no need to hunt through all of the manifests in a module to identify where a variable may or may not be set. It is either defaulted or overridden; if it’s overridden it will be in the role manifest, and from there you can work out whether it’s in your default node or hiera, and so on.

Puppet with out barriers -part two

Magnificent manifests

Back in the day we used to have maybe 7 node manifests per environment and 6 or 7 environments, which is a lot of individual nodes to manage. As a result, not everything was always the same.

We got to the point where each environment, although using the same modules, was not always quite right. We used variables to set each environment up, but each variable was repeated, with different values, in every environment; yes, you can keep that in check with diff and good process, but it’s time-consuming. Why not just have one manifest for each tier of the application? That’s what we have now: every tier is identical in its use of the manifest, but obviously every environment has its own settings. So how do you have one manifest, hundreds of environments, and a guarantee that everything is the same across each environment except a small number of differences which can be easily seen in one file?

Now, you could try to manage the different sets of vars through node inheritance, but you then end up with quite a complex inheritance chain which can be a pain to track through. At first I was a little confused about how we would reduce our manifests down and still be able to set variables in the right places. Before going too far, let’s look at a typical node inheritance set up.

[12:56]msmith@pinf01:/etc/puppet/manifests$ tree
.
|-- company
|   `-- init.pp
|-- dev
|   `-- init.pp
|-- site1
|   |-- init.pp
|   `-- nodes.pp
|-- site2
|   |-- dev_nodes.pp
|   |-- init.pp
|   |-- nodes.pp
|   `-- test_nodes.pp
|-- site3
|   |-- init.pp
|   `-- nodes.pp
|-- prod
|   `-- init.pp
|-- site.pp
`-- test
    `-- init.pp

In this set up, the company init.pp sets up company-wide details: domain names, IP addresses for firewall rules, or anything applicable to multiple sites. The site init.pp inherits the company one and sets up any variables needed for that specific site, like DNS servers. In some of the sites we have additional configuration which inherits that site’s init.pp and sets more specific information for, say, the test or dev environment, and finally there is an environment init.pp which sets variables for that type of environment, such as an environment prefix or other settings specific to it.

This set up gives you plenty of room to grow, but it can be harder to work out where the variables come from, as you have to trace them back through many levels of inheritance. That makes it hard for people new to the team to identify where a variable should be set, and parameterised classes with defaults make it harder still.

Now with classes you can create a class or subclass that sets the majority of your defaults, which is useful if you have multiple companies or distinct groups which always insist on having different settings; the advantage is that your node declarations remain minimal. However, this can also make things complicated: you end up with meta classes that set some of the params needed for a class but not all, so you still pass params from your nodes into the meta class, which in turn passes them into your modules, and so on. This is typically more inheritance and more parameters than most sysadmins can handle.

So we had the same issue: it was never a 5 min talk to explain what was set where and when, because a value could be set in 3 or 4 node files, passed as a param, and this was repeated for each of the 6 environments. After much talking and disbelief we eventually started on a journey to reduce 7 environments and what was 50+ node files (each controlling multiple nodes via regexp) down to a handful.

[msmith@puppet manifests]$ tree
.
├── account.pp
├── roles
│   ├── app1.pp
│   ├── app2.pp
│   ├── default.pp
│   ├── app3.pp
│   ├── app4.pp
│   ├── app5.pp
│   └── app6.pp
└── site.pp

So to start with, the modules are still parameterised as before, heavily so, to make it easy to turn features on and off. Where possible every default is set within the module in a params.pp file, so there is one place within a module to change the defaults for all subclasses. Each appX.pp loads whichever set of classes it needs and, where possible, sets the params to make use of the various internal variables that are needed.

Each appX.pp inherits from default.pp, which sets most of the classes we want on every app node, such as ssh, mcollective and AWS tools. That in turn inherits account.pp, which sets the variables and classes relevant to each account we use; the account in this case is an Amazon account. There we set different yum servers, infrastructure servers and so on.
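A minimal sketch of that account → default → role chain (all names and values here are hypothetical stand-ins, not our real manifests) could look like:

```puppet
# account.pp - settings tied to the (Amazon) account
node account {
  $yum_server = 'yum.internal.example.com'  # hypothetical value
}

# roles/default.pp - classes every app node gets
node default_role inherits account {
  include ssh
  include mcollective
  include awstools
}

# roles/app1.pp - one role file per tier, regardless of environment
node /^app1/ inherits default_role {
  class { 'myapp':
    yum_server => $yum_server,
  }
}
```

The key difference from the earlier tree is that there is no per-environment node file at all; the environment differences live in hiera, not in the manifests.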

Now we still have environments, and we choose to load their variables via hiera, but we have gone to great lengths to restrict the variables in each environment down to those that are absolutely necessary. As such we have not yet needed to make use of hiera's hierarchy, but we understand the value of keeping that option open to us.
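In manifest terms, the lookup is a one-liner; the second argument to `hiera()` is a fallback default, so a role manifest stays valid even when an environment's config file doesn't define the key (the key names and values below are made up for illustration):

```puppet
# Look up an environment-specific value from hiera.
# 'db_host' would live in a per-environment yaml file;
# the second argument is the default used when no
# hierarchy level defines the key.
$db_host     = hiera('db_host', 'localhost')
$max_workers = hiera('max_workers', '4')
```

Diffing two environments then really is diffing two yaml files, as described in the intro.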

The new structure means we currently have no more than 5 places to edit vars for every app / env / node. This particular structure works for us because we run a small set of node types in multiple environments. As time goes on the roles manifests directory will grow to incorporate new servers, and if we ever need nodes of a similar role type that differ only in option X/Y/Z, we will use hiera to add another layer to the system.

Next week I’ll go through the node manifests in some detail which should hopefully cement some of this blurb.

Puppet without barriers – part one

Structure is good

Like everyone that has used and is using puppet, the structure we used to manage our nodes was relatively straightforward. But before getting into that, let's go over the two or three approaches you probably already know of.

1. Node inheritance. Typically you define a number of nodes representing the physical and logical structures that suit your business, and inherit from each of these to end up with a host node that has the correct details relevant to the data centre it is in and its logical assignment within it.
2. Class inheritance. Similar to node inheritance: you create your modules, which are typically agnostic and more than likely take a number of parameters, then create a module that contains a set number of node types, for example "web" nodes. The web node would include the apache module and do all of the configuration you expect of all apache nodes, to the point where each node manifest is simply including a web or wiki type class.
3. By far the most common, I imagine: a mixture of the two.
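The second approach, for example, usually ends up as a thin "role" class wrapping an agnostic module, so node manifests stay one line long (the class names and parameter values here are invented for illustration):

```puppet
# A hypothetical "web" role class wrapping an agnostic apache
# module, so node manifests only need to include one class.
class roles::web {
  class { 'apache':
    port    => 80,
    docroot => '/var/www',
  }
}

node /^web\d+/ {
  include roles::web
}
```

The duplication problem described next appears as soon as one group of web nodes needs a slightly different `docroot`.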

Either of these methods is fine; however, what will essentially happen is duplication, and you'll be faced with situations where you want 90% of a web node but not that last 10%. You then create a one-off module or node, and before you know it you have several web nodes or several roles defining the same task.

It's fair to say we do none of those exactly; we use a little node inheritance to save duplicating code, but that's about it. I'll touch on this more next week, but this week is about foundations, and in puppet that means modules.

Parameterise everything

Parameterised classes help with this level of flexibility, but if your module isn't fully parameterised it's a bit of a pain, so a slight detour based on experience.

I imagine, like everyone out there, I write puppet modules that are half-arsed on most days. I parameterise the variables I need for my templates and stick to a simple module layout to stop my classes getting too busy. This isn't a bad way of doing things; you get quite a lot of benefit without a lot of investment in time, and I for one will continue to write half-arsed modules, but with one difference: more parameters.

Occasionally I have to extend a module, and sometimes I don't have time to re-factor and make it perfect, so I work around it. If you're looking to build technical debt, this is a good way of going about it, and if you continue down this route you will end up with modules that are complicated and hard to follow.

When first introduced to module layouts, people assume they will make life more complicated because a module is split into classes, but the reality is that rather than having 1 file with 100 lines you have 10 files with 10 lines, and of those 10 files you may only be using 2 in your configuration, which can make life a lot simpler than sifting through code.

A typical module layout I use will consist of the following

[root@rincewind modules]# ls example/
files  manifests  README  templates
[root@rincewind modules]# ls example/manifests/
install.pp  config.pp  init.pp  params.pp

For those not familiar with such naming schemes: init.pp is loaded by default and will typically include the install class and the config class. So if I want to install and configure an entire module that has had some reasonable params set, I just "include modulename". You can read more about modules and how I typically use them here. It was rather experimental at the time, but with only a few tweaks it is now in a production environment and is by far the easiest module we have to maintain.

The main reason this structure works for us is that we are able to set a number of sensible defaults in params.pp, which reduces the number of params we need to pass in while still allowing a lot of parameters on each class. Typically, even though a class may have 40+ params, our actual use of it won't normally go past 10.
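The params.pp pattern described above can be sketched like this (the module name `example` and its parameters are placeholders, not our real module):

```puppet
# example/manifests/params.pp - one place for module-wide defaults
class example::params {
  $port = '8080'
  $user = 'example'
}

# example/manifests/init.pp - exposes parameters with defaults taken
# from params.pp, then wires install and config together
class example (
  $port = $example::params::port,
  $user = $example::params::user,
) inherits example::params {
  class { 'example::install':
    user => $user,
  }
  class { 'example::config':
    port    => $port,
    user    => $user,
    require => Class['example::install'],
  }
}
```

A caller who is happy with the defaults just writes `include example`; a caller who needs one override passes only that one parameter, which is how a class with 40+ params stays usable with fewer than 10.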

The big thing we did differently with this module was to parameterise about 90% of all the configuration options. You may think that's fine for small apps, but what about large ones? Well, we did it with our own product, Alfresco, which is quite large and has a lot of options. Granted, we can't account for every possible configuration, but we gave it a shot. Consequently we now have a puppet module that out of the box should work with multiple application containers (tomcat, jboss etc.) and also allows the application to be run in different containers on the same machine.

The advantage of this up-front effort on a module which is core to what we are doing is that we can now just change options without a major re-factor of the code. Adding more complexity in terms of configuration is simpler because the module is split into multiple classes that are very specific to their tasks; if we want to change most of the configuration it is simply a matter of changing a parameter.

Be specific
So you have a module that is nicely parameterised, and that's it? Well, not quite: you have to make sure your module is specific to its task, and this is one that was a little odd to get over. Puppet is great at configuration management; it's not so good at managing dependencies and order. So we made the decision to simplify our installation process: rather than having the puppet module try to roll out the application, and multiple versions of it with different configurations per version, it simply sets up the environment and sets the configuration with no application deployed.

So yes, you can install within the module if you wish, and we used to do that, along with some complicated expanding and re-zipping of the application to include plugins. But we also have a yum repo, and git, and an FTP server, where we can just write a much more specific script to deal with the installation.

By separating out the installation we have been able to cut a lot of code from our module that was trying to do far more than was sensible.

Be sensible
When you look through configuration there are sometimes options that will only ever be set if X or Y is set; in those cases write slightly more involved manifests to cope with that logic rather than just adding everything as a parameter. But when you look through your module and start seeing lots of logic, you've gone wrong.
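As a sketch of the acceptable level of in-manifest logic (a simple feature switch; the class, parameters and file paths here are hypothetical):

```puppet
# Only the feature toggle lives in the manifest; everything else
# is a plain parameter with a default.
class example::config (
  $enable_ssl = false,
  $ssl_cert   = undef,
) {
  if $enable_ssl {
    # ssl.conf is only managed at all when the feature is on,
    # so $ssl_cert never needs a meaningful default otherwise.
    file { '/etc/example/ssl.conf':
      content => template('example/ssl.conf.erb'),
    }
  }
}
```

One `if` per feature is about the ceiling; anything more branchy than this is a sign the manifest is trying to be all things to all people.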

Typically now, when writing modules, the only logic that ends up in a pp file is whether a feature is enabled or disabled; if you try to make one manifest in a module all things to all people you end up with a lot of unnecessary logic and complication. To tackle this, after writing a base for the module I cloned the params file and the application manifest which does most of the config, and made the copies more specific to the main task I'm trying to achieve. They are 90% the same as the normal application ones but with a few extra params, and they let me add files that are 100% relevant to what we do in the cloud without ruining the main application manifest, in case we decide to change direction or need a more traditional configuration.

The point is that you should avoid over-complicating each manifest in the module; create more, simpler manifests that are sensible for the task in hand.

This module layout will give you something to work with rather than against. We spent a lot of time making our whole puppet infrastructure as simple as possible to explain; that doesn't mean it's simple, it just means everyone understands what it does: me, my boss, my boss's boss and anyone else who asks "how does it work?"