Some might say…
Some might say that running puppet with a server is the right way to go. It certainly provides some advantages, like PuppetDB, stored configs, external resources and so on, but is that really what you want?
If you have a centralised puppet server with 200 or so clients, there are some fancy things that can be done to ensure that not all nodes hit the server at the same time, but that requires setting up and configuring yet more tools.
What if you just didn’t need any of that? What if all you needed was a git repo with your manifests and modules in it, and puppet installed?
Have a boot script download and install puppet, pull down the git repo and then run puppet locally. This method puts a little more overhead on each node, but not much: the node had to run puppet anyway. It can still provide the same level of configuration as the server/client method; you just need to think outside of the server.
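As a rough sketch of what that looks like (the repo URL, paths and package manager here are placeholders, not our actual setup):

#!/bin/bash
# Minimal masterless bootstrap (sketch): install puppet, sync the repo, apply locally
yum install -y puppet git      # or apt-get install -y puppet git

# Clone the manifests and modules on first boot, update them on later runs
if [ ! -d /etc/puppet/repo ]; then
  git clone https://github.com/example/puppet.git /etc/puppet/repo
else
  (cd /etc/puppet/repo && git pull)
fi

# Apply the configuration locally, no puppet master involved
puppet apply --modulepath=/etc/puppet/repo/modules /etc/puppet/repo/manifests/site.pp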
Don’t do it, it’s stupid!
That was my response to my boss some 10 months ago when he said we should ditch puppet servers and per-server manifests, and outlaw all variables. Our mission was to be able to scale our service to over 1 million users, and we realised that manually adding extra node manifests to puppet was not sustainable, so we started on a journey to get rid of the puppet server and redo our entire puppet infrastructure.
Step 1 Set up a git repo – You should already be using one; if you aren’t, shame on you! We chose GitHub: why do something yourself when there are people out there dedicated to doing just that one thing and doing it better? Spend your time looking after your service, not your infrastructure!
Step 2 Remove all node-based manifests – Replace them with a manifest per tier / role. For us this meant consolidating our prod web role with our qa, test and dev roles, so there was just one role file regardless of environment. This forces the environment-specific bits to be managed as variables.
Step 3 Implement hiera – Hiera gives puppet the ability to externalise variables into configuration files, so we end up with a configuration file per environment and only one role manifest. This, as my boss would say, “removes the madness”. Now if someone asks “what’s the difference between prod and test?” you diff two files, regardless of how complicated you want to make your manifests, inherited or not. It’s probably worth noting that you can set default values for hiera lookups: hiera("my_var", "default value")
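To make steps 2 and 3 concrete, here is a minimal sketch of a single role manifest pulling its environment-specific values from hiera (the class and key names are invented for illustration):

# roles/manifests/web.pp -- one role file shared by prod, qa, test and dev
class roles::web {
  # Environment-specific values come from the per-environment hiera file;
  # the second argument to hiera() is the default used when no value is set.
  $app_version  = hiera('web_app_version')
  $worker_count = hiera('web_worker_count', 4)

  class { 'mywebapp':
    app_version  => $app_version,
    worker_count => $worker_count,
  }
}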
Step 4 Parameterise everything – We had lengthy talks about parameterising modules vs just using hiera, but to keep the modules transparent about whatever is coming into them (and since I was the one writing them) we kept parameters. I did, however, move all the parameters for every manifest in a module into a “params.pp” file and inherit that everywhere to re-use the variables; within each manifest a parameter either defaults to the params.pp value or is left blank (to make it mandatory). This means that if you have sensible defaults you can set them there and reduce the size of your hiera files, which in turn makes it easier to see what is happening. Remember, most people don’t care about the underlying technology, just the top-level settings, and they trust that the rest is magic. For the lower-level bits see these: Puppet with out barriers part one for a generic overview, Puppet with out barriers part two for manifest consolidation and Puppet with out barriers part three for params & hiera.
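A stripped-down sketch of that params.pp pattern (module and parameter names are made up for the example):

# mymodule/manifests/params.pp -- sensible defaults live in one place
class mymodule::params {
  $port    = 8080
  $log_dir = '/var/log/mymodule'
}

# mymodule/manifests/web.pp -- each manifest inherits params.pp and defaults
# its parameters to those values; app_version has no default, which makes it
# mandatory and forces it to come from hiera.
class mymodule::web (
  $app_version,
  $port    = $mymodule::params::port,
  $log_dir = $mymodule::params::log_dir,
) inherits mymodule::params {
  # ...resources using $app_version, $port and $log_dir...
}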
This is all good, but what if you are in Amazon and you don’t know what your box is? Well, it’s in a security group, but that is not enough information, especially if your security groups are dynamic. You can also tag your boxes, and you should make use, where possible, of the AWS CLI tools to do this. We decided a long time ago to set a few details on a per-node basis: Env, Role & Name. From these we know what to set the hostname to, which puppet manifests to apply and which set of hiera variables to use, as follows…
Step 5 Facts are cool – Write your own custom facts for facter. We did this in two ways. The first was to just pull down the tags from Amazon (where we host) and return them as ec2_<tag>; this works, but AWS has issues so it fails occasionally. Version 2 was to get the tags, cache them locally in files, and have facter pull them from the local files… something like this:
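For example, with the old EC2 API tools the tags can be set at (or shortly after) launch with something like this (the instance id and values are placeholders):

# Tag an instance with the three details everything else keys off
$EC2_HOME/bin/ec2-create-tags i-0123abcd --tag Env=prod --tag Role=web --tag Name=web01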
#!/bin/bash
# Load the AWS config
source /tmp/awsconfig.inc

# Grab all tags locally and cache them as one file per tag
IFS=$'\n'
for i in $($EC2_HOME/bin/ec2-describe-tags --filter "resource-type=instance" --filter "resource-id=`facter ec2_instance_id`" | grep -v cloudformation | cut -f 4-)
do
  key=$(echo $i | cut -f1)
  value=$(echo $i | cut -f2-)
  if [ ! -d "/opt/facts/tags/" ]
  then
    mkdir -p /opt/facts/tags
  fi
  if [ -n "$value" ]
  then
    echo "$value" > /opt/facts/tags/$key
    /usr/bin/logger set fact $key to "$value"
  fi
done
The AWS config file just contains the same info you would use to set up any of the CLI tools on Linux, and you can turn the cached tags into facts with this:
# Turn each cached tag file into an ec2_<tag> fact
tags = `ls /opt/facts/tags/`.split("\n")
tags.each do |key|
  value = `cat /opt/facts/tags/#{key}`
  fact = "ec2_#{key.chomp}"
  Facter.add(fact) { setcode { value.chomp } }
end
Also see: Simple facts with puppet
Step 6 Write your own boot scripts – This is a good one: scripts make the world run. Make a script that installs puppet, make a script that pulls down your git repo, and then run puppet at the end (something like the following).
The node_name_fact setting is awesome, as it kicks everything into gear: deploy a box into a security group with the appropriate tags and it hooks itself up and becomes a fully built server.
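I won’t reproduce our exact script, but the final step looks roughly like this (the paths and the ec2_role fact name are illustrative):

#!/bin/bash
# Boot script sketch: sync the repo, then apply puppet locally.
# node_name_fact tells puppet to use the given fact as the node name,
# so the box picks up the manifest matching its Role tag.
cd /etc/puppet/repo && git pull -q

puppet apply \
  --modulepath=/etc/puppet/repo/modules \
  --node_name_fact=ec2_role \
  /etc/puppet/repo/manifests/site.pp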
Summary
So now puppet is on each box, and every box, from the time it’s built, knows what it is (thanks to tags) and bootstraps itself to a fully working server thanks to your own boot script and puppet. With some well-written scripts you can cron the pulling of git and a re-run of puppet if so desired. The main advantage of this method is the distribution: as long as a box manages to pull that git repo, it will build itself, and if something changes on the box, puppet will put it back, because everything it needs is local, so there are no network issues to worry about.
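Something along these lines does the trick (paths are placeholders):

# /etc/cron.d/puppet-local -- pull the repo and re-apply puppet every 30 minutes (sketch)
*/30 * * * * root cd /etc/puppet/repo && git pull -q && puppet apply --modulepath=/etc/puppet/repo/modules /etc/puppet/repo/manifests/site.pp >> /var/log/puppet-apply.log 2>&1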
I really like this concept and have started using it. It seems to be gaining traction in one form or another. The only concern I have is the risk of putting your entire infrastructure’s configuration on a single server which doesn’t need to know the full infrastructure. For instance, your web server (possibly the most exposed system in your environment) may end up holding the configuration for your database system on its drive. You can imagine there may be information in your database configuration that you don’t want on the web server. You may move those more sensitive variables into Hiera, but you still have to get your Hiera yaml files to the web server, and again you are likely to have DB-related Hiera yaml files on that web server.
Just curious if you’ve run across this concern and how you have attempted to address it if you have.
Hi Bill,
We had the same concern. Over time we pushed all parameters into hiera, so the modules and roles were agnostic and just populated with variables. As time went on we started populating the hiera variables from a web service, so each box only got the details that were relevant to itself. We have since moved over to Chef for better programmatic access to the core, but we do the same thing in the same way, just with a slightly different method of getting the vars into Chef.
Matt