Open files – why limits are set
Everything has a tolerance…
At some point everything has a boundary to its optimal operation, be it memory allocation, someone's temper or the speed of a car. As with all of these things, the limits are set for a reason. You can drive a car at 150+ mph all the time, and that's fine, but the brakes will wear quicker and so will the engine. The same applies to someone's temper: you can only push and prod them so much before they snap and bite your head off.
Sometimes it can be useful to understand what these tolerances are; once you understand them you can exploit them, but if you don't understand them you should not be changing them. One of my pet peeves is people who say "Ergh, why is it limited to 1024 files? It needs more than that, set it to unlimited!" Sigh. Why do these people exist in this world?
So, web apps such as Tomcat require the use of file handles; in fact everything that runs under Linux does, and without file handles life becomes difficult. To get a better picture of this, rather than searching for file handles, try searching for file types: you'll soon see there's a lot that file handles have to do. As a result, whenever any application starts up, in this case Tomcat, it needs to consume a number of these file handles to be able to operate.
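On Linux you can see these handles directly: every process exposes its open descriptors under `/proc/<pid>/fd`. A quick way to look at your own shell's handles (using `$$`, the shell's own pid, as a stand-in for any process you care about):

```shell
# List the descriptors held by the current shell; the symlink targets show
# what each one really is: a regular file, a pipe, a socket, a terminal...
ls -l /proc/$$/fd

# Count them. Every process holds at least stdin, stdout and stderr.
ls /proc/$$/fd | wc -l
```

Point this at a Tomcat pid instead of `$$` and you can watch the count move as connections come and go.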
When a connection comes into Tomcat, it consumes a file handle, and that handle remains open until it is closed, the application that initiated it is killed, or a timeout occurs (for an idle TCP connection with keepalive enabled, Linux's default is two hours, via net.ipv4.tcp_keepalive_time). While this connection is active, any subsequent work consumes more handles: reading a config file triggers another, a network poll triggers another, and so on. So at any point when Tomcat is doing something it is consuming file handles. This is fine, it is normal use; we'll come back to that later…
So with all this in mind: when the application does stuff, it needs resources. It's simple enough on the way up; what seems to escape people is that these resources, much like memory, need to be freed again. You ask for a resource, you use it, and you give it back. What happens when you don't give it back? It sits there until a timeout fires or the application is killed. This is a resource leak, which can lead to interesting things from a security point of view; from an operational point of view it can stop the server from responding, or at least it would if the kernel didn't enforce a hard limit. Either way, your box can go horribly wrong if this isn't controlled.
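You can watch the kernel enforce that limit without going anywhere near Tomcat. In a subshell, drop the soft limit to a deliberately tiny value (8 here, chosen purely for the demo) and then ask one command to hold open more descriptors than that:

```shell
# Lower the soft open-files limit to 8 inside a subshell, then hand cat
# five extra descriptors on top of stdin/stdout/stderr (8 in total, fds
# 0-7). The final redirection needs a ninth, so the kernel refuses it.
(
  ulimit -Sn 8
  cat /dev/null 2>/dev/null 3</dev/null 4</dev/null 5</dev/null \
      6</dev/null 7</dev/null 8</dev/null \
    || echo "limit reached: too many open files"
)
```

The same refusal is what a leaking Tomcat eventually hits, except there it lands on a real connection instead of a throwaway `cat`.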
So, by this point it should be making sense why you would put limits on how many file handles each application can use. Which brings us to the default of 1024. Why 1024 and not 1000? Every file descriptor takes some kernel memory, and limits like this are conventionally powers of two: 1024 is 2^10, which sits naturally with how the kernel sizes its internal tables, whereas 1000 is only a round number in decimal. Moving along, I'd like to say each file descriptor costs Xk but I don't know the answer (my assumption is 4k, but that's an assumption); either way it is a resource, and you can think of it as having physical mass.
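The limits themselves are easy to inspect: `ulimit` shows the per-process soft and hard values, and `/proc/sys/fs/file-max` shows the system-wide ceiling the kernel derives from the memory available at boot:

```shell
# Per-process soft limit: what a process actually gets (commonly 1024).
ulimit -Sn

# Per-process hard limit: the ceiling an unprivileged user may raise the
# soft limit up to.
ulimit -Hn

# System-wide maximum number of open file handles across all processes,
# sized by the kernel from available memory.
cat /proc/sys/fs/file-max
```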
Let’s increase the limits
Cool, by this point you understand a bit of background on file descriptors. Now someone says, "Let's increase the file limit to 2048." That's twice what it was, but not unreasonable. However, you should still understand why it was 1024 and why it now needs to be 2048. If you don't, you are just throwing file descriptors at the problem because no one knows what is going on… and this is bad.
Potentially, the application could be leaking file descriptors because someone forgot to close a connection, or because the application doesn't handle connection closure in the expected way.
So a sensible thing is to ask how many are needed. In most cases someone can come back with "in our tests we saw it get to X", and that is a good number to work from.
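Once you have that number, raise the limit deliberately rather than to unlimited. One common route on Linux is pam_limits via /etc/security/limits.conf; the user name "tomcat" and the values below are illustrative, following the 2048 example above:

```shell
# One-off, for the current shell and its children (cannot exceed the
# hard limit):
ulimit -n 2048

# Persistent, via pam_limits: lines like these in
# /etc/security/limits.conf ("tomcat" is an assumed service account;
# take the values from your own tests, with headroom in the hard limit):
#   tomcat  soft  nofile  2048
#   tomcat  hard  nofile  4096
```

The soft limit is what the process gets; the hard limit is the ceiling it may raise itself to, so setting both gives you a measured default plus a known worst case.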
But setting it to unlimited is bad. If the argument comes back that "it needs lots" or "it's dynamic"… rubbish. It's a computer program; it does exactly what it's told to do. A single triggered event will only ever open some fixed maximum number of descriptors, and the application can only handle so many events at once, so the overall maximum is calculable.
Imagine a file descriptor as a brick. If you asked a builder to build you a house, and you asked how many bricks he would need, I'm going to assume he would come back with a number based on experience and some maths to calculate how many bricks were needed.
I certainly wouldn't expect him to say he needs all the bricks in the world; there's not enough space on the building site to store all the bricks. Sure, you can stack them up to a point, and then the pile falls over; the same happens with the OS when you set the limit to unlimited. Luckily the OS takes some precautions against this by hard-limiting the number of open files to a value derived from the memory available on the machine, and in some cases it restricts it even further.
So in short, if someone says "set it to unlimited" they are probably trying to avoid doing the work, either of measuring what the application actually uses or of fixing a leaking file descriptor. These people need to wake up: it takes the stability of a system from a measurable quantity to an unknown, which is not good.
If you find yourself in a situation where it's out of your control, at least get to the point where the file descriptors are monitored. You can use the data to work out an average and some tolerances for what counts as normal usage, and then wait for the application to crash…
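A minimal sketch of that monitoring, assuming a Linux box; `$$` (this shell's pid) stands in for the real Tomcat pid, and you would run it from cron or a loop rather than once:

```shell
# Sample a process's open-descriptor count with a timestamp. Collected
# over time, this gives you the baseline and tolerances described above,
# and a slow upward drift is your leaking descriptor showing itself.
pid=$$
count=$(ls "/proc/$pid/fd" | wc -l)
echo "$(date '+%F %T') pid=$pid open_fds=$count"
```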