Apache Out Of Memory Error
Many people have run into this problem when deploying an application behind the Apache web server.
An Out of Memory (OOM) error occurs when the Apache processes keep growing their physical memory usage, overloading the server's resources and forcing the system to swap. Once the swap space is depleted as well, the integrity and stability of the whole operating system is at risk, so the kernel decides it is better to start killing processes in order to keep itself alive.
An example of an OOM error:
# vim /var/log/messages
July 12 13:11:19 localhost kernel: [ 8267.512850] Out of memory: kill process 7986 (apache) score 15147 or a child
July 12 13:11:19 localhost kernel: [ 8267.513008] Killed process 7986 (apache)
How the system decides which processes are safe to kill is determined by a fast (but sometimes inaccurate) heuristic; in general terms it weighs the following:
- Which processes are running as root?
- How long a process has been running?
- How many RAM/Swap is used by a process?
- if (process == init) Don’t touch it!
Because in an OOM situation the system has only a fraction of a second to make the decision, the heuristics are not optimal and will sometimes pick the wrong victim.
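You can actually inspect the score the kernel has assigned to any process under /proc. For example, for the current shell:

```shell
# The kernel's "badness" score for this shell; the higher the score,
# the more likely the process is to be killed in an OOM situation.
cat /proc/$$/oom_score

# Administrators can bias the choice via oom_score_adj (Linux >= 2.6.36):
# -1000 exempts a process entirely, +1000 makes it the preferred victim.
cat /proc/$$/oom_score_adj
```

Long-running root daemons such as init score very low, which matches the rules listed above.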
Apache processes use a ton of RAM. This becomes a major threat when you realize that after each process has done its job, the bloated process sits there spoon-feeding data to the client instead of moving on to bigger and better things.
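A quick way to see how much memory those processes are holding is to average the resident set size of the workers. A rough sketch (the process name `httpd` is the Red Hat default; on Debian-based systems it is `apache2`, and `apache_mem` is just a hypothetical helper name):

```shell
# apache_mem: count the workers matching a process name and report
# their average resident memory, as read from ps(1) RSS in kilobytes.
apache_mem() {
  ps -o rss= -C "${1:-httpd}" |
    awk '{sum += $1; n++} END {printf "%d procs, %d KB avg\n", n, (n ? sum / n : 0)}'
}

apache_mem httpd
```

Multiply that average by your MaxClients value and you get a fair estimate of the worst-case RAM Apache can consume.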
The trick to get around this consists of tuning the Apache configuration file; the default file is the following:
# cd /etc/httpd/conf/
# vi httpd.conf
MaxClients defines the maximum number of connections Apache will serve simultaneously. When this limit is reached, new connections start to queue; of course, these queued connections have to be stored somewhere on the system... ungracefully, they end up in RAM and then in swap.
If you use Apache to serve dynamic content, your simultaneous connections are severely limited. Exceed a certain number, and your system begins cannibalistic swapping.
By default this is set to 150. You could try increasing this value; just keep in mind that in a normal, standard installation of Apache you are limited to 256 MaxClients (I cover this topic in another post). If you set it too low, a message like the following will be printed in your error log:
[error] server reached MaxClients setting, consider raising the MaxClients setting.
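As a sketch, this is what the directive looks like in httpd.conf (prefork MPM, Apache 2.2 naming; the value shown is the stock default, and raising it past 256 also requires the ServerLimit directive):

```apache
<IfModule prefork.c>
    # Maximum number of simultaneous client connections the server
    # will handle; extra connections queue until a worker frees up.
    MaxClients 150
</IfModule>
```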
When the Apache daemon (the parent process) is started, a fixed number of child processes is spawned, creating a pool of servers ready to serve clients. By default this is set to 5 (the StartServers directive). Creating new child processes is expensive for the system, and the best practice is to always have servers ready for the clients that may connect. If you get a lot of traffic you should consider increasing this value.
You can also cap the number of child processes your Apache server may spawn. As with MaxClients, when this value is reached new connections start to queue. A value between 20 and 50 should be fine in most situations.
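Sketching the pool-sizing directives discussed above (prefork MPM; StartServers is the stock default, while the ServerLimit value is an illustrative low-memory choice, assuming that is the "max children" cap being described):

```apache
<IfModule prefork.c>
    # Children spawned at startup; raise this on busy servers so
    # early clients never have to wait for a fork().
    StartServers  5
    # Hard ceiling on how many children the parent may ever spawn.
    ServerLimit   40
</IfModule>
```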
Forcing your processes to die after a while makes them start over with lower RAM usage, which can reduce total memory consumption in many situations. This variable is very important on production systems. By default each Apache child can serve an unlimited amount of clients/traffic, and child processes are never killed by the parent process, so as time goes by you can run into a process that has been alive for, let's say, 28 days!
If child processes never “recycle” themselves, there is a high chance that the older ones are holding a lot of RAM they will not release until they die. Sooner or later you will have to re-spawn these processes in order to keep memory “clean”.
MaxRequestsPerChild defines the maximum number of connections a child process may serve. When the value is reached, the child stops accepting new connections, finishes serving its current one, and then dies. The Apache parent process re-spawns a new child and the cycle continues.
A value of 0 means unlimited (not recommended); in most cases a value like 5000 works fine, but values as low as 10 or 20 may also work for you. This is a game of catch-up, with your dynamic content constantly increasing total RAM usage and the restarting processes constantly reducing it. You have to experiment with this value.
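In httpd.conf this is a one-line change (prefork MPM; 5000 is the middle-of-the-road starting point suggested above, not a universal recommendation):

```apache
<IfModule prefork.c>
    # Each child exits after serving this many requests and is replaced
    # by a fresh one, returning its bloated memory to the OS.
    MaxRequestsPerChild 5000
</IfModule>
```

After editing, reload Apache gracefully so running children finish their current requests before being replaced.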
There are more variables you can tweak and play with, but they have little (if any) relation to the OOM problem.
An OOM error is just a safety measure, NOT a system feature. The very existence of an OOM killer is a bug.
The goal of the sysadmin should not be to silence the OOM killer, but to get to the root of the problem and fix it. Poorly optimized queries and/or running the application on a piece of crap hardware are common causes that may lead a server to an OOM.