This document is intended as a general treatment of WebFORCE server tuning. It covers the most commonly encountered tuning questions concerning HTTP serving, and may not cover special cases (such as specialized applications working in conjunction with an HTTP server). This document is also used as an enhancement to the release notes for WebFORCE HTTP servers -- this way, one can always refer to this page for the latest tuning information.
Currently, WebFORCE servers ship with version 6.2 of the IRIX operating
system and the Netscape Enterprise
server or the Netscape FastTrack server. The NCSA server (also known as
the "OutBox" server) is also shipped as part of the OS bundle
on all SGI systems. (See the related document, Tuning Hints for the OutBox Server.)
This document covers issues for both the IRIX 6.2 and 5.3 operating
systems and applies to most web-based services (but most importantly
HTTP servers).
For just about any server which provides a network-based service (such as an HTTP server), there are four basic components to the server which must be considered.
As previously mentioned, and it cannot be stressed enough, memory swapping kills web server performance. Memory utilization is probably the single most tunable variable of the four performance factors just discussed. Taking this into consideration, the following paragraphs will outline a few considerations when monitoring/tuning memory use.
Memory usage on a WebFORCE server can be monitored using any of the following tools:
Network-specific memory usage can be monitored via netstat -m, as seen below.
The Netscape Enterprise and FastTrack servers are configured in the following manner:
Thus, an active Netscape server with one server, one process, and 4 to 128 threads, will look like this in ps -efl:
F | S | UID | PID | PPID | C | PRI | NI | P | SZ: | RSS | WCHAN | STIME | TTY | TIME | CMD |
b0 | S | nobody | 588 | 1 | 0 | 39 | 20 | * | 3244: | 83 | 8821a660 | Aug 30 | ? | 0:00 | ./ns-httpd -d /usr/ns-home/httpd-fa... |
b0 | S | nobody | 592 | 588 | 0 | 75 | 35 | * | 6505: | 3402 | 883862f0 | Aug 30 | ? | 0:03 | ./ns-httpd -d /usr/ns-home/httpd-fa... |
b0 | S | nobody | 820 | 592 | 0 | 75 | 35 | * | 6505: | 3402 | 883cbb04 | Aug 30 | ? | 0:02 | ./ns-httpd -d /usr/ns-home/httpd-fa... |
b0 | S | nobody | 821 | 592 | 0 | 75 | 35 | * | 6505: | 3402 | 883cbb04 | Aug 30 | ? | 0:02 | ./ns-httpd -d /usr/ns-home/httpd-fa... |
b0 | S | nobody | 822 | 592 | 0 | 75 | 35 | * | 6505: | 3402 | 883cbb04 | Aug 30 | ? | 0:02 | ./ns-httpd -d /usr/ns-home/httpd-fa... |
The first process, PID 588, is the main server process. It spawned PID 592, which in turn spawned three children, bringing the total number of active processes to the minimum of 4. Note that the four processes share their memory space, so the total resident set size (RSS) for this server is 3402 pages of 4 KB each, or approximately 13 MB, and a total of about 25 MB (6505 pages) has been reserved for all threads belonging to the one process. These are typical sizes for the Netscape 2.0 server.
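The page-to-megabyte conversion used above can be checked with a short sketch. It assumes the 4 KB page size stated in the text; the function name pages_to_mb is only illustrative.

```python
PAGE_KB = 4  # page size stated in the text (4 KB pages)

def pages_to_mb(pages, page_kb=PAGE_KB):
    """Convert a page count from the ps SZ or RSS column into megabytes."""
    return pages * page_kb / 1024

rss_mb = pages_to_mb(3402)  # RSS column for PID 592 and its children
sz_mb = pages_to_mb(6505)   # SZ column for the same processes

print(f"resident: {rss_mb:.1f} MB, reserved: {sz_mb:.1f} MB")
```

The same conversion applies to any SZ or RSS value reported by ps -efl on the server.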
The total number of active processes (other than the main server process, which usually has a PPID of 1) determines the number of simultaneous connections that can be serviced by your WebFORCE server. (Recall that simultaneous only means actively serving data, and does not represent the total number of users who are currently viewing your pages.)
When configuring the Netscape server, realize that each server process requires approximately 13 MB of memory, and may require as much as 25 MB. This includes the space used by all of the threads under that process. (See Performance Tuning in the Netscape Admin server guide.) Often a single process is sufficient; at other times, multiple processes (say, 4 or 5) may improve performance. It is best to experiment a bit with this, but keep the memory requirements in mind, and avoid swapping.
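As a rough planning aid, the per-process figures above can be turned into an upper bound on the number of server processes that fit in memory. This is only a sketch: the 13 MB and 25 MB figures come from the text, while the function name and the other_mb allowance (memory set aside for the operating system, named, and network buffers) are hypothetical values chosen for illustration.

```python
def max_server_processes(physical_mb, other_mb, per_process_mb=25):
    """Return how many server processes fit in memory without swapping.

    physical_mb    -- total physical memory on the machine
    other_mb       -- hypothetical allowance for the OS, named, and buffers
    per_process_mb -- 13 is the typical figure from the text, 25 the worst case
    """
    available = physical_mb - other_mb
    if available <= 0:
        return 0
    return int(available // per_process_mb)

# Example: a 128 MB machine with an assumed 48 MB set aside for everything else.
print(max_server_processes(128, 48))                     # worst case, 25 MB each
print(max_server_processes(128, 48, per_process_mb=13))  # typical case, 13 MB each
```

Sizing against the 25 MB worst case leaves the most headroom; the measured RSS of a running server is a better guide than either estimate.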
named is the Internet domain name server. named is utilized by a Web server to map the originating IP address of each incoming HTTP request to its fully qualified domain name (e.g. 192.26.51.11 to sgigate.sgi.com). named caches these mapping requests so it can handle the mappings in a timely fashion. Unfortunately, for a very high traffic Web server, named and its internal cache can grow quite large because requests will come literally from all over the world, yielding a high number of unique IP addresses and fully qualified domain names which must be cached.
Turning off domain name lookup for HTTP request logging would significantly reduce or eliminate named memory requirements. The following table highlights the memory requirements for named:
Log DNS | named Process |
yes | 4 - 20 MB |
no | 0 - 2 MB |
Of course, enabling DNS lookup can reduce the performance of the web server (in some cases significantly). Simply turning on logging (even without DNS lookup) can slow down the performance of the web server by as much as 15% (based on WebSTONE measurements).
To service as many requests as possible, each process of the Netscape server attempts to process an incoming request and send out the response as fast as possible so it can handle the next incoming request. Unfortunately, the connections to the clients are not infinitely fast, especially if the client connection is over a 14.4 kbps modem. If each server process had to wait for the client to receive all of the data associated with the response, the server could not handle as many connections as it might if it could just hand off the data.
The solution to this problem is built into the IRIX kernel: rather than having the process send data directly to the client, the data is instead passed into a network buffer. The IRIX kernel takes care of transferring the data from the network buffer to the client, and (at the same time) the server process can go on to handle the next request.
Both the size and the number of available network buffers can be controlled. Enough buffers are needed to handle the large number of requests expected without forcing the server processes to wait for an available buffer (such delays can be seen via netstat -m, for example). When sizing these buffers, keep two things in mind:
Obviously, a compromise is necessary. For a typical web server, 95% or more of the content is less than 30 KB. We recommend reducing the maximum size for the TCP send buffers to 30 KB down from the 60 KB default setting. This is controlled by the TCP/IP kernel parameter, tcp_sendspace. (See TCP/IP Tuning below). Each connection is able to reserve and utilize an outgoing (send) buffer of up to 30 KB. There are two basic scenarios for how the Web server utilizes these buffers:
In one special case, it may be beneficial to decrease the buffer size further. On Ethernets, a situation known as the Ethernet capture effect can arise, which can lead to excessive collisions. If excessive collisions are persistent on your Ethernet, try tuning down tcp_sendspace to 12 KB. Note that it may be necessary to increase the number of buffers (nm_clusters, see below) if this is done.
So, the kernel needs sufficient buffer space not only for the connections which are associated with Netscape server processes, but also for the connections which are still draining to the clients. This number is almost impossible to predict because it is based on factors such as:
All of these factors essentially determine the amount of kernel buffer storage. We want to have enough buffer space so as not to be the bottleneck resource of the web server. If insufficient buffer space were available, a server process would block its sending of data until buffer space became available. This insufficient buffer space situation is denoted as a "request for memory denied" or "request for memory delayed" in netstat -m (or Performance Co-Pilot).
A rough estimate of the kernel memory requirements is 2-4 times the number of server processes (total number of threads) configured for the server times the buffer size (tcp_sendspace) reserved per outgoing buffer (e.g., 30 KB). The 2-4 multiplier takes into consideration the overhead of other supporting data structures. It is possible that a server would require more or less than this amount of memory dedicated to networking data structures. The IRIX kernel dynamically allocates chunks of network memory in 4 KB units called clusters. Allocation occurs up to a limit called nm_clusters (another kernel parameter; see the TCP/IP Tuning section below). If memory requests are denied or delayed, it may be an indication that your Internet connection throughput is insufficient during peak load; the connections may be backing up and clients may be experiencing long wait times.
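The rough estimate above is simple multiplication, sketched here for one case. The thread count of 128 is an assumption (the upper end of the thread range mentioned earlier), and the function names are illustrative only.

```python
def estimate_network_buffer_bytes(threads, tcp_sendspace, multiplier):
    """Rough kernel buffer need: multiplier x threads x send buffer size."""
    return multiplier * threads * tcp_sendspace

def to_mb(nbytes):
    """Convert a byte count to megabytes."""
    return nbytes / (1024 * 1024)

threads = 128                # assumed total number of server threads
tcp_sendspace = 30 * 1024    # recommended send buffer size from the text

low = estimate_network_buffer_bytes(threads, tcp_sendspace, 2)
high = estimate_network_buffer_bytes(threads, tcp_sendspace, 4)

print(f"estimated network buffer memory: {to_mb(low):.1f} to {to_mb(high):.1f} MB")
```

A server with more threads, or one left at the 60 KB default send buffer, scales the estimate proportionally.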
The command:
will report kernel memory utilization statistics. For example:
The key things to look for are in bold above and described in a little more detail below.
Each of the following parameters is found in /var/sysgen/master.d/bsd. Before changing any of these parameters, it is strongly recommended that you make a copy of the file, and that you add comments indicating your changes. Once these parameters have been changed, a reboot of the system is required for them to take effect.
tcp_sendspace
For a typical web server, 95% or more of the content is less than 30 KB. We recommend reducing the maximum size for the TCP send buffers to 30 KB, down from the 60 KB default setting. This is controlled by the TCP/IP kernel parameter, tcp_sendspace. The line should look like this:
unsigned long tcp_sendspace = 30 * 1024; /* must be < 512 K */
nm_clusters
Once the buffer size is determined, the amount of buffer space needed can be tuned also. This is determined by the number of "clusters" reserved by the kernel. This is what the default setting looks like:
int nm_clusters = 0;
The default value, 0, instructs the kernel to determine the value for nm_clusters, which is dynamically calculated to be one eighth of the total physical memory. The value of nm_clusters represents the number of 4 KB pages (clusters) reserved for network data buffers. Compare one eighth of the physical memory to [(2-4) x (the total number of threads) x (the buffer size, tcp_sendspace)], and be sure that the greater of the two is represented by nm_clusters. For example, suppose the machine has 128 MB of physical memory; then the buffer space would by default be 128 MB/8 = 16 MB, which is (16*1024*1024)/(4*1024) = 4096 clusters. This formula can be used to calculate the nm_clusters parameter if more memory is required.
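The comparison described above can be sketched as follows. The 128 MB machine and the one-eighth default come from the text; the thread counts, the choice of 4 as the multiplier, and all function names are assumptions made for illustration.

```python
CLUSTER_BYTES = 4 * 1024  # one cluster is a 4 KB page, per the text

def default_clusters(physical_mb):
    """Clusters the kernel reserves by default: one eighth of physical memory."""
    return (physical_mb * 1024 * 1024 // 8) // CLUSTER_BYTES

def required_clusters(threads, tcp_sendspace, multiplier=4):
    """Clusters needed for multiplier x threads x tcp_sendspace, rounded up."""
    needed_bytes = multiplier * threads * tcp_sendspace
    return -(-needed_bytes // CLUSTER_BYTES)  # ceiling division

def suggested_nm_clusters(physical_mb, threads, tcp_sendspace, multiplier=4):
    """Return the greater of the default and the estimated requirement."""
    return max(default_clusters(physical_mb),
               required_clusters(threads, tcp_sendspace, multiplier))

sendspace = 30 * 1024
print(default_clusters(128))                       # default on a 128 MB machine
print(suggested_nm_clusters(128, 128, sendspace))  # 128 threads: default suffices
print(suggested_nm_clusters(128, 512, sendspace))  # 512 threads: estimate exceeds default
```

When the default is already the larger figure, nm_clusters can be left at 0 so the kernel computes it.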
tcp_keepidle, tcp_keepintvl, and tcp_keep_timer_in_close
Web servers on the Internet encounter many more connection hiccups than normally occur in typical high-speed local area networks. Slow, flaky PPP connections over modems are one of the largest contributors. A high traffic Web site can often be drowned with accumulated, dead, idle connections which have not properly closed. The original timeout specified by the TCP specifications was over 8 hours! In both IRIX 6.2 and IRIX 5.3 you can reduce this timeout to a value more appropriate for the traffic patterns of a Web server. Note: IRIX 5.3 requires an operating system patch to allow modification.
The default timeout value is 2 hours, specified in half-second units (i.e. 2 hours * 60 minutes/hour * 60 seconds/minute * 2 half-second-units/second). You may want to tune this value lower if you have a very high traffic site. Why? Consider that during a really busy time, Web users browsing HTML files have probably lost interest in the content you were sending them if there has been no acknowledgement in the last, say, 15 minutes. Some other manufacturers say you should just buy more memory to accommodate these probably dead connections and let them sit around for hours taking up resources. We'll be more than happy to sell you more memory if you want it.
The best value should allow enough time for most connections to complete so you don't end up dealing with unnecessary interrupts. A good value seems to be between 15 minutes (15 * 60 * 2) and 30 minutes (30 * 60 * 2). Don't set the value below 10 minutes, or your server may spend too much time handling interrupts for active connections instead of only timing out truly idle connections.
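Since tcp_keepidle is expressed in half-second units, the values discussed above work out as in this short sketch. The helper name minutes_to_keepidle is illustrative, and the 10-minute check simply encodes the lower bound recommended in the text.

```python
HALF_SECOND_UNITS_PER_SECOND = 2
MIN_RECOMMENDED_MINUTES = 10  # lower bound suggested in the text

def minutes_to_keepidle(minutes):
    """Convert an idle timeout in minutes to half-second units."""
    if minutes < MIN_RECOMMENDED_MINUTES:
        raise ValueError("timeouts below 10 minutes are not recommended")
    return minutes * 60 * HALF_SECOND_UNITS_PER_SECOND

default_value = minutes_to_keepidle(2 * 60)  # the 2 hour default
low_value = minutes_to_keepidle(15)          # 15 minutes
high_value = minutes_to_keepidle(30)         # 30 minutes

print(default_value, low_value, high_value)
```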
Once the time for a connection exceeds the tcp_keepidle interval above, the TCP/IP code periodically checks at a shorter interval to see if the connection is still alive. The parameter for adjusting this value, tcp_keepintvl, is shown below only for discussion purposes and probably should not be changed from its default value.
Earlier TCP/IP implementations did not provide a means for timing out connections which were ungracefully dropped while in the process of closing. The server TCP buffers would still be filled with data to be drained by the client and would hang around indefinitely. The IRIX operating system (again, via a patch for IRIX 5.3) fixes this by allowing such connections to time out while in this closing state, so that the associated memory is released. Be sure tcp_keep_timer_in_close is set to 1, rather than its default value of 0:
somaxconn
The default number of pending socket connections in most flavors of UNIX is 5 (sometimes 7). HTTP's model of opening a socket per request, together with the tendency of popular Web browsers such as Netscape Navigator to issue multiple simultaneous requests (and thus open multiple sockets) to download the inline images on an HTML page, greatly increases the chance that the default queue of pending requests will overflow, causing connections to be refused.
In IRIX 6.2, somaxconn is automatically set to 1024, and is no longer a tunable parameter. In IRIX 5.3, however, it is necessary to tune this parameter. To effect this change, first install the latest "networking rollup patch" currently available. Then edit the file /var/sysgen/master.d/bsd (be sure to save a copy first!), and modify the value of somaxconn. Note that in 5.3, the maximum value is 1000. Setting it to a higher value will cause it to be automatically reset to 5! Once you have made your changes, reconfigure the kernel and reboot. (See below for details.)
Note: the Netscape Server will, by default, request a listen queue
depth of 128.
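The interaction between the configured somaxconn and the queue depth requested by the server can be modeled with a small sketch. The figures (1024, 1000, 5, and 128) come from the text; the assumption that the kernel caps a requested queue depth at somaxconn reflects common BSD-derived behavior and is not confirmed here for IRIX. All names are illustrative.

```python
IRIX_62_SOMAXCONN = 1024  # fixed value in IRIX 6.2, per the text
IRIX_53_MAX = 1000        # largest accepted value in IRIX 5.3, per the text
IRIX_53_FALLBACK = 5      # value an out-of-range setting is reset to
NETSCAPE_BACKLOG = 128    # default listen queue depth requested by the server

def somaxconn_53(configured):
    """Model the IRIX 5.3 rule: values above 1000 are reset to 5."""
    return configured if configured <= IRIX_53_MAX else IRIX_53_FALLBACK

def effective_queue_depth(requested, somaxconn):
    """Assumption: the kernel caps the requested depth at somaxconn."""
    return min(requested, somaxconn)

print(effective_queue_depth(NETSCAPE_BACKLOG, IRIX_62_SOMAXCONN))   # IRIX 6.2
print(effective_queue_depth(NETSCAPE_BACKLOG, somaxconn_53(5)))     # 5.3, untuned
print(effective_queue_depth(NETSCAPE_BACKLOG, somaxconn_53(1000)))  # 5.3, tuned
print(effective_queue_depth(NETSCAPE_BACKLOG, somaxconn_53(2000)))  # 5.3, too high
```

Under this model, setting somaxconn above the 5.3 limit is as harmful as not tuning it at all.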
The latest versions of the Netscape HTTP servers are significantly different in how they manage system resources (such as memory - see above). As multiple server processes (threads) share the same memory space, they communicate via semaphores. If there is a large number of independent server processes (e.g., each listening to a different IP address), the following parameters will probably need to be modified:
Netscape 2.0 and later browsers include the hostname in a GET
request (earlier versions, and some other browsers, strip the hostname
from the GET request, only using the hostname for the DNS
lookup). In this case, one Netscape server can be used to service
multiple hostnames, where each hostname has the same IP address.
Here, one server acts as several servers, without having to have one
separate server process (or set of processes, really) for each hostname
- and you don't have to use up your IP addresses on IP aliases!
This is referred to as "Software Virtual Servers" - refer to
the Netscape Administration guide, or the admin server, for help.
Tuning WebFORCE servers requires operating system patches. These patches correct bugs found by the highest traffic Web sites and, for IRIX 5.3, are required to enable tuning of the various kernel parameters suggested.
Patches are available to customers under support contracts. For more
information, see Important
Information About Patches available from Silicon Graphics' Supportfolio Online.
Follow these steps when modifying kernel parameters as described above.
These guidelines are intended to be a quick reference for tuning. If you are unfamiliar with Unix(TM), IRIX(TM), or the Netscape servers, you should definitely read the following books for background information. These books are available online:
Other references of interest: