SGI WebFORCE Server Tuning Guide




Scope of this Document

This document is intended as a general treatment of WebFORCE server tuning. It covers the most commonly encountered tuning questions concerning HTTP serving, and may not cover special cases (such as specialized applications working in conjunction with an HTTP server). This document is also used as an enhancement to the release notes for WebFORCE HTTP servers -- this way, one can always refer to this page for the latest tuning information.

Currently, WebFORCE servers ship with version 6.2 of the IRIX operating system and either the Netscape Enterprise server or the Netscape FastTrack server. The NCSA server (also known as the "OutBox" server) is also shipped as part of the OS bundle on all SGI systems. (See the related document, Tuning Hints for the OutBox Server.) This document covers issues for both the IRIX 6.2 and 5.3 operating systems and applies to most web-based services (most importantly, HTTP servers).


Server Performance Factors

For just about any server which provides a network-based service (such as an HTTP server), there are four basic components of the server which must be considered.


Understanding Memory Usage

As previously mentioned, and it cannot be stressed enough, memory swapping kills web server performance. Memory utilization is probably the single most tunable variable of the four performance factors just discussed. Taking this into consideration, the following paragraphs will outline a few considerations when monitoring/tuning memory use.

Monitoring of Memory Usage

Memory usage on a WebFORCE server can be monitored using any of the following tools:

Network-specific memory usage can be monitored via netstat -m (see Using netstat to Determine IRIX Kernel Networking Memory Utilization, later in this document).

Netscape Server Process Basic Memory Requirements

The Netscape Enterprise and FastTrack servers are configured in the following manner:

Thus, an active Netscape server with one server, one process, and 4 to 128 threads, will look like this in ps -efl:

F S UID PID PPID C PRI NI P SZ: RSS WCHAN STIME TTY TIME CMD
b0 S nobody 588 1 0 39 20 * 3244: 83 8821a660 Aug 30 ? 0:00 ./ns-httpd -d /usr/ns-home/httpd-fa...
b0 S nobody 592 588 0 75 35 * 6505: 3402 883862f0 Aug 30 ? 0:03 ./ns-httpd -d /usr/ns-home/httpd-fa...
b0 S nobody 820 592 0 75 35 * 6505: 3402 883cbb04 Aug 30 ? 0:02 ./ns-httpd -d /usr/ns-home/httpd-fa...
b0 S nobody 821 592 0 75 35 * 6505: 3402 883cbb04 Aug 30 ? 0:02 ./ns-httpd -d /usr/ns-home/httpd-fa...
b0 S nobody 822 592 0 75 35 * 6505: 3402 883cbb04 Aug 30 ? 0:02 ./ns-httpd -d /usr/ns-home/httpd-fa...

The first process, PID 588, is the main server process. It spawned PID 592, which in turn spawned three children, bringing the total active processes to the minimum of 4. Note that the four processes share their space, so the total resident set size (RSS) for this server is 3402 pages of 4 KB each, or approximately 13 MB, and a total of 6505 pages, or approximately 25 MB, has been reserved for all threads belonging to the one process. These are typical sizes for the Netscape 2.0 server.
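The page arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and the 4 KB page size is the one stated in the text.

```python
PAGE_SIZE_KB = 4  # page size stated in the text above


def pages_to_mb(pages: int, page_size_kb: int = PAGE_SIZE_KB) -> float:
    """Convert a page count from the SZ or RSS column of ps -efl to megabytes."""
    return pages * page_size_kb / 1024


rss_mb = pages_to_mb(3402)  # resident set size shared by the four processes
sz_mb = pages_to_mb(6505)   # total size reserved for the process and its threads
print(f"RSS: {rss_mb:.1f} MB, SZ: {sz_mb:.1f} MB")
```

Because the processes share their space, the RSS figure should be counted once per server, not once per line of ps output.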

The total number of active processes (other than the main server process, which usually has a PPID of 1) determines the number of simultaneous connections that can be serviced by your WebFORCE server. (Recall that simultaneous only means actively serving data, and does not represent the total number of users who are currently viewing your pages.)

When configuring the Netscape server, realize that each server process requires approximately 13 MB of memory, and may require as much as 25 MB. This includes space used by all of the threads under that process as well. (See Performance Tuning in the Netscape Admin server guide.) Often, just a single process is sufficient and other times multiple processes (say, 4 or 5) may improve performance. It is best to experiment a bit with this, but keep the memory requirements in mind, and avoid swapping.

DNS named Memory Requirements

named is the Internet domain name server. named is utilized by a Web server to map the originating IP address of each incoming HTTP request to its fully qualified domain name (e.g. 192.26.51.11 to sgigate.sgi.com). named caches these mappings so it can handle subsequent lookups in a timely fashion. Unfortunately, for a very high traffic Web server, named and its internal cache can grow quite large because requests will come literally from all over the world, yielding a high number of unique IP addresses and fully qualified domain names which must be cached.

Turning off domain name lookup for HTTP request logging would significantly reduce or eliminate named memory requirements. The following table highlights the memory requirements for named:

Log DNS Lookup    named Process Memory Requirements
--------------    ---------------------------------
yes               4 - 20 MB
no                0 - 2 MB

Of course, enabling DNS lookup can reduce the performance of the web server (in some cases significantly). Simply turning on logging (even without DNS lookup) can slow down the performance of the web server by as much as 15% (based on WebSTONE measurements).

Networking Kernel Memory Buffer Requirements

To service as many requests as possible, each process of the Netscape server attempts to process an incoming request and send out the response as fast as possible so it can handle the next incoming request. Unfortunately, the connections to the clients are not infinitely fast, especially if the client connection is over a 14.4 kbps modem. If each server process had to wait for the client to receive all of the data associated with the response, the server could not handle as many connections as it might if it could just hand off the data.

The solution to this problem is built in to the IRIX kernel: rather than having the process send data directly to the client, the data is instead passed into a network buffer. The IRIX kernel takes care of transferring the data from the network buffer to the client, and (at the same time) the server process can go on to handle the next request.

Both the size and the number of network buffers available can be controlled. Enough buffers are needed to handle the large number of requests expected without forcing the server processes to wait for an available buffer (such delays can be seen via netstat -m, for example). When sizing these buffers, keep two things in mind:

  1. If the server process wants to send something which is larger than the buffer, it pushes out a buffer-sized chunk first, waits until that has fully drained to the client, and then sends out the next chunk. The server process is incapable of handling the next request until it has sent the last chunk of data to the network buffer. Thus, the network buffers should be large enough so that the server processes don't have to spend all of their time breaking off chunks of data to send to the buffers.

  2. At the same time, clients often drop connections leaving large amounts of data sitting in kernel buffers waiting to be drained to the client. This data never actually gets transmitted to the client so it is dead-weight in the kernel until the kernel does buffer cleanup. Thus, buffers should also not be too large.

Obviously, a compromise is necessary. For a typical web server, 95% or more of the content is less than 30 KB. We recommend reducing the maximum size for the TCP send buffers to 30 KB down from the 60 KB default setting. This is controlled by the TCP/IP kernel parameter, tcp_sendspace. (See TCP/IP Tuning below). Each connection is able to reserve and utilize an outgoing (send) buffer of up to 30 KB. There are two basic scenarios for how the Web server utilizes these buffers:

  1. HTTP response <= 30 KB: server process sends data to the buffer, and is immediately available to service another request, letting the IRIX kernel drain the buffer to the client and close the connection.

  2. HTTP response > 30 KB: server process sends data up to the buffer limit (30 KB) to the network buffer, waits for the buffer to drain to the client, then sends next batch of data up to the buffer limit. This continues until the last 30 KB or less of data is sent at which time the server process is able to service the next request, letting the IRIX kernel drain the buffer to the client and close the connection.
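The two scenarios above can be modeled with a small calculation. This is a simplified sketch, not a description of the kernel's actual behavior: it assumes the process blocks once per full buffer and is released as soon as the final chunk has been handed off. The function name is hypothetical.

```python
import math


def blocking_drains(response_bytes: int, sendspace_bytes: int = 30 * 1024) -> int:
    """Count how many times a server process waits for the send buffer to drain
    before it is free to service the next request (simplified model)."""
    if response_bytes <= 0:
        return 0
    chunks = math.ceil(response_bytes / sendspace_bytes)
    # The last chunk is handed to the kernel without waiting for it to drain.
    return chunks - 1


for size_kb in (20, 30, 31, 100):
    waits = blocking_drains(size_kb * 1024)
    print(f"{size_kb} KB response: {waits} blocking wait(s)")
```

Under this model, a response that fits in the buffer never ties up the server process, while a 100 KB response with a 30 KB buffer makes the process wait three times on the client's connection speed.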

In one special case, it may be beneficial to decrease the buffer size further. For Ethernets, a situation can arise known as the Ethernet capture effect which can lead to excessive collisions. If excessive collisions are persistent on your Ethernet, try tuning down tcp_sendspace to 12 KB. Note that it may be required to increase the number of buffers (nm_clusters, see below) if this is done.

So, there is a need to have sufficient buffer space in the kernel to support not only the connections which are associated with Netscape server processes but there is also a need to have sufficient space for the buffers for the connections which are draining to the clients. This number is almost impossible to predict because it is based on factors such as:

All of these factors essentially determine the amount of kernel buffer storage. We want to have enough buffer space so as not to be the bottleneck resource of the web server. If insufficient buffer space were available, a server process would block its sending of data until buffer space became available. This insufficient buffer space situation is denoted as a "request for memory denied" or "request for memory delayed" in netstat -m (or Performance Co-Pilot).

A rough estimate of the kernel memory requirements is 2-4 times the number of server processes (total number of threads) configured for the server times the buffer size (tcp_sendspace) reserved per outgoing buffer (e.g., 30 KB). The 2-4 multiplier takes into consideration the overhead of other supporting data structures. It is possible that a server would require more or less than this amount of memory dedicated to networking data structures. The IRIX kernel dynamically allocates chunks of network memory in 4 KB units called clusters. Allocation occurs up to a limit called nm_clusters (another kernel parameter; see the TCP/IP tuning section below). If memory requests are denied or delayed, it may be an indication that your Internet connection throughput is insufficient during peak load; the connections may be backing up and clients may be experiencing long wait times.
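As an illustration of the rough estimate above, the following sketch computes the low and high ends of the range. The function name and the 128-thread example are assumptions chosen for illustration; only the 2-4 multiplier and the 30 KB buffer size come from the text.

```python
def kernel_buffer_estimate_kb(threads: int, sendspace_kb: int = 30) -> tuple:
    """Return the (low, high) rough estimate of network buffer memory in KB,
    using the 2x and 4x multipliers described above."""
    base_kb = threads * sendspace_kb
    return 2 * base_kb, 4 * base_kb


low_kb, high_kb = kernel_buffer_estimate_kb(threads=128)
print(f"128 threads: {low_kb / 1024:.1f} MB to {high_kb / 1024:.1f} MB")
```

For 128 threads and a 30 KB tcp_sendspace this gives 7.5 MB to 15 MB, which is only a starting point; netstat -m remains the way to see whether the configured space is actually sufficient.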

Using netstat to Determine IRIX Kernel Networking Memory Utilization

The command:

/usr/etc/netstat -m

will report kernel memory utilization statistics. For example:

35602/35712 mbufs in use:
        33992 mbufs allocated to data
        367 mbufs allocated to packet headers
        417 mbufs allocated to socket structures
        812 mbufs allocated to protocol control blocks
        11 mbufs allocated to routing table entries
        1 mbufs allocated to socket names and addresses
        2 mbufs allocated to interface addresses
1412/1419 mapped pages in use
5676 Kbytes allocated to network (99% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
Resource    Failures  Avail   In Use   Max Used   Total Used
streams           0      0       35         39         6886
events            0      0        2          2            2
queues            0      0      162        182        40774
link blks         0      0        0          0            0
mdb blks          0      3       69       1408       512347
msg blks          0      7       13        401       123598

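The key things to look for include the percentage of network memory in use and the "requests for memory denied" and "requests for memory delayed" counters. As described above, nonzero values for the latter two indicate that server processes have had to wait for buffer space.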
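For sites that save netstat -m output periodically, the counters can be extracted with a short script. The sketch below assumes the output format shown in the example above; the function name and dictionary keys are made up for illustration, and the sample is an excerpt of that example.

```python
import re

# Patterns match the lines of the netstat -m example shown above.
PATTERNS = {
    "denied": r"(\d+) requests for memory denied",
    "delayed": r"(\d+) requests for memory delayed",
    "percent_in_use": r"\((\d+)% in use\)",
}


def parse_netstat_m(text: str) -> dict:
    """Extract the memory counters from saved netstat -m output."""
    stats = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            stats[key] = int(match.group(1))
    return stats


SAMPLE = """\
1412/1419 mapped pages in use
5676 Kbytes allocated to network (99% in use)
0 requests for memory denied
0 requests for memory delayed
0 calls to protocol drain routines
"""

stats = parse_netstat_m(SAMPLE)
if stats.get("denied", 0) or stats.get("delayed", 0):
    print("buffer space may be insufficient:", stats)
else:
    print("no denied or delayed memory requests:", stats)
```

A counter that grows between two samples taken during peak load is a more useful signal than a single nonzero reading.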


Networking (TCP/IP) Kernel Tuning Parameters

Each of the following parameters is found in /var/sysgen/master.d/bsd. Before changing any of these parameters, it is strongly recommended that you make a copy of the file, and that you add comments indicating the changes. Once these parameters have been changed, the kernel must be reconfigured and the system rebooted for the changes to take effect (see Modifying Kernel Parameters).

TCP Buffer Size

tcp_sendspace

As noted earlier, 95% or more of the content on a typical web server is less than 30 KB. We recommend reducing the maximum size of the TCP send buffers from the 60 KB default to 30 KB. This is controlled by the TCP/IP kernel parameter tcp_sendspace. The line should look like this:

unsigned long tcp_sendspace = 30 * 1024; /* must be < 512 K */

Network Buffer Space

nm_clusters

Once the buffer size is determined, the amount of buffer space needed can be tuned also. This is determined by the number of "clusters" reserved by the kernel. This is what the default setting looks like:

int nm_clusters = 0;

The default value, 0, instructs the kernel to determine the value for nm_clusters, which is dynamically calculated to be one eighth of the total physical memory. The value of nm_clusters represents the number of 4 KB pages (clusters) reserved for network data buffers. Compare one eighth of the physical memory to [(2-4) x (the total number of threads) x (the buffer size, tcp_sendspace)], and be sure that the greater of the two, expressed in 4 KB clusters, is represented by nm_clusters. For example, if the machine has 128 MB of physical memory, then the default buffer space would be 128 MB/8 = 16 MB, which is (16*1024*1024)/(4*1024) = 4096 clusters. The same conversion can be used to calculate the nm_clusters parameter if more memory is required.
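The comparison described above can be sketched as follows. The function names, the choice of the 4x multiplier as the conservative end of the 2-4 range, and the thread counts are assumptions for illustration; the one-eighth rule and the 4 KB cluster size come from the text.

```python
import math

CLUSTER_BYTES = 4 * 1024  # one cluster is a 4 KB page


def default_clusters(physical_mb: int) -> int:
    """Clusters the kernel reserves by default: one eighth of physical memory."""
    return (physical_mb * 1024 * 1024 // 8) // CLUSTER_BYTES


def needed_clusters(threads: int, sendspace_bytes: int = 30 * 1024,
                    multiplier: int = 4) -> int:
    """Clusters implied by the rough estimate multiplier x threads x tcp_sendspace."""
    return math.ceil(multiplier * threads * sendspace_bytes / CLUSTER_BYTES)


def recommended_nm_clusters(physical_mb: int, threads: int) -> int:
    """Return the greater of the default and the estimated requirement."""
    return max(default_clusters(physical_mb), needed_clusters(threads))


print(default_clusters(128))              # default for a 128 MB machine
print(recommended_nm_clusters(128, 128))  # estimate fits within the default
print(recommended_nm_clusters(128, 512))  # estimate exceeds the default
```

When the result equals the default, nm_clusters can simply be left at 0 so that the kernel keeps calculating the value itself.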

Closing of Idle Connections

tcp_keepidle, tcp_keepintvl, and tcp_keep_timer_in_close

Web servers on the Internet encounter many more connection hiccups than normally occur in typical high-speed local area networks. Slow, flaky PPP connections over modems are one of the largest contributors. A high traffic Web site can often be drowned with accumulated, dead, idle connections which have not properly closed. The original timeout specified by the TCP specifications was over 8 hours! In both IRIX 6.2 and IRIX 5.3 you can reduce this timeout to a value more appropriate for the traffic patterns of a Web server. Note: IRIX 5.3 requires an operating system patch to allow modification.

The default timeout value is 2 hours, specified in half-second units (i.e. 2 hours * 60 minutes/hour * 60 seconds/minute * 2 half-second-units/second). You may want to tune this value lower if you have a very high traffic site. Why? Consider that during a really busy time, Web users browsing HTML files have probably lost interest in the content you were sending them if there has been no acknowledgement in the last, say, 15 minutes. Some other manufacturers say you should just buy more memory to accommodate these probably dead connections and let them sit around for hours taking up resources. We'll be more than happy to sell you more memory if you want it.

The best value should allow enough time for most connections to complete so you don't end up dealing with unnecessary interrupts. A good value seems to be between 15 minutes (15 * 60 * 2) and 30 minutes (30 * 60 * 2). Don't set the value below 10 minutes, or your server may spend too much time handling interrupts for active connections instead of only timing out truly idle connections.

int tcp_keepidle = (15 * 60 * 2); /* You may want to change this */
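Because the timer is expressed in half-second units, it is easy to be off by a factor of two when choosing a value. A small conversion helper, whose name is made up for illustration, makes the arithmetic explicit:

```python
def keepidle_units(minutes: float) -> int:
    """Convert a timeout in minutes to the half-second units used by tcp_keepidle."""
    return int(minutes * 60 * 2)


for label, minutes in (("default (2 hours)", 120), ("30 minutes", 30), ("15 minutes", 15)):
    print(f"{label}: tcp_keepidle = {keepidle_units(minutes)}")
```

The values printed correspond to the expressions 2 * 60 * 60 * 2, 30 * 60 * 2, and 15 * 60 * 2 used in the text.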

Once the time for a connection exceeds the tcp_keepidle interval above, the TCP/IP code periodically checks at a shorter interval to see if the connection is still alive. The parameter for adjusting this value, tcp_keepintvl, is shown below only for discussion purposes and probably should not be changed from its default value.

int tcp_keepintvl = (75 * 2); /* Don't change! */

Earlier TCP/IP implementations did not provide a means for timing out connections which were ungracefully dropped while in the process of closing. The server TCP buffers would still be filled with data to be drained to the client and would hang around indefinitely. The IRIX operating system (again, via a patch for IRIX 5.3) fixes this by allowing connections which have been dropped while in this closing state to be timed out, so that the associated memory is released. Be sure tcp_keep_timer_in_close is set to 1, rather than its default value of 0:

int tcp_keep_timer_in_close = 1; /* Change this; default is 0 */

Maximum Number of Pending Socket Connections

somaxconn

The default number of pending socket connections in most flavors of UNIX is 5 (sometimes 7). HTTP's model of opening a socket per request, combined with the tendency of popular Web browsers such as Netscape Navigator to issue multiple simultaneous requests (and thus open multiple sockets) to download the inline images on an HTML page, greatly increases the chance that the default queue of pending requests will overflow, causing connections to be refused.

In IRIX 6.2, somaxconn is automatically set to 1024, and is no longer a tunable parameter. In IRIX 5.3, however, it is necessary to tune this parameter. To effect this change, first install the latest "networking rollup patch" currently available. Then edit the file /var/sysgen/master.d/bsd (be sure to save a copy first!), and modify the value of somaxconn. Note that in 5.3, the maximum value is 1000. Setting it to a higher value will cause it to be automatically reset to 5! Once you have made your changes, reconfigure the kernel and reboot. (See below for details.)

Note: the Netscape Server will, by default, request a listen queue depth of 128.
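To show where the listen queue depth comes from on the application side, here is a minimal sketch of a server socket requesting a backlog of 128 through the standard sockets interface. It is not the Netscape server's code; the function name, the loopback address, and the use of an ephemeral port are assumptions for illustration. The kernel limits the requested depth to its own maximum (somaxconn, as described above).

```python
import socket


def open_listener(backlog: int = 128, host: str = "127.0.0.1",
                  port: int = 0) -> socket.socket:
    """Open a TCP listening socket and request a pending-connection queue
    of the given depth. Port 0 asks the system for any free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)  # the kernel may cap this at its configured maximum
    return sock


listener = open_listener()
print("listening on port", listener.getsockname()[1])
listener.close()
```

If the kernel's limit is lower than the depth the application requests, the application typically receives no error, which is why an untuned limit of 5 can go unnoticed until connections are refused under load.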


Tuning for Many Server Processes

The latest versions of the Netscape HTTP servers are significantly different in how they manage system resources (such as memory - see above). As multiple server processes (threads) share the same memory space, they communicate via semaphores. If there is a large number of independent server processes (e.g., each listening to a different IP address), the following parameters will probably need to be modified:

How Can I Avoid These Process Limits?

Netscape 2.0 and later browsers include the hostname in a GET request (earlier versions, and some other browsers, strip the hostname from the GET request, only using the hostname for the DNS lookup). In this case, one Netscape server can be used to service multiple hostnames, where each hostname has the same IP address. Here, one server acts as several servers, without having to have one separate server process (or set of processes, really) for each hostname - and you don't have to use up your IP addresses on IP aliases! This is referred to as "Software Virtual Servers" - refer to the Netscape Administration guide, or the admin server, for help.
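The idea behind software virtual servers can be sketched as a lookup from the requested hostname to a document root. This is an illustration only, not the Netscape server's implementation: it assumes the hostname arrives in a Host request header, and the hostnames, paths, and function name are all hypothetical.

```python
# Hypothetical table mapping hostnames (all on one IP address) to document roots.
VIRTUAL_HOSTS = {
    "www.example.com": "/docs/example",
    "www.example.org": "/docs/example-org",
}


def pick_document_root(request_headers: dict,
                       default_root: str = "/docs/default") -> str:
    """Choose a document root from the hostname the browser sent.
    Requests without a usable hostname fall back to the default root."""
    host = request_headers.get("Host", "")
    host = host.split(":")[0].strip().lower()  # drop any port, normalize case
    return VIRTUAL_HOSTS.get(host, default_root)


print(pick_document_root({"Host": "www.example.org:80"}))
print(pick_document_root({}))  # older browser that sent no hostname
```

The fallback branch matters in practice, since browsers that do not send the hostname will all be served from the same default content.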


IRIX Operating System Patches

Tuning WebFORCE servers requires operating system patches. These patches correct bugs found by the highest traffic Web sites and, for IRIX 5.3, are required to enable tuning of the various kernel parameters suggested.

Patches are available to customers under support contracts. For more information, see Important Information About Patches available from Silicon Graphics' Supportfolio Online.


Modifying Kernel Parameters

Follow these steps when modifying kernel parameters as described above.

  1. First become the root user via the su command:

    % su

  2. Copy the file to be modified (as a safety measure).

    # cd /var/sysgen/master.d
    # cp bsd bsd.orig
     
  3. Edit the file, using your favorite editor.

    # jot bsd
     
  4. Save and close the file.
     
  5. Reconfigure the kernel.

    # autoconfig -f
     
    This will create a new version of the kernel, called /unix.install, which will replace the older version of the kernel, /unix, upon reboot. To retain the older version of the kernel, rename it to something like /unix.good, and rename /unix.install to /unix before rebooting.

  6. Reboot your system. After this, the changes made will take effect.

Further Reading

These guidelines are intended to be a quick reference for tuning. If you are unfamiliar with Unix(TM), IRIX(TM), or the Netscape servers, you should definitely read the following books for background information. These books are available online:

Other references of interest:
