A program that does not perform acceptably is not functional.
Every program has to satisfy a set of users--admittedly, sometimes a large and diverse set. If the performance of the program is truly unacceptable to a significant group of those users, it will not be used. A program that is not being used is not performing its intended function.
This is true of licensed software packages as well as user-written applications, although most developers of software packages are aware of the effects of poor performance and take pains to make their programs run as fast as possible. Unfortunately, they can't anticipate all of the environments and uses that their programs will experience. Final responsibility for acceptable performance falls on the people who select or write, plan for, and install software packages.
This chapter attempts to describe the stages by which a programmer or system administrator can ensure that a newly written or purchased program has acceptable performance. (Wherever the word programmer appears alone, the term includes system administrators and anyone else who is responsible for the ultimate success of a program.)
The way to achieve acceptable performance in a program is to identify and quantify acceptability at the start of the project and never lose sight of the measures and resources needed to achieve it. This prescription borders on banal, but some programming projects consciously reject it. They adopt a policy that might be fairly described as "design, code, debug, maybe document, and if we have time, fix up the performance."
The only way that programs can predictably be made to function in time, not just in logic, is by integrating performance considerations in the software planning and development process. Advance planning is perhaps more critical when existing software is being installed, because the installer has fewer degrees of freedom than the developer.
Although the detail of this process may seem burdensome for a small program, remember that we have a second agenda. Not only must the new program have satisfactory performance; we must also ensure that the addition of that program to an existing system does not cause the performance of other programs run on that system to become unsatisfactory.
Whether the program is new or purchased, small or large, the developers, the installers, and the prospective users all have assumptions about how the program will be used.
Unless these ideas are elicited as part of the design process, they will probably be vague, and the programmers will almost certainly have different assumptions than the prospective users. Even in the apparently trivial case in which the programmer is also the user, leaving the assumptions unarticulated makes it impossible to compare design to assumptions in any rigorous way. Worse, it is impossible to identify performance requirements without a complete understanding of the work being performed.
In identifying and quantifying performance requirements, it is important to identify the reasoning behind a particular requirement. Users may be basing their statements of requirements on assumptions about the logic of the program that do not match the programmer's assumptions. At a minimum, a set of performance requirements should document the response times and throughput rates that users consider acceptable, together with the workload assumptions on which those figures are based.
If the user says that response time is unimportant and that only the answer is of interest, you can ask whether a response time of ten times your current estimate of stand-alone execution time would be acceptable. If the answer is "yes," you can proceed to discuss throughput. Otherwise, you can continue the discussion of response time with the user's full attention.
Unless you are purchasing a software package that comes with detailed resource-requirement documentation, resource estimation can be the most difficult task in the performance-planning process, for several reasons.
A useful guideline is that the higher the level of abstraction, the more caution is needed to ensure that one doesn't receive a performance surprise. One must think very carefully about the data volumes and number of iterations implied by some apparently harmless constructs.
There are two approaches to dealing with resource-report ambiguity and variability. The first is to ignore the ambiguity and to keep eliminating sources of variability until the measurements become acceptably consistent. The second approach is to try to make the measurements as realistic as possible and describe the results statistically. We prefer the latter, since it yields results that have some correlation with production situations.
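As a small illustration of the statistical approach, the following sketch (the file name is hypothetical, and we assume one measured response time per line) computes the mean and standard deviation of a set of measurements:

$ awk '{ s += $1; ss += $1 * $1; n++ }
       END { m = s / n; printf "mean=%.2f sd=%.2f n=%d\n", m, sqrt(ss / n - m * m), n }' resp.times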
Our recommendation is to keep your estimates as close to reality as the specific situation allows.
In resource estimation, we are primarily interested in four dimensions (in no particular order):
CPU time        The processor cost of the workload
Disk accesses   The rate at which the workload generates disk reads or writes
Real memory     The amount of RAM the workload requires
LAN traffic     The number of packets the workload generates and the number of bytes of data exchanged
The following sections describe, or refer you to descriptions of, the techniques for determining these values in the various situations just described.
If the real program, a comparable program, or a prototype is available for measurement, the choice of technique depends on whether the system is dedicated to the measurement or running its normal production work, and on whether we need to measure the complete workload or only part of it.
Measuring a complete workload on a dedicated system is the ideal situation because it allows us to use measurements that include system overhead as well as the cost of individual processes.
To measure CPU and disk activity, we can use iostat. The command
$ iostat 5 >iostat.output
gives us a picture of the state of the system every 5 seconds during the measurement run. Remember that the first set of iostat output contains the cumulative data from the last boot to the start of the iostat command. The remaining sets are the results for the preceding interval, in this case 5 seconds. A typical set of iostat output on a large system looks like this:
tty:      tin       tout   cpu:  % user   % sys   % idle  % iowait
          1.2        1.6          60.2     10.8     23.4       5.6

Disks:     % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk1        0.0       0.0       0.0        0         0
hdisk2        0.0       0.0       0.0        0         0
hdisk3        0.0       0.0       0.0        0         0
hdisk4        0.0       0.0       0.0        0         0
hdisk11       0.0       0.0       0.0        0         0
hdisk5        0.0       0.0       0.0        0         0
hdisk6        0.0       0.0       0.0        0         0
hdisk7        3.0      11.2       0.8        8        48
hdisk8        1.8       4.8       1.2        0        24
hdisk9        0.0       0.0       0.0        0         0
hdisk0        2.0       4.8       1.2       24         0
hdisk10       0.0       0.0       0.0        0         0
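If we want an average over the whole measurement run, we can post-process the saved output. The following sketch is one possible approach, not part of iostat itself; it assumes the report format shown above, in which the CPU line is the only line consisting of six numeric fields, and it skips the first (cumulative) sample:

$ awk '$1 ~ /^[0-9.]+$/ && NF == 6 { if (seen++) { u += $3; s += $4; n++ } }
       END { if (n) printf "average %%user=%.1f %%sys=%.1f over %d intervals\n", u / n, s / n, n }' iostat.output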
To measure memory, we would use svmon. The command svmon -G gives a picture of overall memory use. The statistics are in terms of 4KB pages:
$ svmon -G
       m e m o r y            i n  u s e           p i n        p g  s p a c e
   size   inuse   free   pin   work  pers  clnt   work  pers  clnt   size  inuse
  24576   24366    210  2209  15659  6863  1844   2209     0     0  40960  26270
This machine's 96MB of memory (24576 4KB pages) is fully used: only 210 pages are free. About 64% of RAM (15659 of 24576 pages) is in use for working segments--the read/write memory of running programs. If there are long-running processes that we are interested in, we can review their memory requirements in detail. The following example determines the memory used by one of user xxxxxx's processes.
$ ps -fu xxxxxx
    USER    PID   PPID   C    STIME    TTY  TIME CMD
  xxxxxx  28031  51445  15 14:01:56  pts/9  0:00 ps -fu xxxxxx
  xxxxxx  51445  54772   1 07:57:47  pts/9  0:00 -ksh
  xxxxxx  54772   6864   0 07:57:47      -  0:02 rlogind

$ svmon -P 51445

  Pid                         Command        Inuse        Pin      Pgspace
51445                             ksh         1668          2         4077

Pid:  51445
Command:  ksh

 Segid  Type  Description          Inuse   Pin  Pgspace  Address Range
  8270  pers  /dev/fslv00:86079        1     0        0  0..0
  4809  work  shared library        1558     0     4039  0..4673 : 60123..65535
  9213  work  private                 37     2       38  0..31 : 65406..65535
   8a1  pers  code,/dev/hd2:14400     72     0        0  0..91
The working segment (9213), with 37 pages in use, is the cost of this instance of ksh. The 1558-page cost of the shared library and the 72-page cost of the ksh executable are spread across all of the running programs and all instances of ksh, respectively.
If we believe that our 96MB system is larger than necessary, we can use the rmss command to reduce the effective size of the machine and remeasure the workload. If paging increases significantly or response time deteriorates, we have reduced memory too far. This technique can be continued until we find a size that just runs our workload without degradation. See Assessing Memory Requirements via the rmss Command for more information on this technique.
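A typical rmss sequence might look like the following sketch (the sizes are illustrative only; consult that section for the flags and limits that apply to your system):

$ rmss -p        # display the current effective memory size
$ rmss -c 64     # simulate a 64MB machine; rerun and remeasure the workload
$ rmss -c 48     # try a smaller size if there was no degradation
$ rmss -r        # restore the machine's real memory size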
The primary command for measuring network usage is netstat. The following example shows the activity of a specific Token-Ring interface:
$ netstat -I tr0 5
     input    (tr0)     output            input   (Total)    output
  packets  errs  packets  errs colls   packets  errs  packets  errs colls
 35552822 213488 30283693    0     0  35608011 213488 30338882     0     0
      300      0      426     0     0      300      0      426     0     0
      272      2      190     0     0      272      2      190     0     0
      231      0      192     0     0      231      0      192     0     0
      143      0      113     0     0      143      0      113     0     0
      408      1      176     0     0      408      1      176     0     0
The first line of the report shows the cumulative network traffic since the last boot. Each subsequent line shows the activity for the preceding 5-second interval; in the first interval above, for example, tr0 received 300 packets (60 per second) and transmitted 426 (about 85 per second).
The techniques of measurement on production systems are similar to those on dedicated systems, but we must take pains to avoid degrading system performance. For example, the svmon -G command is very expensive to run. The one shown earlier took about 5 seconds of CPU time on a Model 950. Estimates of the resource costs of the most frequently used performance tools are shown in Appendix E, Performance of the Performance Tools.
Probably the most cost-effective tool is vmstat, which supplies data on memory, I/O, and CPU usage in a single report. If the vmstat intervals are kept reasonably long, say 10 seconds, the average cost is low--about 0.01 CPU seconds per report on a Model 950. See Identifying the Performance-Limiting Resource for more information on the use of vmstat.
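For example, the following command (the interval and count are illustrative) writes six reports at 10-second intervals to a file for later analysis. As with iostat, the first report shows cumulative statistics since boot:

$ vmstat 10 6 >vmstat.output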
By partial workload we mean measuring a part of the production system's workload for possible transfer to, or duplication on, a different system. Because this is a production system, we must be as unobtrusive as possible. At the same time, we must analyze the workload in enough detail to distinguish between the parts we are interested in and those we are not. To do a partial measurement, we need to discover what the workload elements of interest have in common. Are they invocations of the same program or set of related programs? The work of one or more specific users? Work submitted from one or more specific terminals?
Depending on the commonality, we could use one of the following:
ps -ef | grep pgmname
ps -fuusername, . . .
ps -ftttyname, . . .
to identify the processes of interest and report the cumulative CPU time consumption of those processes. We can then use svmon (judiciously!) to assess the memory use of the processes.
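As one possible sketch for the CPU side (assuming, as in the ps output shown earlier, that the TIME column is the seventh field and prints as minutes:seconds), the following pipeline totals the cumulative CPU time of user xxxxxx's processes:

$ ps -fuxxxxxx | awk 'NR > 1 { split($7, t, ":"); cpu += t[1] * 60 + t[2] }
                      END { printf "total CPU: %d seconds\n", cpu }'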
There are many tools for measuring the resource consumption of individual programs. Some of these programs are capable of more comprehensive workload measurements as well, but are too intrusive for use on production systems. Most of these tools are discussed in depth in the chapters that discuss tuning for minimum consumption of specific resources. Some of the more prominent are:
time       Measures the elapsed execution time and CPU consumption of an individual program. Discussed in Using the time Command to Measure CPU Use.
tprof      Measures the relative CPU consumption of programs, subroutine libraries, and the AIX kernel. Discussed in Using tprof to Analyze Programs for CPU Use.
svmon      Measures the real memory used by a process. Discussed in How Much Memory is Really Being Used?.
vmstat -s  Can be used to measure the I/O load generated by a program. Discussed in Measuring Overall Disk I/O with vmstat.
It is impossible to make precise estimates of unwritten programs. The invention and redesign that take place during the coding phase defy prediction, but the following rules of thumb may help you to get a general sense of the requirements. As a starting point, a minimal program would need:
Add to that basic cost allowances for demands implied by the design (the CPU times given are for a Model 580):
The best method for estimating peak and typical resource requirements is to use a queuing model such as BEST/1. Static models can be used, but you run the risk of overestimating or underestimating the peak resource requirements. In either case, you need to understand how multiple programs in a workload interact from the standpoint of resource requirements.
If you are building a static model, use a time interval that is the specified worst-acceptable response time for the most frequent or demanding program (usually they are the same). Determine, based on your projected number of users, their think time, their key entry rate, and the anticipated mix of operations, which programs will typically be running during each interval.
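As a hypothetical example of the arithmetic: if 100 users each submit a transaction every 30 seconds on average (think time plus keying time), and each transaction costs about 0.5 CPU-seconds, the steady-state CPU demand can be computed with bc:

$ echo "scale=2; (100 / 30) * 0.5" | bc
1.66

That is, the workload would demand about 1.66 CPU-seconds per second of real time, more than one processor can supply, even before allowing for peaks or for the rest of the system's work.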
Remember that these guidelines for a "back of an envelope" estimate are intended for use only when no extensive measurement is possible. Any application-specific measurement that can be used in place of a guideline will improve the accuracy of the estimate considerably.