[ Next Article | Previous Article | Book Contents | Library Home | Legal | Search ]
Commands Reference, Volume 5

tprof Command

Purpose

Reports CPU usage.

Syntax

tprof Program

tprofProgram | { -d | -e  | -k | -p Program | -s | -t Process_Id | -v | { [ -i Trace_File ] [ -n Gennames_File ] | { -x Command } }

Notes:

Description

The tprof command reports CPU usage for individual programs and the system as a whole. This command is a useful tool for anyone with a C or C++ or FORTRAN program that might be CPU-bound and who wants to know which sections of this program are most heavily using the CPU. The tprof command also reports the fraction of time the CPU is idle. These reports can be useful in determining CPU usage in a global sense.

Note: Only the root user and members of the security group should have execute (x) access to this command.

The tprof command specifies the user program to be profiled, executes the user program, and then produces a set of files containing reports. The user specifies the name of the program to be profiled, or alternatively, the name of the program to be profiled and a command line to be executed. Both the Program and Command variables must be executable.

In the AIX operating system, an interrupt occurs periodically to allow a "housekeeping" kernel routine to run. This housekeeping occurs 100 times per second. When the tprof command is invoked, the housekeeping kernel routine records the process ID and the address of the instruction executing when the interrupt occurred. With both the instruction address and process ID, the tprof analysis routines can charge CPU time to processes and threads, to subprograms, and even to source lines of programs. Charging CPU time to source program lines is called microprofiling.

Note: Microprofiling can only be performed on code compiled with the -g option. It is also necessary for the source code to reside in the working directory, that is, the directory in which the tprof command is invoked. The proftest working directory is used in the examples for this command.

For subprogram-level profiling, the tprof command can be run without modifying your executable program. You do not have to recompile with special compiler flags or linker options. This means that you can obtain a subprogram profile of any executable module that has already been built. However, as previously stated, recompilation is required to get a microprofile.

The files containing the reports are left in the working directory. All files created by the tprof command are prefixed by ___ (two underscores). Prefixing with underscores allows the tprof command user to easily identify all tprof-created files. In this text, it is assumed that all executables are kept in the working directory to simplify the explanation of the tool. (To keep executables in another directory, specify the full path name of the executable file to the tprof command.) It is recommended that a working directory be created specifically for profiling and that copies of executables and source files be placed (or linked) in that directory. In the examples , the working directory is proftest.

In its simplest form, the tprof command is issued as follows:

tprof program

When used without arguments, the tprof usage statement is displayed. When using the -x Command flag, one of the -s, -k, or -p flag must be specified to get the specific profiling report. If none of these flags are specified, only the summary report is produced. Using the -t flag limits the report to information about the specified process, which is named in the Process_Id parameter.

The -i Trace_File flag allows for offline processing by tprof of trace data files created by the system trace command. The -n flag allows you to specify a Gennames_File to be used when procesing an offline file. These flags are useful when it is necessary to postprocess a trace file from a remote machine or perform the trace data collection at one time and postprocess it at another time. In this case -n with a Gennames_File must be used from the machine that the trace came from. The flags are also useful when system loading is high and trace hooks are being missed by tprof. The offline option relieves this problem. Trace hooks relevent to tprof must be collected by the trace command and are specified by the trace -j flag. The Gennames_File is then executed to collected additional information for tprof. Once the trace and Ggennames_File has executed trcrpt -r must be executed on the trace logfile and redirected to another file. At this point an adjusted trace logfile and a Gennames_File is feed into tprof.

Example:

trace -a -T 768000 -L 10000000 -o trace.out -j 000,001,002,003,005,006,234,106,
10C,134,139,00A,465

gennames > gennames.out
trcrpt -r   trace.out > trace.rpt

Then tprof with at least -i and -n flag:

tprof -i trace.rpt -n gennames.out -s -k -e

The trace hook numbers above are defined as follows:

HKWD_TRACE_SYNC      0x000
HKWD_TRACE_TRCON     0x001
HKWD_TRACE_TRCOFF    0x002
HKWD_TRACE_HEADER    0x003
HKWD_TRACE_LWRAP     0x005
HKWD_TRACE_TWRAP     0x006
HKWD_KERN_PROF       0x234
HKWD_KERN_DISPATCH   0x106
HKWD_KERN_IDLE       0x10C
HKWD_SYSC_EXECVE     0x134
HKWD_SYSC_FORK       0x139
HKWD_TRACE_UTIL      0x00A
HKWD_SYSC_CRTHREAD   0x465
Note: Review the /usr/lpp/perfagent/README.perfagent.tools file for the latest information on changes to the performance analysis tools.

Reports

A summary report with the suffix .all is always produced. If no program is specified, the report is named __prof.all. If the sample program is specified with the -p option, the report is named __sample.all. This report contains an estimate of the amount of CPU time spent in each process that was executing while the tprof program was monitoring the system. This report also contains an estimate of the amount of CPU time spent in each subprogram of the sample program. The summary report shows the amount of time the CPU was idle as well as the amount of time spent in the kernel. The tprof command reports CPU time in ticks, where 100 ticks equals 1 second.

Example 1 Report A shows an example summary report. The first section lists the CPU time estimates for each process. The second section of the report summarizes the first by combining the reports for multiple occurrences of each process into a single line.

There may be numerous processes active in the system which are not related to the user process being profiled. The number and type of processes observed vary from run to run.

A third section of the report is produced only if there are User ticks associated with the program being profiled. This section of the report notes the subprogram names and the number of ticks they collected. The source file from which the subprogram was compiled is also given as well as the length, in bytes, of the subprogram.

When microprofiling occurs, two more reports are produced only if there are User ticks associated with the program being microprofiled.The two reports are left as additional files in the working directory by the tprof command:

__t.routine_source
Profiled source-statement listing, one for each routine in the source file.
__h.source
Hottest source-statement listing, one for each source file.

Example 2 Reports A and B are the result of microprofiling an example program.

As previously stated, if microprofiling is desired, copies of the source files (usually suffixed with .c, .h, or .f) must be placed in the working directory. Additionally, executable files compiled with the -g flag (which are generated by the AIX C or FORTRAN compilers) must be placed in the working directory.

When microprofiling occurs, there is one __t.routine_source file for each routine in each source file compiled with the -g flag. The source line number and number of ticks associated with that source line are to the left of the microprofile text. Count totals for each routine are at the end of the function. Counts should not be associated with a comment line, data declaration, or macro definition.

The tprof command uses the following tools to create reports:

Shared Libraries

When you run executable modules built with shared libraries, the tprof command reports an estimate of the aggregate CPU time spent in shared code. When invoked with the -s flag, the tprof command resolves the time spent in shared libraries to the subprogram level. For example:

tprof -p cwhet -v -s -x cwhet

provides a report on CPU time spent in shared libraries, and resolves that time to the subprogram level for each shared library incurring ticks.

Note: The -v flag is only required if you intend to microprofile shared libraries.

Using the information gathered in this report, it is now possible to microprofile a single shared library subprogram to the source-statement level. If microprofiling of shared libraries is desired, a copy of the associated source file must be placed in the current working directory in which the tprof command is invoked. The object file (generated by the AIX C or FORTRAN compiler using the -g flag) must also be placed in the working directory. The following is an example:

cc -g -c malloc.c -lm

This example recompiles the malloc.c program and creates an object file (.o) that contains symbolic debug information used for microprofiling.

Multiple Runs to Obtain Greater Accuracy

The degree to which CPU activity can be resolved is determined by the number of samples captured and the degree to which "hot spots" dominate. While a program with a few "hot spots" can be profiled with relatively few samples, less-frequently executed sections of the program will not be visible in the profiling reports unless more samples are captured. In cases where user programs run less than a minute, there may be insufficient resolution to have a high degree of confidence in the estimates.

A simple solution is to repeatedly execute the user program or script until you achieve the degree of resolution you need. The longer a program is run, the finer the degree of resolution of the profile. If you doubt the accuracy of a profile, run the tprof command several times and compare the resulting profiles.

In order to profile a program or script that is executed multiple times using different sets of arguments, you must invoke the tprof command using a script that executes the desired program accordingly. The following is an example of a script called lin5 that executes the lin program five times:

lin; lin; lin; lin; lin   # execute program lin 5 times

The tprof command is then invoked with:

tprof -p lin -x lin5

Another example can be found in the script DOIT , which executes the lin program n times (where n is an argument to DOIT).

How tprof Capabilities Differ from prof and gprof

The most significant difference between these three commands is that the tprof command provides microprofiling, while the gprof and prof commands do not.

The prof and gprof commands are standard, supported profiling tools on many UNIX systems, including the AIX operating system. Both the prof and gprof commands provide subprogram profiling and exact counts of the number of times every subprogram is called. The gprof command also provides a very useful "call graph" showing the number of times each subprogram was called by a specific parent and the number of times each subprogram called a child. The tprof command provides neither subprogram call counts nor call graph information.

Like the tprof command, both the prof and gprof commands obtain their CPU consumption estimates for each subprogram by sampling the address counter of the user program at the rate of 100 times per second.

If only subprogram-level profiling is desired, the prof and gprof commands require users to relink their programs using a special linker flag. In order to get subprogram call counts, both the prof and gprof commands require users to both recompile and relink their programs by using the appropriate compile or link flag (-p or -pg, respectively). This may prove to be an impediment to users who do not have the ability to recompile or relink applications that need to be profiled. The tprof command requires recompilation (with the -g flag) only if source statement profiling is required. The tprof command will profile subprograms in any C or FORTRAN executable without recompilation.

The CPU times of user programs relinked for the prof command are the same as the run times of the unprofiled programs. The CPU times of user programs recompiled and relinked for the gprof command are increased by as little as a few percent to a factor of three or more. The CPU times of user programs profiled under the tprof command are the same as the run times of the unprofiled programs.

Also, the tprof command provides a CPU usage summary of all processes active during the execution of the profiled user program, which neither the prof nor gprof command provides.

Notes: CPU activity is sampled at a rate of 100 samples per second. This means that estimates of CPU consumption for each subprogram may not be sufficiently accurate for short-running programs. However, the accuracy is sufficient for programs and processes that run several minutes or more. The system trace facility is used by the tprof command. Only one user can be using the system trace facility at a time. Thus, only one tprof command can be active in the system at a time. The tprof command is a CPU-activity profiler. It does not profile other system resources, such as disks or memory.

Flags

-d
This flag is not needed to microprofile shared libraries. It has been retained for compatibility purposes.
-e
Profiles the kernel extension.
-i Trace_File
Input trace file for offline processing.
Note: The -i and -x flags may not be specified at the same time.
-k
Profiles the kernel.
-n Gennames_File
Specifies a Gennames_ File to be used inconjuction with the -i flag
Note: Do not specify a Gennames_File, with the -n file, without providing a Trace input file, with the -i file.
-p Program
Profiles the user program; also microprofiles the user program if that program is compiled with the -g flag.
Note: In the summary report with the suffix .all, you can find the complete list of programs running in the system while the tprof command was monitoring CPU utilization. The -p option can only succeed if the program name you specified exactly matches a program name that appears in the summary report. If the program you want profiled runs with a full or relative path, then you must enter that full or relative path with the -p flag.
-s
Profiles shared libraries.
-t Process_Id
Constrains reporting to the specified process and its threads.
-v
Specifies verbose mode, which creates files associated with microprofiling (source level profiling) of shared libraries. If microprofiling of shared libraries or kernel extensions is requested, the -v flag produces an error warning that no object file compiled with the -g flag is found.
-x Command
Allows the execution of an arbitrary Command. Subprograms of the program specified by the -p flag are profiled.
If the -x Command flag is omitted, the tprof command uses the __trc_rpt2 trace report file in the current directory to produce its output.
Note: If you do not enter a path with Command, then the -x flag searches your PATH environment variable to find the command since it uses the system() call to run Command. To guarantee that the correct command runs, always specify the path to the command.

If you do not enter a path with Command, and Command is the same as the Program to profile with the -p flag, you must specify Program without a path. This implies that Program is in the current directory.

If you enter a full or relative path with Command, and Command is the same as the Program to profile with the -p flag, then you must enter Program with the same full or relative path of Command.

Examples

1. Profiling a C Program

CAUTION: The following C program does nothing useful. It simply wastes CPU cycles. The program is used only for illustrative purposes.
/* Array Incrementer -- Version 1 */
#include <stdlib.h>
#define Asize 1024
#define RowDim InnerIndex
#define ColDim OuterIndex
main()
{
   int Increment;
   int OuterIndex;
   int InnerIndex;
   int big [Asize][Asize];
   /* Initialize every byte of the array to 0x01 */
   for (OuterIndex=0; OuterIndex<Asize; OuterIndex++)
   {
      for (InnerIndex=0; InnerIndex<Asize; InnerIndex++)
         big [RowDim][ColDim] = 0x01010101;
   }
   Increment = rand();
   /* Increment every element in the array */
   for (OuterIndex=0; OuterIndex<Asize; OuterIndex++)
   {
      for (InnerIndex=0; InnerIndex<Asize; InnerIndex++)
      {
         big [RowDim][ColDim] += Increment;
         if (big [RowDim][ColDim] <0) 
            printf("negative number. %d\n", big[rowdim][coldim]); 
      } 
   } 
   printf("version 1 check num: %d\n", big[rand()%asize][rand()%asize]); 
   return(0); 
} 

In this example, version1.c is the source file and version1 will be the program. The source program, version1.c, already exists in a directory called proftest.

  1. Go to the subdirectory proftest.
    cd proftest
  2. Compile version1.c.
    xlc -o version1 version1.c
  3. Profile and run the program version1.
    tprof version1
  4. Examine the profiling output report in the __version1.all file.

An example of the__version1.all file is found in Example 1 Report. Of course, the __version1.all file that you create is certain to be different from the example __version1.all reported shown here.

Example 1 Report A
         Process      PID      TID    Total   Kernel     User   Shared    Other
         =======      ===      ===    =====   ======     ====   ======    =====
        version1     6624     6633      663       41      622        0        0
       rpc.lockd    11002    11267       13       13        0        0        0
              sh     6624     6633        9        5        0        4        0
           tprof    11472    13017        6        6        0        0        0
             gil     1032     1291        3        3        0        0        0
           tprof    13790    16359        3        1        2        0        0
             gil     1032     1549        2        2        0        0        0
            wait      516      517        1        1        0        0        0
           tprof     6610     6619        1        1        0        0        0
           tprof     6624     6633        1        1        0        0        0
            llbd     8478     8743        1        0        1        0        0
         rlogind    14100    14109        1        1        0        0        0
           trace    14036    16349        1        1        0        0        0
         =======      ===      ===    =====   ======     ====   ======    =====
           Total                        705       76      625        4        0
         Process     FREQ    Total   Kernel     User   Shared    Other
         =======      ===    =====   ======     ====   ======    =====
        version1        1      663       41      622        0        0
       rpc.lockd        1       13       13        0        0        0
           tprof        4       11        9        2        0        0
              sh        1        9        5        0        4        0
             gil        2        5        5        0        0        0
            wait        1        1        1        0        0        0
            llbd        1        1        0        1        0        0
         rlogind        1        1        1        0        0        0
           trace        1        1        1        0        0        0
         =======      ===    =====   ======     ====   ======    =====
           Total       13      705       76      625        4        0
 
 
   Total Ticks For version1(    USER) =    622
 
           Subroutine   Ticks     %            Source   Address  Bytes
        =============  ======  ======         =======   =======  =====
                .main     622    88.2      version1.c  268436032    560

Process names and process identification (PID) numbers can be reused during normal AIX operation. When a process forks another process, the forked process inherits the process name of the original process. When a process execs another process, the execed process inherits the PID of the original process.

The first section of the report shows the number of ticks consumed by, or on behalf of, each process and its thread. The columns give tick counts for Kernel Code, User Code, and Shared Code. The name of the program is version1, so the version1 process (with process ID number 6624) is the process being profiled. This process/program collected a total of 663 ticks: 41 ticks for services performed for it in the kernel's address space, 622 ticks in the user's address space, and 0 ticks in the shared library space. Other is a catchall category that normally is 0.

The second section of the report summarizes the results by program, regardless of process ID. It shows the number (FREQ) of different processes that ran each program at some point.

The third section of the report shows the number of ticks used by each function in the executable and the percentage of the total run's CPU ticks that each function's ticks represent.

Note: If there are no ticks in the User column for the program being profiled, then the third section of the report is not produced.

Sampling occurs at the rate of 100 per second. Each tick corresponds to 1/100 sec or 10 msec. Thus, in Example 1 Report A , version1 is estimated to have consumed 6.63 seconds: 6.22 in user space and 0.41 seconds in kernel (system) space.

2. Microprofiling a C Program

The example of profiling contains summary information at the subprogram level. In this example, we provide CPU-usage profiling at the microprofiling level. Additional information must be provided to the tprof command in the working directory in order to make the correspondences between execution-time estimates and source statements:

Profile:

tprof version1        # invoke the profiler, telling
                      # it to execute and profile "version1"

The output file__version1.all should be nearly identical to that obtained from the previous steps. Of course, the report that summarizes all processes active during the execution of the version1 program will change. You will also find the __t.main_version1.c file, which contains profiled source for main in source, and the __h.version1.c file, which contains the "hottest" lines in the source.

Note: If there are no ticks in the User column in the __program.all file for the program being profiled, then the __t.routine_source and __h.source files are not produced.

Example 2 Reports A and B contain results from the microprofiling example shown above. Example 2 Report A shows the contents of the __t.main_version1.c file. Example 2 Report B shows the contents of the __h.version1.c file.

Example 2 Report A
Ticks Profile for main in version1.c
   Line   Ticks   Source
    15     29     for (OuterIndex=0; OuterIndex<Asize; OuterIndex++)
    16      -        {
    17     32           for (InnerIndex=0; InnerIndex<Asize; InnerIndex++)
    18    209              big [RowDim][ColDim] = 0x01010101;
    19      -        }
    20      -        Increment = rand();
    21      -     
    22      -        /* Increment every element in the array */
    23     52        for (OuterIndex=0; OuterIndex<Asize; OuterIndex++)
    24      -        {
    25      -           for (InnerIndex=0; InnerIndex<Asize; InnerIndex++)
    26      -           {
    27     55              big [RowDim][ColDim] += Increment;
    28     54              if (big [RowDim][ColDim] <0) 
    29    191                 printf("negative number.  %d\n", big[rowdim][coldi m]); 
    30                  } 
    31               } 
    32               printf("version 1 check num: %d\n", big[rand()%asize][rand( )%asize]); 
    33               return(0); 
    34            } 

   622 total ticks for main in version1.c 
Example 2 Report B
Hot Line Profile for version1.c
   Line   Ticks
    18    209
    29    191
    27     55
    28     54
    23     52
    17     32
    15     29

3. Profiling a FORTRAN Program with Multiple Source Files

This example uses two FORTRAN source files, linless.f and daxpy.f, which already exist in a directory called proftest.

  1. Go to the subdirectory proftest.
    cd proftest
  2. Compile lin2 as the program file using the two source files, linless.f and daxpy.f.
    xlf -o lin2 linless.f daxpy.f
  3. Profile and run the program lin2.
    tprof lin2
  4. Examine the profiling output report in the __lin2.all file.

4. Profiling a Shell Script

Sometimes you want to profile a script of sequential executions of programs and shell commands rather than a single program. In this example, a script, DOIT, runs a program n times, with both the program and n specified as arguments to the script. For example, to run the program lin program three times, enter:

DOIT lin 3 
In this example, the shell script, DOIT, and the FORTRAN source file called lin.f, already exists in a directory called proftest.
  1. Go to the subdirectory proftest.
    cd proftest
  2. Compile lin.f as the executable file lin unoptimized with the -g flag.
    xlf -g -o lin lin.f
  3. Profile program lin using -p flag, and specify the name of the script and any script arguments to run using the -x flag.
    tprof -p lin -x DOIT lin 3
    This provides statement-level profiling
  4. Examine the profiling output report in the __lin.all file.

Messages

If your system displays the following message:

/dev/systrace: device busy or trcon: TRCON:no such device

This means the trace facility is in use already. Stop your program and try again after typing trcstop, stops the trace.

Related Information

The cc command, gprof command, prof command, stripnm command.


[ Next Article | Previous Article | Book Contents | Library Home | Legal | Search ]