tprof { Program | { -d | -e | -k | -p Program | -s | -t Process_Id | -v | { [ -i Trace_File ] [ -n Gennames_File ] | { -x Command } }
Notes:
- One of -s, -k, -e or -p flags need be specified to get profile.
- The -x command flag must be the last flag issued on command line.
- The -i and -x flags may not be specified at the same time.
- The -n flag should only be used with the -i flag.
The tprof command reports CPU usage for individual programs and the system as a whole. This command is a useful tool for anyone with a C or C++ or FORTRAN program that might be CPU-bound and who wants to know which sections of this program are most heavily using the CPU. The tprof command also reports the fraction of time the CPU is idle. These reports can be useful in determining CPU usage in a global sense.
Note: Only the root user and members of the security group should have execute (x) access to this command.
The tprof command specifies the user program to be profiled, executes the user program, and then produces a set of files containing reports. The user specifies the name of the program to be profiled, or alternatively, the name of the program to be profiled and a command line to be executed. Both the Program and Command variables must be executable.
In the AIX operating system, an interrupt occurs periodically to allow a "housekeeping" kernel routine to run. This housekeeping occurs 100 times per second. When the tprof command is invoked, the housekeeping kernel routine records the process ID and the address of the instruction executing when the interrupt occurred. With both the instruction address and process ID, the tprof analysis routines can charge CPU time to processes and threads, to subprograms, and even to source lines of programs. Charging CPU time to source program lines is called microprofiling.
Note: Microprofiling can only be performed on code compiled with the -g option. It is also necessary for the source code to reside in the working directory, that is, the directory in which the tprof command is invoked. The proftest working directory is used in the examples for this command.
For subprogram-level profiling, the tprof command can be run without modifying your executable program. You do not have to recompile with special compiler flags or linker options. This means that you can obtain a subprogram profile of any executable module that has already been built. However, as previously stated, recompilation is required to get a microprofile.
The files containing the reports are left in the working directory. All files created by the tprof command are prefixed by ___ (two underscores). Prefixing with underscores allows the tprof command user to easily identify all tprof-created files. In this text, it is assumed that all executables are kept in the working directory to simplify the explanation of the tool. (To keep executables in another directory, specify the full path name of the executable file to the tprof command.) It is recommended that a working directory be created specifically for profiling and that copies of executables and source files be placed (or linked) in that directory. In the examples , the working directory is proftest.
In its simplest form, the tprof command is issued as follows:
tprof program
When used without arguments, the tprof usage statement is displayed. When using the -x Command flag, one of the -s, -k, or -p flag must be specified to get the specific profiling report. If none of these flags are specified, only the summary report is produced. Using the -t flag limits the report to information about the specified process, which is named in the Process_Id parameter.
The -i Trace_File flag allows for offline processing by tprof of trace data files created by the system trace command. The -n flag allows you to specify a Gennames_File to be used when procesing an offline file. These flags are useful when it is necessary to postprocess a trace file from a remote machine or perform the trace data collection at one time and postprocess it at another time. In this case -n with a Gennames_File must be used from the machine that the trace came from. The flags are also useful when system loading is high and trace hooks are being missed by tprof. The offline option relieves this problem. Trace hooks relevent to tprof must be collected by the trace command and are specified by the trace -j flag. The Gennames_File is then executed to collected additional information for tprof. Once the trace and Ggennames_File has executed trcrpt -r must be executed on the trace logfile and redirected to another file. At this point an adjusted trace logfile and a Gennames_File is feed into tprof.
Example:
trace -a -T 768000 -L 10000000 -o trace.out -j 000,001,002,003,005,006,234,106, 10C,134,139,00A,465 gennames > gennames.out trcrpt -r trace.out > trace.rpt
Then tprof with at least -i and -n flag:
tprof -i trace.rpt -n gennames.out -s -k -e
The trace hook numbers above are defined as follows:
HKWD_TRACE_SYNC 0x000 HKWD_TRACE_TRCON 0x001 HKWD_TRACE_TRCOFF 0x002 HKWD_TRACE_HEADER 0x003 HKWD_TRACE_LWRAP 0x005 HKWD_TRACE_TWRAP 0x006 HKWD_KERN_PROF 0x234 HKWD_KERN_DISPATCH 0x106 HKWD_KERN_IDLE 0x10C HKWD_SYSC_EXECVE 0x134 HKWD_SYSC_FORK 0x139 HKWD_TRACE_UTIL 0x00A HKWD_SYSC_CRTHREAD 0x465
Note: Review the /usr/lpp/perfagent/README.perfagent.tools file for the latest information on changes to the performance analysis tools.
A summary report with the suffix .all is always produced. If no program is specified, the report is named __prof.all. If the sample program is specified with the -p option, the report is named __sample.all. This report contains an estimate of the amount of CPU time spent in each process that was executing while the tprof program was monitoring the system. This report also contains an estimate of the amount of CPU time spent in each subprogram of the sample program. The summary report shows the amount of time the CPU was idle as well as the amount of time spent in the kernel. The tprof command reports CPU time in ticks, where 100 ticks equals 1 second.
Example 1 Report A shows an example summary report. The first section lists the CPU time estimates for each process. The second section of the report summarizes the first by combining the reports for multiple occurrences of each process into a single line.
There may be numerous processes active in the system which are not related to the user process being profiled. The number and type of processes observed vary from run to run.
A third section of the report is produced only if there are User ticks associated with the program being profiled. This section of the report notes the subprogram names and the number of ticks they collected. The source file from which the subprogram was compiled is also given as well as the length, in bytes, of the subprogram.
When microprofiling occurs, two more reports are produced only if there are User ticks associated with the program being microprofiled.The two reports are left as additional files in the working directory by the tprof command:
Example 2 Reports A and B are the result of microprofiling an example program.
As previously stated, if microprofiling is desired, copies of the source files (usually suffixed with .c, .h, or .f) must be placed in the working directory. Additionally, executable files compiled with the -g flag (which are generated by the AIX C or FORTRAN compilers) must be placed in the working directory.
When microprofiling occurs, there is one __t.routine_source file for each routine in each source file compiled with the -g flag. The source line number and number of ticks associated with that source line are to the left of the microprofile text. Count totals for each routine are at the end of the function. Counts should not be associated with a comment line, data declaration, or macro definition.
The tprof command uses the following tools to create reports:
When you run executable modules built with shared libraries, the tprof command reports an estimate of the aggregate CPU time spent in shared code. When invoked with the -s flag, the tprof command resolves the time spent in shared libraries to the subprogram level. For example:
tprof -p cwhet -v -s -x cwhet
provides a report on CPU time spent in shared libraries, and resolves that time to the subprogram level for each shared library incurring ticks.
Note: The -v flag is only required if you intend to microprofile shared libraries.
Using the information gathered in this report, it is now possible to microprofile a single shared library subprogram to the source-statement level. If microprofiling of shared libraries is desired, a copy of the associated source file must be placed in the current working directory in which the tprof command is invoked. The object file (generated by the AIX C or FORTRAN compiler using the -g flag) must also be placed in the working directory. The following is an example:
cc -g -c malloc.c -lm
This example recompiles the malloc.c program and creates an object file (.o) that contains symbolic debug information used for microprofiling.
The degree to which CPU activity can be resolved is determined by the number of samples captured and the degree to which "hot spots" dominate. While a program with a few "hot spots" can be profiled with relatively few samples, less-frequently executed sections of the program will not be visible in the profiling reports unless more samples are captured. In cases where user programs run less than a minute, there may be insufficient resolution to have a high degree of confidence in the estimates.
A simple solution is to repeatedly execute the user program or script until you achieve the degree of resolution you need. The longer a program is run, the finer the degree of resolution of the profile. If you doubt the accuracy of a profile, run the tprof command several times and compare the resulting profiles.
In order to profile a program or script that is executed multiple times using different sets of arguments, you must invoke the tprof command using a script that executes the desired program accordingly. The following is an example of a script called lin5 that executes the lin program five times:
lin; lin; lin; lin; lin # execute program lin 5 times
The tprof command is then invoked with:
tprof -p lin -x lin5
Another example can be found in the script DOIT , which executes the lin program n times (where n is an argument to DOIT).
The most significant difference between these three commands is that the tprof command provides microprofiling, while the gprof and prof commands do not.
The prof and gprof commands are standard, supported profiling tools on many UNIX systems, including the AIX operating system. Both the prof and gprof commands provide subprogram profiling and exact counts of the number of times every subprogram is called. The gprof command also provides a very useful "call graph" showing the number of times each subprogram was called by a specific parent and the number of times each subprogram called a child. The tprof command provides neither subprogram call counts nor call graph information.
Like the tprof command, both the prof and gprof commands obtain their CPU consumption estimates for each subprogram by sampling the address counter of the user program at the rate of 100 times per second.
If only subprogram-level profiling is desired, the prof and gprof commands require users to relink their programs using a special linker flag. In order to get subprogram call counts, both the prof and gprof commands require users to both recompile and relink their programs by using the appropriate compile or link flag (-p or -pg, respectively). This may prove to be an impediment to users who do not have the ability to recompile or relink applications that need to be profiled. The tprof command requires recompilation (with the -g flag) only if source statement profiling is required. The tprof command will profile subprograms in any C or FORTRAN executable without recompilation.
The CPU times of user programs relinked for the prof command are the same as the run times of the unprofiled programs. The CPU times of user programs recompiled and relinked for the gprof command are increased by as little as a few percent to a factor of three or more. The CPU times of user programs profiled under the tprof command are the same as the run times of the unprofiled programs.
Also, the tprof command provides a CPU usage summary of all processes active during the execution of the profiled user program, which neither the prof nor gprof command provides.
Notes: CPU activity is sampled at a rate of 100 samples per second. This means that estimates of CPU consumption for each subprogram may not be sufficiently accurate for short-running programs. However, the accuracy is sufficient for programs and processes that run several minutes or more. The system trace facility is used by the tprof command. Only one user can be using the system trace facility at a time. Thus, only one tprof command can be active in the system at a time. The tprof command is a CPU-activity profiler. It does not profile other system resources, such as disks or memory.
Note: The -i and -x flags may not be specified at the same time.
Note: Do not specify a Gennames_File, with the -n file, without providing a Trace input file, with the -i file.
Note: In the summary report with the suffix .all, you can find the complete list of programs running in the system while the tprof command was monitoring CPU utilization. The -p option can only succeed if the program name you specified exactly matches a program name that appears in the summary report. If the program you want profiled runs with a full or relative path, then you must enter that full or relative path with the -p flag.
Note: If you do not enter a path with Command, then the -x flag searches your PATH environment variable to find the command since it uses the system() call to run Command. To guarantee that the correct command runs, always specify the path to the command.
If you do not enter a path with Command, and Command is the same as the Program to profile with the -p flag, you must specify Program without a path. This implies that Program is in the current directory.
If you enter a full or relative path with Command, and Command is the same as the Program to profile with the -p flag, then you must enter Program with the same full or relative path of Command.
CAUTION: The following C program does nothing useful. It simply wastes CPU cycles. The program is used only for illustrative purposes.
/* Array Incrementer -- Version 1 */ #include <stdlib.h> #define Asize 1024 #define RowDim InnerIndex #define ColDim OuterIndex main() { int Increment; int OuterIndex; int InnerIndex; int big [Asize][Asize]; /* Initialize every byte of the array to 0x01 */ for (OuterIndex=0; OuterIndex<Asize; OuterIndex++) { for (InnerIndex=0; InnerIndex<Asize; InnerIndex++) big [RowDim][ColDim] = 0x01010101; } Increment = rand(); /* Increment every element in the array */ for (OuterIndex=0; OuterIndex<Asize; OuterIndex++) { for (InnerIndex=0; InnerIndex<Asize; InnerIndex++) { big [RowDim][ColDim] += Increment; if (big [RowDim][ColDim] <0) printf("negative number. %d\n", big[rowdim][coldim]); } } printf("version 1 check num: %d\n", big[rand()%asize][rand()%asize]); return(0); }
In this example, version1.c is the source file and version1 will be the program. The source program, version1.c, already exists in a directory called proftest.
cd proftest
xlc -o version1 version1.c
tprof version1
An example of the__version1.all file is found in Example 1 Report. Of course, the __version1.all file that you create is certain to be different from the example __version1.all reported shown here.
Process PID TID Total Kernel User Shared Other ======= === === ===== ====== ==== ====== ===== version1 6624 6633 663 41 622 0 0 rpc.lockd 11002 11267 13 13 0 0 0 sh 6624 6633 9 5 0 4 0 tprof 11472 13017 6 6 0 0 0 gil 1032 1291 3 3 0 0 0 tprof 13790 16359 3 1 2 0 0 gil 1032 1549 2 2 0 0 0 wait 516 517 1 1 0 0 0 tprof 6610 6619 1 1 0 0 0 tprof 6624 6633 1 1 0 0 0 llbd 8478 8743 1 0 1 0 0 rlogind 14100 14109 1 1 0 0 0 trace 14036 16349 1 1 0 0 0 ======= === === ===== ====== ==== ====== ===== Total 705 76 625 4 0 Process FREQ Total Kernel User Shared Other ======= === ===== ====== ==== ====== ===== version1 1 663 41 622 0 0 rpc.lockd 1 13 13 0 0 0 tprof 4 11 9 2 0 0 sh 1 9 5 0 4 0 gil 2 5 5 0 0 0 wait 1 1 1 0 0 0 llbd 1 1 0 1 0 0 rlogind 1 1 1 0 0 0 trace 1 1 1 0 0 0 ======= === ===== ====== ==== ====== ===== Total 13 705 76 625 4 0 Total Ticks For version1( USER) = 622 Subroutine Ticks % Source Address Bytes ============= ====== ====== ======= ======= ===== .main 622 88.2 version1.c 268436032 560
Process names and process identification (PID) numbers can be reused during normal AIX operation. When a process forks another process, the forked process inherits the process name of the original process. When a process execs another process, the execed process inherits the PID of the original process.
The first section of the report shows the number of ticks consumed by, or on behalf of, each process and its thread. The columns give tick counts for Kernel Code, User Code, and Shared Code. The name of the program is version1, so the version1 process (with process ID number 6624) is the process being profiled. This process/program collected a total of 663 ticks: 41 ticks for services performed for it in the kernel's address space, 622 ticks in the user's address space, and 0 ticks in the shared library space. Other is a catchall category that normally is 0.
The second section of the report summarizes the results by program, regardless of process ID. It shows the number (FREQ) of different processes that ran each program at some point.
The third section of the report shows the number of ticks used by each function in the executable and the percentage of the total run's CPU ticks that each function's ticks represent.
Note: If there are no ticks in the User column for the program being profiled, then the third section of the report is not produced.
Sampling occurs at the rate of 100 per second. Each tick corresponds to 1/100 sec or 10 msec. Thus, in Example 1 Report A , version1 is estimated to have consumed 6.63 seconds: 6.22 in user space and 0.41 seconds in kernel (system) space.
The example of profiling contains summary information at the subprogram level. In this example, we provide CPU-usage profiling at the microprofiling level. Additional information must be provided to the tprof command in the working directory in order to make the correspondences between execution-time estimates and source statements:
tprof version1 # invoke the profiler, telling # it to execute and profile "version1"
The output file__version1.all should be nearly identical to that obtained from the previous steps. Of course, the report that summarizes all processes active during the execution of the version1 program will change. You will also find the __t.main_version1.c file, which contains profiled source for main in source, and the __h.version1.c file, which contains the "hottest" lines in the source.
Note: If there are no ticks in the User column in the __program.all file for the program being profiled, then the __t.routine_source and __h.source files are not produced.
Example 2 Reports A and B contain results from the microprofiling example shown above. Example 2 Report A shows the contents of the __t.main_version1.c file. Example 2 Report B shows the contents of the __h.version1.c file.
Ticks Profile for main in version1.c Line Ticks Source 15 29 for (OuterIndex=0; OuterIndex<Asize; OuterIndex++) 16 - { 17 32 for (InnerIndex=0; InnerIndex<Asize; InnerIndex++) 18 209 big [RowDim][ColDim] = 0x01010101; 19 - } 20 - Increment = rand(); 21 - 22 - /* Increment every element in the array */ 23 52 for (OuterIndex=0; OuterIndex<Asize; OuterIndex++) 24 - { 25 - for (InnerIndex=0; InnerIndex<Asize; InnerIndex++) 26 - { 27 55 big [RowDim][ColDim] += Increment; 28 54 if (big [RowDim][ColDim] <0) 29 191 printf("negative number. %d\n", big[rowdim][coldi m]); 30 } 31 } 32 printf("version 1 check num: %d\n", big[rand()%asize][rand( )%asize]); 33 return(0); 34 } 622 total ticks for main in version1.c
Hot Line Profile for version1.c Line Ticks 18 209 29 191 27 55 28 54 23 52 17 32 15 29
This example uses two FORTRAN source files, linless.f and daxpy.f, which already exist in a directory called proftest.
cd proftest
xlf -o lin2 linless.f daxpy.f
tprof lin2
Sometimes you want to profile a script of sequential executions of programs and shell commands rather than a single program. In this example, a script, DOIT, runs a program n times, with both the program and n specified as arguments to the script. For example, to run the program lin program three times, enter:
DOIT lin 3 In this example, the shell script, DOIT, and the FORTRAN source file called lin.f, already exists in a directory called proftest.
cd proftest
xlf -g -o lin lin.f
tprof -p lin -x DOIT lin 3This provides statement-level profiling
If your system displays the following message:
/dev/systrace: device busy or trcon: TRCON:no such device
This means the trace facility is in use already. Stop your program and try again after typing trcstop, stops the trace.
The cc command, gprof command, prof command, stripnm command.