The ROCm profiling infrastructure consists of two APIs: rocProfiler and rocTracer. For those familiar with NVIDIA tools, rocProfiler is similar to Nsight Compute and rocTracer is similar to Nsight Systems. Both tools are based on APIs that can be embedded in an application or preloaded (using LD_PRELOAD or similar) by the rocprof command-line tool. This document gives a short summary of these tools. For more details, consult the rocProfiler documentation and the rocTracer documentation.
Tracing
The following command line will run an application and create tracing information including kernel invocations, times, and device memory transfers:
rocprof --timestamp on --hip-trace -o <outname>.csv <program> <program arguments>
The specified command line is run with tracing, and the results are placed in several files whose names begin with outname, or with the string results if -o is not specified. outname can be an existing pathname, such as trace-results/myprog.csv. In such a case, all the generated files will begin with myprog.
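The naming rule above can be sketched in Python as a way to collect the generated files after a run. This is a minimal sketch and not part of rocprof: the helper name list_trace_outputs is hypothetical, and it assumes the prefix is the file name passed to -o with its .csv extension removed.

```python
from pathlib import Path


def list_trace_outputs(outname="results.csv"):
    """Return the names of files next to `outname` that begin with its stem.

    Assumption: the prefix is the -o file name without its extension,
    for example "myprog" for trace-results/myprog.csv.
    """
    target = Path(outname)
    prefix = target.stem          # "myprog" for "trace-results/myprog.csv"
    directory = target.parent     # "trace-results", or "." if no directory given
    return sorted(
        p.name
        for p in directory.iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )
```

For example, list_trace_outputs("trace-results/myprog.csv") would return every file in trace-results whose name starts with myprog.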
There are three .csv result files (the actual names are all prefixed as described above):
copy_stats.csv summarizes memory transfers to and from the device.
hip_stats.csv summarizes the HIP API calls.
stats.csv summarizes the kernel launches.
There is also a .json file that can be used as input to the Chrome trace viewer (accessed by visiting “chrome://tracing/” in a Chrome browser), and a .db file that is a sqlite3 database.
The copy and kernel .csv files lack some useful information: the number of bytes transferred for the copy file, and the global and workgroup sizes for the kernel file. This information is available in the .db file.
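Because the table and column names inside the .db file are not described here, a reasonable first step is to list them before writing any query. The following is a minimal sketch using Python's standard sqlite3 module; the function name describe_database is hypothetical, and nothing is assumed about the schema that rocprof writes.

```python
import sqlite3


def describe_database(db_path):
    """Map each table name in a SQLite file to its list of column names."""
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalog of tables and indexes.
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # PRAGMA table_info yields one row per column; index 1 is the name.
            columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            schema[table] = [column[1] for column in columns]
        return schema
    finally:
        conn.close()
```

Printing the result for the generated .db file shows which tables hold the copy and kernel records, and which columns carry the byte counts and the global and workgroup sizes.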
Time accounting
The time to execute an application is composed of host time, data transfer time, and GPU kernel execution time. Some of these different operations may overlap. rocTracer is an API library that intercepts runtime API calls and traces asynchronous activity. The data collected by rocTracer includes begin and end timestamps, so it is possible to tell when overlap occurs; this is one of the uses for the Chrome trace viewer.
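As an illustration of how begin and end timestamps reveal overlap, the sketch below computes how long two kinds of activity (for example kernels and memory copies) were in progress at the same time. It is a minimal sketch on hypothetical (begin, end) pairs; the function names are illustrative, and extracting the timestamps from the trace output is left out.

```python
def busy_time(intervals):
    """Total time covered by at least one (begin, end) interval."""
    total = 0
    current_begin = current_end = None
    for begin, end in sorted(intervals):
        if current_end is None or begin > current_end:
            # Gap before this interval: close the previous merged run.
            if current_end is not None:
                total += current_end - current_begin
            current_begin, current_end = begin, end
        else:
            # Interval touches or overlaps the current run: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_begin
    return total


def overlap_time(a, b):
    """Time during which activity from both interval lists is in progress."""
    # Inclusion-exclusion: intersection = busy(a) + busy(b) - busy(a and b merged).
    return busy_time(a) + busy_time(b) - busy_time(list(a) + list(b))


# Hypothetical timestamps, in arbitrary time units.
kernels = [(0, 10), (20, 30)]
copies = [(5, 25)]
print(overlap_time(kernels, copies))  # 10
```

Here the copy overlaps the first kernel for 5 units and the second for 5 units, so 10 of the 30 units of wall time are spent with both a kernel and a copy in flight.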
Profiling
Profiling uses hardware events to measure various aspects of a kernel’s behavior. From the events, metrics can be derived that give insight into how well a kernel is using the hardware, and may suggest some improvements in the code. For more details consult the rocProfiler documentation.