The ROCm profiling infrastructure consists of two APIs, rocProfiler and rocTracer.
For those familiar with NVIDIA tools, rocProfiler is similar to Nsight Compute
and rocTracer is similar to Nsight Systems. Both tools are based on APIs that
can be embedded in an application or preloaded (using LD_PRELOAD or similar)
via the rocprof command-line tool. This document gives a short summary of these tools.
For more details consult the rocProfiler documentation
and the rocTracer documentation.
The following command line will run an application and create tracing information including kernel invocations, times, and device memory transfers:
rocprof --timestamp on --hip-trace -o <outname>.csv <program> <program arguments>
The command is run with tracing enabled, and the results are placed in several
files whose names begin with outname, or with a default prefix if -o is not
specified. outname can include an existing directory path, such as
trace-results/myprog.csv. In that case all the generated files begin with that
path prefix.
There are three .csv result files (the actual names are all prefixed as described above):
copy_stats.csv summarizes memory transfers to and from the device.
hip_stats.csv summarizes the HIP API calls.
stats.csv summarizes the kernel launches.
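The summary files are ordinary CSV, so they can be post-processed with a few lines of Python. The sketch below ranks kernels by total time. It assumes the kernel summary has columns named Name and TotalDurationNs; those column names are an assumption, so check the header line of your own file and pass different names if needed. The sample data is made up for illustration and is not real profiler output.

```python
import csv
import io


def top_kernels(csv_file, n=5, name_col="Name", total_col="TotalDurationNs"):
    """Return up to n (name, total_ns) pairs, largest total duration first.

    csv_file is an open text file (or any iterable of CSV lines).
    The default column names are assumptions about the file layout.
    """
    rows = csv.DictReader(csv_file)
    totals = [(row[name_col], int(row[total_col])) for row in rows]
    return sorted(totals, key=lambda item: item[1], reverse=True)[:n]


# Made-up sample in the assumed layout, used only to exercise the function.
sample = '''"Name","Calls","TotalDurationNs","AverageNs","Percentage"
"kernel_a",10,5000,500,25.0
"kernel_b",2,15000,7500,75.0
'''

for name, total_ns in top_kernels(io.StringIO(sample)):
    print(f"{name}: {total_ns} ns")
```

To use it on real output, open the prefixed stats.csv file with open(path, newline="") and pass the file object in place of the io.StringIO sample.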
There is also a .json file that can be used as input to the Chrome trace
viewer (accessed by visiting “chrome://tracing/” in a Chrome browser), and a
.db file that is a sqlite3 database.
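Because the .db file is a sqlite3 database, its contents can be explored without knowing the schema in advance. The following sketch lists every table and its column names using only Python's standard sqlite3 module. It makes no assumption about which tables the profiler writes; the function name table_columns is hypothetical.

```python
import sqlite3
from contextlib import closing


def table_columns(db_path):
    """Map each table name in the SQLite file at db_path to its column names."""
    with closing(sqlite3.connect(db_path)) as conn:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
            )
        ]
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        return {
            name: [col[1] for col in conn.execute(f'PRAGMA table_info("{name}")')]
            for name in names
        }
```

Calling table_columns on the generated .db file and printing the result gives a quick overview of what was recorded, which is a reasonable first step before writing SQL queries against it.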
The copy and kernel
.csv files lack some useful information: the number of
bytes transferred for the copy file, and the global and workgroup sizes for
the kernel file. The information is available in the
The time to execute an application is composed of host time, data transfer time, and GPU kernel execution time. Some of these different operations may overlap. rocTracer is an API library that intercepts runtime API calls and traces asynchronous activity. The data collected by rocTracer includes begin and end timestamps, so it is possible to tell when overlap occurs; this is one of the uses for the Chrome trace viewer.
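As a rough illustration of how begin and end timestamps reveal overlap, the sketch below computes the total time during which two groups of operations (for example kernels and memory copies) were active simultaneously. It is a generic interval calculation, not part of rocTracer; the function names and the sample timestamps are hypothetical, and extracting the (begin, end) pairs from the trace output is left to the reader.

```python
def merge(intervals):
    """Merge overlapping (begin, end) intervals into a sorted, disjoint list."""
    merged = []
    for begin, end in sorted(intervals):
        if merged and begin <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))
    return merged


def overlap_time(a, b):
    """Total time during which some interval in a and some interval in b are both active."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


# Made-up timestamps in arbitrary units, for illustration only.
kernels = [(0, 10), (20, 30)]
copies = [(5, 25)]
print(overlap_time(kernels, copies))  # prints 10
```

A result of zero means the two groups ran strictly one after the other; a larger value indicates that transfers were hidden behind computation.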
Profiling uses hardware events to measure various aspects of a kernel’s behavior. From the events, metrics can be derived that give insight into how well a kernel is using the hardware, and may suggest some improvements in the code. For more details consult the rocProfiler documentation.