Introduction

This section describes performance guidelines for writing efficient applications using OpenCL 1.2 and SYCL for the R-Car IMP-X5+ CVengine.

Overview of R-Car IMP-X5+ CVengine

The CVengine for the R-Car V3H is composed of 5 clusters, each of which accommodates up to 32 processing threads. Similarly, the CVengine for the R-Car V3M has 2 clusters, each accommodating 32 processing threads. Each processing thread has 19 32-bit general-purpose registers and 2 KB of dedicated private memory, which is used for register spills, large arrays, or vectors. The local memory size for the CVengine is 448 KiB on R-Car V3M and 409.3 KiB on R-Car V3H.

Each cluster has 64 32-bit registers dedicated to constant variables. Any constant variable inside an OpenCL C kernel is mapped to these registers. For example, if an OpenCL C kernel contains const int x = 2; then x is mapped to one of the 64 32-bit registers.

However, an OpenCL constant buffer and any OpenCL variable annotated with __constant are mapped to DDR SDRAM. For example, __constant int x = 2; inside an OpenCL C kernel is mapped to DDR SDRAM.

An OpenCL global buffer is mapped to DDR SDRAM.
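The mapping rules above can be illustrated with a small OpenCL C kernel. This is only a minimal sketch: the kernel name `scale_and_offset` and the variables `offset` and `factor` are hypothetical, and the `#ifndef __OPENCL_VERSION__` shim (including `host_work_item_id`) is an assumption added purely so that the same source also compiles as plain C++ on a host. It is not part of the OpenCL or ComputeAorta APIs.

```cpp
#ifndef __OPENCL_VERSION__
// Host-only shim (illustrative assumption): lets this kernel source compile
// as ordinary C++. A real OpenCL C compiler provides all of these itself.
#include <stddef.h>
#define __kernel
#define __global
#define __constant const
static size_t host_work_item_id = 0;  // hypothetical stand-in for the work-item index
static size_t get_global_id(unsigned dim) {
  (void)dim;  // the sketch only models a 1D range
  return host_work_item_id;
}
#endif

// Variable annotated with __constant: mapped to DDR SDRAM, per the text above.
__constant int offset = 2;

// `in` and `out` are global buffers: also mapped to DDR SDRAM.
__kernel void scale_and_offset(__global const int *in, __global int *out) {
  // Plain const variable inside the kernel: per the text above, mapped to
  // one of the 64 per-cluster 32-bit constant registers.
  const int factor = 2;
  size_t i = get_global_id(0);
  out[i] = in[i] * factor + offset;
}
```

Under the mapping described above, `factor` would be served from the per-cluster constant registers, while every read of `offset`, `in`, and `out` would go to DDR SDRAM.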

There is no support for OpenCL image objects on the CVengine.

Each SYCL/OpenCL work-group maps to a cluster and each work-item maps to a processing thread. Figure 1 shows the architectural view of the logical OpenCL/SYCL concepts on the CVengine for the R-Car V3H and R-Car V3M, with the CVengine concepts displayed in bold. Note that the scratchpad memory in the CVengine is physically shared among all clusters. In our OpenCL implementation, however, it is divided into logically separate regions, one per OpenCL work-group, where region i is visible only to work-group i. Figure 1 therefore shows the scratchpad memory (OpenCL local memory) as separate per cluster (OpenCL work-group), although physically it is a single coherent memory unit shared among all clusters.

Figure 1: Architectural view of mapping SYCL/OpenCL concepts on CVengine
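Because each work-group runs on one cluster and each work-item on one processing thread, the cluster and thread counts bound how an ND-range is laid out. The following is a rough sketch of that arithmetic, using only the figures quoted in the text. The struct `CVengineInfo` and all function names are hypothetical, and `scheduling_rounds` additionally assumes that a cluster processes one work-group at a time, which the text does not state.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a CVengine variant, using figures from the text.
struct CVengineInfo {
  std::size_t clusters;             // one work-group runs on one cluster
  std::size_t threads_per_cluster;  // one work-item per processing thread
};

constexpr CVengineInfo kV3H{5, 32};
constexpr CVengineInfo kV3M{2, 32};

// Largest number of work-items that can be resident at the same time.
constexpr std::size_t max_resident_work_items(const CVengineInfo &d) {
  return d.clusters * d.threads_per_cluster;
}

// OpenCL 1.2 requires the global size to be a multiple of the local size,
// so round the problem size up to the next multiple.
inline std::size_t padded_global_size(std::size_t n, std::size_t local) {
  assert(local > 0);
  return (n + local - 1) / local * local;
}

// Number of work-groups launched for a padded 1D range.
inline std::size_t work_group_count(std::size_t n, std::size_t local) {
  return padded_global_size(n, local) / local;
}

// ASSUMPTION: each cluster processes one work-group at a time, so the
// work-groups are consumed in rounds of `clusters`.
inline std::size_t scheduling_rounds(std::size_t n, const CVengineInfo &d) {
  const std::size_t groups = work_group_count(n, d.threads_per_cluster);
  return (groups + d.clusters - 1) / d.clusters;
}
```

For a 1000-element range, this gives a padded global size of 1024 and 32 work-groups of 32 work-items, which under the stated assumption take 7 rounds on R-Car V3H and 16 on R-Car V3M. A kernel launched over a padded range needs a bounds check on the work-item index.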

CVengine information on R-Car V3M and R-Car V3H

The following table lists the device information that is relevant when optimizing SYCL and OpenCL code on the CVengine for both R-Car V3H and R-Car V3M. The OpenCL version is 1.2 and the ComputeAorta version is 1.22.0.

| Parameter | R-Car H3 | R-Car V3M | R-Car V3H |
|---|---|---|---|
| CL_DEVICE_TYPE | CL_DEVICE_TYPE_ACCELERATOR | CL_DEVICE_TYPE_ACCELERATOR | CL_DEVICE_TYPE_ACCELERATOR |
| CL_DEVICE_MAX_MEM_ALLOC_SIZE | 96 MiB | 110 MiB | 108 MiB |
| CL_DEVICE_GLOBAL_MEM_SIZE | 96 MiB | 110 MiB | 108 MiB |
| CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE | 64 KiB | 64 KiB | 64 KiB |
| CL_DEVICE_LOCAL_MEM_TYPE | CL_LOCAL | CL_LOCAL | CL_LOCAL |
| CL_DEVICE_LOCAL_MEM_SIZE | 448 KiB | 448 KiB | 409.3 KiB |
| CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE | 128 bytes | 128 bytes | 128 bytes |
| CL_DEVICE_GLOBAL_MEM_CACHE_SIZE | 32 KiB | 32 KiB | 32 KiB |
| CL_DEVICE_MAX_COMPUTE_UNITS | 2 | 2 | 5 |
| CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS | 3 | 3 | 3 |
| CL_DEVICE_MAX_WORK_ITEM_SIZES | | | |
| CL_DEVICE_MAX_WORK_GROUP_SIZE | 32 | 32 | 32 |
| CL_DEVICE_ADDRESS_BITS | 32 | 32 | 32 |
| CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR | 1 | 1 | 1 |
| CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT | 1 | 1 | 1 |
| CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT | 1 | 1 | 1 |
| CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG | 1 | 1 | 1 |
| CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT | 1 | 1 | 1 |
| CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE | 0 | 0 | 0 |
| CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF | 0 | 0 | 0 |
| CL_DEVICE_HOST_UNIFIED_MEMORY | CL_TRUE | CL_TRUE | CL_TRUE |
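
The limits in the table can be used to validate a launch configuration before it is submitted. The sketch below hard-codes the R-Car V3M column so that it runs without an OpenCL runtime; a real application would more likely obtain the same values through `clGetDeviceInfo`. The struct `DeviceLimits` and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Values copied from the R-Car V3M column of the table above.
// ASSUMPTION: hard-coded for illustration; normally queried at run time.
struct DeviceLimits {
  std::size_t max_mem_alloc_bytes;  // CL_DEVICE_MAX_MEM_ALLOC_SIZE
  std::size_t local_mem_bytes;      // CL_DEVICE_LOCAL_MEM_SIZE
  std::size_t max_work_group_size;  // CL_DEVICE_MAX_WORK_GROUP_SIZE
};

constexpr DeviceLimits kV3MLimits{110u * 1024u * 1024u, 448u * 1024u, 32u};

// True if a single global buffer of `bytes` is within the allocation limit.
constexpr bool fits_global_alloc(std::size_t bytes, const DeviceLimits &l) {
  return bytes <= l.max_mem_alloc_bytes;
}

// True if `elements` items of `elem_size` bytes fit in local memory.
// Written as a division so the size computation cannot overflow.
constexpr bool fits_local_mem(std::size_t elements, std::size_t elem_size,
                              const DeviceLimits &l) {
  return elem_size == 0 || elements <= l.local_mem_bytes / elem_size;
}

// True if `local_size` is a usable work-group size on this device.
constexpr bool valid_work_group_size(std::size_t local_size,
                                     const DeviceLimits &l) {
  return local_size > 0 && local_size <= l.max_work_group_size;
}
```

For example, with these limits a 128 KiB local array of 4-byte elements passes `fits_local_mem`, while a work-group size of 64 is rejected by `valid_work_group_size`.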