

Tensor Cores are specialized hardware units added to recent NVIDIA GPUs to speed up matrix multiplication-related tasks, such as convolutions and densely connected layers in neural networks. Due to their specific hardware implementation and programming model, Tensor Cores cannot be straightforwardly applied to applications outside machine learning. In this paper, we demonstrate the feasibility of using NVIDIA Tensor Cores to accelerate a non-machine learning application: iterative Computed Tomography (CT) reconstruction. For large CT images and real-time CT scanning, the reconstruction time of many existing iterative reconstruction methods is relatively high, ranging from seconds to minutes depending on the image size. Therefore, CT reconstruction is an application area that could potentially benefit from Tensor Core hardware acceleration. We first studied the reconstruction algorithm's performance as a function of the hardware-related parameters and proposed an approach to accelerate reconstruction on Tensor Cores. The results show that the proposed method provides about a 5\(\times \) increase in speed and energy saving on the NVIDIA RTX 2080 Ti GPU for the parallel projection of 32 images of size \(512\times 512\). The relative reconstruction error due to the mixed-precision computations was almost equal to the error of single-precision (32-bit) floating-point computations. We then presented an approach for real-time and memory-limited applications that exploits the symmetry of the system (i.e., the acquisition geometry). As the proposed approach is based on the conjugate gradient method, it can be generalized to many other research and industrial applications.
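For context, iterative reconstruction of this kind is commonly posed as a linear least-squares problem; the sketch below uses generic symbols (\(W\) for the system matrix, \(p\) for the measured projection data, \(x\) for the image), which are illustrative conventions rather than notation taken from this paper:
\[
\hat{x} = \operatorname*{arg\,min}_{x} \, \|Wx - p\|_2^2
\quad\Longleftrightarrow\quad
W^{\mathsf{T}}W\,x = W^{\mathsf{T}}p .
\]
The conjugate gradient method solves these normal equations iteratively, and the dominant cost per iteration, the products with \(W\) and \(W^{\mathsf{T}}\), is exactly the kind of dense matrix multiply-accumulate work that Tensor Cores accelerate. One common way to quantify the accuracy of a mixed-precision result is the relative reconstruction error
\[
\varepsilon = \frac{\|x_{\mathrm{mixed}} - x_{\mathrm{ref}}\|_2}{\|x_{\mathrm{ref}}\|_2},
\]
where \(x_{\mathrm{ref}}\) is a higher-precision reference reconstruction (again, an assumed definition, not one quoted from this paper).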


Graphics Processing Units (GPUs), as one of the most widely adopted parallel processor architectures, have proven their power in facilitating research in a wide range of fields, including high-performance computing, data centers, medical imaging, and machine learning. Among these, GPU-based machine learning applications, and more specifically deep learning, have grown significantly in recent years. To address this need, NVIDIA introduced a specialized computing unit called the Tensor Core, which speeds up neural network training and inference in deep learning by offering enormous acceleration of matrix computations. Tensor Core-powered GPUs can offer more than a hundred TFLOPS of performance. Although the Tensor Core's tremendous performance is tempting to exploit, its applicability is more restricted than that of general CUDA cores. First, Tensor Cores only perform a specific form of matrix multiply-accumulate operation. Second, the precision of Tensor Core floating-point operations is limited to half-precision.
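To illustrate these two restrictions concretely, the following minimal CUDA kernel uses the WMMA API to have one warp multiply a single \(16\times 16\) tile with half-precision inputs and single-precision accumulation; it is a generic sketch of the Tensor Core programming model, not code from the proposed method:

\begin{verbatim}
// Minimal Tensor Core example (compile with: nvcc -arch=sm_70 ...).
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes D = A * B + C for a single 16x16 tile.
// Inputs are half precision; accumulation is single precision.
__global__ void wmma_tile_16x16x16(const half *a, const half *b, float *d)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;

    wmma::fill_fragment(acc_frag, 0.0f);            // C = 0
    wmma::load_matrix_sync(a_frag, a, 16);          // leading dimension 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);  // Tensor Core MMA
    wmma::store_matrix_sync(d, acc_frag, 16, wmma::mem_row_major);
}

// Launch with a single warp: wmma_tile_16x16x16<<<1, 32>>>(a, b, d);
\end{verbatim}

Note that both the operation shape (here \(16\times 16\times 16\)) and the input precision are fixed by the hardware, which is precisely why an arbitrary algorithm cannot use Tensor Cores without first being reformulated as tiled matrix multiply-accumulate operations.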
