Reading Assignment 4
Introduction to CUDA Programming
Write your answers in a PDF and upload the document on Gradescope for submission. The due date is given on Gradescope. Each question is worth 10 points.
Please watch videos 19 through 24 and review the slides before answering these questions.
- Describe three features that differentiate CPU from GPU processors.
- What is the double precision performance of a Quadro RTX 6000 compared to its single precision performance?
- Assume you launch a CUDA kernel from the CPU code. When the function call returns on the CPU, does it mean that the CUDA kernel execution has completed on the GPU?
- What is an NVIDIA tensor core?
- How many SMs are required to run a CUDA thread block? Does the answer depend on the number of threads in the block?
- Explain the difference between `sbatch` and `srun` in SLURM.
- What is the SLURM command to cancel a job?
- Explain the meaning of the keywords `__global__` and `__device__` in CUDA.
- Explain what the following built-in CUDA variables are: `threadIdx`, `blockDim`, `blockIdx`.
- Starter code. Read the program `firstProgram.cu`. Then fill in the TODOs in `R4.cu` (contained in the zip file) so that you compute an array of type `float` with entries `out[i] = 1. / i;` Please also read `addMatrices.cu`, where you will find useful examples. The size of the array should be 100,000. Each CUDA thread should compute a single entry `out[i]`. The number of threads in a CUDA block should be 512.
- Explain the difference between a virtual architecture and a real architecture in `nvcc`.
- What are the recommended `nvcc` options to compile CUDA code on `icme-gpu`?
- Explain what the shorthand option `--gpu-architecture=sm_75` does during the compilation process using `nvcc`.