NCP-AII Exam Question 26
You are configuring a Mellanox InfiniBand network for a DGX A100 cluster. What is the RECOMMENDED subnet manager for a large, high-performance AI training environment, and why?
NCP-AII Exam Question 27
You are installing eight NVIDIA A100 GPUs in a server designed for maximum performance. The server supports NVLink. Which of the following actions will BEST improve the inter-GPU communication bandwidth?
NCP-AII Exam Question 28
You are using the NVIDIA Container Toolkit in a Kubernetes environment with multiple GPUs per node. You want to ensure that pods can request specific GPUs on a node, rather than simply requesting 'any' GPU. Which Kubernetes feature, in conjunction with the NVIDIA Device Plugin, allows you to achieve this fine-grained GPU resource allocation?
NCP-AII Exam Question 29
You have a large dataset stored on a BeeGFS file system. The training job runs on a single node and uses data augmentation to generate more data on the fly. The data augmentation process is CPU-bound, and you notice that the GPU is underutilized because training data is not being fed to it fast enough. How can you reduce the load on the CPU and improve the overall training throughput?
NCP-AII Exam Question 30
You are troubleshooting an issue where a Docker container utilizing NVIDIA GPUs intermittently fails with a 'CUDA_ERROR_OUT_OF_MEMORY' error. The host system has sufficient memory, and the individual GPU has enough memory as well. You suspect that the problem might be related to how memory is being allocated within the container environment. What steps can you take to investigate and potentially mitigate this issue?
