NCP-AII Exam Question 116

You are setting up a multi-node AI cluster with NVIDIA GPUs and InfiniBand for inter-node communication. You need to ensure the InfiniBand network is functioning optimally for GPU-accelerated workloads. What steps would you take to validate the InfiniBand installation and performance?
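
One part of such a validation, checking that every port reports an active link at the expected rate, can be sketched in Python. This is an illustrative sketch only, not an answer key: it parses text in the shape that `ibstat` prints, the sample values are made up, and the names `parse_ibstat`, `unhealthy_ports`, and `min_rate_gbps` are hypothetical. Bandwidth and latency would still need a separate measurement, for example with the perftest tools such as `ib_write_bw`.

```python
import re

# Hypothetical sample in the shape `ibstat` prints; values are illustrative only.
SAMPLE_IBSTAT = """\
CA 'mlx5_0'
    Number of ports: 1
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 200
        Link layer: InfiniBand
CA 'mlx5_1'
    Number of ports: 1
    Port 1:
        State: Down
        Physical state: Polling
        Rate: 10
        Link layer: InfiniBand
"""


def parse_ibstat(text):
    """Return one dict per port with its adapter name, port number, and fields."""
    ports = []
    ca = None
    port = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"CA '(.+)'$", line)
        if m:
            # New adapter section; port fields that follow belong to it.
            ca = m.group(1)
            port = None
            continue
        m = re.match(r"Port (\d+):$", line)
        if m:
            port = {"ca": ca, "port": int(m.group(1))}
            ports.append(port)
            continue
        if port is not None and ": " in line:
            # Store "Key: value" lines under a lower-cased key.
            key, value = line.split(": ", 1)
            port[key.lower()] = value
    return ports


def unhealthy_ports(ports, min_rate_gbps):
    """List (adapter, port) pairs that are not Active/LinkUp at the minimum rate."""
    bad = []
    for p in ports:
        ok = (
            p.get("state") == "Active"
            and p.get("physical state") == "LinkUp"
            and float(p.get("rate", 0)) >= min_rate_gbps
        )
        if not ok:
            bad.append((p["ca"], p["port"]))
    return bad


if __name__ == "__main__":
    ports = parse_ibstat(SAMPLE_IBSTAT)
    print(unhealthy_ports(ports, min_rate_gbps=200))
```

On a real node the sample string would be replaced by the captured output of `ibstat`, and the rate threshold would be set to whatever the fabric is expected to negotiate.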
  • NCP-AII Exam Question 117

    You are configuring an InfiniBand subnet with multiple switches. You need to ensure that traffic between two specific nodes always takes the shortest path, bypassing a potentially congested link. Which of the following approaches is MOST effective for achieving this using InfiniBand's routing capabilities?
  • NCP-AII Exam Question 118

    A data scientist reports slow data loading times when training a large language model. The data is stored in a Ceph cluster. You suspect the client-side caching is not properly configured. Which Ceph configuration parameter(s) should you investigate and potentially adjust to improve data loading performance? Select all that apply.
  • NCP-AII Exam Question 119

    You are setting up a high-performance computing cluster with several GPU-accelerated nodes, using Slurm as the resource manager. You want to ensure that jobs requesting GPUs are scheduled only on nodes with the appropriate NVIDIA drivers and CUDA toolkit installed. How can you achieve this within Slurm?
  • NCP-AII Exam Question 120

    You are deploying a multi-node AI training cluster using Kubernetes, with each node equipped with multiple NVIDIA GPUs. You want to ensure that the Kubernetes scheduler is aware of the GPU resources available on each node and can efficiently allocate GPU-enabled pods to the appropriate nodes. Besides installing the NVIDIA Container Toolkit, what other components are essential for enabling GPU-aware scheduling in Kubernetes?