Nvidia docker runtime: Failed to create task for container
I installed the Nvidia container toolkit so I can use GPU acceleration in Docker containers. However, I can't get the containers to work:
$ sudo docker run --runtime=nvidia nvidia/cuda:12.2.0-base-ubuntu20.04 nvidia-smi
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/moby/8858057455bac72bc63f4346a3b03f2f63eda68762f7a7c2b5cf135fb23819c6/log.json: no such file or directory): nvidia-container-runtime did not terminate successfully: exit status 127: unknown.
ERRO[0000] error waiting for container:
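(For context on that error: exit status 127 is the shell/loader convention for "command not found or failed to load", which is consistent with the shared-library failure shown further down. The convention can be reproduced with a deliberately made-up command name:)

```shell
# Exit status 127 conventionally means the command could not be found
# or failed to load; demonstrate with a nonexistent command:
sh -c 'this-command-does-not-exist' 2>/dev/null
echo "exit status: $?"   # prints: exit status: 127
```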
$ sudo docker run --gpus all nvidia/cuda:12.1.1-runtime-ubuntu22.04 nvidia-smi
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: driver rpc error: failed to process request: unknown.
ERRO[0001] error waiting for container:
$ nvidia-container-runtime
nvidia-container-runtime: error while loading shared libraries: unexpected PLT reloc type 0x00
What's the problem?
Relevant information (will expand based on comments):
- Arch Linux with kernel 6.1.52-1-lts (neofetch)
- GeForce GTX 1060 3GB with driver nvidia 535.104.05 (inxi -Fz)
- Packages installed: libnvidia-container libnvidia-container-tools nvidia nvidia-container-runtime nvidia-container-toolkit nvidia-docker nvidia-lts nvidia-utils (pacman -Q | rg nvi)
- nouveau packages not installed (pacman -Q | rg nouv)
- Regular (non-GPU) Docker containers work
- Kernel modules: nvidia_drm nvidia_modeset nvidia_uvm nvidia video (sudo lsmod | rg nvi)
- nvidia-smi by itself works and prints an ASCII table with some reasonable stats about my video card
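(For reference, when the toolkit is configured for Docker with sudo nvidia-ctk runtime configure --runtime=docker, /etc/docker/daemon.json is expected to contain a runtime registration roughly like the following; this is a sketch of the expected shape, not a verbatim copy of my file:)

```json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "args": []
        }
    }
}
```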
1 answer
I found a partial fix. You have to run:
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
After this, Podman is able to run GPU containers:
sudo podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 1060 3GB (UUID: GPU-...)
However, Docker still gets the same error. Unfortunately, I need this to run in Docker specifically, not podman, so the question is still open.
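(One possible avenue, untested on my setup: Docker Engine 25.0 and later can reportedly consume the same generated CDI spec, behind a feature flag in /etc/docker/daemon.json. The fragment below is a hedged sketch of that flag, to be merged with any existing daemon.json contents:)

```json
{
    "features": {
        "cdi": true
    }
}
```

With the flag enabled and the daemon restarted, the Podman-style device syntax (docker run --device nvidia.com/gpu=all ...) should work without going through nvidia-container-runtime at all.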