Employment of Graphics Processing Units

A graphics processing unit (GPU) is a processor designed to quickly render images on a screen. It excels at performing fast vector and matrix computations, which are commonly used in computer graphics. While not ideal for solving general problems, GPUs are proving useful in domains like machine learning and cryptocurrency mining due to their ability to perform certain operations rapidly. Modern GPUs also offer built-in support for working with specific types of video. We will explore how to use GPUs to speed up video conversion.

The program ffmpeg can utilize GPUs from various vendors using standard interfaces. The two most commonly used interfaces are OpenCL and CUDA (which is exclusive to Nvidia GPUs). These interfaces are also available in the NSC cluster.

Setting Up Containers

By default, ffmpeg is built without GPU support; it has to be compiled with special options to enable it. Instead of compiling it ourselves, we can use an existing Docker container that already includes a suitable build of ffmpeg. To convert it to an Apptainer (formerly Singularity) container, use the following command.

$ apptainer pull docker://jrottenberg/ffmpeg:4.0.6-nvidia1804

This creates a container image named ffmpeg_4.0.6-nvidia1804.sif in the current directory. To run programs from this container, use the apptainer exec command as usual.
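The image file name follows from the Docker URI: the last path component and the tag, joined with an underscore. As a minimal sketch of that naming rule, the hypothetical helper sif_name below (not part of Apptainer) derives the expected file name; it assumes the tag defaults to latest when none is given.

```shell
# sif_name URI: print the file name that `apptainer pull` is assumed to use
# by default for a docker:// URI, i.e. <image name>_<tag>.sif.
# Hypothetical helper for illustration; the body runs in a subshell so its
# variables do not leak into the caller.
sif_name() (
    ref=${1#docker://}      # strip the transport prefix
    name=${ref##*/}         # keep only the last path component
    tag=latest              # assumed default when no tag is given
    case $name in
        *:*)
            tag=${name##*:}     # text after the colon is the tag
            name=${name%%:*}    # text before the colon is the image name
            ;;
    esac
    printf '%s_%s.sif\n' "$name" "$tag"
)

sif_name docker://jrottenberg/ffmpeg:4.0.6-nvidia1804
# prints: ffmpeg_4.0.6-nvidia1804.sif
```

A script could use the result to skip the download when the image already exists, for example by testing for the file with [ -f ... ] before calling apptainer pull.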

$ apptainer exec ffmpeg_4.0.6-nvidia1804.sif ffmpeg -version
ffmpeg version 4.0.6 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
configuration: … --enable-nonfree --enable-nvenc --enable-cuda --enable-cuvid …

The configuration line in the output shows the build options that enable support for GPU (CUDA) and related technologies. The hardware acceleration methods supported by a particular build of ffmpeg can be listed with the -hwaccels argument:

$ apptainer exec ffmpeg_4.0.6-nvidia1804.sif ffmpeg -hwaccels
ffmpeg version 4.0.6 Copyright (c) 2000-2020 the FFmpeg developers

Hardware acceleration methods:
cuda
cuvid
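In a script, it can be handy to check for a method before submitting a job. The following is a minimal sketch, assuming the list has the format shown above; has_hwaccel is a hypothetical helper, and the sample text is copied from the listing so that the sketch runs without a container.

```shell
# has_hwaccel METHOD: read `ffmpeg -hwaccels` output on stdin and succeed
# only if METHOD appears as a whole line after the header line.
has_hwaccel() {
    awk -v m="$1" '
        seen_header && $0 == m { ok = 1 }
        /^Hardware acceleration methods:/ { seen_header = 1 }
        END { exit !ok }
    '
}

# Sample output taken from the listing above.
sample='Hardware acceleration methods:
cuda
cuvid'

if printf '%s\n' "$sample" | has_hwaccel cuda; then
    echo "cuda is available"
fi
# prints: cuda is available
```

With the real container, the same check would read the output of apptainer exec ffmpeg_4.0.6-nvidia1804.sif ffmpeg -hwaccels through a pipe instead of the sample text.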

Video Processing on a GPU

Let's try the cuda method together with the h264_nvenc encoder, which encodes H.264 on the GPU. To use GPUs in a job, we must pass the appropriate arguments to Slurm, Apptainer, and ffmpeg.

$ srun --gpus=1 apptainer exec --nv ffmpeg_4.0.6-nvidia1804.sif ffmpeg \
-hwaccel cuda -hwaccel_output_format cuda \
-y -i llama.mp4 -codec:a copy -filter:v scale_npp=640:360 \
-codec:v h264_nvenc gpe-llama.mp4

First, we ask Slurm for a node with a single GPU using the option --gpus=1. On the assigned node, we launch the container with apptainer exec --nv, which gives the programs in the container access to the GPU. ffmpeg does not detect a GPU and switch to the appropriate functions on its own, so we must request graphics acceleration explicitly:

  • with -hwaccel cuda -hwaccel_output_format cuda we select CUDA-accelerated decoding and keep the decoded frames in GPU memory,
  • with scale_npp we use a GPU-ready filter (NPP, Nvidia Performance Primitives) instead of the usual scale filter,
  • and with -codec:v h264_nvenc we select the GPU-ready H.264 encoder.

All the other settings remain unchanged. We did not specify the video encoder in our previous calls to ffmpeg because the default, -codec:v h264, can be omitted.
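To reuse the command for other files and sizes, the options can be wrapped in a small function. This is only a sketch: gpu_scale, RUN, and IMAGE are hypothetical names introduced here, and the function assumes the container image from above is in the current directory. Setting RUN=echo prints the command instead of running it, which is useful for checking it before submitting.

```shell
# gpu_scale INPUT OUTPUT WIDTH HEIGHT
# Scale a video on the GPU with the same options as the command above.
# RUN   (hypothetical) optional prefix; RUN=echo gives a dry run.
# IMAGE (hypothetical) container image; defaults to the one pulled earlier.
gpu_scale() {
    ${RUN-} srun --gpus=1 apptainer exec --nv \
        "${IMAGE:-ffmpeg_4.0.6-nvidia1804.sif}" ffmpeg \
        -hwaccel cuda -hwaccel_output_format cuda \
        -y -i "$1" -codec:a copy -filter:v "scale_npp=$3:$4" \
        -codec:v h264_nvenc "$2"
}

# Dry run: print the command line without contacting Slurm.
RUN=echo gpu_scale llama.mp4 gpe-llama.mp4 640 360
```

Calling gpu_scale without RUN set submits the job step through srun exactly as in the command shown earlier.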

If everything went well, encoding is now much faster:

frame= 2160 fps=378 q=14.0 Lsize=   11346kB time=00:01:29.99 bitrate=1032.8kbits/s speed=15.8x    
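The numbers in this line can be cross-checked against each other. The sketch below extracts fields from the progress line with a hypothetical helper, progress_field, and recomputes the speedup as the encoding rate divided by the frame rate of the video itself. It assumes the line has exactly the format shown above.

```shell
# progress_field KEY: read an ffmpeg progress line on stdin and print the
# value that follows "KEY=" (leading spaces skipped, up to the next space).
progress_field() {
    sed -n "s/.*$1= *\([^ ]*\).*/\1/p"
}

# Progress line copied from the output above.
line='frame= 2160 fps=378 q=14.0 Lsize=   11346kB time=00:01:29.99 bitrate=1032.8kbits/s speed=15.8x'

frames=$(printf '%s\n' "$line" | progress_field frame)
fps=$(printf '%s\n' "$line" | progress_field fps)
stamp=$(printf '%s\n' "$line" | progress_field time)

awk -v frames="$frames" -v fps="$fps" -v t="$stamp" 'BEGIN {
    split(t, p, ":")                            # hh:mm:ss.ss
    seconds = p[1] * 3600 + p[2] * 60 + p[3]    # length of the encoded video
    source_fps = frames / seconds               # frame rate of the video itself
    printf "source: %.1f fps, speedup: %.1fx\n", source_fps, fps / source_fps
}'
# prints: source: 24.0 fps, speedup: 15.7x
```

The recomputed value is close to the speed=15.8x that ffmpeg reports; the displayed fps is a rounded figure, so a small difference is expected.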

This way of speeding up video encoding differs from the previous approaches. Instead of processing several files in parallel, the hardware performs the required operations on each file faster. It is still parallel computing, but at a lower level: a GPU can perform many operations in parallel while encoding each frame, which yields a speedup that the file-level approaches cannot achieve for a single video.

Exercise

You can visit this link to access exercises that will help you better understand the process of parallelizing video processing on a cluster.