Parallel Processing of Video Sequences

There is a vast number of problems that can be divided into independent sub-problems, which can be tackled simultaneously. These are referred to as "embarrassingly parallel" problems.

One typical example is video processing. To speed up this process, we can take the following steps:

first, we divide the video into segments on one core,
then we process each segment separately on its own core, and
finally, we assemble the processed segments back into a whole on one core.

Figure: Parallel processing of video sequences

This approach has an advantage because each piece is processed in parallel on its own core. If we split the video into N equal segments in the first step, the processing in the second step takes roughly 1/N of the time it would take to process the entire video on a single core. Even if we factor in the disassembly and reassembly steps, which aren’t present when processing on a single core, we can still achieve a significant improvement.
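
The estimate above can be sketched as a small calculation. All timings below are hypothetical values chosen for illustration, not measurements: a single-core run of 100 seconds, 5 segments, and 2 seconds each for splitting and merging.

```shell
#!/bin/sh
# Hypothetical timings in seconds (assumptions, not measured values).
T_SINGLE=100   # processing the whole video on one core
N=5            # number of equal segments
T_SPLIT=2      # step 1: divide
T_MERGE=2      # step 3: assemble

# Parallel estimate: split + (single-core time / N) + merge.
estimate=$(awk -v t="$T_SINGLE" -v n="$N" -v s="$T_SPLIT" -v m="$T_MERGE" \
    'BEGIN { printf "%.1f", s + t / n + m }')

# Speedup relative to the single-core run.
speedup=$(awk -v t="$T_SINGLE" -v e="$estimate" 'BEGIN { printf "%.2f", t / e }')

echo "estimated parallel time: ${estimate} s"
echo "estimated speedup: ${speedup}x"
```

With these assumed numbers the speedup stays below 5, because the divide and assemble steps still run on a single core.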

First, make sure that the video file llama.mp4¹ is on the cluster.

Step 1: Divide

We want to divide the llama.mp4 video into smaller pieces, which will then be processed in parallel. First, load the appropriate module if you haven't loaded it already.

$ module load FFmpeg

Then we divide the video into 20-second segments using the following command:

$ srun --ntasks=1 ffmpeg \
-y -i llama.mp4 -codec copy -f segment -segment_time 20 \
-segment_list parts.txt part-%d.mp4

where

-codec copy copies the audio and video streams to the output files unchanged,
-f segment selects the segment output format, which splits the input file,
-segment_time 20 sets the desired duration of each segment in seconds,
-segment_list parts.txt stores the list of generated segments in the parts.txt file, and
part-%d.mp4 sets the name of the output files, where %d is a placeholder that ffmpeg replaces with the sequence number of the segment.
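
The %d placeholder follows the familiar printf convention. As a small illustration, the shell's own printf can show which file names the pattern yields for sequence numbers 0 to 4; this only prints names and does not call ffmpeg.

```shell
#!/bin/sh
# printf reuses the format for every argument, so this prints one name per number.
names=$(printf 'part-%d.mp4\n' 0 1 2 3 4)
echo "$names"
```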

Once the process is complete, the working directory will contain the original video, the list of segments in parts.txt, and the segments labeled part-0.mp4 to part-4.mp4.

$ ls
llama.mp4  part-0.mp4  part-1.mp4  part-2.mp4  part-3.mp4  part-4.mp4  parts.txt

Step 2: Process

We process each segment in the same way. For example, to change the resolution of the segment part-0.mp4 to 640×360 and save the result to the file out-part-0.mp4, we use the following command:

$ srun --ntasks=1 ffmpeg \
-y -i part-0.mp4 -codec:a copy -filter:v scale=640:360 out-part-0.mp4

To request resources and run our task on the assigned compute node, we typically use the srun command. However, this becomes tedious if we have many segments to process. Instead, we can use an sbatch script that automates the process for us.

#!/bin/sh
#SBATCH --job-name=ffmpeg1    # job name
#SBATCH --output=ffmpeg1.txt  # log file of job 
#SBATCH --time=00:10:00       # time limit hours:minutes:seconds
#SBATCH --reservation=fri     # reservation, in case we have it; otherwise delete this line

srun ffmpeg -y -i part-0.mp4 -codec:a copy -filter:v scale=640:360 out-part-0.mp4
srun ffmpeg -y -i part-1.mp4 -codec:a copy -filter:v scale=640:360 out-part-1.mp4
srun ffmpeg -y -i part-2.mp4 -codec:a copy -filter:v scale=640:360 out-part-2.mp4
srun ffmpeg -y -i part-3.mp4 -codec:a copy -filter:v scale=640:360 out-part-3.mp4
srun ffmpeg -y -i part-4.mp4 -codec:a copy -filter:v scale=640:360 out-part-4.mp4

We save the script as file ffmpeg1.sh, and run it with the following command:

$ sbatch ./ffmpeg1.sh
Submitted batch job 389552

We have not yet sped up the execution, because the script waits for each srun call to finish before moving on to the next one. To send all the segments for processing at the same time, add an & character to the end of each srun line to run the command in the background. The script then continues with the next command immediately, without waiting for the previous one to finish. At the end of the script, however, the wait command must be added so that the script does not terminate before all the background tasks have completed. This ensures that all the video segments are processed to completion.
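
The effect of & and wait can be tried out without Slurm. The following sketch uses sleep 1 as a stand-in for the processing of one segment (an assumption made only so that the example runs anywhere) and measures how long three background tasks take together.

```shell
#!/bin/sh
start=$(date +%s)

# Start three stand-in tasks in the background; the loop does not wait for them.
for i in 0 1 2; do
    ( sleep 1; echo "task $i done" ) &
done

# Block until every background task has finished.
wait

end=$(date +%s)
elapsed=$((end - start))
echo "elapsed: ${elapsed} s"
```

Run one after another, the three tasks would need about three seconds. Started in the background, they finish in roughly the time of a single task, and the order of the "done" messages may vary between runs.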

Each call to srun represents one task in our job. Therefore, we specify the number of tasks to be executed simultaneously in the script header using --ntasks=5. This ensures that Slurm executes a maximum of five tasks at a time, even if there are more. To prevent Slurm from repeating the command five times for each segment, which is not helpful, we add --ntasks=1 and --overlap to each srun call.

#!/bin/sh
#SBATCH --job-name=ffmpeg2
#SBATCH --output=ffmpeg2.txt
#SBATCH --time=00:10:00
#SBATCH --ntasks=5            # number of tasks in job, which are executed in parallel

srun --ntasks=1 --overlap ffmpeg -y -i part-0.mp4 -codec:a copy -filter:v scale=640:360 out-part-0.mp4 &
srun --ntasks=1 --overlap ffmpeg -y -i part-1.mp4 -codec:a copy -filter:v scale=640:360 out-part-1.mp4 &
srun --ntasks=1 --overlap ffmpeg -y -i part-2.mp4 -codec:a copy -filter:v scale=640:360 out-part-2.mp4 &
srun --ntasks=1 --overlap ffmpeg -y -i part-3.mp4 -codec:a copy -filter:v scale=640:360 out-part-3.mp4 &
srun --ntasks=1 --overlap ffmpeg -y -i part-4.mp4 -codec:a copy -filter:v scale=640:360 out-part-4.mp4 &

wait

Array of jobs

The method above works, but it is not very user-friendly. If we change the number of segments, we need to modify or add lines in the script, which can lead to errors. We can also see that the steps are almost identical except for the number in the file names. Luckily, Slurm offers a solution for this scenario called array jobs. Let's take a look at how we can use array jobs for the above example.

#!/bin/sh
#SBATCH --job-name=ffmpeg3
#SBATCH --time=00:10:00
#SBATCH --output=ffmpeg3-%a.txt   # %a is replaced by the array task index
#SBATCH --array=0-4               # values for %a

srun ffmpeg \
-y -i part-$SLURM_ARRAY_TASK_ID.mp4 -codec:a copy -filter:v scale=640:360 \
out-part-$SLURM_ARRAY_TASK_ID.mp4

To run the same command for several segments, we have added the --array=0-4 option. This option instructs Slurm to execute the script once for each number in the range 0 to 4. The number of tasks run by Slurm is determined by the range given to --array, which in this case yields 5 tasks. If we want to limit the number of tasks running simultaneously to 3, for instance, we can write --array=0-4%3.
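
The array range does not have to be typed by hand. As a sketch, it can be derived from the number of lines in parts.txt and passed to sbatch on the command line, where it takes precedence over the value in the script header. The parts.txt used below is a stand-in created in a scratch directory so that the example runs on its own, and the script name ffmpeg3.sh is an assumption. The sbatch command is only printed, not executed.

```shell
#!/bin/sh
# Stand-in for the parts.txt written in Step 1 (created here only for the sketch).
workdir=$(mktemp -d)
printf 'part-%d.mp4\n' 0 1 2 3 4 > "$workdir/parts.txt"

# One line per segment; array indices start at 0, so the last index is count - 1.
count=$(wc -l < "$workdir/parts.txt")
last=$((count - 1))
array_opt="--array=0-$last"

# Print the command instead of submitting it.
echo "sbatch $array_opt ./ffmpeg3.sh"

rm -r "$workdir"
```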

We don't need to write the segment number into the file names by hand, because we use the environment variable $SLURM_ARRAY_TASK_ID instead. Slurm sets this variable for each task to one of the numbers from the range given by the --array switch. The srun command is executed once for each task, so we don't need to specify --ntasks=1. Additionally, we've added the %a placeholder to the log file name, which Slurm also replaces with the number of the task, so that each task writes to its own file. The $SLURM_ARRAY_TASK_ID variable does not exist yet when Slurm processes the #SBATCH lines, which is why those lines must use the %a placeholder instead.
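
To see how the file names are formed, the variable can be set by hand. The loop below only simulates what each array task would see; in a real array job Slurm sets SLURM_ARRAY_TASK_ID itself and each task runs separately.

```shell
#!/bin/sh
# Simulation only: assign the values that Slurm would provide for --array=0-4.
for SLURM_ARRAY_TASK_ID in 0 1 2 3 4; do
    in="part-$SLURM_ARRAY_TASK_ID.mp4"
    out="out-part-$SLURM_ARRAY_TASK_ID.mp4"
    echo "task $SLURM_ARRAY_TASK_ID: $in -> $out"
done
```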

Once this step is complete, we'll have the files out-part-0.mp4 to out-part-4.mp4 in the working directory containing the processed segments of the original recording.

Step 3: Assemble

We just need to combine the files out-part-0.mp4, ..., out-part-4.mp4 into a single video. To do this, we need to provide ffmpeg with a list of segments we want to merge. Let's create a file named out-parts.txt with the following content:

file out-part-0.mp4
file out-part-1.mp4
file out-part-2.mp4
file out-part-3.mp4
file out-part-4.mp4

We can create it from the existing list of segments of the original video, parts.txt. First, make a copy of parts.txt named out-parts.txt. Then open out-parts.txt in a text editor and replace every occurrence of the string part with the string file out-part.

Alternatively, we can create a list of individual video segments using the command line and the sed program (stream editor):

$ sed 's/part/file out-part/g' < parts.txt > out-parts.txt
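
A slightly stricter variant, sketched here, inserts the prefix at the start of every line instead of searching for the string part, so it does not depend on what the segment names contain. The parts.txt below is a stand-in with three entries, created in a scratch directory only so that the example is self-contained.

```shell
#!/bin/sh
# Stand-in for the real parts.txt from Step 1.
workdir=$(mktemp -d)
printf 'part-%d.mp4\n' 0 1 2 > "$workdir/parts.txt"

# ^ matches the start of each line, so "file out-" is prepended exactly once per line.
sed 's/^/file out-/' < "$workdir/parts.txt" > "$workdir/out-parts.txt"

first=$(head -n 1 "$workdir/out-parts.txt")
lines=$(wc -l < "$workdir/out-parts.txt" | tr -d ' ')
cat "$workdir/out-parts.txt"

rm -r "$workdir"
```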

Finally, we assemble the output video out-llama.mp4 from the list of segments in the out-parts.txt file.

$ srun --ntasks=1 ffmpeg -y -f concat -i out-parts.txt -c copy out-llama.mp4

You can remove temporary files with the data transfer tool.
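
The temporary files can also be removed on the command line. The sketch below demonstrates the file patterns in a scratch directory with empty stand-in files, so nothing real is deleted; in the actual working directory only the rm line would be needed.

```shell
#!/bin/sh
# Scratch directory with empty stand-ins for the files from steps 1 to 3.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch llama.mp4 out-llama.mp4 parts.txt out-parts.txt
for i in 0 1 2 3 4; do
    touch "part-$i.mp4" "out-part-$i.mp4"
done

# Remove the segments and both list files; the original and the final video stay.
rm -- part-*.mp4 out-part-*.mp4 parts.txt out-parts.txt

remaining=$(ls | tr '\n' ' ')
echo "remaining: $remaining"

# Clean up the scratch directory itself.
cd / && rm -r "$workdir"
```

The pattern part-*.mp4 matches neither parts.txt nor out-llama.mp4, which is why the list files are named explicitly.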

Exercise

Here you will find exercises to strengthen your knowledge of parallel video processing on the cluster.

Steps 1, 2, 3 All at Once

In the previous sections, we sped up the video processing by splitting the task into several pieces, which were executed as several parallel tasks. We ran ffmpeg on each segment; each task used one core and knew nothing about the other segments. This approach can be taken whenever a problem can be split into independent pieces. We do not need to change the processing program.

In principle, each job is limited to one processor core. However, by using threads, a single job can use multiple cores. The ffmpeg program can use threads for many operations. The three steps above can therefore also be executed with a single command. We now run a single job for the whole file, since ffmpeg itself splits the processing into several parts, depending on the number of cores we assign to it.

So far we have done all the steps using the FFmpeg module. This time we use the ffmpeg_alpine.sif container. If we haven't already, we download the FFmpeg container from the link onto the cluster. When using the container, we add apptainer exec ffmpeg_alpine.sif before the call to ffmpeg. The command now involves three programs:

srun submits the job to Slurm and runs the apptainer program,
the apptainer program runs the ffmpeg_alpine.sif container, and
inside the ffmpeg_alpine.sif container, the ffmpeg program starts.

$ srun --ntasks=1 --cpus-per-task=5 apptainer exec ffmpeg_alpine.sif ffmpeg \
-y -i llama.mp4 -codec:a copy -filter:v scale=640:360 out123-llama.mp4

The process that we previously carried out in three steps is now done for us by ffmpeg in about the same time. With the --cpus-per-task switch, we have requested that Slurm reserve 5 processor cores for each task in our job.

While ffmpeg is running, it prints the status on the last line:

frame= 2160 fps=147 q=-1.0 Lsize=    4286kB time=00:01:30.00 bitrate= 390.1kbits/s speed=12.7x

The speed data tells us that encoding is 12.7 times faster than real-time playback. In other words, if a video is 90 seconds long, it will take 90/12.7 ≈ 7.1 seconds to encode.
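
The same arithmetic can be reproduced from the status line. This sketch extracts the speed value with sed and divides the video duration by it; the duration of 90 seconds is taken from the time field shown above.

```shell
#!/bin/sh
# Status line as printed by ffmpeg in the example above.
status='frame= 2160 fps=147 q=-1.0 Lsize=    4286kB time=00:01:30.00 bitrate= 390.1kbits/s speed=12.7x'

# Keep only the number between "speed=" and the trailing "x".
speed=$(printf '%s\n' "$status" | sed 's/.*speed=\([0-9.]*\)x.*/\1/')

# Video duration in seconds (time=00:01:30.00).
duration=90

# Encoding time = duration / speed, rounded to one decimal place.
encode_time=$(awk -v d="$duration" -v s="$speed" 'BEGIN { printf "%.1f", d / s }')
echo "speed: ${speed}x, encoding time: ${encode_time} s"
```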

Hint

If you had read the documentation well before using ffmpeg, you would have made your work much easier. But then you would not have learnt many very useful ways of using a computer cluster.

¹ The video was created by the Blender project.