Simple FFmpeg audio mastering

Before releasing a video, its audio track needs to be mastered. We have already written about audio level normalization. Today we focus on the most common cases of applying an equalizer to audio.

Filter adeclip

Removes some artefacts created by clipping and prints clipping statistics. Always use it. It will not lower the volume below 0.0 dBFS; in fact, true peaks will be slightly raised.

Filter volume

If the volume is too hot, use the volume filter to bring true peaks below 0 dBFS. If you are not sure how much negative gain is needed, go for -6 dB. It can also be used to tune final loudness, combined with an ebur128 measurement, if you do not want to use the loudnorm filter. Sometimes loudnorm reduces dynamic range too much.

volume=-6dB
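
As a rough sketch of the tuning step mentioned above: if ebur128 reports the integrated loudness of the source, a starting gain for the volume filter is the target minus the measurement. The function name and the example numbers are made up for illustration. Any limiter placed later in the chain will change the final loudness, so measure again afterwards.

```shell
#!/bin/sh
# Hypothetical helper: starting gain (in dB) for the volume filter.
# $1 = measured integrated loudness in LUFS (from an ebur128 run)
# $2 = target integrated loudness in LUFS
gain_for_target() {
    LC_ALL=C awk -v measured="$1" -v target="$2" \
        'BEGIN { printf "%+.1fdB\n", target - measured }'
}

# Example numbers only: source measured at -19.5 LUFS, target is -14 LUFS.
gain=$(gain_for_target -19.5 -14)
echo "volume=${gain}"
```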

Filter asubcut

Cuts low frequencies. Needed in most cases. There are several good cut points: one is about 26 Hz, the next is about 35 Hz, but sometimes you need to go much higher, to about 45 Hz. You need good speakers to hear the difference.

asubcut=26

Filter afwtdn

Removes noise and acts as a slight de-esser, similar to what tube preamps do. It’s not very aggressive, and the default values work best. If you are recording audio outdoors, use it. It removes only slight high-pitched noise and cleans up the audio overall.

Filter alimiter

Limits peaks. It’s not a hard limiter; peaks can and will go slightly over the specified level. The default timing is 5 ms attack and 50 ms release. I recommend using a 100 ms release time, as Spotify does.

alimiter=-1dB:release=100
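
A limiter with these settings can also be written with every option named, so the filter string does not depend on positional option order and it is explicit that -1 dB is the ceiling. This is only a sketch: limit, attack and release are alimiter option names, while the shell variable names are illustrative.

```shell
#!/bin/sh
# Illustrative variables; only the option names limit, attack and release
# belong to alimiter.
limit_db=-1      # ceiling in dB
attack_ms=5      # the default attack mentioned above
release_ms=100   # slower than the 50 ms default

limiter="alimiter=limit=${limit_db}dB:attack=${attack_ms}:release=${release_ms}"
echo "$limiter"
```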

Filter lowpass

I found a 6 dB/octave (p=1) lowpass filter to be optimal for taming excessive highs without removing all of them. To leave some highs, limit the filter bandwidth to 8-12 kHz (t=h:w=12k). I use a cutoff frequency above 4.1 kHz.

lowpass=4.4k:t=h:w=12k:p=1

Filter atilt

A tilt filter is useful for rebalancing audio while keeping the overall tonal balance. A negative slope cuts highs, a positive slope cuts lows. My favourite centre frequency is about 2200 Hz, with a bandwidth of 3000 to 4000 Hz. The order parameter changes the steepness of the curve.

atilt=freq=2150:width=4000:slope=-0.5:order=7

Filter loudnorm

Long-term volume normalization; shifts volume levels according to the delivery platform’s specification. I and TP are the targets for integrated loudness and true peak, respectively. How to find the correction offset is described here.

loudnorm=I=-14:TP=-1:print_format=summary:offset=-1.4
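
One common way to obtain measured values and an offset is a two-pass run: a first pass with print_format=json reports input_i, input_tp, input_lra, input_thresh and target_offset, and a second pass feeds them back through the measured_* options. The sketch below only assembles the second-pass filter string. The function name and the sample numbers are invented for illustration, and this may differ from the method in the article linked above.

```shell
#!/bin/sh
# Hypothetical helper: build a second-pass loudnorm filter from first-pass
# measurements.
# Arguments: input_i input_tp input_lra input_thresh target_offset
loudnorm_second_pass() {
    printf 'loudnorm=I=-14:TP=-1:measured_I=%s:measured_TP=%s:measured_LRA=%s:measured_thresh=%s:offset=%s:print_format=summary\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# First pass (run it yourself; the values appear as JSON at the end of the log):
#   ffmpeg -hide_banner -i input.mp4 -af loudnorm=I=-14:TP=-1:print_format=json -f null -

# Made-up measurements, for illustration only.
second_pass=$(loudnorm_second_pass -23.1 -4.2 7.5 -33.4 0.3)
echo "$second_pass"
```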

Filter aresample

Changes the sample rate of the audio and can apply different dithering methods. It is needed after loudnorm because loudnorm does 4x oversampling to get accurate true peak detection. For video applications, the standard sample rate is 48 kHz.

aresample=48k:dither_method=triangular_hp

Filter ebur128

Prints integrated loudness, LRA and minimum / maximum short term loudness.

Filter astats

Prints several audio related statistics.
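
Both measurement filters can be run without writing an output file by sending the result to the null muxer. A minimal sketch, assuming the input is called input.mp4 (a placeholder); the script only prints the command so it can be inspected before running.

```shell
#!/bin/sh
# Placeholder input name; replace with a real file.
input="input.mp4"

# ebur128 with peak=true also reports true peak; astats adds its own statistics.
measure_filters="ebur128=peak=true,astats"

# "-f null -" discards the processed audio, so only the log output remains.
measure_cmd="ffmpeg -hide_banner -nostats -i $input -af $measure_filters -f null -"
echo "$measure_cmd"
```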

Filter aformat

Changes the audio sample format. It’s good to use a floating point format during mastering because it’s more accurate. Two single-precision floating point formats are available: flt (packed) and fltp (planar).

Finished command

ffmpeg -hide_banner -i <input.mp4> -af adeclip,asubcut=35,lowpass=4.4k:t=h:w=12k:p=1,loudnorm=I=-14:TP=-1:print_format=summary:offset=-1.4,aresample=48k:dither_method=triangular_hp -acodec libfdk_aac -vcodec copy <output.mp4> -y
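
The same chain is easier to tweak when each filter sits on its own line and the list is joined with commas. A sketch, assuming a POSIX shell; the function and variable names are illustrative, and the filter values are the ones from the command above.

```shell
#!/bin/sh
# Join all arguments with commas to form an -af filter chain.
join_filters() {
    chain=""
    for f in "$@"; do
        chain="${chain:+$chain,}$f"
    done
    printf '%s\n' "$chain"
}

master_chain=$(join_filters \
    "adeclip" \
    "asubcut=35" \
    "lowpass=4.4k:t=h:w=12k:p=1" \
    "loudnorm=I=-14:TP=-1:print_format=summary:offset=-1.4" \
    "aresample=48k:dither_method=triangular_hp")

echo "$master_chain"
# Usage (placeholder file names):
#   ffmpeg -hide_banner -i input.mp4 -af "$master_chain" -acodec libfdk_aac -vcodec copy output.mp4 -y
```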

Loudnorm alternative

Using loudnorm inevitably lowers LRA. This is very often desired because it makes the sound louder. In cases where you want to mostly keep the original dynamics, use two limiters.

ffmpeg -i <zma.mp4> -af aformat=flt,volume=+12.99dB,alimiter=-4.5dB:attack=17:release=100,alimiter=-1.0dB:attack=5,ebur128,astats -vcodec copy -acodec libfdk_aac <zma-normalized.mp4> -y

Club like compressed sound

Caused by the s (compression) parameter. Lower values compress more. Values that are too low, such as 8, add bass, so we cut the bass to get a radio-friendly sound. A value of s=10 doesn’t add too much sub-bass, so there is no need to remove it. Highs always need to be cut, because this compressor creates too many artefacts. Running it at 96 kHz will make a cleaner, more “pro” sound.

asubcut=29:order=8,dynaudnorm=p=1/sqrt(2):m=100:s=8:g=15,asupercut=20k

With less compression:

dynaudnorm=p=1/sqrt(2):m=100:s=13:g=15,asupercut=20k
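
Two practical notes, sketched below. The expression 1/sqrt(2) contains parentheses, so the filter string has to be quoted when typed in a shell; its value is about 0.707, roughly -3 dBFS. To run the compressor at 96 kHz, one option is to resample up before the chain and back down to 48 kHz after it. That wrapping is an assumption, not a tested recipe from this article, and the function name is illustrative.

```shell
#!/bin/sh
# Hypothetical helper: club-style chain with the compress factor as $1,
# wrapped in resampling so that dynaudnorm runs at 96 kHz (an assumption).
club_chain() {
    printf 'aresample=96k,asubcut=29:order=8,dynaudnorm=p=1/sqrt(2):m=100:s=%s:g=15,asupercut=20k,aresample=48k\n' "$1"
}

# Quote the result when passing it to ffmpeg: the parentheses in 1/sqrt(2)
# are special characters for the shell.
chain=$(club_chain 10)
echo "$chain"
# Usage (placeholder file names):
#   ffmpeg -i input.mp4 -af "$chain" -vcodec copy -acodec libfdk_aac output.mp4
```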

80s sound

aresample=23.1k:dither_method=f_weighted:exact_rational=false,asubcut=32:order=7,dynaudnorm=p=1/sqrt(2):m=100:s=4:g=15,asupercut=20k,aresample=44.1k:dither_method=triangular

Spacey sound

aresample=25.1k,asubcut=27:order=4,aresample=192k,rubberband=tempo=0.9:detector=soft:phase=laminar,adelay=20,stereotools=balance_out=-0.4,asupercut,volume=+4dB,apsyclip,aresample=44.1k,ebur128=peak=sample