Powerful Multimedia Command-Line Tools, Part I - SoX

SoX is a power-packed command-line tool for various types of audio processing. It's very useful as an audio format converter, and it can be used for resampling audio files, converting between endianness, audio encoding and modifying other attributes of common audio file formats.

Its main power, however, is its effect plugins. It can apply various effects to audio in the same way a digital audio workstation does. You can add echoes, filter frequencies, reduce or increase volume, remove noise and do various other advanced digital signal processing on sound samples.

Its companion program, play, lets you audition a particular audio effect before writing the output to a file with SoX. But, play does not understand MP3 and Ogg Vorbis files. You have to use one of the supported formats; the best bet is the uncompressed WAV (PCM) format. SoX also supports audio mixing through another companion tool, soxmix.

Very sophisticated filtering and resampling algorithms make it a useful tool in its own right for audio manipulation. However, some of the advanced features of a professional digital audio workstation are missing.

The graphical audio processing tool Audacity is a user-friendly tool that has several of the same effects that SoX has. But, because it's a command-line tool, SoX lends itself to easy scripting, which makes it invaluable when working with hundreds or thousands of sound files.
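As a rough illustration of that scripting advantage, here is a minimal sketch of a batch helper. The function name batch_sox and the directory names are hypothetical, and it assumes every input is a .wav file that your SoX build can read. It simply runs the same effect chain over each file and writes the results to a second directory.

```shell
#!/bin/sh
# Apply one SoX effect chain to every .wav file in a directory.
# batch_sox is a hypothetical helper name, not part of SoX.
# Usage: batch_sox SRC_DIR DEST_DIR EFFECT [ARGS...]
batch_sox() (
    src=$1
    dest=$2
    shift 2
    mkdir -p "$dest" || exit 1
    for f in "$src"/*.wav; do
        [ -e "$f" ] || continue    # the glob matched nothing
        # Remaining arguments are passed to sox as the effect chain.
        sox "$f" "$dest/$(basename "$f")" "$@" || echo "failed: $f" >&2
    done
)
```

For example, `batch_sox ./raw ./processed fade 5` would fade in every file under ./raw and leave the originals untouched.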

Producing audio effects is difficult, because it is as much an art as it is a science. You often have to tweak the input values until satisfied with the result. And, you have to use different values for different files, because their frequency spectra differ based on whether the sound file contains high-fidelity music, speech or silence and also whether it is classical music or rock music, and so on.

You also can create a 5.1 channel audio file from matrix-encoded source using a combination of SoX and another companion program called multimux, written by Panteltje.

SoX can be used for recording FM radio or audio from television using the v4l2 driver in Linux, or it can record sound directly from the /dev/dsp sound card input using the ossdsp SoX input. Be aware that sound cards have limitations on sampling rates. You can't expect your sound card to be able to play audio at any sampling rate.

For downmixing from stereo to mono, combining multiple audio tracks and removing silence at the beginning of audio tracks, this is the application you need.
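The sketch below shows how downmixing and combining might look on the command line. The function names are hypothetical wrappers. The -m option is an assumption about your SoX version: newer releases use it for mixing, whereas older ones ship the separate soxmix program mentioned earlier.

```shell
#!/bin/sh
# Downmix to mono: -c 1 placed before the output name sets its channel count.
to_mono() {
    sox "$1" -c 1 "$2"
}

# Concatenate tracks end to end; the last argument is the output file.
join_tracks() {
    sox "$@"
}

# Mix tracks so they play simultaneously.
# Assumption: a SoX release that supports -m (older ones use soxmix).
mix_tracks() {
    sox -m "$@"
}
```

Usage would be along the lines of `to_mono stereo.wav mono.wav` or `join_tracks part1.wav part2.wav whole.wav`.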

Figure 1. SoX, the Command-Line DAW

SoX Effects

First, let's look at the interesting echo effect:

$ play foo.wav echo 0.7 0.6 50 0.2

You will need to play with different values for the four arguments, which are gain-in, gain-out, delay and decay. Note that the echo delay is given in milliseconds, although most effects take time values as input in seconds. The man page is not very clear about the ranges of values and other finer details, but it should not be too difficult to figure out what values work.

Also note that the echo effect can be distracting in certain circumstances, although I have found that it adds a certain degree of liveliness to some speeches.

There is also an echos effect. It functions similarly but is more complex:

$ play foo.wav echos 0.4 0.6 900.0 0.25 900.0 0.3

You also can specify a large delay to the echo effect to make it sound eerie:

$ play foo.wav echo 0.7 0.89 1000.0 0.1

Try this with different values (in place of 1000) for the delay, until you arrive at a value that you like.
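Rather than retyping the command for each value, you could loop over a few candidate delays. This is only a sketch: echo_sweep is a hypothetical helper, and the gain and decay values are simply the ones from the command above.

```shell
#!/bin/sh
# Audition the echo effect at several delays, given in milliseconds.
# echo_sweep is a hypothetical helper name.
# Usage: echo_sweep FILE DELAY [DELAY...]
echo_sweep() (
    file=$1
    shift
    for delay in "$@"; do
        echo "delay: ${delay} ms" >&2
        # Argument order: gain-in gain-out delay decay
        play "$file" echo 0.7 0.89 "$delay" 0.1
    done
)
```

Calling `echo_sweep foo.wav 250 500 1000 2000` plays the file four times, once per delay.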

Songs often have some silence at the beginning, which can be a distraction in playlists. Silence is fine for a couple of seconds, but for more than that, it becomes annoying. You can delete periods of silence in your music collection with SoX using the trim effect plugin.

If you don't have any WAV files and if all your music is in MP3, Ogg, AAC or AC3 formats, don't despair; FFmpeg can fix this for you:

$ ffmpeg -i foo.mp3 foo.wav

You can convert it back after SoX processing using the same command but reversing the arguments.
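The whole round trip can be wrapped in one function. The following is a sketch under a few assumptions: process_via_wav is a hypothetical name, FFmpeg is assumed to pick the codecs from the file extensions, and the intermediate WAV files live in a temporary directory that is removed afterward.

```shell
#!/bin/sh
# Decode with FFmpeg, process with SoX, then encode back with FFmpeg.
# process_via_wav is a hypothetical helper name.
# Usage: process_via_wav INPUT OUTPUT EFFECT [ARGS...]
process_via_wav() (
    in=$1
    out=$2
    shift 2
    tmpdir=$(mktemp -d) || exit 1
    trap 'rm -rf "$tmpdir"' EXIT    # clean up the intermediate WAV files
    ffmpeg -loglevel error -i "$in" "$tmpdir/in.wav" &&
    sox "$tmpdir/in.wav" "$tmpdir/out.wav" "$@" &&
    ffmpeg -loglevel error -i "$tmpdir/out.wav" "$out"
)
```

For instance, `process_via_wav foo.mp3 foo_faded.mp3 fade 5` would apply a fade-in to an MP3 without leaving WAV files behind.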

Doing the following:

$ sox foo.wav trimmed.wav trim 10

removes the first ten seconds of audio in foo.wav. You can figure out what value to use instead of ten by observing the time counter in XMMS or whatever player you use to listen to music.

SoX can do better, of course. It can figure out the amount of silence for you by using the silence effect plugin. Check out the man page for details. You also can specify the threshold of what you consider silence, because noise levels interfere with silence processing.
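A minimal sketch of the silence effect follows. The helper name strip_leading_silence is hypothetical, and the default values (0.1 seconds of sound, a 1% threshold) are only starting points to tune for your own recordings.

```shell
#!/bin/sh
# Strip leading silence. The arguments "1 0.1 N%" mean: start output once
# 1 period of sound lasting at least 0.1 s rises above N% of full scale.
# strip_leading_silence is a hypothetical helper name.
# Usage: strip_leading_silence IN OUT [THRESHOLD_PERCENT]
strip_leading_silence() {
    sox "$1" "$2" silence 1 0.1 "${3:-1}%"
}
```

A noisier recording may need a higher threshold, for example `strip_leading_silence foo.wav trimmed.wav 3`.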

Speaking of noise, you can filter noise patterns that have a fixed spectrum easily with SoX. Typically, noise in audio files comes from static sources and is not too hard to remove. Well, that's not always the case, but once you figure out how to remove noise from one input file, and if all input files were recorded from the same source, you can bank on using that strategy for the other files as well.

Other types of noise removal, however, are not easy at all. They often require several experiments, and most of the time the attempt backfires and removes the signal along with the noise.

In such a situation, you would be better off using high-fidelity recording equipment. As for dealing with ambient noise, again that depends: if it's someone talking, it's difficult; if it's a constant hum, it's not. Doing:

$ sox foo.wav -t nul /dev/null trim 0 0.5 noiseprof profile

will derive a noise profile from the first half second of the input file, which is assumed to contain only background noise, and save it in the file named profile. Later, you can see whether this removes the noise:

$ play foo.wav noisered profile

However, that didn't work for me the first time. I was remarkably successful once—I could convert a very noisy DAT tape recording into crystal clear audio with ease. But, other times I had trouble. You will need to do some tweaking and a lot of experimentation.
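To make that experimentation less tedious, the two steps can be wrapped in one function and written to a file. This is a sketch with assumptions: denoise is a hypothetical name, it uses the -n null output of newer SoX releases in place of -t nul /dev/null, and the default amount of 0.21 is just a starting point of my choosing (noisered accepts an optional amount between 0 and 1 after the profile name).

```shell
#!/bin/sh
# Profile the noise at the start of a file, then subtract it.
# denoise is a hypothetical helper name.
# Usage: denoise IN OUT [NOISE_SECONDS] [AMOUNT]
denoise() (
    in=$1
    out=$2
    noise_len=${3:-0.5}    # length of the noise-only lead-in, in seconds
    amount=${4:-0.21}      # assumed starting point; tune by ear
    profile=$(mktemp) || exit 1
    trap 'rm -f "$profile"' EXIT
    # Step 1 builds the profile (-n discards the audio output).
    # Step 2 applies it and writes the cleaned file.
    sox "$in" -n trim 0 "$noise_len" noiseprof "$profile" &&
    sox "$in" "$out" noisered "$profile" "$amount"
)
```

Something like `denoise noisy.wav clean.wav 0.5 0.3` lets you retry quickly with a different amount; higher values remove more noise but also more of the signal.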

Let's move on to the chorus effect. A typical chorus has many voices (both human and instrumental) that are slightly out of phase. The phase usually remains constant, as the singers try to perform with a fixed lag. They may attempt to correct this, but for the most part, chorus singers don't sing in perfect unison. The chorus effect reproduces this beautifully. Try it with the following:

$ play foo.wav chorus 0.6 0.9 50.0 0.4 0.25 2.0 -t 60.0 0.32 0.4 1.3 -s

Note the -s and -t arguments; they select a sinusoidal or a triangular pattern for modulating the delay of each voice.
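The long argument list is easier to read when split by voice. The sketch below is the same command as above, laid out one voice per line; chorus_demo is a hypothetical wrapper name.

```shell
#!/bin/sh
# Layout of the chorus arguments:
#   gain-in gain-out, then one group per voice:
#   delay(ms) decay speed(Hz) depth(ms) -s|-t
# chorus_demo is a hypothetical helper name.
# Usage: chorus_demo FILE
chorus_demo() {
    play "$1" chorus 0.6 0.9 \
        50.0 0.4 0.25 2.0 -t \
        60.0 0.32 0.4 1.3 -s
}
```

Adding a third line of five values would add a third voice.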

SoX makes good use of mathematics for its DSP work, and you can specify which primitive to use for a particular effect. You also can set up a SoX pipeline by using - as the output filename. And, you can specify multiple effects on the command line. For example:

$ play foo.wav fade 5

will fade the audio in over the first five seconds. You can do a fade out with the same effect.
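Here is a sketch of both ideas, multiple effects in one command and a two-stage pipeline. The function names are hypothetical, and the echo values are borrowed from the first example in this article. When - is used for standard input or output, SoX cannot guess the format from a file extension, so -t wav is given explicitly.

```shell
#!/bin/sh
# Several effects in one invocation; they are applied left to right.
# Usage: fade_then_echo IN OUT
fade_then_echo() {
    sox "$1" "$2" fade 5 echo 0.7 0.6 50 0.2
}

# The same chain split across two sox processes joined by a pipe.
# "-" stands for stdout on the left and stdin on the right.
# Usage: fade_then_echo_piped IN OUT
fade_then_echo_piped() {
    sox "$1" -t wav - fade 5 | sox -t wav - "$2" echo 0.7 0.6 50 0.2
}
```

The piped form is mostly useful when another program sits between the two stages or feeds the first one.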

The following command will let you hear devil music (play a song backward):

$ play foo.wav reverse

SoX has several highly advanced resampling algorithms, and there are several effects I have not covered in this article, so you should spend some time exploring SoX for yourself. On its own, SoX is very powerful, and if you use it in concert with other tools, command-line or graphical, it provides even more power. Its ability to accept input from standard input and spit out the processed file to standard output comes in handy for setting up an audio processing pipeline.

Girish Venkatachalam is an open-source hacker deeply interested in UNIX. In his free time, he likes to cook vegetarian dishes and actually eat them. He can be contacted at girish1729@gmail.com.

Resources

SoX: https://sox.sourceforge.net

Audacity: https://audacity.sourceforge.net

Audio Physics: https://www.harmony-central.com

Audio Engineering: https://www.ee.washington.edu/conselec/CE/kuhn/audio/95x3.htm

HRTF: https://sound.media.mit.edu/KEMAR.html
