Why high samplerates do make sense
To begin with, let me point out that this article is about the production of music, not the delivery to the end-user. Selling files at 44100 or 48000 Hz is typically sufficient for that purpose. However, if you're a musician, sound designer, mix engineer or mastering technician, raising the samplerate might be a way to improve the general clarity of your sound productions.
One theory to rule them all?
The upper frequency limit of human hearing is typically somewhere between 14 and 18 kHz, depending on the person's age. However, people do tend to notice the lack of overtones slightly above this limit, and the upper limit of hearing is usually not a steep brickwall but rather a soft roll-off. So to please the vast majority of the population, the audio content should be fairly correct (free of unwanted artifacts) within the range of 20 to 20000 Hz.
Many tutorials have been written on the subject of the Nyquist theorem, which explains why the samplerate need only be twice the highest frequency you want to reproduce. This article is not an attempt to disprove that theorem. What the Nyquist theorem tells us is correct, and because of this, the general opinion among musicians, recording engineers and even mastering technicians is that a samplerate of 44.1 or 48 kHz is enough. But there's more to samplerates than just this theorem...
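To make the theorem concrete, here is a minimal Python sketch (the function name is my own, purely for illustration) showing where a pure tone ends up after sampling - tones below half the samplerate survive intact, tones above it fold back down:

```python
def alias_frequency(f, fs):
    """Frequency (Hz) at which a pure tone of f Hz appears after sampling at fs Hz."""
    f = f % fs                           # sampling cannot distinguish f from f mod fs
    return fs - f if f > fs / 2 else f   # anything above Nyquist folds back down

# At 48 kHz, a 20 kHz tone is represented faithfully:
print(alias_frequency(20000, 48000))   # 20000
# ...but a 30 kHz tone folds back to 18 kHz, right inside the audible range:
print(alias_frequency(30000, 48000))   # 18000
```

This folding is exactly why the anti-aliasing filters discussed below are unavoidable.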
When converting analog audio into digital (or when downsampling to a lower rate), we need to get rid of frequencies higher than 50% of the samplerate to avoid nasty-sounding aliasing artifacts. This requires some sort of lowpass filter, which will inevitably have some impact on the sound depending on the chosen filter implementation. The various side-effects (such as ringing, phase changes or pass-band ripple) gradually grow worse towards the Nyquist frequency, so the further away from the Nyquist frequency your audio content is, the better. Of course, this degradation is usually extremely small if it only happens once.
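One way to quantify how cramped that filter is: measure, in octaves, the room between the highest frequency you want to keep and the Nyquist frequency. This is a small sketch of my own, not anything from a specific converter's datasheet:

```python
import math

def transition_octaves(passband_hz, nyquist_hz):
    """Width, in octaves, available for an anti-alias filter's transition band."""
    return math.log2(nyquist_hz / passband_hz)

# Keeping 20 kHz intact at 44.1 kHz leaves barely a seventh of an octave:
print(round(transition_octaves(20000, 22050), 2))   # 0.14
# At 96 kHz the filter gets well over a full octave to roll off gently:
print(round(transition_octaves(20000, 48000), 2))   # 1.26
```

The steeper the filter has to be, the worse the ringing and phase side-effects tend to get, which is the core of the argument for extra headroom.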
Real life scenario
However, in a real-life scenario there are many more things to take into account. First of all, consider the large number of processing steps that any piece of sound is typically subjected to during music production. A typical signal path for a pop/rock vocal may easily involve a dozen or more steps.
If the project has been moved back and forth between different musicians and producers, you may end up with even more effects applied, depending on the actual workflow. Each connection between two steps may even represent a conversion between analog and digital, which means one steep lowpass filter is added per step - on top of the effects themselves.
Each effect itself may also be problematic. Software effect plug-ins are rarely 100% mathematically correct. They may contain optimizations which trade sound quality for lower CPU load on a typical computer at the time the plug-in was developed. Also, not all audio programmers write code that is entirely free of aliasing or other samplerate-dependent phenomena. Many plug-ins start losing a bit of precision in the upper one or two octaves below the Nyquist frequency. Drum machine plug-ins often sound noticeably different depending on whether you run them at 32 kHz, 44.1 kHz or 96 kHz - especially the hihats and snares.
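A classic source of such plug-in aliasing is nonlinear processing like clipping or saturation, which generates harmonics above the original tone. This hypothetical sketch (numbers chosen by me for illustration) shows where the odd harmonics of a hard-clipped tone land at two different samplerates:

```python
def folded_harmonics(fundamental, highest_order, fs):
    """Where the odd harmonics of a hard-clipped tone land after sampling at fs Hz."""
    landed = []
    for k in range(1, highest_order + 1, 2):   # hard clipping produces odd harmonics
        f = (k * fundamental) % fs
        landed.append(fs - f if f > fs / 2 else f)
    return landed

# Odd harmonics of a clipped 7 kHz tone at 44.1 kHz - the 5th folds to 9.1 kHz,
# an inharmonic tone sitting right in the audible range:
print(folded_harmonics(7000, 5, 44100))   # [7000, 21000, 9100]
# At 96 kHz the same harmonics stay put, above hearing, where a filter can remove them:
print(folded_harmonics(7000, 5, 96000))   # [7000, 21000, 35000]
```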
So let's say the incoming signal starts out unspoiled, with clean content all the way up to the limit of hearing.
After a few generations of processing, the imprecision and degradation caused by aliasing, filtering and the like will make the treble either slightly harsher, more attenuated, or both. After one or just a few generations, the loss of quality is typically too small for anyone to notice - which is why most comparisons of different samplerates do not reveal any problems.
But once you reach 10, 15 or even 20 steps of processing, the loss has now accumulated over so many generations that the audio starts to suffer audibly.
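The accumulation is easy to put rough numbers on. In this sketch of my own, a single, very gentle one-pole lowpass (the coefficient is chosen by me purely for illustration) costs a fraction of a dB at 18 kHz - inaudible on its own - but twenty generations of it add up to a clearly audible dip:

```python
import math

def onepole_mag_db(f, fs, a, passes):
    """Magnitude, in dB, at frequency f of y[n] = (1-a)*x[n] + a*y[n-1], applied `passes` times."""
    w = 2 * math.pi * f / fs
    mag = (1 - a) / math.sqrt(1 - 2 * a * math.cos(w) + a * a)
    return 20 * math.log10(mag ** passes)

# One pass of a gentle lowpass: about a third of a dB at 18 kHz, nobody notices:
print(round(onepole_mag_db(18000, 44100, 0.02, 1), 2))    # -0.32
# Twenty generations of the same processing: over 6 dB down, clearly audible:
print(round(onepole_mag_db(18000, 44100, 0.02, 20), 2))   # -6.41
```

Real effect chains are not twenty copies of the same filter, of course, but the principle - tiny per-stage losses multiplying into an audible one - is the same.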
The solution proposed here is to raise the samplerate of the entire production process, end-to-end. Because CPU power and disk space are not unlimited resources, going to something like 192 kHz may be too troublesome at present, but a fair compromise is 96 kHz, which still moves the damaged area one octave up, and thus further out of the human hearing range.
In order to try this out properly, it is imperative to get rid of all frequency-wise bottlenecks, because this is all about leaving some headroom in the frequency domain. All recordings and samples should be 96 kHz or higher. The audio interface used should be capable of doing 96 kHz input and output (and most are), and the master samplerate of the DAW should be set to 96000 Hz.
Software plug-ins may need to be adjusted, because compressor/gate/limiter timings may become twice as fast in some cases, and poorly written synthesizers may play out of tune. All of this can usually be compensated for, and the result is worth it.
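The timing problem comes from plug-ins that bake the samplerate into a smoothing coefficient. This hypothetical sketch (function names are mine) shows the standard one-pole envelope coefficient, and what happens when a coefficient tuned at 48 kHz is reused blindly at 96 kHz:

```python
import math

def envelope_coeff(time_ms, fs):
    """One-pole smoothing coefficient for a given attack/release time at samplerate fs."""
    return math.exp(-1.0 / (time_ms * 0.001 * fs))

def effective_time_ms(coeff, fs):
    """Attack/release time that a fixed coefficient actually produces at samplerate fs."""
    return -1.0 / (math.log(coeff) * fs) * 1000.0

# A coefficient tuned for a 10 ms attack at 48 kHz...
c = envelope_coeff(10.0, 48000)
# ...behaves like a 5 ms attack if the plug-in reuses it unchanged at 96 kHz:
print(round(effective_time_ms(c, 96000), 1))   # 5.0
```

A well-written plug-in recomputes the coefficient from the host samplerate, which is exactly the compensation mentioned above.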
And don't worry if the final product is supposed to be an mp3 file or a CD. Downsampling the result to 44.1 kHz simply discards the part of the sound that was fairly damaged and too high in frequency to be heard anyway. The improved quality will shine through.
If you're a musician who likes experimenting with playing things at other speeds, you'll notice how samples made at 96 kHz can be played at half speed and still have clear treble. This is also quite useful to people making special effects for movies and games.
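The half-speed case is simple arithmetic - playing a recording at half speed shifts everything down one octave, including its upper frequency limit. A small sketch of my own:

```python
def top_frequency_after_slowdown(fs, speed):
    """Highest frequency (Hz) left in a recording after varispeed playback."""
    return (fs / 2) * speed

# A 44.1 kHz sample at half speed tops out around 11 kHz - noticeably dull treble:
print(top_frequency_after_slowdown(44100, 0.5))   # 11025.0
# A 96 kHz sample at half speed still reaches 24 kHz - the treble stays clear:
print(top_frequency_after_slowdown(96000, 0.5))   # 24000.0
```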
Website by Joachim Michaelis