Sometimes you need to get the duration of an audio file located somewhere on the network. Most programming languages have libraries, and there are command-line tools, that can do this for a locally available MP3 file. However, downloading the full MP3 file might not be desirable. The problem is that the MP3 format doesn’t allow for quick and easy extraction of the duration.
To get a 100% accurate duration, you do need to read (or rather skim) through the whole file. However, it is possible to get a fairly good estimate of an MP3 file’s duration by fetching only 14 bytes of data. The estimate is accurate for most MP3 files out there. It is especially useful if you know for sure there will be no “weird” files to handle (e.g. you are getting durations from a library that you control).
An MP3 file is a sequence of “frames”. A frame is a piece of audio data. Each frame consists of a header, which describes how to decode the audio, and the actual compressed data.
It is possible to read the header and calculate the duration of a single audio frame with 100% precision. For that we only need:
- bitrate — the number of bits per second of audio
- channel mode (stereo/mono) — whether the data contains 1 or 2 audio channels
If we know the total size of the MP3 file, we can estimate the duration on the assumption that every frame in the file has the same bitrate and channel mode. The catch is that they might not. The channel mode is of little concern: it is unlikely for a single file to alternate between stereo and mono (although I have stumbled upon files like that). Variable bitrate (VBR), on the other hand, is not unheard of.
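To make the arithmetic concrete, here is a minimal Python sketch of the constant-bitrate estimate. The function name and the example numbers are mine, chosen only for illustration. As far as I can tell, only the bitrate enters this particular formula, because the bitrate stored in the header covers all channels together.

```python
def estimate_duration_seconds(audio_bytes: int, bitrate_kbps: int) -> float:
    """Estimate playback time, assuming every frame uses the same bitrate."""
    bits = audio_bytes * 8
    return bits / (bitrate_kbps * 1000)


# A 4,800,000-byte file at a constant 128 kbps:
print(estimate_duration_seconds(4_800_000, 128))  # 300.0
```

For a VBR file the same call still returns a number, but it reflects only the bitrate of the one frame that was inspected.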
That’s why I call this method “estimating the duration”, not “calculating”. The good news, though, is that we only need to read the header, which is only 4 bytes long. The structure of the header is described in detail in this article. Again, we are only interested in bitrate and channel mode.
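As an illustration, the two fields can be pulled out of the 4-byte header with a few shifts and masks. This is a sketch under assumptions: it handles Layer III only, the bit positions and bitrate tables follow the usual MPEG audio header layout, and `parse_frame_header` is a name I made up for this example.

```python
import struct

# Layer III bitrates in kbps, indexed by the 4-bit bitrate field.
# Index 0 ("free" format) and index 15 (invalid) are not supported here.
BITRATES_V1_L3 = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
BITRATES_V2_L3 = [None, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]


def parse_frame_header(header: bytes) -> tuple[int, str]:
    """Return (bitrate_kbps, channel_mode) from a 4-byte Layer III frame header."""
    if len(header) != 4:
        raise ValueError("a frame header is exactly 4 bytes")
    (word,) = struct.unpack(">I", header)
    if word >> 21 != 0x7FF:                # 11 sync bits, all set
        raise ValueError("frame sync not found")
    version = (word >> 19) & 0b11          # 0b11 = MPEG-1, 0b10 = MPEG-2, 0b00 = MPEG-2.5
    layer = (word >> 17) & 0b11            # 0b01 = Layer III
    if version == 0b01 or layer != 0b01:
        raise ValueError("not an MPEG Layer III header")
    table = BITRATES_V1_L3 if version == 0b11 else BITRATES_V2_L3
    bitrate = table[(word >> 12) & 0b1111]
    if bitrate is None:
        raise ValueError("free-format or invalid bitrate")
    return bitrate, CHANNEL_MODES[(word >> 6) & 0b11]


print(parse_frame_header(bytes.fromhex("fffb9000")))  # (128, 'stereo')
```

The header also carries the sampling rate, padding and other flags, which this sketch deliberately ignores.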
Another thing that we might want to account for is the ID3 version 2 tag. We can ignore the version 1 tag because it has a fixed length of only 128 bytes. Whether it is present or not, it won’t critically affect our estimate. Version 2 tags, however, are designed to be highly extensible and may or may not contain a substantial amount of data. Notably, they can hold the album artwork image, which might add significantly to the size of the MP3 file, thus distorting our results.
ID3 tag version 2 is always located at the beginning of the file. The full specification can be found here. We only need to read the header (10 bytes).
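Here is a rough sketch of how those 10 bytes can be turned into the size of the tag. It assumes the usual ID3v2 header layout: the marker "ID3", two version bytes, one flags byte, and a 4-byte "synchsafe" size that uses only the low 7 bits of each byte. The function name `id3v2_total_size` is hypothetical.

```python
def id3v2_total_size(header: bytes) -> int:
    """Return the bytes occupied by a leading ID3v2 tag, or 0 if none is present."""
    if len(header) < 10 or header[:3] != b"ID3":
        return 0
    flags = header[5]
    size_bytes = header[6:10]
    if any(b & 0x80 for b in size_bytes):
        raise ValueError("size is not a valid synchsafe integer")
    # Synchsafe integer: each byte contributes its low 7 bits.
    body = 0
    for b in size_bytes:
        body = (body << 7) | b
    footer = 10 if flags & 0x10 else 0  # ID3v2.4 footer flag adds 10 more bytes
    return 10 + body + footer           # the stored size excludes the header itself


print(id3v2_total_size(b"ID3\x03\x00\x00\x00\x00\x02\x01"))  # 267
```

The returned value is the offset at which the first audio frame is expected to start.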
Estimation Algorithm Summarized
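Putting the pieces described above together, the whole estimate can be sketched in Python roughly as follows. This is an illustration under assumptions, not a definitive implementation: `read_range` is a hypothetical callable (for example, one backed by HTTP Range requests), the total size is assumed to be known already (for example, from a Content-Length header), and only MPEG-1 Layer III headers are recognized to keep the example short.

```python
from typing import Callable

# MPEG-1 Layer III bitrates (kbps) by 4-bit index; index 0 and 15 are unsupported.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]


def estimate_mp3_duration(read_range: Callable[[int, int], bytes], total_size: int) -> float:
    """Estimate duration in seconds from 14 bytes plus the total file size.

    read_range(offset, length) is a hypothetical callable that returns
    `length` bytes of the file starting at `offset`.
    """
    # Step 1: read the 10-byte ID3v2 header, if any, to find where audio starts.
    id3 = read_range(0, 10)
    audio_start = 0
    if id3[:3] == b"ID3":
        body = 0
        for b in id3[6:10]:              # 4-byte synchsafe size, 7 bits per byte
            body = (body << 7) | (b & 0x7F)
        footer = 10 if id3[5] & 0x10 else 0
        audio_start = 10 + body + footer

    # Step 2: read the 4-byte header of the first audio frame.
    frame = read_range(audio_start, 4)
    if len(frame) != 4 or frame[0] != 0xFF or frame[1] & 0xFE != 0xFA:
        raise ValueError("expected an MPEG-1 Layer III frame header")
    bitrate = BITRATES_KBPS[frame[2] >> 4]
    if bitrate is None:
        raise ValueError("free-format or invalid bitrate")

    # Step 3: assume that bitrate holds for the rest of the file.
    return (total_size - audio_start) * 8 / (bitrate * 1000)


# In-memory stand-in for a remote file: a 267-byte tag, then 160,000 bytes at 128 kbps.
data = b"ID3\x03\x00\x00\x00\x00\x02\x01" + bytes(257) + bytes.fromhex("fffb9000") + bytes(159_996)
print(estimate_mp3_duration(lambda off, n: data[off:off + n], len(data)))  # 10.0
```

Passing the reader in as a callable keeps the estimate independent of any particular HTTP client, and makes it easy to try against local bytes first.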
It is also worth mentioning that ID3 version 2 has a special tag for storing the duration (TLEN). Although in my experience it is rarely used, it tends to be present in VBR files. One idea for improving the algorithm is to add another step that checks the ID3 data for TLEN.
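A rough sketch of what such a step could look like is below. Unlike the 14-byte estimate, it needs the body of the ID3 tag to be downloaded. It assumes the ID3v2.3/2.4 layout, where each field has a 10-byte header (4-byte ID, 4-byte size, 2 flag bytes) and text fields start with an encoding byte. It ignores extended headers, unsynchronisation and compressed fields. The name `find_tlen_ms` is hypothetical.

```python
from typing import Optional

# Text encodings selected by the first content byte of an ID3v2 text field.
TEXT_ENCODINGS = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}


def find_tlen_ms(tag_body: bytes, major_version: int) -> Optional[int]:
    """Return the TLEN value (milliseconds) from an ID3v2.3/2.4 tag body, or None."""
    pos = 0
    while pos + 10 <= len(tag_body):
        field_id = tag_body[pos:pos + 4]
        if field_id == b"\x00\x00\x00\x00":
            break                                  # padding: no more fields
        raw_size = tag_body[pos + 4:pos + 8]
        if major_version == 4:                     # v2.4 sizes are synchsafe
            size = 0
            for b in raw_size:
                size = (size << 7) | (b & 0x7F)
        else:                                      # v2.3 sizes are plain big-endian
            size = int.from_bytes(raw_size, "big")
        content = tag_body[pos + 10:pos + 10 + size]
        if field_id == b"TLEN" and len(content) > 1:
            encoding = TEXT_ENCODINGS.get(content[0])
            if encoding is None:
                return None
            try:
                return int(content[1:].decode(encoding).strip("\x00 "))
            except ValueError:                     # undecodable or non-numeric text
                return None
        pos += 10 + size
    return None


body = (b"TIT2" + (3).to_bytes(4, "big") + b"\x00\x00" + b"\x00Hi"
        + b"TLEN" + (7).to_bytes(4, "big") + b"\x00\x00" + b"\x00215000")
print(find_tlen_ms(body, 3))  # 215000
```

When the function returns None, the bitrate-based estimate remains the fallback.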
Originally published at https://www.factorialcomplexity.com.