-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
11 changed files
with
8,895 additions
and
0 deletions.
There are no files selected for viewing
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
# Skipper Makefile | ||
|
||
CC := gcc | ||
|
||
utils := skipper tensor-gen bin2c | ||
|
||
all: $(utils) | ||
|
||
skipper: skipper.c biquad.c lzwlib.c skipper.h biquad.h lzwlib.h 4d-tensor.h | ||
$(CC) skipper.c biquad.c lzwlib.c -O3 -lm -o skipper | ||
|
||
tensor-gen: tensor-gen.c lzwlib.c skipper.h lzwlib.h | ||
$(CC) tensor-gen.c lzwlib.c -lm -o tensor-gen | ||
|
||
bin2c: bin2c.c | ||
$(CC) bin2c.c lzwlib.c -lm -o bin2c | ||
|
||
clean: | ||
rm -f skipper tensor-gen bin2c |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,131 @@ | ||
## SKIPPER | ||
|
||
Selective Audio Detection and Filter | ||
|
||
Copyright (c) 2024 David Bryant. | ||
|
||
All Rights Reserved. | ||
|
||
Distributed under the [BSD Software License](https://github.com/dbry/skipper/blob/main/LICENSE). | ||
|
||
## What is this then? | ||
|
||
**Skipper** is a simple machine-learning-trained audio filter that can | ||
differentiate between musical material and talking in audio streams | ||
and, optionally, filter out (i.e., skip) one or the other. | ||
|
||
I developed this because I enjoy listening to and archiving FM radio | ||
and Internet music program streams from local stations. These are great | ||
for discovering new music, learning about the local music scene, and listening | ||
to interviews with artists and others in the music community. I find that | ||
these programs provide a superior curation of music than any automated | ||
methods or "genre" streams I have listened to. One local station's | ||
catch phrase is "don't let the robots win", and I'm on board! | ||
|
||
The problem is that if I listen to these archived programs more than once | ||
the dialog starts to get repetitive and sometimes even irrelevent and | ||
outdated (e.g., upcoming concerts or long finished pledge drives). Also, | ||
there are times when the dialog is undesirable, such as when I'm using the | ||
stream as background music for intense exercise, or reading, or when I | ||
have guests over. | ||
|
||
For these situations I have even gone as far as editing particularly good | ||
archived streams and removing the dialog. Unfortunately this is somewhat | ||
time-consuming and not at all practical on a regular basis. However, when | ||
I did do this I noticed that I could fairly easily distinguish the music | ||
and dialog by just glancing at the waveforms, and have thought that this | ||
would be something that could be automated without too much difficulty | ||
(famous last words). | ||
|
||
Anyway, _that_ is what **Skipper** is. By default it simply acts as a filter, | ||
consuming raw PCM audio (stereo or mono, 16-bit) from `stdin` and writing it | ||
unchanged (except always stereo) to `stdout`. However, it will be detecting | ||
music/talk transitions and reporting those timestamps to `stderr`, and two | ||
options are provided for filtering based on that detection. | ||
|
||
Specifying `-t` will skip over the detected talk and pass only the music | ||
(with crossfades to smooth the transitions) and, conversely, `-m` will | ||
skip over the detected music and pass only the talking portions, which is | ||
handy to see how well (or poorly) **Skipper** is working, and might even | ||
be useful on its own. | ||
|
||
## Caveats | ||
|
||
So, while it's pretty trivial to distinguish _most_ music and talk in | ||
audio streams, it's basically impossible to be 100% accurate. Why? Well consider | ||
the situation where the DJ is talking during and over the music. Depending | ||
on the relative levels, and what portion of the time the talking is occurring, | ||
this can essentially make the talk detection impossible. Or consider a cappella | ||
singing (which can range from operatic singing to essentially talking), or music | ||
that includes people actually talking (my brother used to like putting preachers | ||
in his songs). And some music genres simply have a very similar temporal acoustic | ||
profile to talk, and **Skipper** gets easily confused. | ||
|
||
Despite this, I find the program useful enough, and I have provided options for | ||
adjusting the detection threshold if it's getting too much talk or skipping too | ||
much music. And I'm working on improving the algorithm by creating a larger | ||
training corpus and more analysis functionality, so improvements will come... | ||
|
||
## Building | ||
|
||
I have provided a Makefile that should build the program on Linux and similar | ||
setups. I'll provide a Windows executable as well. | ||
|
||
Note that the executable `skipper` is the only one required. The other | ||
executables `tensor-gen` and `bin2c` are used, along with the `-a` option | ||
of `skipper` for generating tensor files from training audio data. | ||
|
||
## Usage | ||
|
||
There are probably many ways to use **Skipper**, but I have been using it with | ||
[FFmpeg](https://www.ffmpeg.org/) as the source because it handles virtually every | ||
format and works well writing to pipes. The output of `FFmpeg` is piped directly to | ||
`skipper` and its output is then piped to an appropriate encoder like | ||
[lame](https://lame.sourceforge.io/): | ||
|
||
> ffmpeg -i sourcefile.ext -f s16le - | ./skipper -t | lame -r - music-only.mp3 | ||
Alternatively, it's also possible to pipe the output of `skipper` directly to | ||
[FFplay](https:www.ffmpeg.org/) for immediate playback. In this use case we use the | ||
`-k` option to add "keep-alive" crossfades during long skips so that the playback | ||
does not underrun. | ||
|
||
> ffmpeg -i sourcefile.ext -f s16le - | ./skipper -tk | ffplay - -f s16le -ch_layout stereo | ||
Currently **Skipper**'s functionality is only available as a command-line filter. | ||
I have plans to create a callable library as well to make it possible to more easily | ||
integrate into an existing application. | ||
|
||
## Help | ||
|
||
``` | ||
SKIPPER Selective Audio Detection and Filter Version 0.1 | ||
Copyright (c) 2024 David Bryant. All Rights Reserved. | ||
Usage: SKIPPER [-options] < SourceAudio.pcm > StereoOutput.pcm | ||
Operation: scan source audio (`stdin`) using tensor discrimination to filter | ||
output (`stdout`), skipping either music (-m) or talk (-t); or | ||
output raw scan analytics for use with TENSOR-GEN util (-a) | ||
Options: -a <file.bin> = output analysis results to specified file | ||
-c<n> = override default channel count of 2 | ||
-d <file.tensor> = specify alternate discrimination tensor file | ||
-k = keep-alive crossfading for long skips | ||
-l<n> = left output override (for debug, n = 1-4: | ||
= 1=mono, 2=filtered, 3=level, 4=tensor) | ||
-m[<n>] = skip over music, with optional threshold offset | ||
= (raise or lower music threshold +/- 99 points) | ||
-n = no audio output (skip everything) | ||
-p = pass all audio (no skipping, default) | ||
-q = no messaging except errors | ||
-r<n> = right output override (for debug, n = 1-4: | ||
= 1=mono, 2=filtered, 3=level, 4=tensor) | ||
-s<n> = override default sample rate of 44.1 kHz | ||
-t[<n>] = skip over talk, with optional threshold offset | ||
= (raise or lower talk threshold +/- 99 points) | ||
-v[<n>] = set verbosity + [rate in seconds] | ||
Web: Visit www.github.com/dbry/skipper for latest version and info | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
//////////////////////////////////////////////////////////////////////////// | ||
// **** BIN2C **** // | ||
// Binary to C-source converter // | ||
// Copyright (c) 2024 David Bryant. // | ||
// All Rights Reserved. // | ||
// Distributed under the BSD Software License (see license.txt) // | ||
//////////////////////////////////////////////////////////////////////////// | ||
|
||
#include <stdlib.h> | ||
#include <string.h> | ||
#include <stdio.h> | ||
|
||
#ifdef _WIN32 | ||
#include <fcntl.h> | ||
#endif | ||
|
||
#include "lzwlib.h" | ||
|
||
#define BYTES_PER_LINE 16 | ||
|
||
int main (int argc, char **argv) | ||
{ | ||
int num_bytes = 0, alloced_bytes = 0, ch; | ||
unsigned char *buffer = NULL; | ||
|
||
#ifdef _WIN32 | ||
setmode (fileno (stdin), O_BINARY); | ||
#endif | ||
|
||
while ((ch = getchar ()) != EOF) { | ||
if (num_bytes == alloced_bytes) | ||
buffer = realloc (buffer, alloced_bytes += 65536); | ||
|
||
buffer [num_bytes++] = ch; | ||
} | ||
|
||
printf ("static unsigned char %s [%d] = {\n", argc == 2 ? argv [1] : "array", num_bytes); | ||
|
||
for (int i = 0; i < num_bytes; i += BYTES_PER_LINE) { | ||
char string [256] = { 0 }; | ||
|
||
strcat (string, " "); | ||
for (int j = 0; i + j < num_bytes && j < BYTES_PER_LINE; ++j) { | ||
sprintf (string + strlen (string), "0x%02x", buffer [i+j]); | ||
if (i + j < num_bytes - 1) | ||
strcat (string, ","); | ||
if (i + j < num_bytes - 1 && j < BYTES_PER_LINE - 1) | ||
strcat (string, " "); | ||
} | ||
|
||
printf ("%s\n", string); | ||
} | ||
|
||
printf ("};\n"); | ||
free (buffer); | ||
return 0; | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,93 @@ | ||
//////////////////////////////////////////////////////////////////////////// | ||
// **** BIQUAD **** // | ||
// Simple Biquad Filter Library // | ||
// Copyright (c) 2021 - 2022 David Bryant. // | ||
// All Rights Reserved. // | ||
// Distributed under the BSD Software License (see license.txt) // | ||
//////////////////////////////////////////////////////////////////////////// | ||
|
||
// biquad.c | ||
|
||
#include "biquad.h" | ||
|
||
// Second-order Lowpass | ||
|
||
void biquad_lowpass (BiquadCoefficients *filter, double frequency) | ||
{ | ||
double Q = sqrt (0.5), K = tan (M_PI * frequency); | ||
double norm = 1.0 / (1.0 + K / Q + K * K); | ||
|
||
filter->a0 = K * K * norm; | ||
filter->a1 = 2 * filter->a0; | ||
filter->a2 = filter->a0; | ||
filter->b1 = 2.0 * (K * K - 1.0) * norm; | ||
filter->b2 = (1.0 - K / Q + K * K) * norm; | ||
} | ||
|
||
// Second-order Highpass | ||
|
||
void biquad_highpass (BiquadCoefficients *filter, double frequency) | ||
{ | ||
double Q = sqrt (0.5), K = tan (M_PI * frequency); | ||
double norm = 1.0 / (1.0 + K / Q + K * K); | ||
|
||
filter->a0 = norm; | ||
filter->a1 = -2.0 * norm; | ||
filter->a2 = filter->a0; | ||
filter->b1 = 2.0 * (K * K - 1.0) * norm; | ||
filter->b2 = (1.0 - K / Q + K * K) * norm; | ||
} | ||
|
||
// Initialize the specified biquad filter with the given parameters. Note that the "gain" parameter is supplied here | ||
// to save a multiply every time the filter in applied. | ||
|
||
void biquad_init (Biquad *f, const BiquadCoefficients *coeffs, float gain) | ||
{ | ||
f->coeffs = *coeffs; | ||
f->coeffs.a0 *= gain; | ||
f->coeffs.a1 *= gain; | ||
f->coeffs.a2 *= gain; | ||
f->in_d1 = f->in_d2 = 0.0F; | ||
f->out_d1 = f->out_d2 = 0.0F; | ||
f->first_order = (coeffs->a2 == 0.0F && coeffs->b2 == 0.0F); | ||
} | ||
|
||
// Apply the supplied sample to the specified biquad filter, which must have been initialized with biquad_init(). | ||
|
||
float biquad_apply_sample (Biquad *f, float input) | ||
{ | ||
float sum; | ||
|
||
if (f->first_order) | ||
sum = (input * f->coeffs.a0) + (f->in_d1 * f->coeffs.a1) - (f->coeffs.b1 * f->out_d1); | ||
else | ||
sum = (input * f->coeffs.a0) + (f->in_d1 * f->coeffs.a1) + (f->in_d2 * f->coeffs.a2) - (f->coeffs.b1 * f->out_d1) - (f->coeffs.b2 * f->out_d2); | ||
|
||
f->out_d2 = f->out_d1; | ||
f->out_d1 = sum; | ||
f->in_d2 = f->in_d1; | ||
f->in_d1 = input; | ||
return sum; | ||
} | ||
|
||
// Apply the supplied buffer to the specified biquad filter, which must have been initialized with biquad_init(). | ||
|
||
void biquad_apply_buffer (Biquad *f, float *buffer, int num_samples, int stride) | ||
{ | ||
if (f->first_order) while (num_samples--) { | ||
float sum = (*buffer * f->coeffs.a0) + (f->in_d1 * f->coeffs.a1) - (f->coeffs.b1 * f->out_d1); | ||
f->out_d2 = f->out_d1; | ||
f->in_d2 = f->in_d1; | ||
f->in_d1 = *buffer; | ||
*buffer = f->out_d1 = sum; | ||
buffer += stride; | ||
} | ||
else while (num_samples--) { | ||
float sum = (*buffer * f->coeffs.a0) + (f->in_d1 * f->coeffs.a1) + (f->in_d2 * f->coeffs.a2) - (f->coeffs.b1 * f->out_d1) - (f->coeffs.b2 * f->out_d2); | ||
f->out_d2 = f->out_d1; | ||
f->in_d2 = f->in_d1; | ||
f->in_d1 = *buffer; | ||
*buffer = f->out_d1 = sum; | ||
buffer += stride; | ||
} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
//////////////////////////////////////////////////////////////////////////// | ||
// **** BIQUAD **** // | ||
// Simple Biquad Filter Library // | ||
// Copyright (c) 2021 - 2022 David Bryant. // | ||
// All Rights Reserved. // | ||
// Distributed under the BSD Software License (see license.txt) // | ||
//////////////////////////////////////////////////////////////////////////// | ||
|
||
// biquad.h | ||
|
||
#include <stdlib.h> | ||
#include <stdint.h> | ||
#include <stdio.h> | ||
#include <math.h> | ||
|
||
typedef struct { | ||
float a0, a1, a2, b1, b2; | ||
} BiquadCoefficients; | ||
|
||
typedef struct { | ||
BiquadCoefficients coeffs; // coefficients | ||
float in_d1, in_d2; // delayed input | ||
float out_d1, out_d2; // delayed output | ||
int first_order; // optimization | ||
} Biquad; | ||
|
||
#ifdef __cplusplus | ||
extern "C" { | ||
#endif | ||
|
||
void biquad_init (Biquad *f, const BiquadCoefficients *coeffs, float gain); | ||
|
||
void biquad_lowpass (BiquadCoefficients *filter, double frequency); | ||
void biquad_highpass (BiquadCoefficients *filter, double frequency); | ||
|
||
void biquad_apply_buffer (Biquad *f, float *buffer, int num_samples, int stride); | ||
float biquad_apply_sample (Biquad *f, float input); | ||
|
||
#ifdef __cplusplus | ||
} | ||
#endif |
Oops, something went wrong.