Voice Activity Detection in Python and SWIG

The WebRTC codebase contains a very solid voice activity detection (VAD) algorithm. The project itself is a treasure trove of well-engineered solutions to common problems in speech, audio, and video streaming and encoding.

Recently, I needed a reliable VAD I could use from Python. I had written one myself in college (and, to be fair, it was a bit shit).

In a few hours I was able to isolate the source code from the WebRTC project and write a Python wrapper for it in SWIG.

A working VAD for Linux in Python on x86_64 is available in this repo.

The WebRTC VAD components are in this repo.

Some SWIG tips:

  • C functions that operate on arrays typically have a signature like this: int funcName(int *input_array, size_t array_size); NumPy ships with a fantastic set of typemaps (defined in numpy.i) for just this sort of thing. Drop numpy.i into your directory and include it in your SWIG interface file.
  • A lot of typemaps aren’t defined in numpy.i, so do not hesitate to write a small header of your own. For instance, numpy.i doesn’t have a typemap involving a const int *; a small wrapper around your desired function call works perfectly and lets you use the existing typemaps.

Robust Principal Component Pursuit - Background Matrix Recovery

I recently spent some time working on a simple linear algebra problem: decompose a matrix $ M $ into the sum of a low-rank component $ L $ and a sparse component $ S $. The algorithm I used was trivial to implement (and to parallelize using map-reduce).
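
For reference, the usual convex formulation of this decomposition, known as principal component pursuit, is sketched below. Treat it as an assumption about the objective rather than a statement of exactly what the post uses; the weight $ \lambda $ shown is the commonly recommended default for an $ n_1 \times n_2 $ matrix, not a value given here.

$$ \min_{L, S} \; \|L\|_* + \lambda \|S\|_1 \quad \text{subject to} \quad L + S = M, \qquad \lambda = \frac{1}{\sqrt{\max(n_1, n_2)}} $$

Here $ \|L\|_* $ is the nuclear norm (the sum of the singular values of $ L $), which favors low rank, and $ \|S\|_1 $ is the sum of the absolute values of the entries of $ S $, which favors sparsity.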

In this post, I will implement this very simple algorithm, explain the objective function and demonstrate its (amazing) effectiveness on a surveillance-camera dataset.
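
To make the decomposition concrete, here is a minimal NumPy sketch of one standard way to compute it: alternating singular-value thresholding (for $ L $) and entrywise soft thresholding (for $ S $) with a Lagrange multiplier update. This is an assumption on my part about a reasonable solver, not necessarily the algorithm the post goes on to use. The function names (`shrink`, `svd_threshold`, `pcp`), the default values of `lam` and `mu`, and the synthetic test matrix are all illustrative choices.

```python
import numpy as np


def shrink(X, tau):
    """Entrywise soft thresholding: pull every entry toward zero by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Soft-threshold the singular values of X (promotes low rank)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt


def pcp(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split M into low-rank L and sparse S with L + S approximately M.

    Assumed defaults (common heuristics, not taken from the post):
    lam = 1 / sqrt(max(n1, n2)) and mu = n1 * n2 / (4 * sum(|M|)).
    M is assumed to be a non-zero 2-D array.
    """
    M = np.asarray(M, dtype=float)
    n1, n2 = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n1, n2))
    if mu is None:
        mu = n1 * n2 / (4.0 * np.abs(M).sum())

    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multiplier for the constraint L + S = M
    norm_M = np.linalg.norm(M)

    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)  # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)         # sparse update
        residual = M - L - S
        Y = Y + mu * residual                        # multiplier update
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S


# Synthetic check: a rank-2 matrix plus roughly 5% large sparse corruptions.
rng = np.random.default_rng(0)
L0 = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 50))
S0 = np.zeros((50, 50))
mask = rng.random((50, 50)) < 0.05
S0[mask] = rng.uniform(-10.0, 10.0, size=mask.sum())

L_hat, S_hat = pcp(L0 + S0)
print("relative error in L:", np.linalg.norm(L_hat - L0) / np.linalg.norm(L0))
```

For background recovery, each column of $ M $ would typically hold one vectorized video frame, so that $ L $ picks up the static background and $ S $ the moving foreground. Each iteration needs a full SVD, which is fine for a sketch but is the first thing to revisit for large matrices.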

(c) Shriphani Palakodety 2013-2020