Hybrid Neural Audio Effects

University of Oslo
SMC 2024

Abstract

Machine learning techniques are commonly employed for modeling audio devices, particularly analog audio effects. Conditioned models have also been proposed, in which a conditioning mechanism feeds the control parameters of the modeled device into the network to influence the sound alteration process. Neural networks have been shown to be capable of interpolating between conditioning parameter values. This paper further investigates the interpolation ability of neural audio effects. In particular, we introduce additional conditioning parameters that instruct the neural network to learn and predict different audio effects, using Feature-wise Linear Modulation and the Gated Linear Unit. The resulting model is a hybrid neural effect that, depending on the conditioning values, reproduces the audio-altering process of a specific audio effect or interpolates between them. We created hybrid audio effects from a preamp circuit, an optical compressor, and a tape recorder. The designed models learn the sound alteration processes of the individual effects and their combinations without producing audible artifacts, and the conditioning parameters let users navigate a continuous space in which each point represents a different hybrid audio effect.
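The abstract names two conditioning mechanisms, Feature-wise Linear Modulation (FiLM) and the Gated Linear Unit (GLU). As a rough illustration of how such conditioning can steer a network's hidden features, here is a minimal numpy sketch; the layer sizes, the one-hot effect selector, and the linear maps producing the FiLM parameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def film(x, gamma, beta):
    """Feature-wise Linear Modulation: scale and shift each feature
    channel with parameters derived from the conditioning vector."""
    return gamma * x + beta

def glu(x, w_a, w_b):
    """Gated Linear Unit: one linear projection of the features is
    gated by the sigmoid of a second projection, (x W_a) * sigmoid(x W_b)."""
    a = x @ w_a
    b = x @ w_b
    return a * (1.0 / (1.0 + np.exp(-b)))

# Hypothetical example: an 8-channel feature block over 4 time steps,
# conditioned by a 3-dimensional selector (e.g. compressor/tape/preamp).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
c = np.array([1.0, 0.0, 0.0])          # "fully effect C" conditioning point

# Assumed linear layers mapping the condition to per-channel FiLM parameters.
w_gamma = rng.standard_normal((3, 8))
w_beta = rng.standard_normal((3, 8))
y = film(x, c @ w_gamma, c @ w_beta)   # same shape as x, modulated per channel
```

Because `c` lives in a continuous space, intermediate values such as `[0.5, 0.25, 0.25]` modulate the features between the learned effects, which is the interpolation behavior the demos below illustrate.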

Audio Examples

STFT

Normalized spectrograms of 5-second excerpts from the test sets. Targets (top) against the C (left), T (middle), and P (right) models, and against the CT, TP, CP, and CTP models conditioned as C, as T, and as P, respectively.

Input (Compressor) Input (Tape Delay) Input (Preamplifier)
Targets (Compressor) Targets (Tape Delay) Targets (Preamplifier)
Prediction C model Prediction T model Prediction P model
Prediction CP model (as C) Prediction CT model (as T) Prediction TP model (as P)
Prediction CTP model (as C) Prediction CTP model (as T) Prediction CTP model (as P)

Waveforms and normalized spectrograms of 20-second excerpts from the test sets, produced by the CP (left), CT (middle), and TP (right) models when the interpolation parameter is set to 0.5.

Input
Prediction CT Prediction CP Prediction TP

Normalized spectrograms of a 20-second signal produced by the CP (left), CT (middle), and TP (right) models when the interpolation parameter is continuously modulated by a sine wave at 0.05 Hz.

Input
Prediction CT Prediction CP Prediction TP

Waveforms and normalized spectrograms of a 20-second signal produced by the CTP model when the conditioning parameters are [0.3, 0.3, 0.3] (top-left), [0.5, 0.25, 0.25] (top-right), [0.25, 0.5, 0.25] (bottom-left), and [0.25, 0.25, 0.5] (bottom-right).

Prediction

Waveforms and normalized spectrograms of a 20-second signal produced by the CTP model when the conditioning parameters are [0.0, 0.0, 0.0] (top-left), [1.0, 1.0, 1.0] (top-right), [0.5, 0.5, 0.5] (bottom-left), and [0.3, 0.8, 0.5] (bottom-right).

Predictions

Normalized spectrogram of a 20-second music signal produced by the CTP model when the three conditioning parameters are continuously modulated by sine waves at 0.05, 0.07, and 0.09 Hz, respectively.

Predictions

BibTeX

@inproceedings{simionato2024hybrid,
  title={Hybrid Neural Audio Effects},
  author={Simionato, Riccardo and Fasciani, Stefano},
  booktitle={Proceedings of the International Conference on Sound and Music Computing},
  year={2024}}