Emulating Optical Compressors with Selective State Space Models

University of Oslo

Abstract

This paper presents a method for modeling optical dynamic range compressors using deep neural networks with Selective State Space models. The proposed approach surpasses previous methods based on recurrent layers by employing a Selective State Space block to encode the input audio. It features a refined technique that integrates Feature-wise Linear Modulation and Gated Linear Units to adjust the network dynamically, conditioning the compression’s attack and release phases on external parameters. The proposed architecture is well suited to low-latency and real-time applications, which are crucial in live audio processing. The method has been validated on the analog optical compressors TubeTech CL 1B and Teletronix LA-2A, which possess distinct characteristics. Evaluation is performed using quantitative metrics and subjective listening tests, comparing the proposed method with other state-of-the-art models. Results show that our black-box modeling methods outperform all others, achieving accurate emulation of the compression process for settings both seen and unseen during training. We further show a correlation between this accuracy and the sampling density of the control parameters in the dataset, and identify settings with fast attack and slow release as the most challenging to emulate.
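As a rough illustration of the conditioning idea mentioned in the abstract, the sketch below applies Feature-wise Linear Modulation (a per-channel scale and shift derived from the control parameters) followed by a Gated Linear Unit. It is a minimal NumPy sketch under stated assumptions, not the architecture used in the paper: the function name `film_glu`, the layer sizes, the random weights, and the two normalized control values are all hypothetical.

```python
import numpy as np


def film_glu(features, params, w_film, b_film, w_glu, b_glu):
    """Condition `features` (frames, channels) on control `params` (n_params,).

    Hypothetical helper: FiLM followed by a GLU, written for illustration only.
    """
    channels = features.shape[-1]
    # FiLM: project the control parameters to a per-channel scale and shift.
    gamma_beta = params @ w_film + b_film           # (2 * channels,)
    gamma, beta = gamma_beta[:channels], gamma_beta[channels:]
    modulated = gamma * features + beta             # (frames, channels)
    # GLU: a linear layer yields values and gates; a sigmoid gates the values.
    projected = modulated @ w_glu + b_glu           # (frames, 2 * channels)
    values, gates = projected[:, :channels], projected[:, channels:]
    return values * (1.0 / (1.0 + np.exp(-gates)))


rng = np.random.default_rng(0)
frames, channels, n_params = 4, 8, 2                # assumed sizes
features = rng.standard_normal((frames, channels))  # stand-in for encoded audio
params = np.array([0.5, 0.25])                      # assumed normalized controls
w_film = rng.standard_normal((n_params, 2 * channels)) * 0.1
b_film = np.zeros(2 * channels)
w_glu = rng.standard_normal((channels, 2 * channels)) * 0.1
b_glu = np.zeros(2 * channels)

out = film_glu(features, params, w_film, b_film, w_glu, b_glu)
print(out.shape)
```

In a trained model the projection weights would be learned, so that changing the control values reshapes the features that drive the attack and release behaviour.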

Teletronix LA-2A Audio Examples

Peak Reduction 20, Limit Mode.

sf rms
Target S6
S4D ED
LSTM ED-b
LSTM-b TCN-b

Peak Reduction 40, Compression Mode.

sf rms
Target S6
S4D ED
LSTM ED-b
LSTM-b TCN-b

TubeTech CL 1B Audio Examples

Threshold: -10 dBu, Ratio: 6:1, Attack time: 250 ms, Release time: 5 s.


sf rms
Target S6
S4D ED
LSTM ED-b
LSTM-b TCN-b

Threshold: -20 dBu, Ratio: 2:1, Attack time: 250 ms, Release time: 5 s.


sf rms
Target S6
S4D ED
LSTM ED-b
LSTM-b TCN-b

Teletronix LA-2A Audio Examples (Extreme Settings)

Peak Reduction 100, Compression Mode.

sf rms
Target S6
S4D ED
LSTM ED-b
LSTM-b TCN-b

Peak Reduction 50, Compression Mode.

sf rms
Target S6
S4D ED
LSTM ED-b
LSTM-b TCN-b

Peak Reduction 100, Limit Mode.

sf rms
Target S6
S4D ED
LSTM ED-b
LSTM-b TCN-b

TubeTech CL 1B Audio Examples (Extreme Settings)

Threshold: -30 dBu, Ratio: 6:1, Attack time: 250 ms, Release time: 0.005 s.

sf rms
Target S6
S4D ED
LSTM ED-b
LSTM-b TCN-b

Threshold: -20 dBu, Ratio: 10:1, Attack time: 250 ms, Release time: 0.005 s.

sf rms
Target S6
S4D ED
LSTM ED-b
LSTM-b TCN-b

Threshold: -30 dBu, Ratio: 10:1, Attack time: 500 ms, Release time: 5 s.

sf rms
Target S6
S4D ED
LSTM ED-b
LSTM-b TCN-b

Impact of dataset size on unseen parameter combinations. The examples refer to the S6 model.

UAD LA-2A.

Peak Reduction: 55, Compression Mode.

sf rms
Input Target
S6 (Small) S6 (Medium) S6 (Large)

Softube CL 1B (unseen Threshold and Ratio)

Threshold: -10, Ratio: 4:1, Attack time: 70 ms, Release time: 1 s.

sf rms
Input Target
S6 (Small) S6 (Medium) S6 (Large)

Softube CL 1B (unseen Attack and Release times).

Threshold: -20, Ratio: 6:1, Attack time: 75 ms, Release time: 1.25 s.

sf rms
Input Target
S6 (Small) S6 (Medium) S6 (Large)
