CamillaDSP - Cross-platform IIR and FIR engine for crossovers, room correction etc.

@HenrikEnquist I can't reproduce the volume bump anymore, so it might be fixed in alpha 2. I connect my XMOS USB interface to my MOTU mk5 via Toslink, set the clock of the MOTU mk5 to external (so its clock follows the upstream XMOS interface), and connect the MOTU mk5 to the Pi running CDSP with a USB cable. When I use the XMOS ASIO driver on Windows to play music (e.g. with the ASIO plugin in foobar), there are lots of buffer underrun warnings in CDSP's log, especially if I stop the music and replay it after a while. By directplay, I mean playing music without the ASIO driver (in a browser), so basically Windows directstream. I can't notice any actual underruns (clicks or popping sounds), though. Chunksize is 1024 at 48kHz. I also tried 2048 for chunksize; it makes no difference. Underruns still show up in the log from time to time, and I still can't catch any clicks or pops. So I guess the XMOS ASIO driver cuts itself off when no music is playing.
 
@HenrikEnquist I'm seeing much higher CPU usage on alpha 3 than v1.0.3. Is this expected?

On v1.0.3 I get total CPU usage for the CamillaDSP process of 22% with the top thread using 12%. On alpha 3 I get total CPU usage of 95% with the top thread using 85% - I assume that's the processing thread.

My use case is that I'm upsampling to 352kHz before the audio gets to CamillaDSP, then using CamillaDSP for headphone crossfeed and PEQ, so the pipeline consists almost entirely of a series of biquad filters. The CPU usage drops significantly if I reduce the number of biquads, so it seems that for some reason biquads are very CPU heavy in alpha 3. I'm running this on an Intel N3700 quad-core Atom CPU.
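For readers unfamiliar with what "a series of biquad filters" costs per sample, here is a minimal sketch of a biquad chain. It is not CamillaDSP's code: the coefficient formula is the widely used RBJ "Audio EQ Cookbook" peaking filter, the structure is transposed direct form II, and the function names are made up for illustration. Each section does a handful of multiply-adds per sample, so the work scales with sample rate × channels × number of sections.

```python
import cmath
import math


def peaking_coeffs(fs, freq, gain_db, q):
    """RBJ-cookbook peaking EQ coefficients, normalised so that a0 == 1."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * freq / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha / amp
    b = ((1 + alpha * amp) / a0, -2 * math.cos(w0) / a0, (1 - alpha * amp) / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha / amp) / a0)
    return b, a


def biquad_chain(samples, sections):
    """Run samples through biquad sections in series (transposed direct form II)."""
    out = list(samples)
    for (b0, b1, b2), (a1, a2) in sections:
        s1 = s2 = 0.0  # per-section state
        for n, x in enumerate(out):
            y = b0 * x + s1
            s1 = b1 * x - a1 * y + s2
            s2 = b2 * x - a2 * y
            out[n] = y
    return out


def gain_db_at(section, fs, freq):
    """Magnitude response of one section at freq, in dB."""
    (b0, b1, b2), (a1, a2) = section
    z = cmath.exp(-2j * math.pi * freq / fs)  # z^-1 on the unit circle
    h = (b0 + b1 * z + b2 * z * z) / (1 + a1 * z + a2 * z * z)
    return 20 * math.log10(abs(h))
```

As a sanity check, `gain_db_at(peaking_coeffs(96000, 43, -9.5, 0.25), 96000, 43)` should come out at the configured -9.5 dB, which matches the `dt770_oratory1990_1` entry in the config posted in this thread.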

Thank you for your great work!
 
It looks like the file didn't attach properly. Here is the full config:

devices:
  samplerate: 96000
  chunksize: 2048
  queuelimit: 1
  capture:
    type: Stdin
    channels: 2
    format: S32LE
  playback:
    type: Alsa
    channels: 2
    device: "soundout"
    format: S32LE

mixers:
  2to4cross:
    channels:
      in: 2
      out: 4
    mapping:
      - dest: 0
        mute: false
        sources:
          - channel: 0
            gain: 0
            inverted: false
            mute: false
      - dest: 1
        mute: false
        sources:
          - channel: 0
            gain: 0
            inverted: false
            mute: false
      - dest: 2
        mute: false
        sources:
          - channel: 1
            gain: 0
            inverted: false
            mute: false
      - dest: 3
        mute: false
        sources:
          - channel: 1
            gain: 0
            inverted: false
            mute: false

  4to2cross:
    channels:
      in: 4
      out: 2
    mapping:
      - dest: 0
        mute: false
        sources:
          - channel: 0
            gain: 0
            inverted: false
            mute: false
          - channel: 2
            gain: 0
            inverted: false
            mute: false
      - dest: 1
        mute: false
        sources:
          - channel: 1
            gain: 0
            inverted: false
            mute: false
          - channel: 3
            gain: 0
            inverted: false
            mute: false

filters:
  preamp_gain:
    type: Gain
    parameters:
      gain: -1
      inverted: false
      mute: false

  cx1_hi:
    parameters:
      freq: 954.40
      gain: -0.75
      type: Lowshelf
      q: 0.5
    type: Biquad
  cx1_lo:
    parameters:
      freq: 650
      type: LowpassFO
    type: Biquad
  cx1_lo_gain:
    type: Gain
    parameters:
      gain: -14.25
      inverted: false

  cx2_hi:
    parameters:
      freq: 824.70
      gain: -1.4
      type: Lowshelf
      q: 0.5
    type: Biquad
  cx2_lo:
    parameters:
      freq: 650
      type: LowpassFO
    type: Biquad
  cx2_lo_gain:
    type: Gain
    parameters:
      gain: -10.92
      inverted: false

  cx3_hi:
    parameters:
      freq: 868.97
      gain: -2
      type: Lowshelf
      q: 0.5
    type: Biquad
  cx3_lo:
    parameters:
      freq: 700
      type: LowpassFO
    type: Biquad
  cx3_lo_gain:
    type: Gain
    parameters:
      gain: -8
      inverted: false

  cx4_hi:
    parameters:
      freq: 873.89
      gain: -2.25
      type: Lowshelf
      q: 0.5
    type: Biquad
  cx4_lo:
    parameters:
      freq: 700
      type: LowpassFO
    type: Biquad
  cx4_lo_gain:
    type: Gain
    parameters:
      gain: -6.75
      inverted: false

  cx5_hi:
    parameters:
      freq: 884.29
      gain: -2.5
      type: Lowshelf
      q: 0.5
    type: Biquad
  cx5_lo:
    parameters:
      freq: 700
      type: LowpassFO
    type: Biquad
  cx5_lo_gain:
    type: Gain
    parameters:
      gain: -5.5
      inverted: false

  dt770_oratory1990_1:
    parameters:
      freq: 43
      gain: -9.5
      q: 0.25
      type: Peaking
    type: Biquad
  dt770_oratory1990_2:
    parameters:
      freq: 90
      gain: 2
      q: 1.4
      type: Peaking
    type: Biquad
  dt770_oratory1990_3:
    parameters:
      freq: 105
      gain: 5.5
      q: 0.71
      type: Lowshelf
    type: Biquad
  dt770_oratory1990_4:
    parameters:
      freq: 210
      gain: 6
      q: 1.3
      type: Peaking
    type: Biquad
  dt770_oratory1990_5:
    parameters:
      freq: 2550
      gain: -0.8
      q: 2
      type: Peaking
    type: Biquad
  dt770_oratory1990_6:
    parameters:
      freq: 3800
      gain: 3
      q: 1
      type: Peaking
    type: Biquad
  dt770_oratory1990_7:
    parameters:
      freq: 5100
      gain: -1.4
      q: 3
      type: Peaking
    type: Biquad
  dt770_oratory1990_8:
    parameters:
      freq: 6450
      gain: -2
      q: 4
      type: Peaking
    type: Biquad
  dt770_oratory1990_9:
    parameters:
      freq: 8300
      gain: -1.9
      q: 4
      type: Peaking
    type: Biquad
  dt770_oratory1990_10:
    parameters:
      freq: 10000
      gain: -2
      q: 0.71
      type: Highshelf
    type: Biquad

pipeline:
  - type: Filter
    channel: 0
    names:
      - preamp_gain
  - type: Filter
    channel: 1
    names:
      - preamp_gain
  - type: Mixer
    name: 2to4cross
  - channel: 0
    names:
      - cx3_hi
    type: Filter
  - channel: 1
    names:
      - cx3_lo
      - cx3_lo_gain
    type: Filter
  - channel: 2
    names:
      - cx3_lo
      - cx3_lo_gain
    type: Filter
  - channel: 3
    names:
      - cx3_hi
    type: Filter
  - type: Mixer
    name: 4to2cross
  - type: Filter
    channel: 0
    names:
      - dt770_oratory1990_1
      - dt770_oratory1990_2
      - dt770_oratory1990_3
      - dt770_oratory1990_4
      - dt770_oratory1990_5
      - dt770_oratory1990_6
      - dt770_oratory1990_7
      - dt770_oratory1990_8
      - dt770_oratory1990_9
      - dt770_oratory1990_10
  - type: Filter
    channel: 1
    names:
      - dt770_oratory1990_1
      - dt770_oratory1990_2
      - dt770_oratory1990_3
      - dt770_oratory1990_4
      - dt770_oratory1990_5
      - dt770_oratory1990_6
      - dt770_oratory1990_7
      - dt770_oratory1990_8
      - dt770_oratory1990_9
      - dt770_oratory1990_10
 
If that is the case then it would explain the difference. @spfenwick are you running the binaries from GitHub?
I started off using the linux-amd64 binary from github and that gave me the high CPU load.

I've also tried compiling from source:
- Using the default settings gives high CPU load.
- Compiling with "RUSTFLAGS='-C target-cpu=x86-64' cargo build --release" gives high CPU load. I think that's the same as default.
- Compiling with "RUSTFLAGS='-C target-cpu=x86-64-v2' cargo build --release" gives CPU load more in line with v1.0.3
- Compiling with "RUSTFLAGS='-C target-cpu=x86-64-v3' cargo build --release" gives a binary that crashes on my PC

So target-cpu=x86-64-v2 seems to give a solution for my purposes, though presumably that makes the binary compatible with fewer old CPUs.
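Before picking a target-cpu level it can help to check whether a given machine meets it. The sketch below reads the `flags` line of Linux's `/proc/cpuinfo` and reports which x86-64-v2 features are missing. The flag spellings are how the Linux kernel names these features (`pni` is SSE3); the function names are hypothetical, and the x86-64 baseline features are assumed present rather than checked.

```python
# Features that x86-64-v2 adds on top of the x86-64 baseline,
# spelled the way /proc/cpuinfo lists them ("pni" is SSE3).
V2_FLAGS = {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}


def missing_v2_flags(cpuinfo_text):
    """Return the x86-64-v2 flags absent from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            present = set(line.split(":", 1)[1].split())
            return V2_FLAGS - present
    # No flags line at all (for example a non-x86 machine): report everything.
    return set(V2_FLAGS)


def check_this_machine(path="/proc/cpuinfo"):
    """Convenience wrapper for the running Linux system."""
    with open(path) as f:
        return missing_v2_flags(f.read())
```

An empty set from `check_this_machine()` suggests a binary built with target-cpu=x86-64-v2 should at least start on that machine. It says nothing about whether it will run faster.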
 
I'm seeing some weird results that make me think this is a pre-existing bug that is now being triggered in 2.0.0a3. I don't think it's related to sse4.2, though enabling sse4.2 seems to mask it for some reason.

This should be easy to reproduce. First generate a 16-bit stereo raw file, say with sox:
sox -V -r 48000 -n -b 16 -c 2 sin1k.raw synth 300 sin 1000 vol -10dB
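If sox isn't available, a roughly equivalent test file can be produced with a few lines of Python. This is only a sketch: it writes the same format (48 kHz, 16-bit little-endian, stereo, 1 kHz sine at -10 dB) but does no dithering, so the bytes will not be identical to what sox writes. The function name is made up, and the one-period shortcut assumes the sample rate is a whole multiple of the tone frequency.

```python
import math
import struct


def sine_s16le(seconds, rate=48000, freq=1000, level_db=-10.0, channels=2):
    """Raw interleaved S16LE sine; builds one period and repeats it."""
    if rate % freq:
        raise ValueError("rate must be a whole multiple of freq")
    period = rate // freq  # samples per cycle (48 for 1 kHz at 48 kHz)
    peak = 32767 * 10 ** (level_db / 20)
    one_period = b"".join(
        # same sample value on every channel of the frame
        struct.pack("<h", round(peak * math.sin(2 * math.pi * n / period))) * channels
        for n in range(period)
    )
    return one_period * (seconds * freq)


# To write the 300 second file used in this post:
# with open("sin1k.raw", "wb") as f:
#     f.write(sine_s16le(300))
```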
Use this config file:
devices:
  samplerate: 48000
  chunksize: 2048
  queuelimit: 1
  capture:
    type: Stdin
    channels: 2
    format: S16LE
  playback:
    type: File
    channels: 2
    filename: "/dev/null"
    format: S16LE

filters:
  preamp_gain:
    type: Gain
    parameters:
      gain: 0
      inverted: false
      mute: false

  filter1:
    parameters:
      freq: 43
      gain: -9.5
      q: 0.25
      type: Peaking
    type: Biquad

pipeline:
  - type: Filter
    channel: 0
    names:
      - preamp_gain
  - type: Filter
    channel: 1
    names:
      - preamp_gain
  - type: Filter
    channel: 0
    names:
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
  - type: Filter
    channel: 1
    names:
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
Time CamillaDSP v1.0.3 processing the file:
$ time ./camilladsp-1.0.3 config-gain.yml < sin1k.raw
2023-09-19 22:03:42.132408 INFO [src/bin.rs:711] CamillaDSP version 1.0.3
2023-09-19 22:03:42.132446 INFO [src/bin.rs:712] Running on linux, x86_64
2023-09-19 22:03:43.226724 INFO [src/bin.rs:420] Capture finished
2023-09-19 22:03:43.226874 INFO [src/bin.rs:410] Playback finished

real 0m1.098s
user 0m1.798s
sys 0m0.192s
Doing the same thing with v2.0.0a3 is trickier, as it goes into a loop at the end of the file rather than terminating, so I need to quickly hit Ctrl-C when it starts looping:
$ time ./camilladsp-2.0.0a3 config-gain.yml < sin1k.raw
2023-09-19 22:06:23.548472 INFO [src/bin.rs:690] CamillaDSP version 2.0.0-alpha3
2023-09-19 22:06:23.548493 INFO [src/bin.rs:691] Running on linux, x86_64
2023-09-19 22:06:27.550472 WARN [src/filedevice.rs:449] sample rate change detected, last rate was 1381731.3567070565 Hz
2023-09-19 22:06:28.551566 WARN [src/filedevice.rs:449] sample rate change detected, last rate was 1387024.768191978 Hz
2023-09-19 22:06:29.552479 WARN [src/filedevice.rs:449] sample rate change detected, last rate was 1385233.6130594476 Hz
2023-09-19 22:06:30.553721 WARN [src/filedevice.rs:449] sample rate change detected, last rate was 1388870.332724759 Hz
2023-09-19 22:06:31.554895 WARN [src/filedevice.rs:449] sample rate change detected, last rate was 1384869.1706424702 Hz
2023-09-19 22:06:32.555385 WARN [src/filedevice.rs:449] sample rate change detected, last rate was 1385816.6255723792 Hz
2023-09-19 22:06:33.555945 WARN [src/filedevice.rs:449] sample rate change detected, last rate was 1385743.855650237 Hz
2023-09-19 22:06:33.940457 INFO [src/bin.rs:390] Capture finished
2023-09-19 22:06:33.941769 INFO [src/bin.rs:378] Playback finished
2023-09-19 22:06:33.942108 INFO [src/bin.rs:390] Capture finished
2023-09-19 22:06:33.942143 INFO [src/bin.rs:378] Playback finished
2023-09-19 22:06:33.942372 INFO [src/bin.rs:390] Capture finished
...

real 0m10.759s
user 0m11.116s
sys 0m0.663s
The warnings are new and the CPU used is significantly higher - 11 seconds vs 1.8.
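For anyone who wants to repeat the comparison without racing to hit Ctrl-C, here is a small sketch of a timing harness. It runs a command with a file on stdin and reports the user and system CPU time of the child, much like `time` does, with an optional timeout that kills a run that never exits. It uses only the Python standard library (`resource` is Unix-only); the function name and the example file names are placeholders.

```python
import resource
import subprocess


def cpu_time(cmd, stdin_path=None, timeout=None):
    """Run cmd and return (user_s, sys_s) CPU time used by the child."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    stdin = open(stdin_path, "rb") if stdin_path else None
    try:
        subprocess.run(
            cmd,
            stdin=stdin if stdin else subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # on expiry the child is killed and reaped
        )
    except subprocess.TimeoutExpired:
        pass  # CPU time used up to the kill is still counted below
    finally:
        if stdin:
            stdin.close()
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return after.ru_utime - before.ru_utime, after.ru_stime - before.ru_stime


# Example, using the file names from this post:
# print(cpu_time(["./camilladsp-1.0.3", "config-gain.yml"], "sin1k.raw", timeout=60))
```

Note that with a timeout the figure includes whatever CPU the process burned while looping, so pick a timeout only slightly longer than the expected run.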

The reason I think this is a pre-existing bug is if I remove the Gain filter from the start of the pipeline I get similar results with v1.0.3.
The modified config file is:
devices:
  samplerate: 48000
  chunksize: 2048
  queuelimit: 1
  capture:
    type: Stdin
    channels: 2
    format: S16LE
  playback:
    type: File
    channels: 2
    filename: "/dev/null"
    format: S16LE

filters:
  preamp_gain:
    type: Gain
    parameters:
      gain: 0
      inverted: false
      mute: false

  filter1:
    parameters:
      freq: 43
      gain: -9.5
      q: 0.25
      type: Peaking
    type: Biquad

pipeline:
  - type: Filter
    channel: 0
    names:
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
  - type: Filter
    channel: 1
    names:
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
      - filter1
And the results with v1.0.3:
$ time ./camilladsp-1.0.3 config-nogain.yml < sin1k.raw
2023-09-19 22:15:46.061027 INFO [src/bin.rs:711] CamillaDSP version 1.0.3
2023-09-19 22:15:46.061066 INFO [src/bin.rs:712] Running on linux, x86_64
2023-09-19 22:15:50.063601 WARN [src/filedevice.rs:446] sample rate change detected, last rate was 1392380.5396331032 Hz
2023-09-19 22:15:51.064191 WARN [src/filedevice.rs:446] sample rate change detected, last rate was 1399996.1421033302 Hz
2023-09-19 22:15:52.065452 WARN [src/filedevice.rs:446] sample rate change detected, last rate was 1394977.1528217047 Hz
2023-09-19 22:15:53.065904 WARN [src/filedevice.rs:446] sample rate change detected, last rate was 1394059.4980337278 Hz
2023-09-19 22:15:54.066315 WARN [src/filedevice.rs:446] sample rate change detected, last rate was 1392067.1810598 Hz
2023-09-19 22:15:55.066767 WARN [src/filedevice.rs:446] sample rate change detected, last rate was 1396104.2753803288 Hz
2023-09-19 22:15:56.067404 WARN [src/filedevice.rs:446] sample rate change detected, last rate was 1397894.6719907317 Hz
2023-09-19 22:15:56.379869 INFO [src/bin.rs:420] Capture finished
2023-09-19 22:15:56.381235 INFO [src/bin.rs:410] Playback finished

real 0m10.323s
user 0m10.883s
sys 0m0.240s
i.e. the same warnings that v2.0.0a3 gave and similarly high CPU usage. I have no idea why just removing a filter from the start of the pipeline would have this effect.

Both the binaries I'm using are straight off github.
 
I'm trying to reproduce this but haven't had any luck so far. Unfortunately I don't have anything with an Atom CPU. It doesn't happen on the only x86 linux machine I have (a laptop with a first generation AMD Ryzen). I don't expect it to happen on ARM but I will try when I have a little time.

The sample rate change warnings are normal. They will come every few seconds, so when it runs fast it has already finished when the first one would have appeared.
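The rates in those warnings can be cross-checked against the run times posted above. A rough sketch, assuming the logged "last rate" is the number of frames per second that CamillaDSP actually processed (my reading of the log, not something stated in the thread); the function names are invented for illustration.

```python
def realtime_factor(measured_rate_hz, nominal_rate_hz):
    """How many times faster than real time the file is being processed."""
    return measured_rate_hz / nominal_rate_hz


def expected_wall_time(duration_s, nominal_rate_hz, measured_rate_hz):
    """Seconds needed to get through a file of duration_s at the measured rate."""
    return duration_s * nominal_rate_hz / measured_rate_hz


# 300 s test file at 48 kHz, processed at the roughly 1.385 MHz seen in the slow runs
print(f"{realtime_factor(1_385_000, 48_000):.1f}x real time")
print(f"{expected_wall_time(300, 48_000, 1_385_000):.1f} s expected")
```

Under that assumption the slow runs need about 10.4 s, which is in line with the 10.3 s and 10.8 s "real" times reported. In the logs the first warning shows up about 4 s after start, so the 1.1 s run ends well before one could appear.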
 
This is what I get on 2.0.0 (after fixing that looping at the end, there will be a new release soon):
time_biquad.png

And I get the same on 1.0.3
 
This is really odd.

The problem I'm seeing is not specific to the Atom CPU. I first saw it there, but it also happens on my desktop, which is a Windows machine with an i5-9600k, running Linux under WSL2. And the same on an Intel NUC with an i5-5250u. All of those are running Ubuntu Server, so to see if it's an incompatibility with Ubuntu I tried setting up an Arch Linux VM and am getting the same behaviour there.

Could you let me know which version of Linux you are running? I can try setting that up and see if the problem goes away.

When I get a chance I'll try to get my head around Linux performance profiling and see if I can track the problem down further.
 
Are you by any chance using an AMD CPU? If so that could explain our different results.

Manjaro linux didn't make any difference for me.

Running with "perf stat" gave some interesting information: The "good" performance scenario:
1695511433077.png


And the "bad" performance scenario:
1695511309616.png

I'm focusing on v1.0.3 because that way I can see the difference in performance with an identical codebase. The results with v2.0.0a3 are similar to the "bad" scenario.

This shows the number of CPU instructions executed is roughly the same, but the instructions per clock drops from 0.70 to 0.16. In other words, CamillaDSP is functioning the same but the CPU is running less efficiently in the second case. The results above are for the Atom N3700. The pattern was the same on other Intel CPUs though with overall performance being much higher.
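To put a number on that: with the instruction count held constant, the cycle count scales as 1/IPC. A tiny sketch of the arithmetic, using only the two IPC figures quoted above (the function name is made up):

```python
def relative_cycles(ipc_good, ipc_bad):
    """Factor by which the cycle count grows at a fixed instruction count."""
    # cycles = instructions / IPC, so the instruction count cancels out
    return ipc_good / ipc_bad


print(f"{relative_cycles(0.70, 0.16):.1f}x more cycles in the bad case")
```

So on the N3700 the "bad" scenario needs roughly four to five times as many clock cycles for the same work.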

I am way out of my depth trying to understand why a change in the config file would lead to this change in CPU performance. I wondered if it might be something to do with spectre/meltdown mitigations as some of those were supposed to have a big performance hit, so I tried setting "mitigations=off" in the kernel command line but that didn't make any difference. Some of the mitigations were also in CPU microcode but that's much harder to roll back to test. Anyway that's just a guess.

I'm getting this performance difference only on Linux. I tested under Windows and didn't see any difference.

As I indicated above, compiling with target-cpu=x86-64-v2 avoids this problem - at least on the scenarios I've tested.

The takeaway seems to be that on Linux machines with slightly older Intel CPUs, target-cpu=x86-64-v2 gives improved and more consistent performance. I don't have anything more recent than a 9th gen CPU, so I don't know what the effect is on more recent CPUs. I'd suggest adding this option to the standard compile options - at least on Linux. That would rule out some very old CPUs but should be compatible with anything since 1st gen Core i5/i7 (2008) and AMD Bulldozer (2011).
 
Yes I am using an AMD CPU. I don't have any Intel machine where I could test this, but if just setting target-cpu=x86-64-v2 solves it then this seems like a good idea.

Is anyone here running CamillaDSP on an older Atom CPU? Switching to x86-64-v2 would exclude Atoms made before 2014, for example the Atom N2700 & N2800, z2xxx etc.