Solutions for audio processing delay (ALSA/Linux)

Hi there,

The problem I am trying to solve is audio latency causing "lip sync" issues when playing video sources such as YouTube. That is, the video is ahead of the audio (so I need to either delay the video or reduce the audio delay).

I am constrained by the TV which only supports audio delay (I need video delay) and only for optical output.

I have a Raspberry Pi running Camilla DSP for an active crossover. This all works well.

Since the Pi (and Suptronics DAC) has no audio input, I have added a USB 5.1 soundcard and am using its line in (so I can connect it to the TV). The soundcard supports either RCA (analogue) line in or SPDIF.

I need to loop the USB soundcard input to the Camilla input device (currently the loopback device loaded by snd-aloop).

I am using the alsaloop utility with the lowest latency I can get away with without it complaining about skipping (50 ms, i.e. -t 50000 in microseconds), but the audio lag is still noticeable:

Code:
alsaloop -P dmix:1 -C hw:2,0 -S 5 -v -t 50000

The Camilla DSP source device is hw:Loopback,1 (dmix:1 above) and USB card is hw:2,0
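
For reference, alsaloop's -t option takes the requested latency in microseconds, so -t 50000 asks for 50 ms. A minimal sketch of the conversion to frames follows. usec_to_frames is a hypothetical helper (not part of alsa-utils), and the 48000 Hz and 44100 Hz rates are assumptions about the devices involved:

```shell
#!/bin/sh
# Convert an alsaloop -t value (microseconds) into milliseconds and frames.
# usec_to_frames is a hypothetical helper name, not an alsa-utils command.
usec_to_frames() {
    usec=$1   # requested latency in microseconds
    rate=$2   # sample rate in Hz (assumed, check your device)
    awk -v u="$usec" -v r="$rate" \
        'BEGIN { printf "%d us = %g ms = %d frames at %d Hz\n", u, u / 1000, u * r / 1000000, r }'
}

usec_to_frames 50000 48000
usec_to_frames 50000 44100
```

This only describes what is requested from alsaloop. Buffers in dmix, the loopback device and CamillaDSP add to the total delay.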

I am using dmix since I want to have dual sources - both MPD playing locally stored audio AND "The TV" when I finally plug it in.

I have yet to try a simple arecord | aplay pipe (not at computer).

I do not have an A/V receiver with the smarts to do video buffering / delay.

I'm just wanting to know if I'm heading down a dead end, or possible solutions.
 
Do you really need dmix? That certainly introduces extra latency.

Just a note - the pipe in arecord | aplay typically introduces a large latency because pipe buffering is designed for throughput (i.e. larger blocks).
 
Thanks. I'm only using dmix so that I can either play from MPD as a source or switch to the TV, without needing additional scripts that shut down MPD to free the playback device (the Camilla DSP loopback capture) and then start alsaloop (for sourcing the TV input to Camilla).

Thanks for the arecord | aplay explanation. That certainly makes sense and was evident in testing.
 
@HenrikEnquist please find attached the Camilla config I am running. As noted, I didn't compile CamillaDSP with the +neon Raspberry Pi optimization... or I need to check how I compiled it.

I've also added the ALSA devices I am using below. For reference, hw:0,1 is my HDMI output (multi-channel), hw:1 is the snd-aloop loopback device and hw:2 is my USB soundcard (line in). Unfortunately I have to use dmix, as I want seamless switching between MPD and the USB 5.1 soundcard input... unless I use MPD to stream from the USB card as a source itself. I will look into it.

Code:
# playback PCM device: using loopback subdevice 0,0
# Don't use a buffer size that is too small. Some apps
# won't like it and it will sound crappy
pcm.amix {
  type dmix
  ipc_key 219347
  hw_ptr_alignment "roundup"
  # If alsaloop service starts before MPD or MPD is paused, then
  # root will take permissions and not give MPD any... so grant 666 so any
  # service can share
  ipc_perm 0666
  slave {
    pcm "loophw00"
    period_size 1024
    periods 2
  }
}

# hardware 0,0 : used for ALSA playback (linked to hw:1,0 loopback
# capture for Camilla DSP)
pcm.loophw00 {
  type hw
  card Loopback
  device 0
  subdevice 0
  format S16_LE
  rate 44100
}


pcm.asnoop {
  type dsnoop
  ipc_key 219347
  hw_ptr_alignment "roundup"
  slave {
    pcm "usb_51"
    period_size 1024
    periods 2
  }
}

# hardware 2,0 : used for ALSA loopback capture (USB card line in)
pcm.usb_51 {
  type hw
  card 2
  device 0
  format S16_LE
  rate 48000
}

I am able to run alsaloop as a systemd service with this command:

Code:
 /usr/bin/alsaloop -P amix -C asnoop -S 5 -t 40000

If I try a period size smaller than 1024 for the capture / playback devices, or a latency below 40 ms, alsaloop either pops/crackles or goes to 100% CPU.

The above is about as small a latency as I have been able to manage, without pops/crackles and alsaloop going into thermal runaway.
 

Ok there is room for improvement!
First a little cleanup, queue_limit isn't needed and doesn't help. Just remove it.
Then, reduce the chunksize. 2048 is much larger than you need and adds quite some latency. Try 512, and if that works fine, try 256.
Finally, there is target_level. You want it low for minimum latency, but that increases the risk of underruns. You'll need to experiment to find the lowest value that runs reliably at each chunksize.
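
To put rough numbers on the chunksize values mentioned above, here is a small sketch that prints how long one chunk lasts. chunk_ms is a hypothetical helper, and the 44100 Hz sample rate is an assumption taken from the loopback device definition, since the CamillaDSP config itself is not shown in the thread:

```shell
#!/bin/sh
# Duration of one CamillaDSP chunk in milliseconds: frames * 1000 / rate.
# chunk_ms is a hypothetical helper; 44100 Hz is an assumed sample rate.
chunk_ms() {
    awk -v n="$1" -v r="$2" 'BEGIN { printf "%.1f", n * 1000 / r }'
}

for chunk in 2048 1024 512 256; do
    echo "chunksize $chunk = $(chunk_ms "$chunk" 44100) ms per chunk at 44100 Hz"
done
```

This is only the per-chunk duration. The end-to-end delay also depends on target_level and on the device buffers on either side.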

The +neon flag doesn't make a lot of difference, just a few percent of speedup, not important for latency. The pre-built binary has it enabled.
 
Thanks @HenrikEnquist. I got this error with chunksize: 512:


2022-11-28 18:38:59.547071 WARN [src/alsadevice.rs:196] Playback device failed while waiting for available buffer space, error: ALSA function 'snd_pcm_wait' failed with error 'EPIPE: Broken pipe'
2022-11-28 18:38:59.547819 ERROR [src/bin.rs:344] Playback error: ALSA function 'snd_pcm_wait' failed with error 'EPIPE: Broken pipe'
2022-11-28 18:38:59.548260 DEBUG [src/bin.rs:352] Wait for capture thread to exit..
2022-11-28 18:38:59.549222 WARN [src/alsadevice.rs:157] Prepare playback after buffer underrun

The minimum chunksize Camilla seemed to be happy with was 1024. I removed queue_limit.
If I tried target_level: 512 (half the chunksize) I got popping (indicating underruns), so I omitted this value and let Camilla default it, and all seems OK.

I then got a couple of fade out/ins in a 3 minute playing period which I assume Camilla does on buffer underruns.

I reverted back to a chunksize of 2048 = all stable / good. I am feeding the loopback using the

I have no easy way to measure latency through the devices.
 
I dunno if you can use it instead, or if it's applicable, but pipewire has way less overhead than Pulse.
https://pipewire.org/
Thanks - I'm running an older alsa stack (I have a lightweight archlinux implementation and rolling upgrades are a problem) and the only packaged pipewire implementation is quite old and likely lacking features.

I can't build the latest as my meson version just misses the version cut - upgrading that causes a chain of compatibility destruction...

In any case, I am looking at a direct option... I'm not using pulse or jack (which layer on ALSA?), so I'm unsure whether pipewire is an ALSA alternative. I know I'll need to at least present ALSA devices to the software I am using.
 
Again - dmix is very complex code which runs within the calling process (it's just a library call). It must keep locks between processes to allow mixing in the library. It was never designed for low latency, and your xruns can be caused by it. It's not being developed anymore, only maintained.

That's the first thing I would recommend to get rid of.

Yes, alsaloop sometimes ends up consuming 100% CPU at xrun recovery; it's a bug nobody has fixed yet. Its source code is quite complex as it does everything in one thread only.

For your usecase of low-latency routing pipewire could be the best alternative.
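
If dmix is dropped, only one source can hold the loopback playback device at a time, so the switching between MPD and the TV input described earlier in the thread has to be scripted. A rough sketch, assuming MPD runs as mpd.service and alsaloop as a systemd unit called alsaloop.service (a hypothetical unit name, adjust to your setup). With DRY_RUN set, the commands are only printed:

```shell
#!/bin/sh
# Sketch: hand the loopback device to either MPD or alsaloop (TV input).
# Assumptions: mpd.service and alsaloop.service are the unit names
# (alsaloop.service is hypothetical). DRY_RUN=1 prints instead of running.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

switch_source() {
    case "$1" in
        tv)
            run systemctl stop mpd.service        # free the loopback playback device
            run systemctl start alsaloop.service  # route the USB line in to the loopback
            ;;
        mpd)
            run systemctl stop alsaloop.service   # stop feeding the TV input
            run systemctl start mpd.service       # let MPD open the loopback again
            ;;
        *)
            echo "usage: switch_source tv|mpd" >&2
            return 1
            ;;
    esac
}

DRY_RUN=1
switch_source tv
```

In this arrangement alsaloop would play straight to the loopback hw device rather than through the dmix PCM. Whether that actually lowers latency on this hardware would need testing.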
 
@phofman thanks for this.

I hadn't realised the dmix plug-in was essentially deprecated.

I have had success increasing period_size for both the dmix and dsnoop devices to 8192. This has allowed the latency to drop to 10 ms, which is quite watchable.

My only problem now is alsaloop dying after about 5 minutes. Before, it was a playback xrun that alsaloop never recovered from; now it is a failure to initialise one of the loopback threads - a call to pcmjob_pollfds_init() that kills alsaloop.

I might have to bite the bullet and do a full system upgrade and rebuild the archphile recipe on the latest archlinux release to get pipewire. It is disappointing as I feel I am so close to a working system with everything I need.
 
So... my problem was self-inflicted. Mixed versions were what caused the problems.

Doing a full system upgrade, then recompiling dependent utilities and MPD, seems to have stabilised things.

I tried to get pipewire and wireplumber running - but had unhelpful errors. I also didn't see the need to run some layer on top of alsa. I could only see it adding to resource consumption and hampering latency. The documentation is also quite immature (expected for new software) so until they have a much easier "point and shoot" installation approach, I'll defer investigation.
 
There's a lot of information on pipewire in the Arch wiki.
If you run Manjaro, you can switch to pipewire by installing manjaro-pipewire.
Latest Ubuntu and derivatives and Fedora use pipewire by default now.
Thanks for the reply. I'm sorry I don't mean to be disrespectful to pipewire.

I'm using Archlinux - not because I think it is better or anything, just familiarity and it is my current music server.

I suppose I was disappointed that the archlinux package was not as consumer friendly as the pipewire goals suggest. Installation was not well documented and configuration was problematic. I'm sure things will improve as pipewire takes over the various distributions and gets more end user feedback.

For now, pure alsa and alsaloop v1.2.8 seem to be working reliably. So, without any layering or servers to worry about, it will meet my needs for now.