Hi there,
The problem I am trying to solve is latency/audio delay, i.e. "lip sync" issues when playing video sources such as YouTube. That is, the video is ahead of the audio (so I need to either delay the video or speed up the audio).
I am constrained by the TV, which only supports audio delay (I need video delay), and only for optical output.
I have a Raspberry Pi running CamillaDSP for an active crossover. This all works well.
Since the Pi (and Suptronics DAC) has no audio input, I have added a USB 5.1 soundcard and am using its line in (so I can connect the TV). The soundcard supports either RCA (analogue) line in or SPDIF.
I need to loop the USB soundcard input to the CamillaDSP input device (currently the loopback device loaded by snd-aloop).
I am using the alsaloop utility with the lowest latency I can get away with without it complaining about skipping (50 ms; the -t value is in microseconds), but the audio lag is still visible:
Code:
alsaloop -P dmix:1 -C hw:2,0 -S 5 -v -t 50000
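alsaloop's -t option takes the requested latency in microseconds, so -t 50000 corresponds to 50 ms. A minimal bash sketch of that conversion, assuming a 48000 Hz capture rate; the helper names are hypothetical and only do integer (truncating) arithmetic:

```bash
#!/usr/bin/env bash
# Hypothetical helpers: convert an alsaloop -t value (microseconds)
# into milliseconds and into frames at a given sample rate.
usec_to_ms() {
  local usec=$1
  echo $(( usec / 1000 ))
}

usec_to_frames() {
  local usec=$1 rate=$2
  # frames = seconds * rate = usec * rate / 1000000 (truncated)
  echo $(( usec * rate / 1000000 ))
}

# -t 50000 as used above, assuming a 48000 Hz capture rate
echo "$(usec_to_ms 50000) ms = $(usec_to_frames 50000 48000) frames"
```

Thinking in frames makes it easier to compare the alsaloop target with the period and buffer sizes configured on the ALSA devices.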
The CamillaDSP source device is hw:Loopback,1 (fed through dmix:1 above) and the USB card is hw:2,0.
I am using dmix since I want two sources: MPD playing locally stored audio AND "the TV" when I finally plug it in.
I have yet to try a simple arecord | aplay pipe (I'm not at the computer).
I do not have an A/V receiver with the smarts to do video buffering/delay.
I just want to know whether I'm heading down a dead end, or whether there are possible solutions.
I do not think you will be able to solve this audio lag problem. I couldn't. It's a weakness of computer-based DSP processing of audio from a TV, because there is typically no way to delay the video signal on the TV.
It may be possible to reduce the delay from CamillaDSP a bit. Not sure it will be enough, but it's worth a shot.
Can you post your config file?
Thanks Charlie and Henrik
The latency is passable. I'm away from home but will post the CamillaDSP config file when I'm back.
I downloaded the armv7 binary and did not compile it myself, so I'm unsure whether it has the +neon Rust compiler optimisation.
Do you really need dmix? That certainly introduces extra latency.
Just a note: the pipe in arecord | aplay typically introduces a large latency, because pipe buffering is designed for throughput (i.e. larger blocks).
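To put a rough number on that: on Linux a pipe holds 65536 bytes by default. A bash sketch of how much audio that represents, assuming 16-bit stereo at 48000 Hz; the helper name is hypothetical, and how full the pipe actually gets depends on how aplay reads from it:

```bash
#!/usr/bin/env bash
# Hypothetical helper: milliseconds of audio held by a buffer of N bytes.
bytes_to_ms() {
  local bytes=$1 rate=$2 channels=$3 bytes_per_sample=$4
  local bytes_per_sec=$(( rate * channels * bytes_per_sample ))
  echo $(( bytes * 1000 / bytes_per_sec ))
}

# 65536 bytes: default Linux pipe capacity; S16_LE stereo at 48 kHz assumed
echo "full pipe: $(bytes_to_ms 65536 48000 2 2) ms"
```

A full pipe alone would then hold a few hundred milliseconds of audio, far more than lip sync can tolerate.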
Thanks. I'm only using dmix so I can either play from MPD or switch to the TV without additional scripts that shut down MPD to free the playback device (the CamillaDSP loopback capture) and then start alsaloop (to feed the TV input to CamillaDSP).
Thanks for the arecord | aplay explanation. That certainly makes sense and was evident in testing.
I understand your reason, but for lip sync the chain should be as simple as possible (avoiding unnecessary buffers) and even that will require optimizations.
@HenrikEnquist please find attached the CamillaDSP config I am running. As noted, I didn't compile CamillaDSP with the +neon Raspberry Pi optimization... or I need to check how I compiled it.
I've also added the ALSA devices I am using below. For reference, hw:0,1 is my HDMI output (multi-channel), hw:2 is my USB soundcard (line in) and hw:1 is the snd-aloop loopback device. Unfortunately I have to use dmix, as I want seamless switching between MPD and the USB 5.1 soundcard input... unless I use MPD to stream from the USB card as a source itself. I'll look into it.
Code:
# playback PCM device: using loopback subdevice 0,0
# Don't use a buffer size that is too small. Some apps
# won't like it and it will sound crappy
pcm.amix {
    type dmix
    ipc_key 219347
    hw_ptr_alignment "roundup"
    # If the alsaloop service starts before MPD, or MPD is paused, then
    # root will take the permissions and not give MPD any... so grant
    # 0666 so any service can share
    ipc_perm 0666
    slave {
        pcm "loophw00"
        period_size 1024
        periods 2
    }
}
# Loopback device 0, subdevice 0: used for ALSA playback (its audio
# appears on the loopback capture side, hw:Loopback,1, for CamillaDSP)
pcm.loophw00 {
    type hw
    card Loopback
    device 0
    subdevice 0
    format S16_LE
    rate 44100
}
pcm.asnoop {
    type dsnoop
    # each dmix/dsnoop instance needs its own unique ipc_key
    ipc_key 219348
    hw_ptr_alignment "roundup"
    slave {
        pcm "usb_51"
        period_size 1024
        periods 2
    }
}
# hardware 2,0: used for ALSA loopback capture (USB card line in)
pcm.usb_51 {
    type hw
    card 2
    device 0
    format S16_LE
    rate 48000
}
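With period_size 1024 and periods 2, each of these plugins buffers 2048 frames. A bash sketch of what that means in time at the two configured rates; the helper name is hypothetical, and it only counts the configured buffer size, not how full the buffer is actually kept:

```bash
#!/usr/bin/env bash
# Hypothetical helper: buffer length in ms from ALSA period settings.
buffer_ms() {
  local period_size=$1 periods=$2 rate=$3
  echo $(( period_size * periods * 1000 / rate ))
}

echo "amix (dmix, 44100 Hz):     $(buffer_ms 1024 2 44100) ms"
echo "asnoop (dsnoop, 48000 Hz): $(buffer_ms 1024 2 48000) ms"
```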
I am able to run alsaloop as a systemd service with this command
Code:
/usr/bin/alsaloop -P amix -C asnoop -S 5 -t 40000
If I try a period size smaller than 1,024 for the capture/playback devices, or a latency below 40 ms, alsaloop either pops/crackles or goes to 100% CPU.
The above is about as low a latency as I have been able to manage without pops/crackles and alsaloop going into thermal runaway.
Ok there is room for improvement!
First a little cleanup: queue_limit isn't needed and doesn't help. Just remove it.
Then reduce the chunksize. 2048 is much larger than you need and adds quite some latency. Try 512, and if that works fine, try 256.
Finally, there is target_level. You want it low for minimum latency, but that increases the risk of underruns. You'll need to experiment to find the lowest value that runs reliably at each chunksize.
The +neon flag doesn't make a lot of difference, just a few percent of speedup, and it's not important for latency. The pre-built binary has it enabled.
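As a rough guide to what these two settings cost, here is a bash sketch that treats CamillaDSP's added delay as one chunk of capture plus target_level frames waiting in the playback buffer. That model is a simplifying assumption for illustration, not a formula from the CamillaDSP documentation; the 44100 Hz rate and setting target_level equal to chunksize are assumptions too:

```bash
#!/usr/bin/env bash
# Rough model (assumption): delay ~= chunksize + target_level frames.
camilla_ms() {
  local chunksize=$1 target_level=$2 rate=$3
  echo $(( (chunksize + target_level) * 1000 / rate ))
}

for cs in 2048 1024 512 256; do
  echo "chunksize $cs: ~$(camilla_ms "$cs" "$cs" 44100) ms"
done
```

Under this model, halving chunksize and target_level together halves the delay, which is why it is worth searching for the smallest pair that still runs without underruns.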
Thanks @HenrikEnquist . I got this error with a chunksize: 512:
2022-11-28 18:38:59.547071 WARN [src/alsadevice.rs:196] Playback device failed while waiting for available buffer space, error: ALSA function 'snd_pcm_wait' failed with error 'EPIPE: Broken pipe'
2022-11-28 18:38:59.547819 ERROR [src/bin.rs:344] Playback error: ALSA function 'snd_pcm_wait' failed with error 'EPIPE: Broken pipe'
2022-11-28 18:38:59.548260 DEBUG [src/bin.rs:352] Wait for capture thread to exit..
2022-11-28 18:38:59.549222 WARN [src/alsadevice.rs:157] Prepare playback after buffer underrun
The minimum CamillaDSP seemed to be happy with was 1,024. I removed queue_limit.
If I tried target_level: 512 (half the chunksize) I got popping (indicating underruns), so I omitted this value, let CamillaDSP use its default, and all seems OK.
I then got a couple of fade-outs/ins in a 3-minute playing period, which I assume CamillaDSP does on buffer underruns.
I reverted to a chunksize of 2048: all stable/good. I am feeding the loopback using the
I have no easy way to measure latency through the devices.
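Without a measurement setup, one option is to add up the nominal delays along the chain. A bash sketch using values posted in this thread (the 2x1024-frame dmix buffer, alsaloop -t 40000, and CamillaDSP at chunksize 2048 with target_level assumed equal to it, at 44100 Hz). It is a crude estimate only: the stage list is hypothetical, buffers are not always full, and the USB card, HDMI output and the TV's own processing are ignored:

```bash
#!/usr/bin/env bash
# Crude lip-sync budget: sum of nominal per-stage delays in ms.
frames_ms() { echo $(( $1 * 1000 / $2 )); }

total=0
add_stage() {
  local name=$1 ms=$2
  total=$(( total + ms ))
  printf '%-24s %4d ms\n' "$name" "$ms"
}

add_stage "dmix buffer (2x1024)"   "$(frames_ms 2048 44100)"
add_stage "alsaloop -t 40000"      40
add_stage "CamillaDSP (2048+2048)" "$(frames_ms 4096 44100)"
printf '%-24s %4d ms\n' "total (estimate)" "$total"
```

Listing the stages this way also shows which buffer dominates, and therefore which setting is worth tuning first.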
My new challenge: alsaloop goes to 100% CPU on the first buffer underrun. I need it to recover gracefully.
I dunno if you can use it instead, or if it's applicable, but pipewire has way less overhead than Pulse.
https://pipewire.org/
Thanks - I'm running an older ALSA stack (I have a lightweight Arch Linux installation and rolling upgrades are a problem), and the only packaged PipeWire build is quite old and likely lacking features.
I can't build the latest version, as my meson version just misses the version cut-off, and upgrading meson causes a chain of compatibility breakage...
In any case, I am looking at a direct option... I'm not using Pulse or JACK (which layer on ALSA?), so I'm unsure whether PipeWire is an ALSA alternative. I know I'll need to at least present ALSA devices to the software I am using.
Again, dmix is very complex code which runs within the calling process (it's just a library call). It must keep locks between processes to allow mixing in the library. It was never designed for low latency, and your xruns can be caused by it. It's no longer being developed, only maintained.
That's the first thing I would recommend getting rid of.
Yes, alsaloop sometimes ends up consuming 100% CPU at xrun recovery; it's a bug nobody has fixed yet. Its source code is quite complex, as it does everything in a single thread.
For your use case of low-latency routing, PipeWire could be the best alternative.
@phofman thanks for this.
I hadn't realised the dmix plug-in was essentially deprecated.
I have had success increasing period_size on both the dmix and dsnoop devices to 8192. This has allowed latency to drop to 10 ms, which is quite watchable.
My only problem now is alsaloop dying after about 5 minutes. Before, it was a playback xrun that alsaloop never recovered from; now it is a failure to initialise one of the loopback threads (the call to pcmjob_pollfds_init() kills alsaloop).
I might have to bite the bullet, do a full system upgrade and rebuild the archphile recipe on the latest Arch Linux release to get PipeWire. It is disappointing, as I feel I am so close to a working system with everything I need.
So... my problem was self inflicted. Mixed versions is what caused problems.
Doing a full system upgrade and then recompiling dependent utilities and MPD seems to have stabilised things.
I tried to get PipeWire and WirePlumber running, but got unhelpful errors. I also didn't see the need to run another layer on top of ALSA; I could only see it adding to resource consumption and hampering latency. The documentation is also quite immature (expected for new software), so until there is a much easier "point and shoot" installation approach, I'll defer investigating it.
There's a lot of information on pipewire in the Arch wiki.
If you run Manjaro, you can switch to pipewire by installing manjaro-pipewire.
Latest Ubuntu and derivatives and Fedora use pipewire by default now.
Thanks for the reply. I'm sorry, I don't mean to be disrespectful to PipeWire.
I'm using Arch Linux, not because I think it is better or anything, just familiarity, and it runs my current music server.
I suppose I was disappointed that the Arch Linux package was not as consumer friendly as the PipeWire goals suggest. Installation was not well documented and configuration was problematic. I'm sure things will improve as PipeWire takes over the various distributions and gets more end-user feedback.
For now, pure ALSA with alsaloop v1.2.8 seems to be working reliably. Without any layering or servers to worry about, it will meet my needs for now.
Yeah, it's true... But I think it's just a matter of removing the Pulse stuff, installing the PipeWire stuff and WirePlumber, and then enabling the "job" in systemd.
I'm sure you've seen this wiki?
https://wiki.archlinux.org/title/PipeWire
Thanks - yes, I read the wiki. To be fair, I had not done a full system upgrade; I think the WirePlumber connection problems were due to that. I might retry in due course.