Can low jitter be achieved with STM32 microcontroller

Hi,

I'm struggling to make this Async USB work. I have made some progress, but it is not finalized.

I decided to spend some time pushing the code to GitHub, in case someone is interested and would like to help or work on the topic.

The code is here: https://github.com/jmf13/USB_Async_DAC

I will try to fill a wiki with all the information collected, and especially threads about the implementation of Async USB:
https://github.com/jmf13/Const_DSP_I2S_DAC/wiki
https://github.com/jmf13/Const_DSP_I2S_DAC/wiki/Async-USB-inspiration

I'm sure that there is not much missing...

JMF
 
Small update,

I have been working hard to try to make the USB Async function work on the STM32. After long investigations, looking at several relevant examples on other platforms, it appears that the STM32 makes the implementation of this feature a bit more complex than on other MCUs (bad luck... as it is one of the best ones for multiple synchronized outputs).
This implementation seems especially difficult on the STM32 with the provided libraries because:
- the STM32 has a behaviour that seems specific compared to other MCUs: all pending (not completed) transfers are automatically cancelled at the end of the frame. This seems impossible to avoid when transmitting feedback to the host, and it needs dedicated low-level register management, plus a callback to be informed of the situation;
- the usbd libraries, and especially usbd_core, don't implement the important callbacks.

So you can't rely on the high-level USB libraries alone. You have to deal with the lower-level libraries, or modify the higher-level ones, which I wanted to avoid.
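To give an idea of the low-level handling involved, here is a minimal sketch, assuming the ST HAL (PCD) driver and UAC1 full-speed feedback (3 bytes, 10.14 format). The endpoint address and variable names are hypothetical, not taken from the actual repository:

Code:
/* Sketch only: re-arming a UAC1 feedback IN endpoint on the STM32.
 * HAL_PCD_ISOINIncompleteCallback is the (weak) HAL callback that fires
 * when an isochronous IN transfer was not completed within the frame,
 * i.e. exactly the situation described above. */
#include "stm32f4xx_hal.h"

#define FB_EP_ADDR 0x81U              /* hypothetical feedback endpoint */

static uint8_t fb_data[3];            /* full-speed feedback: 3 bytes, 10.14 */

/* Encode samples-per-frame in 10.14 format, e.g. 48.0 kHz -> 48 << 14 */
static void fb_encode(uint32_t rate_10_14)
{
  fb_data[0] = (uint8_t)(rate_10_14);
  fb_data[1] = (uint8_t)(rate_10_14 >> 8);
  fb_data[2] = (uint8_t)(rate_10_14 >> 16);
}

void HAL_PCD_ISOINIncompleteCallback(PCD_HandleTypeDef *hpcd, uint8_t epnum)
{
  if ((epnum & 0x7FU) == (FB_EP_ADDR & 0x7FU))
  {
    /* The core dropped our pending feedback packet at end of frame:
       flush the endpoint and queue the value again for the next frame. */
    HAL_PCD_EP_Flush(hpcd, FB_EP_ADDR);
    HAL_PCD_EP_Transmit(hpcd, FB_EP_ADDR, fb_data, sizeof fb_data);
  }
}

fb_encode() would be called from the SOF handler whenever a new feedback value is computed.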

So not lost, but more difficult than expected.

The good thing is that I now understand the underlying mechanisms of the USB spec much better...

JMF
 
Hi JMF11, there are people around, like myself, wanting to evaluate the STM32 audio potential, based on a simple and easy-to-use framework that takes full advantage of the latest STM32CubeMX configuration software.

This is in the context of most STM32 USB Audio 1.0 software examples dating back to 2012 or so, before the advent of STM32CubeMX. I find them nearly unusable. In that category, we have the STM32F4 USB Audio 1.0 example relayed by tjaekel here: STMF4 Discovery USB Sound Card

The situation improves when considering the STM32F7 USB Audio 1.0 software examples dating back to June 2015, again relayed by tjaekel: STM32F7 Discovery Sound Card
While more usable, those examples still look overcomplicated, as they rely on CMSIS (batch audio DSP that we don't require at this stage), and as they consist of many source files and ST middlewares, including a GUI (Graphical User Interface).

Therefore, can you please build and publish a
- single I2S-in (slave mode)
- sample-by-sample Audio DSP
- twin I2S-out (slave mode)
digital audio framework, designed to run on an STM32F7 or STM32F4 board?

This would enable people to rely on a
- SPDIF-in to I2S assembled board (CS8412, or miniDSP DIGI-FP SRC4382),
- USB Audio to I2S assembled board (PCM2706, CM6631A, or miniDSP miniStreamer XMOS)
for injecting high quality audio into the system.

This would enable people to rely on a
- DAC assembled board (PCM5102A)
for feeding a couple of power amplifiers.

In such a context, at this stage, the answer to the question "Can low jitter be achieved with STM32 microcontroller" appears to be "no", basically.

IMO, you have completed such a framework, but you don't publish it yet because your mind can't endorse such a "no" answer. Let me say that you are wrong in associating such a "no" answer with some failure. It is not a failure. Come on, and move on.

First of all, you shall build and publish the above framework, I mean the I2S-in / sample-by-sample audio DSP / I2S-out, with the STM32 appearing as audio clock slave and I2S slave. This is not at all regressive, on the condition that you achieve a simple and easy-to-use framework that takes full advantage of the latest STM32CubeMX configuration software.
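The skeleton of such a framework fits in a page; here is a minimal sketch, assuming STM32CubeMX has generated two slave I2S peripherals (the hi2s2/hi2s3 handle names depend on your .ioc file), and using blocking HAL calls for clarity where a real implementation would use interrupt or DMA double-buffering:

Code:
/* Sketch of the I2S-in / sample-by-sample DSP / I2S-out skeleton.
 * Assumes STM32CubeMX has configured hi2s2 as slave receive and hi2s3
 * as slave transmit (the handle names depend on your .ioc file). */
#include "stm32f7xx_hal.h"

extern I2S_HandleTypeDef hi2s2;   /* I2S-in, slave RX (CubeMX-generated) */
extern I2S_HandleTypeDef hi2s3;   /* I2S-out, slave TX (CubeMX-generated) */

static int16_t process_sample(int16_t in)
{
  return in;                      /* placeholder: per-sample DSP goes here */
}

void audio_loop(void)
{
  uint16_t frame[2];              /* one stereo frame of 16-bit samples */

  for (;;)
  {
    if (HAL_I2S_Receive(&hi2s2, frame, 2, HAL_MAX_DELAY) != HAL_OK)
      continue;                   /* error handling elided in this sketch */
    frame[0] = (uint16_t)process_sample((int16_t)frame[0]);   /* left  */
    frame[1] = (uint16_t)process_sample((int16_t)frame[1]);   /* right */
    HAL_I2S_Transmit(&hi2s3, frame, 2, HAL_MAX_DELAY);
  }
}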

Then, you shall build and publish a more elaborate framework, based on the above one, duly modified to take advantage of the STM32F4 or STM32F7 USB interface, exploited as plain USB Audio 1.0, without trying to make it adaptive with regard to the audio throughput. Like everybody, you'll end up duplicating or skipping an audio sample here and there. Unlike everybody, you shall light a LED each time you are forced to duplicate, and light another LED each time you are forced to skip (see the sketch below). This way, users will visualize the plain USB Audio 1.0 quality limitation.

This being said, you shall support two different clocking schemes. The first clocking scheme shall rely on the CPU clock, which you manipulate using the STM32 PLL to get a 256 x Fs clock to be used as local master clock, feeding the I2S-out. At this stage, do not waste your time fiddling with the PLL for adaptively speeding up or slowing down the I2S according to the USB audio throughput. The second clocking scheme shall rely on two high quality quartz oscillators, each having an "enable" pin, one for the 44.1 family and the other for the 48 family. Such an external clock (possibly 256 x Fs) is to be used as local master clock, feeding the I2S-out.

At the end of the day, you shall tell the cost of USB Audio 1.0 in terms of computing power. In case it consumes less than 10 MIPS, doesn't prevent the audio DSP from being zero-latency sample-by-sample, and only lights up the "duplicate/skip" LEDs two or three times per minute, that's not the end of the world, especially if such a result can be attained using the 44.1-family and the 48-family high quality quartz oscillators.
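A possible shape for the duplicate/skip policy with its two LEDs, as a minimal sketch; the circular buffer, the watermarks and the LED pins (here the GPIOD LEDs of a Discovery board) are hypothetical:

Code:
/* Sketch of the duplicate/skip policy with LED signalling, assuming a
 * power-of-two circular buffer between the USB receive side and the
 * I2S consumer. Buffer, thresholds and LED pins are hypothetical. */
#include "stm32f4xx_hal.h"

#define BUF_LEN    1024U                       /* must be a power of two */
#define LOW_MARK   (BUF_LEN / 4U)
#define HIGH_MARK  (3U * BUF_LEN / 4U)

extern volatile uint32_t wr_idx, rd_idx;       /* advanced by USB RX / I2S TX */

static uint32_t fill_level(void)
{
  return (wr_idx - rd_idx) & (BUF_LEN - 1U);
}

/* Call once per output sample period. */
void regulate(void)
{
  uint32_t fill = fill_level();

  if (fill < LOW_MARK)                         /* running dry: duplicate one sample */
  {
    rd_idx = (rd_idx - 1U) & (BUF_LEN - 1U);
    HAL_GPIO_WritePin(GPIOD, GPIO_PIN_12, GPIO_PIN_SET);   /* "duplicate" LED */
  }
  else if (fill > HIGH_MARK)                   /* overflowing: skip one sample */
  {
    rd_idx = (rd_idx + 1U) & (BUF_LEN - 1U);
    HAL_GPIO_WritePin(GPIOD, GPIO_PIN_13, GPIO_PIN_SET);   /* "skip" LED */
  }
}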
Now read carefully. Bear in mind that the GPIO pins feeding the "duplicate/skip" LEDs can be exploited for sending a positive or negative current into an integrating capacitor, generating a smooth analog voltage to be applied to a varicap diode, finely tuning the oscillation frequency of the external quartz acting as audio clock. This way, people will convert your plain USB Audio 1.0 framework into "audiophile" stuff. This is indeed a brute-force method for achieving 1) low jitter with the STM32 microcontroller, and 2) no USB audio sample duplication or skip.

Now comes the pièce de résistance: the adaptive USB Audio 1.0, simulating some Async USB Audio arrangement. You may succeed, or not. Given what's exposed above, there is no rush. You have plenty of time, and in case one needs to wait until next summer, that's not an issue.
 
Hi Steph,

Good to see that someone doesn't consider the current stage a failure 🙂. I agree with you that there is still hope. I'm just disappointed that, in spite of much more time than expected, this USB Async feature is not working yet and will need additional effort.

I can only try to progress in my very little spare hobby time. And this already eats a lot of focus and brain availability. I can't allocate more time than what makes my own needs progress.

However, all the useful code I developed is published at https://github.com/jmf13/USB_Async_DAC and the lessons learned are collected in https://github.com/jmf13/Const_DSP_I2S_DAC/wiki . This should help anybody willing to have a try or get their hands dirty. Be sure that I will also explain what I did, if needed.

The prerequisite is to install System Workbench for STM32, and learn how to use it.

For sure, several people working together on these topics would help things go faster.

I won't work on the I2S input, but I'm sure that my code can help as a starting point. I hope that somebody will do it.

About the sample-by-sample strategy: I understand your point, but I feel that it fills a very specific need that does not provide the best balance for the general case. It will greatly reduce the benefits of DMA transfers and increase the load on the CPU, at the expense of DSP processing power.

I will be happy to work on code for a dedicated audio clock as soon as somebody else provides a way for me to easily source such a clock as an existing or easy-to-assemble component. I won't before such a thing is available. I also believe that the 8 MHz clock can do a lot.

About multi-channel: I first need to procure an STM32F7 Nucleo board, as the STM32F4 Discovery does not provide several SAIs that can be synchronized together.

I'm now focusing on the USB Async, because I consider it a must-have, and for aesthetic reasons: I don't see the purpose of having additional hardware there (KISS principle). This USB stuff is software stuff. An MCU should be the best implementation...

By the way, I will continue to update the status here and publish code when there is significant progress.

JMF
 
Small good news: yesterday, I started using a Linux machine to interface with the STM32, instead of the Windows machine... and it behaves differently. I now get shorter or longer USB frames according to the feedback I provide. This is still not perfect, as what I get when I ask for more samples is strange...

But this is a step forward 🙂

JMF
 
I started using a Linux machine to interface with the STM32, instead of the Windows machine... and it behaves differently. I now get shorter or longer USB frames according to the feedback I provide.

Please read this : https://github.com/borgestrand/sdr-widget/blob/audio-widget-experimental/AW_readme.txt

It says :

The projects implement Asyncronous USB Audio in a generic MCU, the AT32UC3A3 from Atmel.

The project started out as the SDR Widget, an open source system of firmware, hardware and programs for HAMs / Radio Amateurs.
Late 2010 work was initiated to extract and reuse parts of the project for Hi-Fi / Hi-End audio purposes. This project was named Audio Widget (AW).
Please note that implementations of the Audio Widget and its drivers come under many names. Depending on your firmware version it may appear under names containing "QNKTC", "Yoyodyne", "Audio Widget", "SDR Widget", "ASIO UAC2", "DG8SAQ".

...

USB Audio Class 1 is the default operating mode of the AB-1.2. It is plug-and-play on Windows, Linux and Mac. UAC1 supports sample rates up to 24/48. One might believe 24/96 is the UAC1 limit, but that is not the case with the Atmel AVR32 MCU the project is built on. Their chip has endpoint buffer size limited to 512 bytes. 1kbyte would allow 24/96 on UAC1. UAC1 uses USB 1.1 (Full Speed). UAC1 uses asynchronous USB audio with the DAC as the timing reference.

...

Listening to audio UAC1 - Windows

The Audio Widget should show up in Device Manager under "Sound, video and game controllers".
Usually, your newly plugged in audio device will become the new default device. But that is not necessarily always the case. To make sure the Audio Widget is the default playback (and / or communication) device, do as follows:

- Right-click the little speaker icon in the bottom-right corner
- Left-click Playback devices
- Right-click DG8SAQ-I2C / Audio Widget / QNKTC... / Yoyodyne...
- Left-click Set as Default Device
- To use with Skype etc. left-click Set as Default Communication Device

On Windows 7 make sure you're sampling at 44.1 (or 48ksps) depending on your music. 44.1kHz is the sampling frequency of CDs and the most likely sampling rate used. That way you bypass the OS's built-in sample rate converter which would add artefacts to the sound. Some players may take exclusive control, but don't count on it.

- Left-click the little speaker icon in the bottom-right corner
- Left-click the icon on top of the volume control
- The Output Properties window should appear.
- Click Advanced
- Choose Default Format = 44100 from the pull-down menu
- Tick all Allow exclusive options
- Click Apply
- Click Enhancements
- Disable all sound effects
- Click Apply and close the window

...

Installing drivers - Windows

Driver installation is only needed on Windows.
You will need to download and install AWSetup.zip from: https://sites.google.com/site/nikkov/home/microcontrollers/audio-widget/

...

I have found that it is very useful to refer to the source code of the Linux uac2 driver and Apple's USBAudio driver when there are doubts about how the uac2 specs is interpreted. There are areas where the specs are not entirely clear and you have to experiment with the actual drivers to find out what works and what doesn't. For example, under Windows uac1, you have to provide rate feedback values that the driver expects. The delta between one feedback rate and the next feedback rate has to exceed a certain minimum - otherwise it will be ignored completely. Under Linux uac2, it has automatic rate feedback format detection and the way it changes sampling rate is different from OSX:

The difference is just that OSX does:
1 - SetAltInterface(0)
2 - SET_CUR freq
3 - SetAltInterface(1)
4 - Start streaming
and between 1 and 4, there's a time gap of ~740ms.

Linux does:
1 - SetAltInterface(0)
2 - SetAltInterface(1)
3 - SET_CUR freq
4 - Start streaming
and between 1 and 4, there's a time gap of only ~11ms.

Note that Linux sets the alt interface to 1 first before changing the freq, and starts streaming within a very short time. This caused trouble at the firmware as the rate feedback is still based on the OLD sampling rate before things settle down to the new sampling rate. The OSX way of doing things is more gentlemanly 🙂 However, the Linux developer thinks his way is the correct way so we have to deal with this quirk 🙂

See the Fact Sheet : http://www.henryaudio.com/uploads/fact_sheet_mkII.pdf

Hope this helps,
Steph
 
Please read this : https://github.com/borgestrand/sdr-widget/blob/audio-widget-experimental/AW_readme.txt

It says :

The projects implement Asyncronous USB Audio in a generic MCU, the AT32UC3A3 from Atmel.

...


See the Fact Sheet : http://www.henryaudio.com/uploads/fact_sheet_mkII.pdf

Hope this helps,
Steph


Thanks Steph, this is a very good resource that I use regularly. I think it is the best open-source example of USB Async code. It appears not portable to my app (RTOS + different API for USB), but it is definitely very inspiring.

My next step is to implement a "dirty" feedback strategy, to convince myself that I can get the music with no glitches by adjusting the flow.
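For the record, such a "dirty" strategy can be as crude as nudging the nominal 10.14 feedback value proportionally to the buffer fill level. A sketch, with hypothetical names and an arbitrary gain:

Code:
/* Sketch of a crude proportional feedback: nudge the nominal 10.14
 * feedback value according to the buffer fill level. The gain constant
 * and names are hypothetical. */
#include <stdint.h>

#define FS_NOMINAL_10_14  (48UL << 14)   /* 48 samples/frame (48 kHz), UAC1 10.14 */

/* fill and target are in samples; returns the 10.14 value to transmit. */
uint32_t feedback_value(int32_t fill, int32_t target)
{
  int32_t error = target - fill;         /* positive: ask the host for more */
  /* one LSB of 10.14 is 1/16384 of a sample per frame; scale gently */
  return (uint32_t)((int32_t)FS_NOMINAL_10_14 + error * 4);
}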

Now that I have a responsive host, I will be able to move forward.

However, I don't know what fails with W10...

JMF
 
Do you think the STM32F4 is fast enough to run convolution?

If convolution is the primary function you're wanting, I'd not choose the ARM M4 as it's not really optimised for that (convolution, in my understanding, is a long FIR filter). It doesn't have the architecture for efficiently running FIR filters: it can manage the multiplies well enough, but not the parallel data moves. That's not to say it can't run them, but you won't approach the performance of a dedicated DSP at the same clock rate.
 
Hi abraxalito, a trivial 102-tap FIR filter routine could be something like this, relying on the MADD instruction and the HILO register pair. It needs to run each time a new audio sample is available.

Code:
Acc_Reg =  (long long) C000*S000;
Acc_Reg += (long long) C001*S001;
Acc_Reg += (long long) C002*S002;
...
Acc_Reg += (long long) C100*S100;
Acc_Reg += (long long) C101*S101;
Acc_Reg_Shift;
Sout_Assignment;

"Cxxx" are the FIR filter coefficients (possibly from Flash memory)
"Sxxx" are the consecutive audio samples (from RAM)
"Sout" would contain the resulting filtered audio sample

The "Cxxx" coefficients shall be read using a pointer incrementing from 0 to 101.
The "Sxxx" audio samples shall be read using a pointer implementing a circular buffer, having a depth of 102.

Upon completing the FIR filter routine, the coefficients pointer shall be reset, and the audio sample pointer shall remain as is.
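In plain C, honouring the pointer discipline just described, the routine could look like the sketch below (array names are hypothetical; on a Cortex-M4/M7 a compiler will typically map the inner multiply-accumulate onto a long MAC instruction such as SMLAL):

Code:
/* Minimal C sketch of the 102-tap FIR with a circular sample buffer,
 * matching the pointer discipline described above. Array names are
 * hypothetical; coeff[] would be filled with the real taps. */
#include <stdint.h>

#define NTAPS 102U

static const int32_t coeff[NTAPS];   /* FIR coefficients, e.g. in Flash */
static int32_t       state[NTAPS];   /* circular buffer of samples, in RAM */
static uint32_t      head;           /* index where the newest sample goes */

int32_t fir_run(int32_t new_sample)
{
  int64_t  acc = 0;
  uint32_t idx = head;

  state[head] = new_sample;          /* overwrite the oldest sample */

  for (uint32_t t = 0; t < NTAPS; t++)
  {
    acc += (int64_t)coeff[t] * state[idx];          /* multiply-accumulate */
    idx = (idx == 0U) ? NTAPS - 1U : idx - 1U;      /* walk back in time */
  }

  head = (head + 1U) % NTAPS;        /* advance for the next call */
  return (int32_t)(acc >> 31);       /* rescale (depends on the Q format) */
}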

Each MADD instruction executes in 10 nanoseconds or so.
Ask yourself if this is for multiplying and accumulating two 32-bit data words already sitting in the required ALU registers, or if this is for multiplying and accumulating 32-bit data words possibly sitting in Flash or RAM.
Imagine the data and the coefficients are in Flash or RAM.
Imagine there is only one data bus between the Flash/RAM and the ALU.

Many Digital Signal Processors feature :
  • a way to auto-increment a pointer,
  • three independent data buses, three independent RAM planes for simultaneously reading two new operands and fetching a new instruction,
  • a zero-overhead loop arrangement in case of tight loops, more efficient than unrolling the code.
Consider the STM32F4 architecture: the ALU connects to the I-BUS (instruction bus) and to the D-BUS (data bus), connecting to a smart cache memory that actually splits instructions from data, enabling some parallelism to some extent. Unfortunately, the AHB bus matrix is in the way, restricting the speed somewhat.

Consider the STM32F7 architecture: the ALU embeds 4 KB of I-Cache memory and 4 KB of D-Cache memory, and is tightly coupled to 16 KB of instruction memory (ITCM) and 64 KB of data memory (DTCM). Once the instructions and data are in cache, the AHB bus matrix gets out of the way. I don't know if the MADD instruction can have one operand in D-Cache and the other operand in DTCM, enabling the two operands to converge on the ALU in parallel.

Consider the DSP56K architecture: the ALU is tightly coupled to the Program Data Bus (PDB), the X Data Bus (XDB), and the Y Data Bus (YDB), three buses operating in parallel. The two operands of the MACC instruction converge on the ALU in parallel (XDB and YDB). At the same time, the ALU fetches the next instruction through the Program Data Bus (PDB).

IMO, provided you have some luck with your assembler or C compiler, you could persuade an STM32F7 to behave like a decent Digital Signal Processor. Possibly you'll miss the pointer auto-increment feature. Possibly, a practical FIR filter tap for audio, dealing with 32-bit audio data in RAM and 32-bit coefficient data in Flash, could execute in three clock cycles. Add another 30 cycles for managing the circular buffer and for formatting and storing the filtered audio sample. Say the clock is 200 MHz. A 512-tap FIR filter could thus execute in 2.8 µs. At a 48 kHz sampling frequency, the audio samples occur every 20.8 µs. We can thus execute up to six 1024-tap FIR filters, enough for implementing a linear-phase stereo 3-way crossover having a frequency resolution of 94 Hz. At the moment, I don't know if this is sharp enough for a bass/mid crossover operating at 300 Hz.

Wondering if you could take some time to verify this.
In assembler, or in C?
 

Attachments:
  • STM32F4 architecture.png (29.3 KB)
  • STM32F7 architecture.png (49.5 KB)
  • DSP56K architecture.png (45.7 KB)
Hi Steph, I shall try to recall how I programmed my software emulation of the SAA7220 digital filter chip. This seems relevant, as there's some interest on another thread in a replacement for that chip, and only yesterday I identified a potential contender for a device I could build it on: the Cortex-M0 STM32F072 (which has dual I2S ports).

The SAA7220, if memory serves correctly, has 120 taps (4X oversampling means only 30 taps' worth need calculating at the output sample rate). I seem to recall my code needed a CPU clock rate around 55 MHz (an overclock of the LPC1114 I was using at that time). Given it's stereo, this means I achieved the calculation of 240 taps at 44,100 Hz, so 10.6 Mtaps/s.

If I have recalled correctly, then on the M0 we need about 5 CPU ticks per tap. I'd expect an M4 to do a little better than that, but not hugely better, because it's data-move limited rather than calculation limited. Yes, I would hope the M7 does better still due to its caches, but I've not looked into the architecture in any depth.

Let me now compare your estimates for the M7 with my results on the M0. Your clock rate is about 4X mine, so I reckon at 200 MHz the M0 could do a single 1750-tap FIR. You're estimating an equivalent single 6140-tap FIR. My gut tells me you may be optimistic here, as I'd not expect the M7 to be over 300% more efficient at FIRs than the M0, but 200% might be reasonable.

My code had full loop unrolling and was running purely out of RAM, so zero wait states. It was hand-coded in assembler.
 
Hi abraxalito, are you using ST Nucleo boards? I have a few STM32F401 Nucleo and STM32F411 Nucleo boards. I would like to reproduce a few experiments, configuring the STM32 I2S as slave, receiving the Bit_Clock and Frame_Sync (no MCLK) from a SPDIF to I2S converter purchased on eBay, like "Assembled Coaxial Optical Receiver Board TO IIS I2S output CS8412 for amplifier".
Applying a FIR filter, and a few IIR biquad filters.
Outputting audio through I2S (also slave) to a PCM5102A DAC purchased on eBay, like "PCM5102 DAC Decoder Encoder I2S Player Assembled Board 32bit 384K Beyond PCM1794".

The initialization code would be written in C, thanks to the STM32CubeMX utility.
The audio DSP code would be written in assembler whenever possible.

Possibly adding a console (on the STM32 +5 V serial TX/RX) to enable some limited user interactivity, like enabling/disabling the I2S interrupt, and grabbing and flashing the FIR filter coefficients and the IIR biquad filter coefficients. This way, during normal operation, there is no need to connect to the Integrated Development Environment (IDE).

In the diyAudio Digital Line Level section, we could open a dedicated thread: "STM32 bare metal Audio DSP with Serial Console".

I remember what you said a long time ago, when ST introduced the STM32 product line. You were right. Nowadays, STM32 chips and STM32 prototyping boards are becoming so inexpensive that there is no fatal cost penalty if, for a high quality stereo 3-way crossover, we are forced to rely on three or four STM32 boards running in parallel.

Say there is a board acting as audio input, say a SPDIF to I2S converter, or a USB to I2S converter. Such a board acts as local I2S master and local audio clock master. There is no STM32 chip over there.

STM32 Board #1 connects to the I2S source, doing some global equalization or dynamics processing.
STM32 Board #2 connects to STM32 Board #1, filtering for the two woofers.
STM32 Board #3 connects to STM32 Board #1, filtering for the two medium drivers.
STM32 Board #4 connects to STM32 Board #1, filtering for the two tweeters.
No 256 x Fs (MCLK) coming from the I2S source.
No 256 x Fs (MCLK) coming from the STM32 boards.

You can hook any DAC that you want, provided it doesn't require an external 256 x Fs MCLK.
All Non-Oversampling DACs are good.
All DACs featuring an internal PLL multiplying the Frame_Sync are good.

Do not try recycling such setup, for implementing an oversampler.
Try keeping everything simple, close to the bare minimum.

STM32F0 boards to compute two 200-tap FIR filters and eight IIR Biquad filters.
STM32F4 boards to compute two 400-tap FIR filters and sixteen IIR Biquad filters.
STM32F7 boards to compute two 800-tap FIR filters and thirty-two IIR Biquad filters.

Roughly.

Working at this, we'll be able to focus on audio DSP, this time for real.
 
Hi Steph,
well done for your work here,
so you need 4 I2S? Which STM32 has 4 I2S?
"All DACs featuring an internal PLL multiplying the Frame_Sync are good."
Yes, a PLL helps to improve sync, but what about jitter without external clocks?
 
Hi Steph, well done for your work here.
Actually, it's not my work; I'm only sketching what could be done quickly (yet not dirty), without needing to deal with the USB Audio issues that JMF11 tends to underestimate. IMO, JMF11 should ask the SDR Audio Widget people about 1) teamworking on a proper STM32F4 and STM32F7 implementation, or 2) teamworking on a bare metal audio DSP framework for the Atmel AT32UC3A.

So you need 4 I2S?
No. In my previous post, I show how to rely on an array of inexpensive STM32 microcontrollers, each featuring one full-duplex I2S for the audio, and possibly one SPI or I2C for the codec control data. This being said, nothing prevents using an STM32F7 microcontroller, grabbing stereo audio using one SAI/I2S (as input), and feeding three stereo DACs using three SAI/I2S (as outputs). This way you implement a single-chip stereo 3-way crossover (and preamplifier). And you still have plenty of I2C and SPI for the housekeeping (DAC mode control, DAC volume control, remote control receiver, alphanumeric display, Bluetooth, etc.).

All DACs featuring an internal PLL multiplying the Frame_Sync are good.
Indeed, that's what I've said. IMO, as soon as you target some decent audio quality, you should not involve a microcontroller's PLL to generate a MCLK. Those are digital PLLs, possibly exhibiting hiccups; and anyway, because the microcontroller's internal power supply is far from clean, the logic gate thresholds are not rock solid, and this will also generate jitter to some extent. IMO, nowadays you can live without a MCLK signal on the PCB, as there are more and more decent oversampled ADCs (AD1974), DACs (AD1933, AD1934, PCM5102A, TDA7801) and codecs (AK4558EN, AD1928, AD1937, AD1938, AD1939) to be used as audio clock slaves, featuring an internal PLL in charge of multiplying and smoothing the Frame_Sync, for internally generating a high quality, local MCLK within the chip.

Where should the local audio clock master reside? IMO, the easiest approach is to have the I2S audio source acting as I2S audio clock master. Now comes the question: how do you manage to get an I2S audio source? There can be a SPDIF to I2S converter. Depending on the chip that's inside, you will inherit a dirty or a clean Frame_Sync. Let's try an assembled board that's available from eBay, based on a CS8412. There can be a USB Audio to I2S converter. Let's try an assembled board that's available from eBay, based on a PCM2706. We can compare it with another assembled board that's also available from eBay, based on a CM6631A. We can compare it with a miniDSP miniStreamer, with an XMOS microcontroller inside.

The beauty of such an easy approach is that, unlike the SDR - Audio Widget arrangement, there is no need for the usual 256 x Fs quartz oscillators (44.1 family and 48 family). In our easy approach, the STM32 microcontroller doesn't know about MCLK. It behaves as I2S slave, receiving a frame sync and receiving a bit clock. Provided the STM32 internal clocks are fast enough, the STM32 I2S hardware block will be able to sample the I2S data, bit clock and frame sync lines fast enough. This is what I understand when playing with the STM32 clocks and PLLs, using the STM32CubeMX configuration software.

As a footnote, I must insist that the answer to the question "Can low jitter be achieved with STM32 microcontroller?" is YES (in theory), even when relying on the plain trivial STM32 USB Audio examples provided by ST. Actually this is not STM32-specific.
For achieving this, you shall rely on a 256 x Fs quartz-based master audio clock, which you convert into a finely tunable 256 x Fs VCXO by loading the quartz with varicap diodes instead of the usual 22 pF capacitors. The control voltage emanates from an integrating capacitor (say 10 µF) fed by currents of say 1 mA. A 1 mA charging current during 20 µs (one audio sample period) must occur when the USB audio buffer tends to overflow (doing this, you speed up the USB buffer data readout). A 1 mA discharging current during 20 µs (one audio sample period) must occur when the USB audio buffer tends to underrun (doing this, you slow down the USB buffer data readout). As soon as the system reaches equilibrium, the current pulses become very rare, and the capacitor output voltage remains stable (a high load impedance is required), meaning that the quartz oscillator gets no modulation anymore. The output quality gets very close to that of a fixed-frequency 256 x Fs quartz oscillator.
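As a sanity check of the orders of magnitude, with the values assumed above: one 1 mA pulse lasting 20 µs into a 10 µF integrating capacitor moves the control voltage by ΔV = I × t / C = (1 mA × 20 µs) / 10 µF = 2 mV, so the varicap tuning voltage creeps in very fine steps, and the oscillator frequency is never abruptly modulated.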
The 256 x Fs MCLK gets available for oldschool oversampling ADCs, DACs, or Codecs requiring a MCLK.
From there, you can elect one of them as I2S master.
That particular one will deliver a high quality frame sync and high quality bitclock, to the whole system.
The STM32 (or any other microcontroller) and the remaining ADCs, DACs, or Codecs to act as I2S slaves.

Regards,
Steph
 
Raspberry pi 2 to get an I2S audio source. There is also I2C for controls.
Ouch! Definitely wrong. Let me tell you why. The Raspberry Pi 2 doesn't feature an audio clock that's independent from the CPU clock. OK, you would say, it is understood that if the Raspberry Pi 2 gets used as I2S master, we'll never attain decent audio quality, unless we add an ASRC (Asynchronous Sample Rate Converter) chip between 1) the "bad" I2S that's writing the audio data and the audio clocks into ASRC port #1, and 2) the "rock solid" I2S that's reading the resampled (aka FIR-filtered) audio data and writing the "rock solid" audio clocks into ASRC port #2. Such a "rock solid" audio clock is to be generated by a high quality quartz oscillator (no PLL, and possibly one flip-flop for guaranteeing a 50% duty cycle). Please note, the port #2 audio data is no longer "bit perfect", because of the calculations done inside the ASRC, acting as a quite elaborate FIR filter. Jitter (time-domain impurity) thus gets replaced by smear (frequency-domain impurity). This is what an ASRC does. This is not optimal. But that's not bad, compared to other arrangements.

There is another method, a kind of degenerate zero-MIPS ASRC, consisting of sending the Raspberry Pi 2 I2S audio data and audio clocks into a buffer containing 20 ms of audio. The audio buffer gets written with the "bad" I2S clock exhibiting jitter. The same audio buffer gets read using a "rock solid" I2S clock coming from a quartz oscillator (no PLL, and possibly one flip-flop for guaranteeing a 50% duty cycle). This time, the port #2 audio is "bit perfect" (because there are no calculations on the audio). And the port #2 audio can't exhibit jitter, because of the high quality quartz oscillator acting over there. Do you think this is a perfect solution? It is far from good!

The first issue is the slight delay (20 ms) caused by the audio buffer, problematic when there is question of watching a movie (you may perceive the lips as out of sync). This is not fatal, as most media players feature a setting for specifying a delay (positive or negative) for reaching perfect lip sync.

The second issue is that there is little chance that the "bad" I2S frame sync is at the exact same frequency as the "rock solid" I2S. One can be at 47,998 Hz, while the other can be at 48,002 Hz. The frequencies differ by 4 Hz, from a nominal sampling frequency of 48,000 Hz. After 1 second, the read and write pointers diverge by 4 samples. In case the audio buffer has a depth of 1024 samples, it will overflow or underflow after 256 seconds, which is approx 4 minutes and 16 seconds. Such a buffer overflow or underrun is catastrophic. When underrun occurs, for realigning the read pointer with the write pointer, you need to repeat the same sample during 10 ms. When overflow occurs, you need to hop to a sample that's 10 ms in the future (dropping 10 ms of signal). Doing so, you regain a "headroom" of 10 ms in each direction.
You may apply a momentary FIR lowpass filter during the sample repeat or sample hop, for rendering the fiddle less audible.
Have you noticed: the allowed de-alignment is not 20 ms, but 10 ms in each direction? Which means that we'll overflow or underrun every 2 minutes and 8 seconds.

More complex schemes are feasible, like identifying and measuring the long-term de-alignment trend (the frequency mismatch), and computing the exact time at which it is required to skip or duplicate one audio sample, for maintaining the read pointer (port #2) aligned with the write pointer (port #1). In case the frequency mismatch is 4 Hz, we need to skip or duplicate a sample every 250 ms, as the sketch below illustrates. This is what you hear in many ill-conceived audio systems. For sure, that's not what we want here.
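The arithmetic of such a scheduler is trivial; a sketch, with hypothetical names:

Code:
/* Sketch: given a measured frequency mismatch between the writing and
 * reading clock domains, compute the period at which one sample must be
 * skipped or duplicated to keep the pointers aligned. */
#include <math.h>

/* e.g. fs_write = 48002.0 Hz, fs_read = 47998.0 Hz */
double correction_period_s(double fs_write, double fs_read)
{
    double mismatch = fs_write - fs_read;   /* samples gained (or lost) per second */
    if (mismatch == 0.0)
        return 0.0;                         /* perfectly locked: no correction needed */
    return 1.0 / fabs(mismatch);
}
/* With the 4 Hz mismatch of the example above: 1 / 4 = 0.25 s,
 * i.e. one sample skipped or duplicated every 250 ms, as stated. */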

So what? How do we avoid the "big glitch" of a 10 ms sample repeat or 10 ms sample hop every 2 minutes and 8 seconds, and at the same time avoid the "tiny glitch" of a 1-sample repeat or 1-sample hop every 250 ms?

A trick consists of detecting a silence, and immediately doing the "big glitch" sample repeat or sample hop, immediately re-creating a "headroom" of 10 ms in each direction. Do you understand the waters we navigate here? Is this decent?

In case there is no silence, and the read pointer is dangerously getting close to the -10/+10 ms mismatch limit, one can blend a "medium glitch" of a 2 ms sample repeat or sample hop, with a temporary 1-sample repeat or sample drop every 250 ms.

Well, I guess you understand the principles.

This is what you get when Windows or Linux tries merging or syncing two digital audio signals using few computing resources (managing pointers is not a heavy load).
You now realize that this gets very problematic when the two audio signals have uncorrelated clock domains, albeit the same nominal sampling frequency.

This is like hell.

And at this point, you may prefer relying on an ASRC, which proposes a completely different tradeoff, as said above: replacing time-domain impurity with a small degree of frequency-domain impurity.

Let's now envision USB Audio. The Raspberry Pi 2 can be a very good audio source, this time over USB. This, on the condition that the Raspberry runs the proper software driver, taking advantage of the USB2 asynchronous audio modality.
The Asus Xonar U7 soundcard is very good at this. It supports the USB2 asynchronous audio modality. Inside, there is a quartz oscillator, exploited as high quality audio clock master. Everybody should rely on such a setup. And it is an 8-channel soundcard. Which means that the day the Raspberry Pi 2 can run JRiver Media Center for Linux, and persuade JRiver Media Center to output the audio to an emulated VST, such a VST can act as a stereo 4-way crossover, sending the 8-channel audio over USB to the Asus Xonar U7 soundcard.
Look how tiny the Raspberry Zero is.
1Ghz, Single-core CPU
512MB RAM
USB On-The-Go
Micro USB power
HAT-compatible 40-pin header
$20 maybe
Wait a moment, where is JRiver Media Center supposed to take the audio from? Please tell me...
From I2S?
From a SPDIF to I2S converter?
Wait a moment, we know that the Asus Xonar U7 is going to operate as audio clock master, regularly requesting audio samples.
That's plainly incompatible with a SPDIF source, converted to an I2S source, connected on the HAT-compatible 40-pin header.
The two audio clock domains are uncorrelated.
We are kaput.
From Ethernet or WiFi perhaps, grabbing streamed audio coming from the internet (Spotify, a NAS, etc.)?
Wait a moment, we know that the Raspberry has a poor Ethernet implementation, the Ethernet controller being hooked on the USB. This will cause congestion with the USB audio.
We are kaput.
The last-hope solution is to grab audio from the Asus Xonar U7 analog Line-In. Quite a shame.
Albeit not stupid, if the aim is to play vinyl in realtime.
Requires a high quality turntable, high quality cartridge, and high quality preamp.
Will be exposed to acoustic feedback.
Quite doubtful, thus.

Now look, there is a way to use the Raspberry Pi 2 I2S that has not yet been investigated a lot.
It consists of taking the Raspberry Pi 2 for what it is: a CPU that's quite fast, but unable to produce delicate things like a high quality MCLK, and a high quality frame sync operating at 44.1 kHz and 48 kHz.
Consider configuring the Raspberry Pi 2 I2S as slave.
Not even dealing with a MCLK.
Say there is audio coming from a CD/DVD/Bluray player, through SPDIF.
Say you connect a SPDIF to I2S converter.
It will output the I2S frame-sync, bit clock, and audio data.
The SPDIF to I2S converter acts thus as SPI master, and audio clock master.
The Raspberry pi 2 reads such audio, as SPI slave.
The Raspberry pi 2 returns processed audio, still as SPI slave.
And the DAC (possibly an assembled PCM5102A DAC) reads such audio, also as slave.
The whole stuff gets thus sequenced by a sole device : the SPDIF to I2S converter.
The end result will only depend on the quality of the frame-sync that's delivered by the SPDIF to I2S converter.
But wait a minute, there is only one I2S, meaning that such arrangement can't implement a stereo crossover.
We are kaput.

Another conceivable arrangement is to have the same ambition as JMF11: transforming the Raspberry Pi 2 into a high quality USB soundcard. It should emulate some reputed USB2 Async soundcard, say the miniDSP miniStreamer.
There shall be a quartz oscillator as audio clock master, generating MCLK = 256 x Fs.
There shall be a DAC reading such MCLK.
Such DAC to operate as I2S master.
The Raspberry Pi 2 operating as I2S slave, not even touching MCLK, but receiving the frame sync and the bit clock.
The Raspberry Pi 2 requesting audio packets through USB.
The Raspberry Pi 2 outputting stereo audio on I2S, as slave.
Okay, this can work, on the condition that one succeeds in emulating a miniDSP miniStreamer. There is a big programming effort required.
But wait a minute, there is only one I2S, meaning that such arrangement can't implement a stereo crossover.
We are kaput.

Next time you want to rely on a Raspberry Pi 2 for processing audio, think twice.

Regards,
Steph
 
... Say you connect a SPDIF to I2S converter.
It will output the I2S frame-sync, bit clock, and audio data.
The SPDIF to I2S converter acts thus as SPI master, and audio clock master.
The Raspberry pi 2 reads such audio, as SPI slave.
The Raspberry pi 2 returns processed audio, still as SPI slave.
And the DAC (possibly an assembled PCM5102A DAC) reads such audio, also as slave.
The whole stuff gets thus sequenced by a sole device : the SPDIF to I2S converter.
A kind of typo above. I should have written:

... Say you connect a SPDIF to I2S converter.
It will output the I2S frame-sync, bit clock, and audio data.
The SPDIF to I2S converter acts thus as I2S master, and audio clock master.
The Raspberry pi 2 reads such audio, as I2S slave.
The Raspberry pi 2 returns processed audio, still as I2S slave.
And the DAC (possibly an assembled PCM5102A DAC) reads such audio, also as slave.
The whole stuff gets thus sequenced by a sole device : the SPDIF to I2S converter.