Acoustic Horn Design – The Easy Way (Ath4)

You may not like the show, but the audio on Disney's "Mandalorian" is extraordinary, as is the visual. The film business is so far ahead of the audio one that it is alarming. I found this out when I looked into video compression. The video guys had done an enormous amount of work on perception - way beyond anything done for audio.

And in video they accept the science. In audio it's still "sounds good to me, that's all that counts."

^ Good point!

My friend may work on that, I'll check with him.

You two would probably have fun talking shop. I've never seen someone who invests so much time with audio. It's his day job and it's also his hobby.
 
What I fail to see is any rationale for increasing DI(f) other than "it sounds good to me." I'm not a big fan of that kind of evidence.

I wonder if it is beneficial to adjust directivity based on the human ability to localize sounds at different frequencies and our preferences for reflections.

In this research paper on sound source localization they say,

Interestingly, due to the physical properties of the head, preponderant cues depend on sound stimulus characteristics. There is thus a change in cuing around 1500 Hz, with ITD used below and ILD above [9], [12]. This leaves a “gray area”, roughly between 1000 and 3000 Hz, where the binaural cues are inefficient (Fig. 5).

At the same time, in Loudspeakers and Rooms for Sound Reproduction—A Scientific Review Floyd Toole said,

• Persuasive evidence points to several beneficial and few negative effects of early reflections. However, sound reproduction brings some conflicting requirements, and more research is required to identify what control of overall reflections is appropriate. That research should take into account the normal multichannel loudspeaker configurations and the primary roles played by each of the channels.
• A room with abundant reflections is not likely to exhibit audible evidence of comb filtering from any single reflection.
• Multiple reflections improve the audibility of timbral cues from resonances in the structure of musical and vocal sounds.
• Early reflections improve speech intelligibility.
• Early lateral reflections increase our preference for the sound of music and speech.

If someone likes increasing DI, it might be because they want wide directivity in the range of hearing where sound source localization is in the 'gray zone' (deriving the benefits Toole notes without sacrificing imaging) and narrow directivity at higher frequencies, where humans are able to localize sharply again.

However, I don't know if sheetrock walls absorb those high frequencies, meaning wide directivity at higher frequencies wouldn't matter very much anyway. Maybe in a normal room narrowing the high frequencies doesn't help because they don't reflect anyway. That's something I'm still trying to figure out.
 

When talking about "early reflections" it is important to be clear on the timing. Reflections arriving later than about 10 ms are all good, but earlier ones degrade imaging. Toole and other acousticians usually mean > 10 ms, since in most larger spaces this is always the case. But in a home situation the VER (Very Early Reflections, < 10 ms) are an issue. This is precisely where Floyd and I disagreed, but he seems to have softened his position somewhat.
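To put rough numbers on the 10 ms boundary, here is a minimal sketch that converts the extra path length of a reflection into a delay and labels it with the threshold from the post above. The function names, the 343 m/s speed of sound, and the example path lengths are assumptions made for illustration; they are not figures from Toole or from any measurement.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 C
VER_THRESHOLD_MS = 10.0     # boundary quoted in the post above


def reflection_delay_ms(direct_path_m, reflected_path_m):
    """Delay of a reflection relative to the direct sound, in milliseconds."""
    extra_m = reflected_path_m - direct_path_m
    if extra_m < 0:
        raise ValueError("reflected path cannot be shorter than the direct path")
    return extra_m / SPEED_OF_SOUND_M_S * 1000.0


def classify_reflection(delay_ms):
    """Label a reflection 'very early' (< 10 ms) or 'early' (>= 10 ms)."""
    return "very early" if delay_ms < VER_THRESHOLD_MS else "early"


if __name__ == "__main__":
    # Hypothetical path lengths: (direct, reflected) in metres.
    for direct_m, reflected_m in [(3.0, 4.5), (3.0, 7.0)]:
        delay = reflection_delay_ms(direct_m, reflected_m)
        print(f"{direct_m} m direct, {reflected_m} m reflected: "
              f"{delay:.1f} ms -> {classify_reflection(delay)}")
```

Under these assumptions a reflection needs about 3.4 m of extra path to clear 10 ms, which is easy in a large hall and hard in a typical living room.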
 
Reflections later than about 10 ms do not degrade imaging (much), but those earlier than 10 ms will. How this is determined is a long story told somewhere here at DIY. It has to do with looking at the impulse responses of gammatone filters, which emulate hearing. These get shorter and shorter as frequency rises, so at 1 kHz the gammatone impulse response might be, say, 10 ms long (guessing at the number right now); hence a reflection arriving after 10 ms will not interfere with the direct sound at 1 kHz and above. But at everything below 1 kHz it will, and this degrades our hearing's ability to separate the direct from the reflected sound. For a 20 ms reflection the boundary would be 500 Hz, and for 5 ms it would be 2 kHz. You get the idea.
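The rule of thumb in that post amounts to a fixed number of cycles: 10 ms at 1 kHz, 20 ms at 500 Hz and 5 ms at 2 kHz all work out to about 10 periods. A minimal sketch of that inverse relationship follows. The 10-cycle constant comes straight from the guessed 1 kHz figure, so treat it as a placeholder and not as a measured gammatone duration; the function names are hypothetical.

```python
CYCLES_PER_WINDOW = 10.0  # assumed: derived from the guessed "10 ms at 1 kHz"


def auditory_window_ms(freq_hz):
    """Approximate length of the auditory filter response at freq_hz, in ms."""
    return CYCLES_PER_WINDOW / freq_hz * 1000.0


def lowest_clear_frequency_hz(delay_ms):
    """Frequency above which a reflection at delay_ms no longer overlaps
    the direct sound inside the auditory filter response."""
    return CYCLES_PER_WINDOW / (delay_ms / 1000.0)


if __name__ == "__main__":
    for delay_ms in (5.0, 10.0, 20.0):
        boundary = lowest_clear_frequency_hz(delay_ms)
        print(f"{delay_ms:.0f} ms reflection: overlaps the direct sound "
              f"below about {boundary:.0f} Hz")
```

If the real gammatone durations differ from the guess, only the constant changes; the inverse proportionality between delay and boundary frequency is the point being made.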
 
I also have to say that the plot that is being shown is confusing to me because according to Blauert the mid frequency range has a high capability of resolving localization because the ears have both ITD and ILD cues at their disposal. The cues are not conflicting but enhance one another. Moore has stated 2 kHz as the point where our hearing is most acute.
 

Here's the link to the paper I got that image from. It's from 2018 so I thought it captured earlier work but maybe it didn't or -- far more likely -- maybe I misunderstood the paper.

Sound source localization - ScienceDirect

With consideration of Blauert's findings, I was thinking a loudspeaker with narrow directivity below 1,000Hz, wide directivity 1,000-3,000Hz, and then narrow again above 3,000Hz might resolve ok because we use broadband cues to localize. By controlling above and below the "inaccuracy area" (if that 2018 paper is right, and if I interpreted the paper correctly) you would control directivity in the frequency ranges that helped you resolve localization using the combination of ITD and ILD cues in a broadband signal.

Why not control the entire spectrum, including the "inaccuracy area"? I don't know, maybe that is the best way. The reason I started thinking about widening the "inaccuracy area" is that it appeared Toole -- if I understood him correctly, too -- said reflections improve clarity, specifically in the 2,000Hz interaural crosstalk cancellation dip he described in chapter 7.1.1 Problems With The Stereo Phantom Image. He said the reflections fill in what we miss and reduce the perception of the dip.

In the scope of this thread, that would make me lean toward waveguides designed to operate as you approach 3,000Hz. Or another option, a waveguide that works from 1,000Hz with the addition of >10ms delayed side channels that operate in the "inaccuracy area." But only if my understanding of Toole and the 2018 paper is accurate. And my understanding is the big weak link here.
 
Thanks, I see where that comes from now. I can also see that this is a dip that stems from the same issue that Floyd talks about for mono signals: a weak inter-aural cross-correlation at about 2 kHz. I would guess this is pretty narrow, and it's hard for me to see if what you suggest would be better or worse. Remember those studies were not about loudspeaker imaging in a room, but were quite specific to single sources in open spaces.

But thanks, it was an interesting review of the latest thinking.
 

Is this quote the reason you liked that paper?

The paper also highlighted the large amount of research that still needs to be conducted in order to fully understand the details of ITD fluctuations. Further investigation of the effect of various physical parameters of aspects of the sound recording and reproduction chain on the ITD fluctuations created at the ears of the listener is of particular importance. The task of relating the perception of ITD fluctuations to subjective preference is also significant. The results of this research will be presented in future publications.

I thought this might be a useful quote as well. It appears to be consistent with what I've read in the loudspeaker directivity chapter of Floyd Toole's book, in the sense that different music styles or program material benefit from different and possibly conflicting design choices.

It is likely that the preferred magnitude of ITD fluctuations will be dependent on the characteristics of the programme material. Whilst for some programme material it may be preferable to create a magnitude of ITD fluctuations as large as possible, this may not be suitable for other programme material.
 
I would guess this is pretty narrow, and it's hard for me to see if what you suggest would be better or worse. Remember those studies were not about loudspeaker imaging in a room, but were quite specific to single sources in open spaces.

Thanks for reviewing it and interpreting it for me in the context of loudspeaker imaging in rooms. Your response leads me to think it might be beneficial to take a modular approach to speakers based on what you said here:

https://www.diyaudio.com/forums/multi-way/103872-geddes-waveguides.html#post1237595

I view sound system design in three major frequency ranges - low frequencies, where modal effects and the room dominate, there is no imaging or psychoacoustics to worry about, it's simply a matter of adequate output and smooth spatial and frequency response (more on this in another thread); 200 Hz - 1000 Hz, probably the most forgiving of the three regions, our auditory system is only just beginning to be capable of resolving spatial aspects (localization) and it is not yet very good at resolution of time delays, reflections and frequency response. If you are going to compromise something do it here as it will have the least noticeable effect. Above 1 kHz is where we live as far as music is concerned. This region is ultra sensitive to time delays, reflections, frequency response, diffraction, all the things that tend to mess up coloration and imaging. Mess up this frequency region and you won't be able to recover the sound quality. Here is not where you want to make compromises for sound quality.

I mean a literal modular approach. Multi-subs, Low-mid section 200Hz to 1,000Hz, then a separate mid-high section above 1,000Hz. The mid-high section could be swapped out based on listening material.

There is one correct answer to room modes, pretty much one correct answer to low-mid and it's forgiving anyway, but the 1,000Hz up section has more than one answer and is least forgiving.
 
I wouldn't overstate 'forgiving', in the 200-1k region. It is relative, and the magnitude of what you can get away with has to be in perspective with the expectations of the system.

Therefore it is one thing to say I'll let a crossover fall within this region, and another to abuse it. It is not without challenge to get good results here given the nature of the modes and various practical issues.