
Results of the 2023 TSR Spring Speaker Shootout


READ ON FOR OUR DETAILED REPORT!!!


If you’d like to discuss the results of this event or are interested in any of these speakers, please contact us at info@thescreeningroomav.com or call us at 720-377-3877. Content like this is difficult and expensive to produce; the business we hope to earn from this is what funds the next big event. If you find this kind of content valuable, please reach out to us about any of your audio / video needs so we can keep putting out this kind of good, reliable information. We'll be hosting a webinar soon to discuss in detail what we learned from the shootout. If you’re interested in attending the webinar, be sure to sign up for our mailing list.




INTRODUCTION

The Screening Room has always been passionate about the science of audio and video. For example, when we do product comparisons we go to great lengths to eliminate testing variables, with the goal of creating a truly level, competitive playing field. Most product comparisons online or even in publications don’t put in place the stringent controls necessary to achieve reliable results, as such efforts are expensive and time consuming. However, taking the time to put strict scientific testing protocols in place is the only way to ensure you get valid data (for a good example of the lengths we go to, check out our recent projector shootout). This Speaker Science and Shootout event proved that to us once again: the time, effort, and dollars put in were not insignificant, but the end results are far more meaningful.


With this event, our intent was to see if the peer-reviewed science showing that people overwhelmingly pick the most neutral and accurate speakers in blind listening tests would hold up. For years we've followed the groundbreaking research correlating listener preference with speaker measurements conducted by Dr. Floyd Toole, Dr. Sean Olive, and others at Canada's National Research Council (NRC). This research was later refined and further developed at the Harman X Research Group in Northridge, California (note that the ongoing research at Harman X is shared with the audio community at large, published and peer-reviewed in the true spirit of science).


Dr. Sean Olive (left) and Dr. Floyd Toole (right)

The work of Toole, Olive, and others involved in this project has driven innovation not only at Harman, but also at other companies that took the data from these experiments and incorporated it into their own designs. Perhaps the most amazing thing that came out of the NRC research is the fact that listener preference in a listening test can be predicted with 86% confidence just by looking at anechoic speaker measurements in a format called a “Spinorama.” This predictive power is so great that the Spinorama was recently formalized as an industry standard - the CEA2034 "Standard Method of Measurement for In-Home Loudspeakers" - which in turn has been integrated into the new CEDIA RP22 immersive audio design document.


We decided to put that research to the test and experience it for ourselves. We designed our very own "speaker shuffler" from scratch so we could conduct a truly blind listening test. We invited industry experts like Dr. Sean Olive, Senior Fellow at Harman International and past president of the AES (Audio Engineering Society); Erik Wiederholtz, CTO at Perlisten Audio; Jon Herron, Managing Director at Trinnov Audio; and Mark Seaton, CEO of Seaton Sound. We invited people to loan us speakers to test, and we invited the public at large to come experience the testing with us.


Before the listening tests, Dr. Sean Olive delivered a 60-minute presentation on the 40+ years of research that laid the foundation for our tests, then took questions from the audience. You can watch his full presentation here. It’s fascinating stuff, so well worth your time:


Dr. Sean Olive's "Science of Sound" Presentation


Erik Wiederholtz of Perlisten also gave a presentation later in the day that complemented Dr. Olive's, focusing purely on speaker science and engineering. Unfortunately, our videographer had to leave before Erik’s presentation or we would absolutely have included it here (sorry, Erik!).



TEST PREPARATION

As mentioned above, the point of the test was to see if the speakers that measured the best would indeed rank the highest in listening preference. To that end, we put out a national “casting call” asking people to bring speakers to the event, specifically targeting well-regarded brands like Focal, Legacy, B&W, Wilson, KEF, etc. Unfortunately, and understandably, people are not often willing to transport an expensive speaker to an event like this one, but we did end up with a nice cross-section thanks to some very generous participants who brought their own speakers for comparison. The idea was to have a variety of speakers with a variety of measurement characteristics.


Our event was wide open to the public – in fact, we had attendees from all over the country, and even an individual from Canada. Anyone was welcome to check our methods and test protocols. The goal of the shootout was to eliminate bias having to do with price, appearance, and popularity and concentrate solely on the listening experience. This video gives an excellent overview of the test protocols and gives a great feel for what it was like to be in the room those two days:


Test Overview Video


CAVEATS


Caveat #1

We have been deliberating for months now about whether we should share the data we have from Saturday, the first day of testing. Ultimately, we decided not to. Why? Because the data was statistically unreliable for a variety of reasons:

  • Saturday’s crowd was so large that the seating distance / position was not even close to the same for each listener. In fact, many listeners had somewhere between two and four people sitting in front of or beside them, sometimes both. Several were way off to the side, against the wall. We apologize to attendees: we did not properly track the attendee count before Saturday’s event, so not everyone could get a good listening position.

  • Dr. Olive suggested that everyone record their relative listening positions to see what could be learned from trying to correlate listener position with scoring, but only a few participants took the time to do that. Without that data, no real conclusions could be drawn.

  • Human bodies all packed together and blocking the speakers make terrific sound absorbers, which means we ended up with a massively over-absorbed room. Since this is so far from what could reasonably be experienced by any listener in a real-world environment, we felt that the limited data we had could not be extrapolated into anything meaningful and might actually lead to confusion and misrepresentation.

  • The scores we did get were almost random. Test operator John Schuermann was at the very back of the room and agreed with those seated in the back that it was very hard to tell one speaker from another due to all the bodies and their resulting absorption.

Picture of Saturday’s crowd – see how densely packed we were. Dr. Olive is delivering his presentation, so several people from the front row moved to standing positions in the back. There were even more people up front, off to the side.


Sunday, on the other hand, we had a much smaller crowd (11 listeners for the “under $10K” shootout round), so we were able to give just about everyone a chance at the “sweet spot.” We also took Dr. Olive’s suggestion to whittle down the track list and to pick tracks that were fuller bandwidth and therefore more revealing of speaker deficiencies. For these reasons, we are much more confident about Sunday's data, particularly in the "under $10K per pair" category. Even from John's position in the back of the room, the differences in sound between the speakers were obvious.


Caveat #2

One of the speakers in the “over $10,000” category – the Revel F328Be - had a bad crossover, which resulted in a 6 dB drop-off in high-frequency response above 2 kHz (right at the tweeter crossover point). We had Bob Lord of the AES confirm the issue when we compared the speaker to its mate in our showroom. More notes on this in the “Round 2” section below.


As a result of all the above, we are planning a follow-up event, taking into account what we learned from the first event AND replacing the speaker with the bad tweeter / crossover. Those who are interested in participating or bringing a speaker should reach out to us as we plan the next event.



SPEAKER SHOOTOUT RESULTS

We need to preface all this by stating that the point of the shootout was NOT to say "this speaker sucks" or "this speaker was great." The point of the event was to see if the 40+ years of research showing that preference for a speaker can be predicted from how well it measures would stand up.


So, how well did the science of predicting speaker preferences hold up? Read on to find out!


LISTENING TEST CONDITIONS

All speakers were hidden behind an acoustically-transparent screen. At no time did anyone in the room – including test operator John Schuermann – know which speaker was playing.


All speakers were level-matched using pink noise to within a fraction of a decibel using a Trinnov Altitude16 processor feeding a Trinnov Amplitude8m amplifier. Music source was Roon. All DSP and crossovers were bypassed, and all speakers performed “on their own” without any tuning files or subwoofers. Each speaker was then moved into the identical listening position using the speaker shuffler designed by The Screening Room’s own Steve Crabb – see the above video (thanks Steve!).
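
To make “level matching” concrete, here is a minimal sketch of the trim arithmetic in Python. This is just an illustration, not the actual Trinnov workflow, and the SPL readings below are invented:

    # Hypothetical pink-noise SPL readings for each speaker at the main
    # listening position (dB SPL). Each trim brings that speaker down to
    # the level of the quietest one.
    measured_spl = {"A": 86.4, "B": 84.9, "C": 85.7, "D": 87.1}

    target = min(measured_spl.values())
    trims_db = {spk: round(target - spl, 1) for spk, spl in measured_spl.items()}
    print(trims_db)  # {'A': -1.5, 'B': 0.0, 'C': -0.8, 'D': -2.2}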


Speakers were labeled A, B, C, and D. When the curtain was pulled back, no one in the room knew which speaker would be which.


Below is a summary of the second day of testing, done after we corrected for the problems we had the first day. Here are the test results - please enjoy!


"The event was very educational - I love how TSR strives to bring industry experts to explain the science and the manufacturer’s approach to designing their products. I highly recommend attending these events if you have a chance." Jay W, Speaker Shootout Attendee

"There are lots of ways to skew a blind test. TSR took every possible precaution to ensure the shootout was fair and level matched." Mike F, Speaker Shootout Attendee





UNDER $10K PER PAIR CATEGORY


These are the scores from Sunday when we had a smaller group of people (as described above). The seating arrangement and distances on Sunday were much more typical of what someone would have in their own home theater / listening room, so we feel that these scores apply to most people’s home situations.


NOTE that these scores were derived by simply adding up the scores each speaker received across all tracks, then dividing that total by the number of listeners. Dividing again by the number of tracks yields the average track score. A sketch of this arithmetic appears after the ranked list below.


Here is an example of a typical score sheet, pulled randomly from the pile for the “Under $10K Speakers” session from Sunday. Each speaker was rated on each track on a scale from 1 to 10:


Bird on a Wire, Jennifer Warnes:

  • Speaker A: 7

  • Speaker B: 9

  • Speaker C: 7

  • Speaker D: 2

Take Five, Dave Brubeck:

  • Speaker A: 5

  • Speaker B: 8

  • Speaker C: 9

  • Speaker D: 2

Track list for this session:

  • Bird On a Wire – Jennifer Warnes

  • Battlestar Galactica – Stu Phillips

  • World in My Eyes – Depeche Mode

  • Parade of the Wooden Soldiers – Dallas Wind Symphony

  • Spanish Harlem – Rebecca Pidgeon

  • Take Five – Dave Brubeck

  • Fat Cry – Yello

  • Fast Car – Tracy Chapman


Here are the cumulative scores for all speakers in the under $10K category, ranked in order of preference:

  • Revel F226Be: Total score 684 / 11 listeners = 62.18, with an average track score of 7.77

  • JBL HDI-3800: Total score 586 / 11 = 53.27, with an average track score of 6.66

  • Triad Gold In-Room LCR: Total score 532 / 11 = 48.36, with an average track score of 6.05

  • Klipsch Heresy with upgraded Crites crossovers: Total score 345 / 11 = 31.36, with an average track score of 3.92
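
As referenced above, here is a minimal sketch (in Python) of the scoring arithmetic, using the published totals. The per-listener and per-track figures it prints match the rankings above:

    # Scoring arithmetic for the under $10K session: 11 listeners each
    # rated 4 speakers on 8 tracks, 1-10 per track.
    totals = {
        "Revel F226Be": 684,
        "JBL HDI-3800": 586,
        "Triad Gold In-Room LCR": 532,
        "Klipsch Heresy (Crites crossovers)": 345,
    }
    LISTENERS, TRACKS = 11, 8

    for speaker, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        score = total / LISTENERS       # the published cumulative score
        per_track = score / TRACKS      # the published average track score
        print(f"{speaker}: {score:.2f} (average track score {per_track:.2f})")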


People at the event keeping us honest: Jon Herron (Trinnov), Mark Seaton (Seaton Sound), Dr. Sean Olive (Harman X Research Group), Erik Wiederholtz (Perlisten).


NOTE that these photos are of the hidden lineup from the "Over $10K Per Pair" shootout group (shared to show our speaker shuffler mechanism).




COMMENTARY ON ROUND ONE RESULTS (INCLUDING SPEAKER MEASUREMENTS)

So, how well did the listener preference rankings line up with how well the speakers measured?


The speakers that measured the best also scored the best. In fact, this held true right down the line!


"Putting your money where your mouth is in a blind shootout is nerve-wracking, but science prevailed. I preferred the speakers that measured best. In fact, I preferred them in the exact order of how well they measured." Mike F, Speaker Shootout Attendee

Here are the “Spinorama” CEA2034 speaker measurements for each model in this shootout, in order of how they ranked. The top lines in the Spinorama graph essentially represent anechoic frequency response. Note that the smoother the frequency response, the better the speaker did in the listening test.


But frequency response isn’t the only thing shown on a CEA2034 graph – there are other elements that contribute to the overall sound quality of a speaker shown in it as well. A complete primer on “how to read a Spinorama” is included at the bottom of this article. *


"The event was very educational - I love how TSR strives to bring industry experts to explain the science and the manufacturer’s approach to designing their products. I highly recommend attending these events if you have a chance." Jay W, Speaker Shootout Attendee

Revel F226Be CEA2034 “Spinorama”


JBL HDI3800 CEA2034 “Spinorama”



Triad In-Room Gold LCR CEA2034 “Spinorama”



Klipsch Heresy CEA2034 “Spinorama”


NOTE – We did receive some online criticism about one aspect of this listening test. A couple of individuals felt that this was an unfair test of the Triad Gold LCR, as it is intended to be used with a subwoofer. However, the bass roll-off points of most of the speakers - including the "winning" Revel F226Be - are very similar to those of the Triad model. For this and other reasons, we don’t consider this criticism to be all that valid. Steve Crabb of TSR – the prime architect of the listening test – wrote a detailed response to this criticism that can be found at the bottom of this article. **


In the interest of fairness, we have extended a sincere invitation for these critics to participate in our next shootout event.








OVER $10K PER PAIR CATEGORY

Published here are the scores for the “Over $10K” category. We waffled on publishing these scores because the Revel F328Be had a clear defect (evidently a bad crossover, which showed up as a -6 dB drop from 2.1 kHz to 20 kHz). However, in the interest of total transparency we will share the results, with the caveat that one speaker was not performing properly.


Again, these are the scores from Sunday when we had a smaller group of people.


IMPORTANT NOTE – As mentioned above, we discovered a problem with the Revel F328Be speaker used in this portion of the test. Several people noticed that it sounded muffled, as if a blanket were over the tweeter. When we brought it back to our reference showroom we took an REW measurement, and the speaker is either defective or was damaged in transport. There was a big 6 dB drop at 2 kHz which continued all the way up to 20 kHz. This is right at the crossover point between the midrange driver and tweeter. We compared it to the other F328Be at our showroom, which measured to spec. The graphs are posted below for comparison. AS A RESULT, WE ARE GOING TO SCHEDULE A NEW LISTENING TEST WITH ALL OR MOST OF THESE SPEAKERS IN THE NEAR FUTURE.


REW measurements taken in-room of the two F328Be speakers. Note the red line, which shows the frequency response of the damaged or defective speaker (the green line is the properly operating speaker).
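
For the curious, here is a hedged sketch of how a defect like this jumps out of the data when you compare a speaker to its mate. This is not REW's own workflow, and the response values below are invented for illustration:

    import numpy as np

    def flag_deviation(freqs, ref_db, test_db, threshold_db=3.0):
        # Return (frequency, dB difference) pairs where the test speaker
        # deviates from the reference unit by more than threshold_db.
        diff = np.asarray(test_db) - np.asarray(ref_db)
        return [(f, d) for f, d in zip(freqs, diff) if abs(d) > threshold_db]

    # Hypothetical smoothed in-room responses, dB relative to 1 kHz:
    freqs  = [500, 1000, 2000, 4000, 8000, 16000]
    good   = [0.5,  0.0,  0.2, -0.3,  0.1, -1.0]
    broken = [0.4,  0.1, -5.8, -6.2, -5.9, -7.1]

    for f, d in flag_deviation(freqs, good, broken):
        print(f"{f} Hz: {d:+.1f} dB vs. reference")  # flags everything from 2 kHz up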

Note too that our group of listeners had dropped to seven at this point. It’s also fair to point out that the remaining group contained a good number of individuals over 50, which means there was likely some high-frequency hearing loss among these listeners. This could explain why the Revel F328Be scores were better than expected, considering that the speaker had impaired high-frequency response. Again, this is part of why we want to do this particular round of testing a second time.


NOTE 2 - These listening scores were derived just as in the previously described listening test: the scores each speaker received across all tracks were added up, then divided by the number of listeners.

Test conditions were identical to those described under LISTENING TEST CONDITIONS above: speakers hidden behind the acoustically transparent screen, level-matched with pink noise, no DSP, tuning files, or subwoofers, and each speaker moved into the identical listening position by the speaker shuffler.


The track list for this session was the same eight tracks used in the under $10K session above.

Here are the cumulative scores for all speakers in the OVER $10K category, ranked in order of preference. NOTE also that Steve put the $5,500 per pair JBL HDI-3800 into the mix as a “ringer” and to fill out the open fourth slot on the speaker shuffler:

  • Perlisten S7t (List $17,990 per pair in gloss black): Total score 363 / 7 listeners = 51.86, with an average track score of 6.48

  • JBL HDI-3800 (List $5,500 per pair): Total score 360 / 7 = 51.43, with an average track score of 6.43

  • Legacy Focus SE (List $12,400 - $15,000 per pair based on finish): Total score 337 / 7 = 48.14, with an average track score of 6.02

  • DEFECTIVE Revel F328Be (List $17,600): Total score 333 / 7 = 47.57, with an average track score of 5.95

People at the event keeping us honest: Jon Herron (Trinnov), Mark Seaton (Seaton Sound), Dr. Sean Olive (Harman X), Erik Wiederholtz (Perlisten).


AGAIN, please note we will be running this listening test again with a properly operating F328Be.


“After two rounds of testing the TSR team was able to give us a look behind the curtain. After seeing the custom speaker slide setup I was even more blown away by their dedication to the event and objective nature of the testing.” Brandon S, Speaker Shootout Attendee

"The speaker shuffler was impressive…I’m not aware of anyone else in the industry going to these lengths to compare products on their true performance and not marketing claims." Mike F, Speaker Shootout Attendee




COMMENTARY ON ROUND TWO RESULTS (INCLUDING SPEAKER MEASUREMENTS)

Once again, with the exception of the damaged or defective Revel F328Be, the speakers ranked in the order of how well they measured, just as predicted by the CEA2034 research.


Perlisten S7t CEA2034 “Spinorama”



JBL HDI3800 CEA2034 “Spinorama”



Legacy Focus SE Measurement Data


NOTE that we were not able to get CEA2034 measurements on this speaker. The manufacturer does not supply them, and the extreme weight of the speaker prevented us from shipping it out to be measured. However, we did find these measurements from a Polish audio publication. Since these measurements are not in the same format, we recommend against drawing any conclusions from what is reproduced here. For full context, here is the link to the review with measurements - https://audio.com.pl/testy/stereo/kolumny-glosnikowe/3113-focus-se




Revel F328Be CEA2034 “Spinorama” measurements.

Once again, the speaker in our shootout had a bad crossover, so these measurements don’t reflect the actual performance of the speaker in the test.



CONCLUSIONS AND TAKEAWAYS

Well, we learned a lot. First and foremost, we discovered that yes, listener preference can indeed be predicted by looking at speaker measurements in the CEA2034 “Spinorama” format. This was true right down the line.


Secondly, we learned a great deal about what to do and what not to do. As mentioned, we are going to put together a second event based on what we learned from this first go-around, namely:

  • Limit the crowd size so that everyone gets a chance in the “sweet spot.”

  • Do a listening test like this in a space where we have more control over the variables

  • Reduce the number of tracks but increase listening time (these were mistakes we made on Saturday that were rectified on Sunday)

  • Test speakers beforehand to make sure all are performing to spec

On the other hand, we think the methodology we used for the testing was absolutely valid. Steve’s speaker shuffler design worked flawlessly. The fact that the speakers were visually hidden from view (as well as their identities and price ranges) meant that we were able to keep as much bias out of the listening test as humanly possible. Within the limits of the caveats listed above, we absolutely stand by the results of this test.


NOTE - If you’d like to discuss the results of this event or any of these speakers in detail, please contact us at info@thescreeningroomav.com or call us at 720-377-3877. Also, we'll be hosting a webinar soon to present and discuss in detail what we learned from the shootout. If you’re interested in attending the webinar, be sure to sign up for our mailing list.


See below for important footnotes!

 


* CEA2034 “Spinorama” Primer

The top lines in the Spinorama graph essentially represent anechoic frequency response. Note that the smoother the frequency response, the better the speaker did in the listening test. This correlates with the research showing that the most accurate speakers are those that do the best in controlled listening tests. This really shouldn’t be surprising; for the same reasons our eye / brain combination can tell when colors are “off” on a video display, our ear / brain combination can also tell when sounds are “off” from how we experience them in real life. But frequency response isn’t the only thing shown on a CEA2034 graph – there are other elements that contribute to the overall sound quality of a speaker reflected in it as well.


Color-coded CEA2034 measurement of the Revel F228Be


On-axis Response - This represents the direct sound heard by a single listener sitting on the design axis of the loudspeaker. A flat frequency response is an absolute requirement for all electronic devices. Therefore, it is not surprising that loudspeakers with a flat on-axis frequency response have a higher probability of being preferred in double-blind listening tests.


Listening Window - The well-designed loudspeaker should deliver good sound to a group of listeners -- not just the person sitting on-axis. The listening window is the average frequency response measured for listeners sitting on and slightly off the reference axis of the loudspeaker. Loudspeakers that receive high sound quality ratings in double-blind listening tests tend to have listening windows with a flat frequency response.
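
To make the idea of a spatial average concrete, here is a small sketch. The nine angles are the CEA2034 listening-window set; the levels are invented, and the energy-average approach shown is one common way such curves are combined:

    import numpy as np

    # Hypothetical anechoic levels (dB) at a single frequency for the nine
    # listening-window angles: on-axis, +/-10 deg vertical, and
    # +/-10, +/-20, +/-30 deg horizontal.
    curves_db = [88.0, 87.8, 87.6, 87.9, 87.9, 87.2, 87.3, 86.1, 86.0]

    # Energy average: dB -> squared pressure, average, -> back to dB.
    listening_window = 10 * np.log10(np.mean([10 ** (db / 10) for db in curves_db]))
    print(f"Listening window at this frequency: {listening_window:.2f} dB")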


First, or Early Reflections -- Most of the sound we hear is reflected in rooms. The second loudest sound (after the direct sound) is the first reflected sound produced from the loudspeaker. Therefore, it is paramount that the sounds radiated by the loudspeaker in the off-axis directions generate early reflections that sound good. The shape of this curve should not differ greatly from the on-axis response curve.


Sound Power Response - This is a measure of the total sound radiated by the loudspeaker without regard to the direction in which it is radiated. The shape should be smooth and slightly downward tilting.


Sound Power and First Reflection Directivity Indices - These directivity indices tell us how the directivity of the loudspeaker changes as a function of frequency. At low frequencies, where wavelengths are long, most loudspeakers radiate sound omnidirectionally (DI = 0 dB). In forward-firing 2-way and 3-way loudspeakers, as frequency rises and wavelengths get shorter, more of the sound is radiated toward the front. The goal is to have this trend develop smoothly and gradually.
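
As a rough illustration of how the DI curves relate to the others on the graph (in CEA2034 the directivity indices are referenced to the listening window; the levels below are invented):

    # Hypothetical listening-window and sound-power levels (dB) at three
    # frequencies; the sound power DI is the difference between the curves.
    listening_window = {100: 88.0, 1000: 87.5, 10000: 86.0}
    sound_power      = {100: 87.8, 1000: 83.5, 10000: 78.0}

    sound_power_di = {f: round(listening_window[f] - sound_power[f], 1)
                      for f in listening_window}
    print(sound_power_di)  # {100: 0.2, 1000: 4.0, 10000: 8.0} -- rising smoothly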




** TSR’s response to online criticism re: the Triad test results. Note that TSR is itself a Triad dealer (pointed out as we were accused of trying to bash a “competing” speaker brand):


The suggestion that the test was flawed because "the Triad Gold In-Room LCR was designed to be used with a sub" misses the mark. All the speakers have similar bass extension capabilities. The Triad's -3dB point is 50Hz. The Revel F226Be that “won” the shootout in the Under-$10k category has a -3dB point of 48Hz, so almost the same extension as the Triad. The Klipsch Heresy that came in last rolls off at 70Hz. The JBL HDI-3800 that beat out the Triad by a small margin has a -6dB point of 37Hz, and is down 5dB at 45Hz.


We literally spent hours as a team debating whether to test all the speakers with or without subs. Both approaches have issues, and we considered all of them. We knew that if we tested with subs, people would criticize us for unfairly handicapping the speakers with better bass extension. We also knew that if we tested without subs, people would criticize us for unfairly giving the advantage to the speakers with better extension. We knew we'd be damned if we did or damned if we didn't. Considering the complexities of integrating multiple speakers with multiple subs, and considering set-up time and complexity given that we were setting up in a rented facility, we chose the approach that was simpler to execute.


So, while it's true the Triad isn't technically designed to be used as a full-range speaker and wouldn't be used that way in a designed theater, given its capable extension, the other similar speakers in the shootout, and how we conducted the test (using music at modest SPLs), it was not only reasonable to test it this way, we think it was the best, most fair way, and we spent considerable time and effort to reach that decision.


It's worth noting that Erin Hardison's measurements did reveal some issues with the Triad: a little non-linearity around 200Hz and 1kHz, some beaming of the dome tweeter, and some issues with directivity because there's no waveguide. But it is not a bad or terrible speaker by any means. – Steve Crabb


NOTE - If you’d like to discuss the results of this event or are interested in any of these speakers, please contact us at info@thescreeningroomav.com or call us at 720-377-3877. Also, we'll be hosting a webinar soon to present and discuss in detail what we learned from the shootout. If you’re interested in attending the webinar, be sure to sign up for our mailing list.


