The death of web usability testing as we know it?

31 Dec 2007 - 12:45am
19 replies
Oleh Kovalchuke
2006

I was reading Supercrunchers [1], and came across Omniture's Offermatica
[2].

Offermatica does real-time randomization and analysis of traffic across the
variants of a page. The randomization is important for the validity of the
results. The analysis is per session.

For instance, say you want to see how search box placement or font size
affects product sales. Build the layouts to be tested and see, in real time,
which one increases sales. I think this could lead to an incremental
microevolution of layouts, not unlike biological microevolution (and look
what good that has done...).
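To make the mechanics concrete, here is a minimal sketch of the
randomize-and-measure idea (illustrative Python only, not Offermatica's
actual API; the variant names and purchase probabilities are made up): each
incoming session is randomly assigned to one layout variant, and per-session
conversions are tallied per variant.

import random
from collections import defaultdict

# Hypothetical layout variants under test.
VARIANTS = ["search_box_top_left", "search_box_top_right", "larger_font"]

# Hypothetical per-variant purchase probabilities, used only to simulate traffic.
TRUE_RATES = {"search_box_top_left": 0.050,
              "search_box_top_right": 0.046,
              "larger_font": 0.053}

sessions = defaultdict(int)     # sessions served per variant
conversions = defaultdict(int)  # sessions that ended in a sale, per variant

def assign_variant():
    """Randomly assign an incoming session to a variant; randomization keeps the groups comparable."""
    variant = random.choice(VARIANTS)
    sessions[variant] += 1
    return variant

def record_sale(variant):
    """Call once for each session that ends in a purchase."""
    conversions[variant] += 1

# Simulate 30,000 sessions of live traffic.
for _ in range(30_000):
    v = assign_variant()
    if random.random() < TRUE_RATES[v]:
        record_sale(v)

for v in VARIANTS:
    print(f"{v}: {sessions[v]} sessions, {conversions[v] / sessions[v]:.2%} converted")

With enough traffic, the winning layout can be promoted and the cycle
repeated - the incremental microevolution mentioned above.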

So the questions I have are these:

- Have people used Offermatica?
- Since it brings actual statistical analysis into usability testing
based on sales goals, doesn't it lead to the death of (or at least a
significant dent in) conventional web usability testing (facilitator,
one-on-one, etc.)? Put "something" on the web and start iterating on a
daily basis.

Oleh

PS I understand that my message sounds like astroturfing. It is not. I am
only concerned about Nielsen's welfare.

------------------
[1] http://tinyurl.com/2nzrov
[2] http://www.omniture.com/products/optimization/offermatica - they use
rather long copy and a tiny font on that page; I wonder if those were
optimized with their software.

Comments

31 Dec 2007 - 8:42am
Patricia Garcia
2007

Oleh, I'm sure Nielsen is flattered by your concern. :)

I think what you mention will be used in conjunction with usability
testing. But in order to collect analytics, you are assuming a live
site, or at least a beta with lots of traffic. For new interfaces, or
for testing a design prototype before it's been developed, you still
use usability testing. I think what you are suggesting is to skip
that step, just put it out there, and see what happens? It's
possible bottom-line fanatics will do just that and feel justified
in skipping usability testing altogether. But, as we all know, there
are some things you simply cannot get from analytics alone.

And I'm not just saying this to keep myself employed, but there
will always be a place for some sort of usability testing. I would
be interested in seeing what kind of data Offermatica produces, and
I would probably incorporate it into my findings along with testing
results.


31 Dec 2007 - 9:46am
sylvania
2005

I agree that analytics, no matter how well collected and detailed, will never be a replacement for user observation. This system sounds like a great augmentation to usability testing, but the nature of people as fuzzy, mutable, and often unpredictable creatures makes direct observation the ideal (and often only) way to really understand what's going on in their heads. Analytics can tell you a lot about what a person does, but not what he thinks. And people often think differently than their actions suggest - sometimes radically so. The problem with analytics data is that it can lie about the user's intentions and perceptions, and it's often impossible to tease apart intended and unintended behaviour. (I have similarly grounded issues with eye-tracking technology.)

I'm a designer; I don't conduct usability tests but I rely on them - as well as other types of data - to inform design, and I can't imagine abandoning user observation, even a little bit. Actually, I'd be somewhat skeptical of relying solely on this type of system for incremental design. (I'm also wondering if they used this on their own site.)

Offermatica does sound really interesting, though.

Cheers,
Sylvania

31 Dec 2007 - 10:44am
Nicholas Iozzo
2007

Here is a great story about one of the earliest usability tests on the Macintosh.

http://www.folklore.org/StoryView.py?project=Macintosh&story=Do_It.txt&sortOrder=Sort%20by%20Date&detail=high

It is a very short story and worth knowing if you've never heard it before. But the conclusion drawn from this test could never have been drawn from analysis of user behavior alone; it was drawn from the facilitator making an observation and then questioning the user.

Nick Iozzo
Principal User Experience Architect

tandemseven

847.452.7442 mobile

niozzo at tandemseven.com
http://www.tandemseven.com/


31 Dec 2007 - 11:56am
Jeff Seager
2007

Oleh:
"Since it brings actual statistical analysis into usability testing
based on sales goals, doesn't it lead to the death of (or at least a
significant dent in) conventional web usability testing (facilitator,
one-on-one, etc.)? Put "something" on the web and start iterating on a
daily basis."

I'm not opposed to metrics and analytical tools, but the real
challenge is in the inferences we make from the numbers. To make
those inferences "in real time" based on a tiny sampling is going
to guarantee a lot of knee-jerk reactions that will probably run in
circles.

"Actual statistical analysis" may yield entirely unreliable data if
it's based on faulty assumptions or faulty interpretation. So to my
mind, this tool may be the birth of something useful but it isn't
the death of anything.

There's always somebody trying to sell us shortcuts or an easy way
out. Caveat emptor.


31 Dec 2007 - 12:06pm
Paul Trumble
2004

Multivariate and A/B testing tools have been available for almost as long as
the web itself. I've had metrics vendors tell me that Amazon runs as many as
8 different versions of their checkout system for the purpose of testing. And
yet we still do usability tests.

I do plenty of both on my team. They each have their place, and I find them
quite complementary. Split tests can be more expensive than they seem at
first, and more often than not they don't produce a clear winner. Analytics
tools of all types are very good at telling you what happens, but not so good
at telling you why it happens. And they can't tell you what will happen at
all.

Paul Trumble

31 Dec 2007 - 12:13pm
Fred Beecher
2006

On 12/30/07, Oleh Kovalchuke <tangospring at gmail.com> wrote:
>
> For instance, say you want to see how search box placement or font size
> affects product sales. Build the layouts to be tested and see, in real
> time, which one increases sales. I think this could lead to an incremental
> microevolution of layouts, not unlike biological microevolution (and look
> what good that has done...).

While you can use it for this, to my knowledge its main purpose is to test
campaign effectiveness. While I haven't used Offermatica in particular, I
have been involved in campaigns where we did A/B testing with a limited
audience. What we did was to test two different creative
treatments/messages. It was pretty helpful, as one message converted
significantly better than the other. When we launched the campaign
nationwide, we did so with the campaign that converted better. Obviously. :)
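
As a rough illustration of what "converted significantly better" means in
practice (hypothetical counts, not the actual campaign numbers), a standard
two-proportion z-test is one way to make that call:

from math import erf, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)               # pooled rate under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided, normal approximation
    return p_a, p_b, z, p_value

# Hypothetical counts: message A vs. message B, 5,000 impressions each.
p_a, p_b, z, p = two_proportion_z_test(conv_a=260, n_a=5000, conv_b=210, n_b=5000)
print(f"A: {p_a:.2%}   B: {p_b:.2%}   z = {z:.2f}   p = {p:.3f}")

A check like this only works when the conversion event is narrowly defined,
which is part of why limited scope matters (see below).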

I think the key to successfully using a product like this is to use it for
something with an extremely limited scope. I'd never, EVER use it for a
whole Web site. There are just too many confounding factors in that sort of
situation that prevent a product like this from giving actionable
information. Sure, it would give you data, but any conclusions you'd draw
from that data would be basically made up. Now, if there is a particular
area of the Web site that has a particular conversion or activity of
interest, then that might be a situation in which such a product can yield
actionable information.

- Fred

31 Dec 2007 - 12:52pm
Julie Palmer
2007

When I was at Delta, the usability manager implemented Optimost,
another multivariate testing tool. Its use was limited to pages or
web apps where conversion could be clearly measured, and we used the
results to inform and improve design on maintenance releases.

In no way, however, did it serve to replace usability testing.
Usability test plans were still created to guide design decisions
prior to product launch. We also implemented OpinionLab to gather
customer feedback to try to help us determine the "why" behind the
"what" we saw with the Optimost numbers.


30 Dec 2007 - 7:08pm
Paul Sherman
2006

Hi Oleh, IxDA'ers,

I've been mostly lurking / posting jobs, but I couldn't resist responding to
this post.

Offermatica sounds like an interesting tool, and I'd like to learn more
about it. I could see myself & my team adding it to our toolbox.

I doubt, however, that it or tools like it will replace or put a dent in the
amount of exploratory - or even summative - usability testing that occurs. I
have a few reasons for this claim. The first is an economic argument, the
others are more conceptual.

1. The UX industry as a whole has been experiencing steady growth since the
dot-bomb days. Even if tools like this do supplant *some* 1-1, in-person,
facilitated usability testing, I think the most that would happen is that
the rate of growth for that kind of utesting would slow slightly. Caveat: I
don't have hard (or even semi-soft) numbers at my disposal; this is IMO.

2. Not all web sites are transactional, so the metric of interest is not
always going to be sales or conversion. So the tool will not always be
appropriate.

3. Like data from survey research, data derived from behavioral traces (as
opposed to actual observation of behavior) always leads you back to the
"why?" question. Knowing *why* people did something on a site - or at least
having a good idea about why, which is what you can get with in-person
testing - provides interaction designers with much richer and more actionable
guidance for making big, significant conceptual design decisions.

There's much to be said for tweaking your way to optimization, however. If
you're already confident that your navigation and process flows are solid,
I'm all for a/b/multi testing a la Offermatica and tools like it. I just
don't think they put a stake through the heart of ye olde utesting methods.

- Paul Sherman


31 Dec 2007 - 2:48pm
D. Keith Robinson
2007

The answer, in my opinion, is "no" if for no other reason than that
designers need to spend time with their audiences in order to develop
the empathy and knowledge-base they need to make good design
decisions.

While something like Offermatica certainly seems useful and looks to
provide some very interesting data (under certain circumstances, as
others have mentioned) I'd say it'd have to be used in conjunction
with usability testing to get the most possible value out of it.


31 Dec 2007 - 3:51pm
Stew Dean
2007

Hi Oleh,

First up, I have to come clean and say I've never been a fan of
usability testing: for existing sites you should already know why
the site is failing. I've worked with third-party usability test
results, and nine times out of ten the user comments are interesting
but the conclusions are terrible.

In short, I don't feel there's any need for usability testing anyway
when there are much better user research techniques, such as using
multiple competitor sites with users to give you an idea of what
they're really after, and good old-fashioned talking to people, one on
one, away from their computer, to get an idea of what they're after.

If you are going to test, then test prototypes, but only in context
(context is king).

That off my chest: by the sound of it they're doing what I've known as
A/B testing - which has been used in advertising for quite a few years
now - except here they're changing items on a site.

I personally did a stint working on a small .com site and found that a
lot can be learnt from making small changes and keeping an eye on the
web stats. I could see how many people were on the site, and each day
I'd have a look at the figures to see where people were going on the
site (all using free software). I went on to work for large companies
that got web stats about once a month and hadn't really set up their
four-figure software to give them anything useful.

It looks like someone is just trying to provide a solution for these
dysfunctional organisations (that's most large organisations) that
have too many barriers between decision makers and the raw information
about who is visiting their site and what they're doing whilst they're
there.

Personally I'm happy for 'usability testing' to die out and be
replaced by user research, and this approach, which can be done without
the software, sounds like a good technique. It fits into the whole
idea of 'put it live and see what happens', which can often bring you
results quicker and better than getting 10 folks in a room behind a
half-silvered mirror and expecting them to tell you how to redesign
your site.

--
Stewart Dean

1 Jan 2008 - 2:09am
Phil Chung
2007

Stew and Oleh,

Usability testing / user research has remained integral to our early testing
of new websites, software, and IVRs, despite increasing use of analytics. You
can't always get the "why" from analytics that usability tests / user
research can more aptly provide, as Julie and others brought up. Usability
testing may not be as effective or reliable as A/B testing or post-release
analytics for uncovering every design problem (as Molich's CUE studies
suggest), but I also don't see how the latter two can be carried out more
quickly for a new product.

Oleh, who says there is no statistical analysis based on sales goals with usability testing?

"Since it brings actual statistical analysis into usability testing based on sales goals,"

Furthermore, there are often significant business implications with the
higher-profile A/B tests or analytics work (e.g., the results are going to
show up on some director's desk), such that running a quick usability study
is usually worth the trouble. We have done studies even with very small
sample sizes (due to time constraints), predicting major issues and the "why"
behind them, issues that resurfaced tenfold in the pilot. Needless to say,
such occurrences increased our credibility with the business.

Stew, you mention that the conclusions from the third-party usability tests were terrible -- I see that as a problem with your third party's ability to interpret user comments and behavior and translate them into effective design recommendations, not with usability testing itself (I hope they are not on this list!).

One caveat: I do see a lesser role for usability testing when design standards and guidelines become solidified and routine. For example, if you're simply designing a variant or slightly updating an existing system, a full usability test is probably less appropriate. That is where analytics and your experience as the designer should tell you what works and what doesn't.

Phil Chung

"First up I have to come clean and say I've never been a fan of
usability testing as for existing sites you should know already why
the site is faiiing. I've worked with third party usability test
results and 9/10 the user comments are often interesting but the
conclusions terrible.

In short I don't feel there's any need for usability testing anyway
when there are much better user research techniques such as using
multiple competitor sites with users to give you an idea of what
they're realy after and good old fashioned talking to people away from
their computer one and one to get an idea of what they're after."


2 Jan 2008 - 3:17am
Oleh Kovalchuke
2006

A few comments based on the biological microevolution analogy.

1) Just as in biology, the "why" question is irrelevant to the final
measurable outcome as long as the outcome is optimized (sales, click-through
- whatever is measured). And just as in evolution, the "why" question is
still important and will be debated in academia.

2) People are notoriously poor at articulating their motivations (look at
the psychoanalysis industry, for example). Take the search box placement
example: would users be able to say why positioning the box on top of the
left nav is better than in the top right corner?

3) The difference in sales between the two design choices could be a
statistically significant 5%. Depending on volume, 5% could translate into
millions of dollars. Would conventional testing detect a 5% difference in
the outcome? (See the rough sample size sketch after this list.)

4) Finally, I think the randomized, real-time statistical analysis could
lead to a modified, more agile development process. I wonder if we could call
it "Optimized Design Drift" or, perhaps, "IDE - Intelligent Design
Evolution"?

Thanks everyone for your discussion.

--
Oleh Kovalchuke
Interaction Design is the Design of Time
http://www.tangospring.com/IxDtopicWhatIsInteractionDesign.htm

2 Jan 2008 - 4:38am
Steven Pautz
2006

While I believe this kind of approach can certainly be valuable,
particularly when dovetailed with other techniques, I wonder if its use
might shift some teams' or stakeholders' perspectives more towards
short-term factors.

If all of the metrics are session-based (as they seem to be), the results of
this kind of analysis would likely favor a design that best suits
"immediate" customers -- customers who perform some desired behavior *now*,
rather than in a future visit -- potentially at the expense of designs
which, hypothetically, might provide a better balance between short-term and
long-term concerns.

While this would probably be good for many businesses and contexts, it's
also fair game for the "it depends" card -- much like the balance between
hype-oriented writing versus "just the facts" writing.

Are there any passive, (semi-)automated tools or techniques out there that
"get at" more long-term factors like brand loyalty, community health, etc?
Is such a thing even possible to automate, or can it only be performed by a
strategically-minded person/team?

----------------------------
Steven Pautz
spautz at gmail.com
http://stevenpautz.com/

2 Jan 2008 - 2:36pm
Oleh Kovalchuke
2006

A few additional thoughts.

It is an *intelligent* design evolution: the initial design and the
possible iterations would be informed by the known best practices, and the
best practices themselves would be put to the test. The "Do It"/"Dolt"
example Nick mentioned could have been tested against "Submit", "OK",
"Apply", "Run" and other possible button labels, as well as with different
typefaces.

If this is an example of design microevolution analogous to the natural
selection of genes, are there examples of more disruptive macroevolution of
design analogous to the runaway sexual selection in nature (peacock tail,
our own brain)? I think the disruptive macroevolution of design can be found
in the relatively insulated design research of academia and in the outcomes
of Google's 70/20/10 time allocation model [1].

Oleh

[1] http://www.workforce.com/section/01/feature/25/24/14/index.html


2 Jan 2008 - 11:57pm
Jared M. Spool
2003

On Jan 2, 2008, at 3:17 AM, Oleh Kovalchuke wrote:

> 1) Just as in biology, the "why" question is irrelevant to the final
> measurable outcome as long as the outcome is optimized (sales,
> click-through - whatever is measured). And just as in evolution, the
> "why" question is still important and will be debated in academia.

"Why" is irrelevant until you're asked to repeat a past success.

Jared

Jared M. Spool
User Interface Engineering
510 Turnpike St., Suite 102, North Andover, MA 01845
e: jspool at uie.com p: +1 978 327 5561
http://uie.com Blog: http://uie.com/brainsparks

3 Jan 2008 - 12:19am
Oleh Kovalchuke
2006

On Jan 2, 2008 9:57 PM, Jared M. Spool <jspool at uie.com> wrote:

> "Why" is irrelevant until you're asked to repeat a past success.
>
>

Indeed. And armed with the updated "best practices" for the starting
designs, you repeat the process with new (or perhaps the same) goals, but
inevitably in a new context.

It is *intelligent* design evolution after all - the starting point will not
be a primordial soup, and the perpetually updated results will differ as
well, with the exception of niche platypuses and the unenviable fate of
design dinosaurs.

Oleh

3 Jan 2008 - 1:25am
Jared M. Spool
2003


Let me know how that works for you.

:)

Jared

Jared M. Spool
User Interface Engineering
510 Turnpike St., Suite 102, North Andover, MA 01845
e: jspool at uie.com p: +1 978 327 5561
http://uie.com Blog: http://uie.com/brainsparks

3 Jan 2008 - 1:44am
Oleh Kovalchuke
2006

Me? I am walking with dinosaurs.

Oleh


--
Oleh Kovalchuke
Interaction Design is the Design of Time
http://www.tangospring.com/IxDtopicWhatIsInteractionDesign.htm

4 Jan 2008 - 2:48pm
Dante Murphy
2006

<quote>
I think the key to successfully using a product like this is to use it
for something with an extremely limited scope. I'd never, EVER use it
for a whole Web site. There are just too many confounding factors in
that sort of situation that prevent a product like this from giving
actionable information. Sure, it would give you data, but any
conclusions you'd draw from that data would be basically made up. Now,
if there is a particular area of the Web site that has a particular
conversion or activity of interest, then that might be a situation in
which such a product can yield actionable information.

- Fred
</quote>

When I was with GSI Commerce, we began to use Offermatica for exactly
this purpose. At the time it could not be used to evaluate a workflow;
the architecture was that you would designate an area of a tested page
as an "m-box", then put two or more variants in play and analyze their
results. Multiple campaigns could be evaluated at the same time, but
the data became suspect if the campaigns overlapped.
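
Conceptually it worked something like this sketch (hypothetical Python, not
Offermatica's actual implementation; the m-box name and variant files are
made up): the surrounding page architecture stays fixed, each session is
stuck to one variant of the designated "m-box" region, and only that region's
content changes.

import random

# Hypothetical campaign: one designated "m-box" region with two content variants.
MBOX_VARIANTS = {
    "homepage_hero": ["offer_free_shipping.html", "offer_10_percent_off.html"],
}

# Remember each session's assignment so a visitor sees the same variant all session.
assignments = {}

def mbox_content(session_id, mbox_name):
    """Pick a variant for this session (sticky), leaving everything outside the m-box untouched."""
    key = (session_id, mbox_name)
    if key not in assignments:
        assignments[key] = random.choice(MBOX_VARIANTS[mbox_name])
    return assignments[key]

def render_page(session_id):
    # The surrounding layout is identical for every visitor; only the m-box slot
    # differs, so every variant has to fit the same hole in the page.
    hero = mbox_content(session_id, "homepage_hero")
    return f"<header>...</header>\n<div id='mbox-hero'><!-- {hero} --></div>\n<footer>...</footer>"

print(render_page("session-123"))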

One of the other shortcomings was that the page architecture had to be
consistent; only the contents of the "m-box" could change. The variants
could be different sizes, but your layout had better accommodate that,
or the whole page would get wonky (which happened a few times).

It was not by any means a way of testing experience or workflow, and
unless they have significantly improved their system architecture, it
still isn't. All you can do is see whether one way of treating one
section of one page is better than another. Hardly a substitute for
usability testing or good interaction design, and probably not a
particularly valuable tool.

Dante Murphy | Director of Information Architecture | Digitas Health
229 South 18th Street | Rittenhouse Square | Philadelphia, PA 19103 | USA
Email: dmurphy at digitashealth.com
www.digitashealth.com
