There's a lot to be said for the “wisdom of crowds.” I heard James Surowiecki speak about it at the 2008 Agile Conference and it's a fascinating topic. At the recent Agile Australia conference, Doug Blue from Seek (an international job-seeker website group) spoke about “letting the audience decide” the fine-tuning of the user interface as a last step in usability testing. By showing a change to a small group of users, Seek is able to monitor crowd wisdom to help select the preferred interface. Some of the changes are almost invisible to the users; for example, tiny pink pixel highlights on an “email me more jobs” link increased its usage by 27 percent. Doug also explained his approach in this interview.
While this type of testing is becoming standard across the industry, it is very easy to get wrong. Doug explained that something as simple as having two drop-down lists on a salary search affected the number of job applications they received. There is also a danger in mixing user interface testing with functionality testing in a live environment, as Google and Delta recently discovered.
What would today's equivalent be for the famous trash icons of early PC desktops? If you use crowd wisdom, the choice would have to be an icon pair: the thumbs up/thumbs down, like/dislike icons. Recently, Google, a pioneer of crowd wisdom interface testing, appeared to overstep the mark by replacing these icons in YouTube. A Google spokesperson said, “We are currently running experiments showing different Google+ buttons in YouTube in order to provide the best user experience.” They replaced the thumbs up/thumbs down buttons with a G+ logo and +1, Like, or Share. This drew a strong response (with some strong language, as well) from Wil Wheaton, who compared the change to forcing people to “like” something before letting them see it. While some people saw this as a logical architectural step, merging Google+ functionality into a YouTube interface goes beyond standard crowd wisdom interface testing.
Annoying any users is bad enough, but Delta Airlines alienated some of its most valued customers in an exercise designed specifically to do the opposite. In a blog post, Delta explained that it was all part of “improving the search and shopping experience,” with a phased installation of search as they were “careful not to disrupt the booking experience for our best customers.”
When you are doing these live user experiments, one key idea is to use only a small sample (say 5 percent) to limit the impact of change. Delta apparently chose a much larger group (all non-frequent flyers) to release a major functionality change that needed to be understood and verified with internal testing. Whether it was due to lack of testing or lack of follow-up on internal testing, Delta now has close to a worst-case scenario with a lot of bad press, a U.S. Department of Transportation investigation, and a massive exercise in trying to reassure the frequent travelers who were meant to be unaffected.
What went wrong? The frequent flyer searches appear to have been tweaked to focus on quality, giving the fastest return trip in a smaller time window (typically on a more expensive day trip), while the standard searches returned a much wider variety of return trips (including cheaper red-eye and multiple-stop options). As the cheapest flights are highlighted in search results, the impression was that frequent flyers were getting overcharged. It also took Delta three weeks to undo the change, which is an eternity in the Internet Age.
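To make the "small sample" idea concrete, here is a minimal sketch of one common way to pick that 5 percent: hash each user ID into a fixed bucket so the same user always sees the same version. This is an illustration only, not a description of how Seek, Google, or Delta select their test groups. The function name `in_experiment`, the experiment label `pink-highlight`, and the `user-N` IDs are all hypothetical.

```python
import hashlib

BUCKETS = 10_000  # resolution of the split: 0.01 percent per bucket


def in_experiment(user_id: str, experiment: str, fraction: float = 0.05) -> bool:
    """Return True if this user falls inside the experiment's sample."""
    # Hash the experiment name together with the user ID so that every
    # experiment gets its own repeatable split of the user base.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Reduce the first 8 bytes of the digest to a bucket number 0..9999.
    bucket = int.from_bytes(digest[:8], "big") % BUCKETS
    # Users in the lowest `fraction` of buckets see the changed interface.
    return bucket < fraction * BUCKETS


# Hypothetical user base, used only to check the size of the sample.
users = [f"user-{n}" for n in range(100_000)]
sampled = [u for u in users if in_experiment(u, "pink-highlight")]
share = len(sampled) / len(users)
```

Because the assignment depends only on the user ID and the experiment name, a returning visitor never flips between versions, and capping `fraction` keeps the number of people exposed to a bad change small.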
The “wisdom of crowds” is a great tool for tweaking user interfaces, and using it is becoming standard practice. The New York Times has a dedicated site, beta620, that it is using as a testbed for new apps, and in Europe, crowd wisdom is being used to design a folding chair. Think through your experiments well and you'll be well rewarded. If you throw functional change into the mix, make sure you test it first and understand the impact of the change on your users!