A 2006 study by Matthew Salganik and his co-researchers at Columbia suggests that a huge amount of effort is wasted in many different areas of human endeavor, and that the resulting outcomes are far less than optimal — but also that there is a simple algorithm that could fix both problems.
In Salganik’s experiment, users of a music-rating site were divided at random into eight artificial “worlds”. All of the users in all eight worlds had access to the same library of songs, which they could download or recommend to their peers, but they could only recommend songs to other users in the same world. Each user could also see how many times a song had been downloaded, but the count reflected only downloads by other users in the same world.
The goal was to see whether certain songs could become popular in some worlds while languishing in others, despite the fact that all eight worlds consisted of randomly assigned populations with equal access to the same songs. The experiment also attempted to measure the “merit” of each song by assigning some users to an “independent” group, whose members could listen to songs and choose whether to download them without ever seeing how many times anyone else had downloaded a song; a song’s merit was defined as the number of independent-group users who decided to download it after listening to it. The experimenters then looked at whether a song’s merit had any effect on the popularity it achieved in the eight social-influence “worlds”.
The authors summed it up: “In general, the ‘best’ songs never do very badly, and the ‘worst’ songs never do extremely well, but almost any other result is possible.” They also noted that in the “social influence” worlds, where users could see each other’s downloads, increasing download numbers had a snowball effect that widened the gap between the successful songs and the unsuccessful ones: “We found that all eight social influence worlds exhibit greater inequality — meaning popular songs are more popular and unpopular songs are less popular — than the world in which individuals make decisions independently.” Figures 3(A) and 3(C) in their paper show that the relationship between a song’s merit and its success in any given world — while not completely random — is tenuous.
As economist Richard Thaler and legal scholar Cass Sunstein put it in their book Nudge, describing the Salganik study:
“In many domains people are tempted to think, after the fact, that an outcome was entirely predictable, and that the success of a musician, an actor, an author, or a politician was inevitable in light of his or her skills and characteristics. Beware of that temptation. Small interventions and even coincidences, at a key stage, can produce large variations in the outcome. Today’s hot singer is probably indistinguishable from dozens and even hundreds of equally talented performers whose names you’ve never heard. We can go further. Most of today’s governors are hard to distinguish from dozens or even hundreds of politicians whose candidacies badly fizzled.”
This squares intuitively with how we talk about success in entertainment and sometimes in politics, where an artist or a candidate gets a “big break” that leads to them becoming a star. The first-order effect of this randomness is, of course, that the songs (or politicians, or fads) that achieve breakout success are usually not the ones that are the “best” by any objective measure (for example, the songs that would have gotten the highest rating in the “independent” group), and thus consumers are not best served by the random process. The second-order effect is that most people know how much luck is required to succeed in those industries — even extremely talented and extremely dedicated musicians often labor in obscurity for years before they achieve their own big break — and so many talented musicians and other artists don’t even bother trying. (By contrast, a person who would make a good doctor or a good programmer is rightly encouraged to go into those fields, because even though those professions don’t offer the same opportunities for stardom, they also don’t require a lot of luck.)
So much we know. But consider what would happen if Google (or Pandora or Spotify) implemented something like the following algorithm. (It would have to be implemented by a large company with a built-in audience. For reasons that will become obvious, it wouldn’t work on a small scale with a handful of users.)
Consider just the subset of users interested in a particular genre, like alt-rock. When an artist submits a new song to the system, the song is pushed out to a small random subset of those users. (The system is agnostic about how this is done — you can recruit volunteers to rate songs, you can pay them a modest fee to rate songs, or you can mix the songs in seamlessly with the music they’re already streaming and hope that some of them will rate the song afterwards.) Each user in this sample rates the song without seeing the ratings given by other users in the sample, in the same way that Salganik’s experiment used a sample of blind ratings to determine the “objective merit” of a song. The sample doesn’t have to be large relative to the whole population; it just has to be large enough for the average rating to be statistically meaningful. If the average rating is high enough, the system pushes the song out to all the other alt-rock fans, which we define as “success”.
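For concreteness, here is a minimal sketch of that loop in Python. Every detail is an assumption made up for illustration: the sample size, the 1-to-5 rating scale, the push threshold, and all of the function names.

```python
import random

SAMPLE_SIZE = 20       # raters per blind sample; illustrative, not tuned
PUSH_THRESHOLD = 4.0   # minimum average rating (1-5 scale) to count as "success"

def push_to_all(users, song):
    """Stub standing in for whatever delivery mechanism the service uses."""
    print(f"Pushing {song!r} to all {len(users)} users")

def submit_song(song, genre_users, get_rating):
    """Collect blind ratings from a small random sample; push wide on success."""
    sample = random.sample(genre_users, SAMPLE_SIZE)
    # Each sampled user rates independently, without seeing anyone else's
    # rating -- the same blind condition Salganik used to measure "merit".
    ratings = [get_rating(user, song) for user in sample]
    average = sum(ratings) / len(ratings)
    if average >= PUSH_THRESHOLD:
        push_to_all(genre_users, song)
    return average

# Example run with simulated raters:
alt_rock_fans = list(range(20_000))
submit_song("demo-track", alt_rock_fans, lambda user, song: random.randint(1, 5))
```

The design choice worth noting is that `get_rating` never sees anyone else’s rating, which is exactly what removes the snowball effect observed in the Salganik worlds.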
It’s a simple algorithm, but consider how radically this would alter everything we think we know about what it takes to be “successful”. You don’t need to “network” and “build connections” and ask particular highly-connected users to help promote your song, because that won’t affect the rating. You can’t tell a mob of your Facebook friends to “come and vote for my song”, because only people in the random sample can cast votes. Generally, the conventional wisdom that “You need to get out there and hustle” — which is to say, engage in economically non-productive activity that doesn’t improve the product quality — is rendered useless. It’s a waste of time to do anything except focus on the actual quality of the song, insofar as it will be reflected in the average rating.
This eliminates the two problems listed at the beginning. The songs that get pushed out to the widest audience are the ones that provide the most value to users (as determined by the highest average rating), and highly talented content producers can get into the game and see their songs become popular without waiting for a lucky break. (Of course, it also lets artists find out very quickly how their current output ranks against other artists’ work; not everybody is the “highly talented content producer” they think they are.)
This “random-sample-voting” system has other desirable properties:
– It preserves the average quality of everyone’s song feed. Suppose there are 20,000 alt-rock fans in the system, and suppose it takes 20 users to get a statistically meaningful rating of a new song. Then “bad” submissions will only waste the time of 20 users, while “good” submissions will get pushed out to all 20,000, so (assuming equal numbers of good and bad submissions) the ratio of good songs to bad songs in the average person’s song feed would be 1000 to 1. In practice, users could also adjust their own threshold for the minimum average rating they want to listen to, so that even songs with a mediocre rating get pushed out to some users, while top-rated songs get pushed out to many more. (This is the part that requires a large user base. If your user base is only 20 users, then every new submission wastes everyone’s time, and there’s no point.)
– It’s non-gameable. With a user base of 20,000, even if an artist tries to stuff the ballot box by signing up 1,000 fake accounts, those accounts still only constitute about 5% of the users selected in the average voting sample (the simulation sketch after this list makes this concrete). This is a weakness of most sites driven by user ratings — most of them make it relatively easy to create fake accounts to vote up your own content. (It’s also a reason that this system only works with a large built-in user base.)
– It’s scalable; the system works regardless of the number of users or the number of submissions, as long as the number of users (who are available to rate songs in a random sample) grows in proportion to the number of submissions. (If the system gets overwhelmed with too many low-quality submissions, you can always charge submitters a fee, which gets redistributed in part to the raters who are selected in each random sample. Hopefully this would cut down on the number of junk submissions, but even if it doesn’t, at least the raters will be adequately compensated for the time spent rating the junk.)
– It’s non-arbitrary. As long as the sample of raters is large enough, the average rating achieved by a song should be close to the average rating it would get from the population as a whole. There’s almost no luck required to achieve success (although, conversely, an artist who gets a bad rating can’t blame it on bad luck either). As an extension of this, since the feedback is rapid (there’s no reason you couldn’t get an average rating for your song in just a few minutes), an artist can tweak their song to address any criticisms and see whether the average rating improves on re-submission.
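To put rough numbers behind the non-gameable and non-arbitrary claims above, here is a quick simulation sketch using the illustrative figures from this list (20,000 genuine users, 1,000 fake accounts, 20-rater samples). The rating distribution is invented purely for demonstration.

```python
import random
import statistics

random.seed(0)
REAL, FAKE, SAMPLE_SIZE = 20_000, 1_000, 20

# Honest listeners rate a middling song around 2.5 on a 1-5 scale (an
# invented distribution, clamped to the scale); every fake account
# dutifully rates it a 5.
honest = [min(5.0, max(1.0, random.gauss(2.5, 1.0))) for _ in range(REAL)]
population = honest + [5.0] * FAKE

# Average rating across many independent 20-rater samples.
sample_means = [statistics.mean(random.sample(population, SAMPLE_SIZE))
                for _ in range(10_000)]

print(f"honest-only mean:      {statistics.mean(honest):.2f}")
print(f"mean with 1,000 fakes: {statistics.mean(population):.2f}")  # shifts ~0.1
print(f"mean of sample means:  {statistics.mean(sample_means):.2f}")
print(f"spread (std dev):      {statistics.stdev(sample_means):.2f}")  # ~0.25
```

In this toy setup, the fake accounts shift the expected rating by only about a tenth of a point, and any individual 20-rater sample typically lands within a few tenths of the population-wide average: success is neither buyable nor a lottery.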
The algorithm could be applied to other types of content as well, such as:
– Abuse reports. Twitter’s abuse-report problem is frequently in the news: the company gets too many abuse reports to review accurately, so some unlucky people have their tweets removed or their accounts suspended for non-offenses, while the most egregious harassers go unpunished. With a random-sample-voting system, users could sign up as volunteer “reviewers” of abusive tweets. If a tweet is flagged in an abuse report, with a specific citation of the terms-of-service clause it allegedly violates (and the reporting party agrees to let it be shared with volunteer reviewers), it gets shared with a random sample of volunteers; if some threshold percentage of reviewers agree that it is abusive, then the complaint is upheld. (A sketch of this reviewer-threshold variant appears below.)
– Tutorial webpages and videos. From working with some entrepreneurs who have built very popular blogs, how-to websites, and YouTube channels, I can tell you first-hand that everyone in the industry knows that success is not determined by the quality of the content; the content just has to be good enough, and the rest of the effort goes into optimizing pages for Google search results, negotiating links from higher-traffic sites — in short, hustling in ways that have no bearing on product quality. With a random-sample-voting system, raters could rate a tutorial based on how well the directions worked for them, and the highest-rated tutorials could be released to a wider audience, with no “hustling” required from the content creator.
– Economic arguments! Some of Paul Krugman’s columns may be objectively “better” than some of Steve’s blog posts, but if we were to use a random-sample-voting system to determine every week whether Steve’s or Paul’s column would be pushed out to millions of New York Times readers, I’d like to think Steve would win some of the time. If the problem is that the average person isn’t qualified to review the arguments, then the random sample could be taken from among economics PhDs, or economics professors at accredited universities — the algorithm works even if the voting audience is limited by some criteria.
– Obama’s “We The People” website. Currently, the White House promises to respond to any petition that gets more than 100,000 signatures — however, the government can dismiss any petition by saying, quite validly, “Just because you were able to get a mob of 100,000 people to sign a petition, that just means you’re very good at hustling, or you got very lucky — it doesn’t mean there’s any merit to your idea.” But if an idea gets an extremely high average rating from a random sample of volunteers who have signed up to rate the submissions, it would at least be worth looking into why so many people support an idea that the government has not yet implemented. Again, though, it would be worth having the idea reviewed by qualified experts — perhaps a random sample of economics PhDs could review the proposal alongside a random sample of regular citizens. If the two groups diverge widely in their ratings, that could mean either that (a) economics professors have lost their humanity or (b) regular citizens need some economic education, but at least the result would be interesting.
(I suspect the White House might be nervous that this system would actually work too well. Under the current system, it’s easy for them to dismiss a petition even if it crosses the 100,000-signature mark. But if a random-sample survey shows that a change in economic policy is supported by over 80% of economics professors, it’s a lot harder to come up with an excuse for ignoring it.)
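As with the music case, the abuse-report variant is easy to sketch. This is a hypothetical illustration: the 70% agreement threshold, the sample size, and the function names are all invented.

```python
import random

AGREE_THRESHOLD = 0.7   # fraction of sampled reviewers who must agree; invented
SAMPLE_SIZE = 25        # reviewers per report; invented

def review_report(flagged_tweet, tos_clause, volunteer_pool, judge):
    """Share the flagged tweet and the cited terms-of-service clause with a
    random sample of volunteer reviewers, each judging independently; uphold
    the complaint only if enough of them agree it's abusive."""
    sample = random.sample(volunteer_pool, SAMPLE_SIZE)
    votes = [judge(reviewer, flagged_tweet, tos_clause) for reviewer in sample]
    return sum(votes) / len(votes) >= AGREE_THRESHOLD
```

The same skeleton would cover the We The People idea: swap the volunteer pool for a random sample of citizens (or of economics PhDs) and the yes/no vote for a numeric rating.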
In one sense, applying this algorithm to any type of social-media site would be a radically new practice; in another sense, scientists have been using the basic independent-random-sampling algorithm for centuries. Scientists use it because they care about eliminating arbitrariness and getting the objectively best answer to the question they’re investigating; there’s no reason we can’t use the same algorithm in any other scenario where we care about the result, especially if it would eliminate the economic waste associated with “hustling”, gaming the system, and waiting around for a lucky break.