
A Modified Algorithm for Evaluating Logical Arguments


A Guest Post by Bennett Haselton

In a previous guest post I argued that we should use a random-sample-voting algorithm in any kind of system that promotes certain types of content (songs, tutorials, ideas, etc.) above others. By tabulating the votes of a random sample of the user base, such a system would reward the content that objectively has the most merit (in the average opinion of the user population), instead of rewarding the content whose creators spent the most time promoting it, figured out how to game the system, or happened to get lucky when an initial “critical mass” of users liked the content at the same time. (The original post describes why these weaknesses exist in other systems, and how the random-sample-voting system takes care of them.)
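To make the idea concrete, here is a minimal sketch of random-sample-voting in Python. The names (sample_vote, rank_content, get_vote) and the sample size are my own illustrative choices, not anything prescribed in the original post:

    import random

    def sample_vote(item, users, get_vote, sample_size=100):
        """Estimate an item's merit from a random sample of the user base.

        get_vote(user, item) returns True for an up-vote and False for a
        down-vote; the sample's approval rate stands in for the average
        opinion of the whole user population.
        """
        jury = random.sample(users, min(sample_size, len(users)))
        up_votes = sum(1 for user in jury if get_vote(user, item))
        return up_votes / len(jury)

    def rank_content(items, users, get_vote):
        """Promote content by sampled approval rather than by promotional effort."""
        return sorted(items, key=lambda item: sample_vote(item, users, get_vote),
                      reverse=True)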

However, this system works less well in evaluating the merits of a rigorous argument, because an argument can be appealing (gathering a high percentage of up-votes in the random-sample-voting system) and still contain a fatal flaw. So I propose a modified system that would work better for evaluating arguments, by adding a “rebuttal takedown” feature.

Arguments, like songs, are still voted up or down by an initial random sample of voters. But anyone can post a rebuttal to an argument, focusing on what they believe is a flaw. (The rebuttal should target a specific step in the argument that is believed to be flawed, not just argue globally in the opposite direction.) The original poster (OP) can try to incorporate the objection into their argument, but if the OP doesn’t concede the point, the dispute goes to a “jury” of other users, also selected by the random-sample-voting method. (The jury that decides between the OP and the rebuttal can be made up of laypersons, or limited to qualified experts, depending on the context.) If the rebuttal wins, the original post is “disqualified” for containing a fatal flaw that the OP declined to correct.
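One way to picture the rebuttal-takedown flow is as a small state machine. The sketch below is my own rendering under stated assumptions (the class names, the jury size of 25, and the simple-majority rule are illustrative, not requirements of the proposal), and the jury step would only run if the OP declines to concede or fix the flaw:

    import random
    from dataclasses import dataclass, field

    @dataclass
    class Rebuttal:
        step: str             # the specific step of the argument claimed to be flawed
        text: str
        upheld: bool = False  # set True if the jury sides with the rebuttal

    @dataclass
    class Argument:
        author: str
        text: str
        rebuttals: list = field(default_factory=list)
        disqualified: bool = False

    def resolve_rebuttal(argument, rebuttal, users, jury_vote, jury_size=25):
        """Decide a disputed rebuttal with a randomly sampled jury.

        jury_vote(juror, argument, rebuttal) returns True if that juror thinks
        the rebuttal identifies a real flaw. If a majority agrees and the
        author does not repair the flaw, the argument is disqualified.
        """
        jury = random.sample(users, min(jury_size, len(users)))
        votes_for = sum(1 for juror in jury if jury_vote(juror, argument, rebuttal))
        rebuttal.upheld = votes_for > len(jury) / 2
        argument.rebuttals.append(rebuttal)
        if rebuttal.upheld:
            argument.disqualified = True  # stands until the author posts a corrected version
        return rebuttal.upheld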

This is an algorithmic distillation of something Professor Landsburg wrote in The Big Questions: “If you’re objecting to a logical argument, try asking yourself exactly which line in that argument you’re objecting to. If you can’t identify the locus of your disagreement, you’re probably just blathering.”

Before going into more detail, I want to make a fairly audacious claim: I believe this is the optimal algorithm for any argument-rating or information-sorting problem. In cases where something like it is already implemented in practice, the implementation works insofar as it stays close to this ideal, and breaks down insofar as it deviates from it.

Beyond the benefits of random-sample-voting outlined in the previous post, this algorithm relies on several assumptions, which I believe are reasonable:

1) A good argument can be presented without any flaws (where a “flaw” is defined such that a majority of a random sample of peers agree that it is a flaw, so that a “rebuttal takedown” pointing out the flaw will win). That doesn’t mean that a valid argument has to be flawless on the first draft or else it’s worthless. But if the argument is valid, it should be possible to correct the flaws and the argument will still stand. On the other hand, an invalid argument (especially in rigorous subjects like mathematics) will often contain a subtle flaw, and if you try to correct the flaw in one place, that will introduce an inconsistency with another section of the argument, such that no matter how hard you try, you cannot fix the argument without a flaw existing somewhere. For this reason, a single flaw ought to be enough to disqualify an argument, if the OP can’t fix it. But if an argument is sound at its core and just happens to contain some minor errors in its presentation, then those errors can be fixed incrementally with feedback from the community.

2) When voting on whether a “rebuttal takedown” invalidates an argument, we assume the jury of users will vote more honestly (less influenced by their own biases) if they are focused on a specific point of disagreement than if they are asked to vote globally between two essays arguing opposite points of view. If each side of a debate presents a series of statements that vary between true, ambiguous, and blatantly false, there is a temptation for a voter to skew their perception towards the conclusion they already want to believe. That’s why I wouldn’t be very interested in a vote between the OP’s essay arguing one side and a rebuttal that simply argues the opposite side from scratch. But when users are asked to focus on the correctness of a specific statement, or even a specific step in the reasoning, I believe they can be more clear-eyed. (For instance, I agree with most instances in which PolitiFact has rated some of Clinton’s statements “False” and Trump’s statements “True”, even though I still think that, of the two, Trump makes far more false statements.) Moreover, if an argument is laid out rigorously and a reader disagrees with it, it is incumbent on them to find what they think is the flaw; if they can’t find one (at least, not one where a majority of peers would agree it’s a flaw), an intellectually honest reader would be more inclined to think there’s some truth to the argument after all.

The modified system — random-sample-voting with rebuttal-takedowns — has a number of desirable properties:

1) The rigorous arguments that “win” in this system are the ones where no one has (yet) found a flaw.

2) If a reader does find a flaw, and the jury of their peers votes that it is indeed a flaw, then there is no need to get into a side debate about whether the flaw “really” undermines the whole argument, or is just incidental. If the OP won’t fix the flaw, then the argument is invalidated; if the OP (or someone else who wants to take up the mantle) really believes that the flaw is incidental, they should modify the argument to take out the flaw.

3) Under existing debate and forum systems, if one person makes an argument and 100 other people make counter-arguments, it is not practical for a casual observer to know whether one of the counter-arguments points out a fatal flaw in the OP’s argument, without reading through and analyzing all of them. But it is possible with this system. Consider argument A, which has been posted in a typical present-day discussion forum (not using this algorithm) and has 100 counter-arguments posted in response, none of which are valid (where a “valid” counter-argument is one that would win the vote in a rebuttal takedown). Meanwhile, argument B has been posted in a discussion forum and has 100 counter-arguments posted, one of which is valid (and the objection is fatal to argument B, i.e., argument B is not valid). For an observer, it would be cumbersome to go down the rabbit hole into every counter-argument posted to both arguments in order to find the one valid objection. And thus there is no way for a casual observer to know that argument A is valid but B is not. But with the new proposed system, a casual observer would be able to see that argument A had not been defeated by any objections posted by users, whereas argument B had been defeated. (It’s still not possible for a casual observer to know whether A might someday be defeated by a rebuttal, and similarly, there is a time period when argument B will look “valid” because nobody has posted a valid objection yet. But I believe this is the best we can do algorithmically.)
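To illustrate this third property, the casual observer’s view could be reduced to a one-line status computed from the rebuttal records. This is a sketch building on the hypothetical Argument and Rebuttal objects above, not part of the original proposal:

    def display_status(argument):
        """Summarize an argument for a casual observer without reading every reply."""
        if argument.disqualified or any(r.upheld for r in argument.rebuttals):
            return "defeated: a rebuttal was upheld and the flaw was not fixed"
        if argument.rebuttals:
            return f"standing: {len(argument.rebuttals)} objection(s) posted, none upheld"
        return "standing: no objections posted yet"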

Here are some example scenarios where I believe this algorithm would be optimal:

1) At the /r/lifehacks/ subreddit, users can submit simple but little-known techniques for solving a problem or otherwise improving your life. Users can browse the ideas sorted with the top-voted ideas listed first, and vote ideas up or down or post comments (and the comments themselves can be voted up or down). This is obviously subject to the Salganik effect, where a coincidental flurry of initial upvotes can get an idea in front of more people and trigger a snowball effect of more upvotes, even if the idea is not any better than others submitted at the same time. On the day that I visited the subreddit, one of the highest-rated ideas was to save thousands of dollars on your lifetime mortgage payment by cutting the payment amount in half, and then paying every 2 weeks instead of once per month.

(Now, as for the idea itself: the reason this “works” is that “every two weeks” is slightly more often than “twice a month”, so this just amounts to paying a larger mortgage payment every month, something you can already do anyway if you want to, but might not want to if you think the markets are a better investment. A short calculation at the end of this example spells out the numbers. So this “life hack” just uses a simple math error to disguise a mundane financial choice, which may or may not be a good one depending on your mortgage interest rate and how the markets end up doing. And yet the idea got thousands of upvotes and sat for days at the top of the leaderboard.)

Under a naive random-sample-voting system, without rebuttal takedowns, this idea might have gotten a high initial rating as well. However, in a system that allows rebuttal takedowns, some user probably would have posted the counter-argument outlined above, and the rebuttal would probably have been approved, which would have disqualified the parent idea.
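To spell out the arithmetic behind that hypothetical rebuttal (the payment amount is an arbitrary figure chosen purely for illustration):

    monthly_payment = 1000  # illustrative figure, in dollars

    # "Pay half the amount every two weeks" means 26 half-payments per year...
    biweekly_total = 26 * (monthly_payment / 2)  # 13,000
    # ...versus 12 full payments per year on the normal schedule.
    monthly_total = 12 * monthly_payment         # 12,000

    # The "hack" is just one extra monthly payment per year, which a borrower
    # could choose to make directly; whether that beats investing the money
    # elsewhere depends on the mortgage rate and market returns.
    extra_paid_per_year = biweekly_total - monthly_total  # 1,000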

2) Companies like Google receive so many security vulnerability reports from the public that occasionally a valid security report (including some submitted by friends of mine) can be lost in the shuffle.

Under a simple random-sample-voting system, incoming reports would be reviewed by a random subset of Google employees who are qualified to evaluate them, and the reports that get the most “upvotes” would get closer scrutiny. However, this could lead to errors if an incoming security report seems to depict a serious security flaw (thus getting a large number of upvotes), but the report is based on a subtle faulty assumption (for example, if the “exploit” depends on being able to run untrusted code on the user’s machine; this is usually not the case in the real world, and when it is, the attacker already has control of the user’s machine anyway, so the “security vulnerability” is moot).

With the rebuttal takedown feature, if an eagle-eyed Google employee happened to spot the faulty assumption, and their peers voted in favor of that “rebuttal”, then the security “vulnerability” would be disqualified, even if it received a high proportion of upvotes from the other employees who evaluated it.

3) In my original post about random-sample-voting, I mentioned in passing that it could be used by Facebook or Reddit to handle abuse reports: if a user flags a post as violating the site’s Terms of Service, then rather than being reviewed by a company employee (which creates a bottleneck), the post could be reviewed by a random subset of the site’s users who have volunteered as abuse report mediators. Even in this context, there are scenarios where random-sample-voting would work better with a rebuttal-takedown feature.

Suppose a user posts a picture of Barack Obama with a Hitler mustache and uniform, and another user reports that post as a Terms of Service violation for being “racist”. I think that if this were put to a simple random-sample-voting jury, many users would agree with that assessment. But with rebuttal takedowns, a user can post a “counter-argument” to the original abuse report, essentially saying: “There is nothing in this post that references race. You can think it’s in poor taste to compare Obama to Hitler, but people can (and did) do the same to white politicians. It’s stupid, but it’s not racist.” Given a moment’s consideration, hopefully most people would agree with this response, and thus the “rebuttal” would invalidate the abuse report, which I would consider to be the correct result.

4) Academic journal debates. Academic journal review is one of the few systems that uses something like random-sample-voting to evaluate content: submissions are sent to a random subset of peers, anonymized from the author and from each other, who submit their “ratings”. But after a paper is published, another peer in the field might spot a flaw that largely invalidates the argument in the original paper. The academic review system is what I had in mind when I said at the outset that some systems track very closely to the random-sample-voting-with-rebuttal-takedowns algorithm, and those systems are flawed only insofar as they deviate from it. In this case, the flaws are that (1) even if one paper has been “taken down” by a rebuttal from another, there is no “invalidating marker” applied to the original paper (other readers can still find it in archived journals, and the original author can still list it among their “published papers”); (2) there is no way for a reader to submit minor improvements to the paper (correcting typos, or suggesting clarification of a difficult section); and (3) after a paper is published, there is no sense of how many readers have read it without finding a flaw. In the algorithm I’ve been proposing, each time a person reads the argument and can’t find a flaw, they could mark it with a tentative pseudo-endorsement, signifying, “I read through this and I couldn’t find any mistakes.”
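That tentative pseudo-endorsement could be as simple as a per-argument counter of flaw-free reads. A tiny sketch (the names and the dictionary-based storage are hypothetical, not part of any existing journal workflow):

    from collections import defaultdict

    # Hypothetical tally: how many readers finished an argument without finding a flaw.
    flawless_reads = defaultdict(int)

    def record_flawless_read(argument_id):
        """A reader marks: 'I read through this and I couldn't find any mistakes.'"""
        flawless_reads[argument_id] += 1
        return flawless_reads[argument_id]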

I first had a fully-formed version of this idea about 10 years ago, and ever since then it’s struck me how often I’ve run across a problem or a process that seemed like it could be optimized by some variation of this algorithm. In the same way that some economists believe that markets are almost universally applicable to problems of resource-allocation, I think this algorithm is almost as widely applicable to problems of information-sorting. And as Professor Landsburg has written before, there is no efficient marketplace to compensate people for good ideas and good arguments, which means that we probably need another system to bubble the best ones to the top — perhaps this algorithm is one way to do that.
