Tuesday, May 9, 2017

Statistics and Media Bias

A normal disTRObution

Lies, damn lies, and statistics. You've all heard the old saying, popularized by Twain though its true origin is anyone's guess. This post will rely fairly extensively on the basic theory underlying that statement, to show how true it is, and yet how little it really reveals.

A post about media bias and statistics? How out of place on the reTROview blog! Indeed. Yet I read this piece in the New York Times and just couldn't help myself! I don't really have any other place to put this, and I think everyone would benefit from reading this, so here goes! Before you move on to my piece, however, you should read the linked one thoroughly.

Before I begin, I want to assure you that I am not picking on the New York Times, nor am I picking on traditional news at all. I think the linked piece is informative and accurate, yet it illustrates some critical lessons about media production and consumption in any era. If you give my post a thorough and unbiased read, you will be able to understand my core points, which I lay out in the bullet points below and then defend in turn, using the linked piece as an example.
  • All media is biased, and misunderstanding or incomplete presentation of statistics is a big part of this bias.
  • The primary forms of media bias are not malicious misinformation; they are:
    • Focus
    • Tone
    • Causal inference
  • The well informed person must, therefore, read widely and critically.

All Media is Biased

We have a fantastic (in the Tolkien sense of the word) idea in the West of the unbiased journalist. The theory goes that a journalist has the responsibility to report the facts of what is news, distance him/herself as much as possible from the story, obtain credible source reports on what happened, and include only sources whose information can be corroborated by information obtained apart from the source. There are probably more elements than this, but journalistic ethics isn't my forte. I can safely say, however, that these are all elements of our public sense of what neutral journalism should be.

Indeed, all of these elements are helpful and useful checks on media bias, ensuring that the articles and stories produced by journalistic effort have a basis in reality rather than in the underlying ethical belief systems of the paper, website, or author. Most journalism, in fact, adheres to these overarching responsibilities of journalistic neutrality, yet media remains biased in several ways, none of which is truly controlled by these formal responsibilities.

The number of stories which end up being debunked as "totally false" at any major news network is relatively small. In the 24-hour news cycle, true scandals involving falsified sources or intentionally published false information (like Jayson Blair) are so rare that they give me relative faith in the news media. Yet simply because there are so few outright falsehoods does not mean that media is always reliable across the political spectrum. My argument is that bias shows up in just about every story you read, that it rarely takes the form of outright lies but rather comes in more subtle ways, and that the use of statistics plays a huge role in this.

Statistics themselves are fascinating. Having studied statistics beyond the undergraduate level, I find even the basic theory to be magnificently powerful. Collecting data is essential to grounding arguments in empirical reality, and statistics are one of the more powerful forms of data to support an argument.

Yet statistics on their own are meaningless. Here are a few illustrative examples. My fastest time beating Pokemon Emerald is in the 9 hour range. Having played it through probably 5-10 times, I considered this an excellent time. Then I went and looked at the best times ever posted for Pokemon Emerald...and discovered that the world record for completing the game is 2 hours, 33 minutes, and 23 seconds. Wow. Not only was my time not that great, but compared to the best runs, nearly 3 out of every 4 seconds I spent in the game were wasted. All of a sudden, my stat means little, because I lacked the context in which to understand it. My universe consisted of only my own attempts, but when set against the world's attempts, mine fell dreadfully flat.
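
If you want to check that "3 out of every 4 seconds" claim, the arithmetic is trivial; here is a quick Python sketch, where my 9-hour figure is a rough, rounded assumption and the record is the one cited above:

```python
# Quick arithmetic behind the "nearly 3 out of every 4 seconds wasted" claim.
my_time_s = 9 * 3600                  # my best casual run, ~9 hours (rounded assumption)
record_s = 2 * 3600 + 33 * 60 + 23    # world record: 2:33:23

wasted = 1 - record_s / my_time_s
print(f"The record needs only {record_s / my_time_s:.0%} of my time;")
print(f"about {wasted:.0%} of the seconds I spent were, by comparison, wasted.")
# Prints roughly 28% and 72% -- i.e., nearly 3 out of every 4 seconds.
```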

Here's another example of statistical context, which may be more familiar to most people. My wife knows next to nothing about baseball. I once told her that a batter for my beloved Chicago Cubs had a batting average of .300, which means he gets a hit in roughly 3 out of every 10 at-bats. To me, and to any baseball fan, this is an important number. Hitters who spend their careers around the .300 mark are not only good hitters, but will generally make the Hall of Fame if they can keep it up long enough. To my wife, however, this was not impressive. She felt that a batter who fails more often than he succeeds cannot be very good. Yet in the entire history of baseball, no batter has ever hit above .500 (getting a hit 50% of the time) for an entire season; no one has even hit .400 since Ted Williams in 1941. Pitchers, on average, simply succeed in getting batters out more often than batters succeed in getting the best of pitchers. Hitting .300, then, is a remarkable achievement, but to someone not steeped in baseball lore, it represents more failures than successes. So context is vitally important to understanding statistics.

In addition to context, theory is vitally important to understanding causation, which in turn is important to understanding statistics. I'm sure the age-old saying that correlation does not equal causation has made its way into your brain at some point. But it's such an ill-understood phrase that it warrants discussion here.

Did you know that ice cream sales correlate magnificently with violent crime? It's true. Plot the two over the course of a year and they track each other very closely: both hit their lows at about the same time of the year, and both hit their highs at the same time of the year. So what's our policy choice? Ban ice cream sales! Violent crime is solved!

So this is why property values are so low around ice cream shops...

Or maybe not. The critical reader may think: I know that the two are related, but why are they related? What is it about ice cream's nature that causes it to relate to violent crime? Or maybe it is violent crime that is causing people to buy ice cream? Without a good theoretical understanding of the relationship, any statistical relationship between the two is meaningless.

So we are left with three choices: figure out why ice cream is causing violent crime, figure out why violent crime is causing ice cream sales, or figure out which other, as-yet-unnamed phenomenon causes both. My bet's on the last one.

The common thread between violent crime and ice cream sales is that they both peak during the hottest months of the year. Since we cannot establish any theoretical link between crime and ice cream, in the sense that we cannot comprehend why one would cause the other, we have to ask what third factor might be driving both ice cream sales and violent crime. The common denominator, it turns out, is heat. We still don't know definitively why violent crime peaks during the summer, but we at least have some plausible theories.
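
To make the confounding story concrete, here is a minimal simulation sketch in Python. Every number in it is invented for illustration: temperature drives both series, neither series depends on the other, and yet the two come out strongly correlated until you control for the confounder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical monthly average temperatures (deg F) over ten years,
# peaking in summer and bottoming out in winter.
months = np.tile(np.arange(12), 10)
temperature = 55 + 25 * np.sin((months - 3) * np.pi / 6) + rng.normal(0, 3, months.size)

# Both series depend only on temperature (plus noise); neither depends on the other.
ice_cream_sales = 200 + 10 * temperature + rng.normal(0, 40, months.size)
violent_crime = 50 + 2 * temperature + rng.normal(0, 15, months.size)

# Raw correlation between the two series is strong...
r = np.corrcoef(ice_cream_sales, violent_crime)[0, 1]
print(f"Correlation, ice cream vs. crime: {r:.2f}")

# ...but it vanishes once the confounder (heat) is controlled for, i.e., once
# each series is replaced by its residuals after a linear fit on temperature.
def residuals(y, x):
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

r_partial = np.corrcoef(residuals(ice_cream_sales, temperature),
                        residuals(violent_crime, temperature))[0, 1]
print(f"Correlation after controlling for temperature: {r_partial:.2f}")
```

Run it and the first number comes out high while the second hovers near zero, which is exactly the pattern you should expect when a lurking third variable is doing all the work.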

All of this is to say that statistics are meaningless without context or theory. But believing that statistics alone are powerful, even when understood through the lens of context and theory, underestimates our ability to deceive through the selection of statistics. The primary reason is that focus, tone, and causal inference are all heavily subject to the bias of the author, and this need not be a nefarious thing.

Focus

The focus of a story is hugely important, and, I would argue, the primary source of media bias, even when stories are reported accurately. I remember a time during the Obamacare debates when everyone was running stories about the pros and cons of the bill. The first story I saw was from MSNBC, which ran a piece on a family whose child got sick and whose private insurance company refused to pay the medical bills, despite needs that seemed legitimate and appeared to be covered under the terms of their policy.

The second story I saw was from Fox News, which ran a story on how broken Veterans Administration health care is in the U.S., with discussions of long wait times, horrible outcomes for patients, chronic understaffing, and unclean facilities.

The lessons the reporters wanted you to take away were clear: either private insurance is untrustworthy and may abandon your family when you most need it, or the government would do the job no better, and perhaps even worse, than private insurers.

As I thought about this conundrum, it struck me that the most fascinating thing about it is that both reports were true, and both fit the norms of journalistic ethics. The VA is not terribly well run, based on what I've heard, and people frequently report having seemingly legitimate claims denied by private insurance companies for a wide range of reasons. But both stories sought comment from the other side, both presented verifiable facts, both had reputable sources to back up their claims...you get the idea.

The problem is that each reporter needs to focus on something. He/she can probably only pump out an article a day, and perhaps a longer form piece every two weeks or so. There are literally thousands of newsworthy things going on each day. How does the reporter choose which to cover? The answer is largely whatever interests him/her, and whatever his/her boss wants covered. Those are not bad motivations, but they are not exactly objective ones.

In the case of the Times piece linked earlier, the reporters choose to focus on one aspect of the problems facing children in New York City's public schools: whether or not "every student has a real chance to attend a good school." They find that, in fact, every student does not have a real chance, and they support this with fascinating and, I believe, accurate data. Black and Hispanic students in New York City have a greatly reduced chance of being selected to attend the premier public schools. This is a problem, and it should be one to anyone.

Yet could you imagine another reporter approaching the same data with a different focus, and writing a very different story? I can. Consider this extended selection from the piece:

"Under a system created during Mayor Michael R. Bloomberg's administration, eighth graders can apply anywhere in the city, in theory unshackling them from failing, segregated neighborhood schools. Students select up to 12 schools and get matched to one by a special algorithm. This process was part of a package of Bloomberg-era reforms intended to improve education in the city and diminish entrenched inequities.

There is no doubt that the changes yielded meaningful improvements. The high school graduation rate is up more than 20 points since 2005, as the administration of Mayor Bill de Blasio has built on Mr. Bloomberg's gains. The graduation gap between white and black or Hispanic students, while still significant and troubling, has narrowed."

The piece, in good journalistic fashion, mentions this data, and mentions it once more at the bottom of the piece. But the reality is that in an article spanning thousands of words, it devotes little attention to this success and a huge amount of attention to the failure represented by the continued existence of disparate outcomes for minority children. If the Wall Street Journal had published this piece, I could imagine the proportions being flipped, with extensive coverage of the universal gains (20 points of increased graduation rates) and the relative gains of Black and Hispanic students (narrower graduation-rate gaps compared to the pre-reform era).

But it needs to be pointed out that neither the Times article nor the hypothetical Wall Street Journal article would be wrong, and neither would violate any sort of journalistic ethics. Both are accurate and necessary voices, but they do contain bias, simply by their choice of focus.

Tone

The bias in the NYT article isn't only a matter of focus; there are also very telling signs of the article's predisposition in its tone. The article telegraphs its perspective right away: school choice in the city is not delivering. The title alone makes this clear to the careful reader: "The Broken Promises of Choice in New York City Schools". They aren't exactly playing softball here! Despite the caveats regarding the program's successes, probably 85% of the article is decidedly negative in tone.

Of the two primary "case studies" of students in the article, neither winds up leaving their home borough of the Bronx for a good school. Sukanya ends up putting Bronx schools at the top of her list due to family pressures (she wanted to go to Manhattan, but her parents didn't like the idea), and Jayda gets placed in a school that was 6th on her list and doesn't leave the Bronx either. The selection of these two cases shows what kind of focus and tone the reporters wanted, but it also matches the data on the likelihood of leaving the Bronx for a school elsewhere. So it's not a clear-cut case of omission so much as an interaction of tone and focus, which are largely unavoidable.

Causal Inference

The most challenging form of bias to exorcise lies in inferring causality, and it is rarely mentioned in any news story whatsoever. Consider the following comment on the story (posted by a Steve L from Chestnut Ridge, NY), which nicely summarizes this challenge:

"As a high school teacher, I can state definitively that it is simplistic to say schools have improved because graduation rates have gone up. Several changes were enacted during this time period to ensure that more kids graduated without necessarily having learned any more. Among them:
1. Targeted credit recovery--wherein students who failed a course merely have to make up just enough material to get a passing grade, rather than retake the whole course. Students can now do this for up to three failed courses. This did not exist prior to the Bloomberg reforms.
2. Conversion chart scoring on Regents exams. Passing Regents exams means getting a 65 on the exam, but 65 no longer means 65%. The use of conversion charts, in which a scorer looks up the final grade by seeing where the multiple choice total intersects the long-answer total, guarantees that a certain pre-determined number of students pass every exam. On the basic algebra exam, the actual percentage score that converts to a "65" is now somewhere between 33-40%.
3. More exceptions have been made for general and special education students about which Regents they need to graduate.
4. Many schools now have special days built into the schedule, especially around exam time, for students to complete work that they didn't do when they were supposed to. Before that, students were held to a higher degree of responsibility and suffered the consequences if they didn't.

And there are others, but I only have 1500 letters."

Steve poses a real challenge to the hypothetical Wall Street Journal approach by suggesting that we are perhaps not measuring what we think we are: graduation rates are not in and of themselves a good measure of educational quality, and even narrowing those gaps does not necessarily mean that our children are being well educated. While this is his intended critique, his comment can be taken at an even deeper level.

Maybe graduation rates are going up, and gaps are narrowing, but perhaps it's not due to the choice elements of the reforms at all! Perhaps it is these other policy changes that are driving the improvement in graduation rates. And his clear suggestion is that these other changes are not raising the quality of the education so that more students can graduate; they are lowering the bar so that more students can graduate.

Additionally, consider another alternative causal factor briefly mentioned in the piece: home life. This selection acknowledges family factors as important, yet still insists that graduation rates are a good measure of school success:

"Graduation rates are not a perfect proxy for education quality. In many schools, students arrive far behind, and it is a major effort to help them graduate on time. Elsewhere, ninth graders show up on Day 1 doing work at grade level or above, so the steps required to get them diplomas are less onerous. And it is difficult to say how much of a school’s success is because of what happens within its walls — the curriculum, the teachers, the leadership — and how much is because of advantages children bring from home.

But graduation remains a meaningful measure of a school, and of the opportunities it provides. If parents felt they had another option, how many would be happy to send their children to a school where more than a quarter of students do not graduate?"

The critic may say that variation in graduation rates between schools is less a measure of the quality of the school and more a measure of the quality of the community in which the school sits. If the parents in an area are less likely, on average, to support their kids' success, then the schools in that area are also less likely to graduate students at a high rate. While this critique is acknowledged in the article, the predisposed bias of the authors comes through in how little they discuss this causal theory compared to how frequently they discuss the more institutional problems with school choice (difficult mobility between schools, an extremely complex admissions process, poor preparation for elite schools before high school, etc.). Again, this is not misleading, nor is it unethical. It is simply a mostly unavoidable consequence of having humans report the news.

In any case, isolating the causal factors that lead to the statistics we love can be super challenging. And when those statistics are always chosen by a focus driven by personal desires, and spoken through the tone of reporters with a dog in the fight, we must be very careful to thoughtfully consider the logical underpinnings of the statistics, why we might think two variables have a relationship, and what alternative variables may be explaining the phenomena.

How Should We Then Live?

Noted 20th-century Christian theologian and apologist Francis Schaeffer is best known in America for his book How Should We Then Live?, which asks several important questions: if we accept Christianity as true, what impact does that have on the Christian's view of church, state, culture, and community? How should we interpret the important ideas of Western civilization according to the teachings of the Bible? I would like to ask a similar question here.

If it is true that media bias always exists, and that the fantastical picture of journalistic neutrality will always be colored by bias, then how should we then live?

Most people (myself included) are extraordinarily tempted to live in a bubble. Bias makes us feel safe. People on the left likely felt gratified by the Times piece, and annoyed that NY schools haven't gone far enough to fix the problems facing Black and Hispanic students in the city. People on the right likely didn't read it, or didn't give it a fair shot, dismissing the valid things the article accomplishes and the objective truths it presents (albeit colored by bias).

Yet this is a bad outcome, for a number of reasons. First, I believe it encourages people to see the other side as filled with lunatics or evil people bent on destroying the way of life of the America they know and love. If all you read is the Times and Slate, or the Wall Street Journal and National Review, you can easily fall into the trap of seeing America either as a nation controlled by big bankers and big money trying to stick it to the American working class while amassing as large a fortune as they can, or as the victim of a decades-long attempt by left-wing radicals and their allies in the mainstream media to supplant traditional and healthy American values with the false gods of tolerance and political correctness. The truth likely lies somewhere in the middle, but as you increasingly embrace one view over the other, keeping friendly relations with people who hold different opinions becomes more and more challenging.

Second, I think that life in a bubble causes people to think that they somehow have no bias at all. This is demonstrably false, and it produces a bizarre worldview in which "everyone" shares their opinions, which only encourages the notion that the other team is just extraordinarily perverse. This is reinforced when I see people on social media saying things like "How is Trump winning if not a single person in my Facebook feed is supporting Trump?" Perhaps no one on your feed is supporting Trump because you have quietly sorted all of your relationships in life toward people like you? (For the record, I don't like Trump at all.)

Third, it reinforces ideas which are false, simply because you hear them from a trusted source. You may feel safe turning off your BS detector when your information comes from a source you trust, but the reality is that the detector should never be turned off. Just today, I saw an individual in the comments section of an article state as fact that illegal immigration costs the United States 900 billion dollars every year. With roughly 11 million undocumented immigrants in the country, that would amount to roughly $80,000 in net losses per person per year. The number also seems far-fetched given that our total economic output is about 18 trillion dollars, which would put the cost of undocumented immigration at about 5% of GDP. A simple look at the published research on this subject shows that even the most devoted immigration restrictionists put the figure under 100 billion per year, which is far more believable.
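
For what it's worth, that back-of-the-envelope check is easy to reproduce. Here is a quick Python sketch, where the 11 million population figure is the commonly cited estimate and the other numbers come from the claim itself and rough 2016-era GDP:

```python
# Back-of-the-envelope sanity check of the "$900 billion per year" claim.
claimed_cost = 900e9        # the commenter's figure, in dollars per year
undocumented_pop = 11e6     # commonly cited estimate of the undocumented population
us_gdp = 18e12              # rough U.S. GDP at the time

per_person = claimed_cost / undocumented_pop
share_of_gdp = claimed_cost / us_gdp

print(f"Implied net loss per undocumented immigrant: ${per_person:,.0f} per year")
print(f"Implied share of GDP: {share_of_gdp:.1%}")
# Roughly $82,000 per person and 5.0% of GDP -- numbers that should trip any BS detector.
```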

So, how should we then live?

I have a few suggestions for you, and I will do my best to follow them as well.
  1. Keep your BS detector on. All journalism is biased, even your favorite.
  2. Read your least favorite, but still reputable newspaper regularly. If you like the Times, that's fine, but the Wall Street Journal also has excellent coverage. If you find a story that seems like a slam dunk, read some opposing coverage of it.
  3. Do a little background research on statistics. It will really help in being able to carefully read and understand even good reporting in a critical light.
Anyway, happy hunting for the truth! It's hard, but I believe that it's worth it.

-TRO
