Bias in Data Analysis #musedata #musetech #data #bias

This is the second in a series of posts about confronting bias. These #longreads use narrative to help bring up bias in an accessible manner.

As Director of the Art Museum of New South Overthere, you are constantly being asked to make decisions based on data. Sure, you had your last math class in 10th grade, and then avoided math through your PhD. But, you are a specialist, I mean in lantern slides, but still, you got this, right?

Read these short scenarios and suss out where bias might come in.

Scenario 1

You are dying to know how much people like lantern slides. You write out a survey for your staff to deploy. You want to be direct with your visitors so that you don’t waste their time. So, you ask the people in the Joe Bright Memorial Lantern Slide Gallery (and broom closet):

  • “What do you like about lantern slides?”
  • “Is there anything that you don’t like about lantern slides?”
  • “What would you like to see with the lantern slide display?”

You were happy to find out that there is nothing that they don’t like about lantern slides. They also love your lantern display as it is. The only thing they want is more information, which is what you thought. They just wished they could know more about the slides! How wonderful to know what you do for a living is so relevant for people.

Explanation

In this case, you have several problems. First, your survey questions are constructed with a particular slant towards lantern slides. This is a situation of interview bias. It’s as they say in legal shows: you are leading the witness. When you construct survey questions, you don’t want to tip the participants off to the “right” answer. People have an inherent need to please, and so they will answer in a way that seems correct.

Additionally, there is a selection bias at work here. You went to the gallery where you hope to make changes. On one hand, you are being proactive. On the other, you are skewing your data, because your sample includes only people who already chose to visit the lantern slides. You have a sampling bias at play. A better study would interview not just current visitors to the lantern slide gallery but also those in the museum who are not currently going to the lantern slide gallery. In other words, you want to hear from both visitors and potential visitors to draw a complete picture of the situation.
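If it helps to see the selection problem in numbers, here is a minimal sketch in Python. Every count in it is invented for illustration (the scenario gives no real figures), and the function name is just a label for this example. It compares the share of fans among people surveyed inside the gallery with the share among everyone in the museum.

```python
# Hypothetical visitors, invented for this example.
# Each entry is (surveyed in the lantern slide gallery?, likes lantern slides?).
visitors = (
    [(True, True)] * 18        # in the gallery, fans
    + [(True, False)] * 2      # in the gallery, not fans
    + [(False, True)] * 30     # elsewhere in the museum, fans
    + [(False, False)] * 150   # elsewhere in the museum, not fans
)

def share_who_like(sample):
    """Fraction of a sample that says it likes lantern slides."""
    return sum(1 for _, likes in sample if likes) / len(sample)

# The biased sample: only people already standing in the gallery.
gallery_only = [v for v in visitors if v[0]]

print(f"Gallery visitors only: {share_who_like(gallery_only):.0%}")
print(f"Everyone in the museum: {share_who_like(visitors):.0%}")
```

With these made-up counts, the gallery-only sample looks far more enthusiastic than the museum as a whole, which is the same trap as surveying only the people who already walked into the broom closet.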

______________________________________________________

Scenario 2

Your first meeting this morning was with the board. This nice old lady, Sweetie Monroe, heiress to the great Marshmallow Mills fortune, was hoping you could explain why you don’t have any students in the galleries. Now, you have never seen Sweetie upright before 1:00 pm. But, you also know that the school scheduling staff member, Peaches LaPew, is busy every morning. Last week, she tried to get you to do a kindergarten tour, because you had more students than staff.

You’re not going to be able to show Ms. Monroe children in the flesh (you aren’t a miracle worker), so you ask your staff to do a little comparative analysis. Your head of Marketing/ Audience Research/ Programming & Security, Joe Exhaustino, has emailed you a super long report. Does he understand how busy you are? You don’t have time to go through this like you were in school. Luckily, it has a clear summary.  You have plenty of kids coming. Fabulous.

Explanation

In this case, I have bad news for you. You didn’t look too closely at your data. You used the data like a yes-man. This is an example of choice-supportive bias. If you looked more closely at the data, you would notice that only 4 percent of visitors are 18 and under. Most of those are school groups. Now, I don’t know what your measure of success is, but 4 percent of total visitors seems extremely low.

______________________________________________________

Scenario 3

Before you get your chance to regale Ms. Monroe with tales of your fabulous student tours, you find yourself stunned by numbers. You are sitting at a breakfast meeting with the directors of all the local organizations. The head of the Community Development Corporation is sharing a graphic. Apparently, the average percentage of family visitors at museums is 10 percent. Eek. You get nervous. So, you turn to the guy next to you, the Director of Coffee Cups & Porcupine baskets. He smiles and then says, “Oh, yes, that’s just average. We are at 18%.” You leave the meeting despondent, and shoot off a quick email to your grant office/ gardener to get money right now for school tours.

Explanation

In this case, the data was combined inappropriately, though you wouldn’t necessarily know it. You didn’t have all the information when you sat in that meeting and looked at the graph. The number crunchers decided more numbers were better. But, in doing so, they didn’t use like categories: they didn’t separate out school groups. Only two of the four museums do school tours. This means that in those museums children are coming in throughout the week without adults. They will have higher numbers of children than those that don’t do school tours. The lesson is that you need to be thoughtful in combining data. There are other challenges with combining data. If you aren’t careful, when you aggregate data, you can accidentally contradict what the original data said. This is called Simpson’s paradox.
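Simpson’s paradox is easier to believe once you watch it happen. The sketch below uses Python and two imaginary museums; the counts, the museum names, and the weekday/weekend split are all assumptions made up for this example, not numbers from the breakfast meeting. Museum A has the higher share of family visitors on weekdays and on weekends, yet looks worse once the two are lumped together.

```python
# Hypothetical counts, invented for this example.
# Each pair is (family visitors, total visitors) for that segment.
counts = {
    "Museum A": {"weekday": (180, 900), "weekend": (40, 100)},
    "Museum B": {"weekday": (15, 100), "weekend": (315, 900)},
}

def segment_shares(museum):
    """Family share within each segment, kept separate."""
    return {seg: fam / total for seg, (fam, total) in counts[museum].items()}

def overall_share(museum):
    """Family share after lumping every segment together."""
    family = sum(fam for fam, _ in counts[museum].values())
    total = sum(total for _, total in counts[museum].values())
    return family / total

for museum in counts:
    by_segment = ", ".join(
        f"{seg} {share:.0%}" for seg, share in segment_shares(museum).items()
    )
    print(f"{museum}: {by_segment}, overall {overall_share(museum):.0%}")
```

The flip happens because most of Museum A’s visitors come on its weaker days, while most of Museum B’s come on its stronger ones. The combined number is answering a different question than the separated ones, so ask how a figure was put together before you panic over it.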

______________________________________________________

Scenario 4

You are hoping to buy more media ads. You call on Joe Exhaustino again, but this time with your demographic numbers. He gives you the classic bell curve with mostly middle-aged people attending the new exhibition. No surprises. So, you will just use digital ads to get more young people. Finally, an easy decision. After the ads go out, you take a turn in your Lantern Slide gallery. Something odd is up. The gallery is full of really old men. You go back to Joe and ask him if his numbers are right. He shows you his numbers. He had used a good-sized sample. He crunched the numbers and ended up with a graph that didn’t look right. So, Joe removed the outliers, the old people.

Explanation

Joe has been doing numbers for years, and he assumed that the attendance numbers should conform to a bell curve. This is called the Non-Normality bias. But another bias was in play here. Instead of investigating the outliers, Joe disregarded those numbers. Joe did better on his second crack at it. Along with the quantitative data, he looked at surveys. Turns out the lantern slide gallery had become a mecca for the over-95 set. Practically everyone in the state in that age demographic comes to the museum to check out those sweet slides, particularly on “free coffee Friday.”
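Here is one way Joe might have handled those outliers on his first pass, sketched in Python. The ages are invented for this example, and the 1.5 × interquartile range fence is a common rule of thumb rather than anything Joe is said to have used. The point is that the code flags the unusual values for a follow-up conversation and leaves the original data alone.

```python
from statistics import quantiles

# Hypothetical visitor ages, invented for this example.
ages = [34, 38, 41, 42, 45, 47, 48, 50, 52, 55, 58, 60, 96, 98]

def flag_outliers(values, k=1.5):
    """Return values outside the IQR fences without deleting anything."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    spread = q3 - q1                    # interquartile range
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

to_investigate = flag_outliers(ages)
print("Ask about these visitors before dropping them:", to_investigate)
```

A flagged value is a question, not an error. In this story, the answer to that question turned out to be the gallery’s most loyal audience.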
