Much has been said about how big data can positively change business operations. The premise is that it improves efficiency and helps managers make better decisions. While I agree with this general assessment, big data can be a double-edged sword: it helps on the one hand, but on the other it raises issues of privacy, discrimination, and bias. I am starting a series of blog posts on the potential pitfalls of using big data.
For a recent finance job advertisement in Washington, DC, Verizon, a US telecom company, targeted its promotion “on the Facebook feeds of users 25 to 36 years old who lived in the nation's capital, or had recently visited there, and had demonstrated an interest in finance.” Such targeted ads raised questions of fairness to older workers, and many critics suggested they violated the federal Age Discrimination in Employment Act of 1967. Such inadvertent biases, introduced by algorithms that rely in part on demographic features, have come to be known as “machine bias.” ProPublica reported that Facebook was allowing housing advertisers to exclude viewers by race. Amazon also came under criticism when it rolled out same-day delivery in cities and its algorithms inadvertently excluded black neighborhoods. Cathy O’Neil, in her best-selling book Weapons of Math Destruction, provides multiple examples of such algorithmic injustice: job applicants weeded out based on mental health; poor evaluation procedures for high school teachers; race, gender, and economic biases in product offerings; and so on. Often these algorithmic decisions affect the most vulnerable populations.
I want to focus this post on one of the issues that big data faces: bias and representation in datasets. I will illustrate one type of bias, the kind introduced by the device collecting the data. The figure shows tweets by cell phone type in the city of Washington, DC (I used Mapbox’s visualization tool on “Mobile devices + Tweets,” based on 280 million tweets, to generate the figure). Amazing, right? It shows a clear geographic segregation between iPhone and Android users, and this split could potentially be correlated with race, gender, or economic status.
Why is this an issue? The device collecting the data can introduce bias into the data. Thomas M. Menino, Boston’s former mayor, for example, launched the “Street Bump” app to collect data from citizens on potholes. The user’s smartphone accelerometer would record “bumps” and their locations as the user drove along the road, each bump a potential sign of a pothole that the city could then fix. Wonderful idea, right? Only, early on, the app reported a disproportionate number of potholes in wealthier parts of town, where residents owned smartphones and were digitally engaged. So much so that John Podesta, President Obama’s counselor, who led the administration's review of big data bias, called out the app as an example of how big data can potentially discriminate. The app has since been fixed. Even such well-intentioned programs can harm the most vulnerable members of our society (smartphone ownership was lowest among poorer and older people). The point is that bad or biased data, even if big, leads to bad decisions and flawed policy.
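To make the mechanism concrete, here is a minimal Python sketch of how device coverage alone can skew reported counts. Everything in it is an assumption made for illustration: the two neighborhoods, the pothole counts, the smartphone ownership shares, and the function names are hypothetical and do not come from Street Bump data. The coverage adjustment shown is one simple way to think about a correction, not a description of how the app was actually fixed.

```python
# Hypothetical numbers for illustration only; not data from Street Bump.
# Both neighborhoods have the SAME true number of potholes.
neighborhoods = {
    "wealthier": {"true_potholes": 100, "smartphone_share": 0.8},
    "poorer": {"true_potholes": 100, "smartphone_share": 0.2},
}


def expected_reports(true_potholes, smartphone_share):
    """Expected potholes reported if only smartphone owners can report."""
    return true_potholes * smartphone_share


def coverage_adjusted(reports, smartphone_share):
    """Rescale raw reports by device coverage to estimate the true count."""
    return reports / smartphone_share


for name, n in neighborhoods.items():
    raw = expected_reports(n["true_potholes"], n["smartphone_share"])
    adjusted = coverage_adjusted(raw, n["smartphone_share"])
    print(f"{name}: raw reports = {raw:.0f}, adjusted estimate = {adjusted:.0f}")
```

Under these assumed numbers, the raw reports suggest the wealthier neighborhood has four times as many potholes, even though the roads are equally bad. A city that allocated repair crews from the raw counts would send most of them to the wrong place.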