FairTest: Discovering Unwarranted Associations in Data-Driven Applications

Florian Tramèr, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, Jean-Pierre Hubaux, Mathias Humbert, Ari Juels, and Huang Lin

EuroS&P, 2017



In today’s data-driven world, programmers routinely incorporate user data into complex algorithms, heuristics, and application pipelines. While often beneficial, this practice can have unintended and detrimental consequences, such as the discriminatory effects identified in Staples’ online pricing algorithm and the racially offensive labels recently found in Google’s image tagger. We argue that such effects are bugs that should be tested for and debugged in a manner similar to functionality, performance, and security bugs. We describe FairTest, a testing toolkit that detects unwarranted associations between an algorithm’s outputs (e.g., prices or labels) and user subpopulations, including protected groups (e.g., defined by race or gender). FairTest reports any statistically significant associations to programmers as potential bugs and ranks them by their strength while accounting for known explanatory factors. We designed FairTest for ease of use by programmers and integrated it into the evaluation framework of SciPy, a popular library for data analytics. We used FairTest experimentally to identify unfair disparate impact, offensive labeling, and disparate rates of algorithmic error in six applications and datasets. As examples, our results reveal subtle biases against older populations in the distribution of error in a real predictive health application, and offensive racial labeling in an image tagger.

  author   =   {Tram{\`e}r, Florian and Atlidakis, Vaggelis and Geambasu, Roxana and Hsu, Daniel and Hubaux, Jean-Pierre and Humbert, Mathias and Juels, Ari and Lin, Huang},
  title   =   {FairTest: Discovering Unwarranted Associations in Data-Driven Applications},
  booktitle   =   {EuroS\&P},
  year   =   {2017}