References

Abadi, Daniel, Rakesh Agrawal, Anastasia Ailamaki, Magdalena Balazinska, Philip A. Bernstein, Michael J. Carey, Surajit Chaudhuri, et al. 2014. “The Beckman Report on Database Research.” ACM SIGMOD Record 43 (3). ACM: 61–70.

Abazajian, Kevork N., Jennifer K. Adelman-McCarthy, Marcel A. Agüeros, Sahar S. Allam, Carlos Allende Prieto, Deokkeun An, Kurt S. J. Anderson, et al. 2009. “The Seventh Data Release of the Sloan Digital Sky Survey.” Astrophysical Journal Supplement Series 182 (2). IOP Publishing.

Abowd, John M. 2018. “The US Census Bureau Adopts Differential Privacy.” In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2867–7. ACM.

Abowd, John M., John Haltiwanger, and Julia Lane. 2004. “Integrated Longitudinal Employer-Employee Data for the United States.” American Economic Review 94 (2): 224–29.

Abowd, John M., Martha Stinson, and Gary Benedetto. 2006. “Final Report to the Social Security Administration on the SIPP/SSA/IRS Public Use File Project.” Suitland, MD: Census Bureau, Longitudinal Employer-Household Dynamics Program.

Acquisti, Alessandro. 2014. “The Economics and Behavioral Economics of Privacy.” In Privacy, Big Data, and the Public Good: Frameworks for Engagement, edited by Julia Lane, Victoria Stodden, Stefan Bender, and Helen Nissenbaum, 98–112. Cambridge University Press.

Adam Rose. 2010. “Are Face-Detection Cameras Racist?” http://content.time.com/time/business/article/0,8599,1954643,00.html. Accessed February 12, 2020.

Agrawal, Rakesh, and Ramakrishnan Srikant. 1994. “Fast Algorithms for Mining Association Rules in Large Databases.” In Proceedings of the 20th International Conference on Very Large Data Bases.

Ahlberg, Christopher, Christopher Williamson, and Ben Shneiderman. 1992. “Dynamic Queries for Information Exploration: An Implementation and Evaluation.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 619–26. ACM.

Al Aghbari, Zaher, Mohammed Bahutair, and Ibrahim Kamel. 2019. “GeoSimMR: A MapReduce Algorithm for Detecting Communities Based on Distance and Interest in Social Networks.” Data Science Journal 18 (1): 13.

Alexia Fernandez Campbell. 2018. “Women Accuse Facebook of Illegally Posting Job Ads That Only Men Can See.” https://www.vox.com/business-and-finance/2018/9/18/17874506/facebook-job-ads-discrimination. Accessed February 12, 2020.

Ali, Muhammad, Piotr Sapiezynski, Miranda Bogen, Aleksandra Korolova, Alan Mislove, and Aaron Rieke. 2019. “Discrimination Through Optimization: How Facebook’s Ad Delivery Can Lead to Biased Outcomes.” Proceedings of the ACM on Human-Computer Interaction 3. New York, NY, USA: Association for Computing Machinery.

Allison, Paul D. 2001. Missing Data. Sage Publications.

Anscombe, Francis J. 1973. “Graphs in Statistical Analysis.” American Statistician 27 (1): 17–21.

Apache Hadoop. n.d. “HDFS Architecture.” http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html. Accessed February 1, 2020.

Apache Software Foundation. n.d. “Apache Ambari.” http://ambari.apache.org. Accessed February 1, 2020.

Armbrust, Michael, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, et al. 2010. “A View of Cloud Computing.” Communications of the ACM 53 (4). ACM: 50–58.

Athey, Susan, and Guido W. Imbens. 2017. “The State of Applied Econometrics: Causality and Policy Evaluation.” Journal of Economic Perspectives 31 (2): 3–32.

Athey, Susan, and Stefan Wager. 2019. “Estimating Treatment Effects with Causal Forests: An Application.” https://arxiv.org/abs/1902.07409.

Barabási, Albert-László, and Réka Albert. 1999. “Emergence of Scaling in Random Networks.” Science 286 (5439). American Association for the Advancement of Science: 509–12.

Barbaro, Michael, Tom Zeller, and Saul Hansell. 2006. “A Face Is Exposed for AOL Searcher No. 4417749.” New York Times, August.

Barocas, Solon, and Helen Nissenbaum. 2014a. “Big Data’s End Run Around Procedural Privacy Protections.” Communications of the ACM 57 (11). ACM: 31–33.

———. 2014b. “The Limits of Anonymity and Consent in the Big Data Age.” In Privacy, Big Data, and the Public Good: Frameworks for Engagement, edited by Julia Lane, Victoria Stodden, Stefan Bender, and Helen Nissenbaum. Cambridge University Press.

Bastian, Hilda. 2013. “Bad Research Rising: The 7th Olympiad of Research on Biomedical Publication.” https://blogs.scientificamerican.com/absolutely-maybe/bad-research-rising-the-7th-olympiad-of-research-on-biomedical-publication/. Accessed February 1, 2020.

Batagelj, Vladimir, and Andrej Mrvar. 1998. “Pajek—Program for Large Network Analysis.” Connections 21 (2): 47–57.

Bell, Alex. 2012. “Python for Economists.” https://scholar.harvard.edu/files/ambell/files/python_for_economists.pdf. Accessed February 1, 2020.

Bengio, Yoshua, Aaron Courville, and Pascal Vincent. 2013. “Representation Learning: A Review and New Perspectives.” IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (8): 1798–1828.

Bhuvaneshwar, Krithika, Dinanath Sulakhe, Robinder Gauba, Alex Rodriguez, Ravi Madduri, Utpal Dave, Lukasz Lacinski, Ian Foster, Yuriy Gusev, and Subha Madhavan. 2015. “A Case Study for Cloud Based High Throughput Analysis of NGS Data Using the Globus Genomics System.” Computational and Structural Biotechnology Journal 13. Elsevier: 64–74.

Biemer, Paul P. 2010. “Total Survey Error: Design, Implementation, and Evaluation.” Public Opinion Quarterly 74 (5). AAPOR: 817–48.

———. 2011. Latent Class Analysis of Survey Error. John Wiley & Sons.

Biemer, Paul P., and Lars E. Lyberg. 2003. Introduction to Survey Quality. John Wiley & Sons.

Biemer, Paul P., and S. Lynne Stokes. 1991. “Approaches to Modeling Measurement Error.” In Measurement Errors in Surveys, edited by Paul P. Biemer, Robert M. Groves, Lars E. Lyberg, Nancy A. Mathiowetz, and Seymour Sudman, 54–68. John Wiley.

Biemer, Paul P., and Dennis Trewin. 1997. “A Review of Measurement Error Effects on the Analysis of Survey Data.” In Survey Measurement and Process Quality, edited by Lars Lyberg, Paul P. Biemer, Martin Collins, Edith De Leeuw, Cathryn Dippo, Norbert Schwarz, and Dennis Trewin, 601–32. John Wiley & Sons.

Bird, Ian. 2011. “Computing for the Large Hadron Collider.” Annual Review of Nuclear and Particle Science 61. Annual Reviews: 99–118.

Bird, Steven, Ewan Klein, and Edward Loper. 2009. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O’Reilly Media.

Blei, David M., and John Lafferty. 2009. “Topic Models.” In Text Mining: Theory and Applications, edited by Ashok Srivastava and Mehran Sahami. Taylor & Francis.

Blei, David M., and Jon D. McAuliffe. 2007. “Supervised Topic Models.” In Advances in Neural Information Processing Systems. MIT Press.

Blei, David M., Andrew Ng, and Michael Jordan. 2003. “Latent Dirichlet Allocation.” Journal of Machine Learning Research 3: 993–1022.

Blitzer, John, Mark Dredze, and Fernando Pereira. 2007. “Biographies, Bollywood, Boom-Boxes and Blenders: Domain Adaptation for Sentiment Classification.” In ACL, 187–205.

Boy, Jeremy, Ronald Rensink, Enrico Bertini, Jean-Daniel Fekete, and others. 2014. “A Principled Way of Assessing Visualization Literacy.” IEEE Transactions on Visualization and Computer Graphics 20 (12): 1963–72.

Boyd, Danah, and Kate Crawford. 2012. “Critical Questions for Big Data: Provocations for a Cultural, Technological, and Scholarly Phenomenon.” Information, Communication & Society 15 (5). Taylor & Francis: 662–79.

Boyd-Graber, Jordan, Yuening Hu, and David Mimno. 2017. Applications of Topic Models. Edited by Doug Oard. Vol. 11. Foundations and Trends in Information Retrieval 2–3. NOW Publishers.

Börner, Katy. 2010. Atlas of Science: Visualizing What We Know. MIT Press.

Brady, Henry E. 2019. “The Challenge of Big Data and Data Science.” Annual Review of Political Science 22. Annual Reviews: 297–323.

Brewer, Eric. 2012. “CAP Twelve Years Later: How the ‘Rules’ Have Changed.” Computer 45 (2). IEEE: 23–29.

Broekstra, Jeen, Arjohn Kampman, and Frank Van Harmelen. 2002. “Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema.” In The Semantic Web—ISWC 2002, 54–68. Springer.

Brown, Clair, John Haltiwanger, and Julia Lane. 2008. Economic Turbulence: Is a Volatile Economy Good for America? University of Chicago Press.

Brynjolfsson, Erik, Lorin M. Hitt, and Heekyung Hellen Kim. 2011. “Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance?” Available at SSRN: https://ssrn.com/abstract=1819486.

Buolamwini, Joy, and Timnit Gebru. 2018. “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification.” In Proceedings of the 1st Conference on Fairness, Accountability and Transparency, edited by Sorelle A. Friedler and Christo Wilson, 81:77–91. Proceedings of Machine Learning Research. New York, NY, USA: PMLR.

Burkhauser, Richard V., Shuaizhang Feng, and Jeff Larrimore. 2010. “Improving Imputations of Top Incomes in the Public-Use Current Population Survey by Using Both Cell-Means and Variances.” Economics Letters 108 (1). Elsevier: 69–72.

Burt, Ronald S. 1993. “The Social Structure of Competition.” In Explorations in Economic Sociology. New York: Russell Sage Foundation.

———. 2004. “Structural Holes and Good Ideas.” American Journal of Sociology 110 (2): 349–99.

Butler, Declan. 2013. “When Google Got Flu Wrong.” Nature 494 (7436): 155.

Card, Stuart K., and David Nation. 2002. “Degree-of-Interest Trees: A Component of an Attention-Reactive User Interface.” In Proceedings of the Working Conference on Advanced Visual Interfaces, 231–45. ACM.

Carton, Samuel, Jennifer Helsby, Kenneth Joseph, Ayesha Mahmud, Youngsoo Park, Joe Walsh, Crystal Cody, CPT Estella Patterson, Lauren Haynes, and Rayid Ghani. 2016. “Identifying Police Officers at Risk of Adverse Events.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 67–76. KDD ’16. New York, NY, USA: Association for Computing Machinery.

Caruana, Rich, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad. 2015. “Intelligible Models for Healthcare: Predicting Pneumonia Risk and Hospital 30-Day Readmission.” In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1721–30. KDD ’15. New York, NY, USA: Association for Computing Machinery.

Catlett, Charlie, Tanu Malik, Brett Goldstein, Jonathan Giuffrida, Yetong Shao, Alessandro Panella, Derek Eder, et al. 2014. “Plenario: An Open Data Discovery and Exploration Platform for Urban Science.” Bulletin of the IEEE Computer Society Technical Committee on Data Engineering 37: 27–42.

Cavallo, Alberto, and Roberto Rigobon. 2016. “The Billion Prices Project: Using Online Prices for Measurement and Research.” Journal of Economic Perspectives 30 (2): 151–78.

Cecil, Joe, and Donna Eden. 2003. “The Legal Foundations of Confidentiality.” In Key Issues in Confidentiality Research: Results of an NSF Workshop. National Science Foundation.

Celis, L. Elisa, Lingxiao Huang, Vijay Keswani, and Nisheeth K. Vishnoi. 2019. “Classification with Fairness Constraints: A Meta-Algorithm with Provable Guarantees.” In Proceedings of the Conference on Fairness, Accountability, and Transparency, 319–28. FAT* ’19. New York, NY, USA: Association for Computing Machinery.

Centers for Disease Control and Prevention. 2014. “United States Cancer Statistics: An Interactive Cancer Atlas.” http://nccd.cdc.gov/DCPC_INCA. Accessed February 1, 2016.

Chai, John J. 1971. “Correlated Measurement Errors and the Least Squares Estimator of the Regression Coefficient.” Journal of the American Statistical Association 66 (335). Taylor & Francis Group: 478–83.

Chandola, Varun, Arindam Banerjee, and Vipin Kumar. 2009. “Anomaly Detection: A Survey.” ACM Computing Surveys 41 (3).

Chapelle, Olivier, and S. Sathiya Keerthi. 2010. “Efficient Algorithms for Ranking with SVMs.” Information Retrieval 13 (3). Hingham, MA: Kluwer Academic Publishers: 201–15.

Chapelle, Olivier, Bernhard Schoelkopf, and Alexander Zien, eds. 2006. Semi-Supervised Learning. London, U.K.: MIT Press.

Chawla, Nitesh V. 2005. “Data Mining for Imbalanced Datasets: An Overview.” In The Data Mining and Knowledge Discovery Handbook, edited by Oded Maimon and Lior Rokach, 853–67. Springer.

Chen, Irene, Fredrik D. Johansson, and David Sontag. 2018. “Why Is My Classifier Discriminatory?” In Proceedings of the 32nd International Conference on Neural Information Processing Systems, 3543–54. NIPS ’18. Red Hook, NY, USA: Curran Associates, Inc.

Cheng, Justin, Lada A. Adamic, Jon M. Kleinberg, and Jure Leskovec. 2016. “Do Cascades Recur?” In Proceedings of the 25th International Conference on World Wide Web, 671–81.

Chetty, Raj. 2012. “The Transformative Potential of Administrative Data for Microeconometric Research.” http://conference.nber.org/confer/2012/SI2012/LS/ChettySlides.pdf. Accessed February 1, 2016.

Ching, Avery, Ravi Murthy, Dmytro Molkov, Ramkumar Vadali, and Paul Yang. 2012. “Under the Hood: Scheduling MapReduce Jobs More Efficiently with Corona.” https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920/.

Chouldechova, Alexandra. 2017. “Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments.” Big Data 5 (2): 153–63.

Chouldechova, Alexandra, and Aaron Roth. 2018. “The Frontiers of Fairness in Machine Learning.” arXiv Preprint arXiv:1810.08810.

Christen, Peter. 2012a. “A Survey of Indexing Techniques for Scalable Record Linkage and Deduplication.” IEEE Transactions on Knowledge and Data Engineering 24 (9). IEEE: 1537–55.

———. 2012b. Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Springer Science & Business Media.

Clarke, Claire. 2014. “Editing Big Data with Machine Learning Methods.” Paper presented at the Australian Bureau of Statistics Symposium, Canberra.

Cleveland, William S., and Robert McGill. 1984. “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Journal of the American Statistical Association 79 (387). Taylor & Francis: 531–54.

Clifton, Chris, Murat Kantarcioglu, AnHai Doan, Gunther Schadow, Jaideep Vaidya, Ahmed Elmagarmid, and Dan Suciu. 2006. “Privacy-Preserving Data Integration and Sharing.” In 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 19–26. ACM.

Cloudera. n.d. “Cloudera Manager.” https://www.cloudera.com/content/www/en-us/products/cloudera-manager.html. Accessed February 1, 2020.

Cochran, William G. 1968. “Errors of Measurement in Statistics.” Technometrics 10 (4). Taylor & Francis Group: 637–66.

Conor Dougherty. 2015. “Google Photos Mistakenly Labels Black People ’Gorillas’.” https://bits.blogs.nytimes.com/2015/07/01/google-photos-mistakenly-labels-black-people-gorillas/. Accessed February 12, 2020.

Corti, Paolo, Thomas J. Kraft, Stephen Vincent Mather, and Bborie Park. 2014. PostGIS Cookbook. Packt Publishing.

Crammer, Koby, and Yoram Singer. 2002. “On the Algorithmic Implementation of Multiclass Kernel-Based Vector Machines.” Journal of Machine Learning Research 2. JMLR.org: 265–92.

Crossno, Patricia J., Douglas D. Cline, and Jeffrey N Jortner. 1993. “A Heterogeneous Graphics Procedure for Visualization of Massively Parallel Solutions.” ASME FED 156. ASME: 65–65.

Cumby, Chad, and Rayid Ghani. 2011. “A Machine Learning Based System for Semi-Automatically Redacting Documents.” AAAI Publications, Twenty-Third IAAI Conference.

Czajka, John, Craig Schneider, Amang Sukasih, and Kevin Collins. 2014. “Minimizing Disclosure Risk in HHS Open Data Initiatives.” US Department of Health & Human Services.

Dean, Jeffrey, and Sanjay Ghemawat. 2004. “MapReduce: Simplified Data Processing on Large Clusters.” In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation—Volume 6. OSDI’04. USENIX Association. http://dl.acm.org/citation.cfm?id=1251254.1251264.

DeBelius, Danny. 2015. “Let’s Tesselate: Hexagons for Tile Grid Maps.” NPR Visuals Team Blog, http://blog.apps.npr.org/2015/05/11/hex-tile-maps.html.

Decker, Ryan A., John Haltiwanger, Ron S. Jarmin, and Javier Miranda. 2016. “Where Has All the Skewness Gone? The Decline in High-Growth (Young) Firms in the US.” European Economic Review 86: 4–23.

Desai, T., F. Ritchie, and R. Welpton. 2016. “Five Safes: Designing Data Access for Research.” Working Papers 20161601, Department of Accounting, Economics and Finance, Bristol Business School, University of the West of England, Bristol.

Desmarais, Sarah L., and Jay P. Singh. 2013. “Risk Assessment Instruments Validated and Implemented in Correctional Settings in the United States.” Lexington, KY: Council of State Governments. http://csgjusticecenter.org/wp-content/uploads/2014/07/Risk-Assessment-Instruments-Validated-and-Implemented-in-Correctional-Settings-in-the-United-States.pdf.

Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.” In Conference of the North American Chapter of the Association for Computational Linguistics.

DeWitt, David J., and Michael Stonebraker. 2008. “MapReduce: A Major Step Backwards.” http://www.dcs.bbk.ac.uk/~dell/teaching/cc/paper/dbc08/dewitt_mr_db.pdf.

Donohue III, John J., and Justin Wolfers. 2006. “Uses and Abuses of Empirical Evidence in the Death Penalty Debate.” National Bureau of Economic Research.

Doyle, Pat, Julia I. Lane, Jules J. M. Theeuwes, and Laura V. Zayatz. 2001. Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies. Elsevier Science.

Drechsler, Jörg. 2011. Synthetic Datasets for Statistical Disclosure Control: Theory and Implementation. Springer.

Drew Harwell. 2019. “San Francisco Becomes First City in U.S. to Ban Facial-Recognition Software.” https://www.washingtonpost.com/technology/2019/05/14/san-francisco-becomes-first-city-us-ban-facial-recognition-software/. Accessed February 12, 2020.

Duan, Lian, Lida Xu, Ying Liu, and Jun Lee. 2009. “Cluster-Based Outlier Detection.” Annals of Operations Research 168 (1). Springer: 151–68.

DuGoff, Eva H., Megan Schuler, and Elizabeth A. Stuart. 2014. “Generalizing Observational Study Results: Applying Propensity Score Methods to Complex Surveys.” Health Services Research 49 (1). Wiley Online Library: 284–303.

Duncan, George T., and S. Lynne Stokes. 2004. “Disclosure Risk Vs. Data Utility: The R-U Confidentiality Map as Applied to Topcoding.” Chance 17 (3). Taylor & Francis: 16–20.

Duncan, George T., Mark Elliot, and Juan José Salazar-González. 2011. Statistical Confidentiality: Principles and Practice. Springer.

Dunne, Cody, and Ben Shneiderman. 2013. “Motif Simplification: Improving Network Visualization Readability with Fan, Connector, and Clique Glyphs.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 3247–56. ACM.

Dunning, Ted. 1993. “Accurate Methods for the Statistics of Surprise and Coincidence.” Computational Linguistics 19 (1). Cambridge, MA: MIT Press: 61–74.

Dutwin, David, and Trent D. Buskirk. 2017. “Reply.” Public Opinion Quarterly 81 (S1): 246–49.

Dwork, Cynthia, and Aaron Roth. 2014. “The Algorithmic Foundations of Differential Privacy.” Foundations and Trends in Theoretical Computer Science 9 (3–4): 211–407.

Dwork, Cynthia, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. 2012. “Fairness Through Awareness.” In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, 214–26. ITCS ’12. New York, NY, USA: Association for Computing Machinery.

Economic and Social Research Council. 2016. “Administrative Data Research Network.”

Edelman, Benjamin, Michael Luca, and Dan Svirsky. 2017. “Racial Discrimination in the Sharing Economy: Evidence from a Field Experiment.” American Economic Journal: Applied Economics 9 (2): 1–22.

Elias, Peter. 2014. “A European Perspective on Research and Big Data Access.” In Privacy, Big Data, and the Public Good: Frameworks for Engagement, edited by Julia Lane, Victoria Stodden, Stefan Bender, and Helen Nissenbaum, 98–112. Cambridge University Press.

Elliott, Joshua, David Kelly, James Chryssanthacopoulos, Michael Glotter, Kanika Jhunjhnuwala, Neil Best, Michael Wilde, and Ian Foster. 2014. “The Parallel System for Integrating Impact Models and Sectors (pSIMS).” Environmental Modelling & Software 62. Elsevier: 509–16.

Elmagarmid, Ahmed K., Panagiotis G. Ipeirotis, and Vassilios S. Verykios. 2007. “Duplicate Record Detection: A Survey.” IEEE Transactions on Knowledge and Data Engineering 19 (1). IEEE: 1–16.

Evans, David S. 1987. “Tests of Alternative Theories of Firm Growth.” Journal of Political Economy 95. JSTOR: 657–74.

Evans, James A., and Jacob G. Foster. 2011. “Metaknowledge.” Science 331 (6018): 721–25.

Fan, Jianqing, and Yuan Liao. 2012. “Endogeneity in Ultrahigh Dimension.” Princeton University.

———. 2014. “Endogeneity in High Dimensions.” Annals of Statistics 42 (3): 872.

Fan, Jianqing, Fang Han, and Han Liu. 2014. “Challenges of Big Data Analysis.” National Science Review 1 (2). Oxford University Press: 293–314.

Fan, Jianqing, Richard Samworth, and Yichao Wu. 2009. “Ultrahigh Dimensional Feature Selection: Beyond the Linear Model.” Journal of Machine Learning Research 10: 2013–38.

Fekete, Jean-Daniel. 2015. “ProgressiVis: A Toolkit for Steerable Progressive Analytics and Visualization.” Paper presented at 1st Workshop on Data Systems for Interactive Analysis, Chicago, IL, October 26.

Fekete, Jean-Daniel, and Catherine Plaisant. 2002. “Interactive Information Visualization of a Million Items.” In IEEE Symposium on Information Visualization, 117–24. IEEE.

Feldman, Ronen, and James Sanger. 2006. Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press.

Fellegi, Ivan P., and Alan B. Sunter. 1969. “A Theory for Record Linkage.” Journal of the American Statistical Association 64 (328). Taylor & Francis Group: 1183–1210.

Feng, Shi, Eric Wallace, Alvin Grissom II, Pedro Rodriguez, Mohit Iyyer, and Jordan Boyd-Graber. 2018. “Pathologies of Neural Models Make Interpretation Difficult.” In Empirical Methods in Natural Language Processing. Brussels, Belgium.

Ferragina, Paolo, and Ugo Scaiella. 2010. “TAGME: On-the-Fly Annotation of Short Text Fragments (by Wikipedia Entities).” In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, 1625–8. CIKM ’10. New York, NY, USA: Association for Computing Machinery.

Few, Stephen. 2009. Now You See It: Simple Visualization Techniques for Quantitative Analysis. Analytics Press.

———. 2013. Information Dashboard Design: Displaying Data for at-a-Glance Monitoring. Analytics Press.

Fielding, Roy T., and Richard N. Taylor. 2002. “Principled Design of the Modern Web Architecture.” ACM Transactions on Internet Technology 2 (2). ACM: 115–50.

Fisher, Danyel, Igor Popov, Steven Drucker, and Monica Schraefel. 2012. “Trust Me, I’m Partially Right: Incremental Visualization Lets Analysts Explore Large Datasets Faster.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1673–82. ACM.

Flach, Peter. 2012. Machine Learning: The Art and Science of Algorithms That Make Sense of Data. Cambridge University Press.

Fortuna, Blaz, Marko Grobelnik, and Dunja Mladenic. 2007. “OntoGen: Semi-Automatic Ontology Editor.” In Proceedings of the 2007 Conference on Human Interface: Part II, 309–18. Beijing, China: Springer.

Foster, Lucia, Ron S. Jarmin, and T. Lynn Riggs. 2009. “Resolving the Tension Between Access and Confidentiality: Past Experience and Future Plans at the US Census Bureau.” 09-33. US Census Bureau Center for Economic Studies.

Fox, Armando, Steven D. Gribble, Yatin Chawathe, Eric A. Brewer, and Paul Gauthier. 1997. “Cluster-Based Scalable Network Services.” ACM SIGOPS Operating Systems Review 31 (5). ACM.

Francis, W. Nelson, and Henry Kucera. 1979. “Brown Corpus Manual.” Department of Linguistics, Brown University, Providence, Rhode Island, US.

Freeman, Linton C. 1979. “Centrality in Social Networks Conceptual Clarification.” Social Networks 1 (3). Elsevier: 215–39.

Fuller, Wayne A. 1991. “Regression Estimation in the Presence of Measurement Error.” In Measurement Errors in Surveys, edited by Paul P. Biemer, Robert M. Groves, Lars E. Lyberg, Nancy A. Mathiowetz, and Seymour Sudman, 617–35. John Wiley & Sons.

Galloway, Scott. 2017. The Four: The Hidden DNA of Amazon, Apple, Facebook and Google. Random House.

Geman, Stuart, and Donald Geman. 1990. “Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images.” In Readings in Uncertain Reasoning, edited by Glenn Shafer and Judea Pearl, 452–72. Morgan Kaufmann.

Gerrish, Sean M., and David M. Blei. 2012. “The Issue-Adjusted Ideal Point Model.” https://arxiv.org/abs/1209.6004.

Girone, Maria. 2008. “CERN Database Services for the LHC Computing Grid.” In Journal of Physics: Conference Series. Vol. 119. 5. IOP Publishing.

Girvan, Michelle, and Mark E. J. Newman. 2002. “Community Structure in Social and Biological Networks.” Proceedings of the National Academy of Sciences 99 (12). National Acad Sciences: 7821–6.

Glaeser, Edward. 2019. “Urban Management in the 21st Century: Ten Insights from Professor Ed Glaeser.” Centre for Development and Enterprise (CDE).

Glennon, Britta. 2019. “How Do Restrictions on High-Skilled Immigration Affect Offshoring? Evidence from the H-1B Program.” http://brittaglennon.com/research/.

Glueck, Michael, Azam Khan, and Daniel J. Wigdor. 2014. “Dive In! Enabling Progressive Loading for Real-Time Navigation of Data Visualizations.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 561–70. ACM.

Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.

Government Computer News Staff. 2018. “Data Mashups at Government Scale: The Census Bureau ADRF.” GCN Magazine.

Göbel, Sascha, and Simon Munzert. 2018. “Political Advertising on the Wikipedia Marketplace of Information.” Social Science Computer Review 36 (2): 157–75.

Gray, Jim. 1981. “The Transaction Concept: Virtues and Limitations.” In Proceedings of the Seventh International Conference on Very Large Data Bases, 7:144–54.

Green, Donald P., and Holger L. Kern. 2012. “Modeling Heterogeneous Treatment Effects in Survey Experiments with Bayesian Additive Regression Trees.” Public Opinion Quarterly 76. AAPOR: 491–511.

Greenwood, Daniel, Arkadiusz Stopczynski, Brian Sweatt, Thomas Hardjono, and Alex Pentland. 2014. “The New Deal on Data: A Framework for Institutional Controls.” In Privacy, Big Data, and the Public Good: Frameworks for Engagement, edited by Julia Lane, Victoria Stodden, Stefan Bender, and Helen Nissenbaum. Cambridge University Press.

Griffiths, Thomas L., and Mark Steyvers. 2004. “Finding Scientific Topics.” Proceedings of the National Academy of Sciences 101 (Suppl. 1): 5228–35.

Grimmer, Justin, and Brandon M. Stewart. 2013. “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political Analysis 21 (3): 267–97.

Gropp, William, Ewing Lusk, and Anthony Skjellum. 2014. Using MPI: Portable Parallel Programming with the Message-Passing Interface. MIT Press.

Groves, Robert M. 2004. Survey Errors and Survey Costs. John Wiley & Sons.

Haak, Laurel L., Martin Fenner, Laura Paglione, Ed Pentz, and Howard Ratner. 2012. “ORCID: A System to Uniquely Identify Researchers.” Learned Publishing 25 (4): 259–64.

Hainmueller, Jens, and Chad Hazlett. 2014. “Kernel Regularized Least Squares: Reducing Misspecification Bias with a Flexible and Interpretable Machine Learning Approach.” Political Analysis 22 (2): 143–68.

Halevy, Alon, Peter Norvig, and Fernando Pereira. 2009. “The Unreasonable Effectiveness of Data.” IEEE Intelligent Systems 24 (2). Piscataway, NJ: IEEE Educational Activities Department: 8–12.

Hall, Peter, and Hugh Miller. 2009. “Using Generalized Correlation to Effect Variable Selection in Very High Dimensional Problems.” Journal of Computational and Graphical Statistics 18: 533–50.

Haltiwanger, John, Ron S. Jarmin, and Javier Miranda. 2013. “Who Creates Jobs? Small Versus Large Versus Young.” Review of Economics and Statistics 95 (2). MIT Press: 347–61.

Han, Hui, Lee Giles, Hongyuan Zha, Cheng Li, and Kostas Tsioutsiouliklis. 2004. “Two Supervised Learning Approaches for Name Disambiguation in Author Citations.” In Proceedings of the Joint ACM/IEEE Conference on Digital Libraries, 296–305. IEEE.

Hansen, Derek, Ben Shneiderman, and Marc A. Smith. 2010. Analyzing Social Media Networks with NodeXL: Insights from a Connected World. Morgan Kaufmann.

Hansen, Morris H., William N. Hurwitz, and William G. Madow. 1993. Sample Survey Methods and Theory. John Wiley & Sons.

Harford, Tim. 2014. “Big Data: A Big Mistake?” Significance 11 (5): 14–19.

Harrison, Lane, Katharina Reinecke, and Remco Chang. 2015. “Infographic Aesthetics: Designing for the First Impression.” In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 1187–90. ACM.

Hart, Nick. 2019. “Two Years of Progress on Evidence-Based Policymaking in the United States.” Data Coalition Blog. Washington DC: The Data Coalition. https://www.datacoalition.org/two-years-of-progress-on-evidence-based-policymaking-in-the-united-states/.

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2001. The Elements of Statistical Learning. Springer.

Hayden, Erika Check. 2015. “Researchers Wrestle with a Privacy Problem.” Nature 525 (7570).

Hayden, Erika Check. 2012. “A Broken Contract.” Nature 486 (7403): 312–14.

He, Zengyou, Xiaofei Xu, and Shengchun Deng. 2003. “Discovering Cluster-Based Local Outliers.” Pattern Recognition Letters 24 (9). Elsevier: 1641–50.

Healy, Kieran, and James Moody. 2014. “Data Visualization in Sociology.” Annual Review of Sociology 40. Annual Reviews: 105–28.

Henry, Nathalie, and Jean-Daniel Fekete. 2006. “MatrixExplorer: A Dual-Representation System to Explore Social Networks.” IEEE Transactions on Visualization and Computer Graphics 12 (5). IEEE: 677–84.

Herzog, Thomas N., Fritz J. Scheuren, and William E. Winkler. 2007. Data Quality and Record Linkage Techniques. Springer Science & Business Media.

Hill, Kashmir. 2012. “How Target Figured Out a Teen Girl Was Pregnant Before Her Father Did.” Forbes, http://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/#7280148734c6.

Hofmann, Thomas. 1999. “Probabilistic Latent Semantic Analysis.” In Proceedings of Uncertainty in Artificial Intelligence.

Holmberg, Anders, and Christine Bycroft. 2017. “Statistics New Zealand’s Approach to Making Use of Alternative Data Sources in a New Era of Integrated Data.” In Total Survey Error in Practice, edited by Paul P. Biemer, Edith D. de Leeuw, Stephanie Eckman, Brad Edwards, Frauke Kreuter, Lars E. Lyberg, N. Clyde Tucker, and Brady T. West. Hoboken, NJ: John Wiley & Sons.

Hox, Joop. 2010. Multilevel Analysis: Techniques and Applications. Routledge.

Hsieh, Yuli Patrick, and Joe Murphy. 2017. “Total Twitter Error: Decomposing Public Opinion Measurement on Twitter from a Total Survey Error Perspective.” In Total Survey Error in Practice, edited by Paul P. Biemer, Edith D. de Leeuw, Stephanie Eckman, Brad Edwards, Frauke Kreuter, Lars E. Lyberg, N. Clyde Tucker, and Brady T. West. Hoboken, NJ: John Wiley & Sons.

Hu, Yuening, Ke Zhai, Vlad Eidelman, and Jordan Boyd-Graber. 2014. “Polylingual Tree-Based Topic Models for Translation Domain Adaptation.” In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics. Baltimore, MD.

Huang, Anna. 2008. “Similarity Measures for Text Document Clustering.” Paper presented at New Zealand Computer Science Research Student Conference, Christchurch, New Zealand, April 14–18.

Huang, Jian, Seyda Ertekin, and C. Lee Giles. 2006. “Efficient Name Disambiguation for Large-Scale Databases.” In Knowledge Discovery in Databases: PKDD 2006, 536–44. Springer.

Hukkelas, Hakon, Rudolf Mester, and Frank Lindseth. 2019. “DeepPrivacy: A Generative Adversarial Network for Face Anonymization.” https://arxiv.org/abs/1909.04538.

Human Microbiome Jumpstart Reference Strains Consortium, K. E. Nelson, G. M. Weinstock, and others. 2010. “A Catalog of Reference Genomes from the Human Microbiome.” Science 328 (5981). American Association for the Advancement of Science: 994–99.

Hundepool, Anco, Josep Domingo-Ferrer, Luisa Franconi, Sarah Giessing, Rainer Lenz, Jane Longhurst, E. Schulte Nordholt, Giovanni Seri, and P. Wolf. 2010. “Handbook on Statistical Disclosure Control.” Network of Excellence in the European Statistical System in the Field of Statistical Disclosure Control.

Husband Fealing, Kaye, Julia Ingrid Lane, Jack Marburger, and Stephanie Shipp. 2011. Science of Science Policy: The Handbook. Stanford University Press.

Ibrahim, Joseph G., and Ming-Hui Chen. 2000. “Power Prior Distributions for Regression Models.” Statistical Science 15 (1). JSTOR: 46–60.

Imai, Kosuke, Marc Ratkovic, and others. 2013. “Estimating Treatment Effect Heterogeneity in Randomized Program Evaluation.” Annals of Applied Statistics 7 (1). Institute of Mathematical Statistics: 443–70.

Imbens, Guido W., and Donald B. Rubin. 2015. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press.

Inselberg, Alfred. 2009. Parallel Coordinates. Springer.

Institute for Research on Innovation and Science (IRIS) Research. 2019. Summary Documentation for the IRIS UMETRICS 2019 Data Release. Institute for Research on Innovation & Science (IRIS). https://iris.isr.umich.edu/research-data/2019datarelease-summarydoc/.

Institute for Social Research. 2013. “PSID File Structure and Merging PSID Data Files.” Technical report. http://psidonline.isr.umich.edu/Guide/FileStructure.pdf.

Ioannidis, John P. A. 2005. “Why Most Published Research Findings Are False.” PLoS Medicine 2 (8): e124.

Iyyer, Mohit, Varun Manjunatha, Jordan Boyd-Graber, and Hal Daumé III. 2015. “Deep Unordered Composition Rivals Syntactic Methods for Text Classification.” In Association for Computational Linguistics.

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning. Springer.

Japec, Lilli, Frauke Kreuter, Marcus Berg, Paul Biemer, Paul Decker, Cliff Lampe, Julia Lane, Cathy O’Neil, and Abe Usher. 2015. “Big Data in Survey Research: AAPOR Task Force Report.” Public Opinion Quarterly 79 (4). AAPOR: 839–80.

Jarmin, Ron S., and Javier Miranda. 2002. “The Longitudinal Business Database.” Available at SSRN: https://ssrn.com/abstract=2128793.

Jarmin, Ron S., Thomas A. Louis, and Javier Miranda. 2014. “Expanding the Role of Synthetic Data at the US Census Bureau.” Statistical Journal of the IAOS 30 (2): 117–21.

Jeff Larson, Surya Mattu, Lauren Kirchner, and Julia Angwin. 2016. “How We Analyzed the COMPAS Recidivism Algorithm.” https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm. Accessed February 12, 2020.

Johnson, Brian, and Ben Shneiderman. 1991. “Tree-Maps: A Space-Filling Approach to the Visualization of Hierarchical Information Structures.” In Proceedings of the IEEE Conference on Visualization, 284–91. IEEE.

Jones, Paul, and Peter Elias. 2006. “Administrative Data as a Research Resource: A Selected Audit.” ESRC National Centre for Research Methods.

Jovanovic, Boyan. 1982. “Selection and the Evolution of Industry.” Econometrica: Journal of the Econometric Society 50 (3). JSTOR: 649–70.

Julia Angwin and Jeff Larson. 2016. “Bias in Criminal Risk Scores Is Mathematically Inevitable, Researchers Say.” https://www.propublica.org/article/bias-in-criminal-risk-scores-is-mathematically-inevitable-researchers-say. Accessed February 12, 2020.

Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. “Machine Bias.” https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing. Accessed February 12, 2020.

Julia Angwin and Terry Parris Jr. 2016. “Facebook Lets Advertisers Exclude Users by Race.” https://www.propublica.org/article/facebook-lets-advertisers-exclude-users-by-race. Accessed February 12, 2020.

Kabo, Felichism, Yongha Hwang, Margaret Levenstein, and Jason Owen-Smith. 2015. “Shared Paths to the Lab: A Sociospatial Network Analysis of Collaboration.” Environment and Behavior 47 (1). SAGE Publications: 57–84.

Karr, Alan, and Jerome P. Reiter. 2014. “Analytical Frameworks for Data Release: A Statistical View.” In Privacy, Big Data, and the Public Good: Frameworks for Engagement, edited by Julia Lane, Victoria Stodden, Stefan Bender, and Helen Nissenbaum. Cambridge University Press.

Keshif. n.d. “Infographics Aesthetics Dataset Browser.” http://keshif.me/demo/infographics_aesthetics. Accessed February 1, 2020.

Kilbertus, Niki, Mateo Rojas Carulla, Giambattista Parascandolo, Moritz Hardt, Dominik Janzing, and Bernhard Schölkopf. 2017. “Avoiding Discrimination Through Causal Reasoning.” In Advances in Neural Information Processing Systems 30, 656–66. Curran Associates, Inc.

Kim, Kunho, Madian Khabsa, and C. Lee Giles. 2016. “Inventor Name Disambiguation for a Patent Database Using a Random Forest and DBSCAN.” In 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL), 269–70. IEEE.

Kim, Yoonsang, Jidong Huang, and Sherry Emery. 2016. “Garbage in, Garbage Out: Data Collection, Quality Assessment and Reporting Standards for Social Media Data Use in Health Research, Infodemiology and Digital Disease Detection.” Journal of Medical Internet Research 18 (2).

King, Gary, Jennifer Pan, and Margaret E. Roberts. 2013. “How Censorship in China Allows Government Criticism but Silences Collective Expression.” American Political Science Review 107 (2): 1–18.

Kinney, Satkartar K., Jerome P. Reiter, Arnold P. Reznek, Javier Miranda, Ron S. Jarmin, and John M. Abowd. 2011. “Towards Unrestricted Public Use Business Microdata: The Synthetic Longitudinal Business Database.” International Statistical Review 79 (3). Wiley Online Library: 362–84.

Kirk, Andy. 2012. Data Visualization: A Successful Design Process. Packt Publishing.

Kiss, Tibor, and Jan Strunk. 2006. “Unsupervised Multilingual Sentence Boundary Detection.” Computational Linguistics 32 (4). Cambridge, MA: MIT Press: 485–525.

Kleinberg, Jon, Jens Ludwig, Sendhil Mullainathan, and Ziad Obermeyer. 2015. “Prediction Policy Problems.” American Economic Review 105 (5): 491–95.

Kleinberg, Jon, Sendhil Mullainathan, and Manish Raghavan. 2017. “Inherent Trade-Offs in the Fair Determination of Risk Scores.” In 8th Innovations in Theoretical Computer Science Conference (ITCS 2017), edited by Christos H. Papadimitriou. Vol. 67. Dagstuhl, Germany: Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.

Kohler, Ulrich, and Frauke Kreuter. 2012. Data Analysis Using Stata. 3rd Edition. Stata Press.

Kolb, Lars, Andreas Thor, and Erhard Rahm. 2012. “Dedoop: Efficient Deduplication with Hadoop.” Proceedings of the VLDB Endowment 5 (12): 1878–81.

Kong, Lingpeng, Nathan Schneider, Swabha Swayamdipta, Archna Bhatia, Chris Dyer, and Noah A. Smith. 2014. “A Dependency Parser for Tweets.” In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1001–12. Association for Computational Linguistics.

Kopf, Dan. 2018. “This Year’s Nobel Prize in Economics Was Awarded to a Python Convert.” Quartz. https://qz.com/1417145/economics-nobel-laureate-paul-romer-is-a-python-programming-convert/.

Köpcke, Hanna, Andreas Thor, and Erhard Rahm. 2010. “Evaluation of Entity Resolution Approaches on Real-World Match Problems.” Proceedings of the VLDB Endowment 3 (1–2). VLDB Endowment: 484–93.

Kraak, Menno-Jan. 2014. Mapping Time: Illustrated by Minard’s Map of Napoleon’s Russian Campaign of 1812. ESRI Press.

Kreuter, Frauke, and Roger D. Peng. 2014. “Extracting Information from Big Data: Issues of Measurement, Inference, and Linkage.” In Privacy, Big Data, and the Public Good: Frameworks for Engagement, edited by Julia Lane, Victoria Stodden, Stefan Bender, and Helen Nissenbaum, 257–75. Cambridge University Press.

Kreuter, Frauke, Rayid Ghani, and Julia Lane. 2019. “Change Through Data: A Data Analytics Training Program for Government Employees.” Harvard Data Science Review 1 (2).

Kuhn, H. W. 2005. “The Hungarian Method for the Assignment Problem.” Naval Research Logistics 52 (1). Wiley Online Library: 7–21.

Kuhn, Max, and Kjell Johnson. 2013. Applied Predictive Modeling. Springer Science & Business Media.

Kullback, Solomon, and Richard A. Leibler. 1951. “On Information and Sufficiency.” Annals of Mathematical Statistics 22 (1). JSTOR: 79–86.

Kumar, Mohit, Rayid Ghani, and Zhu-Song Mei. 2010. “Data Mining to Predict and Prevent Errors in Health Insurance Claims Processing.” In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 65–74. KDD ’10. ACM.

Kusner, Matt J, Joshua Loftus, Chris Russell, and Ricardo Silva. 2017. “Counterfactual Fairness.” In Advances in Neural Information Processing Systems 30, 4066–76. Curran Associates, Inc.

Lafferty, John D., Andrew McCallum, and Fernando C. N. Pereira. 2001. “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data.” In Proceedings of the Eighteenth International Conference on Machine Learning, 282–89. Morgan Kaufmann.

Lahiri, Partha, and Michael D Larsen. 2005. “Regression Analysis with Linked Data.” Journal of the American Statistical Association 100 (469). Taylor & Francis: 222–30.

Lakkaraju, Himabindu, Everaldo Aguiar, Carl Shan, David Miller, Nasir Bhanpuri, Rayid Ghani, and Kecia L. Addison. 2015. “A Machine Learning Framework to Identify Students at Risk of Adverse Academic Outcomes.” In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1909–18. KDD ’15. ACM.

Lam, Heidi, Enrico Bertini, Petra Isenberg, Catherine Plaisant, and Sheelagh Carpendale. 2012. “Empirical Studies in Information Visualization: Seven Scenarios.” IEEE Transactions on Visualization and Computer Graphics 18 (9). IEEE: 1520–36.

Lambrecht, Anja, and Catherine Tucker. 2019. “Algorithmic Bias? An Empirical Study of Apparent Gender-Based Discrimination in the Display of STEM Career Ads.” Management Science 65 (7): 2966–81.

Landauer, Thomas, and Susan Dumais. 1997. “Solutions to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition, Induction and Representation of Knowledge.” Psychological Review 104 (2): 211–40.

Lane, Julia. 2007. “Optimizing Access to Micro Data.” Journal of Official Statistics 23: 299–317.

———. 2020. “Tiered Access: Risk and Utility.” Washington DC: Committee of Professional Associations on Federal Statistics (COPAFS).

Lane, Julia, and Victoria Stodden. 2013. “What? Me Worry? What to Do About Privacy, Big Data, and Statistical Research.” AMSTAT News 438. American Statistical Association: 14.

Lane, Julia, Jason Owen-Smith, Rebecca F. Rosen, and Bruce A. Weinberg. 2015. “New Linked Data on Research Investments: Scientific Workforce, Productivity, and Public Value.” Research Policy 44. Elsevier: 1659–71.

Lane, Julia, Jason Owen-Smith, Joseph Staudt, and Bruce A. Weinberg. 2018. “New Measurement of Innovation.” In Center for Economic Studies and Research Data Centers Research Report: 2017, edited by US Census Bureau. Washington DC: U.S. Census Bureau.

Lane, Julia, Victoria Stodden, Stefan Bender, and Helen Nissenbaum, eds. 2014. Privacy, Big Data, and the Public Good: Frameworks for Engagement. Cambridge: Cambridge University Press.

Lazer, David, Ryan Kennedy, Gary King, and Alessandro Vespignani. 2014. “The Parable of Google Flu: Traps in Big Data Analysis.” Science 343.

Levitt, Steven D., and Thomas J. Miles. 2006. “Economic Contributions to the Understanding of Crime.” Annual Review of Law and Social Science 2. Annual Reviews: 147–64.

Lewis, David D. 1998. “Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval.” In Proceedings of European Conference of Machine Learning, 4–15.

Li, Guan-Cheng, Ronald Lai, Alexander D’Amour, David M. Doolin, Ye Sun, Vetle I. Torvik, Amy Z. Yu, and Lee Fleming. 2014. “Disambiguation and Co-Authorship Networks of the US Patent Inventor Database (1975–2010).” Research Policy 43 (6). Elsevier: 941–55.

Lifka, D., I. Foster, S. Mehringer, M. Parashar, P. Redfern, C. Stewart, and S. Tuecke. 2013. “XSEDE Cloud Survey Report.” Technical report, National Science Foundation, USA, http://hdl.handle.net/2142/45766.

Lin, Jimmy, and Chris Dyer. 2010. Data-Intensive Text Processing with MapReduce. Morgan & Claypool Publishers.

Lins, Lauro, James T Klosowski, and Carlos Scheidegger. 2013. “Nanocubes for Real-Time Exploration of Spatiotemporal Datasets.” IEEE Transactions on Visualization and Computer Graphics 19 (12). IEEE: 2456–65.

Little, Roderick J. A., and Donald B. Rubin. 2014. Statistical Analysis with Missing Data. John Wiley & Sons.

Liu, Zhicheng, and Jeffrey Heer. 2014. “The Effects of Interactive Latency on Exploratory Visual Analysis.” IEEE Transactions on Visualization and Computer Graphics 20 (12). IEEE: 2122–31.

Lockwood, Glenn K. 2015. “Conceptual Overview of Map-Reduce and Hadoop.” http://www.glennklockwood.com/data-intensive/hadoop/overview.html.

Lohr, Sharon. 2009. Sampling: Design and Analysis. Cengage Learning.

Lundberg, Scott M., and Su-In Lee. 2017. “A Unified Approach to Interpreting Model Predictions.” In Proceedings of the 31st International Conference on Neural Information Processing Systems, 4768–77. NIPS ’17. Red Hook, NY, USA: Curran Associates Inc.

Lynch, James. 2018. “Not Even Our Own Facts: Criminology in the Era of Big Data.” Criminology 56 (3). Wiley Online Library: 437–54.

MacEachren, Alan M., Stephen Crawford, Mamata Akella, and Gene Lengerich. 2008. “Design and Implementation of a Model, Web-Based, GIS-Enabled Cancer Atlas.” Cartographic Journal 45 (4). Maney Publishing: 246–60.

MacKinlay, Jock. 1986. “Automating the Design of Graphical Presentations of Relational Information.” ACM Transactions on Graphics 5 (2). ACM: 110–41.

Malik, Waqas Ahmed, Antony Unwin, and Alexander Gribov. 2010. “An Interactive Graphical System for Visualizing Data Quality–Tableplot Graphics.” In Classification as a Tool for Research, 331–39. Springer.

Malmkjær, K. 2002. The Linguistics Encyclopedia. Routledge.

Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. “The Stanford CoreNLP Natural Language Processing Toolkit.” In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 55–60.

Marburger, John H. 2005. “Wanted: Better Benchmarks.” Science 308 (5725). American Association for the Advancement of Science: 1087.

Marcus, Mitchell P., Beatrice Santorini, and Mary A. Marcinkiewicz. 1993. “Building a Large Annotated Corpus of English: The Penn Treebank.” Computational Linguistics 19 (2): 313–30.

Mas, Alexandre, and Enrico Moretti. 2009. “Peers at Work.” American Economic Review 99 (1): 112–45.

Maskeri, Girish, Santonu Sarkar, and Kenneth Heafield. 2008. “Mining Business Topics in Source Code Using Latent Dirichlet Allocation.” In Proceedings of the 1st India Software Engineering Conference, 113–20. ACM.

McCallister, Erika, Timothy Grance, and Karen A. Scarfone. 2010. SP 800-122. Guide to Protecting the Confidentiality of Personally Identifiable Information (PII). National Institute of Standards and Technology.

McCallum, Andrew Kachites. 2002. “MALLET: A Machine Learning for Language Toolkit.” http://mallet.cs.umass.edu.

Meij, Edgar, Marc Bron, Laura Hollink, Bouke Huurnink, and Maarten de Rijke. 2009. “Learning Semantic Query Suggestions.” In Proceedings of the 8th International Semantic Web Conference, 424–40. ISWC ’09. Springer.

Meng, Xiao-Li. 2018. “Statistical Paradises and Paradoxes in Big Data (I): Law of Large Populations, Big Data Paradox, and the 2016 US Presidential Election.” The Annals of Applied Statistics 12 (2): 685–726.

Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. “Distributed Representations of Words and Phrases and Their Compositionality.” In Advances in Neural Information Processing Systems, 3111–9. Morgan Kaufmann.

Mitchell, Tom M. 1997. Machine Learning. McGraw-Hill.

Moffatt, C. L. 1999. “Visual Representation of SQL Joins.” http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins.

Molina, Giovanni, Fahad AlGhamdi, Mahmoud Ghoneim, Abdelati Hawwari, Nicolas Rey-Villamizar, Mona Diab, and Thamar Solorio. 2016. “Overview for the Second Shared Task on Language Identification in Code-Switched Data.” In Proceedings of the Second Workshop on Computational Approaches to Code Switching, 40–49. Austin, Texas: Association for Computational Linguistics.

Molinaro, Anthony. 2005. SQL Cookbook: Query Solutions and Techniques for Database Developers. O’Reilly Media.

Moreno, Jacob L. 1934. Who Shall Survive?: A New Approach to the Problem of Human Interrelations. Washington, D.C.: Nervous and Mental Disease Publishing Co.

Mortensen, Peter Stendahl, and Carter Walter Bloch. 2005. Oslo Manual: Guidelines for Collecting and Interpreting Innovation Data. Organisation for Economic Co-operation and Development.

Munzner, Tamara. 2014. Visualization Analysis and Design. CRC Press.

Narayanan, Arvind, and Vitaly Shmatikov. 2008. “Robust de-Anonymization of Large Sparse Datasets.” In IEEE Symposium on Security and Privacy, 111–25. IEEE.

Natarajan, Kalaivany, Jiuyong Li, and Andy Koronios. 2010. Data Mining Techniques for Data Cleaning. Springer.

National Academies. 2014. “Proposed Revisions to the Common Rule for the Protection of Human Subjects in the Behavioral and Social Sciences.” Washington DC: National Academies of Sciences.

National Academies of Sciences, Engineering, and Medicine and others. 2018. Federal Statistics, Multiple Data Sources, and Privacy Protection: Next Steps. National Academies Press.

National Center for Health Statistics. 2019. “The Linkage of National Center for Health Statistics Survey Data to the National Death Index – 2015 Linked Mortality File (LMF): Methodology Overview and Analytic Considerations.” https://www.cdc.gov/nchs/data-linkage/mortality-methods.htm.

Navigli, Roberto, Stefano Faralli, Aitor Soroa, Oier de Lacalle, and Eneko Agirre. 2011. “Two Birds with One Stone: Learning Semantic Models for Text Categorization and Word Sense Disambiguation.” In Proceedings of the 20th ACM International Conference on Information and Knowledge Management. ACM.

Neamatullah, Ishna, Margaret M. Douglass, Li-wei H. Lehman, Andrew Reisner, Mauricio Villarroel, William J. Long, Peter Szolovits, George B. Moody, Roger G. Mark, and Gari D. Clifford. 2008. “Automated de-Identification of Free-Text Medical Records.” BMC Medical Informatics and Decision Making 8.

Nelson, Robert K. 2010. “Mining the Dispatch.” http://dsl.richmond.edu/dispatch/.

Newman, Mark. 2005. “A Measure of Betweenness Centrality Based on Random Walks.” Social Networks 27 (1). Elsevier: 39–54.

———. 2010. Networks: An Introduction. Oxford University Press.

Nguyen, Viet-An, Jordan Boyd-Graber, and Philip Resnik. 2012. “SITS: A Hierarchical Nonparametric Model Using Speaker Identity for Topic Segmentation in Multiparty Conversations.” In Proceedings of the Association for Computational Linguistics. Jeju, South Korea.

———. 2013. “Lexical and Hierarchical Topic Regression.” In Advances in Neural Information Processing Systems. Lake Tahoe, Nevada.

Nguyen, Viet-An, Jordan Boyd-Graber, Philip Resnik, and Jonathan Chang. 2014. “Learning a Concept Hierarchy from Multi-Labeled Documents.” In Proceedings of the Annual Conference on Neural Information Processing Systems. Morgan Kaufmann.

Nguyen, Viet-An, Jordan Boyd-Graber, Philip Resnik, and Kristina Miler. 2015. “Tea Party in the House: A Hierarchical Ideal Point Topic Model and Its Application to Republican Legislators in the 112th Congress.” In Association for Computational Linguistics. Beijing, China.

Niculae, Vlad, Srijan Kumar, Jordan Boyd-Graber, and Cristian Danescu-Niculescu-Mizil. 2015. “Linguistic Harbingers of Betrayal: A Case Study on an Online Strategy Game.” In Association for Computational Linguistics. Beijing, China.

Nielsen, Michael. 2012. Reinventing Discovery: The New Era of Networked Science. Princeton University Press.

Nissenbaum, Helen. 2009. Privacy in Context: Technology, Policy, and the Integrity of Social Life. Stanford University Press.

———. 2011. “A Contextual Approach to Privacy Online.” Daedalus 140 (4). MIT Press: 32–48.

———. 2019. “Contextual Integrity up and down the Data Food Chain.” Theoretical Inquiries in Law 20 (1). De Gruyter: 221–56.

Obe, Regina O., and Leo S. Hsu. 2015. PostGIS in Action, 2nd Edition. Manning Publications.

Obstfeld, David. 2005. “Social Networks, the Tertius Iungens Orientation, and Involvement in Innovation.” Administrative Science Quarterly 50 (1). SAGE Publications: 100–130.

Office of Management and Budget. 2019. “M-19-23: Phase 1 Implementation of the Foundations for Evidence-Based Policymaking Act of 2018: Learning Agendas, Personnel, and Planning Guidance.” Washington DC: https://www.whitehouse.gov/wp-content/uploads/2019/07/M-19-23.pdf.

Ohm, Paul. 2010. “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization.” UCLA Law Review 57: 1701.

———. 2014. “The Legal and Regulatory Framework: What Do the Rules Say About Data Analysis?” In Privacy, Big Data, and the Public Good: Frameworks for Engagement, edited by Julia Lane, Victoria Stodden, Helen Nissenbaum, and Stefan Bender. Cambridge University Press.

Olson, Judy M., and Cynthia A. Brewer. 1997. “An Evaluation of Color Selections to Accommodate Map Users with Color-Vision Impairments.” Annals of the Association of American Geographers 87 (1). Taylor & Francis: 103–34.

Organisation for Economic Co-operation and Development. 2004. “A Summary of the Frascati Manual.” Main Definitions and Conventions for the Measurement of Research and Experimental Development 84.

Ott, Myle, Yejin Choi, Claire Cardie, and Jeffrey T. Hancock. 2011. “Finding Deceptive Opinion Spam by Any Stretch of the Imagination.” In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies—Volume 1, 309–19. HLT ’11. Stroudsburg, PA: Association for Computational Linguistics.

Owen-Smith, Jason, and Walter W. Powell. 2003. “The Expanding Role of University Patenting in the Life Sciences: Assessing the Importance of Experience and Connectivity.” Research Policy 32 (9). Elsevier: 1695–1711.

———. 2004. “Knowledge Networks as Channels and Conduits: The Effects of Spillovers in the Boston Biotechnology Community.” Organization Science 15 (1). INFORMS: 5–21.

Pan, Ian, Laura B Nolan, Rashida R Brown, Romana Khan, Paul van der Boor, Daniel G Harris, and Rayid Ghani. 2017. “Machine Learning for Social Services: A Study of Prenatal Case Management in Illinois.” American Journal of Public Health 107 (6). American Public Health Association: 938–44.

Pang, Bo, and Lillian Lee. 2008. Opinion Mining and Sentiment Analysis. Now Publishers.

Park, Hae-Sang, and Chi-Hyuck Jun. 2009. “A Simple and Fast Algorithm for K-Medoids Clustering.” Expert Systems with Applications 36 (2). Elsevier: 3336–41.

Paul, Michael, and Roxana Girju. 2010. “A Two-Dimensional Topic-Aspect Model for Discovering Multi-Faceted Topics.” In Association for the Advancement of Artificial Intelligence.

Pennebaker, James W., and Martha E. Francis. 1999. Linguistic Inquiry and Word Count. Lawrence Erlbaum.

Pentland, Alex, Daniel Greenwood, Brian Sweatt, Arek Stopczynski, and Yves-Alexandre de Montjoye. 2014. “Institutional Controls: The New Deal on Data.” In Privacy, Big Data, and the Public Good: Frameworks for Engagement, edited by Julia Lane, Victoria Stodden, Stefan Bender, and Helen Nissenbaum, 98–112. Cambridge University Press.

Peters, Matthew, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. “Deep Contextualized Word Representations.” In Conference of the North American Chapter of the Association for Computational Linguistics.

Peterson, Andrew, and Arthur Spirling. 2018. “Classification Accuracy as a Substantive Quantity of Interest: Measuring Polarization in Westminster Systems.” Political Analysis 26 (1). Cambridge University Press: 120–28.

Petrakos, George, Claudio Conversano, Gregory Farmakis, Francesco Mola, Roberta Siciliano, and Photis Stavropoulos. 2004. “New Ways of Specifying Data Edits.” Journal of the Royal Statistical Society, Series A 167 (2). Wiley Online Library: 249–74.

Plaisant, Catherine, Jesse Grosjean, and Benjamin B. Bederson. 2002. “SpaceTree: Supporting Exploration in Large Node Link Tree, Design Evolution and Empirical Evaluation.” In IEEE Symposium on Information Visualization, 57–64. IEEE.

Plank, Barbara, Anders Søgaard, and Yoav Goldberg. 2016. “Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss.” In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 412–18. Berlin, Germany: Association for Computational Linguistics.

Plumb, Gregory, Denali Molitor, and Ameet Talwalkar. 2018. “Model Agnostic Supervised Local Explanations.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems, 2520–29. NIPS ’18. Red Hook, NY, USA: Curran Associates Inc.

Potash, Eric, Joe Brew, Alexander Loewi, Subhabrata Majumdar, Andrew Reece, Joe Walsh, Eric Rozier, Emile Jorgenson, Raed Mansour, and Rayid Ghani. 2015. “Predictive Modeling for Public Health: Preventing Childhood Lead Poisoning.” In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2039–47. KDD ’15. ACM.

Powell, Walter W. 2003. “Neither Market nor Hierarchy.” Sociology of Organizations: Classic, Contemporary, and Critical Readings 315: 104–17.

Powell, Walter W., Douglas R. White, Kenneth W. Koput, and Jason Owen-Smith. 2005. “Network Dynamics and Field Evolution: The Growth of Interorganizational Collaboration in the Life Sciences.” American Journal of Sociology 110 (4). JSTOR: 1132–1205.

President’s Council of Advisors on Science and Technology. 2014. “Big Data and Privacy: A Technological Perspective.” Washington, DC: Executive Office of the President.

Provost, Foster, and Tom Fawcett. 2013. Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking. O’Reilly Media.

Puts, Marco, Piet Daas, and Ton de Waal. 2015. “Finding Errors in Big Data.” Significance 12 (3). Wiley Online Library: 26–29.

R Core Team. 2013. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/.

Rabiner, Lawrence R. 1989. “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.” Proceedings of the IEEE 77 (2): 257–86.

Ram, Karthik. 2013. “Git Can Facilitate Greater Reproducibility and Increased Transparency in Science.” Source Code for Biology and Medicine 8 (1).

Ramage, Daniel, David Hall, Ramesh Nallapati, and Christopher Manning. 2009. “Labeled LDA: A Supervised Topic Model for Credit Attribution in Multi-Labeled Corpora.” In Proceedings of Empirical Methods in Natural Language Processing.

Ramakrishnan, Raghu, and Johannes Gehrke. 2002. Database Management Systems, 3rd Edition. McGraw-Hill.

Reid, Giles, Felipa Zabala, and Anders Holmberg. 2017. “Extending TSE to Administrative Data: A Quality Framework and Case Studies from Stats NZ.” Journal of Official Statistics 33 (2): 477–511.

Reiter, Jerome P. 2012. “Statistical Approaches to Protecting Confidentiality for Microdata and Their Effects on the Quality of Statistical Inferences.” Public Opinion Quarterly 76 (1). AAPOR: 163–81.

Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. 2016. “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–44. San Francisco, CA, USA.

Richardson, Leonard. n.d. “Beautiful Soup.” http://www.crummy.com/software/BeautifulSoup/. Accessed February 1, 2020.

Ritter, Alan, Mausam, Oren Etzioni, and Sam Clark. 2012. “Open Domain Event Extraction from Twitter.” In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1104–12. KDD ’12. New York, NY, USA: Association for Computing Machinery.

Rodolfa, K., E. Salomon, L. Haynes, I. Mendieta, J. Larson, and R. Ghani. 2020. “Predictive Fairness to Reduce Misdemeanor Recidivism Through Social Service Interventions.” In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency (ACM FAT*) 2020.

Rubin, Donald B. 1976. “Inference and Missing Data.” Biometrika 63: 581–92.

Ruggles, Steven, Catherine Fitch, Diana Magnuson, and Jonathan Schroeder. 2019. “Differential Privacy and Census Data: Implications for Social and Economic Research.” AEA Papers and Proceedings 109: 403–8.

Russell Brandom. 2019. “Facebook Has Been Charged with Housing Discrimination by the US Government.” https://www.theverge.com/2019/3/28/18285178/facebook-hud-lawsuit-fair-housing-discrimination. Accessed February 12, 2020.

Sah, Shagan, Ameya Shringi, Raymond Ptucha, Aaron M. Burry, and Robert P. Loce. 2017. “Video Redaction: A Survey and Comparison of Enabling Technologies.” Journal of Electronic Imaging 26 (5): 1–14.

Saket, Bahador, Paolo Simonetto, Stephen Kobourov, and Katy Börner. 2014. “Node, Node-Link, and Node-Link-Group Diagrams: An Evaluation.” IEEE Transactions on Visualization and Computer Graphics 20 (12). IEEE: 2231–40.

Salton, Gerard. 1968. Automatic Information Organization and Retrieval. McGraw-Hill.

Samuel, Arthur L. 1959. “Some Studies in Machine Learning Using the Game of Checkers.” IBM Journal of Research and Development 3 (3). IBM: 210–29.

Sandhaus, Evan. 2008. “The New York Times Annotated Corpus.” Philadelphia: Linguistic Data Consortium, http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T19.

Saraiya, Purvi, Chris North, and Karen Duca. 2005. “An Insight-Based Methodology for Evaluating Bioinformatics Visualizations.” IEEE Transactions on Visualization and Computer Graphics 11 (4). IEEE: 443–56.

Schafer, Joseph L, and John W Graham. 2002. “Missing Data: Our View of the State of the Art.” Psychological Methods 7 (2). American Psychological Association: 147.

Schafer, Joseph L. 1997. Analysis of Incomplete Multivariate Data. CRC Press.

Schermann, Michael, Holmer Hemsen, Christoph Buchmüller, Till Bitter, Helmut Krcmar, Volker Markl, and Thomas Hoeren. 2014. “Big Data.” Business & Information Systems Engineering 6 (5). Springer: 261–66.

Scheuren, Fritz, and William E. Winkler. 1993. “Regression Analysis of Data Files That Are Computer Matched.” Survey Methodology 19 (1): 39–58.

Schierholz, Malte, Miriam Gensicke, Nikolai Tschersich, and Frauke Kreuter. 2018. “Occupation Coding During the Interview.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 181 (2): 379–407.

Schnell, Rainer. 2014. “An Efficient Privacy-Preserving Record Linkage Technique for Administrative Data and Censuses.” Statistical Journal of the IAOS 30: 263–70.

———. 2016. “German Record Linkage Center.”

Schnell, Rainer, Tobias Bachteler, and Jörg Reiher. 2009. “Privacy-Preserving Record Linkage Using Bloom Filters.” BMC Medical Informatics and Decision Making 9 (1). BioMed Central Ltd: 41.

Schoenman, Julie A. 2012. “The Concentration of Health Care Spending.” NIHCM Foundation Data Brief. National Institute for Health Care Management.

Schölkopf, Bernhard, and Alexander J. Smola. 2001. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press.

Schwartz, Amy Ellen, Michele Leardo, Siddhartha Aneja, and Brian Elbel. 2016. “Effect of a School-Based Water Intervention on Child Body Mass Index and Obesity.” JAMA Pediatrics 170 (3). American Medical Association: 220–26.

Scott, Steven L., Alexander W. Blocker, Fernando V. Bonassi, H. Chipman, E. George, and R. McCulloch. 2013. “Bayes and Big Data: The Consensus Monte Carlo Algorithm.” In EFaBBayes 250 Conference. Vol. 16.

Sethian, James A., Jean-Philippe Brunet, Adam Greenberg, and Jill P. Mesirov. 1991. “Computing Turbulent Flow in Complex Geometries on a Massively Parallel Processor.” In Proceedings of the 1991 ACM/IEEE Conference on Supercomputing, 230–41. ACM.

Severance, Charles. 2013. “Python for Informatics: Exploring Information.” http://www.pythonlearn.com/book.php; CreateSpace.

Shawe-Taylor, John, and Nello Cristianini. 2004. Kernel Methods for Pattern Analysis. Cambridge University Press.

Shelton, Taylor, Ate Poorthuis, Mark Graham, and Matthew Zook. 2014. “Mapping the Data Shadows of Hurricane Sandy: Uncovering the Sociospatial Dimensions of ‘Big Data’.” Geoforum 52. Elsevier: 167–79.

Shlomo, Natalie. 2014. “Probabilistic Record Linkage for Disclosure Risk Assessment.” In International Conference on Privacy in Statistical Databases, 269–82. Springer.

———. 2018. “Statistical Disclosure Limitation: New Directions and Challenges.” Journal of Privacy and Confidentiality 8 (1).

Shneiderman, Ben. 1992. “Tree Visualization with Tree-Maps: 2-D Space-Filling Approach.” ACM Transactions on Graphics 11 (1). ACM: 92–99.

———. 2008. “Extreme Visualization: Squeezing a Billion Records into a Million Pixels.” In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 3–12. ACM.

Shneiderman, Ben, and Catherine Plaisant. 2015. “Sharpening Analytic Focus to Cope with Big Data Volume and Variety.” Computer Graphics and Applications, IEEE 35 (3). IEEE: 10–14.

Sies, Helmut. 1988. “A New Parameter for Sex Education.” Nature 332. Nature Publishing Group: 495.

Silberschatz, Abraham, Henry F. Korth, and S. Sudarshan. 2010. Database System Concepts, 6th Edition. McGraw-Hill.

Smalheiser, Neil R, and Vetle I Torvik. 2009. “Author Name Disambiguation.” Annual Review of Information Science and Technology 43 (1). Wiley Online Library: 1–43.

Smola, Alex J., and Bernhard Schölkopf. 2004. “A Tutorial on Support Vector Regression.” Statistics and Computing 14 (3). Kluwer Academic Publishers: 199–222.

Snow, John. 1855. On the Mode of Communication of Cholera. John Churchill.

Spielman, Seth E., and Alex Singleton. 2015. “Studying Neighborhoods Using Uncertain Data from the American Community Survey: A Contextual Approach.” Annals of the Association of American Geographers 105 (5): 1003–25.

Squire, Peverill. 1988. “Why the 1936 Literary Digest Poll Failed.” Public Opinion Quarterly 52 (1). AAPOR: 125–33.

Stanford Visualization Group. n.d. “Dorling Cartograms in ProtoVis.” http://mbostock.github.io/protovis/ex/cartogram.html. Accessed February 1, 2020.

Stanton, Mark W, and MK Rutherford. 2006. The High Concentration of US Health Care Expenditures. Agency for Healthcare Research and Quality.

Stasko, John, Carsten Görg, and Zhicheng Liu. 2008. “Jigsaw: Supporting Investigative Analysis Through Interactive Visualization.” Information Visualization 7 (2). SAGE Publications: 118–32.

Steorts, Rebecca C, Rob Hall, and Stephen E Fienberg. 2014. “SMERED: A Bayesian Approach to Graphical Record Linkage and de-Duplication.” https://arxiv.org/abs/1312.4645.

Stephens-Davidowitz, S., and H. Varian. 2015. “A Hands-on Guide to Google Data.” http://people.ischool.berkeley.edu/~hal/Papers/2015/primer.pdf.

Stock, James H., and Mark W. Watson. 2002. “Forecasting Using Principal Components from a Large Number of Predictors.” Journal of the American Statistical Association 97 (460). Taylor & Francis: 1167–79.

Stopczynski, Arkadiusz, Vedran Sekara, Piotr Sapiezynski, Andrea Cuttone, Mette My Madsen, Jakob Eg Larsen, and Sune Lehmann. 2014. “Measuring Large-Scale Social Networks with High Resolution.” PloS One 9 (4). Public Library of Science.

Strandburg, Katherine J. 2014. “Monitoring, Datafication and Consent: Legal Approaches to Privacy in the Big Data Context.” In Privacy, Big Data, and the Public Good: Frameworks for Engagement, edited by Julia Lane, Victoria Stodden, Stefan Bender, and Helen Nissenbaum. Cambridge University Press.

Strasser, Carly. 2014. “Git/GitHub: A Primer for Researchers.” http://datapub.cdlib.org/2014/05/05/github-a-primer-for-researchers/.

Strauch, Christof. 2009. “NoSQL Databases.” http://www.christof-strauch.de/nosqldbs.pdf.

Stuart, Elizabeth A. 2010. “Matching Methods for Causal Inference: A Review and a Look Forward.” Statistical Science 25 (1). NIH Public Access: 1.

Sutton, Richard S., and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. Cambridge, MA: The MIT Press.

Sweeney, Latanya. 2001. “Computational Disclosure Control: A Primer on Data Privacy Protection.” MIT.

Szalay, Alexander S., Jim Gray, Ani R. Thakar, Peter Z. Kunszt, Tanu Malik, Jordan Raddick, Christopher Stoughton, and Jan vandenBerg. 2002. “The SDSS Skyserver: Public Access to the Sloan Digital Sky Server Data.” In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, 570–81. ACM.

Talley, Edmund M., David Newman, David Mimno, Bruce W. Herr, Hanna M. Wallach, Gully A. P. C. Burns, A. G. Miriam Leenders, and Andrew McCallum. 2011. “Database of NIH Grants Using Machine-Learned Categories and Graphical Clustering.” Nature Methods 8 (6): 443–44.

Tanner, Adam. 2013. “Harvard Professor Re-Identifies Anonymous Volunteers in DNA Study.” Forbes, http://www.forbes.com/sites/adamtanner/2013/04/25/harvard-professor-re-identifies-anonymous-volunteers-in-dna-study/#6cc7f6b43e39.

Tennekes, M., E. de Jonge, and Piet Daas. 2012. “Innovative Visual Tools for Data Editing.” Presented at the United Nations Economic Commission for Europe Work Session on Statistical Data. Available online at http://www.pietdaas.nl/beta/pubs/pubs/30_Netherlands.pdf.

Tennekes, Martijn, and Edwin de Jonge. 2011. “Top-down Data Analysis with Treemaps.” In Proceedings of the International Conference on Imaging Theory and Applications and International Conference on Information Visualization Theory and Applications, 236–41. SciTePress.

Tennekes, Martijn, Edwin de Jonge, and Piet J. H. Daas. 2013. “Visualizing and Inspecting Large Datasets with Tableplots.” Journal of Data Science 11 (1): 43–58.

The Northpointe Suite. 2016. “Response to ProPublica: Demonstrating accuracy equity and predictive parity.” https://www.equivant.com/response-to-propublica-demonstrating-accuracy-equity-and-predictive-parity/. Accessed February 12, 2020.

Thompson, William W., Lorraine Comanor, and David K. Shay. 2006. “Epidemiology of Seasonal Influenza: Use of Surveillance Data and Statistical Models to Estimate the Burden of Disease.” Journal of Infectious Diseases 194 (Supplement 2). Oxford University Press: S82–S91.

Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso.” Journal of the Royal Statistical Society, Series B 58. JSTOR: 267–88.

Trewin, D., A. Andersen, T. Beridze, L. Biggeri, I. Fellegi, and T. Toczynski. 2007. “Managing Statistical Confidentiality and Microdata Access: Principles and Guidelines of Good Practice.” Geneva: Conference of European Statisticians, United Nations Economic Commission for Europe.

Tuarob, Suppawong, Line C. Pouchard, and C. Lee Giles. 2013. “Automatic Tag Recommendation for Metadata Annotation Using Probabilistic Topic Modeling.” In Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries, 239–48. JCDL ’13. ACM.

Tufte, Edward. 2001. The Visual Display of Quantitative Information, 2nd Edition. Cheshire, CT: Graphics Press.

———. 2006. Beautiful Evidence, 2nd Edition. Cheshire, CT: Graphics Press.

Ugander, Johan, Brian Karrer, Lars Backstrom, and Cameron Marlow. 2011. “The Anatomy of the Facebook Social Graph.” https://arxiv.org/abs/1111.4503.

University of Oxford. 2006. “British National Corpus.” http://www.natcorp.ox.ac.uk/.

UnivTask Force on Differential Privacy for Census Data. 2019. “Implications of Differential Privacy for Census Bureau Data and Scientific Research.” Minneapolis, MN, USA: Institute for Social Research and Data Innovation, University of Minnesota.

Ustun, Berk, and Cynthia Rudin. 2016. “Supersparse Linear Integer Models for Optimized Medical Scoring Systems.” Machine Learning 102: 349–91.

———. 2019. “Learning Optimized Risk Scores.” Journal of Machine Learning Research 20 (150): 1–75.

Valentino-DeVries, Josephine, Natasha Singer, Michael Keller, and Aaron Krolick. 2018. “Your Apps Know Where You Were Last Night, and They’re Not Keeping It Secret.” New York, New York, USA.

Valliant, Richard, Jill A. Dever, and Frauke Kreuter. 2018. Practical Tools for Designing and Weighting Survey Samples. 2nd Edition. Springer.

Varian, Hal R. 2014. “Big Data: New Tricks for Econometrics.” Journal of Economic Perspectives 28 (2): 3–28.

Ventura, Samuel L., Rebecca Nugent, and Erica R. H. Fuchs. 2015. “Seeing the Non-Stars: (Some) Sources of Bias in Past Disambiguation Approaches and a New Public Tool Leveraging Labeled Records.” Research Policy. Elsevier.

Vigen, Tyler. 2015. Spurious Correlations. Hachette Books.

Voigt, Rob, Nicholas P. Camp, Vinodkumar Prabhakaran, William L. Hamilton, Rebecca C. Hetey, Camilla M. Griffiths, David Jurgens, Dan Jurafsky, and Jennifer L. Eberhardt. 2017. “Language from Police Body Camera Footage Shows Racial Disparities in Officer Respect.” Proceedings of the National Academy of Sciences 114 (25): 6521–6.

Wallach, Hanna, David Mimno, and Andrew McCallum. 2009. “Rethinking LDA: Why Priors Matter.” In Advances in Neural Information Processing Systems.

Wallgren, Anders, and Britt Wallgren. 2007. Register-Based Statistics: Administrative Data for Statistical Purposes. John Wiley & Sons.

Wang, Chong, David Blei, and Li Fei-Fei. 2009. “Simultaneous Image Classification and Annotation.” In Computer Vision and Pattern Recognition.

Wang, Yi, Hongjie Bai, Matt Stanton, Wen-Yen Chen, and Edward Y. Chang. 2009. “PLDA: Parallel Latent Dirichlet Allocation for Large-Scale Applications.” In International Conference on Algorithmic Aspects in Information and Management.

Ward, Kenneth Church. 2017. “Word2Vec.” Natural Language Engineering 23 (1). Cambridge University Press: 155–62.

Ward, Matthew O., Georges Grinstein, and Daniel Keim. 2010. Interactive Data Visualization: Foundations, Techniques, and Applications. CRC Press.

Weinberg, Bruce A., Jason Owen-Smith, Rebecca F. Rosen, Lou Schwarz, Barbara McFadden Allen, Roy E. Weiss, and Julia Lane. 2014. “Science Funding and Short-Term Economic Activity.” Science 344 (6179). American Association for the Advancement of Science: 41–43.

Wezerek, Gus, and David Van Riper. 2020. “Changes to the Census Could Make Small Towns Disappear.” New York, New York, USA.

Whang, Steven Euijong, David Menestrina, Georgia Koutrika, Martin Theobald, and Hector Garcia-Molina. 2009. “Entity Resolution with Iterative Blocking.” In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, 219–32. ACM.

White, Harrison C., Scott A. Boorman, and Ronald L. Breiger. 1976. “Social Structure from Multiple Networks. I. Block Models of Roles and Positions.” American Journal of Sociology 81. JSTOR: 730–80.

Wick, Michael, Sameer Singh, Harshal Pandya, and Andrew McCallum. 2013. “A Joint Model for Discovering and Linking Entities.” In Proceedings of the 2013 Workshop on Automated Knowledge Base Construction, 67–72. ACM.

Wikipedia. n.d. “Representational State Transfer.” https://en.wikipedia.org/wiki/Representational_state_transfer. Accessed February 1, 2020.

Wilbanks, John. 2014. “Portable Approaches to Informed Consent and Open Data.” In Privacy, Big Data, and the Public Good: Frameworks for Engagement, edited by Julia Lane, Victoria Stodden, Stefan Bender, and Helen Nissenbaum, 98–112. Cambridge University Press.

Winkler, William E. 2009. “Record Linkage.” In Handbook of Statistics 29a, Sample Surveys: Design, Methods and Applications, edited by Danny Pfeffermann and C. R. Rao, 351–80. Elsevier.

———. 2014. “Matching and Record Linkage.” Wiley Interdisciplinary Reviews: Computational Statistics 6 (5). John Wiley & Sons, Inc.: 313–25.

Wongsuphasawat, Krist, and Jimmy Lin. 2014. “Using Visualizations to Monitor Changes and Harvest Insights from a Global-Scale Logging Infrastructure at Twitter.” In Proceedings of the IEEE Conference on Visual Analytics Science and Technology, 113–22. IEEE.

Wu, Xindong, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J. McLachlan, et al. 2008. “Top 10 Algorithms in Data Mining.” Knowledge and Information Systems 14 (1). Springer: 1–37.

Wuchty, Stefan, Benjamin F Jones, and Brian Uzzi. 2007. “The Increasing Dominance of Teams in Production of Knowledge.” Science 316 (5827). American Association for the Advancement of Science: 1036–9.

Yarkoni, Tal, Dean Eckles, James Heathers, Maggie Levenstein, Paul Smaldino, and Julia I. Lane. 2019. “Enhancing and Accelerating Social Science via Automation: Challenges and Opportunities.” DARPA.

Yates, Dave, and Scott Paquette. 2010. “Emergency Knowledge Management and Social Media Technologies: A Case Study of the 2010 Haitian Earthquake.” In Proceedings of the 73rd ASIS&T Annual Meeting on Navigating Streams in an Information Ecosystem. Vol. 47. ASIS&T ’10. Silver Spring, MD: American Society for Information Science.

Yost, Beth, Yonca Haciahmetoglu, and Chris North. 2007. “Beyond Visual Acuity: The Perceptual Scalability of Information Visualizations for Large Displays.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 101–10. ACM.

Zachary, Wayne W. 1977. “An Information Flow Model for Conflict and Fission in Small Groups.” Journal of Anthropological Research 33 (4): 452–73.

Zafar, Muhammad Bilal, Isabel Valera, Manuel Gomez Rodriguez, and Krishna P. Gummadi. 2017. “Fairness Constraints: Mechanisms for Fair Classification.” In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, edited by Aarti Singh and Jerry Zhu, 54:962–70. Proceedings of Machine Learning Research. Fort Lauderdale, FL, USA: PMLR.

Zayatz, Laura. 2007. “Disclosure Avoidance Practices and Research at the US Census Bureau: An Update.” Journal of Official Statistics 23 (2). Statistics Sweden (SCB): 253–65.

Zemel, Rich, Yu Wu, Kevin Swersky, Toni Pitassi, and Cynthia Dwork. 2013. “Learning Fair Representations.” In Proceedings of the 30th International Conference on Machine Learning, edited by Sanjoy Dasgupta and David McAllester, 28:325–33. Proceedings of Machine Learning Research 3. Atlanta, Georgia, USA: PMLR.

Zeng, Qing T., Doug Redd, Thomas C. Rindflesch, and Jonathan R. Nebeker. 2012. “Synonym, Topic Model and Predicate-Based Query Expansion for Retrieving Clinical Documents.” In American Medical Informatics Association Annual Symposium, 1050–9.

Zhang, Li-Chun. 2012. “Topics of Statistical Theory for Register-Based Statistics and Data Integration.” Statistica Neerlandica 66 (1): 41–63.

Zhu, Jun, Ning Chen, Hugh Perkins, and Bo Zhang. 2013. “Gibbs Max-Margin Topic Models with Fast Sampling Algorithms.” In Proceedings of the International Conference on Machine Learning.

Zhu, Xiaojin. 2008. “Semi-Supervised Learning Literature Survey.” http://pages.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf.

Zolas, Nikolas, Nathan Goldschlag, Ron Jarmin, Paula Stephan, Jason Owen-Smith, Rebecca F Rosen, Barbara McFadden Allen, Bruce A Weinberg, and Julia Lane. 2015. “Wrapping It up in a Person: Examining Employment and Earnings Outcomes for Ph.D. Recipients.” Science 350 (6266). American Association for the Advancement of Science: 1367–71.

Zygmunt, Z. 2013. “Machine Learning Courses Online.” http://fastml.com/machine-learning-courses-online.