Richard Gonzalez
Center Director, Research Center for Group Dynamics, Institute for Social Research
Director, BioSocial Methods Collaborative, RCGD
Amos N Tversky Collegiate Professor, Psychology and Statistics, LSA
Professor of Marketing, Stephen M Ross School of Business
Professor of Integrative Systems and Design, College of Engineering
Address: Research Center for Group Dynamics, Institute for Social Research, University of Michigan, 426 Thompson Street, Ann Arbor, Michigan 48106
Phone: 734-647-6785
Machine learning and the selection of statistical interactions
Narisetty, N., Mukherjee, B., Chen, Y., Gonzalez, R., & Meeker, J. (2019). Selection of nonlinear interactions by a forward stepwise algorithm: Application to identifying environmental chemical mixtures affecting health outcomes. Statistics in Medicine, 38, 1582-1600. doi:10.1002/sim.8059
Abstract
In this paper, we propose a stepwise forward selection algorithm for detecting the effects of a set of correlated exposures and their interactions on a health outcome of interest when the underlying relationship could potentially be nonlinear. Though the proposed method is very general, our application in this paper focuses on the analysis of multiple pollutants and their interactions. Simultaneous exposure to multiple environmental pollutants could affect human health in a multitude of complex ways. For understanding the health effects of multiple environmental exposures, it is often important to identify and estimate complex interactions among exposures. However, this issue becomes analytically challenging in the presence of potential nonlinearity in the outcome-exposure response surface and a set of correlated exposures. Through simulation studies and analyses of test datasets that were simulated as a part of a data challenge in multipollutant modeling organized by the National Institute of Environmental Health Sciences (http://www.niehs.nih.gov/about/events/pastmtg/2015/statistical/), we illustrate the advantages of our proposed method in comparison with existing alternative approaches. A particular strength of our method is that it produces very few false positives across empirical studies. Our method is also used to analyze a dataset that was released from the Health Outcomes and Measurement of the Environment Study as a benchmark beta-tester dataset as a part of the same workshop.
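For readers unfamiliar with forward stepwise selection over interaction terms, the general idea can be sketched as follows. This is a simplified illustration and not the algorithm from the paper: it assumes plain linear main effects and pairwise product terms in place of the nonlinear terms the abstract refers to, and it assumes BIC as the criterion for adding a term and for stopping. The function names (`bic`, `candidate_terms`, `forward_stepwise`), the `max_terms` parameter, and the simulated data are all hypothetical.

```python
import itertools

import numpy as np


def bic(y, X):
    """BIC of an ordinary least-squares fit of y on the columns of X."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + X.shape[1] * np.log(n)


def candidate_terms(X):
    """Assumed candidate set: main effects (j,) and pairwise products (j, k)."""
    p = X.shape[1]
    terms = {(j,): X[:, j] for j in range(p)}
    for j, k in itertools.combinations(range(p), 2):
        terms[(j, k)] = X[:, j] * X[:, k]
    return terms


def forward_stepwise(X, y, max_terms=10):
    """Greedily add the term that lowers BIC the most; stop when none does."""
    terms = candidate_terms(X)
    selected = []
    design = np.ones((len(y), 1))  # start from an intercept-only model
    best = bic(y, design)
    while len(selected) < max_terms:
        scores = {
            name: bic(y, np.column_stack([design, col]))
            for name, col in terms.items()
            if name not in selected
        }
        if not scores:
            break  # every candidate is already in the model
        name, score = min(scores.items(), key=lambda kv: kv[1])
        if score >= best:
            break  # no remaining term improves the criterion
        selected.append(name)
        design = np.column_stack([design, terms[name]])
        best = score
    return selected


# Simulated example (assumed data): one main effect and one interaction.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] + 1.5 * X[:, 1] * X[:, 2] + rng.normal(scale=0.5, size=500)
selected = forward_stepwise(X, y)
print(selected)
```

Each selected term is reported as a tuple of column indices, so `(0,)` denotes the main effect of the first exposure and `(1, 2)` the product of the second and third. The simulated exposures here are independent; correlated exposures, which the abstract highlights as the harder case, would require a more careful criterion than this sketch uses.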