NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Academies of Sciences, Engineering, and Medicine; Division on Engineering and Physical Sciences; Board on Mathematical Sciences and Their Applications; Committee on Applied and Theoretical Statistics. Refining the Concept of Scientific Inference When Working with Big Data: Proceedings of a Workshop. Washington (DC): National Academies Press (US); 2017 Feb 24.

Refining the Concept of Scientific Inference When Working with Big Data: Proceedings of a Workshop.

References

  • Angrist JD, Evans WN. Children and their parents' labor supply: Evidence from exogenous variation in family size. The American Economic Review. 1998;88(3):450–477.
  • Barnett I, Mukherjee R, Lin X. The generalized higher criticism for testing SNP-set effects in genetic association studies. Journal of the American Statistical Association. 2016. (accepted). http://dx.doi.org/10.1080/01621459.2016.1192039. [PMC free article: PMC5517103] [PubMed: 28736464]
  • Bazot C, Dobigeon N, Tourneret JY, Zaas AK, Ginsburg GS, Hero AO. Unsupervised Bayesian linear unmixing of gene expression microarrays. BMC Bioinformatics. 2013;14 [PMC free article: PMC3681645] [PubMed: 23506672] [CrossRef]
  • Begley S. The best medicine. Scientific American. 2011;305(1):50–55. [PubMed: 21717958]
  • Belloni A, Chernozhukov V, Wang L. Pivotal estimation via square-root lasso in nonparametric regression. Annals of Statistics. 2014;42(2):757–788.
  • Benjamini Y. Simultaneous and selective inference: Current successes and future challenges. Biometrical Journal. 2010;52(6):708–721. [PubMed: 21154895]
  • Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B. 1995;57(1):289–300.
  • Box GEP. Robustness in the strategy of scientific model building. In: Launer RL, Wilkinson GN, editors. Robustness in Statistics. Academic Press Inc.; New York: 1979. pp. 201–236.
  • Box GEP, Jenkins GM, Reinsel GC. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, N.J.: Prentice Hall; 1994.
  • Brown EN, Purdon PL, Van Dort CJ. General anesthesia and altered states of arousal: A systems neuroscience analysis. Annual Review of Neuroscience. 2011;34:601–628. [PMC free article: PMC3390788] [PubMed: 21513454]
  • Bühlmann P, van de Geer S. Statistics for High-Dimensional Data: Methods, Theory, and Applications. New York: Springer Science and Business Media; 2011.
  • Chen R, Mias GI, Li-Pook-Than J, Jiang L, Lam HYK, Chen R, Miriami E. Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell. 2012;148(6):1293–1307. [PMC free article: PMC3341616] [PubMed: 22424236]
  • Ching S, Cimenser A, Purdon PL, Brown EN, Kopell NJ. Thalamocortical model for a propofol-induced α-rhythm associated with loss of consciousness. Proceedings of the National Academy of Sciences. 2010;107(52):22665–22670. [PMC free article: PMC3012501] [PubMed: 21149695]
  • Cimenser A, Purdon PL, Pierce ET, Walsh JL, Salazar-Gomez AF, Harrell PG, Tavares-Stoeckel C, Habeeb K, Brown EN. Tracking brain states under general anesthesia by using global coherence analysis. Proceedings of the National Academy of Sciences. 2011;108(21):8832–8837. [PMC free article: PMC3102391] [PubMed: 21555565]
  • Cornelissen L, Kim SE, Purdon PL, Brown EN, Berde CB. Age-dependent electroencephalogram (EEG) patterns during sevoflurane general anesthesia in infants. eLife. 2015;4 [PMC free article: PMC4502759] [PubMed: 26102526] [CrossRef]
  • Dattner I, Klaassen CA. Optimal rate of direct estimators in systems of ordinary differential equations linear in functions of the parameters. Electronic Journal of Statistics. 2015;9(2):1939–1973.
  • Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L. ImageNet: A large-scale hierarchical image database. IEEE Computer Society Conference on Computer Vision and Pattern Recognition; Miami, Fla.: 2009. [June 20-25].
  • Duchi JC, Jordan MI, Wainwright MJ. Privacy aware learning. Journal of the ACM. 2014;61(6) [CrossRef]
  • Firouzi H, Rajaratnam B, Hero AO. Two-stage sampling, prediction and adaptive regression via correlation screening (sparcs). IEEE Transactions on Information Theory. 2017;63(1):698–714.
  • Fithian W, Sun D, Taylor J. Optimal inference after model selection. 2014. (arXiv preprint arXiv:1410.2597).
  • FTC (Federal Trade Commission). Big Data: A Tool for Inclusion or Exclusion? Understanding the Issues. 2016. https://www.ftc.gov/system/files/documents/reports/big-data-tool-inclusion-orexclusion-understanding-issues/160106big-data-rpt.pdf.
  • Genberg BL, Hogan JW, Braitstein P. Home testing and counselling with linkage to care. The Lancet HIV. 2016;3(6):e244–e246. [PMC free article: PMC8573678] [PubMed: 27240786]
  • Haneuse S, Daniels M. A general framework for considering selection bias in EHR-based studies: What data are observed and why? eGEMS (Generating Evidence & Methods to Improve Patient Outcomes). 2016;4(1) article 16. doi: http://dx.doi.org/10.13063/2327-9214.1203. [PMC free article: PMC5013936] [PubMed: 27668265]
  • Haris A, Witten D, Simon N. Convex modeling of interactions with strong heredity. Journal of Computational and Graphical Statistics. 2016;25(4):981–1004. [PMC free article: PMC5353363] [PubMed: 28316461]
  • Hawkes AG. Spectra of some self-exciting and mutually exciting point processes. Biometrika. 1971;58(1):83–90.
  • Henderson J, Michailidis G. Network reconstruction using nonparametric additive ODE models. PLoS ONE. 2014;9(4) [PMC free article: PMC3986056] [PubMed: 24732037] [CrossRef]
  • Hero AO, Rajaratnam B. Large-scale correlation screening. Journal of the American Statistical Association. 2011;106(496):1540–1552.
  • Hero AO, Rajaratnam B. Hub discovery in partial correlation graphs. IEEE Transactions on Information Theory. 2012;58(9):6064–6078.
  • Hero AO, Rajaratnam B. Foundational principles for large-scale inference: Illustrations through correlation mining. Proceedings of the IEEE. 2016;104(1):93–110. [PMC free article: PMC4827453] [PubMed: 27087700]
  • Hsiao KJ, Kulesza A, Hero AO. Social collaborative retrieval. IEEE Journal of Selected Topics in Signal Processing. 2014;8(4):680–689.
  • Huang Y. Integrative statistical learning with applications to predicting features of diseases and health [Ph.D. thesis]. University of Michigan; Ann Arbor, Mich.: 2011.
  • Huang Y, Zaas AK, Rao A, Dobigeon N, Woolf PJ, Veldman T, Øien NC. Temporal dynamics of host molecular responses differentiate symptomatic and asymptomatic influenza A infection. PLoS Genetics. 2011;7(8) [PMC free article: PMC3161909] [PubMed: 21901105] [CrossRef]
  • Hurvich CM, Tsai CL. The impact of model selection on inference in linear regression. The American Statistician. 1990;44(3):214–217.
  • Hurvich CM, Zeger S. Frequency Domain Bootstrap Methods for Time Series. New York: New York University; 1987.
  • Imai K, King G, Stuart EA. Misunderstandings between experimentalists and observationalists about causal inference. Journal of the Royal Statistical Society: Series A (Statistics in Society). 2008;171(2):481–502.
  • Irizarry RA, Wang C, Zhou Y, Speed TP. Gene set enrichment analysis made simple. Statistical Methods in Medical Research. 2009;18(6):565–575. [PMC free article: PMC3134237] [PubMed: 20048385]
  • Joffe MM, Yang WP, Feldman HI. Selective ignorability assumptions in causal inference. The International Journal of Biostatistics. 2010;6(2) [PubMed: 21969995] [CrossRef]
  • Kass RE, Ventura V, Brown EN. Statistical issues in the analysis of neuronal data. Journal of Neurophysiology. 2005;94(1):8–25. [PubMed: 15985692]
  • Langfelder P, Mischel PS, Horvath S. When is hub gene selection better than standard meta-analysis? PLoS ONE. 2013;8(4) [PMC free article: PMC3629234] [PubMed: 23613865] [CrossRef]
  • Lee JD, Sun Y, Taylor JE. On model selection consistency of M-estimators with geometrically decomposable penalties. Proceedings of the 26th International Conference on Neural Information Processing Systems; Lake Tahoe, Nev.: 2013. [December 5-10]. pp. 342–350.
  • Lee JD, Sun DL, Sun Y, Taylor JE. Exact post-selection inference, with application to the lasso. Annals of Statistics. 2016;44(3):907–927.
  • Liu TY, Burke T, Park LP, Woods CW, Zaas AK, Ginsburg GS, Hero AO. An individualized predictor of health and disease using paired reference and target samples. BMC Bioinformatics. 2016;17 [PMC free article: PMC4722633] [PubMed: 26801061] [CrossRef]
  • Lock EF, Hoadley KA, Marron JS, Nobel AB. Joint and individual variation explained (JIVE) for integrated analysis of multiple data types. Annals of Applied Statistics. 2013;7(1):523–542. [PMC free article: PMC3671601] [PubMed: 23745156]
  • Lockhart R, Taylor J, Tibshirani RJ, Tibshirani R. A significance test for the lasso. Annals of Statistics. 2014;42(2):413–468. [PMC free article: PMC4285373] [PubMed: 25574062]
  • Loh PL, Wainwright MJ. High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity. In: Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, editors. Advances in Neural Information Processing Systems. Cambridge, Mass.: MIT Press; 2011. pp. 2726–2734.
  • Meng Z, Wei D, Wiesel A, Hero AO III. Distributed learning of Gaussian graphical models via marginal likelihoods. Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS); Scottsdale, Ariz.: 2013. [April 29-May 1]. http://www.jmlr.org/proceedings/papers/v31/meng13a.pdf.
  • Miettinen OS. The need for randomization in the study of intended effects. Statistics in Medicine. 1983;2(2):267–271. [PubMed: 6648141]
  • Morris AP, Voight BF, Teslovich TM, Ferreira T, Segre AV, Steinthorsdottir V, Strawbridge RJ. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nature Genetics. 2012;44(9):981–990. [PMC free article: PMC3442244] [PubMed: 22885922]
  • Mukherjee R, Pillai NS, Lin X. Hypothesis testing for high-dimensional sparse binary regression. Annals of Statistics. 2015;43(1):352–381. [PMC free article: PMC4522432] [PubMed: 26246645]
  • NITRD/NCO (National Coordination Office for Networking and Information Technology Research and Development). The Federal Big Data Research and Development Strategic Plan. Washington, D.C.: National Science and Technology Council; 2016. https://www.whitehouse.gov/sites/default/files/microsites/ostp/NSTC/bigdatardstrategicplan-nitrd_final-051916.pdf.
  • NRC (National Research Council). Frontiers in Massive Data Analysis. Washington, D.C.: The National Academies Press; 2013.
  • NRC. Training Students to Extract Value from Big Data: Summary of a Workshop. Washington, D.C.: The National Academies Press; 2014. [PubMed: 26065052]
  • NSCI (National Strategic Computing Initiative). National Strategic Computing Initiative Strategic Plan. 2016. https://www.whitehouse.gov/sites/whitehouse.gov/files/images/NSCI%20Strategic%20Plan.pdf.
  • Pearl J. Causality: Models, Reasoning, and Inference. Cambridge, U.K.: Cambridge University Press; 2000.
  • Pillow JW, Shlens J, Paninski L, Sher A, Litke AM, Chichilnisky EJ, Simoncelli EP. Spatio-temporal correlations and visual signalling in a complete neuronal population. Nature. 2008;454(7207):995–999. [PMC free article: PMC2684455] [PubMed: 18650810]
  • Poole D, Raftery AE. Inference for deterministic simulation models: The Bayesian melding approach. Journal of the American Statistical Association. 2000;95(452):1244–1255.
  • Purdon PL, Pierce ET, Mukamel EA, Prerau MJ, Walsh JL, Wong KFK, Salazar-Gomez AF. Electroencephalogram signatures of loss and recovery of consciousness from propofol. Proceedings of the National Academy of Sciences. 2013;110(12):E1142–E1151. [PMC free article: PMC3607036] [PubMed: 23487781]
  • Radchenko P, James GM. Variable selection using adaptive nonlinear interaction structures in high dimensions. Journal of the American Statistical Association. 2010;105(492):1541–1553.
  • Ramos EA. Resampling methods for time series [PhD Dissertation]. Department of Statistics, Harvard University; Cambridge, Mass.: 1988.
  • Rau A, Marot G, Jaffrézic F. Differential meta-analysis of RNA-seq data from multiple studies. BMC Bioinformatics. 2014;15 [PMC free article: PMC4021464] [PubMed: 24678608] [CrossRef]
  • Ravikumar P, Lafferty J, Liu H, Wasserman L. Sparse additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2009;71(5):1009–1030.
  • Shen R, Olshen AB, Ladanyi M. Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics. 2009;25(22):2906–2912. [PMC free article: PMC2800366] [PubMed: 19759197]
  • Simon N, Tibshirani R. Standardization and the group lasso penalty. Statistica Sinica. 2012;22(3):983–1001. [PMC free article: PMC4527185] [PubMed: 26257503]
  • Singh R, Xu J, Berger B. Global alignment of multiple protein interaction networks with application to functional orthology detection. Proceedings of the National Academy of Sciences. 2008;105(35):12763–12768. [PMC free article: PMC2522262] [PubMed: 18725631]
  • Smith M. “Computer Science for All.” White House Blog, January 30. 2016. https://www.whitehouse.gov/blog/2016/01/30/computer-science-all.
  • Song S, Chaudhuri K, Sarwate AD. Learning from data with heterogeneous noise using SGD. Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS); San Diego, Calif.: 2015. [May 9-12]. http://www.jmlr.org/proceedings/papers/v38/song15.pdf.
  • Sripada C, Kessler D, Fang Y, Welsh RC, Prem Kumar K, Angstadt M. Disrupted network architecture of the resting brain in attention-deficit/hyperactivity disorder. Human Brain Mapping. 2014;35(9):4693–4705. [PMC free article: PMC6869736] [PubMed: 24668728]
  • Sun T, Zhang CH. Scaled sparse linear regression. Biometrika. 2012;99(4) [CrossRef]
  • Tian X, Taylor JE. Selective inference with a randomized response. 2015. (arXiv preprint arXiv:1507.06739).
  • Tian X, Loftus JR, Taylor JE. Selective inference with unknown variance via the square-root LASSO. 2015. (arXiv preprint arXiv:1504.08031).
  • Tukey JW. Exploratory Data Analysis. Boston, Mass.: Pearson; 1977.
  • van de Geer S, Bühlmann P, Ritov YA, Dezeure R. On asymptotically optimal confidence regions and tests for high-dimensional models. Annals of Statistics. 2014;42(3):1166–1202.
  • VanderWeele TJ. Invited commentary: Structural equation models and epidemiologic analysis. American Journal of Epidemiology. 2012;176(7):608–612. [PMC free article: PMC3530375] [PubMed: 22956513]
  • Wainwright MJ, Jordan MI. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning. 2008;1(1-2):1–305.
  • Wang K, Narayanan M, Zhong H, Tompa M, Schadt EE, Zhu J. Meta-analysis of inter-species liver co-expression networks elucidates traits associated with common human diseases. PLoS Computational Biology. 2009;5(12) [PMC free article: PMC2787626] [PubMed: 20019805] [CrossRef]
  • Wasserman L, Roeder K. High-dimensional variable selection. Annals of Statistics. 2009;37(5A):2178–2201. [PMC free article: PMC2752029] [PubMed: 19784398]
  • Wilson JD, Wang S, Mucha PJ, Bhamidi S, Nobel AB. A testing-based extraction algorithm for identifying significant communities in networks. Annals of Applied Statistics. 2014;8(3):1853–1891.
  • Woods CW, McClain MT, Chen M, Zaas AK, Nicholson BP, Varkey J, Veldman T. A host transcriptional signature for presymptomatic detection of infection in humans exposed to influenza H1N1 or H3N2. PLoS ONE. 2013;8(1) [PMC free article: PMC3541408] [PubMed: 23326326] [CrossRef]
  • Wu H, Lu T, Xue H, Liang H. Sparse additive ordinary differential equations for dynamic gene regulatory network modeling. Journal of the American Statistical Association. 2014;109(506):700–716. [PMC free article: PMC4104722] [PubMed: 25061254]
  • Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. American Journal of Human Genetics. 2011;89(1):82–93. [PMC free article: PMC3135811] [PubMed: 21737059]
  • Ying R, Sharma M, Celum C, Baeten JM, van Rooyen H, Hughes JP, Garnett G, Barnabas RV. Home testing and counselling to reduce HIV incidence in a generalised epidemic setting: A mathematical modelling analysis. The Lancet HIV. 2016;3(6):e275–e282. [PMC free article: PMC4927306] [PubMed: 27240790]
  • Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2006;68(1):49–67.
  • Zaas AK, Burke T, Chen M, McClain M, Nicholson B, Veldman T, Tsalik EL. A host-based RT-PCR gene expression signature to identify acute respiratory viral infection. Science Translational Medicine. 2013;5(203) [PMC free article: PMC4286889] [PubMed: 24048524] [CrossRef]
  • Zhao P, Yu B. On model selection consistency of Lasso. Journal of Machine Learning Research. 2006;7:2541–2563.
  • Zhu R, Zeng D, Kosorok MR. Reinforcement learning trees. Journal of the American Statistical Association. 2015;110(512):1770–1784. [PMC free article: PMC4760114] [PubMed: 26903687]
  • Zubizarreta JR, Small DS, Rosenbaum PR. Isolation in the construction of natural experiments. Annals of Applied Statistics. 2014;8(4):2096–2121.
Copyright 2017 by the National Academy of Sciences. All rights reserved.
Bookshelf ID: NBK424917
