
Keynote Talk




COVID-19 Vaccine Efficacy Trials and “Immune Correlates of Protection” in the Moderna COVE Trial

Speaker: Peter Gilbert

Abstract:

Randomized, double-blind phase 3 COVID-19 vaccine efficacy trials assess how well candidate vaccines prevent infection and disease caused by the SARS-CoV-2 virus. The NIH-supported COVID-19 Prevention Network (CoVPN) is co-conducting (with vaccine manufacturers) five such phase 3 trials, each of which includes the objective of assessing post-vaccination antibody biomarkers as various types of “immune correlates of protection (CoPs).” CoPs can be formally defined using several statistical frameworks, including risk prediction, treatment effect modification, treatment effect mediation, and surrogate/replacement endpoint evaluation. An ultimate application of the statistical analyses is to help define surrogate endpoints that can constitute the basis for traditional or accelerated approval of vaccines. In this talk I will first summarize the statistical design of the COVID-19 vaccine efficacy trials and then describe the results of the first CoVPN study (the Moderna COVE study) to yield results on CoPs. In COVE, antibody markers were measured via a two-phase case-cohort sampling design. Based on controlled effects causal inference, for vaccine recipients with post-vaccination neutralizing antibody titers that were undetectable (< 2.42), 100, or 1000, vaccine efficacy estimates were 51% (95% CI -51, 83%), 91% (87, 94%), and 96% (94, 98%), respectively. Moreover, based on natural effects mediation analysis, an estimated 68% (58, 78%) of the vaccine's overall efficacy was mediated through neutralizing antibody titers. These results help define a biomarker with utility to influence decision-making for COVID-19 vaccines. Statistical issues in assessing and interpreting CoPs are discussed.
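
Speaking schematically, a common way to write the two quantities reported above (not necessarily the exact estimators used in COVE) is

\mathrm{VE}(s) \;=\; 1 - \frac{P\{Y(1,s)=1\}}{P\{Y(0)=1\}}, \qquad \mathrm{PM} \;=\; 1 - \frac{\log \mathrm{RR}_{\mathrm{NDE}}}{\log \mathrm{RR}_{\mathrm{total}}},

where Y(1, s) denotes the potential COVID-19 outcome under vaccination with the antibody marker set to level s, Y(0) the potential outcome under placebo, and RR_NDE and RR_total the natural direct effect and total effect risk ratios from the mediation analysis.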

Biography:

Dr. Peter Gilbert, Professor of Biostatistics at the Fred Hutchinson Cancer Center and the University of Washington Department of Biostatistics, focuses on the design and analysis of clinical trials of candidate vaccines for HIV, COVID-19, and other infectious diseases. He specializes in statistical methods and data analyses for understanding how vaccine efficacy depends on immune responses to vaccination and on genetic features of infectious pathogens, so-called sieve analysis. He is PI of the Statistical Center for the NIAID HIV Vaccine Trials Network and plays a similar role for the US Government-supported COVID-19 Prevention Network (CoVPN) vaccine efficacy trials.




Plenary Talks




Adaptive Spectral Analysis and Learning of Nonstationary Time Series

Speaker: Robert Krafty

Abstract:

Technological advances have led to an explosion in researchers' and clinicians' ability to collect high-dimensional time series signals. The ability to fully utilize these data has been inhibited by a shortage of parsimonious yet flexible methods that can be used to learn and conduct inference on dynamic frequency patterns. Motivated by high-density EEG (hdEEG) from a patient receiving transcranial magnetic stimulation (TMS) while hospitalized for a first-break psychotic episode, in this talk we discuss a nonparametric approach to the spectral analysis of a high-dimensional multivariate nonstationary time series signal. The procedure is based on a novel frequency-domain factor model that provides a flexible yet parsimonious representation of spectral matrices from a large number of simultaneously observed time series (e.g., EEG from many locations in the brain). Formulated in a fully Bayesian framework, the time series is adaptively partitioned into approximately stationary segments, where both the number and the locations of partition points are assumed unknown. Stochastic approximation Monte Carlo (SAMC) techniques are used to accommodate the unknown number of segments, and a conditional Whittle likelihood-based Gibbs sampler is developed for efficient sampling within segments. By averaging over the distribution of partitions, the proposed method can approximate both abrupt and slowly varying changes in spectral matrices and flexibly learn dynamic frequency patterns.
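
As an illustrative sketch (not a verbatim statement of the model in the talk), a frequency-domain factor representation of the P x P spectral matrix within an approximately stationary segment could take the form

f(\omega) \;\approx\; \Lambda(\omega)\,\Lambda(\omega)^{*} + \sigma^{2} I_{P},

where \Lambda(\omega) is a P x Q matrix of frequency-dependent factor loadings with Q \ll P and * denotes the conjugate transpose, so the number of quantities to estimate grows with Q rather than with P^2; within each segment the fit is driven by a Whittle likelihood built from the discrete Fourier transform of the data.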

Biography:

Dr. Robert Krafty joined Emory University in September 2020 as Rollins Distinguished Professor and Chair of Biostatistics and Bioinformatics. He previously served on the faculty of the University of Pittsburgh and Temple University. Dr. Krafty leads a transdisciplinary research group that develops and applies methods for using information contained in biomedical and psychosocial time series data to develop and deploy behavioral interventions to improve mental health. Examples of his work include the use of actigraphy and other mobile devices to develop interventions to avoid major depressive episodes and suicidality in recently bereaved adults, and the combined use of EEG and heart rate variability to inform interventions to improve health and functioning in those serving as caregivers for a spouse with dementia.


Statistical Computing Meets Quantum Computing

Speaker: Ping Ma

Abstract:

With the rapid development of quantum computers, quantum computing has been studied extensively. Unlike electronic computers, a quantum computer operates on quantum bits, or qubits, which can take the values 0, 1, or both simultaneously due to the superposition property. The number of complex numbers required to characterize quantum states usually grows exponentially with the size of the system. For example, a quantum system with p qubits can be in any superposition of 2^p orthonormal states simultaneously, while a classical system can only be in one state at a time. Such a paradigm change has motivated significant development of scalable quantum algorithms in many areas. However, quantum algorithms tackling statistical problems are still lacking. In this talk, I will present challenges and opportunities for developing quantum algorithms and introduce a novel quantum algorithm for a statistical problem.
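
A minimal classical illustration of the storage argument above; this only simulates the bookkeeping on an electronic computer and is not the quantum algorithm presented in the talk:

import numpy as np

def uniform_superposition(p):
    """State vector of p qubits in an equal superposition of all 2**p basis states."""
    dim = 2 ** p                       # number of orthonormal basis states
    return np.ones(dim, dtype=complex) / np.sqrt(dim)

for p in (4, 10, 20):
    psi = uniform_superposition(p)
    # Each amplitude is one complex number, so memory doubles with every added qubit.
    print(f"p = {p:2d}: {psi.size:>9,d} amplitudes, {psi.nbytes / 1e6:.1f} MB")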

Biography:

Dr. Ping Ma is a Distinguished Research Professor at the University of Georgia and co-directs the big data analytics lab. He was a Beckman Fellow at the Center for Advanced Study at the University of Illinois at Urbana-Champaign, a Faculty Fellow at the US National Center for Supercomputing Applications, and a recipient of the National Science Foundation CAREER Award. One of his papers won the best paper award of the Canadian Journal of Statistics in 2011. He delivered the 2021 National Science Foundation Distinguished Lecture. Professor Ma serves on multiple editorial boards and is a Fellow of the American Statistical Association.


Computational Methods for Healthcare Access Modeling

Speaker: Nicoleta Serban

Abstract:

The research presented in this seminar is motivated by one of my research programs, which aims to bring rigor to the measurement of and inference on healthcare access, with a recently published book titled Healthcare System Access: Measurement, Inference and Intervention. I will begin with an overview of the underlying framework for assessing healthcare access with a focus on health policy making. I will use this framework to motivate the access model, a classic assignment optimization problem with many important computational challenges, including spatial coupling, complex system constraints, a large-scale decision space, and data uncertainty. I will present computationally efficient methods for addressing large-scale optimization problems accounting for spatial coupling in the context of uncertainty quantification.
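
To make the assignment-optimization framing concrete, here is a deliberately tiny, hypothetical sketch with synthetic data; it omits the spatial coupling, system constraints, and uncertainty that make the real problem hard. Patients in each area are assigned to providers so as to minimize total travel distance subject to provider capacities.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n_areas, n_providers = 5, 3
demand = rng.integers(10, 30, size=n_areas)              # patients needing care in each area
capacity = np.array([80, 60, 50])                        # visits each provider can absorb
dist = rng.uniform(1, 50, size=(n_areas, n_providers))   # area-to-provider travel distance

c = dist.ravel()                                         # objective: total travel distance
A_eq = np.zeros((n_areas, n_areas * n_providers))        # each area's demand fully assigned
for i in range(n_areas):
    A_eq[i, i * n_providers:(i + 1) * n_providers] = 1
A_ub = np.zeros((n_providers, n_areas * n_providers))    # provider capacities not exceeded
for j in range(n_providers):
    A_ub[j, j::n_providers] = 1

res = linprog(c, A_ub=A_ub, b_ub=capacity, A_eq=A_eq, b_eq=demand,
              bounds=(0, None), method="highs")
print(res.x.reshape(n_areas, n_providers).round(1))      # optimal area-to-provider assignment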

Biography:

Dr. Nicoleta Serban is the Virginia C. and Joseph C. Mello Professor in the H. Milton Stewart School of Industrial and Systems Engineering at the Georgia Institute of Technology. Dr. Serban's research record is quite diverse, from mathematical statistics to modeling to data analysis to statistical learning, with recent contributions on drawing principled inferences on healthcare delivery and health policy. She has also been involved in broad-impact research activities; the most noteworthy is her leadership of the Health Analytics initiative, a collaborative effort anchored in partnership with a varied network of clinicians, healthcare providers, and public health entities. To date, she has published more than 65 journal articles, a book co-authored with Dr. William B. Rouse titled Understanding and Managing the Complexity of Healthcare, published by MIT Press, and a single-authored book titled Healthcare System Access: Measurement, Inference and Intervention, published by Wiley. She is the editor for physical sciences, engineering, and the environment for the Annals of Applied Statistics. She has reviewed for multiple funding agencies and has participated in multiple workshops and meetings organized by the National Academies.




Invited Talks




Notes from the field: Teaching collaborators about best practices regarding p-values and statistical significance

Speaker: Craig Borkowf

Abstract:

In this presentation, I share my personal perspective as an applied statistician on the best practices for reporting and interpreting p-values and statistical significance. I first discuss my Branch's educational efforts to convey the ASA's recommendations (2016, 2019) on these practices to the CDC's Division of HIV Prevention. In particular, I present a graphical example that shows the location of confidence intervals relative to chosen thresholds, which helps collaborators go beyond dichotomous statements about statistical significance and think carefully about meaningful effects from a public health perspective. I then describe two approaches for engaging collaborators, noting some successes and challenges. The first uses a color-coding method to illustrate whether the lower and upper confidence bounds rule out any benefit/harm or substantial benefit/harm. The second focuses on writing text that accurately describes a study's limited ability both to detect small differences (even if they exist) and to rule out meaningful differences (even if they do not exist). I conclude with some personal thoughts about how this educational initiative might progress in the next few years.
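
A small, hypothetical illustration of that color-coding logic; the thresholds and labels here are invented for the sketch and are not the Branch's actual rules:

def classify_ci(lower, upper, harm=-2.0, benefit=2.0):
    """Qualitative reading of a confidence interval for a treatment-minus-control difference."""
    if lower >= benefit:
        return "substantial benefit: even the lower bound clears the benefit threshold"
    if lower > 0:
        return "some benefit, but a substantial benefit is not established"
    if upper <= harm:
        return "substantial harm: even the upper bound falls below the harm threshold"
    if upper < 0:
        return "some harm, but a substantial harm is not established"
    return "inconclusive: the interval is compatible with both benefit and harm"

print(classify_ci(2.5, 6.0))    # substantial benefit
print(classify_ci(-1.0, 3.5))   # inconclusive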


Large-Scale Simultaneous Testing Using Kernel Density Estimation

Speaker: Santu Ghosh

Abstract:

A century ago, when Student's t-statistic was introduced, no one ever imagined its increasing applicability in the modern era. It finds applications in highly multiple hypothesis testing, feature selection and ranking, high-dimensional signal detection, etc. Student's t-statistic is constructed based on the empirical distribution function (EDF). An alternative choice to the EDF is the kernel density estimate (KDE), a smoothed version of the EDF. The novelty of this work lies in an alternative to Student's t-test that uses the KDE technique and in an exploration of the usefulness of the KDE-based t-test for large-scale simultaneous hypothesis testing. An optimal bandwidth parameter for the KDE approach is derived by minimizing the asymptotic error between the true p-value and its asymptotic estimate based on the normal approximation. We compare our method to several possible alternatives with respect to the false discovery rate and show in simulations that our method produces a lower proportion of false discoveries than its competitors. The usefulness of the proposed methods is further illustrated through a gene expression data example.
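
For orientation, the smoothed analogue of the EDF that such an approach builds on has the generic form

\tilde{F}_h(x) \;=\; \frac{1}{n}\sum_{i=1}^{n} \mathcal{K}\!\left(\frac{x - X_i}{h}\right), \qquad \mathcal{K}(u) = \int_{-\infty}^{u} k(t)\,dt,

where k is a kernel density (for example, the standard normal density) and h > 0 is the bandwidth whose optimal choice the abstract describes; this is a generic formulation, not necessarily the exact statistic studied in the talk.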


Fitting interpretable Machine Learning models with main effects and low-order interactions using boosted model-based trees

Speaker: Linwei Hu

Abstract:

There has been a great deal of recent interest in machine learning interpretability, especially in regulated industries where one needs to understand and explain the results to various stakeholders. This talk presents a method called GAMI-Tree for developing inherently interpretable machine learning models. It is based on a functional ANOVA decomposition of the model and estimates just the main effects and low-order interactions. While this concept is well known in statistics, the challenge is to develop fast and scalable algorithms for nonparametric estimation with large datasets. The explainable boosting machine (EBM) was proposed in Lou et al. (2013) to address this challenge. GAMI-Tree instead uses customized model-based trees and performs as well as, or better than, EBM on simulated and real data sets. In addition, GAMI-Tree is better at capturing interaction effects because it uses a more flexible base learner.
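
Schematically, the functional ANOVA structure referred to above restricts the fitted model to main effects and two-way interactions,

g\big(E[Y \mid x]\big) \;=\; \mu + \sum_{j} f_j(x_j) + \sum_{j<k} f_{jk}(x_j, x_k),

where g is a link function, the f_j are main-effect functions, and the f_{jk} are pairwise interaction functions; in GAMI-Tree these components are estimated by boosting customized model-based trees.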


Statistics for Statisticians: Looking into the Past through Citations

Speaker: Pengsheng Ji

Abstract:

We have assembled a new dataset covering about 80K statistical papers published in the last 40 years and use it to study several aspects of the field of statistics through citations. First, we present dynamic rankings of statistics journals using the Stigler model and PageRank. Second, we predict highly cited papers using logistic models and gradient boosting and identify the most important features of these papers.
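
For context, and stated roughly rather than exactly as in the talk, the Stigler model is a Bradley-Terry-type model for inter-journal citations: if a citation is exchanged between journals i and j, then

\log \frac{P(\text{journal } j \text{ cites journal } i)}{P(\text{journal } i \text{ cites journal } j)} \;=\; \mu_i - \mu_j,

where \mu_i is the "export score" of journal i, and journals are ranked by their estimated scores.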


Exploiting low-dimensional structures of data sets in machine learning with deep neural networks

Speaker: Wenjing Liao

Abstract:

Many data in real-world applications lie in a high-dimensional space but exhibit low-dimensional structures. In mathematics, such data can be modeled as random samples on a low-dimensional manifold. Our goal is to estimate a target function, or a nonlinear operator between infinite-dimensional function spaces, using neural networks. This talk is based on an efficient approximation theory of deep ReLU networks for functions supported on a low-dimensional manifold. We further establish the sample complexity for regression and operator estimation with finite samples of data. When data are sampled on a low-dimensional manifold, the sample complexity crucially depends on the intrinsic dimension of the manifold instead of the ambient dimension of the data. These results demonstrate that deep neural networks are adaptive to low-dimensional geometric structures of data sets.
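
Results of this type are usually stated with the intrinsic dimension in the exponent of the rate; a representative, schematic form (not the talk's exact theorem) is

E\,\big\|\hat{f} - f\big\|^{2}_{L^2} \;\lesssim\; n^{-\frac{2s}{2s+d}} (\log n)^{c},

where s is the smoothness of the target function, d is the intrinsic dimension of the manifold, n is the sample size, and c is a constant, so the ambient dimension of the data enters only through constants or logarithmic factors.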


Modeling spiky functions with derivatives of smooth functions in function-on-function regression

Speaker: Ruiyan Luo

Abstract:

The smoothness penalty is an efficient regularization and dimension reduction tool for functional regression. However, for spiky functional data observed on a dense grid, the coefficient function in functional regression can itself be spiky, and smoothness regularization is then inefficient and leads to over-smoothing. We propose a novel approach to fitting the function-on-function regression model that views the spiky coefficient functions as the derivatives of smooth auxiliary functions. Compared to smoothness or sparsity regularization imposed directly on the spiky coefficient function, as in existing methods, imposing smoothness regularization on the smooth auxiliary functions can more efficiently reduce the dimension and improve the performance of the fitted model. With the estimated smooth auxiliary functions and by taking derivatives, we can fit the model and make predictions. Simulation studies and real data applications show that, compared to existing methods, the new method can greatly improve model performance when the coefficient function is spiky, and it performs similarly well when the coefficient function is smooth.
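
Schematically, and with details of the reparameterization simplified, the function-on-function regression model and the proposed construction can be written as

Y(t) \;=\; \alpha(t) + \int \beta(s,t)\, X(s)\, ds + \varepsilon(t), \qquad \beta(s,t) \;=\; \frac{\partial}{\partial s}\, b(s,t),

where b is a smooth auxiliary function (the derivative shown in s is only one possible choice); the smoothness penalty is placed on b, and the spiky coefficient \beta is recovered by differentiation.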


Nested and Multipart Studies: Flaming Fiasco or Efficiently Economical?

Speaker: Christina Mehta

Abstract:

"Nested" and "multipart" studies are two ways of expanding the scope of a research program beyond what might otherwise be possible with available funding. Nested studies are cost-effective because they leverage the parent study infrastructure and personnel within which they are cocooned. Multipart studies are cost-effective because they leverage the same cohort of participants for use in interlinked research studies that share common components. There is little information on the practical implications of either nested or multipart study designs. This proposal will describe the real-world advantages, disadvantages, and important considerations of nested and multipart studies and highlight experiences gained from leading the statistical aspects of a complex nested, multipart study on whether estrogen insufficiency-induced inflammation leverages HIV-induced inflammation to cause end organ damage and worsen age-related co-morbidities affecting the neuro-hypothalamic-pituitary-adrenal axis (brain), musculoskeletal (bone), and cardiovascular organ systems (heart; BBH study) conducted by the Specialized Center for Research Excellence on Sex Differences (SCORE) at Emory University.


How Social Vulnerability Predicted the Frequency and Intensity of COVID-19: A case in point using Georgia county-specific data

Speaker: Mohamed Mubasher

Abstract:

Racial disparity adversely impacts COVID-19 infection and fatality rates. Hospitalization rates due to the pandemic among African Americans/Latinx/Hispanics in Georgia have been among the highest in the nation. Sociopolitical determinants, also termed social vulnerability, have been identified as one of the main factors burdening minorities in the face of the pandemic. The Social Vulnerability Index (SVI) was developed by ATSDR at CDC. The basic formula projects how disaster unfolds: "Risk = Hazard * (Vulnerability - Resources), where Risk is the likelihood or expectation of loss; Hazard is a condition posing the threat of harm; Vulnerability is the extent to which persons or things are likely to be affected; and Resources are those assets in place that will diminish the effects of hazards" (Dwyer et al. 2004; UCLA Center for Public Health and Disasters 2006). Using census-tract-specific percentile ranks, the SVI was constructed from four domains: socioeconomic status; household composition; minority status and language; and housing type and transportation. Along with the SVI, we used data from the Georgia Department of Public Health, the CDC, and County Health Rankings & Roadmaps to fit cross-sectional and longitudinal (Poisson) generalized linear mixed models relating infection/death rates to the racial population difference among those aged 60+ years (% White - % African American (AA)/Black), education, unemployment, lack of insurance, % obese, and racial differences in respiratory infection discharge rates. We additionally, but separately, modeled county-specific frequency and intensity of COVID-19 up to August 30, 2021, as a function of SVI, comorbid conditions, % obese, vaccination rates, and insurance status. Analyses also evaluated the sensitivity of using SVI in predictive models in lieu of socioeconomic variables (ethnicity/race, income, education, gender, and age). Results revealed the magnitude and significance of the burden posed by social vulnerability in the outbreak of the SARS-CoV-2 pandemic. Older age, male gender, AA/other minority group membership, the presence of comorbid conditions, obesity, and lower vaccination rates also independently and adversely predicted higher COVID-19 infection and related mortality rates. SVI significantly correlated with these findings.
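
As a schematic example, not the exact specification fitted in these analyses, a county-level Poisson generalized linear mixed model of the kind described above could take the form

\log E[Y_{it}] \;=\; \log(\mathrm{pop}_{it}) + \mathbf{x}_{it}^{\top}\boldsymbol{\beta} + b_i, \qquad b_i \sim N(0, \sigma^2),

where Y_{it} is the COVID-19 case or death count in county i during period t, the log-population term is an offset for the population at risk, \mathbf{x}_{it} collects the SVI and the covariates listed above, and b_i is a county-specific random intercept.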


The role of causal mediation analysis in personalized decision making

Speaker: Razieh Nabi

Abstract:

In practice, clinicians synthesize information on patient characteristics, such as medical history and lifestyle, to tailor a sequence of treatment decisions for the patient. The goal of precision medicine is to make this clinical decision-making process evidence-based and to find optimal treatment rules that maximize the likelihood of desirable outcomes. Unfortunately, the full benefit of a treatment regime may not be realized, since patients often do not fully adhere to the treatment plan due to toxic side effects of the medication. In this talk, we describe how tools from causal inference, mediation analysis, and reinforcement learning can be combined to account for differential adherence while learning high-quality policies and decision rules.
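
One building block from mediation analysis that such an approach draws on is the decomposition of a treatment's total effect into natural indirect and direct components; in its simplest single-stage form,

E[Y(1)] - E[Y(0)] \;=\; \big(E[Y(1, M(1))] - E[Y(1, M(0))]\big) \;+\; \big(E[Y(1, M(0))] - E[Y(0, M(0))]\big),

where M denotes the mediator (here, adherence), Y(a, m) is the potential outcome under treatment a with the mediator set to m, the first term is the natural indirect effect through adherence, and the second is the natural direct effect.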


A missing data method for deconfounding in neuroimaging studies

Speaker: Ben Risk

Abstract:

Resting-state fMRI studies remove participants who fail motion quality control criteria. Motion is particularly problematic in studies of children and neurodevelopmental disorders, including autism spectrum disorder (ASD). In ASD studies, popular motion quality control criteria result in the removal of the majority of children; moreover, children with more severe ASD are more likely to be excluded. To address this sampling bias, we define a target parameter for the difference in functional connectivity between ASD and typically developing children. We call this target parameter the deconfounded group difference; it utilizes the distribution of diagnosis-specific behavioral variables across usable and unusable scans. We estimate the deconfounded group difference using doubly robust targeted minimum loss-based estimation with an ensemble of machine learning methods for the propensity and outcome models. In a study of ASD and typically developing children, we find more extensive differences than the naive estimator detects. Our findings suggest the deconfounded group difference can help reveal the pathophysiology of neurological disorders in populations with high motion.
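
Roughly speaking (a schematic rendering, not the formal definition from the study), the deconfounded group difference standardizes connectivity measured on usable scans to the behavioral-covariate distribution of all children in each diagnostic group, usable and unusable scans alike:

\Delta \;=\; E_{W \mid \mathrm{ASD}}\big[\,E(Y \mid \text{usable}, \mathrm{ASD}, W)\,\big] \;-\; E_{W \mid \mathrm{TD}}\big[\,E(Y \mid \text{usable}, \mathrm{TD}, W)\,\big],

where Y is a functional connectivity measure and W the diagnosis-specific behavioral variables.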


Modernizing CDC's data and IT infrastructure to accelerate the adoption of advanced statistical methods

Speaker: Heather Strosnider

Abstract:

CDC and its public health partners are undergoing a transformative data modernization to move from siloed and brittle public health data systems to connected, resilient, adaptable, and sustainable "response-ready" systems that can help us solve problems before they happen and reduce the harm caused by the problems that do happen. This transformation aims to build a new digital infrastructure with increased data access, data harmonization, and shared tools and services to support the data lifecycle. The new digital infrastructure will allow public health to leverage advanced statistical methods such as machine learning. With modernized digital infrastructure, public health will be better able to monitor the complex, interconnected dimensions of health and to predict and forecast future pandemics and non-infectious threats. This presentation will describe CDC’s data modernization strategy and vision.


Big-data infectious disease estimation: From flu to COVID-19

Speaker: Shihao Yang

Abstract:

For epidemic control and prevention, timely insight into potential hot spots is invaluable. As an alternative to traditional epidemic surveillance, which often lags behind real time by days or even weeks, big data from the Internet provide important information about current epidemic trends. We will present a few big-data approaches for influenza prediction and show how these approaches have been applied to COVID-19 prediction in the current pandemic.

Biography:

Dr. Shihao Yang is an assistant professor in the School of Industrial & Systems Engineering at Georgia Tech. Prior to joining Georgia Tech, he was a postdoc in Biomedical Informatics at Harvard Medical School after finishing his PhD in statistics at Harvard University. Dr. Yang's research focuses on data science for healthcare, with special interests in big-data infectious disease prediction and electronic health records.


Counterfactual Analysis of Cross-Sectional Data Using Quantile Process Regression

Speaker: Yonggang Yao

Abstract:

This work illustrates how you can apply quantile process regression (QPR) to perform counterfactual analysis for cross-sectional data. QPR builds a probability distribution model for a response variable conditional on its associated explanatory covariates by fitting quantile regression models over the entire quantile-level range from 0 to 1. For cross-sectional treatment-control comparison studies, you can use QPR to predict counterfactual distributions of the responses for the treatment-group subjects; these predictions counterfactually assume that the treatment-group subjects had been in the control group. Because QPR estimates the entire response distributions, you can then evaluate treatment effects and treatment-control subject-selection bias by using a variety of statistical measures, such as the mean difference, the median difference, and the Mann-Whitney-Wilcoxon U test. In addition, when a continuous mediation variable is involved, QPR can furthermore predict the distribution of the mediation variable and perform causal mediation analysis. This work exemplifies the QPR counterfactual analysis methods by analyzing the impact of mothers' smoking habits on newborns' body weights.
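
A hedged, generic sketch of the counterfactual step on synthetic data, written in Python rather than the SAS procedures the talk uses: fit quantile regressions on the control group over a grid of quantile levels, then evaluate them at the treated subjects' covariates to approximate their counterfactual response distribution.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=(n, 1))                     # covariate (e.g., maternal age, standardized)
treat = rng.integers(0, 2, size=n)              # 1 = treatment group (e.g., smoking)
y = 3400 + 120 * x[:, 0] - 200 * treat + rng.normal(0, 300, size=n)   # synthetic response

X = sm.add_constant(x)
ctrl = treat == 0
taus = np.linspace(0.05, 0.95, 19)              # grid of quantile levels

# Counterfactual quantiles for treated subjects, had they been in the control group.
cf_quantiles = np.column_stack([
    sm.QuantReg(y[ctrl], X[ctrl]).fit(q=tau).predict(X[~ctrl]) for tau in taus
])
print(cf_quantiles.shape)                       # (number of treated subjects, number of quantile levels)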


High Quantile Regression for Tail Dependent Time Series

Speaker: Ting Zhang

Abstract:

Quantile regression serves as a popular and powerful approach for studying the effect of regressors on quantiles of a response distribution. However, existing results on quantile regression were mainly developed when the quantile level is fixed, and the data are often assumed to be independent. Motivated by recent applications, we consider the situation where (i) the quantile level is not fixed and can grow with the sample size to capture the tail phenomena; and (ii) the data are no longer independent but collected as a time series that can exhibit serial dependence in both tail and non-tail regions. To study the asymptotic theory for high quantile regression estimators in the time series setting, we introduce a previously undescribed tail adversarial stability condition, and show that it leads to an interpretable and convenient framework for obtaining limit theorems for time series that exhibit serial dependence in the tail region but are not necessarily strong mixing. Numerical experiments are provided to illustrate the effect of tail dependence on high quantile regression estimators, where simply ignoring the tail dependence may lead to misleading p-values.
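
In schematic terms (a generic formulation rather than the talk's exact setting), the object of interest is the conditional quantile

Q_{Y_t}(\tau_n \mid \mathbf{x}_t) \;=\; \mathbf{x}_t^{\top}\boldsymbol{\beta}(\tau_n), \qquad \tau_n \to 1, \quad n(1 - \tau_n) \to \infty,

so the quantile level drifts into the tail as the sample size grows while the effective number of tail observations n(1 - \tau_n) still diverges, and \{(\mathbf{x}_t, Y_t)\} is a time series whose serial dependence in the tail is controlled by the tail adversarial stability condition.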




Round Table Discussions




SAS Global Academic Programs: A Corporate Partner to Enable Analytics Education

Lead: Jacqueline Johnson

Abstract:

At SAS Global Academic Programs, our primary mission is to support faculty and students in aligning skill development with industry demand. We draw on labor market data to understand the current demand for analytics and SAS skills and to guide the development of academic resources. During our roundtable, we will discuss the free, comprehensive support we provide to faculty members for the learning and teaching of SAS, including free software access, opportunities for educator training and workshops, curriculum consultations to support integration of SAS into the classroom, and joint academic programs.


Full-time and Internship Opportunities with Wells Fargo

Lead: Tracey Tullie

Abstract:

A representative from Wells Fargo will lead the discussion and provide details on full-time and internship opportunities at Wells Fargo. She will also talk about the skills needed to be a successful quantitative analyst in corporate America.