Which econometric method should you use for causal inference of health policy?

TL;DR

A paper by Ress and Wild (2024) provides the following recommendations for answering this question:

- When aiming to control for a large covariate set, consider using the superlearner to estimate nuisance parameters.
- When employing the superlearner to estimate nuisance parameters, consider using doubly robust estimation approaches, such as AIPW and TMLE.
- When faced with a small covariate set, consider using regression to estimate nuisance parameters.
- When employing regression to estimate nuisance parameters, consider using singly robust estimation approaches, such as propensity score matching or IPW.

How did they arrive at these recommendations? To find out, read on.

Study methodology: a plasmode simulation

To answer the question “Which econometric method should you use for causal inference of health policy?”, one has to make a number of research decisions. 

First, one must decide whether to simulate the effect of a policy intervention entirely or to incorporate real-world data into the simulation. The advantage of a fully simulated approach is that we know the truth and can create any data-generating scenario we want: because we (the researchers) construct the data-generating process ourselves, we have a gold standard to compare against and can test various data-generating processes. The problem with this approach is its hypothetical nature. Specifically, Ress and Wild (2024) write:

Many simulation studies…are characterized by relatively simple confounding structures with few variables, leading to varying results depending on the data structure modeled and the methods under consideration…Because the optimal choice for an estimation strategy depends on the research question, data features, population characteristics and method assumptions, simulation results are only applicable to the specific simulation setting.

Instead, the authors opt for a plasmode simulation. What is a plasmode simulation?

In a plasmode simulation, the covariates from a real dataset are used without alteration, while the values for the outcome variables are simulated based on the estimated associations between covariates and outcomes from the original data, ensuring that the true effect size is known. The advantage of this approach is that it preserves the high‐dimensional and complex covariate structure of the source data, providing a simulation environment that closely resembles real‐world conditions.

In short, while the underlying covariates are not changed, researchers can test the robustness of different estimation methods through controlled modifications to the real dataset, such as artificially inserting or removing certain relationships, introducing or removing biases, adding noise, or altering specific variables. This allows for the controlled examination of how statistical methods perform under different known conditions.

A second research decision is to determine which estimation methods should be evaluated. Ress and Wild (2024) consider the following approaches:

- Propensity score matching: this method estimates each unit's probability of treatment assignment given the observed covariates, then matches treated and untreated units with similar propensity scores, thereby reducing selection bias due to observed confounders in observational studies.
- Inverse probability of treatment weighting (IPTW): IPTW weights each individual by the inverse of the probability of receiving the treatment they actually received. This creates a pseudo-population in which treatment assignment is independent of the observed covariates, which facilitates causal inference.
- Entropy balancing: this technique, developed by Hainmueller (2012), reweights the sample so that the weighted covariate moments (e.g., means) of the control group match those of the treated group. Among all weights satisfying these balance constraints, it chooses those that stay closest to uniform weights (in the sense of minimum entropy divergence), so that the covariate distributions are similar across groups.
- Difference-in-differences (DiD): DiD is a quasi-experimental design that compares changes in outcomes over time between a treatment group and a control group. Under the parallel-trends assumption, it estimates causal effects while controlling for unobserved confounding factors that are constant over time.
- Augmented inverse probability weighting (AIPW): AIPW combines IPTW with regression adjustment, incorporating both a propensity score model and an outcome model to improve efficiency and reduce bias. Specifically, AIPW is doubly robust: it produces consistent (asymptotically unbiased) estimates whenever either the propensity score model or the outcome regression is correctly specified.
- Targeted maximum likelihood estimation (TMLE): TMLE is a semi-parametric, doubly robust method. It starts from an initial (often machine-learning-based) estimate of the outcome model and then applies a targeting step, using the propensity score, that updates this estimate specifically for the parameter of interest (e.g., the average treatment effect). This makes it well suited to complex settings.
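
To make the difference between singly and doubly robust estimation concrete, here is a minimal Python sketch of IPW (IPTW) and AIPW for the average treatment effect. It is not the authors' code: the data are hypothetical, with one binary confounder x and a true effect of 1.0 by construction. The propensity model is correct (a saturated, within-stratum share of treated units), while the outcome model deliberately ignores x, so AIPW should still recover the truth even though the outcome model alone does not.

```python
import random
import statistics


def simulate(n, seed=1):
    """Hypothetical data with one binary confounder x; the true ATE is 1.0 by construction."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        t = 1 if rng.random() < 0.2 + 0.6 * x else 0  # P(T=1) is 0.2 if x=0, 0.8 if x=1
        y = 1.0 * t + 2.0 * x + rng.gauss(0.0, 1.0)   # x also raises the outcome
        rows.append((x, t, y))
    return rows


def fit_propensity(rows):
    """Saturated propensity model: share of treated units within each stratum of x."""
    return {x: statistics.fmean(t for xi, t, _ in rows if xi == x) for x in (0, 1)}


def fit_outcome_ignoring_x(rows):
    """Deliberately misspecified outcome model: E[Y | T=t], ignoring the confounder x."""
    means = {t: statistics.fmean(y for _, ti, y in rows if ti == t) for t in (0, 1)}
    return lambda x, t: means[t]


def regression_ate(rows, m):
    """Plug-in estimate from the outcome model alone: average of m(x, 1) - m(x, 0)."""
    return statistics.fmean(m(x, 1) - m(x, 0) for x, _, _ in rows)


def ipw_ate(rows, e):
    """Horvitz-Thompson style IPW estimate of the ATE."""
    return statistics.fmean(t * y / e[x] - (1 - t) * y / (1 - e[x]) for x, t, y in rows)


def aipw_ate(rows, e, m):
    """AIPW: outcome-model prediction plus an inverse-probability-weighted residual correction."""
    return statistics.fmean(
        m(x, 1) - m(x, 0)
        + t * (y - m(x, 1)) / e[x]
        - (1 - t) * (y - m(x, 0)) / (1 - e[x])
        for x, t, y in rows
    )


rows = simulate(200_000)
e_hat = fit_propensity(rows)
m_wrong = fit_outcome_ignoring_x(rows)
print(f"regression only: {regression_ate(rows, m_wrong):.2f}")
print(f"IPW:             {ipw_ate(rows, e_hat):.2f}")
print(f"AIPW:            {aipw_ate(rows, e_hat, m_wrong):.2f}")
```

With a sample this large, the regression-only estimate should land near 2.2 (the confounded difference in means), while IPW and AIPW should both land near the true effect of 1.0. Swapping in a correct outcome model and a wrong propensity model would show the other half of double robustness.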

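Third, one must decide how to estimate the nuisance parameters, i.e., quantities that are not of direct interest but must be estimated along the way. Here, the key nuisance parameters are the propensity score model and the outcome model. The authors compare conventional regression with the superlearner, an ensemble machine learning approach, implemented in the SuperLearner R package: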

…we used the superlearner algorithm implemented in the SuperLearner [R] package (Polley et al., 2021), which allowed us to incorporate non‐parametric approaches. We included the following five algorithms as baselearners: generalized linear model with penalized maximum likelihood (glmnet function) (Friedman et al., 2010), random forest (ranger function) (Wright & Ziegler, 2017), gradient boosting (xgboost function) (Chen et al., 2015), support vector machines (svm function) (…Karatzoglou et al., 2006), and multivariate adaptive regression splines (earth function) (Friedman, 1991).
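
The core idea of the superlearner, cross-validated stacking of base learners, can be illustrated with a deliberately small Python sketch. It assumes a single covariate and two hypothetical base learners (a straight-line fit and a binned mean) in place of the five R learners quoted above, and it picks a convex weight by minimizing cross-validated mean squared error. The SuperLearner package itself combines all learners with non-negative weights (by default via non-negative least squares), so treat this as an illustration of the principle, not the package's algorithm.

```python
import random


def fit_linear(xs, ys):
    """Base learner 1 (hypothetical): ordinary least squares line y = a + b * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_binned(xs, ys, n_bins=10, lo=-1.0, hi=1.0):
    """Base learner 2 (hypothetical): mean of y within equal-width bins of x on [lo, hi]."""
    width = (hi - lo) / n_bins

    def bin_of(x):
        return min(n_bins - 1, max(0, int((x - lo) / width)))

    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        sums[bin_of(x)] += y
        counts[bin_of(x)] += 1
    overall = sum(ys) / len(ys)  # fallback for empty bins
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[bin_of(x)]


def cv_predictions(xs, ys, learner, v=5):
    """Out-of-fold predictions: each point is predicted by a fit that never saw it."""
    preds = [0.0] * len(xs)
    for fold in range(v):
        train = [i for i in range(len(xs)) if i % v != fold]
        model = learner([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), v):
            preds[i] = model(xs[i])
    return preds


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def fit_super_learner(xs, ys):
    """Two-learner super learner: convex weight with the lowest cross-validated MSE."""
    p1 = cv_predictions(xs, ys, fit_linear)
    p2 = cv_predictions(xs, ys, fit_binned)
    # Minimize sum((y - p2 - alpha * (p1 - p2))**2) over alpha in [0, 1]: closed form, then clip.
    d = [a - b for a, b in zip(p1, p2)]
    denom = sum(di * di for di in d)
    alpha = sum((y - b) * di for y, b, di in zip(ys, p2, d)) / denom if denom else 0.5
    alpha = min(1.0, max(0.0, alpha))
    cv_risk = {
        "linear": mse(ys, p1),
        "binned": mse(ys, p2),
        "ensemble": mse(ys, [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]),
    }
    # Final step: refit every base learner on all data and combine with the chosen weight.
    lin, binned = fit_linear(xs, ys), fit_binned(xs, ys)
    return (lambda x: alpha * lin(x) + (1 - alpha) * binned(x)), alpha, cv_risk


rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(2000)]
ys = [x * x + rng.gauss(0, 0.1) for x in xs]  # nonlinear truth a straight line cannot capture
predict, alpha, cv_risk = fit_super_learner(xs, ys)
print(round(alpha, 2), {k: round(v, 4) for k, v in cv_risk.items()})
```

Because the weight is chosen from the interval [0, 1], which includes each learner on its own, the ensemble's cross-validated risk can never exceed that of the better single learner; with a nonlinear truth, the weight should shift toward the flexible binned learner.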

Fourth, one must choose a specific intervention to evaluate and decide how to simulate the data. The intervention the authors considered was a German initiative aiming to improve health care in a socially deprived urban area. Specifically, the intervention included (i) a cross-sectoral network of health, social and community care providers and (ii) a community health advice and navigation service (for more details, see Ress and Wild, 2023). To simulate the plasmode data for this intervention, Ress and Wild (2024) use the following procedure:

1. Estimate the association between treatment, outcome and covariates.
2. Use the estimated coefficients to predict the outcomes, but modify the treatment coefficient to the desired effect size.
3. Draw J subsets of size s by resampling with replacement, and perform steps 4 and 5 for each of those subsets.
4. Introduce noise by sampling the outcomes from suitable distributions, using the simulated values from step 3 as expected values.
5. Analyze the simulated data.
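
The five-step procedure can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the "real" data are a hypothetical stand-in produced by make_source_data, the outcome model in step 1 is assumed to be a linear regression, the "suitable distribution" in step 4 is assumed to be normal with the residual standard deviation from step 1, and the analysis in step 5 is a plain regression rather than the estimators compared by Ress and Wild (2024).

```python
import random
import statistics


def ols(X, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for row in range(col + 1, k):
            f = A[row][col] / A[col][col]
            for c in range(col, k):
                A[row][c] -= f * A[col][c]
            rhs[row] -= f * rhs[col]
    beta = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        beta[i] = (rhs[i] - sum(A[i][j] * beta[j] for j in range(i + 1, k))) / A[i][i]
    return beta


def design(rows):
    """Design matrix [1, t, covariates...] for rows of (covariates, t, y)."""
    return [[1.0, float(t), *cov] for cov, t, _ in rows]


def plasmode(rows, true_effect, J, s, seed=0):
    """Yield J plasmode datasets built from the source rows (steps 1 to 4)."""
    rng = random.Random(seed)
    X, y = design(rows), [yi for _, _, yi in rows]
    beta = ols(X, y)                                        # step 1: fit the outcome model
    fitted = [sum(b * v for b, v in zip(beta, xr)) for xr in X]
    sigma = statistics.pstdev([yi - f for yi, f in zip(y, fitted)])
    beta_sim = [beta[0], true_effect, *beta[2:]]            # step 2: impose the chosen effect
    mu = [sum(b * v for b, v in zip(beta_sim, xr)) for xr in X]
    for _ in range(J):
        idx = [rng.randrange(len(rows)) for _ in range(s)]  # step 3: resample with replacement
        # Step 4: draw noisy outcomes around mu; covariates and treatment are kept as observed.
        yield [(rows[i][0], rows[i][1], rng.gauss(mu[i], sigma)) for i in idx]


def make_source_data(n, seed=1):
    """Hypothetical stand-in for a real dataset: two covariates and a confounded treatment."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.uniform(0, 1)
        x2 = 1.0 if rng.random() < 0.4 else 0.0
        t = 1 if rng.random() < 0.2 + 0.3 * x2 + 0.3 * x1 else 0
        y = 1.0 + 0.5 * t + 2.0 * x1 + 1.5 * x2 + rng.gauss(0, 1)
        rows.append(((x1, x2), t, y))
    return rows


source = make_source_data(2000)
# Step 5: analyze each simulated dataset; here, the treatment coefficient of an OLS fit.
estimates = [ols(design(d), [yi for _, _, yi in d])[1]
             for d in plasmode(source, true_effect=2.0, J=100, s=500)]
print(round(statistics.fmean(estimates), 3))
```

Note that the covariates and treatment assignment are never altered; only the outcomes are regenerated, with the effect size fixed at 2.0 instead of the 0.5 present in the stand-in data. Because the analysis model here matches the simulated outcome model, the average estimate should sit close to 2.0; the interesting comparisons arise when step 5 uses estimators that may not match the data-generating process.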

Fifth, one must choose the performance metrics used to compare the estimators. Ress and Wild (2024) considered:

- Bias: the mean difference between the estimated and the true treatment effect. Because the plasmode fixes the true treatment effect, bias can be calculated directly.
- Empirical standard error (SE): the dispersion of the estimated effects around their mean. In other words, it measures the precision of the estimator.
- Confidence interval coverage: the proportion of confidence intervals (CIs) that contain the true effect. Suppose we use 95% CIs. If only 80% of the CIs contained the true effect, the CIs would be considered too narrow; conversely, if 99% of the CIs contained the true effect, they would be considered too wide.
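
These three metrics are straightforward to compute once each simulated dataset yields an estimate and a standard error. The Python sketch below is illustrative only: the replications are hypothetical draws rather than results from the paper, and the intervals are assumed to be normal-approximation 95% CIs of the form estimate ± 1.96 × SE.

```python
import random
import statistics


def performance(estimates, std_errors, true_effect, z=1.96):
    """Bias, empirical SE and CI coverage for intervals est +/- z * se."""
    bias = statistics.fmean(estimates) - true_effect
    empirical_se = statistics.stdev(estimates)  # spread of the estimates around their mean
    covered = [abs(est - true_effect) <= z * se for est, se in zip(estimates, std_errors)]
    return {"bias": bias, "empirical_se": empirical_se, "coverage": sum(covered) / len(covered)}


# Hypothetical replications: estimates scatter around the truth (2.0) with SD 0.1.
rng = random.Random(42)
estimates = [rng.gauss(2.0, 0.1) for _ in range(20_000)]
honest = performance(estimates, [0.1] * len(estimates), 2.0)          # reported SE matches spread
overconfident = performance(estimates, [0.05] * len(estimates), 2.0)  # reported SE too small
print(honest["coverage"], overconfident["coverage"])
```

The first call should show coverage close to 0.95, while halving the reported SE should drop coverage to roughly two thirds: the "too narrow" case, even though bias and empirical SE are unchanged. This is how an estimator can look good on bias and SE yet fall short on coverage.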

Based on this approach, the authors find that there is no clear winner:

We found that TMLE combined with the superlearner performed best in terms of bias and SE, but exhibited shortcomings in terms of CI coverage. When considering all performance measures and outcomes, the combination of matching and subsequent DiD analysis in conjunction with regression for nuisance parameter estimation performed best.

What are the takeaways from this research? The authors nicely lay this out at the end of their article:

- When aiming to control for a large covariate set, consider using the superlearner to estimate nuisance parameters.
- When employing the superlearner to estimate nuisance parameters, consider using doubly robust estimation approaches, such as AIPW and TMLE.
- When faced with a small covariate set, consider using regression to estimate nuisance parameters.
- When employing regression to estimate nuisance parameters, consider using singly robust estimation approaches, such as propensity score matching or IPW.

You can read the full article here. What do you think of the use of plasmode simulations?