Types of propensity score matching

Summarized from a nice review paper by Basu, Unuigbe and Masseria (2023).

Stratifying by Quintiles of propensity scores (PS). Under this approach, the empirical distribution of the estimated PS across the entire sample (including treated and untreated participants) is divided into quintiles. Indicator variables for the first 4 quintiles are then used as covariates, along with the treatment indicator and the interactions between them, in an ordinary least squares regression.

Inverse probability weighting (IPW). In this method, the difference in the weighted average of the outcomes between the treated and untreated groups gives a consistent estimate of one of several mean treatment effect parameters, depending on how the weights are constructed and to which observations they are applied. The weights are built from the inverse of the propensity scores: for the average treatment effect, for example, treated observations are weighted by 1/PS and untreated observations by 1/(1 - PS).

Propensity score matching: Nearest Neighbor. Matching estimators are also nonparametric, but unlike IPW, they are less sensitive to the parametric specification of the PS. Under the nearest neighbor approach, treated individuals are matched with one or more individuals in the untreated group based on the propensity score. Matching is usually carried out with replacement (i.e., the same observation can be used for matching multiple times) and can be 1:1 (especially for nearest neighbor estimators) or 1:m (for radius estimators). With caliper matching, the difference in PS between matched observations must be below some specified, fixed threshold (the caliper).

Propensity score matching: Kernel-based or local linear. Under kernel-based matching, a 1:m matching based on a specified bandwidth is performed, as in nearest neighbor matching. Unlike nearest neighbor matching, however, not all matched observations get the same weight when averaging across them. These approaches in essence smooth the comparison (i.e., untreated) group by weighting its members according to the closeness of the match within a certain bandwidth. Typically, a bandwidth of 0.06 for the kernel-based matching estimator and a central band of N*0.25 for the local-linear regression-based matching estimator are used. The bandwidth controls the amount by which the data are smoothed. Large values of bandwidth will lead to large amounts of smoothing, resulting in low variance but high bias. Small values of bandwidth will lead to less smoothing, resulting in high variance but low bias.

Doubly Robust Estimators. I have described these here. Doubly robust estimators use both IPW and adjustment by regression modeling to estimate the average treatment effect (ATE). Augmenting the PS weighting with a regression model increases estimator efficiency. Further, doubly robust estimators remain consistent when one of the two models (the PS model or the outcome regression) is misspecified, provided the other is correctly specified.
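
To make the estimators above a bit more concrete, here is a minimal Python sketch on simulated data. It is my own illustration, not code from the paper. Everything in it is an assumption made for the example: the data-generating process (a single confounder x and a true treatment effect of 2.0), the hand-rolled logistic regression for the PS, the caliper of 0.05, the Epanechnikov kernel, and the linear outcome model used in the doubly robust step. Only the kernel bandwidth of 0.06 comes from the text. Because the simulated effect is constant, the ATE and the ATT are both 2.0 here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data: x confounds treatment t and outcome y.
n = 4000
x = rng.normal(size=n)
t = rng.binomial(1, 1 / (1 + np.exp(-0.8 * x)))
y = 1.0 + 2.0 * t + 1.5 * x + rng.normal(size=n)  # true effect = 2.0

def fit_logistic(X, t, n_iter=25):
    """Logistic regression coefficients via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (t - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta

# Estimated propensity scores.
X = np.column_stack([np.ones(n), x])
ps = 1 / (1 + np.exp(-X @ fit_logistic(X, t)))

# Unadjusted difference in means (confounded, for comparison).
naive = y[t == 1].mean() - y[t == 0].mean()

# IPW estimate of the ATE with normalized weights 1/PS and 1/(1 - PS).
w1 = t / ps
w0 = (1 - t) / (1 - ps)
ate_ipw = np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

# 1:1 nearest neighbor matching on the PS, with replacement and a caliper (ATT).
caliper = 0.05
treated = np.flatnonzero(t == 1)
control = np.flatnonzero(t == 0)
order = np.argsort(ps[control])
ps_c = ps[control][order]
pos = np.searchsorted(ps_c, ps[treated])
left = np.clip(pos - 1, 0, len(ps_c) - 1)
right = np.clip(pos, 0, len(ps_c) - 1)
use_left = np.abs(ps[treated] - ps_c[left]) <= np.abs(ps[treated] - ps_c[right])
nearest = np.where(use_left, left, right)
keep = np.abs(ps[treated] - ps_c[nearest]) <= caliper  # drop poor matches
matched = control[order][nearest[keep]]
att_nn = np.mean(y[treated[keep]] - y[matched])

# Kernel matching (ATT): every control within the bandwidth contributes,
# weighted by closeness in PS (Epanechnikov kernel, bandwidth 0.06).
h = 0.06
u = (ps[control][None, :] - ps[treated][:, None]) / h
k = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)
has_support = k.sum(axis=1) > 0  # treated units with at least one control nearby
y0_hat = (k[has_support] @ y[control]) / k[has_support].sum(axis=1)
att_kernel = np.mean(y[treated][has_support] - y0_hat)

# Doubly robust (augmented IPW) estimate of the ATE: outcome regressions
# fitted separately by arm, then corrected with inverse-PS-weighted residuals.
def ols_predict(X_fit, y_fit, X_new):
    coef, *_ = np.linalg.lstsq(X_fit, y_fit, rcond=None)
    return X_new @ coef

m1 = ols_predict(X[t == 1], y[t == 1], X)
m0 = ols_predict(X[t == 0], y[t == 0], X)
ate_aipw = np.mean(m1 - m0 + t * (y - m1) / ps - (1 - t) * (y - m0) / (1 - ps))

for name, est in [("naive", naive), ("IPW", ate_ipw), ("NN match", att_nn),
                  ("kernel match", att_kernel), ("AIPW", ate_aipw)]:
    print(f"{name:>12}: {est:.3f}")
```

The naive difference in means should land well above 2.0 because treated units tend to have larger x, while the adjusted estimates should sit close to 2.0. Changing h is a quick way to see the bias-variance trade-off described above. In real work you would use an established package and proper standard errors rather than this sketch.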

You can read the whole article here including a more detailed supplemental appendix here.