Monte-Carlo Means More! (That’s a pun).

Friday, July 13, 2018

(I picked a rather unlucky day to start a blog.)

Standardization is a rather popular (dare I say, “standard”) tool to adjust for confounding in causal inference problems seeking to estimate the effect of some exposure, $A$, on some outcome, $Y$. For continuous outcomes, we often seek the mean of the potential outcome $Y^a$. For those not familiar, $Y^a$ denotes the hypothetical outcome that would be observed if treatment were set to $A = a$, the idea being that the outcome could differ depending upon which treatment is received.

In observational data, the set of outcomes among those actually receiving treatment $A = a$ is simply not the same thing as a random sample from the marginal distribution of $Y^a$, as it would be in a well-conducted randomized controlled trial. Its mean can be identified under certain assumptions, however–in part by information contained in observed common causes of $A$ and $Y$ (say, $X$).

For simplicity, I’m going to ignore basically all things related to estimation in this post, and just focus on one somewhat narrow issue that I’ve been thinking about for a while: the tools we typically use to perform standardization can give us much, much more than just a mean. I suspect that readers who subscribe to the Bayesian paradigm/ideology/religion will already have internalized this fact to a degree. They will be familiar with the idea that estimation and inference on a transformation of $\theta$ can be carried out by simply transforming the Monte-Carlo draws of $\theta|\text{Data}$ produced by a Metropolis sampler (or similar), without cycling through the entire estimation procedure again, whereas a frequentist may need to jump through hoops to accomplish the same goals. Therefore, frequentists especially: pay attention!
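To make the Bayesian analogy concrete, here is a minimal sketch in Python/NumPy. It assumes, purely for illustration, that we already have posterior draws of $\theta|\text{Data}$; instead of running a real sampler, it pretends $\theta|\text{Data} \sim N(0.3, 0.1^2)$. Inference on $g(\theta) = \exp(\theta)$ then amounts to transforming each draw and summarizing.

```python
import numpy as np

# Stand-in for posterior draws of theta | Data. In practice these would come from an
# MCMC sampler (e.g., Metropolis); here we simply pretend theta | Data ~ N(0.3, 0.1^2).
rng = np.random.default_rng(1)
theta_draws = rng.normal(0.3, 0.1, size=100_000)

# Inference on g(theta) = exp(theta): transform each draw, then summarize.
# Nothing is refit; the transformed draws approximate the distribution of g(theta) | Data.
g_draws = np.exp(theta_draws)
post_mean = g_draws.mean()
ci_lo, ci_hi = np.quantile(g_draws, [0.025, 0.975])  # 95% equal-tailed credible interval
print(f"posterior mean of exp(theta): {post_mean:.3f}, 95% CI: ({ci_lo:.3f}, {ci_hi:.3f})")
```

The same "simulate once, summarize however you like" logic is exactly what standardization via Monte-Carlo integration offers.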

In its simplest form, the standardization formula looks a little something like this:

$\textbf{E}[Y^a] = \int \textbf{E}[Y \mid X = x, A = a]\, dF_X(x) = \iint y\, f_{Y|X, A}(y \mid x, a)\, f_{X}(x)\, dy\, dx$

Whether this formulation is a defensible realization of the marginal mean of interest will depend upon the usual identifying causal assumptions, not elaborated upon here. But looking at this conceptually, the marginal mean really just looks like a conditional mean, averaged over the thing(s) you don’t want to condition on (note that the rightmost expression is the density-based formulation suitable for continuous variables). Seems conceptually nice enough. But computationally, are you going to have to dig up your college calculus book and re-learn trig substitutions, partial fraction decomposition, or integration by parts to try to analytically ascertain the solution to this potentially pesky and possibly prickly integral?
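For a sense of scale, here is one hypothetical toy case in which the integral happens to be easy. Suppose, purely for illustration, that $X \sim N(\mu_X, \sigma_X^2)$ and $Y \mid X = x, A = a \sim N(\beta_0 + \beta_1 a + \beta_2 x, \sigma^2)$. Then

$\textbf{E}[Y^a] = \int (\beta_0 + \beta_1 a + \beta_2 x)\, dF_X(x) = \beta_0 + \beta_1 a + \beta_2 \mu_X,$

and, under the identifying assumptions, the whole distribution is available in closed form too: $Y^a \sim N(\beta_0 + \beta_1 a + \beta_2 \mu_X,\ \beta_2^2 \sigma_X^2 + \sigma^2)$. Outside of linear-Gaussian toys like this one, closed forms are rarely so forthcoming, but this case makes a handy benchmark for checking a simulation.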

Fortunately, you can leave your calculus textbook on its shelf where it belongs to acquire some more dust. People have already figured out ways to handle this, and one answer lies in the world of Monte-Carlo integration.

This works as follows. Suppose I generate independent random draws $x_1, x_2, \dots, x_n$ from some presumed distribution $F_X(x)$, and then, for each $x_i$ and some fixed treatment value $A = a$, generate a draw $y_i^a$ from some presumed conditional distribution $F_{Y|X, A}(y|x_i, a)$. Then,

$\frac{1}{n}\sum_{i = 1}^{n} y_i^a \approx \textbf{E}[Y^a].$

Indeed, the Law of Large Numbers justifies this approximation rather directly: as $n \to \infty$, $\frac{1}{n} \sum_{i = 1}^{n} y_i^a \longrightarrow_p \textbf{E}[Y^a].$
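A minimal Python/NumPy sketch of this procedure follows. The "presumed" models and all parameter values are hypothetical, chosen only so that the exact answer is known: $X \sim N(1, 2^2)$ and $Y \mid X = x, A = a \sim N(0.5 + 2a + 1.5x, 1)$, which gives $\textbf{E}[Y^1] = 4$.

```python
import numpy as np

# Hypothetical "presumed" models, chosen purely for illustration:
#   X ~ Normal(MU_X, SD_X^2)
#   Y | X = x, A = a ~ Normal(B0 + B1*a + B2*x, SIGMA^2)
MU_X, SD_X = 1.0, 2.0
B0, B1, B2, SIGMA = 0.5, 2.0, 1.5, 1.0


def simulate_potential_outcomes(a, n, rng):
    """Draw n pairs (x_i, y_i^a): x_i from F_X, then y_i^a from F_{Y|X,A}(. | x_i, a)."""
    x = rng.normal(MU_X, SD_X, size=n)              # x_1, ..., x_n
    y_a = rng.normal(B0 + B1 * a + B2 * x, SIGMA)    # each y_i^a drawn given its own x_i
    return x, y_a


rng = np.random.default_rng(2018)
x, y1 = simulate_potential_outcomes(a=1, n=200_000, rng=rng)
mc_mean = y1.mean()                   # (1/n) * sum_i y_i^a
exact_mean = B0 + B1 * 1 + B2 * MU_X  # = 4.0 under these made-up parameters
print(f"Monte-Carlo: {mc_mean:.3f}, exact: {exact_mean:.3f}")
```

With real data, $F_X$ and $F_{Y|X, A}$ would be estimated rather than assumed; the simulation step itself looks the same.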

But an important step happened along the way here. By sequentially simulating random draws from $F_X(x)$ and then from $F_{Y|X, A}(y|x, a)$ (keeping the pairs $(x_i, y_i^a)$ intact), we have simulated from the joint distribution of $(X, Y^a)$. By ignoring $X$ (the conceptual equivalent of marginalizing over it), our random draws $y_1^a, y_2^a, \dots, y_n^a$ form an approximation of the entire marginal distribution of $Y^a$, not just its mean. This, too, follows from the Law of Large Numbers, since for each $c \in \mathbb{R}$,

$\frac{1}{n} \sum_{i = 1}^{n} \textbf{1}(y_i^a \leq c) \longrightarrow_p P(Y^a \leq c) = F_{Y^a}(c).$
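The sketch below checks this numerically, again under hypothetical toy models ($X \sim N(1, 2^2)$, $Y \mid X = x, A = a \sim N(0.5 + 2a + 1.5x, 1)$), for which $Y^1 \sim N(4, 10)$ exactly. The function names `empirical_cdf` and `normal_cdf` are illustrative helpers, not from any library.

```python
import math
import numpy as np

# Hypothetical toy models, for illustration only:
#   X ~ N(1, 2^2);  Y | X = x, A = a ~ N(0.5 + 2a + 1.5x, 1)
rng = np.random.default_rng(13)
n, a = 200_000, 1
x = rng.normal(1.0, 2.0, size=n)
y_a = rng.normal(0.5 + 2.0 * a + 1.5 * x, 1.0)  # simulate the pairs (x_i, y_i^a), then ignore x


def empirical_cdf(draws, c):
    """(1/n) * sum_i 1(y_i^a <= c): the Monte-Carlo estimate of F_{Y^a}(c)."""
    return np.mean(draws <= c)


def normal_cdf(c, mean, var):
    """Exact normal CDF; under these toy models Y^a ~ N(2 + 2a, 1.5^2 * 2^2 + 1^2)."""
    return 0.5 * (1.0 + math.erf((c - mean) / math.sqrt(2.0 * var)))


for c in (0.0, 4.0, 8.0):
    print(c, empirical_cdf(y_a, c), normal_cdf(c, 2.0 + 2.0 * a, 10.0))
```

Nothing about the simulation targeted the CDF specifically; the same draws that gave us the mean also give us $F_{Y^a}$ at every threshold.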

Where am I going with this? If your models are correct and your causal assumptions are met, then your Monte-Carlo integration procedure has ostensibly succeeded in approximating the joint distribution of $(X, Y^a)$, and thus the marginal distribution of $Y^a$. So even though it may have been hard (or impossible, in the world of observational-data-based causal inference) to sample from the marginal distribution of $Y^a$ directly, this procedure gives you a way to do so by simulation.

So, even though we motivated the need for Monte-Carlo integration by asking the statistical deities for a path towards an analytically intractable marginal mean, why not keep in mind, when defining future research objectives and identifying target parameters, that this procedure gives us an approximation to the entire marginal distribution of $Y^a$, and in turn basically anything (within reason) we could want to know about it? Why not target other characteristics, such as:

• Its median?
• Its 97.5th percentile?
• The proportion above some relevant threshold, $c$?
• $P(Y^a > Y_0)$, where $Y_0$ is a random variable following some reference distribution?
• The mean-to-median ratio of the hyperbolic cosine of $Y^a$?
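Each item on that list is a one-liner once the draws exist. Here is a minimal sketch, again assuming hypothetical toy models ($X \sim N(1, 2^2)$, $Y \mid X = x, A = a \sim N(0.5 + 2a + 1.5x, 1)$), a made-up threshold $c = 6$, and, as one illustrative choice of reference, $Y_0$ drawn independently from the marginal distribution of $Y^0$.

```python
import numpy as np

# Hypothetical toy models, for illustration only:
#   X ~ N(1, 2^2);  Y | X = x, A = a ~ N(0.5 + 2a + 1.5x, 1)
rng = np.random.default_rng(7)
n = 200_000


def draw_y(a):
    """Monte-Carlo draws of Y^a: sample x_i, then y_i^a given (x_i, a), then drop x_i."""
    x = rng.normal(1.0, 2.0, size=n)
    return rng.normal(0.5 + 2.0 * a + 1.5 * x, 1.0)


y1 = draw_y(1)      # draws approximating the marginal distribution of Y^1
y_ref = draw_y(0)   # assumed reference Y_0: independent draws of Y^0 (a hypothetical choice)
c = 6.0             # hypothetical threshold

summaries = {
    "median": np.median(y1),
    "97.5th percentile": np.quantile(y1, 0.975),
    "P(Y^a > c)": np.mean(y1 > c),
    "P(Y^a > Y_0)": np.mean(y1 > y_ref),  # compares independent draws of Y^1 and Y_0
    # cosh(Y^a) is very heavy-tailed here, so this Monte-Carlo mean is noisy
    "mean/median of cosh(Y^a)": np.mean(np.cosh(y1)) / np.median(np.cosh(y1)),
}
for name, value in summaries.items():
    print(f"{name}: {value:.4f}")
```

Note that the last (admittedly silly) quantity converges slowly because of the heavy right tail of $\cosh(Y^a)$, a reminder that "we can compute it" and "we can compute it precisely" are different claims.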

Okay, I got slightly carried away there, as one does from time to time. And of course, the choice of a mean is often well-motivated, so whether you’re interested in targeting a mean or some other parameter will depend upon the nature of what you’re doing…don’t forget to be smart about that! But I hope my point is clear: Though we often use standardization–and its longitudinal cousin, g-computation–to target means, the way we go about crafting a solution to the (sometimes) multi-dimensional and (almost assuredly) nausea-inducing integral opens up the door to much more than means. So if you want some parameter from a marginal distribution that is hard to sample from–whether for reasons related to causal inference or otherwise–keep Monte-Carlo integration somewhere in your toolbox.

To boot(strap), had you actually pulled out your calculus textbook and identified the correct permutation of integration techniques that would serve as the proverbial key to unlock the analytic solution to that integral you wanted, you’d have gotten your marginal mean and called it a day–namely, you’d never have taken that intermediate step of approximating an entire marginal distribution and you would have lost your shot at getting all those other things you may have wanted to know about $Y^a$. How exciting! So in short, Monte-Carlo means more! (That’s a pun).