Interactive Q-learning for Dynamic Treatment Regimes

Saturday, February 16, 2013
Auditorium/Exhibit Hall C (Hynes Convention Center)
Kristin A. Linn, North Carolina State University, Raleigh, NC
Eric B. Laber, North Carolina State University, Raleigh, NC
Forming evidence-based rules for optimal treatment allocation over time is a priority in personalized medicine research. Such rules must be estimated from data collected in observational or randomized studies. Popular methods for estimating optimal sequential decision rules from data, such as Q-learning, are approximate dynamic programming algorithms that require modeling non-smooth transformations of the data. Postulating a simple, well-fitting model for the transformed data can be difficult, and under many simple generative models the most commonly employed working models, namely linear models, are known to be misspecified. We propose an alternative strategy for estimating optimal sequential decision rules wherein all modeling takes place before applying non-smooth transformations of the data. This simple re-ordering of the modeling and transformation steps leads to high-quality estimated sequential decision rules. Additionally, the proposed estimators involve only conditional mean and variance modeling of smooth functionals of the data. Consequently, standard statistical procedures can be used for exploratory analysis, model building, and model validation. Furthermore, under minimal assumptions, the proposed estimators enjoy simple normal limit theory.
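To illustrate why the re-ordering helps (a sketch, not the authors' implementation): in a two-stage setting with a linear second-stage Q-function, the value of the optimal second-stage rule involves the absolute value of a treatment contrast C, which is non-smooth in C. If C is given a normal working model with conditional mean m and standard deviation s, both estimable by ordinary mean and variance regression on smooth quantities, then E|C| has the smooth closed form m(2Φ(m/s) − 1) + 2sφ(m/s). The values m = 0.5 and s = 1.2 below are hypothetical, chosen only for the numerical check:

```python
import math
import numpy as np

def folded_normal_mean(m, s):
    """E|C| for C ~ N(m, s^2): m*(2*Phi(m/s) - 1) + 2*s*phi(m/s).
    Smooth in (m, s) for s > 0, so standard regression modeling of
    (m, s) can happen before the non-smooth |.| is ever applied."""
    z = m / s
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))      # standard normal CDF
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal pdf
    return m * (2.0 * Phi - 1.0) + 2.0 * s * phi

# Monte Carlo check that the closed form matches E|C| under the working model
rng = np.random.default_rng(0)
m, s = 0.5, 1.2                      # hypothetical conditional mean / sd of the contrast
draws = m + s * rng.standard_normal(200_000)
mc = float(np.abs(draws).mean())
analytic = folded_normal_mean(m, s)
print(f"Monte Carlo E|C|: {mc:.3f}, closed form: {analytic:.3f}")
```

By contrast, standard Q-learning would regress a pseudo-outcome containing the max (equivalently, the absolute contrast) directly on stage-one covariates, so the quantity being modeled is itself non-smooth; here the non-smooth step is deferred until after the smooth mean and variance models are fit and checked.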