Examining the Sample Complexity of the Maximum Likelihood Estimator in Phylogenetic Tree Reconstruction

Saturday, February 16, 2013
Auditorium/Exhibit Hall C (Hynes Convention Center)
Sangwon Hyun , University of Michigan - Ann Arbor, Ann Arbor, MI
Jose Alcala Burgos , University of Michigan - Ann Arbor, Ann Arbor, MI
It has been conjectured that the Sample Complexity (SC) of the maximum likelihood estimator (MLE) in Phylogenetic tree reconstruction decays quadratically as a function of the shortest branch length and grows exponentially as a function of the tree depth. It is difficult to verify this conjecture directly because calculating the MLE in this case is NP-hard (Non-Deterministic Polynomial-Time hard). Using the theory of large deviations it is possible to give an analytical expression for the exponential rate of decay of the probability of failed reconstruction with the MLE, which in turn gives the sample complexity. This has been done in a paper by Withers and Nadarajah and it is an easy extension of classical result of Cherno ff. The analytical expression for the exponential rate of decay is given in terms of the Cherno ff distance between the Phylogenetic trees. We perform a Monte Carlo simulation to calculate the Cherno ff distances and resolve the dependence on the shortest branch length and the tree depth numerically.