References

[1] Pravech Ajawatanawong and Sandra L. Baldauf. Evolution of protein indels in plants, animals and fungi. BMC Evolutionary Biology, 13:140, 2013. doi: 10.1186/1471-2148-13-140.

[2] D. J. Aldous. Exchangeability and related topics. In École d’Été de Probabilités de Saint-Flour XIII, pages 1–198. Springer, 1985.

[3] C. Armero and M. J. Bayarri. Prior assessments for prediction in queues. The Statistician, 43(1):139–153, 1994.

[4] M. J. Beal, Z. Ghahramani, and C. E. Rasmussen. The infinite hidden Markov model. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 577–584, Cambridge, MA, USA, 2002. MIT Press.

[5] D. M. Blei and M. I. Jordan. Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1:121–143, 2006.

[6] Robert K. Bradley, Adam Roberts, Michael Smoot, Sudeep Juvekar, Jaeyoung Do, Colin Dewey, Ian Holmes, and Lior Pachter. Fast statistical alignment. PLoS Computational Biology, 5(5):e1000392, 2009. doi: 10.1371/journal.pcbi.1000392.

[7] Reed A. Cartwright. Problems and solutions for estimating indel rates and length distributions. Molecular Biology and Evolution, 26(2):473–480, 2009. doi: 10.1093/molbev/msn275.

[8] I. Cohn, T. El-Hay, N. Friedman, and R. Kupferman. Mean field variational approximation for continuous-time Bayesian networks. Journal of Machine Learning Research, 11:2745–2783, 2010.

[9] P. L. Conti. Bayesian inference for linear growth birth and death processes. Journal of Statistical Planning and Inference, 120(1–2):65–84, 2003.

[10] Don Coppersmith and Persi Diaconis. Random walk with reinforcement. Unpublished manuscript, 1986.

[11] N. De Maio. The cumulative indel model: Fast and accurate statistical evolutionary alignment. Systematic Biology, 2020.

[12] Persi Diaconis and Silke W. W. Rolles. Bayesian analysis for reversible Markov chains. The Annals of Statistics, 34(3):1270–1292, 2006. doi: 10.1214/009053606000000290.

[13] M. Ekeberg, C. Lövkvist, Y. Lan, M. Weigt, and E. Aurell. Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models. Physical Review E, 87:012707, 2013.

[14] M. D. Escobar and M. West. Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association, 90:577–588, 1995.

[15] W. J. Ewens. The sampling theory of selectively neutral alleles. Theoretical Population Biology, 3:87–112, 1972.

[16] J. Felsenstein. Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution, 17:368–376, 1981.

[17] T. L. Griffiths and Z. Ghahramani. The Indian Buffet Process: an introduction and review. Journal of Machine Learning Research, 12:1185–1224, 2011.

[18] J. Hein. An algorithm for statistical alignment of sequences related by a binary tree. Pacific Symposium on Biocomputing, pages 179–190, 2000.

[19] A. Hobolth and J. L. Jensen. Statistical inference in evolutionary models of DNA sequences via the EM algorithm. Statistical Applications in Genetics and Molecular Biology, 4:Article 18, 2005.

[20] Asger Hobolth and Jens Ledet Jensen. Summary statistics for endpoint-conditioned continuous-time Markov chains. Journal of Applied Probability, 48(4):911–924, 2011. doi: 10.1239/jap/1324046009.

[21] M. D. Hoffman, D. M. Blei, C. Wang, and J. Paisley. Stochastic variational inference. Journal of Machine Learning Research, 14:1303–1347, 2013.

[22] I. Holmes. A probabilistic model for the evolution of RNA structure. BMC Bioinformatics, 5:166, 2004.

[23] I. Holmes. A model of indel evolution by finite-state, continuous-time machines. Genetics, 216:1187–1204, 2020.

[24] I. Holmes and W. J. Bruno. Evolutionary HMMs: a Bayesian approach to multiple alignment. Bioinformatics, 17:803–820, 2001.

[25] I. Holmes and G. M. Rubin. An Expectation Maximization algorithm for training hidden substitution models. Journal of Molecular Biology, 317:753–764, 2002.

[26] H. Ishwaran and L. F. James. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96:161–173, 2001.

[27] S. Jain and R. M. Neal. A split–merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model. Journal of Computational and Graphical Statistics, 13:158–182, 2004.

[28] Vladimir Jojic, Nebojsa Jojic, Chris Meek, Dan Geiger, Adam Siepel, David Haussler, and David Heckerman. Efficient approximations for learning phylogenetic HMM models from data. In Bioinformatics, volume 20, pages i161–i168, 2004. doi: 10.1093/bioinformatics/bth917.

[29] D. G. Kendall. On the generalized birth-and-death process. Annals of Mathematical Statistics, 19:1–15, 1948.

[30] A. Large and I. Holmes. Nested birth–death processes are competitive with parameter-heavy neural networks as time-dependent models of protein evolution. bioRxiv, 2026. doi: 10.1101/2026.02.02.702952.

[31] N. Lartillot and H. Philippe. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Molecular Biology and Evolution, 21:1095–1109, 2004.

[32] S. Q. Le and O. Gascuel. An improved general amino acid replacement matrix. Molecular Biology and Evolution, 25:1307–1320, 2008.

[33] C. Lee, C. Grasso, and M. F. Sharlow. Multiple sequence alignment using partial order graphs. Bioinformatics, 18:452–464, 2002.

[34] D. Linzner and H. Koeppl. Cluster variational approximations for structure learning of continuous-time Bayesian networks from incomplete data. In Advances in Neural Information Processing Systems 31, pages 7880–7890, 2018.

[35] A. Löytynoja and N. Goldman. An algorithm for progressive multiple alignment of sequences with insertions. Proceedings of the National Academy of Sciences of the USA, 102(30):10557–10562, 2005.

[36] G. A. Lunter, I. Miklós, Y. S. Song, and J. Hein. An efficient algorithm for statistical multiple alignment on arbitrary phylogenetic trees. Journal of Computational Biology, 10:869–889, 2003.

[37] Xiao-Li Meng and Donald B. Rubin. Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika, 80(2):267–278, 1993.

[38] Eric P. Nawrocki and Sean R. Eddy. Infernal 1.0: inference of RNA alignments. Bioinformatics, 25(10):1335–1337, 2009.

[39] J. Pitman and M. Yor. The two-parameter Poisson–Dirichlet distribution derived from a stable subordinator. Annals of Probability, 25:855–900, 1997.

[40] S. Prillo, Y. Deng, P. Boyeau, X. Li, P.-Y. Chen, and Y. S. Song. CherryML: scalable maximum likelihood estimation of phylogenetic models. Nature Methods, 20:1232–1240, 2023.

[41] V. Rao and Y. W. Teh. Fast MCMC sampling for Markov jump processes and extensions. Journal of Machine Learning Research, 14:3295–3320, 2013.

[42] B. D. Redelings and M. A. Suchard. Joint Bayesian estimation of alignment and phylogeny. Systematic Biology, 54:401–418, 2005.

[43] Silke W. W. Rolles. How edge-reinforced random walk arises naturally. Probability Theory and Related Fields, 126(2):243–260, 2003.

[44] D. Sankoff. Simultaneous solution of the RNA folding, alignment, and protosequence problems. SIAM Journal on Applied Mathematics, 45:810–825, 1985.

[45] A. Stolcke. An efficient probabilistic context-free parsing algorithm that computes prefix probabilities. Computational Linguistics, 21(2):165–201, 1995.

[46] E. Susko, L. Lincker, and A. J. Roger. Accelerated estimation of frequency classes in site-heterogeneous profile mixture models. Molecular Biology and Evolution, 35(5):1266–1283, 2018.

[47] Paula Tataru and Asger Hobolth. Comparison of methods for calculating conditional expectations of sufficient statistics for continuous time Markov chains. BMC Bioinformatics, 12:465, 2011. doi: 10.1186/1471-2105-12-465.

[48] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101:1566–1581, 2006.

[49] The UniProt Consortium. UniProtKB/Swiss-Prot release statistics, 2026. URL https://web.expasy.org/docs/relnotes/relstat.html.

[50] J. L. Thorne, H. Kishino, and J. Felsenstein. An evolutionary model for maximum likelihood alignment of DNA sequences. Journal of Molecular Evolution, 33:114–124, 1991.

[51] J. L. Thorne, H. Kishino, and J. Felsenstein. Inching toward reality: an improved likelihood model of sequence evolution. Journal of Molecular Evolution, 34:3–16, 1992.

[52] Oscar Westesson, Gerton Lunter, Benedict Paten, and Ian Holmes. Accurate reconstruction of insertion-deletion histories by statistical phylogenetics. PLoS ONE, 7(4):e34572, 2012. doi: 10.1371/journal.pone.0034572.

[53] Lin Xu, Hong Chen, Xiaohua Hu, Rongmei Zhang, Ze Zhang, and Z. W. Luo. Average gene length is highly conserved in prokaryotes and eukaryotes and diverges only between the two kingdoms. Molecular Biology and Evolution, 23(6):1107–1108, 2006. doi: 10.1093/molbev/msk019.

[54] Z. Yang. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. Journal of Molecular Evolution, 39:306–314, 1994.