Thinking Science with Statistics and Computers

Abstract

I’m a scientist whose career has focused on developing statistical and computational methods for genetics and genomics. Here, I share advice and lessons learned from conducting research and from teaching others how to do it. I also report on experiments with my open-source software.

Jun 1, 2026
in bioinformatics
13 min read

The genome annotation handling shootout 🔫

Genome annotation data are a fundamental reflection of our current understanding of a genome. Any software package that claims to provide generic genomic data handling must also handle genome annotations (aka genome features) well. Right? In this post, I put this assertion to the test for biopython, cogent3, and scikit-bio. In a nutshell, only cogent3 acquits itself with some distinction. On large datasets, cogent3 can be orders of magnitude faster and use orders of magnitude less memory than the others. It also requires much less code 🤯. But there is room for improvement across all packages.

May 20, 2026
in bioinformatics
10 min read

The parser shootout 🔫

Bioinformatics data processing remains dominated by plain text formats. In this post, I contrast the performance of the popular biopython, cogent3, and scikit-bio packages for reading three sequence file formats and two genome annotation formats. Despite how simple these tasks might seem, you'll see there's a lot of variation in performance! The takeaway message is that cogent3 is nearly always faster for parsing these basic file formats, while biopython typically uses less RAM.

Mar 1, 2022
in computational science
10 min read

Setting up your computer for computational research

How much impact can a computer setup possibly have? Some choices can make your daily life as a computational scientist a lot happier. Conversely, other choices can make you curse yourself. If it sounds like I'm talking from experience, you're correct: 20+ years' worth.

Feb 25, 2022
in computational science
9 min read

So you're starting a research project, where do you begin?

This post is intended for anyone starting out in research and development with a computational bent, whether in a science (e.g. computational biology) or engineering context. In particular, it is focussed on the research process that leads to new tools and analysis methods.