Thinking Science with Statistics and Computers

Abstract

I’m a scientist whose career has focused on developing statistical and computational methods for genetics and genomics. Here, I share advice and lessons learned from conducting research and from teaching others how to do it. I also provide reports on experiments related to my open-source software.

The parser shootout 🔫

Bioinformatics data processing remains dominated by plain text formats. In this post, I contrast the performance of the popular biopython, cogent3, and scikit-bio packages for reading three sequence file formats and two genome annotation formats. Despite how simple these tasks might seem, you'll see there's a lot of variation in performance! The takeaway message is that cogent3 is nearly always faster for parsing these basic file formats, while biopython typically uses less RAM.
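To make concrete what "performance" covers in a comparison like this, here is a minimal sketch of measuring wall-clock time and peak memory for a single parse. It uses only the Python standard library. The `parse_fasta` and `benchmark` functions are hypothetical stand-ins written for illustration; they are not the parsers from biopython, cogent3, or scikit-bio, and this is not necessarily how the measurements in this post were taken.

```python
import io
import time
import tracemalloc


def parse_fasta(handle):
    """Yield (label, sequence) pairs from a FASTA-formatted text handle.

    Hypothetical toy parser, used only so the benchmark has something to run.
    """
    label, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header line closes the previous record, if there was one.
            if label is not None:
                yield label, "".join(chunks)
            label, chunks = line[1:], []
        else:
            chunks.append(line)
    if label is not None:
        yield label, "".join(chunks)


def benchmark(parser, text):
    """Return (records, seconds, peak_bytes) for one run of parser over text."""
    tracemalloc.start()
    start = time.perf_counter()
    records = list(parser(io.StringIO(text)))
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return records, elapsed, peak


fasta_text = ">seq1\nACGT\nTTGA\n>seq2\nGGCC\n"
records, elapsed, peak = benchmark(parse_fasta, fasta_text)
print(records)
print(f"{elapsed:.6f} s, peak {peak} bytes traced")
```

Two caveats on this sketch: `tracemalloc` only sees allocations made through Python's memory allocator, so memory used inside compiled extensions may be under-counted, and tracing slows execution, so time and memory are better measured in separate runs for a fair comparison.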