Computation is an integral part of today's research as data has grown too large or too complex to be analysed by hand. An ever-growing fraction of science is performed computationally and many wet-lab biologists spend part of their time on the computer. Many scientists struggle with this aspect of research as they have not been properly trained in the necessary skills. As a result, too much time is lost to inefficient tools, and progress is slower than it could be. This course provides training in several key tools, with a focus on good development practices that encourage efficient and reproducible research computing.
This is a course for researchers in the life sciences who use computers for their analyses, even if not full time. The target student will be familiar with some command-line or programmatic computer usage and will want to become more confident using these tools efficiently and reproducibly. A target student will have written a for loop in some language before, but will not know what Git is (or at least will not be very comfortable using it).
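As a rough indication of the expected starting level, a participant should be comfortable reading a loop like the one below. This is only an illustrative sketch in Python: the example sequences and the `gc_content` helper are invented for this illustration and are not taken from the course material.

```python
# Hypothetical example data: a few short DNA sequences (not course material).
sequences = ["ATGCGC", "TTAACG", "GGGCCC"]


def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    gc_count = seq.count("G") + seq.count("C")
    return gc_count / len(seq)


# Loop over the sequences and collect one GC fraction per sequence.
results = []
for seq in sequences:
    results.append(gc_content(seq))

# Print each sequence next to its GC fraction, rounded to two decimals.
for seq, gc in zip(sequences, results):
    print(f"{seq}: GC content = {gc:.2f}")
```

Someone who can follow this loop, but would not yet know how to track changes to the script or run it on many files at once, fits the intended audience.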
The course covers the following topics:
- Introduction to Python scripting
- Introduction to the Unix shell and usage of cluster resources
- Version control with Git and GitHub
- Analysis pipeline management
- Scientific Python & working with biological data
- Literate programming with Jupyter notebooks
After the workshop, participants will be able to:
- write and organise their own scripts for data analysis
- use version control to keep track of, and revert, changes to their files
- implement efficient and reproducible pipelines that combine multiple tools and scripts