Enabling Open Science Through Research Code: Insights from Episode 5 - Testing Research Code
In research software development, ensuring that code runs correctly and produces reliable results is crucial. However, software testing is often overlooked, particularly in academic settings where researchers may lack formal training in programming best practices. This challenge was a central theme of episode 5 of our six-part series “Enabling Open Science Through Research Code”, where experts from diverse backgrounds shared their experiences and insights on making software testing accessible and effective.
During the session, panellists emphasised that while research software development is often self-taught, adopting best practices such as software testing can improve both individual efficiency and collaboration. One mantra that has come up repeatedly across the episodes is that…
We should be our own best collaborators first
Episode 5 once again showed how good coding practices can benefit the coder as much as other contributors, collaborators, and users. This post summarises the discussion, highlighting practical tips for beginners and more advanced strategies for experienced developers.
Tips, Tools, and Practices for Novices
If you’re new to coding or research software development, software testing may seem like an advanced topic. However, as our panellists suggest, starting with simple practices can make a big difference in the reliability of your code.
1. Start with Assertions
- Assertions are a lightweight way to check that your code behaves as expected. Sheena shared an example from her experience, explaining how beginners often overlook errors that could easily be caught by using `assert` statements in Python.
- Assertions act as built-in sanity checks and can help catch errors early before they lead to misleading research results.
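As a minimal sketch, a couple of `assert` statements can catch silently bad data before it reaches an analysis (the values and thresholds below are invented for illustration):

```python
# Illustrative sanity checks; in practice the list would come from your data
# file, and the plausible range depends on your domain.
temperatures = [12.3, 15.8, 9.4]

assert len(temperatures) > 0, "no readings were loaded"
assert all(-90 <= t <= 60 for t in temperatures), \
    "temperature outside plausible range (degrees Celsius)"
```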
2. Use Smoke Testing
- Simply running your code from start to finish to ensure it doesn’t crash is an easy way to catch obvious issues. Sheena described this as a useful technique for those who may not yet have structured testing frameworks in place.
- Automated tools like `nbval` can help validate Jupyter Notebooks by checking that expected outputs remain consistent, an approach that Abhishek recommended.
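`nbval` plugs into `pytest`, so a single command re-executes a notebook and compares the fresh outputs against the saved ones (the notebook name below is a placeholder):

```bash
pip install nbval
pytest --nbval analysis.ipynb   # fails if re-run outputs differ from saved ones
```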
3. Learn Basic Unit Testing
- Unit tests check individual functions or small sections of code. Abhishek stressed that even a simple test like `assert 2 + 3 == 5` is a great first step.
- In Python, `pytest` is a lightweight and easy-to-use testing framework, whereas in R, `{testthat}` provides a simple way to implement tests within research code.
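As a minimal sketch of what this looks like in practice, the file below defines a small hypothetical helper and two `pytest` tests for it; running `pytest` in the same directory picks them up automatically:

```python
# test_stats.py -- a minimal pytest example; running_mean is a hypothetical
# helper, not a function discussed in the episode.
import pytest

def running_mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)

def test_running_mean_basic():
    assert running_mean([2, 3]) == 2.5

def test_running_mean_rejects_empty():
    with pytest.raises(ValueError):
        running_mean([])
```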
4. Keep Code Modular
- Writing functions instead of long scripts makes it easier to test individual components. As Sheena pointed out, “Untested code is brittle.” Structuring code into reusable functions makes adding tests more straightforward.
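To illustrate, compare repeating this logic inline throughout a long script with wrapping it once in a function; the function version can be imported and tested on its own (`normalise` is an invented example):

```python
def normalise(values):
    """Scale a sequence of numbers linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot normalise constant data")
    return [(v - lo) / (hi - lo) for v in values]

# Because the logic lives in a function, a one-line test is now possible:
assert normalise([0, 5, 10]) == [0.0, 0.5, 1.0]
```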
5. Automate Where Possible
- Using GitHub Actions or other Continuous Integration (CI) tools can automate testing whenever you update your code (a minimal workflow sketch follows this list). Abhishek mentioned that CI is a great way to ensure code works across multiple platforms, which is particularly helpful when collaborating with researchers using different operating systems.
- For R users, the `{usethis}` package simplifies workflow setup, automating repetitive tasks (for both R packages and non-package projects), as suggested by Saranjeet. She also recommended the `{fusen}` package for anyone new to R package development (and for experienced developers while prototyping functions), as it keeps each function together with its documentation and unit tests in the same file, avoiding the risk of forgetting crucial steps.
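As a minimal sketch of the GitHub Actions idea mentioned above (the file name, Python version, and test command are assumptions to adapt to your own project):

```yaml
# .github/workflows/tests.yml
name: tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install pytest
      - run: pytest
```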
Approaches to Software Testing for Experienced Developers
For those already familiar with software development, a more structured approach to testing can improve reproducibility and collaboration. The panellists discussed key strategies for advanced users, including:
1. Different Types of Testing
- Unit Testing: Ensures small, isolated parts of the code work as expected.
- Integration Testing: Checks if different parts of the system function correctly together. Abhishek highlighted the importance of this, noting that end-to-end validation is often overlooked in research projects.
- Regression Testing: Ensures new changes don’t break existing functionality, an approach Sheena described as essential for long-term code maintenance.
- Property-Based Testing: Uses tools like `hypothesis` in Python to test code against a wide range of automatically generated inputs, uncovering unexpected behaviours. Abhishek noted that this is particularly useful for numerical research software.
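A minimal sketch of the idea, assuming `hypothesis` and `pytest` are installed (the unit-conversion functions are invented for illustration):

```python
import math
from hypothesis import given, strategies as st

def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

# hypothesis generates many temperatures and checks the round trip for each;
# any counterexample it finds is shrunk to a minimal failing input.
@given(st.floats(min_value=-273.15, max_value=1e6))
def test_conversion_round_trip(c):
    assert math.isclose(
        fahrenheit_to_celsius(celsius_to_fahrenheit(c)),
        c, rel_tol=1e-9, abs_tol=1e-9,
    )
```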
2. Version Control and CI/CD
- Git and GitHub/GitLab: Version control helps track changes, making it easier to debug issues. Sheena emphasised that “researchers who don’t use version control end up with folders full of files like `final_version_revised_final_final.py`, which is a nightmare to maintain.”
- Continuous Integration (CI): Running tests automatically on GitHub ensures all code changes are validated before being merged. Abhishek explained that setting up CI allows researchers to avoid relying on manual testing across different machines.
3. Cross-Platform Testing
- Researchers often work across different operating systems. Automating tests in cloud environments ensures compatibility with Windows, macOS, and Linux.
- Tools like `tox` or `nox` (Python) allow testing across multiple versions of a language. Abhishek highlighted that CI workflows, once set up, eliminate the need to manually ask colleagues to test code on different platforms.
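With `nox`, for instance, the test matrix lives in a small Python file; each session below runs the suite in a fresh environment for one Python version (the versions shown are an example):

```python
# noxfile.py -- run with `nox` after `pip install nox`
import nox

@nox.session(python=["3.11", "3.12"])
def tests(session):
    session.install("pytest")   # install test dependencies per session
    session.install("-e", ".")  # install the project itself
    session.run("pytest")
```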
4. Automating Data Validations
- Data quality is critical in research. Sheena introduced `pandera` in Python, a tool that helps verify data integrity and data schemas before analysis.
- Statistical validation using libraries like `scipy.stats` can confirm that distributions match expected properties, an approach recommended for data-intensive projects.
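A minimal sketch of a `pandera` schema check (the column names and constraints are invented for illustration):

```python
import pandas as pd
import pandera as pa

# Declare what valid data must look like before any analysis runs.
schema = pa.DataFrameSchema({
    "sample_id": pa.Column(str, unique=True),
    "concentration": pa.Column(float, pa.Check.ge(0)),
})

df = pd.DataFrame({"sample_id": ["a", "b"], "concentration": [0.4, 1.2]})
schema.validate(df)  # raises a SchemaError if any check fails
```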
5. Balancing Testing with Research Deadlines
- Writing tests may seem like extra work, but it prevents time-consuming debugging later. Abhishek acknowledged that “getting 100% test coverage is impractical for most research projects, but basic integration tests should be a minimum.”
- Using automated tools like R’s `{targets}` package helps manage computational workflows efficiently, re-running only the necessary parts of the code. Saranjeet recommended this as a time-saving approach for researchers handling large datasets.
Conclusion
Software testing is not just for professional developers; it is an essential practice for research reproducibility and collaboration. While beginners can start with simple assertions and unit tests, advanced users can implement continuous integration, property-based testing, and automated data validation to ensure code reliability. As Anelda concluded, “Testing is not just about making research more open; it’s about being your own best collaborator.” By integrating these testing strategies into research software development, we can move toward a more open, transparent, and reliable scientific ecosystem.

You can find our recording on YouTube and the resource sheet with links to specific tools, documents, and training resources on Zenodo.
Next month we’ll be looking at funding for research software projects. Join us on 20 March 2025 at 8:30 - 10:00 UTC (your local time) for the final episode of this series.
{The first draft of this post was written by ChatGPT using the transcript of our meetup as input}
This meetup series is a collaboration between Talarify, RSSE Africa, RSE Asia, AREN, and ReSA.