This is a case study for Principle V4: Innovation and improvement.
In 2021, OSR published its review, Reproducible Analytical Pipelines: Overcoming barriers to adoption. The Reproducible Analytical Pipeline, also referred to as RAP, is a set of principles and good practices for data analysis and presentation.
RAP was developed in 2017 by statistics producers in the Department for Culture, Media and Sport and the Government Digital Service to address several problems, in particular time-consuming and error-prone manual processes, and an overreliance on spreadsheets and proprietary software for data storage, analysis and presentation. RAP combines modern statistical tools with software development good practice to carry out every step of statistical production, from input data to final output, in a high-quality, sustainable and transparent way.
The Best Practice and Impact Team (now the Analysis Standards and Pipelines (ASAP) team) developed a minimum standard for RAP, which comprises:
- Peer review to ensure the process is reproducible and to identify improvements
- No or minimal manual interference, for example copy-paste, point-click or drag-drop steps – instead the process should be carried out using computer code which can be inspected by others
- Open-source programming languages, such as R or Python, for coding so that processes do not rely on proprietary software licences and can be reproduced by statistics producers and users
- Version control software, such as Git, to guarantee an audit trail of changes made to code
- Publication of code, whenever possible, on code hosting platforms such as GitHub to improve transparency
- Well-commented code and embedded documentation to ensure the process can be understood and used by others
- Embedding of existing quality assurance practices in code, following guidance set by organisations and the GSS
The principles that make up the minimum standard can be built on further, for example by writing code in modular functions that allow for reuse, or by introducing unit tests to check that code works as expected. Adopting RAP does not require implementing all of the above at once: applying just some of these principles can generate valuable improvements.
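To illustrate the enhancements described above, the following is a minimal, hypothetical Python sketch (not taken from the CCJ code) of a small reusable function with a quality-assurance check embedded in the code, together with unit tests. The function name `count_by_category`, the record structure (a list of dictionaries) and the `offence` field are illustrative assumptions only.

```python
from collections import Counter


def count_by_category(records, key):
    """Count records by the value of `key`, e.g. to build one summary table.

    Hypothetical example: `records` is a list of dicts and `key` names the
    field to group by.
    """
    # Embedded quality-assurance check: stop the pipeline, rather than
    # silently producing a table, if any record has no value for `key`.
    missing = [i for i, record in enumerate(records) if record.get(key) in (None, "")]
    if missing:
        raise ValueError(f"{len(missing)} record(s) have no value for {key!r}")

    counts = Counter(record[key] for record in records)
    # Sort categories alphabetically so the output order is reproducible.
    return dict(sorted(counts.items()))


def test_count_by_category():
    # Unit test: a small input with a known answer.
    sample = [{"offence": "theft"}, {"offence": "violence"}, {"offence": "theft"}]
    assert count_by_category(sample, "offence") == {"theft": 2, "violence": 1}


def test_missing_values_are_rejected():
    # Unit test: the embedded QA check should raise an error on incomplete input.
    try:
        count_by_category([{"offence": None}], "offence")
    except ValueError:
        return
    raise AssertionError("expected a ValueError for a missing value")


if __name__ == "__main__":
    test_count_by_category()
    test_missing_values_are_rejected()
    print("All tests passed")
```

Because the function does one job, it can be reused for any grouping field, and the tests can be rerun automatically after every change. In practice such tests would typically be run with a test runner such as pytest, which discovers functions whose names start with `test_`.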
RAP benefits – enabling innovation and improvement in official statistics – the ONS Centre for Crime and Justice (CCJ)
The Nature of Crime data tables produced by the Centre for Crime and Justice (CCJ) at ONS previously relied heavily on Excel and SPSS. To reduce manual effort, save time and improve reproducibility, the CCJ replaced the existing process with R and Python code and introduced Git for version control.
Implementing RAP principles significantly reduced the time taken to produce the statistics: what was originally three weeks' worth of work for thirteen analysts became under an hour's work for one. The CCJ was also able to create new analysis more quickly; for example, it took an hour to add nine new tables to the Python pipeline.
With the time saved, the CCJ focused on providing more value for users: publishing historic time series, adding more measures and granularity to the tables, and developing its survey processes to provide new crime estimates relating to COVID-19. The team also adapted the code from this project to automate the production of other statistics, such as those on violent crime. Overall, implementing RAP allowed the CCJ to continue to meet its existing output commitments whilst freeing up resources to focus on meeting user needs.
The code for the crime tables is available on GitHub and the team has blogged about its RAP transformation.
Planning how to implement RAP principles – Statistics producers should be empowered to develop RAPs themselves
Achieving these results began with demonstrating the efficiency and quality improvements to senior leaders at the CCJ, who then established a team to deliver further RAP developments. With their line managers' agreement, interested members of staff dedicated two days a week to this team. Support from the Deputy Director and other senior leaders was essential in protecting this time commitment and in prioritising development work against competing demands. This level of senior support also meant that analysts felt more able to get involved in the project in the first place.
To support the development work, the Good Practice Team (GPT), now ASAP, provided mentoring and training. This helped to embed RAP knowledge and skills within the CCJ. Despite some initial apprehension about implementing RAP, the team members became confident in their new skills and proud of their work. They have since gone on to create and share their own crime_analysis package. The CCJ has applied this mentoring approach internally, without GPT support, and has continued to focus on skills development across the division. For example, the CCJ now uses pair programming to quality assure code and has created a bespoke RAP learning pathway tailored to the team's data and table production processes.
This example shows how producers can enable innovation and improvement in official statistics when they are empowered to develop RAP in their own areas. With commitment and support from senior managers to implement RAP principles, the team has been able to continue meeting its existing output commitments while using its newly freed-up resources to focus on meeting new user needs.