Using RAP to strengthen organisational approaches to quality management

This is a case study for Principle T4: Transparent processes and management.

The Department for Transport (DfT) has fostered a culture of innovation and improvement which has supported the application of core RAP principles and strengthened the quality management of its official statistics. This culture has been created through a combination of enthusiastic and driven individuals, strong senior support and strategic direction, and working in an open and transparent way. DfT’s RAP developments have been underpinned by a strategic goal to produce most of its statistics using a RAP approach.

DfT works transparently and openly using GitHub to share code, host materials from its weekly coding meetings and signpost useful resources online. DfT has developed and published an R cookbook of coding standards that specifies its minimum requirements for ‘good code’. DfT requires that the master version of a script is not edited without going through a code review and encourages the use of automated testing (Continuous Integration) tools. The R cookbook is community edited, so standards can evolve as needed.

To introduce RAP principles to its official statistics, DfT has focused on automating data tables and quality assurance processes. DfT identified these as the best areas for development because, in its existing processes, they were the most prone to human error.

For example, by using R code to automatically run validation checks and flag issues for further exploration, quality assurance of DfT’s Road Safety statistics is now carried out in a more standardised and efficient way than before. DfT ensures that the R code to produce these statistics is peer reviewed, providing an additional layer of quality assurance. Peer review is often carried out by members of the RAP committee, the group which supports RAP developments in the department.
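Automated validation of this kind can be sketched in a few lines of R. The dataset, column names and rules below are illustrative assumptions, not DfT’s actual Road Safety code: each check returns the rows that fail it, so an analyst can explore them further.

```r
# Illustrative sketch only: hypothetical column names and rules,
# not DfT's actual validation code.
road_safety <- data.frame(
  accident_id = c("A1", "A2", "A3"),
  severity    = c("Slight", "Serious", "Unknown"),
  casualties  = c(1, 2, -1)
)

# Each named check returns the rows that fail it
checks <- list(
  invalid_severity    = road_safety[!road_safety$severity %in%
                                      c("Slight", "Serious", "Fatal"), ],
  negative_casualties = road_safety[road_safety$casualties < 0, ]
)

# Summarise failure counts so issues can be flagged for follow-up
sapply(checks, nrow)
```

Because the checks are code rather than manual inspection, they run identically every time the data is refreshed, which is what makes the process more standardised.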

The committee has developed a template which is used as the basis for all new coding projects. This supports a standardised coding style across the department and results in improved quality, readability and reusability of code.

DfT has a strong community of statisticians and its RAP committee has been instrumental in supporting RAP developments. This includes running internal code clubs, inviting external speakers to share learning, and developing training and tools such as an R project template and an R cookbook which provides comprehensive coding examples (see Case study T5: Developing statisticians’ coding capabilities to meet future organisational needs). DfT has also developed a RAP training session for managers which focusses on quality assurance and gives managers the confidence they need to sign off publications which use a RAP approach.

This example shows how DfT has created a culture that supports RAP developments and continuous improvement. By working openly through GitHub, DfT is transparent about its approach to quality management. It has also established organisational tools that help it to manage quality to appropriate standards, strengthening the quality management approach used in the production of its official statistics.

Leading the development of statistics on transport use during the pandemic

This is a case study for Principle T2: Independent decision making and leadership.

To monitor the use of the transport system in Great Britain during the coronavirus (COVID-19) pandemic, the Department for Transport (DfT) rapidly produced new statistics on transport use by mode, from March 2020.

The DfT Head of Profession for Statistics (HoP) and senior statistical leadership team were instrumental in developing the statistics. They led and encouraged collaboration and innovation with organisations outside government to gain access to new data. For example, to develop near real-time indicators, the lead for Travel and Safety Statistics worked with a bus technology company to get information about bus use outside London, and the HoP proactively led discussions with a telecoms provider about the potential application of telecoms data, which formed part of the methodology for producing estimates of cycling. The lead for Road Traffic worked with their team to develop a new approach using existing automated traffic counters.

The production of these statistics involved a coordinated effort across multiple analytical teams, overseen by the HoP and senior leadership team. The HoP put in place a fast but rigorous quality assurance process. Provisional numbers were produced by individual analytical teams and sent to a central team by 3pm each day. The HoP reviewed them, provided feedback as needed and the numbers were then finalised by the individual teams before being signed off by the HoP for inclusion in an updated data dashboard at the end of each day.

The statistics were first presented via slides at a series of coronavirus press conferences at 10 Downing Street. For example, the Transport Secretary presented the statistics in his statement on coronavirus (COVID-19) on 4 June 2020. The statistics were used to show the change in transport trends across Great Britain and give an indication of compliance with lockdown rules. They proved vital for informing the government, the media and the general public, and continued to be valuable as the lockdown rules changed. Statisticians at DfT engaged closely and continuously with the Cabinet Office to ensure that the data was well understood by their policy colleagues.

OSR carried out a rapid review of the statistics, which highlighted that the data was only sometimes included in the daily briefing slides and was therefore only available to the public on those days. The DfT HoP then played a key role in getting the data published each weekday.

The DfT HoP later determined that changes should be made to the frequency and timing of the publication of the statistics, to reflect changes in user demand. This involved weighing up user need against the resource required to produce the data and the impact on staff, and being proactive in anticipating future user interest. As user demand for daily data initially reduced, the decision was made to publish the data weekly, on a Wednesday. Later, the HoP determined that publication should return to daily for a set period, to inform users shortly ahead of schools reopening, before reverting to weekly publication when that need again reduced.

This example shows the key roles played by the DfT HoP and wider statistical leadership team in the production of a new and important data source, used to inform the government, the media and the general public during the pandemic. The HoP encouraged collaboration and innovation with organisations outside government to gain access to an important new data source, ensured the rigorous quality assurance of new outputs produced to a tight timescale, and determined changes to the frequency and timing of publication as user demand changed, anticipating future user interest while considering the resource impact on the analytical staff required to produce the data.

Archived: Automating statistical production to free up analytical resources

This is a case study for Principle V4: Innovation and improvement.

The Reproducible Analytical Pipeline (RAP) is an innovation initiated by the Government Digital Service (GDS) that combines techniques from academic research and software development. It aims to automate certain statistical production and publication processes – specifically, the narrative, highlights, graphs and tables. Tailor-made functions work raw data up into a statistical release, freeing up resource for further analysis. The benefits of RAP include:

  • Auditability – the RAP method provides a permanent record of the process used to create the report; moreover, using Git for version control, producers have access to all previous iterations of the code. This aids transparency, and the process itself can easily be published
  • Speed – it is quick and easy to update or reproduce the report, producers can implement small changes across multiple outputs simultaneously. The statistician, now free from doing repetitive tasks, has more time to exercise their analytical skills
  • Quality – producers can build automated validation into the pipeline and produce a validation report, which can be continually augmented. Statisticians can therefore perform more robust quality assurance than would be possible by hand in the timeframe from receiving data to publication
  • Knowledge transfer – all the information about how the report is produced is embedded in the code and documentation, making handover simple
  • Upskill – RAP is an opportunity to upskill individuals by giving them the chance to learn new skills or develop existing ones. It also upskills teams by making use of underused coding skills that may already exist within their resource; coding skills are increasingly common, with many STEM students learning to code at university

RAP therefore enables departments to develop and share high-quality reusable components of their statistics processes. This ‘reusability’ enables increased collaboration, greater consistency and quality across government, and reduced duplication of effort.
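The tailor-made functions described above are ordinary code that turns raw data into publication-ready outputs. A minimal sketch of the idea in R, using made-up data and a hypothetical function name rather than any department’s actual pipeline:

```r
# Illustrative sketch only, not a real departmental pipeline.
# A tailor-made function works raw data up into a release table;
# because it is code, the step is repeatable, reviewable and version-controlled.
make_release_table <- function(raw) {
  totals <- aggregate(count ~ year + mode, data = raw, FUN = sum)
  # One row per year, one column per transport mode
  reshape(totals, idvar = "year", timevar = "mode", direction = "wide")
}

raw <- data.frame(
  year  = c(2017, 2017, 2018, 2018),
  mode  = c("Bus", "Rail", "Bus", "Rail"),
  count = c(120, 80, 130, 85)
)
make_release_table(raw)
```

Once a function like this exists, reusing it for next year’s release is a one-line call, which is where the speed and knowledge-transfer benefits listed above come from.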

In June 2018, the Department for Transport (DfT) made its RAP debut with the automation of the Search and Rescue Helicopter (SARH) statistical tables. This was closely followed by the publication of Quarterly traffic estimates (TRA25), produced by DfT’s first bespoke Road Traffic pipeline R package. RAP methods are now being adopted across the department, with other teams building on the code already written for these reports. DfT has set up a dedicated RAP User Group to act as a support network for colleagues interested in RAPping.

DfT’s RAP successes have benefited from the early work and community code sharing approach of other departments, including:

  • Department for Digital, Culture, Media & Sport first published statistics using a custom-made R package, eesectors, in late 2016, with the code itself made freely available on GitHub.
  • Department for Education first published automated statistical tables of initial teacher training census data in November 2016, followed by the automated statistical report of pupil absence in schools in May 2017. DfE are now in the process of rolling out the RAP approach across their statistics publications
  • Ministry of Justice, as well as automating their own reports, have made a huge contribution with the development of the R package xltabr, which can be used by RAPpers to easily format tables to meet presentation standards. xltabr has also been made available to all on the Comprehensive R Archive Network.

The incorporation of data science coding skills into the traditional statistical production process, coupled with an online code sharing approach, lends itself to increased collaboration and improved efficiency, and creates opportunities for government statisticians to provide further insights into their data.

Developing statisticians’ coding skills to meet future organisational needs

This is a case study for Principle T5: Professional capability.

The Department for Transport (DfT) has been upskilling its analysts to facilitate the adoption of data science methods in the department. To help with this, DfT has established weekly Coffee and Coding sessions and bespoke R coding workshops, building on successful models used in the Department for Education and the Department for Business, Energy and Industrial Strategy.

Coffee and Coding sessions aim to nurture and encourage a vibrant, supportive and inclusive coding community. They provide a regular opportunity for people to share coding skills, knowledge and advice, and to network and get to know each other. The format is usually a presentation followed by a Code Surgery. Presentations usually demonstrate a tool or technique and/or a show and tell of new work done within the department. Code Surgeries allow people to raise coding queries or ideas with the coding community; there is no such thing as a silly question and it is understood that the quest for knowledge necessarily includes failure.

The R workshops are a suite of sessions designed to train DfT’s statisticians in the basics of R coding. They are mainly based around the use of tidyverse R packages to keep standards consistent, and include topics such as data wrangling with dplyr, graphing with ggplot2, and report automation with rmarkdown. DfT’s first cohort graduated in late 2018 and the second is due to start in early 2019.

DfT runs a mentorship programme (akin to the GDS Data Science Accelerator) to provide support to those taking on data science projects using a new tool or method. DfT expects that eventually there will be enough coders in the department that statistical coding advice will be as easy to source as advice on using Excel.

A big part of DfT’s approach is to encourage people to share knowledge, so that pioneers trying methods for the first time generate resources for others to use and adapt. GitHub has become central to this process – DfT uses it to share code, host materials from its weekly coding meetings and signpost useful resources online. DfT has also developed coding standards that specify its minimum requirements for ‘good code’, whilst not burdening the developer with lots of extra work. For example, DfT requires that the master version of a script is not edited without going through a code review and encourages the use of automated testing (Continuous Integration) tools. The document is community edited, so standards can evolve as needed.

DfT encourages analysts to use a consistent set of tools and to follow a style guide. For data analysis, R and Python have proved popular language choices, but there are also style differences within R and Python. For this reason, DfT suggests default packages in its coding standards and teaches the R workshops in a consistent coding style, encouraging developers to use the tidyverse syntax style. This means that a relatively new coder only has to learn this syntax style to be able to interpret typical code across the department.
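As an illustration of what a single house style buys, a summary written in tidyverse style reads the same wherever it appears. The trip data here is made up, and the sketch assumes the dplyr package is installed:

```r
# Illustrative only: made-up trip data, not DfT statistics.
library(dplyr)

trips <- data.frame(
  mode           = c("Walk", "Walk", "Car", "Car"),
  distance_miles = c(0.5, 1.2, 8.0, 3.5)
)

# Tidyverse style: data first, then a readable pipeline of verbs
trips %>%
  group_by(mode) %>%
  summarise(
    n_trips       = n(),
    mean_distance = mean(distance_miles)
  ) %>%
  arrange(desc(mean_distance))
```

A coder who has learned this one pattern of data-then-verbs can interpret typical analysis scripts across teams, which is the point of standardising on a single style.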

DfT collaborates closely with its Digital Services team to ensure that the core functions of the software development tools work, making sure analysts can install packages for Python and R, use Git to version control their code, and use dependency management tools like packrat.

Senior leaders, including the Head of Profession for Statistics and managers responsible for teams of statisticians, have a good understanding of the benefits of RAP. As a result, staff are strongly supported to take time to develop new skills and improve their statistics. DfT’s RAP developments have been underpinned by a strategic goal to produce most of its statistics using a RAP approach. This has been recognised by the wider department – for example, the RAP committee won the Excellence in Learning award at the DfT 2020 Staff Celebratory Event.

This example shows how DfT staff are provided with the time and resources required to develop new coding skills, knowledge and competencies to meet DfT’s future organisational needs and how DfT is developing new quality strategies and standards.

Innovating across the production and dissemination process

This is a case study for Principle V4: Innovation and improvement.

The National Travel Survey (NTS) team at the Department for Transport (DfT) has implemented a series of innovations and improvements during 2018/19. Some of these have been simple to implement but have had a significant impact, while others have provided opportunities for the team to learn new skills that will bring long-term quality and efficiency benefits, for example, learning to use RStudio to automate data processing methods.

Making efficiencies has freed up analytical resource to make improvements in other areas, leading to a positive snowball effect. A user-first approach has been adopted, with all innovations being about how to further meet users’ needs.

Recent NTS innovations and improvements include:

  • Improving the NTS questionnaire following a feedback exercise to check the relevance of NTS questions and the burden placed on respondents. Because many NTS questions are still required by users, questions are rotated so that some are asked every other year, making space for new topics. This ensures the survey length is not extended whilst still meeting user needs. New questions undergo extensive cognitive and panel testing to ensure participants understand them and that they collect the data users want
  • Setting up an innovative NTS Panel, consisting of NTS participants who agree to be contacted for follow-up research. This allows additional, smaller pieces of research to be conducted while not making the full NTS interview longer. The panel can target a sub-section of the population (e.g. people who cycle) where it would be disproportionately burdensome to ask everyone in the full NTS. Panel responses can also be linked back to original NTS responses, to greatly enhance the utility of the data
  • Collaborating with other analysts, including those outside of Government, to produce NTS analytical reports, demonstrating the breadth of information available in the NTS. By making the dataset accessible via the UK Data Service, and the ONS Secure Research Service, far more analysis can be undertaken than could be done by the NTS team alone
  • Running advance letter and incentive experiments to investigate how to boost response rates
  • Making methodological improvements to collect walking data more accurately
  • Conducting a Discovery to explore whether developing a digital NTS diary could reduce respondent burden and increase data quality
  • Designing interactive tables and revising the data table categories so that it is easier for users to find the data they are searching for on GOV.UK
  • Publishing ad-hoc analyses, so they are accessible to all and enable the reuse of NTS data
  • Using RStudio to provide regular standard errors and confidence intervals for NTS statistics and ad-hoc analyses
  • Producing a user-friendly quality report to inform users about the quality of the NTS data, including sampling, methodology, quality assurance procedures and confidentiality
  • Making efficiency improvements to NTS data processing methods to greatly increase levels of automation using R, SQL and more advanced Excel functions
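Standard errors and confidence intervals of the kind listed above can be produced with a few lines of R. This sketch uses made-up data and a simple-random-sample approximation; the NTS itself has a complex sample design, so its published estimates rely on design-based methods rather than this formula:

```r
# Illustrative only: made-up data and a simple-random-sample formula,
# not the NTS design-based methodology.
trips_per_person <- c(18, 22, 15, 30, 25, 19, 21, 27, 16, 24)

n  <- length(trips_per_person)
m  <- mean(trips_per_person)
se <- sd(trips_per_person) / sqrt(n)

# Approximate 95% confidence interval for the mean
ci <- m + c(-1.96, 1.96) * se
round(c(mean = m, se = se, lower = ci[1], upper = ci[2]), 2)
```

Publishing intervals alongside point estimates in this way lets users judge how much weight a given NTS figure can bear.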

These improvements have led to increased engagement with a range of NTS stakeholders:

  • The publication of ad-hoc tables has drawn interest from academics and transport planners who have used the data as the basis for conducting further analysis in collaboration with DfT
  • The analytical reports produced in collaboration with external authors have provided a fresh look at what the NTS can provide and received mainstream and specialist press coverage
  • The NTS Panel has resulted in new demand from policy teams, with the team now looking forward to exploring these new research topics

The team is also testing the use of MailChimp as a new way to keep users up-to-date with NTS statistics and developments through a regular newsletter. The team hopes that this will increase its engagement with NTS users even further.

This example shows how the NTS team keeps up to date with developments that might improve NTS statistics for users, is transparent about its forthcoming development plans, and engages with users to get their feedback on plans to better meet their needs. It also shows how the NTS team collaborates with expert analysts to enhance value and insight, creates efficiencies by innovating methods and quality processes, and seeks to improve users’ experience by finding new ways to engage with them and enhancing the range of statistics that it makes available.