Ensuring source data is appropriate for intended uses

This is a case study for Principle Q1: Suitable data sources.

Legal aid statistics for England and Wales are published quarterly by the Ministry of Justice (MoJ) and draw on a range of administrative data sources held by the Legal Aid Agency (LAA), an executive agency of the MoJ. Legal aid statistics were first published independently as Official Statistics in 2013, and were awarded National Statistics status in 2016.

Legal aid is a complex area and the statistics report on a variety of criminal and civil legal aid schemes, including police station attendance and civil representation. The statistics provide an extensive evidence base on the legal aid system, but the constraints of using administrative data from LAA systems mean that there are some things they do not measure precisely, or at all. To help users understand the data, MoJ publishes a comprehensive Guide to Legal Aid Statistics in England and Wales. The user guide includes considerable detail about the operational context in which the data are recorded, along with case studies showing the types of cases where legal aid would be granted and how these would appear in the statistics.

The guide also provides a summary of the team’s professional judgments about the robustness of each data source and, more generally, a clear steer on the sorts of comparisons the overall statistics allow (e.g. volume and expenditure levels by scheme) or do not permit (e.g. the number of clients or the precise geographic distribution of legal aid clients). The individual data sources used are described in greater detail in a separate ‘index of legal aid data’. The index and user guide both include a flow diagram presenting the data sources for each of the legal aid schemes.

Many legal aid data sources are subject to minor revisions in each quarterly update, as new information is added to, or existing information amended on, the underlying systems. These revisions are clearly flagged in the quarterly statistics.

The legal aid statistics team were embedded in the LAA until recent years and maintain close links with LAA colleagues, including those responsible for the management and supply of the administrative datasets. These relationships provide additional insight into the detail of the data sources used and any changes to them. A recent example was a new provider contract for telephone advice services, which led to the discontinuation of a published time series on costs. The change was explained by LAA colleagues and subsequently reported in the statistical series.

There have been numerous other enhancements to the statistics over time, which are also clearly documented in the user guide timeline, and which have continued to improve the comparability and transparency of the data sources used to produce legal aid statistics. 

This example shows how the legal aid statistics team within MoJ ensure that the LAA data they draw on are appropriate for statistical purposes: they have a thorough understanding of the operational context in which the administrative source data are collected, and they maintain close links with LAA data suppliers. It also shows the considerable lengths the statisticians go to in explaining the relative strengths and limitations of the various data sources, so that the official statistics are interpreted appropriately, including explaining the impact of changes or revisions to data sources and administrative systems over time.

Archived: Automating statistical production to free up analytical resources

This is a case study for Principle V4: Innovation and improvement.

The Reproducible Analytical Pipeline (RAP) is an innovation initiated by the Government Digital Service (GDS) that combines techniques from academic research and software development. It aims to automate certain statistical production and publication processes – specifically, the narrative, highlights, graphs and tables. Tailor-made functions work raw data up into a statistical release, freeing up resources for further analysis (a minimal sketch of such a function follows the list below). The benefits of RAP include:

  • Auditability – the RAP method provides a permanent record of the process used to create the report; moreover, with Git for version control, producers have access to all previous iterations of the code. This aids transparency, and the process itself can easily be published.
  • Speed – it is quick and easy to update or reproduce the report, and producers can implement small changes across multiple outputs simultaneously. Freed from repetitive tasks, statisticians have more time to exercise their analytical skills.
  • Quality – producers can build automated validation into the pipeline and produce a validation report, which can be continually augmented. Statisticians can therefore perform more robust quality assurance than would be possible by hand in the time between receiving data and publication.
  • Knowledge transfer – all the information about how the report is produced is embedded in the code and documentation, making handover simple.
  • Upskilling – RAP gives individuals the chance to learn new skills or develop existing ones. It also upskills teams by making use of underused coding skills that may already exist within them; coding skills are increasingly widespread, with many STEM students learning to code at university.
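
To make the idea concrete, here is a minimal sketch of what a pipeline function might look like, written in R since the departmental examples below are R packages. Everything in it (the build_summary_table function, the column names, the dummy data) is invented for illustration and is not taken from any department’s actual pipeline.

```r
# A minimal RAP-style sketch (hypothetical, not any department's real code):
# raw administrative data goes in, a validated publication table comes out.

build_summary_table <- function(raw_cases) {
  # Automated validation: fail fast if the input does not look as expected,
  # so problems surface before anything is published
  stopifnot(
    is.data.frame(raw_cases),
    all(c("scheme", "volume", "expenditure") %in% names(raw_cases)),
    all(raw_cases$volume >= 0)
  )

  # Work the raw data up into the summary table for the release
  aggregate(
    cbind(volume, expenditure) ~ scheme,
    data = raw_cases,
    FUN  = sum
  )
}

# Example run with dummy data
raw_cases <- data.frame(
  scheme      = c("crime", "crime", "civil"),
  volume      = c(100, 250, 80),
  expenditure = c(5000, 9000, 3200)
)
build_summary_table(raw_cases)
```

Because the whole process lives in code like this, rerunning the report or extending the validation checks is a small code change rather than a manual exercise.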

RAP therefore enables departments to develop and share high-quality reusable components of their statistics processes. This ‘reusability’ enables increased collaboration, greater consistency and quality across government, and reduced duplication of effort.

In June 2018, the Department for Transport (DfT) published its RAP debut with the automation of the Search and Rescue Helicopter (SARH) statistical tables. This was closely followed by the publication of Quarterly traffic estimates (TRA25), produced by DfT’s first bespoke Road Traffic pipeline R package. RAP methods are now being adopted across the department, with other teams building on the code already written for these reports. DfT have established a dedicated RAP User Group to act as a support network for colleagues interested in RAPping.

DfT’s RAP successes have benefited from the early work and community code sharing approach of other departments, including:

  • Department for Digital, Culture, Media & Sport first published statistics using a custom-made R package, eesectors, in late 2016, with the code itself made freely available on GitHub.
  • Department for Education first published automated statistical tables of initial teacher training census data in November 2016, followed by the automated statistical report of pupil absence in schools in May 2017. DfE are now in the process of rolling out the RAP approach across their statistics publications.
  • Ministry of Justice, as well as automating their own reports, have made a huge contribution with the development of the R package xltabr, which RAPpers can use to format tables easily to meet presentation standards; a brief usage sketch follows this list. xltabr has also been made available to all on the Comprehensive R Archive Network.
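
As a taste of how xltabr is used, the sketch below follows the pattern in the package’s documentation: a cross-tabulation held in a data frame is converted into a formatted Excel workbook via openxlsx. The data frame here is invented, and the call shown should be read as indicative rather than a complete account of the package’s options.

```r
# Illustrative use of xltabr (data invented for the example)
library(xltabr)

crosstab <- data.frame(
  scheme = c("crime", "civil"),
  `2016` = c(120, 80),
  `2017` = c(130, 85),
  check.names = FALSE
)

# Convert the cross-tab into a formatted openxlsx workbook object...
wb <- xltabr::auto_crosstab_to_wb(crosstab)

# ...then write it out to meet presentation standards
openxlsx::saveWorkbook(wb, "example_table.xlsx", overwrite = TRUE)
```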

Incorporating data science coding skills into the traditional statistical production process, coupled with an online code-sharing approach, lends itself to increased collaboration and improved efficiency, and creates opportunities for government statisticians to provide further insights into their data.