Rapidly setting up an automated data collection

Prior to March 2020, the Department for Education (DfE) published termly and annual pupil absence data based on information provided to them through the school census, with a lag of around two terms. However, during the COVID-19 pandemic, school attendance became a key societal issue and there was a strong need for real-time data at a national level.

Initially, DfE introduced a form for schools and colleges in England to complete manually each day. Whilst this approach gave DfE the key information it needed, it placed a high burden on schools, so DfE explored options for automating the collection.

DfE rapidly set up a new system that collects daily attendance data from schools automatically. This method of data collection was revolutionary for the department and its stakeholders and, because it is automated, it creates no additional burden for schools. Participation was voluntary at first, and 90% of schools chose to take part before the collection became mandatory at the start of the 2024/25 academic year.

The first outputs from these collections were published in September 2022 and have since been published fortnightly to meet user needs, considerably reducing the lag compared with the previous termly and annual releases. The information is presented in a bulletin and a dashboard. The figures relate to the attendance of pupils aged 5 to 15 in state-funded primary, secondary and special schools in England, and include breakdowns for pupil groups.

This real-time automated collection has enabled policy makers in DfE to respond rapidly to emerging issues, identify trends in attendance, and quickly understand and spread good practice from areas showing improvements. For example, during the teacher strikes in 2023, DfE was able to produce rapid transparency data on the number of schools that were closed on strike days. Schools and local authorities can also use the attendance information operationally: to monitor absence more efficiently, to identify pupils who need support earlier, and to benchmark themselves, saving time and enabling earlier intervention.

In 2023, these statistics won the Royal Statistical Society (RSS) Campion Award for Excellence in Official Statistics. The RSS noted that “the judges considered this to be an example of agile, useful data provision and an exemplar for others to follow. They were also impressed with the efforts made to ensure transparency so the findings could be communicated to a broad audience, as well as the use of new administrative data.”

Demonstrating transparency when linking and publishing data

The Scottish Government’s (SG) Health and Homelessness in Scotland project linked local authority data about homelessness between 2001 and 2016 with NHS data on hospital admissions, outpatient visits, prescriptions and drug misuse, and with National Records of Scotland data on deaths.

Transparency about the risk assessment process helps a producer demonstrate its Trustworthiness to users, suppliers and the public. One way SG demonstrated this was by conducting its data privacy impact assessment and publishing it alongside the main analysis report. SG also published the original application for the data, the application to the Public Benefit and Privacy Panel and the correspondence documenting its approval, and details of how to access the data. This approach is now standard practice for all SG publications based on linked data.

Since SG carried out this work, a new risk assessment tool, the Data Protection Impact Assessment (DPIA), has been introduced under the Data Protection Act 2018 (DPA) as a requirement of the General Data Protection Regulation (GDPR). DPIAs are mandatory where data are combined from multiple sources, and the Information Commissioner’s Office recommends also conducting them voluntarily for any large-scale processing of personal data.

The accountability principle in the DPA requires organisations to keep appropriate records so that they can demonstrate compliance if required. Conducting a DPIA helps departments meet this principle, and publishing DPIAs helps to meet the Code’s requirements for transparency, provided that they are accessibly presented. It isn’t essential to publish a DPIA in full; a summary of the process and the lessons learnt would be sufficient to demonstrate transparency.

Producers can further increase transparency by publishing details of all the data sharing requests made to them, together with their outcomes. SG publishes details of the requests submitted to its Statistics Data Access Panel on its website, along with past decisions and the justifications for them.

The Department for Education in England has also published details of data sharing requests and their outcomes for ad hoc sharing of National Pupil Database data for several years. In December 2017, it broadened the scope to cover all routine sharing of personal data, and it has recently consulted users about further changes to make this information easier to engage with and understand.

These examples show how statistics producers can demonstrate Trustworthiness by being transparent about how they manage data linkage and data sharing, and about how these approaches relate to current legislation in this area.