NHS patient data is often used for secondary purposes such as research in the UK, under strict privacy and governance rules. However, medical reports and clinic notes are tricky to anonymise and so are rarely used for research. A group of researchers are aiming to set up a donated ‘text bank’ made up of clinic notes and letters where patients have consented to their use in research.
Using this text bank, researchers will train computer programs to ‘read’ the text and extract the important clinical information. This would mean clinical information from these parts of patient records could be used for research without a human ever seeing them – protecting patients’ privacy. The research team have just completed a consultation with stakeholders on how this donated text bank should be set up.
Health and social care data from the NHS and local government are now routinely collected, stripped of identifiers and linked together, to help regional NHS organisations plan services and address health inequalities. Almost all of these datasets use only structured data fields – clinical codes – as these can be extracted easily while protecting patient privacy. However, a huge amount of useful clinical information is contained in electronic copies of clinic notes, letters and reports, in unstructured ‘free text’. This text is difficult to use for research because special computer programs are needed to process it and extract the correct clinical information. To develop these tools, researchers need access to the full patient text, but often cannot get it because of privacy issues: anonymising a letter or clinic note is time-consuming, and researchers need many examples to build their language processing tools.
A group of researchers across the UK are working together to try to solve this problem by creating a “donated text bank”, where patients can offer to donate the medical notes, letters and reports the NHS holds on them. These documents can then be made available to computer science researchers, with or without anonymisation, so that they can develop natural language processing (NLP) tools. Once developed, these tools can be run on unseen patient records to extract clinical information in a structured way, which does not threaten patient privacy.
In March 2022 the team, led by Natalie Fitzpatrick at UCL and including researchers from King’s College London, University of Manchester, BSMS and Swansea University, held focus groups with four stakeholder groups (patients, clinicians, information governance and ethics leads, and NLP researchers) to understand more about the acceptability of establishing this donated text bank for research. All the stakeholders were in favour of the text bank idea, and gave valuable suggestions for communication, data gathering, housing and use of the text bank.
BSMS researcher Dr Elizabeth Ford helped to plan the focus groups and interpret the results. She said: “It’s brilliant that more and more NHS patient data is being used to identify health inequalities and plan more equitable and accessible services. However, we know that a lot of relevant clinical information is missed if only clinical coding is used, especially in GP and mental health services.
“Creating a donated text bank would be a fantastic way of moving forward with the development of algorithms that can accurately extract diagnoses, symptoms and other information from text without a human ever having to see a patient’s record, circumventing the privacy issues which have blocked developments in this field. There was widespread support for this idea and focus group participants suggested many ways in which we could communicate transparently about the donated text bank to make sure patients can give fully informed consent. We are now working on how we can set up the text bank in practice, while meeting stakeholder expectations.”
The report has recently been published by UCL.