Author: Zuzanna Kwiatkowska
Scientific collaboration: Mateusz Rapicki
Have you ever wondered how many things must happen during your 15-minute-long visit to the doctor’s office?
Imagine yourself walking into the room. You sit down, and the doctor asks you what the reason for your visit is. They listen to you and ask about your medical history and any information relevant to your problem. Maybe they already have some of that information in their computer files, so they must read them and recollect everything about your case. Or worse – you bring that information with you in a potentially chaotic stack of documents.
Then, they do a medical check, think about potential solutions to your problem and what consequences for your health each of those solutions has. Last but not least, they need to put all of that information into the system, give you treatment recommendations, and make sure they answer all of your concerns and questions.
When you put all of those steps on paper, it sounds almost impossible for a single person to do all of those tasks during such a short visit. Also, if you multiply it by a couple of hours every day and a few consecutive days of work, you can imagine the work overload that a single doctor may suffer. Such an overload may lead to multiple problems, starting with doctors making mistakes and ending with patients who are unhappy with the care they receive.
But there might be a solution to this problem thanks to technological progress and machine learning! In this article, we present a case study of a project by the telDoc company, which we helped with as a subcontractor. We assisted in the development of a system to summarise medical documentation. By reading the article, you will learn:
- what a long-term solution to the overwork problem looks like and which fragment of this solution we created,
- what summarisation is from a machine learning perspective,
- how to create a high-quality dataset of summaries of Polish medical documents,
- what models you can use to obtain satisfying results.
Let’s dive deeper into the issue.
Doctor’s Voicebot Assistant to the Rescue
The core idea for making medical checkups more efficient is to provide the doctor with assistance. Such assistants could collect the necessary information from the patient, ask the most important questions about their medical history and summarise the answers for the doctor. This way, when a patient comes for a visit, the doctor is already prepared and can fully focus on the problem. Obviously, hiring more staff is impossible due to medical staff shortages, and this is where machine learning and bot assistants come to the rescue.
This simple, and yet groundbreaking idea to create medical bots for supporting doctors was proposed by a Polish company named telDoc. They are a group of experts in technology and medicine who use artificial intelligence to create custom-made solutions for the healthcare sector.
The telDoc team realised that simply creating a voice bot to question the patient is insufficient to make the healthcare sector more efficient. After all, replacing medical documentation with similarly long transcripts of conversations between the voice bot and the patient would not make the doctor’s work easier. This is why they decided to create summaries of those conversations and other medical documents.
And here is where all the challenges started…
Machine Learning + Tedious Tasks = Match Made in Heaven
Summarisation is a supervised natural language processing task in which we convert a long text into a shorter one that keeps the most important information. We can distinguish two types of summarisation: extractive and abstractive.
Extractive summarisation is when we take a large text, find the most important phrases and use those unchanged phrases as a summary. It’s the most straightforward approach, but it has its flaws, for example not taking into account more complex dependencies between phrases. This is particularly problematic for medical texts that often rely on hidden dependencies and the full context of the case.
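To make the idea concrete, here is a minimal sketch of extractive summarisation. It is not the approach used in the project: it simply scores each sentence by the average frequency of its words in the whole text and returns the top sentences unchanged. The function name and the scoring rule are illustrative assumptions.

```python
import re
from collections import Counter
from typing import List


def extractive_summary(text: str, num_sentences: int = 1) -> List[str]:
    """Pick the highest-scoring sentences and return them unchanged (toy heuristic)."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Word frequencies over the whole text; \w also matches Polish letters.
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"\w+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Restore the original order of the selected sentences.
    return [sentences[i] for i in sorted(ranked[:num_sentences])]
```

Because every selected sentence is copied verbatim, such a method cannot merge information that is spread across several sentences, which is exactly the limitation described above.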
Alternatively, when we deal with abstractive summarisation, we create a completely new text from a larger one. This allows us to dive into the complexity of the text. However, abstractive summarisations are not perfect, as we need to be careful that the models do not come up with some made-up information in the process and only rely on provided text.
Our task was to create abstractive summarisation for medical documents, but it was additionally challenging, as we wanted to do that for the Polish language. We started with the Polish Summaries Corpus, which includes long texts and their summaries in Polish. Unfortunately, the performance of Polish language models was extremely low when they were applied to medical texts. This created a problem of its own, as we soon realised we lacked high-quality data for this task, that is, pairs of long medical texts and their abstractive summaries.
Garbage In, Garbage Out
The idea that we came up with was to use medical PhD dissertations as input to the model to simulate a medical history and their abstracts as output summaries. We downloaded about 2000 dissertations from the Polish Platform of Medical Research (PPM) and we used them together with data provided by telDoc.
All of them were in a .pdf format, so we used the PyMuPDF package to extract raw text. We also cleared the documents of the footer that PPM puts on every page of every document in their database. This way we obtained mappings of medical dissertations to their abstracts to use as our training set.
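The extraction step could look roughly like the sketch below. The footer pattern is a hypothetical placeholder, since the article does not quote the actual PPM footer, and the helper names are our own. PyMuPDF is imported inside the function so that the footer-cleaning part can be used on its own.

```python
import re

# Hypothetical footer pattern; the real PPM footer text is not given here.
FOOTER_PATTERN = re.compile(r"^Pobrano z ")


def strip_footer(page_text: str, pattern: "re.Pattern" = FOOTER_PATTERN) -> str:
    """Drop every line of a page that matches the footer pattern."""
    kept = [line for line in page_text.splitlines() if not pattern.search(line)]
    return "\n".join(kept).strip()


def extract_text(pdf_path: str) -> str:
    """Extract raw text from a PDF page by page, removing the footer from each page."""
    import fitz  # PyMuPDF, imported lazily

    with fitz.open(pdf_path) as doc:
        pages = [strip_footer(page.get_text()) for page in doc]
    return "\n".join(pages)
```

Cleaning each page separately keeps the footer from ending up in the middle of a sentence that continues on the next page.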
But this wasn’t the end of preprocessing. Obviously, putting the whole dissertation into any machine learning model is an impossible task due to its size and constraints on model input. This is why the text of each dissertation had to be split into smaller chunks that fit into the models. We tried various chunk sizes to obtain the best possible results.
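One simple way to split a dissertation is to pack whole sentences into chunks up to a word budget, as sketched below. This is an assumption made for illustration: the article does not say how the chunks were built, and a real pipeline would more likely count model tokens than words. The default of 200 words is arbitrary.

```python
import re
from typing import List


def split_into_chunks(text: str, max_words: int = 200) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_words words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        if not sentence:
            continue
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        # A single sentence longer than max_words still becomes its own chunk.
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Trying different values of `max_words` corresponds to the chunk-size experiments mentioned above.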
This approach, although optimal for machine learning models, created another challenge from the data perspective. The abstract, which we used as the ground truth, was initially created based on the whole dissertation and not a single chunk. This is why we couldn’t use the full abstract as ground truth anymore.
To solve this, we had to assign new target summaries to each chunk. We did it by computing Rouge metrics between each chunk of a dissertation and each sentence of its abstract, and keeping those pairs that scored above a certain threshold. As explaining the Rouge metric could be an article on its own, you can read more about it in this work. All of those steps gave us a good starting point to work on our summarisation problem.
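The mapping can be sketched as follows. To stay self-contained, the example implements a simplified ROUGE-1 recall by hand (the share of the sentence's words that also occur in the chunk) instead of calling a ROUGE package. Which ROUGE variant and which threshold were actually used is not stated, so both are assumptions here.

```python
from collections import Counter
from typing import Dict, List


def rouge_1_recall(chunk: str, sentence: str) -> float:
    """Share of the sentence's words (with multiplicity) that also occur in the chunk."""
    chunk_counts = Counter(chunk.lower().split())
    sentence_counts = Counter(sentence.lower().split())
    total = sum(sentence_counts.values())
    if total == 0:
        return 0.0
    overlap = sum((chunk_counts & sentence_counts).values())
    return overlap / total


def assign_targets(
    chunks: List[str], abstract_sentences: List[str], threshold: float = 0.5
) -> Dict[int, List[str]]:
    """Map each chunk index to the abstract sentences scoring above the threshold."""
    targets: Dict[int, List[str]] = {}
    for i, chunk in enumerate(chunks):
        matched = [s for s in abstract_sentences if rouge_1_recall(chunk, s) > threshold]
        if matched:
            targets[i] = matched
    return targets
```

Chunks that match no abstract sentence are left out of the result, so they would not contribute a training pair in this sketch.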
If You Want to Improve Your Model, Firstly Improve Your Data
To further improve the quality of our dataset, we dived even deeper. We noticed that some (chunk, sentence) pairs had high Rouge scores only because they had some unimportant words in common. Those words included conjunctions like “and”, “or”, “but”, etc. We called these words “stop words” and filtered them out before calculating the Rouge metric. We also decided to ignore sentences of the abstracts that were very short (fewer than four words), because they too were easily getting high Rouge scores while not being especially meaningful.
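Both filters are easy to express in code. The stop-word list below is a small illustrative subset of Polish conjunctions, not the list used in the project, and the helper names are hypothetical.

```python
from typing import List

# Illustrative subset of Polish conjunctions; the actual list is an assumption.
STOP_WORDS = {"i", "oraz", "lub", "albo", "ale", "a", "że"}


def remove_stop_words(tokens: List[str]) -> List[str]:
    """Drop stop words so they cannot inflate the overlap score."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def is_long_enough(sentence: str, min_words: int = 4) -> bool:
    """Keep only abstract sentences with at least min_words words."""
    return len(sentence.split()) >= min_words
```

For example, `remove_stop_words` would be applied to the tokens of both the chunk and the abstract sentence before scoring, and `is_long_enough` would be used to skip abstract sentences before any score is computed.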
As Polish is morphologically richer than English, we also focused on the specifics of the language. We noticed that the original tokenizer from the Rouge implementation we used ignores all letters except A-Z, and therefore works poorly for Polish texts. The Polish alphabet has 9 letters that English does not have (ą, ć, ę, ł, ń, ó, ś, ź, ż), but we don’t use the English Q, X and V, so we created our own tokenizer to handle the differences. We also used a stemmer, so that the Rouge metrics work regardless of grammatical form. This is particularly important for Polish, because of the many different forms the same word can have.
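The sketch below shows why an ASCII-only tokenizer breaks Polish words and how a tokenizer that knows the Polish letters avoids it. The stemmer is a deliberately crude suffix stripper with a hand-picked suffix list, included only to illustrate the idea; it is not the stemmer used in the project, and a real one would handle Polish inflection far more carefully.

```python
import re
from typing import List

ASCII_TOKEN = re.compile(r"[a-z0-9]+")
POLISH_TOKEN = re.compile(r"[a-ząćęłńóśźż0-9]+")

# Toy suffix list, ordered so that longer suffixes are tried first.
SUFFIXES = ("ami", "ach", "owi", "em", "ie", "ą", "ę", "y", "a", "u", "i")


def tokenize_ascii_only(text: str) -> List[str]:
    """Mimics a tokenizer that keeps only a-z and digits."""
    return ASCII_TOKEN.findall(text.lower())


def tokenize_polish(text: str) -> List[str]:
    """Keeps Polish diacritic letters inside tokens."""
    return POLISH_TOKEN.findall(text.lower())


def crude_stem(token: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix if at least min_stem characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token
```

With the ASCII-only pattern, "wątroby" is split into the fragments "w" and "troby" and "żółć" disappears entirely, while the Polish pattern keeps both words whole. After stemming, different forms such as "wątroba", "wątroby" and "wątrobą" map to the same stem and can match each other.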
Last but not least, we also experimented with the metric used to decide on the mapping. We tried different combinations of the Rouge metrics implemented in the package and, by trial and error, found the one that gave the best results.
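One possible way to combine several Rouge variants is a weighted average, sketched below with hand-written recall versions of ROUGE-1, ROUGE-2 and ROUGE-L. The weighting scheme, the metric names used as dictionary keys and the choice of recall are all assumptions; the article does not say which combination was finally used.

```python
from collections import Counter
from typing import Dict, List


def ngram_recall(reference: List[str], candidate: List[str], n: int) -> float:
    """Share of the reference n-grams that also occur in the candidate."""
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    total = sum(ref.values())
    return sum((ref & cand).values()) / total if total else 0.0


def lcs_recall(reference: List[str], candidate: List[str]) -> float:
    """Longest common subsequence length divided by the reference length."""
    prev = [0] * (len(candidate) + 1)
    for r in reference:
        cur = [0]
        for j, c in enumerate(candidate, 1):
            cur.append(prev[j - 1] + 1 if r == c else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1] / len(reference) if reference else 0.0


def combined_score(
    reference: List[str], candidate: List[str], weights: Dict[str, float]
) -> float:
    """Weighted average of the selected metrics; keys of weights choose the metrics."""
    scores = {
        "rouge1": ngram_recall(reference, candidate, 1),
        "rouge2": ngram_recall(reference, candidate, 2),
        "rougeL": lcs_recall(reference, candidate),
    }
    total_weight = sum(weights.values())
    return sum(w * scores[name] for name, w in weights.items()) / total_weight
```

Trying a different combination then amounts to changing the `weights` dictionary, for example `{"rouge1": 1.0}` to use ROUGE-1 alone or `{"rouge1": 1.0, "rougeL": 2.0}` to favour word order.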
Final Step: The Model
We fine-tuned the Polish language models PL-T5 and RoBERTa on the summarisation task using the created datasets. This is a fairly standard procedure in NLP, because it lets us reuse all the information about language already encoded in the model while adjusting it to our own task and its specifics. This step was also uniquely rewarding, as telDoc in fact created the Polish RoBERTa model!
However, we do not want to share the final results or make bold claims about performance, as we know that training the models wasn’t entirely separate from dataset creation. Although we presented the history of our project in a very structured and linear way, it was in fact iterative. Every time we trained a new model, we used the information about its performance to develop ideas for improving our dataset, and then retrained the model.
Still, we believe that even without making the scores public, our insights can bring value to the machine-learning community. Believing in trustworthy and reliable machine learning, we hope that the insights and conclusions we have will be equally important as simply sharing a table of numbers with performance.
We can’t wait for the future proposed to us by telDoc. Automated medical assistants could solve many painful problems in the healthcare industry like long queues, doctors with limited time for us during the visit or delays in medical facilities. It’s the best example of how technology can go hand in hand with people to create a better future for us all.
As MIM Solutions, we are proud of the summarisation module we delivered. Even in this article, you can see what the most important steps of the machine learning process are for us. We believe that high-quality data are and will always be a core strength of those systems. They allow us, engineers, to better understand the problem and create trustworthy and reliable solutions. In the near future, telDoc plans a further collaboration with us to test our modules in real-life scenarios.
If you ever need experts like us, who want to revolutionise how we deliver healthcare with artificial intelligence, do not hesitate to contact us! Our expert Adam Dobrakowski is always happy to discuss the collaboration – just send him a message.