Patients share many experiences on forums dedicated to their disease, such as experiences with side effects, daily obstacles or financial issues. The experiences that interest us most are those that describe how patients cope with the problems they face. For instance, a patient might feel nauseous every time they take their medication unless they drink milk first. These experiences could be very valuable to other patients, because they could directly improve their quality of life.
They are, however, also very valuable for future clinical research, because they could lead to new insights into the disease or even new treatments.
Unfortunately, these experiences often go unnoticed by other patients and researchers because they disappear under a ton of new messages. In my PhD project, we aim to retrieve the anecdotal experiences and transform them into new knowledge. This knowledge could then be used as input for clinical research.
There are a number of major challenges: the main one is that computers do not understand language. We need to teach them to understand it to such an extent that they can extract the relevant information from the data and differentiate between what is already known by researchers and what is new knowledge.
One obstacle is that language on social media contains many abbreviations, typos and spelling mistakes. So, the first step of my project will be to clean the data and correct these as best we can. Another problem is that not all posts contain relevant information, so we will need to design a filter that finds the posts in which a patient describes a personal experience and gives advice based on it.
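To make the cleaning and filtering steps concrete, here is a minimal sketch in Python. It is not the project's actual method: the abbreviation list, the vocabulary, the cue words and the function names are all hypothetical and chosen only for illustration. Typos are corrected by fuzzy matching against a small word list, and the filter is a crude keyword rule where a real system would more likely use a trained classifier.

```python
import difflib
import re

# Hypothetical, hand-made resources for illustration only.
ABBREVIATIONS = {"meds": "medication", "dr": "doctor", "b4": "before"}
VOCABULARY = ["medication", "nauseous", "milk", "doctor",
              "before", "drink", "take", "feel"]

# Assumed cue words; a real filter would be learned from labelled posts.
FIRST_PERSON = {"i", "my", "me"}
COPING_CUES = {"unless", "helps", "helped", "instead", "try", "tried"}


def normalize(post: str) -> str:
    """Lowercase a post, expand abbreviations and fix near-miss spellings."""
    tokens = re.findall(r"[a-z0-9]+", post.lower())
    cleaned = []
    for token in tokens:
        if token in ABBREVIATIONS:
            cleaned.append(ABBREVIATIONS[token])
        elif token in VOCABULARY or len(token) < 4:
            # Known words and very short tokens are left alone.
            cleaned.append(token)
        else:
            # Replace the token only if a vocabulary word is very similar.
            match = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.8)
            cleaned.append(match[0] if match else token)
    return " ".join(cleaned)


def looks_like_coping_experience(post: str) -> bool:
    """Keep posts that use the first person and contain a coping cue word."""
    tokens = set(normalize(post).split())
    return bool(tokens & FIRST_PERSON) and bool(tokens & COPING_CUES)


print(normalize("I feel nausous b4 I take my meds"))
print(looks_like_coping_experience("I feel nauseous unless I drink milk first"))
```

The high similarity cutoff is a deliberate choice in this sketch: on medical text it is safer to leave an unknown word untouched than to "correct" a drug name into an ordinary word.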
Next, we plan to use a medical database to find medical concepts in the text. We can then compare what we find to what is already known, which allows us to identify what is new knowledge. The last major question of the project will be: how trustworthy is this new knowledge? Can the knowledge be ranked based on how likely it is to be true?
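The concept lookup and the comparison with existing knowledge could be sketched as follows. Everything here is an assumption made for illustration: the tiny lexicon stands in for a real medical database, the concept identifiers are invented, and the set of "known" symptom and remedy pairs is a placeholder for what researchers already know. The sketch does not address the trustworthiness question, which remains open.

```python
import re

# Hypothetical mini-lexicon standing in for a real medical database.
CONCEPTS = {
    "nauseous": "symptom:nausea",
    "nausea": "symptom:nausea",
    "headache": "symptom:headache",
    "milk": "intervention:milk",
    "ginger tea": "intervention:ginger",
}

# Placeholder for pairs that are already described in the literature.
KNOWN_PAIRS = {("symptom:nausea", "intervention:ginger")}


def find_concepts(text: str) -> set:
    """Return the concept identifiers whose phrases occur in the text."""
    lowered = text.lower()
    found = set()
    for phrase, concept in CONCEPTS.items():
        # Whole-word match so that "milk" does not fire inside "milkshake".
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            found.add(concept)
    return found


def candidate_new_pairs(text: str) -> set:
    """Pair every symptom with every intervention and drop the known pairs."""
    concepts = find_concepts(text)
    symptoms = [c for c in concepts if c.startswith("symptom:")]
    interventions = [c for c in concepts if c.startswith("intervention:")]
    pairs = {(s, i) for s in symptoms for i in interventions}
    return pairs - KNOWN_PAIRS


print(candidate_new_pairs("I feel nauseous unless I drink milk first"))
print(candidate_new_pairs("Ginger tea helps my nausea"))
```

In this toy setup, the first post yields a nausea and milk pair that is absent from the known set, while the second post yields nothing new because its pair is already listed.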