Using machine learning to detect patient experiences

Patients share many experiences on forums that are dedicated to their disease. These experiences contain so-called experiential knowledge, or knowledge that you acquire from experience. Before we can extract this knowledge and use it to drive clinical research, we first need to identify which forum posts actually contain experiences. To do this automatically, we used machine learning techniques. Automating this step makes it technically possible to apply it at a large scale to any forum, regardless of its size.

Machine learning is basically when you get the computer to figure out for itself whether something belongs to category A (it IS an experience) or category B (it is NOT). To do this, you need to feed the computer a bunch of sentences that are experiences and a bunch that are not. It can then, for example, figure out which words predict that something is an experience. You can also feed the algorithm more complex features, like blocks of 4 letters, or more abstract ones, like how positive a statement is, and it will figure out which of those features predict that something is an experience. The model then uses all of the words (or other features) of a forum post to draw a final conclusion about whether the post contains an experience or not.
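
To make this a bit more concrete, here is a minimal sketch of such a classifier in Python. It is not the model from our study: it assumes a multinomial naive Bayes classifier, the training sentences are made-up toy examples, and the function names (`features`, `train`, `predict`) are hypothetical. Each sentence is turned into its lowercased words plus its blocks of 4 letters (character 4-grams), and the model adds up the evidence from all of those features to decide between "experience" and "not an experience".

```python
import math
import re
from collections import Counter

def features(text):
    """Hypothetical helper: lowercased words plus 4-letter blocks (character 4-grams)."""
    words = re.findall(r"[a-z']+", text.lower())
    grams = []
    for w in words:
        padded = f"_{w}_"  # underscores mark the start and end of the word
        grams += [padded[i:i + 4] for i in range(len(padded) - 3)]
    # Prefix 4-grams with '#' so a 4-letter gram never collides with a 4-letter word
    return words + ["#" + g for g in grams]

def train(examples):
    """examples: list of (sentence, label) pairs, where label True = experience."""
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, label in examples:
        counts[label].update(features(text))
        docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab

def log_score(model, text, label):
    """Log prior plus summed log likelihood of every feature under one class."""
    counts, docs, vocab = model
    total = sum(counts[label].values())
    score = math.log(docs[label] / sum(docs.values()))
    for f in features(text):
        # Add-one (Laplace) smoothing so unseen features don't zero out the score
        score += math.log((counts[label][f] + 1) / (total + len(vocab)))
    return score

def predict(model, text):
    """True if the text looks more like an experience than not."""
    return log_score(model, text, True) > log_score(model, text, False)

# Toy training data, invented for illustration only
training = [
    ("I was diagnosed last year and my scan was clear", True),
    ("My dose of imatinib was lowered after the side effects", True),
    ("I had surgery in March and was back at work in June", True),
    ("I will pray for you", False),
    ("You may want to ask your doctor", False),
    ("Sending you strength, you will get through this", False),
]
model = train(training)
print(predict(model, "I was on imatinib for two years"))       # → True
print(predict(model, "We will pray for you and your family"))  # → False
```

Naive Bayes is used here only because it is short and easy to follow; any classifier that learns weights for word and 4-letter features would illustrate the same idea.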

We found that certain types of words are predictive of sentences that DO contain a personal experience, namely words related to:

  • health, like "imatinib" (a type of cancer medication)
  • first-person narrative, like "I" or "my"
  • past tense, like "was"

Words that predict that a sentence does NOT contain an experience are related to:

  • emotional support, like "pray"
  • second-person narrative, like "you"
  • future tense, like "will" and "may"

This is interesting because it shows that when patients share experiences, these experiences are often about themselves (and not someone else) and about their health. The remaining posts, on the other hand, seem focused on giving emotional support to others. These are not necessarily the only characteristics of posts with and without experiences, but they are the ones that differ most between the two types of posts.
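
One simple way to surface words like these, assuming you already have sentences labelled as experience or not, is to rank every word by its smoothed log-odds ratio: how much more often it shows up in experience sentences than in the other sentences. The sketch below uses invented toy sentences and a hypothetical `rank_words` function; it is not necessarily how the predictive features in our study were identified.

```python
import math
import re
from collections import Counter

def rank_words(examples, top=4):
    """Return (most experience-like words, most non-experience-like words)."""
    pos, neg = Counter(), Counter()
    for text, is_experience in examples:
        words = re.findall(r"[a-z']+", text.lower())
        (pos if is_experience else neg).update(words)
    vocab = set(pos) | set(neg)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())

    def log_odds(w):
        # Add-one smoothing keeps words seen in only one class finite
        p = (pos[w] + 1) / (n_pos + len(vocab))
        q = (neg[w] + 1) / (n_neg + len(vocab))
        return math.log(p / q)

    # Ties are broken alphabetically so the output is deterministic
    experience = sorted(vocab, key=lambda w: (-log_odds(w), w))[:top]
    non_experience = sorted(vocab, key=lambda w: (log_odds(w), w))[:top]
    return experience, non_experience

# Toy labelled sentences, invented for illustration only
examples = [
    ("I was on imatinib and my scan was clear", True),
    ("My imatinib dose was lowered", True),
    ("I was tired on imatinib", True),
    ("I will pray for you", False),
    ("You will be fine, we pray for you", False),
    ("You may want to rest", False),
]
print(rank_words(examples))
# → (['was', 'imatinib', 'my', 'on'], ['you', 'for', 'pray', 'will'])
```

Even on this tiny example, past tense, medication names and first-person words float to the top on one side, while "you", "pray" and "will" dominate the other.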

So how much of the forum is filled with posts in which people share their experiences? About 37% in the case of the GIST International Support Facebook group. But what are they talking about? To figure that out, we used specialized topic modelling techniques and uncovered 14 different topics of conversation.

  1. Location of the tumor
  2. Emotional coping with having the disease
  3. Duration of treatment
  4. Types of scans (like PET or CT)
  5. Getting diagnosed with GIST
  6. Medication other than the first-line treatment imatinib
  7. Side effects
  8. Tumor surgery
  9. No recurrence of the tumor
  10. Recurrence of the tumor, or going back to work or to a previous medication
  11. Emotional support
  12. Dosage of medication
  13. Timing of scans
  14. Taking imatinib pills
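
The post does not spell out which topic modelling technique was used, so the sketch below assumes plain LDA (latent Dirichlet allocation) fitted with collapsed Gibbs sampling, a standard topic model that is not necessarily the specialized one from our study. The idea: every word in every post is assigned to one of `k` topics, and the sampler repeatedly reassigns each word based on how popular each topic is in that post and how often the word already appears in that topic. The function name `lda_gibbs`, the toy posts, and the values of `alpha`, `beta` and `iters` are illustrative assumptions.

```python
import random
from collections import Counter

def lda_gibbs(docs, k, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Hypothetical LDA sketch. docs: list of token lists.
    Returns (top words per topic, per-document topic counts)."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})
    doc_topic = [[0] * k for _ in docs]          # words in doc d assigned to topic t
    topic_word = [Counter() for _ in range(k)]   # times word w is assigned to topic t
    topic_total = [0] * k                        # total words assigned to topic t
    assign = []
    # Start from a random topic for every word
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            t = rng.randrange(k)
            z_doc.append(t)
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1
        assign.append(z_doc)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = assign[d][i]
                # Take this word out of the counts before resampling its topic
                doc_topic[d][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                # Unnormalized probability of each topic given all other assignments
                weights = [
                    (doc_topic[d][j] + alpha) * (topic_word[j][w] + beta)
                    / (topic_total[j] + V * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                assign[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1
    topics = [[w for w, c in topic_word[t].most_common(3) if c > 0] for t in range(k)]
    return topics, doc_topic

# Toy, already-tokenized posts invented for illustration only
docs = [
    "pet ct scan results clear".split(),
    "ct scan every three months".split(),
    "next scan in three months".split(),
    "nausea fatigue side effects imatinib".split(),
    "side effects nausea rash".split(),
    "fatigue rash side effects".split(),
]
topics, doc_topic = lda_gibbs(docs, k=2)
print(topics)     # top words per topic
print(doc_topic)  # how many words of each post went to each topic
```

On real forum data you would also remove stop words and choose `k` by inspecting the resulting topics, which is how a number like 14 is typically settled on.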

Of course, this is just the start. Next, we would like to automatically figure out what patients are actually saying about these topics. Stay tuned!


This work will be presented and published at the European Conference on Information Retrieval (ECIR) in Cologne, Germany, this Sunday (14 April 2019).