Welcome to the Text Mining and Retrieval mailing list. This list is mainly used for sending invitations and announcements, including announcements of the group meetings.
I am a researcher at the intersection of Natural Language Processing (NLP) and Information Retrieval (IR). My current research focus is on text mining and retrieval in complex domains.
Since 2017, I have been employed as an Assistant Professor at the Leiden Institute of Advanced Computer Science (LIACS). I am affiliated with the Data Science Research Programme of Leiden University. I am group leader of Text Mining and Retrieval Leiden.
I currently supervise projects that implement and evaluate text mining and retrieval methods in a variety of domains. My group works with a wide range of textual data: grey literature reports, scientific and legal publications, EU law texts, health records, user-generated content in online patient communities (discussion forums), and news posts on social media.
I collaborate with many public and private partners, such as Sanoma, YoungCapital, the Ministry of Foreign Affairs, the Dutch National Institute for Public Health and the Environment (RIVM), Hogeschool Codarts, and KankerNL.
News and updates
December 2019:
- I am proud to be one of the nominees for the Faculty of Science’s Teacher of the Year award!
- I am currently recruiting two international PhD students in the H2020 project DoSSIER, on Domain-specific Information Extraction and Retrieval. Application web page.
- I am a member of the newly established Ethics Review Committee of the Science Faculty
- I am general co-chair of MISDOOM 2020: the 2nd Multidisciplinary International Symposium on Disinformation in Open Online Media, to be held on April 20-22, 2020 in Leiden
November 2019:
- I presented two posters at the Dutch-Belgian Information Retrieval workshop: one together with Benjamin van der Burgh about experiments on Dutch data showing the benefits of ULMFiT for classification with small datasets, and one with Alex Brandsen on the release of the Dutch BERT model, BERT-NL.
- My group (in particular Alex Brandsen and Benjamin van der Burgh) has published a number of Dutch-language data sets, together with pre-trained BERT and ULMFiT language models on textdata.nl.
- NWO published the First national research agenda for Artificial Intelligence (AIREA-NL), for which I was a member of the expert committee
- On November 5, I gave an invited lecture on Text Mining for Lexicography at the Lorentz workshop “the future of academic lexicography”. There was a large, interested and engaged crowd. I uploaded the slides here.
October 2019:
- Pre-print just out: “The merits of Universal Language Model Fine-tuning for Small Datasets – a case with Dutch book reviews” by Benjamin van der Burgh and Suzan Verberne on evaluating the effectiveness of ULMFiT for small training sets. Paper on arXiv. Data on GitHub.
- Vacancies! 15 fully-funded PhD positions on Domain-Specific Search (EU Marie Curie Action project). See the webpage of the DoSSIER project. I am hiring for projects 6 and 7, addressing transparency and explainability in legal search. Note that if you have been living in the Netherlands for the last 2 years, you cannot apply for my projects, but you can apply for those in the other countries.
September 2019:
- On September 10, I was one of the speakers at the event ‘DIT WORDT HET NIEUWS’, on the future of journalism. My message to the chief editors of three national newspapers: Good data journalism is the key.
- First Text Mining lecture of the semester with a packed room!
- It is the start of the academic year! We welcome new master's students in our group for research projects and thesis projects!
- The course schedule for the Text Mining master course has been published here
August 2019:
- Just published in Multimodal Technologies and Interaction: “Data-Driven Lexical Normalization for Medical Social Media” by Anne Dirkson, Suzan Verberne, Abeed Sarker and Wessel Kraaij
- The result of our interdisciplinary collaboration on junk news was published in PLOS ONE! “The reach of commercially motivated junk news on Facebook” by Peter Burger, Soeradj Kanhai, Alexander Pleijter, Suzan Verberne.
July 2019:
- I presented the paper “Extracting and Matching Patent In-text References to Scientific Publications” (with Ioannis Chios and Jian Wang) in the BIRNDL workshop at SIGIR 2019 in Paris.
- I chaired the Women in IR session during SIGIR 2019 in Paris.
- I was interviewed for the Volkskrant on the topic of Natural Language Processing for Scientific Discovery: “Hoe kunstmatige intelligentie nieuwe kennis opduikt uit miljoenen wetenschappelijke artikelen” (July 19, 2019)
June 2019:
- UMUAI just published our paper “Personalized support for well-being at work: an overview of the SWELL project” (Wessel Kraaij et al.), a condensed overview of the SWELL project (2011-2017) on how to use commodity sensors to interpret behaviour, affect and health status for healthy work-style coaching.
- Our paper “Extracting and matching patent in-text references to scientific publications” (Suzan Verberne, Ioannis Chios and Jian Wang) has been accepted as full paper for the BIRNDL workshop at SIGIR!
- On June 18, I was a talk show guest at the Science Café Den Haag, on the theme Big data.
- I am honored to be associate editor for ACM Transactions on Information Systems (TOIS)!
- The first Dutch meeting on Clinical NLP (programme), which I organized in Utrecht on June 12, was a big success! Some photos and summaries can be found here.
- On June 11, I was one of the invited speakers in NEMO Kennislink Life, on the theme “Geen nieuws is goed nieuws”
May 2019:
- Our proposal “Constructing a Unified Knowledge Base by joint Deep Learning from images and text” in the Innovation talent programme Leiden-XJTU joint PhD on AI/Bioscience was accepted! This means that Xue Wang will continue her PhD project at LIACS under the supervision of Fons Verbeek and me.
- Our paper “Lexical Normalization of User-Generated Medical Forum Data” (Anne Dirkson, Suzan Verberne, and Wessel Kraaij) was accepted for the workshop on Social Media Mining for Health Applications at ACL 2019. In addition, Anne successfully participated in all shared tasks at the SMM4H workshop, resulting in the paper “Transfer learning for health-related Twitter data”.
- Together with a number of LIACS colleagues, I am part of the project Curriculum Development in Data Science and Artificial Intelligence / DS&AI, funded by Erasmus+
- The spring semester is almost finished! I taught my last lecture in the Data Science bachelor course. The final course schedule can be found here.
- I organize an open meeting in Utrecht on clinical NLP on June 12, with 8 presentations (academic and non-academic) and discussion on the challenges of using text data in health records for knowledge extraction and predictive models. Program and registration.
- I am one of the course coordinators of the SIKS course ‘Advances in Information Retrieval’, together with Arjen de Vries and Djoerd Hiemstra. The course takes place on October 8th and 9th in Utrecht.
April 2019:
- I was at ECIR 2019 in Cologne together with my PhD students Anne, Gineke, and Juan. They presented their work in two workshops on Sunday.
- We are having a Women in IR meeting at ECIR 2019, on Wednesday April 17. If you plan to attend ECIR then please sign up for the WIR meeting here.
- Together with Michael Emmerich and Frank Takes, I represent LIACS in the RISE_SMA project on Social Media Analytics for Society and Crisis Communication. RISE_SMA is an interdisciplinary, international network combining excellent scholars and practitioners. We have the kick-off meeting on 15 and 16 April in Duisburg, Germany.
- Leiden University published a research dossier on Artificial Intelligence.
March 2019:
- I became co-chair of the worldwide Women in IR network, together with Nazli Goharian (Georgetown University)
- I presented a brief summary of ICT with Industry during ICT.Open 2019 and announced the call for 2020
- Now online: “User Requirement Solicitation for an Information Retrieval System Applied to Dutch Grey Literature in the Archaeology Domain” by Alex Brandsen, Karsten Lambers, Milco Wansleeben, Suzan Verberne.
- Juan Bascur Cifuentes has joined my group with his PhD project “Interactive visual browsing and retrieval of scientific literature”, co-supervised by Ludo Waltman and Nees-Jan van Eck of the CWTS.
- Our paper “Narrative detection in online patient communities” (work by Anne Dirkson) has been accepted for the Text2Story2019 workshop at ECIR!
February 2019:
- The paper “Digging in Documents; applying Information Retrieval techniques to the grey literature problem in Dutch archaeology” by Alex Brandsen, Karsten Lambers, Milco Wansleeben, and Suzan Verberne was accepted for publication in the Journal of Computer Applications in Archaeology!
- The paper “Citation Metrics for Legal Information Retrieval Systems” by Gineke Wiggers and Suzan Verberne was accepted for presentation in the Workshop on Bibliometric-enhanced Information Retrieval (BIR 2019) at ECIR!
- The paper “Predicting life expectancy with a recurrent neural network” by Merijn Beeksma, Suzan Verberne, Antal van den Bosch, Iris Hendrickx, Enny Das, Stef Groenewoud was accepted for publication in BMC Medical Informatics and Decision Making.
- On February 13, 2019, I was an examiner for the PhD defense of David Maxwell at the University of Glasgow. He defended his thesis titled “Modelling Search and Stopping in Interactive Information Retrieval”. On the day before the event I presented my work on evaluating complex search tasks in the Interaction in Information Retrieval workshop.
- The semester has started! I am teaching the elective bachelor course on Data Science.
- I was interviewed for the Women in Science day at our university.
January 2019:
- The paper “Query-based summarization of discussion threads” by Suzan Verberne, Emiel Krahmer, Sander Wubben, Antal van den Bosch has been accepted for publication in Natural Language Engineering!
- ICT with Industry 2019 was a success! As previous organizer, I am now a member of the steering committee for the next three years.
- Our work “The reach of commercially motivated junk news on Facebook” (pre-print on arXiv) received a lot of media attention on 24/25 January.
- The paper “Analyzing Empowerment Processes among Cancer Patients in an Online Community: A text mining approach” by Suzan Verberne, Anika Batenburg, Remco Sanders, Mies van Eenbergen, Enny Das, Mattijs S. Lambooij has been accepted for publication in JMIR Cancer!
December 2018:
- I will be Doctoral Consortium chair for ECIR 2020.
- My research group Text Mining & Retrieval became one of the Special Interest Groups of the Data Science Research Programme
- I am invited editor of the special issue “Text Mining in Complex Domains” of the MTI journal.
- The Dutch-Belgian Information Retrieval workshop 2018 (DIR 2018), organized in Leiden in November 2018, was a success! The proceedings are available on arXiv.