Information Foraging summer schools 2011-2013

EU Intensive Programme “Information Foraging” 2011-2013 (Summer School)

In 2011, 2012 and 2013, IFL @ Radboud University Nijmegen organized a summer school on “Information Foraging”, funded by an Intensive Programme (IP) grant from the EU Lifelong Learning Programme Erasmus. Participating universities in this IP were the University of Glasgow, the University of Tampere, Université Paul Sabatier, KU Leuven and Universität Duisburg-Essen (2011). Course coordinators were Prof. W. Kraaij and Prof. Th.P. van der Weide. The summer school is endorsed by SIKS, the Dutch research school for Information and Knowledge Systems. In 2012 and 2013, the University of Amsterdam, the Royal School of Library and Information Science (DK), the University of Hildesheim, the University of Sheffield, the University of Milano Bicocca and the University of Strathclyde joined the cooperation.

About the course theme

The goal of the course was to introduce students to theoretical models and technology related to all facets of (professional) interaction with information in an information seeking context. The topic ‘Information Foraging’ reflects the shift of attention in information retrieval research from static document statistics towards i) on-line systems that are designed for user interaction and ii) exploiting the collective information access behavior of communities of users. The course has been designed for master or PhD students in computer science, information science or artificial intelligence. It is assumed that students have some basic knowledge of information retrieval methods and models.

The IP is organized in the context of the IRUN International Research Universities Network, funded through the EU Lifelong Learning programme. In addition, the course is part of the advanced programme of the SIKS research school. Additional student funding has been obtained from the ESF ELIAS programme.

Programme 2011

The course consists of 10 days, from 22/8/2011 until 2/9/2011. A typical course day consists of 3 hours of lectures in the morning, a lunch break, and practical assignments and exercises in the afternoon. All lectures will be given in English. Participating universities will credit this course with 4 ECTS.

Lectures are grouped around four subthemes:
1. Information seeking behaviour
- Cognitive models of information seeking – Norbert Fuhr (Universität Duisburg-Essen)
- User oriented and task based IR research – Jaana Kekalainen (University of Tampere)
2. Interaction
- Modern interfaces for IR systems – Norbert Fuhr (Universität Duisburg-Essen)
- Monitoring interaction with eyetracking – Sascha Kriewel (Universität Duisburg-Essen)
- Simulated evaluation methodology – Joemon Jose (University of Glasgow)
- Social signal processing – Alessandro Vinciarelli (University of Glasgow)
- Interactive IR systems for multimedia – Tinne Tuytelaars (Katholieke Universiteit Leuven)
- Question answering – Marie-Francine Moens (Katholieke Universiteit Leuven)
3. Context and personalized search
- Context and personalization – Mohand Boughanem (IRIT – Université Paul Sabatier Toulouse)
- Improved accessibility for target groups – Leif Azzopardi (University of Glasgow)
4. Exploiting explicit and implicit social annotations
- Affective IR – Joemon Jose (University of Glasgow)
- Logfile analysis – Wessel Kraaij (Radboud University Nijmegen)
- Collaborative filtering – Theo van der Weide (Radboud University Nijmegen)
- Social tagging – Sascha Kriewel (Universität Duisburg-Essen)

Lecturers 2012

Pia Borlund, Royal School of Library and Information Science

Ian Ruthven, Strathclyde University

Jaap Kamps, University of Amsterdam

Elaine Toms, University of Sheffield

Mohand Boughanem, Université Paul Sabatier / IRIT

Leif Azzopardi, University of Glasgow

Keith van Rijsbergen, University of Glasgow

Gabriella Pasi, Universita Milano Bicocca

Norbert Fuhr, University of Duisburg-Essen

Sascha Kriewel, University of Duisburg-Essen

Theo van der Weide, Radboud University

Wessel Kraaij, Radboud University

Marie-Francine Moens, KU Leuven

Detailed programme

Programme 2013

Detailed programme

Lecture abstracts and bios

BIRGER LARSEN, Royal School of Library and Information Science

Birger Larsen is Associate Professor and leader of the research group on Information Systems and Interaction Design at the Royal School of Library and Information Science, in Copenhagen, Denmark. He has a passion for research that involves the activities, processes and experiences arising in the meeting between users, information, and information systems in a given context – with the goal of optimising these to empower users in their task and problem solving. His main research interests include Information Retrieval (IR), structured documents in IR, XML IR and user interaction, exploiting context in IR, Informetrics/Bibliometrics, citation analysis and quantitative research evaluation.
He is involved in a number of research projects on information retrieval and information access to large repositories of cultural heritage and scientific documents, and is active in organizing conferences and workshops.

LECTURE: INFORMATION RETRIEVAL SYSTEMS AND MODELS

Search is a major enabler in almost all information access and use. Most current search applications, including web search engines, work because of their roots in a long line of research in the field of Information Retrieval (IR). The aim of IR systems is to retrieve many or all relevant documents for a user’s information need – and at the same time to retrieve as few irrelevant documents as possible. This course will introduce the main components of IR systems: how to build search engine indexes, how to match information needs to documents, and how to rank documents effectively. The main matching and ranking models will be introduced, including Boolean, Vector Space, language modeling and probabilistic models. Incorporation of user feedback and link popularity in web search will also be discussed.

JAAP KAMPS, University of Amsterdam

Jaap Kamps is associate professor of Information Storage and Retrieval at the University of Amsterdam, with over 280 publications on all facets of IR. He is a leading researcher on novel access methods for digital information and an active organizer of international workshops, conferences, and benchmarking initiatives for the evaluation of novel search technology. He leads several large research projects (funded by the NWO and the EU) in the area of search technology, in particular for digital heritage in collaboration with major archives, libraries, and museums.

LECTURE: EVALUATION 101: IS IT ANY GOOD AND WHY?

Evaluation is key in information access, yet scientifically much evaluation is focusing on the narrow question of what system is best. All research is guided by research questions, and the choice of method (including evaluation methods) should be determined by the research question at hand. Standard evaluation benchmarks in the Cranfield/TREC paradigm have served our field well by quantifying system effectiveness in a meaningful way and by aligning the research agendas of many groups around the world. But this is currently under challenge by rapid recent developments, requiring us to rethink our methods. Our content is changing, and structured content — both in terms of document structure and annotations, as well as the overall collection structure — is prevalent. Also any action on the Web leaves its trails and rich contextual information is available, especially in a mobile setting. So also the search context is changing — with rich information about the task, the searcher, and prior interactions becoming available. How do such changes factor into the old search problem?

IAN RUTHVEN, University of Strathclyde

Ian Ruthven is a Professor of Information Seeking and Retrieval in the Department of Computer and Information Sciences at the University of Strathclyde. He graduated from the University of Glasgow with a BSc in Computing Science, before completing a Master’s in Cognitive Science at the University of Birmingham and a PhD in Interactive Information Retrieval at Glasgow. He mostly works in the area of information seeking and retrieval: understanding how (and why) people search for information and how electronic systems might help them search more successfully. This brings in a wide range of research including theoretical research on the design and modelling of information access systems, empirical research on interfaces and user interaction, and research on the methodology of evaluating information access systems. Currently, he is particularly focussed on information seeking and retrieval for children, alleviating barriers to health information seeking and the use of the Internet in situations of information poverty.

LECTURES: INFORMATION SEEKING AND INFORMATION INTERFACES

Ian Ruthven presented three lectures on information behaviour, information seeking and information interfaces. The first two lectures covered concepts such as relevance, context and ways to study information behaviour before extending into theories and methodologies for studying information seeking and creating information seeking models. The lecture on interfaces started by sketching the history of IIR systems from catalogs to internet systems and presented ways to think about interface design and how creative interface design could change user search behaviour.

PIA BORLUND, Royal School of Library and Information Science

My research focus is on the design and evaluation of systems that support users’ interactive information retrieval (IIR). I am interested in the confluence of information retrieval (IR), human-computer interaction, and information seeking behaviour from a task-based perspective. I have conducted research on frameworks and guidelines for performance evaluation of IIR systems centred on the concept of simulated work task situation by involvement of users. My current research focuses on methodological issues, experimental design and requirements for user-based performance evaluation. Specifically, I am interested in the concept of relevance, including users’ relevance assessment behaviours of different work and search tasks, as well as the nature of subjective, non-binary relevance. I am a full Professor (since 2009) at the Royal School of Library and Information Science, Denmark. I am a member of the Research Board of the Royal School and represent the Royal School in the National think tank concerning collaboration of social and humanistic sciences. I serve on the Editorial Boards of several international top journals and conference programme committees. I have published numerous papers, articles, and book chapters in top journals/conferences in the field of IIR. In addition to these professional activities, I have supervised over 40 Bachelor, Master or PhD students. I received my PhD in 2000 from Åbo Akademi University, Finland. I completed a MLISc degree (1995) and a Degree in Librarianship (1993) from the Royal School of Library and Information Science, Denmark. In the period 2006-2010 I had various trusted administrative duties and tasks at the Royal School of Library and Information Science as, e.g., Head of Department, Vice-rector, Head of Research, and Dean of Research.

LECTURES: INTERACTIVE, TASK-BASED IR EVALUATION, PART I & II

Part I starts out by establishing the focus of interactive (I)IR by relating it to the neighbouring research areas of information seeking and information behavior. Next, the concept and test instrument of a ‘simulated work task situation’ is introduced, the guidelines and requirements for its design and use are presented and discussed, and illustrative examples of successfully and less successfully designed and tailored simulated work task situations are shown and discussed in order to gain an understanding of what makes a good simulated work task situation. After this the students are given assignments on how to design and tailor simulated work task situations according to specific user groups. Part I closes with plenary discussions of the students’ simulated work task situations and the difficulties they experienced in making them.

Part II concerns the planning and design of IIR evaluation studies. In this part the ‘tool box’ of IIR evaluation studies is presented, and considerations concerning the planning of test design (such as rotation and counterbalancing of search tasks, purpose of protocols, the function and types of tutorials, and the importance of pilot testing) are discussed with strong emphasis on how IIR evaluation studies can vary in focus and hence must be designed according to the research focus of the study in question. Next, the students are introduced to assignments on how to plan and design an IIR evaluation study based on a self-defined research problem. As with Part I, Part II closes with plenary discussions of the students’ experiences and difficulties with the planning and design of an IIR evaluation study.
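As an illustration of the rotation and counterbalancing of search tasks mentioned above, here is a minimal sketch that assumes a simple cyclic rotation (a Latin square) of task orders. The function names and the participant and task labels are hypothetical, and this is not necessarily the design presented in the lecture.

```python
def rotated_task_orders(tasks):
    """Return one task order per row of a cyclic Latin square.

    Each task appears exactly once in every position across the rows,
    so position (order) effects are balanced over participants.
    """
    n = len(tasks)
    return [[tasks[(row + col) % n] for col in range(n)] for row in range(n)]


def assign_orders(participants, tasks):
    """Assign rotated task orders to participants in round-robin fashion."""
    orders = rotated_task_orders(tasks)
    return {p: orders[i % len(orders)] for i, p in enumerate(participants)}


# Hypothetical study: three simulated work tasks, six participants.
tasks = ["task A", "task B", "task C"]
participants = ["P1", "P2", "P3", "P4", "P5", "P6"]
orders = rotated_task_orders(tasks)
plan = assign_orders(participants, tasks)
```

A cyclic rotation balances the position of each task but not which task precedes which, so a study concerned with carry-over effects between specific tasks may need a different design.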

ELAINE TOMS, University of Sheffield

Elaine Toms commenced her current appointment at Sheffield in 2011 after holding positions at two universities in Canada. She is currently Head of the Information Retrieval Research Group, and Director of Research at the Information School. Her research focuses on understanding why information systems fail users and the design of systems for optimum human. This involves understanding how people work and use information and how people use existing systems to accomplish their work, where work may be work as we typically know it or any human-initiated information activity. It also includes evaluating novel tools that facilitate access to and use of information. As a result, her research lies at the intersection of human-computer interaction, information retrieval and the representation and presentation of information.

LECTURES: INFORMATION FORAGING & EVALUATION OF NON GOAL BASED APPS

She will give two lectures that augment the ones that deal with information seeking and information interfaces, and task-based IR evaluation. The focus will not be on the explicit search task, but on one in which the user is immersed in an information space without any explicit goals, or whose goals are so diffuse that an explicit search may be impossible to accomplish – that is, they are “foraging.” Lecture 1 will examine how this concept fits with existing information needs, seeking and use models, how this process is conceptualized at the interface, and which aspects of an interface support (or not) this type of interactivity.
Lecture 2 will start from the perspective of task-based IR evaluation and examine which aspects inform the evaluation of systems that support this type of interaction. It will also discuss several real world scenarios and map them to parsimonious but effective evaluation protocols that best serve this novel type of user interactivity and “task” domain.

SASCHA KRIEWEL, University of Duisburg-Essen

Sascha Kriewel received a diploma in computer science at the University of Dortmund in Germany before joining the research group of Norbert Fuhr in Duisburg. He holds a Doctorate of Engineering (Dr.-Ing.) from the University of Duisburg-Essen, where he is currently working as a team leader within the European Khresmoi project. Khresmoi is developing an information access system for the European public as well as medical professionals. His other main research interests are strategic support of users during the information seeking process and user-oriented interface design for information retrieval (IR) and digital library systems.

LECTURE: EYETRACKING AND ATTENTION METADATA

The course on “Eyetracking and attention metadata” will cover the use of eye tracking in IR research. It will explain how eye movements or pupil dilation relate to perception and cognition, and the techniques and concepts used by eye tracking equipment for gaze capturing, such as the bright/dark pupil methods in point of regard systems for remote eye tracking. The course will then discuss different methods for analyzing captured data and which parameters and measures are most suitable for various purposes in information retrieval research. Applications and current research will be presented and the use of eye tracking in the context of other methods of collecting implicit feedback, such as click data, will be discussed.

THEO VAN DER WEIDE, Radboud University Nijmegen

Th.P. van der Weide received his master’s degree in Mathematics at the Technical University Eindhoven, the Netherlands, in 1975, and the degree of Ph.D. in Mathematics and Physics from the University of Leiden, the Netherlands, in 1980. He is currently full professor in Information Retrieval and Information Systems (IRIS) in the section Digital Security of the Institute for Computing and Information Sciences (ICIS) at the Faculty of Science of Radboud University in Nijmegen, the oldest city in the Netherlands. His main research interests include information systems, information retrieval, hypertext and knowledge based systems. He is involved in the Information Foraging Lab, an interdepartmental research group focusing on the development of models and techniques supporting modern knowledge workers. IFL is a collaboration between researchers from the Institute of Computing and Information Sciences (iCIS) from the Faculty of Science and the Language and Speech Unit of the Faculty of Arts, both at Radboud University Nijmegen.

LECTURE: BIG DATA

In Information Retrieval applications, we may be confronted with the analysis of very large data sets (for example analyzing log files to detect user behavior).
This lecture discusses how such very large files are stored in a reliable way in a distributed file system such as the Hadoop file system. Then the MapReduce programming methodology is discussed as a mechanism for distributed processing of these files in an efficient way. We discuss some elementary applications. After that we focus on the MapReduce style of indexing. We also discuss the computation of PageRank using this mechanism.
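To make the MapReduce style of indexing concrete, the following is a minimal single-process sketch in plain Python. It only imitates the map, shuffle and reduce phases; it does not use Hadoop, and the function names and the two toy documents are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit a (term, doc_id) pair for every token in one document."""
    for term in text.lower().split():
        yield term, doc_id


def shuffle(pairs):
    """Group emitted values by key, as the framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term, doc_ids):
    """Reduce step: collapse the document ids of one term into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle and reduce in sequence to build an inverted index."""
    pairs = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


# Hypothetical toy collection.
docs = {1: "users forage for information", 2: "information retrieval for users"}
index = build_index(docs)
```

In a real cluster the map and reduce calls would run in parallel on different machines, with the framework performing the grouping step; the sketch keeps everything in one process so that the data flow is easy to follow.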

Very large computations require some awareness of numerically stable methods of computation. We show that even the calculation of the inner product (the most popular similarity function) may lead to numerical instability when addition is implemented in the conventional naive way. We discuss how this instability problem may be overcome by an advanced algorithm for addition.
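The abstract does not say which addition algorithm the lecture covers. As one well-known possibility, the sketch below compares a naive inner product with one that uses Kahan (compensated) summation; the function names and the test vectors are assumptions chosen to expose the effect.

```python
import math


def naive_dot(xs, ys):
    """Inner product with plain left-to-right floating-point addition."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def kahan_dot(xs, ys):
    """Inner product using Kahan compensated summation.

    `compensation` carries the low-order bits that were lost when the
    previous term was added to the running total.
    """
    total = 0.0
    compensation = 0.0
    for x, y in zip(xs, ys):
        term = x * y - compensation
        new_total = total + term
        compensation = (new_total - total) - term
        total = new_total
    return total


# One large component followed by many tiny ones (hypothetical data).
xs = [1.0] + [1e-16] * 100_000
ys = [1.0] * len(xs)

naive = naive_dot(xs, ys)
compensated = kahan_dot(xs, ys)
reference = math.fsum(x * y for x, y in zip(xs, ys))
```

With these vectors every tiny term is smaller than half the spacing between 1.0 and the next representable double, so the naive loop never moves away from 1.0, while the compensated version stays close to the `math.fsum` reference.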

LEIF AZZOPARDI, University of Glasgow

Dr. Leif Azzopardi is a Lecturer within the Glasgow Information Retrieval Group and a full time academic member of staff within the School of Computing Science at the University of Glasgow. His research focuses on developing formal models for the search and retrieval of information in both traditional and interactive settings. His latest research draws upon microeconomics to explain how and why users interact with information retrieval systems, while his work on retrievability utilizes Transportation Theory to understand the impact and influence of search technology on users, systems and society. Previously, Dr. Azzopardi was a Post Doctoral Researcher within iLab at the University of Strathclyde in 2006 under the direction of Prof. Fabio Crestani and Prof. Ian Ruthven, and ILPS at the University of Amsterdam in 2005 under the supervision of Prof. Maarten de Rijke. He received his Ph.D. in Computer Science from the University of Paisley in 2005, where he was supervised by Prof. Mark Girolami, Prof. Malcolm Crowe and Prof. Keith van Rijsbergen. Prior to this he received a First Class Honours Degree in Information Science from the University of Newcastle, Australia, in 2001. He is also a past Chair of the BCS Information Retrieval Specialist Group (2006-2008) and currently sits on the committee. He is a lifetime member of the ACM and a member of the BCS and IEEE.

LECTURES: SIMULATION, FORMAL MODELS AND THEORY OF IIR 1 & 2

This course will provide an overview of the main theories and formal models for Interactive Information Retrieval. The course will cover Information Foraging Theory, Search Economics and the Interactive Probability Ranking Principle. There will be an emphasis on applying such theories to Information Seeking scenarios and how to build your own formal models for IIR.

MOUNIA LALMAS, Yahoo! Labs

Mounia Lalmas is a visiting principal scientist at Yahoo! Labs Barcelona, which she joined in January 2011. Prior to this, she held a Microsoft Research/RAEng Research Chair at the School of Computing Science, University of Glasgow. Before that, she was Professor of Information Retrieval at the Department of Computer Science at Queen Mary, University of London, which she joined in 1999 as a lecturer (aka assistant professor). From 2002 until 2007, she co-led the Evaluation Initiative for XML Retrieval (INEX), a large-scale project with over 80 participating organizations worldwide, which was responsible for defining the nature of XML retrieval, and how it should be evaluated. Her current research focuses on three main areas: user engagement, social media and search.


LECTURES: USER ENGAGEMENT 1 & 2

In the online world, user engagement refers to the quality of the user experience that emphasizes the phenomena associated with wanting to use a web application longer and more frequently. User engagement is a multifaceted, complex phenomenon, giving rise to a number of approaches for its measurement: self-reporting (e.g., questionnaires); observational methods (e.g., facial expression analysis, desktop actions); and web analytics using online behavior metrics. These methods represent various trade-offs between the scale of the data analyzed and the depth of understanding. For instance, surveys are hardly scalable but offer rich, qualitative insights, whereas click data can be collected on a large scale but are more difficult to analyze. Still, the core research questions each type of measurement is able to answer are unclear. This lecture will present various efforts aiming at combining approaches to measure engagement and seeking to provide insights into what questions to ask when measuring engagement. The lecture will emphasise those aspects impacting the development and deployment of information foraging approaches and their evaluation.

MARIE-FRANCINE MOENS, KU Leuven

Marie-Francine Moens is a professor at KU Leuven, Belgium and received a PhD in Computer Science (1999) from this university. She currently leads the Language Intelligence and Information Retrieval group. She is author of more than 240 international publications, among which are two monographs published by Springer. She is (co-)editor of 12 books or proceedings, including a recent edited book on mining of user generated content soon to be published by Taylor & Francis (CRC Press), and (co-)author of 38 international journal articles and 29 book chapters. She is involved in the organization or program committee (as PC chair, area chair or reviewer) of major conferences on computational linguistics, information retrieval and machine learning (ACL, COLING, EACL, SIGIR, ECIR, CORIA, CIKM, ECML-PKDD). She teaches the courses Text Based Information Retrieval and Natural Language Processing at KU Leuven. She has given several invited tutorials in summer schools and international conferences (e.g., tutorial Linking Content in Unstructured Sources at the 19th International World Wide Web Conference – WWW 2010), and keynotes at international conferences on the topic of information extraction from text. She participates or has participated as partner or coordinator in numerous European and international projects, which focus on text mining or the development of language technology (FP-6: AntiPhish, 2006-2009 and CLASS, 2006-2009; FP-7: PuppyIR, 2009-2012, TERENCE, 2010-2013, TOSCA-MP, 2011-2013 and MUSE, 2012-2015; ITEA 2 LINDO project, 2007-2010). She is a member of the coordinating committee of the world conferences on computational linguistics: ACL 2010 and ACL 2013. In 2011 and 2012 she was the chair of the European Chapter of the Association for Computational Linguistics and was a member of the executive committee of the Association for Computational Linguistics.

LECTURE: INFORMATION EXTRACTION AND LINKING 1 & 2

The tutorial focuses on the tasks of information extraction from and linking in text and multimedia. We witness a growing interest and capabilities of automatic content recognition in various unstructured media sources that identify entities (e.g., persons, locations and products) and their semantic attributes and relations (e.g., opinions expressed towards persons or products, relations between entities). These extraction techniques are most advanced for text sources, but they are also researched for other media, for instance, for recognizing persons and objects and their relations in images or video. An important challenge is to automatically link equivalent and complementary content allowing for improved joint recognition of the information and reasoning across documents, Web pages and other information sources. The World Wide Web is very diverse covering many different languages, media and disciplines. Challenges are the development of generic algorithms for extraction of knowledge and ontology population, and for linking and aligning content across documents, languages and media. The extracted information enriches and adds semantic meaning to documents and queries in a search setting. The tutorial goes deeper into current approaches of information extraction and linking with an emphasis on probabilistic models, structured output learning, inferencing and approximate inferencing, and the incorporation of the extracted and linked information in retrieval models. The results of the extraction and linking have many applications such as mining, question answering search, visualization and summarization.

GABRIELLA PASI, University Milano Bicocca

Gabriella Pasi received a PhD in Computer Science at the Université de Rennes, France. She worked at the National Council of Research in Italy until 2005. Currently she is Associate Professor at the Università Degli Studi di Milano Bicocca, Milano, Italy, where she leads the Information Retrieval Research Laboratory. Her research mainly focuses on modelling and development of techniques for flexible and personalised/contextual access to information, and on the problem of aggregation in search. She has served as the Program Chair of several international conferences and workshops related to her research areas, and she was the chair or co-chair of several international events, among which the IEEE / WIC / ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, Università degli Studi di Milano Bicocca, 15-18 September 2009, the PhD School on Web Information Retrieval (WebBar 2007), the Seventh International Conference on Flexible Query Answering Systems (FQAS 2006), the European Summer School in Information Retrieval (ESSIR 2000), and the annual track “Information Access and Retrieval” within the ACM Symposium on Applied Computing. She has published more than 180 papers in international journals and books, and in the proceedings of international conferences, and she is a member of the editorial boards of several international journals.

LECTURES: CONTEXT AND PERSONALIZATION 1 & 2

To overcome the “one size fits all” behavior of most search engines, in recent years a great deal of research has addressed the problem of defining techniques that tailor the search outcome to the user context, with the aim of improving the quality of search. The main idea is to produce context-dependent and user-tailored search results. Search tasks are subjective, and often complex; the user-system interaction based on keyword-based querying and on the presentation of search results as a list of web pages ordered according to their estimated relevance is often unsatisfactory. In this lecture a short overview of the main issues related to contextual search is given.

WESSEL KRAAIJ, Radboud University Nijmegen

Wessel Kraaij has been a senior scientist at TNO, Delft, the Netherlands since 1995 and is currently leader of the media mining group. He is also a part-time professor in ‘information filtering and aggregation’ at Radboud University Nijmegen. Since his master’s degree at the Institute for Perception Research (Philips/Tue) he has been interested in both the system’s and the human aspect of making sense of large quantities of unstructured data. After a period of working in computational linguistics, he switched to the information retrieval domain, with a focus on textual data. At TNO he has developed research projects concerning the application of data analytics in several domains (e.g. patents, social media, scientific literature, law enforcement). In his PhD thesis (Twente University) he investigated various applications of statistical language modeling. He is also active in the field of multimedia retrieval (co-coordinating the NIST TRECVID multimedia retrieval benchmark). Recently he refocused his attention on the human aspect, looking at query intent and the recognition of human behaviour using a variety of sensors. Wessel initiated the Information Foraging Summer School with Theo van der Weide in 2011. He has published over 150 research papers and has been a member of many technical program committees, such as ACM SIGIR. He has been co-chair of IIiX 2012, SIGIR 2007 and several DIR workshops.

LECTURE: DERIVING ANALYTICS FROM LOG INFORMATION

Users of information services leave traces in the system and this information can be used for various purposes. We will review selected papers discussing the potential of deriving relevance information from this type of usage information. We discuss some applications of query log analysis and its privacy implications. Some recent experiments at Radboud University will be presented as
