{"id":42,"date":"2013-10-24T08:04:07","date_gmt":"2013-10-24T08:04:07","guid":{"rendered":"http:\/\/sverberne.ruhosting.nl\/wordpress\/?page_id=42"},"modified":"2017-07-10T19:00:04","modified_gmt":"2017-07-10T19:00:04","slug":"data-download","status":"publish","type":"page","link":"https:\/\/liacs.leidenuniv.nl\/~verbernes\/wordpress\/research\/data-download\/","title":{"rendered":"Data download"},"content":{"rendered":"<p>The data collections that we created and published about are available for researchers in the field.<\/p>\n<h3><img decoding=\"async\" src=\"http:\/\/liacs.leidenuniv.nl\/~verbernes\/\/img\/folder.gif\" alt=\"\" align=\"top\" border=\"0\" \/>The Nijmegen 2011 query intent data set<\/h3>\n<p>(added March 2013)<\/p>\n<p>The dataset consists of:<\/p>\n<ul>\n<li>A txt-file containing 605 queries with annotations according to our multi-dimensional intent classification scheme;<\/li>\n<li>A txt-file (README.txt) containing documentation.<\/li>\n<\/ul>\n<p>You can download the data as a zipped archive <a href=\"http:\/\/liacs.leidenuniv.nl\/~verbernes\/\/download\/Query-intent-data-for-download.zip\" target=\"_blank\" rel=\"noopener\">here<\/a>. If you use the data, please refer to this paper:<\/p>\n<ul>\n<li>Suzan Verberne, Maarten van der Heijden, Max Hinne, Maya Sappelli, Saskia Koldijk, Eduard Hoenkamp and Wessel Kraaij (2013). <a href=\"http:\/\/onlinelibrary.wiley.com\/doi\/10.1002\/asi.22948\/abstract\" target=\"_blank\" rel=\"noopener\">Reliability and Validity of Query Intent Assessments<\/a>. 
<a href=\"http:\/\/www.asis.org\/jasist.html\" target=\"_blank\" rel=\"noopener\">Journal of the American Society for Information Science and Technology (JASIST)<\/a>, Volume 64, Issue 11, pages 2224\u20132237 (<small>Impact Factor: 2.081<\/small>)<\/li>\n<\/ul>\n<h3><img decoding=\"async\" src=\"http:\/\/liacs.leidenuniv.nl\/~verbernes\/\/img\/folder.gif\" alt=\"\" align=\"top\" border=\"0\" \/><i>Why<\/i>-questions with snippets from Bing and relevance assessments for each snippet<\/h3>\n<p>(added March 2011)<\/p>\n<p>This download set (as described in <a href=\"http:\/\/liacs.leidenuniv.nl\/~verbernes\/\/papers\/ECIR_2011_Verberne.pdf\" target=\"_blank\" rel=\"noopener\">Verberne et al., 2011<\/a>) consists of:<\/p>\n<ul>\n<li>A txt-file containing 238 questions with 10 snippets per question + relevance assessments on a 3-point scale;<\/li>\n<li>A txt-file (00about.txt) containing documentation.<\/li>\n<\/ul>\n<p>You can download the data as gzipped tar archive <a href=\"http:\/\/liacs.leidenuniv.nl\/~verbernes\/\/download\/MS-Bing-why-data.tar.gz\" target=\"_blank\" rel=\"noopener\">here<\/a><\/p>\n<h3><img decoding=\"async\" src=\"http:\/\/liacs.leidenuniv.nl\/~verbernes\/\/img\/folder.gif\" alt=\"\" align=\"top\" border=\"0\" \/><i>Why<\/i>-questions and answers with relevance labels for machine learning (learning-to-rank) purposes<\/h3>\n<p>(added 2010)<\/p>\n<p>This download set (as described in <a href=\"http:\/\/www.springerlink.com\/content\/58876l12hjx60m73\" target=\"_blank\" rel=\"noopener\">Verberne et al., 2010<\/a>) consists of:<\/p>\n<ul>\n<li>A txt-file containing 186 questions with 150 candidate answers per question + labels for your own feature extraction;<\/li>\n<li>A txt-file containing 37 feature values for 150*186 answers + labels in SVMlight format for machine learning purposes.<\/li>\n<li>A txt-file containing documentation.<\/li>\n<\/ul>\n<p>You can download the data as gzipped tar archive <a 
href=\"http:\/\/liacs.leidenuniv.nl\/~verbernes\/\/download\/WHY_186x150.new.tar.gz\" target=\"_blank\" rel=\"noopener\">here<\/a><\/p>\n<h3><img decoding=\"async\" src=\"http:\/\/liacs.leidenuniv.nl\/~verbernes\/\/img\/folder.gif\" alt=\"\" align=\"top\" border=\"0\" \/><i>Why<\/i>-questions from the Webclopedia collection with Wikipedia answer fragments<\/h3>\n<p>(added March 2007)<\/p>\n<p>This download set (as described in <a href=\"http:\/\/liacs.leidenuniv.nl\/~verbernes\/\/papers\/Verberne_2007_Evaluating%20discourse-based%20answer%20extraction%20for%20why-QA.pdf\" target=\"_blank\" rel=\"noopener\">Verberne et al., 2007b<\/a>) consists of:<\/p>\n<ul>\n<li>An Excel sheet with 400 randomly selected <i>why<\/i>-questions from the Webclopedia set (questions asked to the online QA system answers.com, gathered by Hovy et al.) and, for each question, a Wikipedia text fragment giving the answer and a pointer to the complete Wikipedia document;<\/li>\n<li>A zip-file containing all complete Wikipedia documents that are referred to in the Excel sheet;<\/li>\n<li>A zip-file containing all answer fragments in context (complete paragraph and sometimes also the previous paragraph or heading), manually annotated with RST structures (Carlson et al. 
2003);<\/li>\n<li>A readme file.<\/li>\n<\/ul>\n<p>You can download the data <a href=\"http:\/\/liacs.leidenuniv.nl\/~verbernes\/\/download\/webclo-wiki.zip\" target=\"_blank\" rel=\"noopener\">here<\/a><\/p>\n<h3><img decoding=\"async\" src=\"http:\/\/liacs.leidenuniv.nl\/~verbernes\/\/img\/folder.gif\" alt=\"\" align=\"top\" border=\"0\" \/><i>Why<\/i>-questions and answers formulated to RST-annotated WSJ-texts<\/h3>\n<p>(added January 2007)<\/p>\n<p>This download set (as described in <a href=\"http:\/\/liacs.leidenuniv.nl\/~verbernes\/\/papers\/Verberne_2007_Discourse-based%20answering%20of%20why-questions.pdf\" target=\"_blank\" rel=\"noopener\">Verberne et al., 2007<\/a>) consists of:<\/p>\n<ul>\n<li>Seven documents selected from the RST Treebank (Carlson et al., 2003), both the annotated and the unannotated versions, used for elicitation;<\/li>\n<li>All 372 <i>why<\/i>-questions and the corresponding answers, formulated by native speakers;<\/li>\n<li>A readme file.<\/li>\n<\/ul>\n<p>All files are plain text files.<\/p>\n<p>You can download the data <a href=\"http:\/\/liacs.leidenuniv.nl\/~verbernes\/\/download\/RSTtreebank.zip\" target=\"_blank\" rel=\"noopener\">here<\/a><\/p>\n<h3><img decoding=\"async\" src=\"http:\/\/liacs.leidenuniv.nl\/~verbernes\/\/img\/folder.gif\" alt=\"\" align=\"top\" border=\"0\" \/><i>Why<\/i>-questions and answers formulated to newspaper texts<\/h3>\n<p>(added March 2006)<\/p>\n<p>This download set (as described in <a href=\"http:\/\/liacs.leidenuniv.nl\/~verbernes\/\/papers\/Verberne_2006_DataForWhy-QA.pdf\" target=\"_blank\" rel=\"noopener\">Verberne et al., 2006<\/a>) consists of:<\/p>\n<ul>\n<li>The source documents from Reuters and Guardian news archives, used for elicitation;<\/li>\n<li>All 395 <i>why<\/i>-questions and the corresponding answers, formulated by native speakers;<\/li>\n<li>211 user-formulated paraphrases and the 166 corresponding questions;<\/li>\n<li>A readme file.<\/li>\n<\/ul>\n<p>All files are plain text 
files.<\/p>\n<p>You can download the data <a href=\"http:\/\/liacs.leidenuniv.nl\/~verbernes\/\/download\/newspaper.zip\" target=\"_blank\" rel=\"noopener\">here<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The data collections that we created and published about are available for researchers in the field. The Nijmegen 2011 query&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":14,"menu_order":3,"comment_status":"closed","ping_status":"open","template":"","meta":{"footnotes":""},"class_list":["post-42","page","type-page","status-publish","hentry","post-archive"],"_links":{"self":[{"href":"https:\/\/liacs.leidenuniv.nl\/~verbernes\/wordpress\/wp-json\/wp\/v2\/pages\/42","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/liacs.leidenuniv.nl\/~verbernes\/wordpress\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/liacs.leidenuniv.nl\/~verbernes\/wordpress\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/liacs.leidenuniv.nl\/~verbernes\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/liacs.leidenuniv.nl\/~verbernes\/wordpress\/wp-json\/wp\/v2\/comments?post=42"}],"version-history":[{"count":16,"href":"https:\/\/liacs.leidenuniv.nl\/~verbernes\/wordpress\/wp-json\/wp\/v2\/pages\/42\/revisions"}],"predecessor-version":[{"id":625,"href":"https:\/\/liacs.leidenuniv.nl\/~verbernes\/wordpress\/wp-json\/wp\/v2\/pages\/42\/revisions\/625"}],"up":[{"embeddable":true,"href":"https:\/\/liacs.leidenuniv.nl\/~verbernes\/wordpress\/wp-json\/wp\/v2\/pages\/14"}],"wp:attachment":[{"href":"https:\/\/liacs.leidenuniv.nl\/~verbernes\/wordpress\/wp-json\/wp\/v2\/media?parent=42"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}