The 40th European Conference on Information Retrieval 2018 – Grenoble, France
At the end of March 2018 I attended the annual European Conference on Information Retrieval (now in its 40th year), at The Minatec, in the beautiful city of Grenoble, France. Information retrieval is the field which explores the crawling, analysis (and attempted understanding) and indexing of text and information (including the exploration of entities and entity-mapping), and is therefore extremely relevant to those working in the digital marketing industry.
The conference considers the two main sides of information retrieval: ‘Push’ (recommender systems), where the platform or application has prior knowledge of the user, or similar users, and pushes recommendations to meet users’ informational needs, and ‘Pull’ (search engines), where the user steers the course mostly alone by querying and browsing. That’s not to say search engines do not use recommender systems or push information retrieval, of course. The point is that search engines were not the only area of interest: so was any system which uses these two main methods to meet the informational needs of its users, for example e-commerce applications or social media platforms.
The conference is one of the main events in the information retrieval research calendar.
Other major ones to watch out for throughout the year include CHIIR (pronounced ‘cheer’), WSDM (pronounced ‘wisdom’, the Web Search and Data Mining Conference), and SIGIR (the ACM’s ‘Special Interest Group on Information Retrieval’).
A more comprehensive list of some of the other related conferences can also be found here.
ECIR hosts presentations and discussions of relevant current and emerging research papers from academia and industry with specialist tracks such as deep learning, neural networks, exploration of user behaviour in social media and broad topic analysis, natural language processing, and information retrieval for news dispersal and interpretation.
Long and short papers were presented, along with keynotes and invited talks, workshops and tutorials focusing on key areas of research. Much of the researchers’ work goes beyond the arena of online marketing toward exploring solutions for societal problems, such as understanding and detecting social media behaviour in the run-up to violence surrounding elections, or identifying ways to utilise information retrieval and deep learning in health research.
Topics overall could broadly be classified as follows:
- Topic modelling
- Information retrieval in the medical and health space
- Search engine results evaluation
- Social media text mining
- Information retrieval for news
- The analysis of broad, dynamic topics in social media
- Search engine user behaviour analysis
- Recommender systems (RecSys)
- Social aspects and personalised search (SOAPS)
- Neural networks for information retrieval (NN4IR)
- Deep learning
Given these workshops and talks are delivered by PhD researchers furthering the body of knowledge, leading academics and industry pioneers in this field, we can get some idea of the direction of trends and current knowledge, and understand more of the still ‘open’ problems. We know there are still many challenging issues in the world of search, with natural language understanding, for instance, still presenting a difficult problem. However, the research papers illustrate some of the steps in place toward a greater understanding in the field.
Some of the presenting researchers provided slides, and several papers are available to view for either indefinite periods or for a full month following the conference. These papers might otherwise be locked behind academic paywalls (and some will be, once the month is out).
A major focus was intent-understanding, and how the most relevant result might be retrieved quickly, and in the right context-led ‘moment’ (contextual search), for users. It is therefore worth taking up the opportunity to read some of the papers and gain more understanding of where the field is headed next, as search engines and IR researchers alike seek to increase their understanding of semantics, of user interaction with results, and of users’ informational needs.
There is certainly way too much to go through in any great depth in this blog post but I have added further resources throughout so you can continue your own exploration into this intriguing area. This is more of an overview with a little more focus in some areas where I either attended the talk or had the opportunity of speaking with the researcher directly to get more information on their work. It is certainly not comprehensive by any means. Hopefully the many links to further reading materials will provide a bridge for future learning.
Note: this post contains many summaries of paper abstracts. For a complete understanding, it’s recommended to read the full abstract and paper.
Co-located workshops and tutorials ahead of the main ECIR 2018 conference
Ahead of the main conference there were several workshops and tutorials covering specific areas of interest.
These included:
BroDyn 2018 (Broad Dynamic Topics over Social Media)
BroDyn 2018
This workshop considered topics in social media which attract long-standing user interest, as opposed to short-term interest topics which emerge quickly and then disappear. These broad, dynamic topics might include social media conversations around “Brexit”, “Syrian crisis” or “North Korea”: topics which may be of interest to social media users for months, or even years. They may also include topics lasting for only a few weeks, such as “Hurricane Irma”. The study of the sub-topics and conversations which emerge within these major broad topics is also covered in this area of research.
The BroDyn workshop is explained in more detail here.
The workshop was made up of a keynote and several papers explained by the researchers behind them.
BroDyn Keynote
A keynote was delivered by Professor Michalis Vazirgiannis.
Graph-Based Event Detection in Streams: The Twitter Case (Professor Michalis Vazirgiannis)
Michalis Vazirgiannis is giving the keynote speech at #brodyn2018 workshop @ecir2018 pic.twitter.com/2UufQ6Cw3E
— BroDyn 2018 Workshop (@BroDyn2018) March 26, 2018
Full Paper: http://ceur-ws.org/Vol-2078/keynote.pdf
Paper Abstract Overview: Exploration of solutions to dissect and map the often real-time nature of tweets around major real-world events such as natural disasters, political campaigns, sporting events and terrorist attacks. The work presented models a stream of tweets related to the event as an evolving graph of words and then identifies the major events as their evolutionary patterns emerge. These important moments are identified via the detection of rapid graph changes. The events are then summarised via the extraction of a few tweets, solely from Twitter, describing the chain of events. The researchers aimed to illustrate that their proposed system was also able to capture sub-events and that it outperformed the current dominant sub-event detection methods.
The full keynote for BroDyn is available to read here.
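To make the idea of "rapid graph changes" more concrete, here is a minimal sketch in Python. It is not the method from the keynote paper: it assumes tweets are grouped into fixed time windows, builds a simple word co-occurrence graph per window, and flags a window when its edge set differs sharply from the previous one. The function names, the Jaccard-distance measure and the threshold value are all illustrative assumptions.

```python
from itertools import combinations


def word_graph(tweets):
    """Build a set of undirected word co-occurrence edges for one time window."""
    edges = set()
    for tweet in tweets:
        words = sorted(set(tweet.lower().split()))
        edges.update(combinations(words, 2))
    return edges


def change_score(prev_edges, curr_edges):
    """Jaccard distance between two edge sets: 0.0 = identical, 1.0 = disjoint."""
    union = prev_edges | curr_edges
    if not union:
        return 0.0
    return 1.0 - len(prev_edges & curr_edges) / len(union)


def detect_events(windows, threshold=0.7):
    """Return indices of windows whose graph differs sharply from the previous one."""
    flagged = []
    prev = word_graph(windows[0]) if windows else set()
    for i in range(1, len(windows)):
        curr = word_graph(windows[i])
        if change_score(prev, curr) >= threshold:
            flagged.append(i)
        prev = curr
    return flagged


# Hypothetical tweet windows: the third window talks about something new.
windows = [
    ["match starts now", "match starts soon"],
    ["match starts now", "match starts late"],
    ["goal scored by striker", "what a goal"],
]
print(detect_events(windows))
```

A real system would need tokenisation, stop-word handling and edge weighting, all of which are left out here to keep the sketch short.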
BroDyn Workshop 2018
‘Real-time collection of reliable and representative tweets datasets related to news events‘ (Béatrice Mazoyer, Julia Cagé, Céline Hudelot, and Marie-Luce Viaud, 2018)
Collecting reliable tweet datasets #brodyn2018 #ECIR2018 pic.twitter.com/FnM8Mh6wCF
— BroDyn 2018 Workshop (@BroDyn2018) March 26, 2018
Full Paper: http://ceur-ws.org/Vol-2078/paper2.pdf
Paper Abstract Summary: This paper looks to extract tweets around news events from Twitter (via the Twitter API) whilst simultaneously extracting information from more traditional journalistic news reporting, to gain a dual-sided view of events from both Twitter’s social media users and the reported news.
‘Contradiction in Reviews: Is it strong or low?‘ (Ismail Badache, Sébastien Fournier, and Adrian-Gabriel Chifu, 2018, Marseille University, France)
Full Paper: http://ceur-ws.org/Vol-2078/paper1.pdf
Paper Abstract Summary: Aims to detect and measure the strength of polarities (opposing opinions) in contradictory user reviews in text online.
Analysis of contradictions in reviews #brodyn2018 #ECIR2018 pic.twitter.com/8j80VLeJEi
— BroDyn 2018 Workshop (@BroDyn2018) March 26, 2018
Social Media Based Analysis of Refugees in Turkey (Abdullah Bulbul, Cagri Kaplan, and Salah Haj Ismail, 2018)
Full paper: http://ceur-ws.org/Vol-2078/paper3.pdf
The paper, by a research team at Ankara University, Turkey, is here.
Paper Abstract Summary: “A method is proposed to identify refugees’ public facing social media accounts with a view to understanding their needs for potential future solution planning, and tracing back events. The paper aims to gain understanding which might otherwise not be available due to refugees’ fears in recalling or expressing experiences and needs during inquests or interviews. This first paper initially looks at discussion of the retrieval method and ways to analyse the data and a discussion around future uses and solutions.” (Abdullah Bulbul, Cagri Kaplan, and Salah Haj Ismail, 2018)
Analysis of refugees in Turkey …#brodyn2018 #ECIR2018 pic.twitter.com/CMqdxzIyFO
— BroDyn 2018 Workshop (@BroDyn2018) March 26, 2018
Two datasets were also released by the BroDyn 2018 workshop for those looking to carry out research with some social media broad topics. The two datasets (one for the UK election and one for the US election) can be downloaded here.
Along #BroDyn2018, we released 2 datasets (+ manual judgments for one) to help in your research targeting retrieval from social media for topics that are broad (i.e., cover many sub-topics) and dynamically changing. The datasets are further described here: https://t.co/eWM3Q8dOa4
— BroDyn 2018 Workshop (@BroDyn2018) December 12, 2017
The BroDyn 2018 Workshop papers are all published online here.
You can also download the full workshop proceedings here.
NewsIR 2018
Another workshop covered the vertical of news information retrieval and again consisted of a keynote and invited talks, along with paper presentations.
NewsIR Keynote & Invited Talks
Fantastic start for the #NewsIR18 workshop with a full house for the keynote by @edgarmeij about credibility and automatic journalism #ECIR2018 pic.twitter.com/kLSDrGpkr9
— Miguel Martinez (@miguelmalvarez) March 26, 2018
NewsIR 2018 Keynote
The keynote for NewsIR was around AI and automated news and the implications this held for issues around trust, bias and credibility. The keynote was delivered by Edgar Meij of Bloomberg.
Full Paper: http://ceur-ws.org/Vol-2079/intro1.pdf
AI & Automated News: Implications on Trust, Bias, and Credibility (Edgar Meij, Bloomberg)
Paper Abstract Summary: Potential societal implications around (either partly or fully) automatic and algorithmically generated news and the related matters around trust and bias (as algorithms cannot be held accountable) are discussed in the context of news search and recommendations. Sentiment analysis and detection of polarity in opinion and automatic media monitoring is also explored.
@bpoblete in the panel on #newsIR at #ECIR2018 talking about involving journalists (human-in-the-loop) in the data and algorithmic aspects of automated news analysis and retrieval pic.twitter.com/8adIZZ4iYJ
— Denis Alejandro (@denisparra) March 26, 2018
“Every tool is better than nothing”?: The use of dashboards in journalistic work (Peter Tolmie)
Paper Abstract Summary: This paper looked at the many tools which are available to journalists as dashboards in the news industry and the often ‘magpie-like’ effect from the adoption and use of these tools, whereby the tool is used once or for a while and then set aside.
NewsIR Workshop
All proceedings for the NewsIR 2018 workshop are here
A Plan for Ancillary Copyright: Original Snippets (Martin Potthast, Wei-Fan Chen, Matthias Hagen, Benno Stein)
Paper Abstract Summary: This paper looked at a method by which search engines could potentially create unique text snippets (original snippets) from web pages for search engine results, avoiding the copyright problems which can arise when snippet content is taken directly from the web pages themselves.
Visualizing Polarity-based Stances of News Websites (Yoshioka, M., Allan, M.J.J. and Kando, N., 2018)
Paper Abstract Summary: This paper looks at the development of a framework which helps to identify a bias in a news website toward a particular political leaning based on whether news is published with a positive or negative stance regarding a particular topic. The framework utility was demonstrated in the paper via a case study of the recent US Presidential election.
Shaping the Information Nutrition Label (Tim Gollub, Martin Potthast, Benno Stein)
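Paper Abstract Summary: This paper looks at how to shape an ‘information nutrition label’ for online content such as news articles: a label which, by analogy with the nutrition labels on food products, aims to describe the qualities of a document simply and unambiguously.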
Estimating Credibility of News Authors from their WIKI Validated Predictions (Yarrabelly, N., DSAC, I. and Karlapalem, K., 2018)
Paper Abstract Summary: This paper looks at understanding and estimating the credibility of news authors based on their predictions coming true. The news events which they report on are validated against Wikipedia by the proposed model, to measure the percentage of predictions which proved correct or reported incidents which proved accurate.
Social Media and Information Consumption Diversity (José Devezas, Sérgio Nunes)
Paper Abstract Summary: This paper considers whether users of social media, given their ability to personalise and create individual feeds, still consume a diverse range of information when compared with random news consumers, and presents research investigating this issue.
Cross-Reading News (Shahbaz Syed, Tim Gollub, Marcel Gohsen, Nikolay Kolyada, Benno Stein, Matthias Hagen)
Paper Abstract Summary: This paper proposes an application called CrossReading News, which aims to give journalists a quick and easy means of finding related news pieces, curated from a range of sources using formulaic information retrieval methods, in order to get a more rounded view.
Qlusty: Quick and Dirty Generation of Event Videos from Written Media Coverage (Alberto Barrón-Cedeño, Giovanni Da San Martino, Yifan Zhang, Ahmed Ali, and Fahim Dalvi)
Paper Abstract Summary: This paper presents Qlusty, a video application made up of four individual modules, built with the aim of moving toward breaking the news information bubble by generating a video of news collated from various sources.
Named Entity Recognition for Telugu News Articles using Naïve Bayes Classifier (SaiKiranmai Gorla, Sriharshitha Velivelli, N L Bhanu Murthy, Aruna Malapati)
Paper Abstract Summary: Proposes Named Entity Recognition of ‘person’, ‘location’ and ‘organisation’ entities in sentences or documents in the Telugu language, using part-of-speech (POS) tagging and classification of textual content.
Exploring Significant Interactions in Live News (Erich Schubert, Andreas Spitz, Michael Gertz)
Paper Abstract Summary: This paper seeks to detect significant events appearing in news (live) as a result of the identification of co-occurrences of terms when compared to a background corpus (normal). The researchers visualised the resulting semantic word cloud between related terms as significant events emerge. They crawled dozens of news sites to give examples of their prototype.
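The comparison of live co-occurrences against a background corpus can be sketched roughly as follows. This is only an illustration of the general idea, not the researchers’ prototype: the function names, the add-one smoothing and the `min_ratio` cut-off are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def pair_counts(documents):
    """Count how many documents each unordered word pair co-occurs in."""
    counts = Counter()
    for doc in documents:
        words = sorted(set(doc.lower().split()))
        counts.update(combinations(words, 2))
    return counts


def significant_pairs(live_docs, background_docs, min_ratio=3.0):
    """Return word pairs which co-occur far more often live than in the background.

    Rates are add-one smoothed so that pairs unseen in the background
    corpus do not cause a division by zero.
    """
    live = pair_counts(live_docs)
    background = pair_counts(background_docs)
    scored = []
    for pair, count in live.items():
        live_rate = (count + 1) / (len(live_docs) + 1)
        bg_rate = (background[pair] + 1) / (len(background_docs) + 1)
        ratio = live_rate / bg_rate
        if ratio >= min_ratio:
            scored.append((pair, ratio))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical corpora: "earthquake" suddenly co-occurs with "city" in live news.
background = [
    "city council meets",
    "city council votes",
    "weather city sunny",
    "council budget vote",
]
live = [
    "earthquake hits city",
    "earthquake city damage",
    "city council meets",
]
for pair, ratio in significant_pairs(live, background):
    print(pair, round(ratio, 2))
```

Pairs which are common in everyday coverage (such as "city" and "council" here) score low, whilst pairs which are new to the live stream stand out.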
Neural Content-Collaborative Filtering for News Recommendation (Dhruv Khattar, Vaibhav Kumar, Manish Gupta, Vasudeva Varma)
Paper Abstract Summary: This paper looks at utilising neural networks to provide recommendations for user news reading combining past user interactions and past content preferences. The paper claims to beat state of the art recommender systems for user news recommendations.
On Temporally Sensitive Word Embeddings for News Information Retrieval (Yoon, T.W., Myaeng, S.H., Woo, H.W., Lee, S.W. and Kim, S.B., 2018)
Paper Abstract Summary: This paper explores word embeddings in news information retrieval and claims that co-occurrence vectors in news behave differently, and change over time, when compared with other word embedding scenarios. The research was carried out on data provided by Naver, and the researchers claim their findings show that word embeddings need to be expanded and built specifically for news IR.
SoAPS 2018 – (Social Aspects in Personalization and Search)
Social Aspects in Personalization and Search explores the emerging areas around user-influenced recommender systems – for example, reviews and ratings, and influencing of opinion via social media with likes and shares. This aspect of IR also looks at the ways in which users interact with social media and recommendations. There are some good overviews on how recommender systems have evolved in crowded marketplaces and search.
SoAPS Keynote
Denis Parra explored some of the social aspects in recommender systems in his SoAPS keynote and provided his slides via Slideshare below.
Listening to @denisparra's keynote at the #soaps2018 workshop in #ECIR2018 pic.twitter.com/wcqpuzq6PY
— Bárbara P (@bpoblete) March 26, 2018
Slides of my invited talk yesterday about “Social Aspects of Interactive Recommender Systems” are available online https://t.co/19v1zQuZxv #SoAPS workshop at #ECIR2018
— Denis Alejandro (@denisparra) March 27, 2018
All the slides can be viewed here:
Here are some more of the slides from a presentation which looked at time-aware evaluations, implicit feedback from users and the measurement of freshness in results evaluation:
Very interesting talks at the RecSys session in #ECIR2018 : Session-based RecSys, alternative symmetrical losses for implicit feedback, time-aware evaluations for measuring freshness pic.twitter.com/vVUS1ZPn06
— Denis Alejandro (@denisparra) March 28, 2018
Lorena Recalde shared her work undertaken with Ricardo Baeza-Yates on the different types of content which users on Twitter are prone to tweet.
Lorena Recalde presenting her work with @PolarBearby on what kind of content are users prone to tweet #soaps2018 #ECIR2018 https://t.co/k7gn158RJS pic.twitter.com/f2nG0YAWZR
— Denis Alejandro (@denisparra) March 26, 2018
Venue Suggestion Using Social Centric Scores (Aliannejadi, M. and Crestani, F., 2018)
Professor Fabio Crestani presented work on using past visited locations by users to recommend future venues. A set of relevance scores was presented based on gathered data from the location and venue preferences of users.
Starting #ecir2018 workshops day. Here Prof. Fabio Crestani presenting about venue recommendations using social-centric scores pic.twitter.com/NLhOZmkE90
— Denis Alejandro (@denisparra) March 26, 2018
Text2Story 2018
The Text2Story workshop looks at mapping text in corpora to events in order to map storylines and understand the intent and needs behind queries.
The workshop aimed to explore ways of understanding emerging stories and event timelines, and the mapping of these to text and semantics as well as understanding sentiment and dual-sided arguments. This area is complicated much further because data is received from many sources almost simultaneously. Fact and credibility of authors and sources of storylines in text bodies is considered an important part of this area, as well as the personalisation of events and stories and the recommendation of information based upon personalisation.
Here are some of the areas explored in this workshop:
- Event Identification
- Narrative Representation Language
- Sentiment and Opinion Detection
- Argumentation Mining
- Narrative Summarization
- Storyline Visualization
- Temporal Aspects of Storylines
- Evaluation Methodologies for Narrative Extraction
- Big data applied to Narrative Extraction
- Resources and Dataset showcase
- Personalization and Recommendation
- User Profiling and User Behavior Modeling
- Credibility
- Fact Checking
- Bots Influence
All the papers presented from this workshop can be found here.
Text2Story Keynote
Users2Story – On the Importance of Understanding Searchers’ Information Needs (Udo Kruschwitz, University of Essex)
Paper Abstract Summary: This keynote paper looked at the challenging problem of trying to understand the intent and needs behind queries in both web search and professional search and emphasised the importance of understanding searchers’ information needs.
Word embeddings, information retrieval and textual entailment (Eric Gaussier, University of Grenoble)
Paper Abstract Summary: This keynote paper and talk reviewed the types of word embeddings currently popular in natural language processing and information retrieval, discussed the potential of extending word embeddings with further syntactic information, and explored whether this improved information retrieval.
Listening to Keynote by @egaussier on “Word embeddings, information retrieval and textual entailment” at #ecir2018 @cosmographers pic.twitter.com/LAGoVm969u
— Nihal Hussain (@nihalhussain) March 26, 2018
IREvent2Story: A Novel Mediation Ontology and Narrative Generation (Kattagoni, V. and Singh, N., 2018)
Paper Abstract Summary: This paper looks at a means of detecting and classifying events using an ontology and narrative entity mapping process, along with identification of international actors in the events.
Gossip is more than just story telling: Topic modeling and quantitative analysis on a spontaneous speech corpus (Pápay, B., Kubik, B.G., Cleverbridge, A.G. and Galántai, J.)
Paper Abstract Summary: This paper aims to identify gossip taking place in text corpora, as well as identifying the number of participants, the type of topics discussed and the sentiment displayed. It also explores how gossip evolves.
Job Recommendation based on Job Seeker Skills: An Empirical Study (Valverde-Rebaza, J., Puma, R., Bustios, P. and Silva, N.C., 2018)
Paper Abstract Summary: This paper proposes an improved framework application for job recommendations based on the job seeker skills.
Neural Networks for Information Retrieval Tutorial 2018
This was a full workshop / tutorial which looked at many aspects of current deep-learning practices, including some industry insights and semantic matching using co-occurrence vectors.
A learning to rank tutorial was presented by Bhaskar Mitra of Microsoft. Entities were explored by Tom Kenter and Christophe Van Gysel of the University of Amsterdam.
Maarten de Rijke presented work analysing various user click models in search which he had developed alongside other researchers: Ilya Markov (University of Amsterdam) and Alexander Chuklin (Google Switzerland and University of Amsterdam).
You can find the slides from not only this tutorial at European Conference on Information Retrieval 2018 (ECIR) but also slides from the Neural Networks for Information Retrieval tutorials and workshops at Web Search and Data Mining Conference 2018 (WSDM), and SIGIR (Special Interest Group on Information Retrieval) 2017 at the NN4IR website. You can also find the link to the Click Models book further down in this post.
There was a huge amount of information presented and the session was very well attended, as you can imagine.
@UnderdogGeek on neural learning to rank #ECIR2018 #NN4IR pic.twitter.com/WCfVml5CkJ
— Tom Kenter (@TomKenter) March 26, 2018
Direct links to each of the sectional slides and tutorials from the ECIR 2018 tutorial are below and I would recommend going through ALL of these more than once. There are some great learnings here around research into click models, and several 2Vec areas are explored, including Word2Vec, User2Vec, Prod2Vec and more. Industry insights are also provided by Bhaskar Mitra and relate primarily to Bing, but we can presume to some extent these cover some ‘industry favourites’:
- Semantic matching tutorial – presented by Christophe Van Gysel – research fellow at University of Amsterdam
- Learning to rank tutorial – presented by Bhaskar Mitra – Principal applied scientist at Microsoft
- Entities – presented by Tom Kenter and Christophe Van Gysel – University of Amsterdam
- Modelling user behaviour – presented by Maarten de Rijke – University of Amsterdam
- Click models for web search
- Generating responses – presented by Tom Kenter – University of Amsterdam
- Recommender systems – presented by Maarten de Rijke – University of Amsterdam
- Industry insights – presented by Tom Kenter and Bhaskar Mitra – University of Amsterdam and Microsoft
All slides from the ‘Neural Networks For Information Retrieval’ tutorial are available here.
Extreme Multi-label Classification for Large-scale Text Mining Tutorial (XMLC-LSTC)
I did not attend this tutorial but the tutorial page and slides are available below.
Main Conference
With regard to the main conference, papers on a wide range of IR topics were presented and only 23% of those submitted were accepted. Papers on word embeddings had the highest acceptance rate as a percentage of papers submitted, indicating a ‘hot topic’ nature. Papers on neural networks were also very well received.
The main conference papers covered the following topics:
- Neural network
- Word embedding
- Recommender system
- Collaborative filtering
- Computational linguistics
- Web search
- Natural language processing
- News articles
- Search tasks
- Evaluation metrics
- Query terms
- Sentiment analysis
- Social medium
- Deep learning
- Topic models
- Knowledge base
- Deep neural network
- User study
- Retrieval performance
- Learning to rank
If you want(ed) to maximize your chances of getting your paper into #ECIR2018, work on word embeddings and #NLProc (yellow accepted, red rejected) pic.twitter.com/5dGzjpf2xl
— Matthias Gallé (@mgalle) March 27, 2018
There were, of course, many papers presented over the course of the three conference days and I have listed many of them toward the end of the document. I’ve focused on providing a bit more of an overview of a few which I was present for and which were of particular interest to me, but that is not to detract from the impressive work of all of the presenters. As previously mentioned, I also had the opportunity to speak to some of the researchers about their work and I have provided more detail here on those.
Authorship Verification in the Absence of Explicit Features and Thresholds (Oren Halvani, Lukas Graner and Inna Vogel, 2018)
One interesting paper in particular was by Oren Halvani from the Fraunhofer Institute for Secure Information Technology SIT.
Oren has been in the cyber security and fraud-detection space for a number of years.
His paper looks at the important area around author credibility and authenticity-checking using deep-learning by recognising the unique writing style of one author over another.
Oren’s paper is entitled ‘Authorship Verification in the Absence of Explicit Features and Thresholds’ (Halvani, 2018). Oren explained that authorship verification (AV) techniques can be used to link authors to papers across different platforms and to help identify authors who generate hoaxes and deliberate misinformation in news. The method appears not to rely on the same levels of training data which other systems need in order to determine when two or more papers are produced by the same author. Oren’s team tested their method across a range of different text corpora (including recipes) and found their results were competitive against current state-of-the-art authorship verification baselines. Oren’s algorithm is also very lightweight (only around 8 lines of code).
Information Scent, Searching and Stopping: Modelling SERP Level Stopping Behaviour (David Maxwell and Leif Azzopardi, 2018)
Another interesting paper was David Maxwell and Leif Azzopardi’s paper on search engine user behaviour, presented on colourful animated slides. The paper looks at user behaviour based on ‘information foraging theory’.
David Maxwell proposed that current user models don’t consider people skipping search engine results pages (SERPs), and that this should be taken into account. The talk explored the ‘stopping during search’ behaviour. It’s recommended the slides are explored in more detail:
@maxwelld90 explains how current user models that underpin our measures don't consider that people can skip SERPS – and that this should be taken into account – especially if we are developing sessions based measures! @ecir2018 #ecir2018
— Leif Azzopardi (@leifos) March 27, 2018
Thanks for all the nice comments about my #ECIR2018 talk w/@leifos. Here is a copy of the slides: https://t.co/3lLgkeDSRr — paper also available at https://t.co/t8oSEmrXMM pic.twitter.com/6hb4XTvF0o
— David Maxwell (@maxwelld90) March 27, 2018
This full paper looked at the different scent following behaviours of naive versus more experienced searchers and the scent paths these users take. David Maxwell suggested a revision of the Complex Searcher Simulation User model.
@maxwelld90 suggests a revision of the Complex Searcher Simulation User Model – which explicitly models a SERP stopping component – where a user can decide whether to inspect the page or not. @ecir2018 #ecir2018 pic.twitter.com/dIXhHf2y5G
— Leif Azzopardi (@leifos) March 27, 2018
David Maxwell and Leif Azzopardi have also developed an interactive information retrieval framework for simulation so you can run experiments for yourself.
That can be accessed here on GitHub.
@maxwelld90 and @leifos have written an interactive information retrieval framework for simulation https://t.co/s75f3sZzQu which you can use to reproduce our experiments or even better create and run your own!! @ecir2018 #ecir2018
— Leif Azzopardi (@leifos) March 27, 2018
A keynote on one of the conference days was given by Radim Rehurek, founder of the popular open source Gensim Python library. He shared the experiences, realities, challenges and learnings taken from founding, and leading the building of, the open source library, and talked about the reasons why it is still relevant as a platform in 2018.
Keynote from @RadimRehurek about open source, research and industrial impact has just started. You cannot miss this! #ecir2018 pic.twitter.com/OAHzs43vO7
— Miguel Martinez (@miguelmalvarez) March 29, 2018
Ready to watch @RadimRehurek keynote at #ecir2018 pic.twitter.com/WenOgIWR41
— Denis Alejandro (@denisparra) March 29, 2018
Listening to a keynote by @RadimRehurek the creator of the Python Gensim OS Software at #ecir2018 pic.twitter.com/jFRDVmJbb6
— Iadh Ounis (@iadh) March 29, 2018
Academic impact is based on citations, real impact is based on demos. @RadimRehurek #ECIR2018 pic.twitter.com/GMkGnf40EA
— Gabriella Kazai (@gkazai) March 29, 2018
How you think #opensource works vs. how it actually works – by @RadimRehurek #ECIR2018 pic.twitter.com/0IahDV4vfv
— Bhaskar Mitra (@UnderdogGeek) March 29, 2018
Fabrizio Silvestri, Software Engineer at Facebook, also spoke on Industry Day about the nature of problem-driven research on very, very big machines using Search2Vec.
Industry Day first talk by @fabreetseo Problem driven research on very very very big machines. #ECIR2018 pic.twitter.com/oApVm7iCIM
— Gabriella Kazai (@gkazai) March 29, 2018
Web2Text: Deep Structured Boilerplate Removal (Thijs Vogels, Octavian Eugen Ganea and Carsten Eickhoff, 2018)
Another interesting paper was that delivered by Thijs Vogels who looked at a method for the removal of boilerplate areas in web page text.
Paper Abstract Summary: This paper looks at improving ways to detect and remove boilerplate text from web pages (such as headers, footers and advertisements) by classifying HTML blocks as either main content or boilerplate. – Full Paper: https://arxiv.org/pdf/1801.02607.pdf
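For readers who want a feel for what "classifying HTML blocks" means in practice, below is a deliberately simple sketch in Python. It is not the Web2Text approach from the paper: it swaps in a basic heuristic (short blocks, or blocks made up mostly of link text, are treated as boilerplate). The class name, the tag list and both thresholds are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries in this sketch (an assumed, incomplete list).
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3", "footer", "header", "nav"}


class BlockCollector(HTMLParser):
    """Collect text blocks plus how many of their characters sit inside links."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # list of (text, link_chars)
        self._text = []
        self._link_chars = 0
        self._in_link = 0

    def flush(self):
        """Close the current block and store it if it contains any text."""
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a" and self._in_link:
            self._in_link -= 1

    def handle_data(self, data):
        self._text.append(data)
        if self._in_link:
            self._link_chars += len(data.strip())


def classify_blocks(html, min_words=8, max_link_density=0.3):
    """Label each text block as 'content' or 'boilerplate' using two heuristics."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush()
    labelled = []
    for text, link_chars in parser.blocks:
        link_density = link_chars / len(text)
        is_content = len(text.split()) >= min_words and link_density <= max_link_density
        labelled.append((text, "content" if is_content else "boilerplate"))
    return labelled


# Hypothetical page: navigation, one real paragraph, and a footer.
page = """
<nav><a href="/">Home</a> <a href="/about">About</a></nav>
<p>Information retrieval research covers crawling, indexing and ranking of documents for users.</p>
<footer>Copyright 2018 <a href="/terms">Terms</a></footer>
"""
for text, label in classify_blocks(page):
    print(label, "|", text)
```

Hand-tuned thresholds like these are brittle across different site layouts, which is part of the motivation for learning the classification from data instead.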
Local is Good: A Fast Citation Recommendation Approach (Haofeng Jia and Erik Saule, 2018)
Another presenter was Erik Saule, who described an academic paper search engine offering an alternative to Google Scholar.
I spoke with Erik Saule, a researcher from UNC Charlotte, US, who is behind ‘Local is Good: A Fast Citation Recommendation Approach’, and here is what Erik had to say about his paper:
“Several years ago we developed a paper search engine (for academic research papers) called ‘TheAdvisor’ as a starting point for our research. It was designed to order and retrieve academic papers. It was based on seed lists of papers which the searcher already knows of, rather than being ‘keyword driven’ like the more traditional search engines and Google Scholar. The search engine works by performing a ‘random walk’ process (modelled on how users are likely to traverse the search engine) on the citation network of papers. Vertices are papers, and edges are citations and references.”
This search engine is targeted at academics and researchers who, it is presumed, already have some knowledge of the papers in their space. The intended use case is to speed up focused research in a particular direction, excluding irrelevant and unconnected papers while ensuring that anything relevant (and connected via citations or references) is not missed.
When they have a paper in hand, the target users usually do one of three things:
- They read one of the paper’s references (listed in its reference section)
- They read a paper which cites it
- They go back to a paper they already know
Erik’s paper at ECIR 2018 is about improving ‘TheAdvisor’ from a speed of retrieval perspective, without compromising the quality of the results. The existing algorithm was a bit slow and the team wanted to make it near real-time.
The search engine still uses random walks, but instead of retrieving everything it works on a pruned citation graph, presenting only papers which are connected to the papers the user already knows.
The speed is improved by preventing the machine from considering any paper that does not have a direct connection to a paper you already know. Essentially, it is a recommender system based on papers already known to the user, which prunes the set of papers considered in the search and discounts irrelevant edges.
This gave a dramatic increase in retrieval speed, which made real-time queries possible. The pruning took retrieval time down from circa 2.5 seconds to 0.2 seconds, a more than tenfold improvement (described as around 15x).
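The idea of a seed-driven random walk on a pruned citation graph can be sketched in a few lines of Python. This is my own minimal illustration, not the algorithm from the paper or the TheAdvisor code. I am assuming a random walk with restart (personalised PageRank style), an undirected view of the citation graph, and pruning to the seeds plus their direct neighbours. All function names and the `restart` and `iterations` values are hypothetical.

```python
from collections import defaultdict


def build_undirected(citations):
    """citations: iterable of (citing_paper, cited_paper) pairs."""
    graph = defaultdict(set)
    for citing, cited in citations:
        graph[citing].add(cited)
        graph[cited].add(citing)
    return graph


def prune_to_neighbourhood(graph, seeds):
    """Keep only the seed papers and papers directly connected to a seed."""
    keep = set(seeds)
    for seed in seeds:
        keep |= graph.get(seed, set())
    return {p: graph[p] & keep for p in keep if p in graph}


def random_walk_with_restart(graph, seeds, restart=0.15, iterations=50):
    """Score papers by how often a walker restarting at the seeds visits them."""
    seeds = [s for s in seeds if s in graph]
    if not seeds:
        return {}
    restart_mass = {s: 1.0 / len(seeds) for s in seeds}
    score = dict.fromkeys(graph, 0.0)
    score.update(restart_mass)
    for _ in range(iterations):
        nxt = {p: restart * restart_mass.get(p, 0.0) for p in graph}
        for paper, neighbours in graph.items():
            if not neighbours:
                # A paper with no kept neighbours hands its mass back to the seeds.
                for s, mass in restart_mass.items():
                    nxt[s] += (1 - restart) * score[paper] * mass
                continue
            share = (1 - restart) * score[paper] / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
        score = nxt
    return score


def recommend(citations, seeds, top_k=5):
    """Rank non-seed papers in the pruned graph, highest score first."""
    seed_set = set(seeds)
    graph = prune_to_neighbourhood(build_undirected(citations), seed_set)
    scores = random_walk_with_restart(graph, seeds)
    candidates = [p for p in scores if p not in seed_set]
    return sorted(candidates, key=lambda p: (-scores[p], p))[:top_k]


if __name__ == "__main__":
    toy_citations = [("A", "B"), ("A", "C"), ("D", "A"), ("E", "B"), ("F", "G")]
    print(recommend(toy_citations, seeds=["A", "E"]))
```

In the toy example, papers F and G never enter the computation because neither is connected to a seed. That is the point of the pruning: the walk only has to touch a small neighbourhood of the graph rather than the whole citation network.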
Erik and his team plan to bring the system back online with the new real-time retrieval capabilities. They want to work not only on retrieving academic papers but also on organising them into some type of relevance clusters rather than lists of titles. They are planning to return around 100 papers, clustered to avoid information overload and so that the researcher can see the different directions they could take within the topical area of study, or the different approaches, given the ordering of in-topic themes.
Erik is also hoping the system will be used by three types of people:
- Researchers who are trying to discover new papers
- Editors looking for experts to review papers
- Academic paper reviewers who wish to do a cursory check to ensure they have not missed any crucial or important existing work which they should consider in their paper review process
The full paper is available here: https://link.springer.com/chapter/10.1007/978-3-319-76941-7_73
Many more papers were delivered, and these are listed further down in the post. All in all, there was a huge amount to learn, and it was clear that deep learning and semantic understanding using co-occurrence and word embeddings, as well as the verification of facts and authenticity in news, were a strong focus. This was evident in the many papers in the news IR section looking to cross-reference and gain valuable double-sided or multi-sided perspectives, and in the social media areas explored.
I also met with researchers from the University of Glasgow, who were presenting a range of talks and whose work is behind an interesting project on the use of information retrieval and datasets to explain and mitigate election violence. As part of their work they mapped tweets from one language to another via state-of-the-art convolutional neural networks (CNNs) without having to build additional training datasets. Their work aims to build a view of what the build-up to election violence may look like in social media, with a view to assisting with mitigation in the future.
Their work can be explored further here: http://www.electoralviolenceproject.com
Some of their papers presented at ECIR are:
On Refining Twitter Lists as Ground-Truth Data for Multi-Community User Classification (Ting Su, Anjie Fang, Richard McCreadie, Craig Macdonald and Iadh Ounis, 2018) – Social Media, Deep Learning, Natural Language Processing
Next in the #ecir2018 poster session, come and speak to @ting_s_ and @graham_mcdonald talking about Twitter Multi-Community Classification and Sensitivity Review, resp. pic.twitter.com/Ocv8VeqVxL
— TerrierTeam (@TerrierTeam) March 27, 2018
On the Reproducibility and Generalisation of the Linear Transformation of Word Embeddings (Xiao Yang, Iadh Ounis, Richard McCreadie, Craig Macdonald and Anjie Fang, 2018) – Natural Language Processing
Now at #ecir2018 @XYangChef presenting on the reproducibility of the linear transformation of word embeddings pic.twitter.com/A6psjYlQPM
— graham mcdonald (@graham_mcdonald) March 28, 2018
Active Learning Strategies for Technology Assisted Sensitivity Review (Graham Mcdonald, Craig Macdonald and Iadh Ounis, 2018)
Active learning for sensitive review by @graham_mcdonald #ecir2018 pic.twitter.com/OAmYyDd2Ul
— Anjie Fang (@anjiefang) March 29, 2018
Dinner Above The City of Grenoble
It was not all formal paper presentations, however. There was time for some socialising and networking with a city tour of Grenoble (fun fact: Grenoble has 60,000 students in residence from an overall population of 160,000), and dinner high up on the mountain – accessed via this cable car.
Not great for me due to my fear of heights and even escalators. I closed my eyes throughout the ascent, but it was well worth the trip, as a lovely dinner with the most interesting of company was the reward.
In case you wonder what the #ECIR2018 logo means. pic.twitter.com/xEefLMkvnT
— Ingo Frommholz (@iFromm) March 25, 2018
Here are a few of the tweets from the conference attendees at the dinner above the city.
A nice conference dinner at #ecir2018 in Grenoble with @bpoblete @abellogin @bpiwowar and lots of other amis pic.twitter.com/CXHhPaN0Bx
— Denis Alejandro (@denisparra) March 28, 2018
2019 ECIR
The location for next year’s ECIR was announced and is confirmed as Cologne, Germany, in April 2019. I hope to attend this as well as CHIR, which will be in Glasgow next year, and SIGIR, which will be in Paris.
.@ecir2019 https://t.co/8PhhmfcL8I is productive now. Intro slides https://t.co/teKd8YJ8Yp thanks to @ecir2018 #ecir2018 for a wonderful conference in Grenoble @lorrainegrt
— Philipp Mayr (@Philipp_Mayr) March 29, 2018
Other Presented Papers & Resources ECIR 2018
Here are some of the many papers and images shared via social media (I have added some labels to give some guidance as to the topical nature of the papers, for ease when perusing):
Deep Learning for Detecting Cyberbullying Across Multiple Social Media Platforms (Agrawal & Awekar, 2017) – Deep Learning & Social Media
Paper Abstract – “This paper looked at a method for overcoming data bottlenecks in identifying cyberbullying using deep learning across multiple social media platforms simultaneously.” (Agrawal & Awekar, 2017)
To Cite, or Not to Cite? Detecting Citation Contexts in Text (Färber, M., Thiemann, A. and Jatowt, A., 2018) – Natural Language Processing, Deep Learning
Affective Neural Response Generation (Nabiha Asghar, Pascal Poupart, Jesse Hoey, Xin Jiang and Lili Mou, 2018) – Deep Learning, Natural Language Processing
Attention-based Neural Text Segmentation (Pinkesh Badjatiya, Litton J Kurisinkel, Manish Gupta and Vasudeva Varma, 2018) – Deep Learning, Natural Language Processing, User Behaviour
Predicting Topics in Scholarly Papers (Seyed Ali Bahrainian, Ida Mele and Fabio Crestani, 2018) – Deep Learning, Natural Language Processing
Now, @ABH878 is presenting his paper entitled “Predicting Topics in Scholary Papers” #ECIR2018 #ecir2018 pic.twitter.com/ZHpDRBur2s
— Fattane Zarrinkalam (@FattaneZ) March 27, 2018
Cross-lingual Document Retrieval using Regularized Wasserstein Distance (Georgios Balikas, Charlotte Laclau, Ievgen Redko and Massih-Reza Amini, 2018)
Learning to Leverage Microblog for QA Retrieval (Jose Miguel Herrera, Barbara Poblete and Denis Parra, 2018) – Natural Language Processing, Deep Learning
#ECIR2018 José Herrera presenting our QA study using microblogs! (Joint work with @denisparra) @ciwschile @Postgrado_FCFM @dccuchile pic.twitter.com/YKGjsBrv1O
— Bárbara P (@bpoblete) March 29, 2018
Employing Document Embeddings to Solve the “New Catalog” Problem in User Targeting, and Provide Explanations to the Users (Ludovico Boratto, Salvatore Carta, Gianni Fenu and Luca Piras, 2018) – Deep Learning, Natural Language Processing
Employing Document Embeddings to Solve the “New Catalog” Problem in User Targeting, and Provide Explanations to the Users https://t.co/iEiUoQPMjZ #ECIR2018 long title but great paper @iFromm
— Philipp Mayr (@Philipp_Mayr) March 28, 2018
And to finish the RecSys session, @ludovicoboratto presents a proposal to deal with the "new catalog" problem, introducing also explanations #ecir2018 pic.twitter.com/3oqS0l4bFn
— Denis Alejandro (@denisparra) March 28, 2018
Spatial Statistics of Term Co-occurrences for Location Prediction of Tweets (Özer Özdikiş, Heri Ramampiaro and Kjetil Nørvåg, 2018) – Natural Language Processing, Social Media, Deep Learning
@oozdikis showed the importance of co-occurring terms for location prediction of tweets #ECIR2018 pic.twitter.com/qt3dGQeJY2
— Darío Garigliotti (@DGarigliotti) March 29, 2018
Towards Maximising Openness in Digital Sensitivity Review using Reviewing Time Predictions (Graham Mcdonald, Craig Macdonald and Iadh Ounis, 2018)
Inverted List Caching for Topical Index Shards (Zhuyun Dai and Jamie Callan, 2018) – Topic Modelling
Topical Stance Detection for Twitter: A Two-Phase LSTM Model Using Attention (Kuntal Dey, Ritvik Shrivastava and Saroj Kaushik, 2018) – Topic Modelling, Deep Learning, Social Media
Generating High-Quality Query Suggestion Candidates for Task-Based Search (Heng Ding, Shuo Zhang, Darío Garigliotti and Krisztian Balog, 2018) – Query Formulation, Task-Based Search, User Behaviour, User Modelling
A Comparative Study of Native and Non-Native Information Seeking Behaviours (David Brazier and Morgan Harvey, 2018) – User Behaviour
Indiscriminateness in representation spaces of terms and documents (Vincent Claveau, 2018)
A Hybrid Embedding Approach to Noisy Answer Passage Retrieval (Daniel Cohen and W. Bruce Croft, 2018) – Deep Learning, Natural Language Processing
A Neural Passage Model for Ad-hoc Document Retrieval (Ai, Q., O’Connor, B. and Croft, W.B., 2018) – Deep Learning, Natural Language Processing
A Text Feature Based Automatic Keyword Extraction Method for Single Documents (Campos, R., Mangaravite, V., Pasquali, A., Jorge, A.M., Nunes, C. and Jatowt, A., 2018) – Text Mining & Information Extraction
Concept Embedding for Information Retrieval (Abdulahhad, K., 2018) – Deep Learning, Natural Language Processing
Stopword Detection for Streaming Content (Fani, H., Bashari, M., Zarrinkalam, F., Bagheri, E. and Al-Obeidat, F., 2018) – Social Media, Natural Language Processing, Deep Learning
Topic Lifecycle on Social Networks: Analyzing the Effects of Semantic Continuity and Social Communities (Kuntal Dey, Saroj Kaushik, Kritika Garg and Ritvik Shrivastava, 2018) – Social Media, Natural Language Processing
Reproducing a Neural Question Answering Architecture applied to the SQuAD Benchmark Dataset: Challenges and Lessons Learned (Alexander Dür, Andreas Rauber and Peter Filzmoser, 2018) – Deep Learning, Natural Language Processing
Modelling Randomness in Relevance Judgments and Evaluation Measures (Marco Ferrante, Nicola Ferro and Silvia Pontarollo, 2018) – Results Evaluation, Topic Modelling
Explicit Modelling of the Implicit Short Term User Preferences for Music Recommendation (Kartik Gupta, Noveen Sachdeva and Vikram Pudi, 2018) – User Behaviour, Recommender Systems, Personalization
Multi-Task Learning for Extraction of Adverse Drug Reaction Mentions from Tweets (Shashank Gupta, Manish Gupta, Vasudeva Varma, Sachin Pawar, Nitin Ramrakhiyani and Girish Keshav Palshikar, 2018) – Health IR, Social Media, Natural Language Processing, Deep Learning
Efficient Context-Aware K-Nearest Neighbor Search (Mostafa Haghir Chehreghani and Morteza Haghir Chehreghani, 2018) – Deep Learning, Natural Language Processing
Biomedical Question Answering via Weighted Neural Network Passage Retrieval (Ferenc Galkó and Carsten Eickhoff, 2018) – Health IR, Neural Networks
Towards an Understanding of Entity-Oriented Search Intents (Dario Garigliotti and Krisztian Balog, 2018) – Entities, User Intent
Proposing Contextually Relevant Quotes for Images (Shivali Goel, Rishi Madhok and Shweta Garg, 2018) – Image Search
Co-training for Extraction of Adverse Drug Reaction Mentions from Tweets (Shashank Gupta, Manish Gupta, Vasudeva Varma, Sachin Pawar, Nitin Ramrakhiyani and Girish Keshav Palshikar, 2018) – Social Media, Health IR
Neural Multi-Step Reasoning for Question Answering on Semi-Structured Tables (Till Haug, Octavian-Eugen Ganea and Paulina Grnarova, 2018) – Neural Networks
Medical Forum Question Classification Using Deep Learning (Raksha Jalan, Manish Gupta and Vasudeva Varma, 2018) – Health IR, Deep Learning, Natural Language Processing
Choices in Knowledge-Base Retrieval for Consumer Health Search (Jimmy Jimmy, Guido Zuccon and Bevan Koopman, 2018) – Health IR, Entities, Knowledge Graph
Investigating Result Usefulness in Mobile Search (Jiaxin Mao, Yiqun Liu, Noriko Kando, Cheng Luo, Min Zhang and Shaoping Ma, 2018) – Mobile Search, Results Evaluation
Bringing Back Structure to Free Text Email Conversations with Recurrent Neural Networks (Tim Repke and Ralf Krestel, 2018) – Deep Learning, Natural Language Processing, Email Text Analysis
An Optimization Approach for Sub-event Detection and Summarization in Twitter (Giannis Nikolentzos, Christos Ksipolopoulos, Polykarpos Meladianos and Michalis Vazirgiannis, 2018) – Social Media, Deep Learning, Natural Language Processing
Time-aware novelty metrics for recommender systems (Pablo Sanchez and Alejandro Bellogin, 2018) – Recommender Systems
Benefits of using Symmetric Loss in Recommender Systems (Gaurav Singh and Sandra Mitrovic, 2018) – Recommender Systems
Topic-Association Mining for User Interest Detection (Trikha, A.K., Zarrinkalam, F. and Bagheri, E., 2018) – Recommender Systems, Topic Modelling
Document Ranking Applied to Second Language Learning (Wilkens, R., Zilio, L. and Fairon, C., 2018)
Discriminative Path-based Knowledge Graph Embedding for Precise Link Prediction (Maoyuan Zhang, Qi Wang, Wukui Xu, Wei Li and Shuyuan Sun, 2018) – Entities, Deep Learning
Aggregating Neural Word Embeddings for Document Representation (Ruqing Zhang, Jiafeng Guo, Yanyan Lan, Jun Xu and Xueqi Cheng, 2018) – Deep Learning, Natural Language Processing
Spherical Paragraph Model (Ruqing Zhang, Jiafeng Guo, Yanyan Lan, Jun Xu and Xueqi Cheng, 2018) – Natural Language Processing
Unsupervised Sentiment Analysis of Twitter Posts Using Density Matrix Representation (Yazhou Zhang, Dawei Song, Xiang Li and Peng Zhang, 2018) – Natural Language Processing, Social Media
Collection-Document Summaries (Witt, N., Granitzer, M. and Seifert, C., 2018) – Natural Language Processing