
Sebastian Hofstätter

Information Retrieval · Retrieval Augmented Generation

I'm an engineer and researcher at Cohere teaching LLMs about relevance.

I received my PhD from TU Wien, supervised by Prof. Allan Hanbury. During my PhD I worked in the field of Information Retrieval on efficient & interpretable neural re-ranking and effective dense retrieval models. I studied the cost-effectiveness tradeoff from multiple angles and worked to optimize it, so that everyone can deploy these techniques. In addition, I created an award-winning master-level course on neural advances in IR – all of which is open-source 🎉

Find me on Twitter, GitHub, and HuggingFace!

Research

During my PhD internship at Google Research I worked on efficient & effective retrieval augmented generation.

Retrieval Augmented Generation

2022 arXiv / 2023 SIGIR (Full)
FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation
S. Hofstätter, J. Chen, K. Raman, H. Zamani
2022 KRLM@ICML (Workshop)
Multi-Task Retrieval-Augmented Text Generation with Relevance Sampling
S. Hofstätter, J. Chen, K. Raman, H. Zamani

During my PhD I worked on the topic of "Optimizing the Cost-Effectiveness Tradeoff in Neural Ranking" from multiple angles, split into two main parts:

Neural Retrieval & Knowledge Distillation

Dense retrieval (nearest-neighbor search over learned vector representations) quickly gained popularity as a promising future of search. I focus on knowledge distillation from stronger but slower teacher models to improve dense retrieval quality.
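The core distillation idea can be sketched as a margin-based MSE loss, in the spirit of cross-architecture knowledge distillation: the student only has to reproduce the teacher's score *margin* between a relevant and a non-relevant passage, not its absolute scores. This is a minimal illustrative sketch; the function name and toy scores are assumptions, and a real setup would use a trained cross-encoder teacher and a dense bi-encoder student.

```python
import numpy as np

def margin_mse_loss(student_pos, student_neg, teacher_pos, teacher_neg):
    """Distill the teacher's score margin (positive - negative) into the student."""
    # The student matches the teacher's margin between a relevant and a
    # non-relevant passage, not the teacher's absolute score scale.
    student_margin = student_pos - student_neg
    teacher_margin = teacher_pos - teacher_neg
    return np.mean((student_margin - teacher_margin) ** 2)

# Toy batch: one query with one positive and one negative passage.
# Both margins are 0.5, so the distillation loss is zero even though
# the absolute score scales differ.
loss = margin_mse_loss(np.array([1.0]), np.array([0.5]),
                       np.array([5.0]), np.array([4.5]))
```

Because only margins are matched, the student is free to settle on whatever score scale suits its (cheaper) architecture.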

2022 arXiv (Full)
Are We There Yet? A Decision Framework for Replacing Term Based Retrieval with Dense Retrieval Systems
S. Hofstätter, N. Craswell, B. Mitra, H. Zamani, A. Hanbury
2022 CIKM (Full)
Introducing Neural Bag of Whole-Words with ColBERTer: Contextualized Late Interactions using Enhanced Reduction
S. Hofstätter, O. Khattab, S. Althammer, M. Sertkan, A. Hanbury
2021 SIGIR (Full)
Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling
S. Hofstätter, S.-C. Lin, J.-H. Yang, J. Lin, A. Hanbury
2020 arXiv (Full)
Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation
S. Hofstätter, S. Althammer, M. Schröder, M. Sertkan, A. Hanbury
2019 ECIR (Short)
Enriching Word Embeddings for Patent Retrieval with Global Context
S. Hofstätter, N. Rekabsaz, M. Lupu, C. Eickhoff, A. Hanbury
⭐ Won best systems short paper award


Efficient & Interpretable Neural Re-Ranking

Neural re-ranking models always add to query latency, so I focus on improving the efficiency of neural re-ranking for both short and long texts.
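The latency tradeoff comes from the two-stage cascade: a cheap first stage scans the whole collection, and the expensive neural model re-scores only the top candidates. This sketch uses illustrative stand-ins (term-overlap scoring instead of a real inverted-index first stage, and a caller-supplied `score_fn` instead of a neural model); all names here are assumptions, not a specific system's API.

```python
def first_stage(query, collection, k=100):
    """Cheap candidate retrieval: rank documents by query-term overlap."""
    q_terms = set(query.lower().split())
    ranked = sorted(
        collection,
        key=lambda doc_id: len(q_terms & set(collection[doc_id].lower().split())),
        reverse=True)
    return ranked[:k]

def rerank(query, candidates, collection, score_fn, k=10):
    """Expensive re-scoring of only the k candidates.

    The added query latency grows with the candidate count k,
    not with the size of the whole collection.
    """
    return sorted(candidates,
                  key=lambda doc_id: score_fn(query, collection[doc_id]),
                  reverse=True)[:k]

collection = {"d1": "neural ranking models",
              "d2": "cooking pasta recipes",
              "d3": "neural retrieval"}
candidates = first_stage("neural ranking", collection, k=2)
top = rerank("neural ranking", candidates, collection,
             score_fn=lambda q, d: len(d), k=1)  # placeholder scorer
```

Shrinking k, or making `score_fn` cheaper per call, are the two levers for cutting re-ranking latency.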

2022 ECIR (Short)
Establishing Strong Baselines for TripClick Health Retrieval
S. Hofstätter, S. Althammer, M. Sertkan, A. Hanbury
2021 TREC
TU Wien at TREC DL and Podcast 2021: Simple Compression for Dense Retrieval
S. Hofstätter, M. Sertkan, A. Hanbury
2021 SIGIR (Full)
Intra-Document Cascading: Learning to Select Passages for Neural Document Ranking
S. Hofstätter, B. Mitra, H. Zamani, N. Craswell, A. Hanbury
2021 ECIR (Full)
Mitigating the Position Bias of Transformer Models in Passage Re-Ranking
S. Hofstätter, A. Lipani, S. Althammer, M. Zlabinger, A. Hanbury
2020 TREC
Evaluating Transformer-Kernel Models at TREC Deep Learning 2020
S. Hofstätter, A. Hanbury
2020 SIGIR (Short)
Local Self-Attention over Long Text for Efficient Document Retrieval
S. Hofstätter, B. Mitra, H. Zamani, N. Craswell, A. Hanbury
2020 CIKM (Short)
Learning to Re-Rank with Contextualized Stopwords
S. Hofstätter, A. Lipani, M. Zlabinger, A. Hanbury
2020 CIKM (Resource)
Fine-Grained Relevance Annotations for Multi-Task Document Ranking and Question Answering
S. Hofstätter, M. Zlabinger, M. Sertkan, M. Schröder, A. Hanbury
2020 ECAI (Full)
Interpretable & Time-Budget-Constrained Contextualization for Re-Ranking
S. Hofstätter, M. Zlabinger, A. Hanbury
2020 ECIR (Demo)
Neural-IR-Explorer: A Content-Focused Tool to Explore Neural Re-Ranking Results
S. Hofstätter, M. Zlabinger, A. Hanbury
2019 TREC
TU Wien @ TREC Deep Learning ’19 – Simple Contextualization for Re-ranking
S. Hofstätter, M. Zlabinger, A. Hanbury
2019 SIGIR (Short)
On the Effect of Low-Frequency Terms on Neural-IR Models
S. Hofstätter, N. Rekabsaz, C. Eickhoff, A. Hanbury
2019 OSIRRC (Workshop)
Let’s measure runtime!
S. Hofstätter, A. Hanbury

For all my publications (including collaborations) visit Google Scholar or Semantic Scholar.

Teaching

All my materials are available open-source on GitHub. For a detailed description of our workflow for remote teaching see:

2022 SIGCSE (Full)
A Time-Optimized Content Creation Workflow for Remote Teaching
S. Hofstätter, S. Althammer, M. Sertkan, A. Hanbury

Advanced Information Retrieval (Summer 2022, 2021, 2020, 2019)

Main lecturer with full responsibility for this master-level course with more than 100 students on neural methods for IR, including designing, conducting, and grading lectures, exercises, and exams.
A playlist of all lecture recordings from 2021 is available on YouTube.

Lectures: Basics of IR; Sequence modelling; Neural retrieval & re-ranking
(10 lectures total)
Exercise: Implement neural re-ranking models in PyTorch

🏆 Won Best Distance Learning Lecture & Best Teacher Award 2021 @ TU Wien


Introduction to Information Retrieval (Winter 2019, 2018)

Lectures: Inverted index; Scoring models; Efficient & fast text processing
Exercise: Implement an efficient search engine from scratch
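To give a flavor of the exercise, a toy inverted index with conjunctive (AND) search fits in a few lines of Python. This is an illustrative sketch, not the course's reference implementation; a real engine would add tokenization, compression of posting lists, and a scoring model.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search_and(index, query):
    """Conjunctive query: intersect the posting lists of all query terms."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []

docs = {1: "the cat sat", 2: "the dog sat on the cat"}
index = build_inverted_index(docs)
```

The key property students see is that a query touches only the posting lists of its terms, never the full collection.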

Experience & Education

2022 Research Internship - Google Research (3.5 months)
2021 Visiting Scholar - UMass Amherst (2 months)
2018 - 2022 PhD - Computer Science - TU Wien
2016 - 2018 Master's - Software Engineering - TU Wien
2012 - 2016 Bachelor's - Computer Science and Economics - TU Wien