At this site, I gather and categorize information about my research, which focuses on text indices and string-algorithms in general. Here, you can find publications, presentations, code, and more grouped by topic. A chronologically sorted list is of all my activities is available at the home page.
A wavelet trees is a compact data structure that can be used to answer rank and select queries on texts of arbitrary size (among others). We are interested in the efficient construction of wavelet trees (and wavelet matrices) in different models of computation, i.e., distributed, shared, and external memory. [read more]
We present different suffix sorting algorithms that work in distributed memory. To this end, we use the Message Passing Interface (MPI) and Thrill (an experimental distributed bit data batch processing framework). [read more]
It was common belief that the naive scan approach is the fastest algorithm to answer longest common extension queries. We show that there exist data structures that can answer those queries faster while requiring just a small memory overhead (of around 10%-20% of the input size. [read more]
The distributed Patricia trie is a distributed full-text index that can answer existential, counting, and enumeration queries. It is based on the suffix array and longest common prefix (LCP) array. [read more]
Suffix sorting is the foundation of the suffix array, one of the most well-researched text data structures. While there are different implementations of multiple suffix sorting algorithms, there is no benchmark comparing them all. We provide an easily extendable benchmark framework for suffix sorting. [read more]
The suffix array is often accompanied by the longest common prefix (LCP) array. We present a new LCP array construction algorithm and provide an extensive benchmark for all LCP array construction algorithms that we are aware of. [read more]
Everything that does not fit into any description above is collected here. [read more]