Florian Kurpicz

Distributed Suffix Array Construction

We present different suffix sorting algorithms that work in distributed memory. To this end, we use the Message Passing Interface (MPI) and Thrill (an experimental distributed bit data batch processing framework).

Our Thrill implementations are based on prefix doubling and recursion, which are two of the three main techniques used for suffix sorting. The third technique is called induced copying, and we present the first distributed suffix sorting algorithm based on this technique (implemented using MPI).


The implementations are part of two different repositories. First, the suffix sorting algorithms that we implemented using Thrill are part of the Thrill repository, which is available at https://github.com/thrill/thrill. Here, it still is the largest example of distributed algorithms implemented using Thrill.

Second, the suffix sorting algorithms that we implemented using MPI are available at https://github.com/kurpicz/dsss. Additionally, this repository also contains the implementations of our distributed string sorting algorithms.

Posts Related to Distributed Suffix Array Construction

Presentation at ALENEX 2019

Lightweight Distributed Suffix Array Construction
Paper Accepted at Big Data 2018

Scalable Construction of Text Indexes with Thrill