Category: Substring indices

Compressed suffix array
In computer science, a compressed suffix array is a compressed data structure for pattern matching. Compressed suffix arrays are a general class of data structure that improve on the suffix array. The
Suffix tree
In computer science, a suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the te
Generalized suffix tree
In computer science, a generalized suffix tree is a suffix tree for a set of strings. Given the set of strings of total length , it is a Patricia tree containing all suffixes of the strings. It is mos
Substring index
In computer science, a substring index is a data structure which gives substring search in a text or text collection in sublinear time. If you have a document of length , or a set of documents of tota
Generalized suffix array
In computer science, a generalized suffix array (or GSA) is a suffix array containing all suffixes for a set of strings. Given the set of strings of total length , it is a lexicographically sorted arr
LCP array
In computer science, the longest common prefix array (LCP array) is an auxiliary data structure to the suffix array. It stores the lengths of the longest common prefixes (LCPs) between all pairs of co
FM-index
In computer science, an FM-index is a compressed full-text substring index based on the Burrows–Wheeler transform, with some similarities to the suffix array. It was created by Paolo Ferragina and Gio
Ukkonen's algorithm
In computer science, Ukkonen's algorithm is a linear-time, online algorithm for constructing suffix trees, proposed by Esko Ukkonen in 1995. The algorithm begins with an implicit suffix tree containin
Suffix array
In computer science, a suffix array is a sorted array of all suffixes of a string. It is a data structure used in, among others, full-text indices, data-compression algorithms, and the field of biblio
Document retrieval
Document retrieval is defined as the matching of some stated user query against a set of free-text records. These records could be any type of mainly unstructured text, such as newspaper articles, rea
Factor oracle
A factor oracle is a finite state automaton that can efficiently search for factors (substrings) in a body of text. Older techniques, such as suffix trees, were time-efficient but required significant
Inverted index
In computer science, an inverted index (also referred to as a postings list, postings file, or inverted file) is a database index storing a mapping from content, such as words or numbers, to its locat