Category: String data structures

GADDAG

A GADDAG is a data structure presented by Steven Gordon in 1994, for use in generating moves for Scrabble and other word-generation games where such moves require words that "hook into" existing words

Deterministic acyclic finite state automaton

In computer science, a deterministic acyclic finite state automaton (DAFSA), also called a directed acyclic word graph (DAWG; though that name also refers to a related data structure that functions as

Compressed suffix array

In computer science, a compressed suffix array is a compressed data structure for pattern matching. Compressed suffix arrays are a general class of data structure that improve on the suffix array. The

Suffix tree

In computer science, a suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the te

Generalized suffix tree

In computer science, a generalized suffix tree is a suffix tree for a set of strings. Given a set of strings of total length n, it is a Patricia tree containing all suffixes of the strings. It is mos

Hollerith constant

Hollerith constants, named in honor of Herman Hollerith, were used in early FORTRAN programs to allow manipulation of character data. Early FORTRAN had no CHARACTER data type, only numeric types. In o

Substring index

In computer science, a substring index is a data structure which gives substring search in a text or text collection in sublinear time. If you have a document of length n, or a set of documents of tota

Piece table

In computing, a piece table is a data structure typically used to represent a series of edits on a text document. An initial reference (or 'span') to the whole of the original file is created, with su

LCP array

In computer science, the longest common prefix array (LCP array) is an auxiliary data structure to the suffix array. It stores the lengths of the longest common prefixes (LCPs) between all pairs of co

FM-index

In computer science, an FM-index is a compressed full-text substring index based on the Burrows–Wheeler transform, with some similarities to the suffix array. It was created by Paolo Ferragina and Gio

Wavelet Tree

The Wavelet Tree is a succinct data structure to store strings in compressed space. It generalizes the rank and select operations defined on bitvectors to arbitrary alphabets. Originally introduced to represent c

Radix tree

In computer science, a radix tree (also radix trie or compact prefix tree or compressed trie) is a data structure that represents a space-optimized trie (prefix tree) in which each node that is the on

Netstring

In computer programming, a netstring is a formatting method for byte strings that uses a declarative notation to indicate the size of the string. Netstrings store the byte length of the data that foll
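
As an illustration of the length-prefixed layout described above, here is a minimal Python sketch. It assumes the common netstring form of an ASCII decimal byte count, a colon, the payload, and a trailing comma. The function names `encode_netstring` and `decode_netstring` are hypothetical and chosen only for this example.

```python
def encode_netstring(data: bytes) -> bytes:
    """Prefix the payload with its byte length and a colon, then append a comma."""
    return str(len(data)).encode("ascii") + b":" + data + b","


def decode_netstring(buf: bytes) -> tuple[bytes, bytes]:
    """Return (payload, remaining bytes) for the first netstring in buf."""
    length_part, sep, rest = buf.partition(b":")
    # The length field must be present and consist only of ASCII digits.
    if not sep or not length_part.isdigit():
        raise ValueError("missing or malformed length prefix")
    n = int(length_part)
    # The payload must be followed by exactly one comma terminator.
    if len(rest) < n + 1 or rest[n:n + 1] != b",":
        raise ValueError("payload shorter than declared or missing ','")
    return rest[:n], rest[n + 1:]


encoded = encode_netstring(b"hello world!")   # b"12:hello world!,"
payload, remainder = decode_netstring(encoded)
```

Because the length is declared up front, the decoder never scans the payload for a delimiter, so the payload may contain any byte values, including colons and commas.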

Suffix array

In computer science, a suffix array is a sorted array of all suffixes of a string. It is a data structure used in, among others, full-text indices, data-compression algorithms, and the field of biblio
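
The definition above can be sketched directly in Python: sort the starting positions of all suffixes by the suffix text they denote. This is a naive construction meant only to show what the array contains, not one of the efficient construction algorithms. The name `build_suffix_array` is hypothetical.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start indices of all suffixes of text, in sorted suffix order.

    Naive approach: each comparison may look at a whole suffix, so this is
    only suitable for short inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


sa = build_suffix_array("banana")
# Sorted suffixes: a(5), ana(3), anana(1), banana(0), na(4), nana(2)
print(sa)  # [5, 3, 1, 0, 4, 2]
```

Storing indices rather than the suffix strings themselves keeps the array at one integer per character of the input.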

Null-terminated string

In computer programming, a null-terminated string is a character string stored as an array containing the characters and terminated with a null character (a character with a value of zero, called NUL
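
To make the layout concrete, the following Python sketch reads one null-terminated string out of a byte buffer by scanning for the first zero byte. The helper name `read_c_string` is hypothetical, and Python's `bytes` type is used here only as a stand-in for a raw character array.

```python
def read_c_string(buf: bytes, start: int = 0) -> bytes:
    """Return the bytes from start up to, but not including, the next NUL byte."""
    # bytes.index raises ValueError if no terminator exists after start.
    end = buf.index(b"\x00", start)
    return buf[start:end]


buffer = b"abc\x00def\x00"
first = read_c_string(buffer)       # b"abc"
second = read_c_string(buffer, 4)   # b"def"
```

The length is not stored anywhere, so finding it requires a scan up to the terminator, and the string itself cannot contain a zero byte.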

Rope (data structure)

In computer programming, a rope, or cord, is a data structure composed of smaller strings that is used to efficiently store and manipulate a very long string. For example, a text editing program may u

Suffix automaton

In computer science, a suffix automaton is an efficient data structure for representing the substring index of a given string which allows the storage, processing, and retrieval of compressed informat