
Addenda for Chapter 6: Index Compression

Group VarInt
In Chapter 6, we discussed the byte-aligned vByte method as an example of an index compression
technique with high decoding performance. After we had written the chapter, Jeff Dean gave a
keynote talk at WSDM 2009 in which he discussed a few compression algorithms used by
Google, including the byte-aligned Group VarInt method.
Group VarInt is named after VarInt (variable-size integer coding), Google's version of the vByte
algorithm. vByte's decoding routine is very efficient, but often suffers from branch misprediction.
Consider the vByte representation of the postings list L = 80, 400, 431, 686:
Δ(L) = 80, 320, 31, 255
vByte(L) = 01010000 11000000 00000010 00011111 11111111 00000001
(the continuation bit is the most significant bit of each byte). When working through the encoded postings
sequence, the decoder needs to inspect every single continuation bit and, depending on whether it
is 1 or 0, either continue with the current posting or skip to the next. This means there is a branch
instruction, and thus the possibility of a branch misprediction, for every byte in the encoded
postings sequence.
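This per-byte branch can be seen in a minimal vByte decoder sketch (not the book's code; vbyte_decode is a hypothetical helper, assuming the low-order-bytes-first format used in the example above):

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal vByte decoder sketch. Each iteration tests the continuation
   bit (the most significant bit of the byte), so the branch predictor
   must guess its outcome for every single input byte. */
size_t vbyte_decode(const uint8_t *in, size_t n, uint32_t *out) {
  size_t count = 0;
  uint32_t value = 0;
  int shift = 0;
  for (size_t i = 0; i < n; i++) {
    value |= (uint32_t)(in[i] & 0x7F) << shift;
    if (in[i] & 0x80) {
      shift += 7;            /* continuation bit set: more bytes follow */
    } else {
      out[count++] = value;  /* continuation bit clear: posting complete */
      value = 0;
      shift = 0;
    }
  }
  return count;
}
```

When the input is dominated by single-byte postings, this branch is almost always taken the same way and predicts well; it is inputs with mixed byte lengths that trigger frequent mispredictions.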
Group VarInt addresses this problem in two steps:
1. Replacing the continuation bits with a 2-bit selector value that appears before the body of
the encoded posting. This allows us to use simple arithmetic (or a lookup table) to
determine how many bytes to consume for the current posting.
2. Combining four consecutive postings into a group for better space utilization and
alignment of the selector values (or to reduce the number of table lookups in case a lookup
table is used).
For the same postings list as before, the Group VarInt representation is:
GroupVarInt(L) = 00000100 01010000 01000000 00000001 00011111 11111111
(the first byte is the selector byte; its two least significant bits hold the selector for the first
posting, the next two bits the selector for the second posting, and so on, and each posting's body
is stored in little-endian byte order). For example, the second Δ-value, 320, can be encoded using
2 bytes, so its selector value is 01.
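The encoding side of one group can be sketched as follows (a hypothetical helper, not the book's code; it packs the i-th posting's selector into bits 2i and 2i+1 of the selector byte, which is the order the decoder below consumes via selector & 3, selector >> 2, and so on):

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of a Group VarInt encoder for one group of four values.
   Writes a selector byte followed by 4-16 little-endian body bytes
   and returns the total number of bytes written (at most 17). */
size_t group_varint_encode4(const uint32_t values[4], uint8_t *out) {
  uint8_t *body = out + 1;   /* bodies follow the selector byte */
  uint8_t selector = 0;
  for (int i = 0; i < 4; i++) {
    uint32_t v = values[i];
    int bytes = 1;
    while (v >= 256) { bytes++; v >>= 8; }      /* bytes needed: 1..4 */
    selector |= (uint8_t)((bytes - 1) << (2 * i));
    for (int b = 0; b < bytes; b++) {           /* little-endian body */
      *body++ = (uint8_t)(values[i] >> (8 * b));
    }
  }
  out[0] = selector;
  return (size_t)(body - out);
}
```

For the Δ-values 80, 320, 31, 255 this produces the six bytes shown above: the selector byte 00000100, one body byte for 80, two for 320, and one each for 31 and 255.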
Publicly available implementations of Group VarInt typically use a precomputed lookup table to
determine the position of each posting in the encoded data chunk and the bit mask to apply when
extracting it. In our own experiments, we found that a little bit shifting works just as well.
The following decoder implementation assumes a little-endian architecture that supports
unaligned memory access (e.g., Intel/AMD); on other architectures, the decoding operations may
be slightly more complicated.
void DecodeGroupVarInt(const byte* compressed, int size, uint32_t* uncompressed) {
  const uint32_t MASK[4] = { 0xFF, 0xFFFF, 0xFFFFFF, 0xFFFFFFFF };
  const byte* limit = compressed + size;

  uint32_t current_value = 0;
  while (compressed != limit) {
    const uint32_t selector = *compressed++;
    const uint32_t selector1 = (selector & 3);
    current_value += *((uint32_t*)(compressed)) & MASK[selector1];
    *uncompressed++ = current_value;
    compressed += selector1 + 1;
    const uint32_t selector2 = ((selector >> 2) & 3);
    current_value += *((uint32_t*)(compressed)) & MASK[selector2];
    *uncompressed++ = current_value;
    compressed += selector2 + 1;
    const uint32_t selector3 = ((selector >> 4) & 3);
    current_value += *((uint32_t*)(compressed)) & MASK[selector3];
    *uncompressed++ = current_value;
    compressed += selector3 + 1;
    const uint32_t selector4 = (selector >> 6);
    current_value += *((uint32_t*)(compressed)) & MASK[selector4];
    *uncompressed++ = current_value;
    compressed += selector4 + 1;
  }
}
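As a sanity check, the running example L = 80, 400, 431, 686 can be decoded with a condensed, portable variant of the routine above (our own sketch: DecodeGroupVarIntPortable is not the book's name, and memcpy replaces the unaligned pointer cast so the code also works on architectures without unaligned access):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef uint8_t byte;

/* Condensed Group VarInt decoder. The 4-byte read may touch up to
   3 bytes past the encoded data, so the buffer needs padding. */
void DecodeGroupVarIntPortable(const byte *compressed, int size,
                               uint32_t *uncompressed) {
  const uint32_t MASK[4] = { 0xFF, 0xFFFF, 0xFFFFFF, 0xFFFFFFFF };
  const byte *limit = compressed + size;
  uint32_t current_value = 0;
  while (compressed != limit) {
    const uint32_t selector = *compressed++;
    for (int i = 0; i < 4; i++) {
      const uint32_t sel = (selector >> (2 * i)) & 3;
      uint32_t word = 0;
      memcpy(&word, compressed, sizeof(word));  /* unaligned 4-byte read */
      current_value += word & MASK[sel];        /* keep sel+1 low bytes */
      *uncompressed++ = current_value;
      compressed += sel + 1;
    }
  }
}
```

Feeding it the six encoded bytes from the example (plus three padding bytes) reconstructs the postings 80, 400, 431, 686 from the prefix sums of the decoded Δ-values.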

If the number of postings in the compressed list is not a multiple of 4, then special-case handling
may be necessary for the final 1-3 postings in the list, for instance by encoding those postings
using vByte.
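Under that scheme, the leftover postings would simply be vByte-coded after the last complete group. A minimal sketch of such a tail encoder (a hypothetical helper, assuming the same vByte format as in the example at the top of this addendum):

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch: vByte-code the 1..3 leftover postings of a list whose length
   is not a multiple of four. Low-order 7-bit groups come first; the
   continuation bit (MSB) is set on all bytes except the last one of
   each posting. Returns the number of bytes written. */
size_t vbyte_encode_tail(const uint32_t *tail, size_t count, uint8_t *out) {
  uint8_t *p = out;
  for (size_t i = 0; i < count; i++) {
    uint32_t v = tail[i];
    while (v >= 128) {
      *p++ = (uint8_t)(v & 0x7F) | 0x80;  /* continuation bit set */
      v >>= 7;
    }
    *p++ = (uint8_t)v;                    /* final byte: bit clear */
  }
  return (size_t)(p - out);
}
```

The decoder would then switch from group decoding to vByte decoding once fewer than four postings remain.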
Decoding Performance
The following table is identical with Table 6.9 (page 213) in the book, except for the addition of
Group VarInt:
Compression Method    Compression        Decoding         Cumulative Overhead
                      (bits per docid)   (ns per docid)   (decoding + disk I/O)
Gamma                  4.56               7.73            13.98 ns
Golomb                 3.78              10.85            16.03 ns
Rice                   3.81               6.46            11.68 ns
LLRUN                  3.89               7.04            12.37 ns
Interpolative          3.87              26.50            31.80 ns
vByte                  8.11               1.39            12.50 ns
Simple-9               4.91               2.76             9.49 ns
Group VarInt          10.45               1.09            14.84 ns
Perhaps surprisingly, the performance difference between vByte and Group VarInt in the above
table is very small. The reason for this is that the vast majority of Δ-values in the docid index for
GOV2 require 7 bits or less, in which case vByte incurs no branch mispredictions. To evaluate the
performance difference between vByte and Group VarInt on an index in which the Δ-values vary
over a larger range, we repeated the experiment from Table 6.9 for a schema-independent index
instead of a docid index:
Compression Method    Compression           Decoding            Cumulative Overhead
                      (bits per position)   (ns per position)   (decoding + disk I/O)
Gamma                 14.09                 12.81               32.11 ns
Golomb                12.14                 12.74               29.37 ns
Rice                  12.41                  7.32               24.32 ns
LLRUN                 10.07                  8.80               22.60 ns
Interpolative         10.34                 31.63               45.80 ns
vByte                 12.03                  4.34               20.82 ns
Simple-9              13.14                  4.21               22.21 ns
Group VarInt          13.10                  1.90               19.85 ns
Compared to the docid index, the decoding overhead for vByte increases by 3.0 ns per list
element (+212%). The decoding overhead for Group VarInt, on the other hand, grows by only 0.8
ns per list element (+74%). Thus, while vByte may be an appropriate compression method for
docid indices, since it is almost as fast as Group VarInt and slightly more compact, it is not ideal
for schema-independent postings lists, where Group VarInt delivers much better decoding
performance. If postings lists are stored on disk, one may still argue that vByte is the superior
compression method, as it achieves better compression rates. However, if the index is kept in
memory, then the factor-2.3 difference in decoding performance makes Group VarInt the clear
winner.
