Which Areas Influence the Compression Factor?

Updated May 19, 2018

The following factors influence the compression factor, i.e. the ratio of the size of the uncompressed data to the size of its compressed representation:

Area	Details
Duplicate column values	All compression types can take advantage of duplicate column values. Each distinct value is stored only once in the dictionary, and the value ID encodings benefit as well.
Length of source column values	Dictionary compression maps column values to shorter value IDs. The longer the original column values are, the more efficient the dictionary compression is.
Sorting of the table records	The way the table records are sorted can significantly impact the compression ratio. During compression optimization, SAP HANA sorts the table records so as to achieve the best overall compression ratio. Be aware that sorting the table records optimally for one column can reduce the compression ratio for other columns, so a compromise is often required.
UDIV records	UDIV records are linked to multi-version concurrency control and can result in unnecessary space allocation. The column MAX_UDIV in monitoring view M_CS_TABLES provides information about the overall number of UDIV records. If it is much higher than the actual number of rows in the table, the memory consumption can be higher than necessary.
Type of compression	The type of compression in use can also influence the compression factor. For example, SAP Note 2105761 describes a scenario where tables permanently remain with compression type DEFAULT, so that the compression factor is much worse than it could be.
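
The effect of duplicate values, value length and record sorting can be illustrated with a small Python sketch. It is a toy model and not SAP HANA's actual implementation: the helper names dictionary_encode and run_length_encode and the sample city column are assumptions made for illustration, and run-length encoding merely stands in for the value ID compression applied on top of the dictionary.

```python
from itertools import groupby


def dictionary_encode(values):
    """Return (sorted list of distinct values, list of value IDs)."""
    dictionary = sorted(set(values))
    index = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [index[value] for value in values]


def run_length_encode(value_ids):
    """Collapse consecutive equal value IDs into (value_id, run_length) pairs."""
    return [(vid, len(list(run))) for vid, run in groupby(value_ids)]


# Hypothetical sample column with many duplicates (illustration only).
column = ["WALLDORF", "BERLIN", "WALLDORF", "BERLIN", "WALLDORF", "BERLIN"]

# Each distinct value is stored once; rows only keep short integer IDs.
dictionary, ids = dictionary_encode(column)

# Alternating values give one run per row, so run-length encoding gains nothing.
unsorted_runs = run_length_encode(ids)

# After sorting the records, equal value IDs are adjacent and collapse.
_, sorted_ids = dictionary_encode(sorted(column))
sorted_runs = run_length_encode(sorted_ids)

# Characters stored without and with a dictionary (value IDs not counted).
raw_chars = sum(len(value) for value in column)
dict_chars = sum(len(value) for value in dictionary)

print(dictionary, ids)
print(len(unsorted_runs), "runs unsorted,", len(sorted_runs), "runs sorted")
print(raw_chars, "characters raw,", dict_chars, "characters in the dictionary")
```

In this sketch, longer strings increase the raw character count while the value IDs stay the same size, and sorting reduces the number of runs for this one column. In a table with several columns, a single record order cannot minimize the runs of every column at once, which is why a compromise is needed.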