Tuning the performance of INSERTs and Data loads

INSERTs and data loads write new data, while other operations like SELECT, UPDATE or DELETE have to work on existing data.

Therefore typical performance factors are partially different. If you want to improve the performance of INSERTs and data loads, you can consider the following areas:

Area

Details

Lock waits

See 1999998 and optimize lock wait situations if required. Typical situation when an INSERT has to wait for a lock are:

Critical savepoint phase
Concurrent INSERT of same primary key
SAP HANA internal locks
DDL operation on same table active

Columns

During an INSERT every column has to be maintained individually, so the INSERT time significantly depends on the number of table columns.

Indexes

Every existing index slows down an INSERT operation. Check if you can reduce the number of indexes during mass INSERTs and data loads. SAP BW provides possibilities to automatically drop and recreate indexes during data loads. Primary index normally mustn't be dropped.

Bulk load

If a high number of records is loaded, you shouldn't perform an INSERT for every individual record. Instead you should take advantage of bulk loading options (i.e. inserting multiple records with a single INSERT operation) whenever possible.

Parallelism

If a high number of records is loaded, you should consider parallelism on client side, so that multiple connections to SAP HANA are used to load the data.

Commits

Make sure that a COMMIT is executed on a regular basis when mass INSERTs are done (e.g after each bulk of a bulk load).

Delta merge

A large delta storage can reduce the load performance, so make sure that delta merges are performed on a regular basis.

Avoid repeated merges of small delta storages or with a high amount of uncommitted data in order to avoid unnecessary overhead.

Table vs. record lock

In cases where only a single, non-parallelized INSERT is possible and concurrent changes to the underlying table aren't required, it can be useful to use a global table lock instead of a high number of individual record locks. The table lock can be set via:

LOCK TABLE "<table_name>" IN EXCLUSIVE MODE

Afterwards SAP HANA no longer needs to maintain individual record locks. This approach is also valid for INSERT ... SELECT operations which may be parallelized internally.

Savepoints

Savepoints are required to write modified data down to disk. Normally it is the main intention to shorten the blocking savepoint phase as much as possible and accept longer savepoint durations at the same time. During mass imports the opposite can be better: Shorter savepoints with the risk of increased blocking phases. Shorter savepoints can reduce the amount of data written to disk and they can reduce the amount of logs that need to be kept, reducing the risk of file system overflows.

During mass changes the following parameter adjustments can be considered to reduce the overall savepoint duration:

lower values for global.ini -> [persistence] -> savepoint_max_pre_critical_flush_duration (e.g. 300 instead of 900)
higher values for global.ini -> [persistence] -> savepoint_pre_critical_flush_retry_threshold (e.g. 10000 instead of 3000)

Bugs

The following SAP HANA bugs can have a negative impact on INSERT performance:

Impacted Revisions	Details
1.00.120 - 1.00.122.11 2.00.000 - 2.00.012.00	If a lot of spatial data is inserted row-by-row without commit, the performance can be quite bad due to a SAP HANA bug and a lot of time is spent in call stack module AttributeEngine::spatialae::DeltaComponents::reserveDocid. As a secondary effect contention on "GeometryDeltaAttribute Lock" is possible.

Typical INSERT throughputs are:

Constellation	Typical throughput
Problem situations like long critical savepoint phases or other locks	< 500 records / second
Normal, sequential single-row INSERTs	1,000 - 10,000 records / second
Highly parallelized bulk loads	1,000,000 records / second