Register Login

SAP HANA Misconceptions

Updated Oct 14, 2019

What are some common SAP HANA Misconceptions?

SOLUTION:

This table explains details about various common SAP HANA Misconceptions:

Misconception

Details

All the tables in SAP HANA are processed column-oriented.

SAP HANA allows the user to either put a table in column store or in the row store. Quite like traditional relational databases, the tables in row store can be accessed record. Only the tables in column store can be processed column-wise. Is it possible to decide on an individual basis which table works better in row and in column store, based on the access pattern.

The column store typically offers a much better performance than the row store.

Mostly in cases, the column store delivers great performance, however still there are situations where the row store has quite a few advantages. Typically, as a trend it has been observed that SAP tries to locate as many tables in column store as many possible.

Indexes are not required.

It is fact that SAP HANA can scan huge amount of data even without the existence of an index. Still, there are many situations where indexes are considered useful:

  • Support of a unique or a primary key constraint (for instance primary indexes in SAP ABAP scenarios)
  • Actual optimization of central, performance-critical queries
  • Optimization of queries which consume a lot of CPU resources because of highly parallelized operations
  • Tables in row store require indexes like tables in the classic relational databases

Indexes cannot be continued to disk.

This is false. For instance, the main storage of column store notion attributes which relates to indexes is ideally persisted to disk.

SAP HANA is purely an in-memory database.
Disk I/O doesn't matter.

SAP HANA is an in-memory database, however, there are various areas in which the disk I/O affects the performance, e.g.:

  • During startup, row store load.
  • Column store load
  • Hybrid LOB accesses
  • Snapshots and save points
  • Backups
  • Commits

All table data is stored permanently in memory.

Due to various reasons not all SAP HANA tables are permanent and completely preserved in memory:

  • Both row store and column store large hybrid LOB column values and are never loaded into memory.
  • In column store, tables are not necessarily in memory in case they are not used or if they are unloaded because of memory issues.
  • Historic, the date which is rarely accessed can be saved on extended storage and nearline storage outside of the actual SAP HANA database.

Row store and column store functionality is stringently separated.

It has been historically shown that row store and column store come from separate databases, however during the integration in SAP HANA functionality was consolidated and adjusted. Hence, it is not uncommon to view e e.g. row store modules are involved in the processing of column store table accesses.

MOTIVATION = 'AUTO' in M_DELTA_MERGE_STATISTICS always specifies an auto merge.

Up to SAP HANA Rev. 121 the 'AUTO' entry in M_DELTA_MERGE_STATISTICS is not only used by auto merges, but it is also utilized by critical merges which follow completely different set of rules. As a result of this, the user can view 'AUTO' entries for tables with an auto merge which is explicitly disabled.
Beginning with SAP HANA Rev. 122 critical merges are duly recorded with MOTIVATION = 'CRITICAL' in M_DELTA_MERGE_STATISTICS.

M_DELTA_MERGE_STATISTICS encompasses only the information related to delta merge.

No, M_DELTA_MERGE_STATISTICS entails the information for other operations such as compression optimizations (TYPE = 'SPARSE') or delta storage garbage collection (TYPE = 'RECLAIM').

During a restart, monitoring view information is ideally lost.

Many SAP HANA monitoring views (having names starting with 'M_') entail runtime information which is lost during a restart. However, this is not true for all monitoring views, e.g.:

  • M_CS_UNLOADS is generally based on the unload trace files which entail the data of many weeks and months irrespective of restarts.
  • M_MERGED_TRACES is typically based on the existing SAP HANA trace files on file system level.

TOTAL_EXECUTION_TIME in M_SQL_PLAN_CACHE contains the total elapsed time of a SQL statement.

The total elapsed time comprises of both the actual and preparation time (i.e. parse time) and the execution time. The preparation time can be important in few cases.Therefore, the following formula is correct:
"Total elapsed time of SQL statement" = TOTAL_PREPARATION_TIME + TOTAL_EXECUTION_TIME

The unload trace files comprise of only information regarding column unloads.

This is not always true. With SPS 08 both unloads and loads are recorded in the unload trace files. As a consequence it can happen that large unload trace files exist but at the same time the monitoring view M_CS_UNLOADS doesn't contain any record.

Fragmentation is no issue.
No reorganizations are required.

SAP HANA suffers from less fragmentation because the merges in the central column store are obliquely recreating tables. Hence, reorganizations of the column store in memory are not required. Inspite of this there is the risk of fragmentation in other areas:

  • Data volumes
  • Heap allocators
  • Row store

A column is the finest granularity of a column store load

By default, this is correct, however it is possible to define the paged attributes so that columns can be loaded in small chunks.

There is no SQL plan cache.
There is no impact of parsing for the first execution of a SQL statement.

SAP HANA takes advantage of a SQL cache. This refers that parsing or preparing of a SQL statement is only required during the execution of the first time. If parsing consumes a significant amount of time, subsequent executions of the SQL statement can be significantly faster than the first execution.

No column store tables are located on the master node, in the scale-out scenarios.

It is normal that many tables are still located in the column store of the master node in a scale-out system. It is very critical that large and critical tables are transferred to the slave nodes (e.g. infocube, DSO and PSA tables in BW environments).

The row store is only on the master node, in scale-out scenarios.

From an application table perspective it is true that only the master node is populated with row store tables. The other nodes entail a minimal technical implementation of the row store, so that the row store dictionary tables also persist on the slave nodes.

The join engine only processes join requests.

The join engine is not only used for joins. It is also utilized for certain queries on single tables, e.g. if an aggregation like SUM or GROUP BY is performed.

UPDATE statements are always DML operations

This is not generally true. SAP HANA utilizes UPDATE operations also as DDL operations. For instance, the following UPDATE triggers an optimization of the column store compression:
UPDATE "<table_name>" WITH PARAMETERS ('OPTIMIZE_COMPRESSION' = 'YES')

For other databases, there are no popular misconceptions.

There are many popular misconceptions also available for other databases.

The amount of records per table is limited to 2 billion.

Tables can have an unlimited amount of records:

  • In the column store each table partition can hold 2 billion records. If a non-partitioned table exceeds 2 billion records, the user can activate partitioning for eliminating the limitation.
  • In the row store there is ideally no 2 billion record limit.

In scale-out scenarios, table partitioning is very helpful.

SAP HANA table partitioning is helpful for several purposes, e.g.:

  • Load distribution in scale-out scenarios
  • Elimination of the 2 billion record limit
  • SQL statement optimization (e.g. due to partition pruning)
  • Optimization of merge operations

Restore and complete recovery can never eradicate the existing corruptions.

It all depends on the root cause of the corruption:

  • If corruptions are caused by SAP HANA bugs, a restore and complete recovery ideally re-introduces them, because the bug is a part of both the log area and the data.
  • If corruptions are caused due to hardware issues (e.g. "wrong checksum"), a restore and complete recovery will generally correct them.

Tables which have a partition specification have PART_IDs greater than 0

generally the assumption is correct that a non-partitioned table is displayed with PART_ID 0 in views like M_CS_TABLES and the table partitions have PART_ID values greater than 0. However it is possible that a table has only PART_ID 0 though a partition specification is utilized: A partitioning specification of HASH 1 or ROUNDROBIN 1 will eventually result in a non-partitioned table and so the PART_ID is 0. .

The lock mode INTENTIONAL EXCLUSIVE indicates an exclusive lock.

The name is misleading: INTENTIONAL EXCLUSIVE can be considered as a shared lock. This means that e.g. multiple concurrent changes of different records of the same table can happen in various transactions, though an INTENTIONAL EXCLUSIVE object lock can be viewed on the table level. Real exclusive locks, which doesn’t permit concurrent changes, have the lock type EXCLUSIVE. See SAP

The unload priority of row store tables is typically 0.

Tables in row store are ideally kept in memory, so the user can assume that the unload priority is always 0, i.e. never unload the table. If the user verifies the UNLOAD_PRIORITY information in TABLES user can instead typically see the default value of 5 for application tables in row store, so a normal unload behavior.
The unloads are linked to the column store and the unload priority is not evaluated for the row store. The unload priority is still very useful, e.g. in case the table is transferred to column store in the future.

When all table columns (from M_CS_COLUMNS) are loaded, the table will have LOADED = 'TRUE' in M_CS_TABLES.

Even if all normal table columns displayed in M_CS_COLUMNS are loaded, the table is not necessarily fully loaded. It can still happen that it is displayed up with LOADED = 'PARTIALLY'. This constellation specifies that at least one internal column (e.g. $trex_udiv$ or a concat attribute) is not loaded into memory. User can find information about internal columns and their load state in M_CS_ALL_COLUMNS.
For the same reason a "SELECT *" does not necessarily result in a fully loaded table.

Query optimization takes place only during plan generation

During the preparation a generic execution plan is created, however during actual execution some more optimizations (e.g. column access order) take place . As a result user can see the activities of the column store query optimizer (qo3 / trex_qo) or other SAP HANA optimizers also during the actual execution of a SQL statement.

The table attribute NO LOGGING on SAP HANA can be compared with NOLOGGING options of the other relational databases.

NOLOGGING on databases such as Oracle shows that no redo log information is generated. The table data is properly persisted. Incase a table in SAP HANA is created with NO LOGGING, neither log nor data is persisted to disk. Therefore NO LOGGING cannot be utilized for efficient data loads without generating log information. Instead user should use ALTER TABLE ... DISABLE DELTA LOG for that purpose.

M_TABLE_VIRTUAL_FILES entail the disk size information for all tables.

Virtual files only persist for column store tables, therefore the row store table are uncovered in this view. Incase user is interested in global disk size information, user can check M_TABLE_PERSISTENCE_STATISTICS.

The column MODIFY_TIME of M_CS_TABLES indicates the timestamp of the last DDL operation on the table.

MODIFY_TIME of M_CS_TABLES neither corresponds to the last DDL time nor to the last DML time. It is only updated if a CONCAT attribute (e.g. a multi column index) is adjusted (e.g. created, modified or dropped).
As of SAP HANA SPS 10 user can utilize the column LAST_MODIFY_TIME of M_TABLE_STATISTICS to check when the last DML operation has taken place on a table.

The SAP HANA Studio load graph and the information in M_LOAD_HISTORY_HOST and M_LOAD_HISTORY_SERVICE is preserved by the statistics server

Though the load graph information is historical, it is not collected by the statistics server. Instead the nameserver process is in charge of gathering this information and continuing it to the nameserver_history.trc files. See SAP

The statement_memory_limit parameter controls the maximum memory which a single SQL statement can allocate.

The statement_memory_limit parameter actually defines the maximum memory a SQL statement can allocate on a single host (in a single service). In scale-out systems a SQL statement can allocate this memory on various hosts and so the overall memory allocation of the SQL statement will be typically much higher.

SELECT statements cannot be blocked by the SAP HANA locks.

This statement is absolutely correct for transactional locks, however it is not true for internal locks. Refer SAP Document 1999998 where user can e.g. see that SELECTs may wait for "BTree GuardContainer" locks in certain scenarios.

The parameter indexserver.ini -> [transaction] -> idle_cursor_lifetime controls the maximum acceptable idle time of cursors.

The idle_cursor_lifetime parameter is certainly not restricted to idle cursors, it also covers the active SQL statements.

The EXCLUSIVE_ALLOCATED_SIZE in M_HEAP_MEMORY indicates how much memory is currently allocated by a heap allocator.

The column EXCLUSIVE_ALLOCATED_SIZE in monitoring view M_HEAP_MEMORY (respectively HOST_HEAP_ALLOCATORS) contains the sum of all allocations in this heap allocator since the last startup. Normally also a lot of deallocations happen, so the EXCLUSIVE_ALLOCATED_SIZE can be much higher than the currently allocated size. For example, if over time 100 MB are allocated and deallocated 10 times, the actual allocated size is 0, but EXCLUSIVE_ALLOCATED_SIZE would show 1 GB (10 * 100 MB).

All tables which are approaching 2 billion records should typically be divided.

There are some very important exceptions right from the rule to partition tables which are coming close to the record limit of 2 billion. For instance, the SID tables in BW (/BI*/S*) should not be partitioned as they can never exceed the 2 billion records and also the two unique indexes can utilize if the table has been divided.

The first argument of ALTER SYSTEM ALTER CONFIGURATION command always stipulates the parameter file.

Ideally the first argument of ALTER SYSTEM ALTER CONFIGURATION is the parameter file, however there is one exception: incase topology.ini is listed as first argument, the change has been made in the topology within the nameserver persistence. Then there is no parameter file topology.ini.

The Delta joins are either triggered by the mergedog or manually by the application.

This statement holds true mostly however it is very critical to understand that joins can also be triggered SAP HANA internally with regard to specific contexts, e.g. during the recovery or during the replay of a log. Mergedog adjustments cannot control these implicit merges.

A recovery related merge can e.g. be specified by the following modules listed  in the call stack: DataRecovery::RecoveryQueue::run, UnifiedTable::RedoHandler::redo and UnifiedTable::MergeJob::execute.

SAP HANA automatically returns the records in the intended sort order.

For as long as the user does not specify an ORDER BY, the order of the result set of a database request is left undefined. User cannot assume a specific order,and it can be altered any time. SAP HANA indexes do not support the sort order, so for instance e.g. accessing a table via the primary key does not refer to the result has been returned in the order of the primary key.

Therefore it is very important that the user specifies the explicit ORDER BY whenever he requires a specific sort order..

Columns are recompressed only during optimize compression runs

Generally the columns are recompressed during optimize compression runs, however in specific scenarios it may happen that the compression of a column are modified during a delta merge.  

The [internal_hostname_resolution] settings are only used when listeninterface is set to .internal

The user can set the listeninterface parameter to .internal in order to ensure that SAP HANA only listens for incoming requests which are related to the IP addresses configured in the [internal_hostname_resolution] section. So having proper values configured in [internal_hostname_resolution] is a prerequisite for using the .internal value for listen interface

The opposite is false. not true: If user configures an [internal_hostname_resolution] but leaves the listeninterface on .global, the settings in internal_hostname_resolution] are still preffered to be used for SAP HANA server communication. Adding to this, if SAP HANA also listens for other incoming requests and – if the host is not maintained in the [internal_hostname_resolution] - it will utilize a different IP address for communication.

The executed SQL statements are visible in the monitoring view M_EXECUTED_STATEMENTS

M_EXECUTED_STATEMENTS only entails the DDL operations such as CREATE or TRUNCATE, normal SQL types such as UPDATE, SELECT, DELETE and INSERT are not displayed in this view. The user can verify M_SQL_PLAN_CACHE for instance.  


×