Oracle National Language Support (NLS)

Stechies

1. What is National Language Support?

National Language Support (NLS) is necessary for the following tasks:

  • Converting character sets between client and server
  • Supporting different languages
  • Region-dependent formatting of date and currency data
  • Sort sequence

2. What is Oracle Globalization Support?

Oracle Globalization Support is the new name for the National Language Support as of Oracle 9i.

3. What is a character set?

Internally, all characters on the computer (letters, digits, special characters) are represented by numbers. Code pages on the operating system determine which character is represented by which number.

Oracle also uses this kind of internal representation which is defined by character sets. You can therefore compare a character set to a code page which defines the Oracle-internal representation of characters.

4. Why do I need different character sets?

The number of characters that can be represented by a character set is limited. For example, you can only represent 2^8, that is, 256, different characters with an 8-bit character set. This number is not sufficient to represent all characters of all languages.

5. How many different characters can be represented by the character sets?

There are four groups of character sets:

  • 7-bit character sets (for example, US7ASCII): 2^7, that is, 128, characters can be represented.
  • 8-bit character sets (for example, WE8DEC): 2^8, that is, 256, characters can be represented.
  • Multibyte character sets (for example, KO16KSC5601): more than 256 characters can be represented.
  • UNICODE character sets (for example, UTF8): In theory, over a million characters can be represented.
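The sizes listed above follow directly from the bit width. The following is a minimal sketch in Python, used here purely for illustration; the Python codec name "utf-8" stands in for a Unicode character set and is not an Oracle name.

```python
# Number of distinct codes available per character-set family.
seven_bit = 2 ** 7               # for example, US7ASCII
eight_bit = 2 ** 8               # for example, WE8DEC
unicode_code_points = 0x110000   # Unicode code space U+0000..U+10FFFF

print(seven_bit, eight_bit, unicode_code_points)

# A Unicode encoding such as UTF-8 uses a variable number of bytes per character.
for ch in ("A", "ä", "€", "日"):
    print(repr(ch), len(ch.encode("utf-8")), "byte(s) in UTF-8")
```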

6. What, from an Oracle point of view, is a proper setup of the character sets?

From an Oracle point of view, the following requirements should be met (Caution: These conditions are NOT met in R/3 environments, see the next question):

  • You should set up the Oracle database with a character set that supports all characters used by the application.
  • The Oracle clients should use a character set that matches the code page used in the operating system.

For example:

  • A Windows client with code page WIN1252 uses the WE8MSWIN1252 character set.
  • A UNIX client with code page ROMAN8 uses the WE8ROMAN8 character set.
  • The relevant database server is defined with the UTF8 character set that contains the characters of both client code pages.
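The effect of such a setup can be sketched in Python. This is an illustration only, not Oracle code: the Python codecs cp1252, hp_roman8 and utf-8 are assumed stand-ins for WE8MSWIN1252, WE8ROMAN8 and a Unicode database character set, and to_database is a hypothetical helper.

```python
def to_database(client_bytes: bytes, client_codec: str, db_codec: str = "utf-8") -> bytes:
    """Convert bytes from the client's code page to the database character set."""
    return client_bytes.decode(client_codec).encode(db_codec)

text = "Müller"
win_bytes = text.encode("cp1252")      # what a Windows client would send
unix_bytes = text.encode("hp_roman8")  # what a ROMAN8 UNIX client would send

# The two clients send different bytes for the same text ...
print(win_bytes != unix_bytes)
# ... but after conversion the database holds one consistent representation.
print(to_database(win_bytes, "cp1252") == to_database(unix_bytes, "hp_roman8"))
```

Because each client declares its real code page, the conversion can map both inputs to the same stored value.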

7. How does the R/3 system use character sets?

 For the R/3 system, you should always choose a character set for the client that matches the character set of the installed database. If, for example, the database was installed with the WE8DEC character set, you must also use the WE8DEC character set on all client hosts - irrespective of the code page used in the operating system.

 Here, the R/3 system makes use of the fact that a conversion of the input data is generally not performed if the client and server character sets are identical. Oracle assumes that the input data matches the defined character set and writes the input data to the database without performing an additional check.

 Strictly speaking, this method causes data that does not match the database character set to be written into the database. For R/3 systems, however, this is irrelevant because they use the same method for all subsequent read access operations which will retrieve the same data that were written to the database.

This means that the R/3 system uses Oracle as a byte container - a conversion CANNOT be performed. The binary data stored in the database is independent of the Oracle character set. Whether you choose US7ASCII, WE8DEC, WE8ISO8859P1 or any other 7-bit or 8-bit character set does not influence the written binary data. However, as of Oracle 9i, we recommend that you avoid using US7ASCII.

A side effect of this procedure, however, is that some characters (German umlauts, for example) will be displayed incorrectly when you read the data directly from the Oracle database with tools such as sqlplus. This, however, is not a problem - the important issue is that the R/3 system retrieves the data from the database in the same format as it was written into the database.
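The byte-container behavior can be modelled with a small simulation. This sketch makes several assumptions: the "database" is a plain Python list, the write and read helpers are hypothetical, and cp1252 and hp_roman8 are arbitrary example code pages for the application and for a tool that reads the data directly.

```python
database = []  # simulation only: stored values are raw byte strings

def write(raw: bytes, client_charset: str, db_charset: str) -> None:
    # Identical client and server character sets: bytes pass through unchecked.
    if client_charset != db_charset:
        raise NotImplementedError("conversion path is not modelled in this sketch")
    database.append(raw)

def read(index: int, client_charset: str, db_charset: str) -> bytes:
    if client_charset != db_charset:
        raise NotImplementedError("conversion path is not modelled in this sketch")
    return database[index]

# The application encodes with its own code page, but both sides declare WE8DEC.
original = "Grüße".encode("cp1252")
write(original, "WE8DEC", "WE8DEC")
assert read(0, "WE8DEC", "WE8DEC") == original  # round trip is byte-identical

# A tool that interprets the stored bytes with another code page shows wrong characters.
print(database[0].decode("hp_roman8", errors="replace"))
```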

8. Why can the R/3 system store 8-bit characters even if it only uses the 7-bit US7ASCII character set?

Due to the special NLS setup in the R/3 environment, the data is generally not converted. This means that Oracle writes all 8 bits of the data received from the R/3 system into the database without performing an additional check, even if only a 7-bit character set such as US7ASCII was defined. This condition is not guaranteed or documented by Oracle, which is why you should avoid using US7ASCII as of Oracle 9i.
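Why this reliance on pass-through is fragile can be illustrated in Python. Note that Python's strict "ascii" codec is only an analogy here; how Oracle itself treats such bytes under US7ASCII is, as stated above, not guaranteed.

```python
raw = "ä".encode("latin-1")     # one byte, 0xE4, as an application might send it
high_bit_set = raw[0] > 0x7F    # the byte lies outside the 7-bit range

# Pass-through (no conversion): all 8 bits survive unchanged.
stored = bytes(raw)

# A strict 7-bit interpretation cannot represent the byte at all.
try:
    raw.decode("ascii")
    fits_in_7_bit = True
except UnicodeDecodeError:
    fits_in_7_bit = False

print(high_bit_set, stored == raw, fits_in_7_bit)
```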

9. How do I define the server and client character set?

  • The character set of the server is specified in the "CREATE DATABASE" statement when you create the database.
  • The character set on the client is defined with the NLS_LANG environment variable. This variable consists of three components:

NLS_LANG = <language>_<territory>.<characterset>

In this case, the <language> and <territory> components only affect the representation of number formats and date formats and the sort sequence, but do not influence the internal representation of characters.

The <characterset> component, which defines the character set used for the client, is responsible for the actual representation mechanism.

For example: NLS_LANG = AMERICAN_AMERICA.WE8DEC

If no NLS_LANG entry is set in the environment on Windows platforms, the setting is taken from the registry. However, in the SAP environment, NLS_LANG is always set by default in the environment.
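The three-part structure of NLS_LANG can be made explicit with a small parser. This is a sketch in Python; parse_nls_lang and NlsLang are hypothetical names, and the helper deliberately accepts only the full three-component form described here.

```python
import re
from typing import NamedTuple

class NlsLang(NamedTuple):
    language: str
    territory: str   # the area/region component
    charset: str

# language, then "_", then territory, then ".", then character set
_NLS_LANG_RE = re.compile(r"^([^_.]+)_([^.]+)\.([^.]+)$")

def parse_nls_lang(value: str) -> NlsLang:
    match = _NLS_LANG_RE.match(value.strip())
    if match is None:
        raise ValueError(f"not a three-part NLS_LANG value: {value!r}")
    return NlsLang(*match.groups())

print(parse_nls_lang("AMERICAN_AMERICA.WE8DEC"))
```

Only the charset field influences how character data is represented; the other two fields affect formatting and sorting.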

10. How can I determine the character set used by a database?

On an open database, you can use the following SELECT to determine which character set the database is using:

SELECT VALUE FROM V$NLS_PARAMETERS
   WHERE PARAMETER = 'NLS_CHARACTERSET';

11. Where are the character set definitions stored?

On the server, the character set definitions are stored in the following directory:

<oracle_home>/ocommon/nls/admin/data

On the client, the character set definitions are stored as part of the client software installation. On Windows, the directory corresponds to the directory of the server installation. On UNIX, the files are unpacked in directories such as the following, depending on the release:

$ORACLE_HOME/ocommon/NLS_805/admin/data
/oracle/805_32/ocommon/nls/admin/data
/oracle/client/92x_64/ocommon/nls/admin/data

  For details on installing the client software on UNIX, see note 180430.

12. Which environment settings do I need for NLS?

The following environment variables play a role in conjunction with NLS:

  • NLS_LANG: Definition of the language, region and character set of the client (for example, AMERICAN_AMERICA.WE8DEC, see the information above);
  • ORA_NLS: Directory of the NLS files for Oracle 7.2;
  • ORA_NLS32: Directory of the NLS files for Oracle 7.3;
  • ORA_NLS33: Directory of the NLS files for Oracle 8.0 and higher.

On WINDOWS, the ORA_NLS* variables must be set in the Registry and not in the environment.

In current R/3 environments, only NLS_LANG and ORA_NLS33 are required. Refer to notes 556232 (Windows) or 602843 (UNIX) to determine which directory ORA_NLS33 should point to.
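A basic sanity check of these two variables could look like the following Python sketch. check_nls_environment is a hypothetical helper; it only inspects the process environment, so it applies to UNIX-style setups and not to ORA_NLS* values kept in the Windows Registry. It also does not verify that the directory is the correct one for the release.

```python
import os
from typing import List, Mapping

def check_nls_environment(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems found; an empty list means the basic checks passed."""
    problems = []
    if not env.get("NLS_LANG"):
        problems.append("NLS_LANG is not set")
    nls_dir = env.get("ORA_NLS33")
    if not nls_dir:
        problems.append("ORA_NLS33 is not set")
    elif not os.path.isdir(nls_dir):
        problems.append(f"ORA_NLS33 points to a missing directory: {nls_dir}")
    return problems

for problem in check_nls_environment(os.environ):
    print(problem)
```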

13. Which character sets are used by default in the R/3 environment?

 For installations with Oracle Version 7 or earlier, the 7-bit US7ASCII character set was used. For installations with Oracle 8, the 8-bit WE8DEC character set is used by default.

14. Which character sets are supported in the R/3 system?

As long as the character set on the server matches the character set on the client, a conversion is not performed, that is, regardless of the character sets used, all data is stored without conversion in the database. This means that, in principle, you can use each 7-bit and 8-bit character set with a single byte R/3 system. We recommend, however, that you follow the standard procedure, if possible.

              For UNICODE systems, refer to the information provided on http://service.sap.com/unicode.

15. Which character sets are known to the Oracle database?

You can use the following command to check which character sets are known to the Oracle database:

SELECT VALUE FROM V$NLS_VALID_VALUES
WHERE PARAMETER = 'CHARACTERSET';

16. What errors may occur if NLS is configured incorrectly?

Severe problems may occur if, in the R/3 environment, NLS_LANG does not match the database code page. Because a conversion is then performed, certain characters (for example, German umlauts) may be inadvertently corrupted. Therefore, the current kernel checks whether NLS_LANG matches the database code page before each database connect.

              A large number of problems can occur if ORA_NLS33 points to a non-existent or incorrect directory. For more information, see Notes 592657 and 393620.
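The kind of consistency check described above can be sketched as follows. This is not the kernel's actual implementation; nls_lang_matches_database is a hypothetical Python helper, and db_charset is assumed to be the value returned by the NLS_CHARACTERSET query from question 10.

```python
def nls_lang_matches_database(nls_lang: str, db_charset: str) -> bool:
    """True if the character set part of NLS_LANG equals the database character set."""
    _, dot, client_charset = nls_lang.rpartition(".")
    if not dot:
        return False  # no character set component present
    return client_charset.strip().upper() == db_charset.strip().upper()

print(nls_lang_matches_database("AMERICAN_AMERICA.WE8DEC", "WE8DEC"))
print(nls_lang_matches_database("AMERICAN_AMERICA.WE8MSWIN1252", "WE8DEC"))
```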

17. How can I retrieve detailed information about a character set?

As of Oracle 9i, you can retrieve detailed information about individual character sets with the Locale Builder. You can start the Locale Builder as follows:

Windows: c:\oracle\ora92\ocommon\nls\lbuilder\lbuilder.bat
UNIX:    $ORACLE_HOME/ocommon/nls/lbuilder

18. Which methods do I have to use to exchange data between databases with different character sets in the R/3 environment?

To exchange data between two databases with different character sets without problems, you can use tools that work with a character set-independent representation:

  • tp, R3trans (transports)
  • R3load
  • SAP-exclusive functions such as RFC
  • DBConnect (note 518241) and secondary database connections (note 323151) as of Kernel 6.40 (based on the prerequisites from note 808505)

Note, however, that there are restrictions on TP and R3trans if you want to exchange data between a Unicode and a non-Unicode system (see note 638357).

19. Which methods can I NOT use to exchange data between databases with different character sets in the R/3 environment?

Generally, in the R/3 environment you should NOT exchange data between databases using tools that automatically convert between different character sets, because characters (for example, German umlauts or other special characters) may be transferred to the target system incorrectly. Examples of such tools are:

  • exp, imp
  • SQL*Loader
  • Database connections
  • DBConnect (note 518241) and secondary database connections (note 323151) up to and including Kernel 6.30

You should only apply these tools and mechanisms if the databases use identical character sets or if it was previously determined that the conversion does not play any role (for example, because the converted characters are not in the data to be swapped).
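How an automatic conversion can corrupt byte-container data is shown in the following Python sketch. The scenario is an assumption for illustration: the application wrote its data with code page cp1252, the source database is declared as ISO 8859-1 (Python codec latin-1, standing in for WE8ISO8859P1), and the target is a Unicode database.

```python
# Bytes as the application wrote them, stored without conversion.
stored = "Preis: 5 €".encode("cp1252")

# A converting tool trusts the declared database character set.
converted = stored.decode("latin-1").encode("utf-8")

# The application on the target side expects its original text back.
print(converted.decode("utf-8") == "Preis: 5 €")
```

The comparison fails because the declared character set maps byte 0x80 to a control character, not to the euro sign the application meant.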

20. How can I change the character set of the database?

You can easily adjust the character set if the previously used character set is a subset of the new character set. Refer to note 456968 for information on which character set combinations fulfill the requirements. In the SAP environment, you should especially bear in mind that US7ASCII is a true subset of WE8DEC, which means that you can easily switch from US7ASCII to WE8DEC. Refer to note 102402 for information on the steps required to perform the change. For the change, the following command is the essential step:

ALTER DATABASE CHARACTER SET <new_characterset>;

Since NO data conversion occurs as part of the character set change, the change is completed within seconds. Long runtimes are not expected for this step. Note that you cannot use the above command to switch to a UNICODE code page.
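The subset condition explains why no data conversion is needed: every byte that is valid in the old character set keeps its meaning in the new one. A minimal Python sketch of such a check follows. is_ascii_superset is a hypothetical helper, and latin-1 and cp1252 are Python codecs used as stand-ins, since WE8DEC itself has no Python codec.

```python
def is_ascii_superset(codec: str) -> bool:
    """True if every 7-bit ASCII byte means the same character in `codec`."""
    return all(bytes([b]).decode(codec) == chr(b) for b in range(128))

print(is_ascii_superset("latin-1"), is_ascii_superset("cp1252"))
```

For the authoritative list of permitted Oracle character set combinations, the note referenced above remains the source; this check only illustrates the principle.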

It is considerably more complicated to change to a character set that is not a superset of the character set you have been using. In this case, your only option is to rebuild the database. Note that the method recommended in the Oracle documentation, which involves an export/import, is NOT permitted in the R/3 environment, since data would be converted by mistake in this case. However, you can use R3load as an alternative (see the guidelines on homogeneous system copies).

Note that SAP only supports a change to UNICODE in the form of an actual migration using R3load. In the SAP environment, you are not permitted to simply execute "ALTER DATABASE CHARACTER SET ...". The SAP kernel must also be Unicode-enabled.

21. What has to be taken into account with UNICODE?

If you are using an R/3 system with UNICODE or the SAP J2EE Application Server, the database must also run with a UNICODE character set. The SAP environment also supports UTF8. The National Character Set is also important in this case. Its value is determined as part of the database installation and must be UTF8. You can use the following SELECT to determine the value used while the database is running:

SELECT VALUE FROM V$NLS_PARAMETERS
   WHERE PARAMETER = 'NLS_NCHAR_CHARACTERSET';

              If this statement does not return UTF8 on a system with an SAP J2EE application server, see Note 669902.

Since byte semantics is active by default (NLS_LENGTH_SEMANTICS = BYTE), you must specify the maximum number of possible bytes instead of the maximum number of characters when you create VARCHAR2 columns. For this reason, the SAP Data Dictionary automatically triples the specified string length when you create columns on the database (for example SE11 -> string length 20; Oracle level -> Column length 60).

In the case of UTF8, the additional disk space, memory and CPU resources that Oracle requires essentially depend on the number of characters that lie outside the 7-bit ASCII range (the first 128 characters) and that are therefore represented by two or three bytes. These include German umlauts or the characters of Asian languages, for example. While this results in hardly any increase in resource requirements in a system that has Western European contents, the requirements can increase significantly in a system that contains a large number of Asian characters. However, the definite requirements can only be determined on an individual basis.
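The relationship between character count and byte count can be checked with a short Python sketch. utf8_bytes and column_bytes are hypothetical helpers; the factor 3 reflects the tripling rule described above, and the sample strings are arbitrary.

```python
def utf8_bytes(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

def column_bytes(char_length: int, factor: int = 3) -> int:
    """Byte length reserved for a column of the given character length."""
    return char_length * factor

# ASCII text needs one byte per character, umlauts two, these Asian characters three.
for sample in ("Walldorf", "Düsseldorf", "東京都"):
    print(sample, len(sample), "chars ->", utf8_bytes(sample), "bytes")

print(column_bytes(20))
```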

              To operate an SAP system with UNICODE, see also the information provided at http://service.sap.com/unicode.

22. What has to be taken into account with multibyte character sets?

In the SAP environment, only the following multibyte character sets are supported (see Note 695899): ZHT16BIG5, ZHS16CGB231280, JA16SJIS and KO16KSC5601.

If you use the 6.40 kernel, the 9.2.0.7 client or higher must be installed in your system (Note 886784).

As of SAP Web Application Server 7.00 or Oracle Client 10g, multibyte character sets are no longer supported by SAP (Note 858869). However, a Unicode character set can be used as an alternative.

23. From where does the NLS_LANG value come in the Registry under Oracle (WINDOWS)?

As part of the Oracle installation or upgrade, NLS_LANG in the Registry under

HKEY_LOCAL_MACHINE -> Software -> Oracle -> Home

is set to a value that matches the settings (language, code page) of the WINDOWS server (for example, GERMAN_GERMANY.WE8MSWIN1252 or AMERICAN_AMERICA.WE8ISO8859P1). Since this value is overwritten in the environment by the required NLS_LANG entry, it is ineffective and therefore not critical. For a better overview, however, it should be adjusted to the correct NLS_LANG value as it appears in the environment.

24. Where can I find more information on NLS and character sets?

Refer to the "Globalization Support guidelines" (contained in the Oracle online documentation) for extensive information on this topic.

