Unicode Related Technical Interview Questions and FAQ

Unicode - Technical FAQs

Question 1: What is Unicode?

Unicode is an international standard that assigns characters from virtually every language and script a unique Unicode Scalar Value, which is a number written in hexadecimal notation. As a notational convention "U+" is prefixed to the Scalar Value. For example, the character A has the Unicode Scalar Value U+0041 and Ä is U+00C4. In addition, every character has a unique character name (although the CJK ideographs do not always have a character name listed).
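As a small illustration, the scalar values and character names quoted above can be looked up with Python's standard unicodedata module. This is only a sketch and not part of any SAP tooling; the helper name describe is made up for this example.

```python
import unicodedata


def describe(ch):
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe("A"))  # U+0041 LATIN CAPITAL LETTER A
print(describe("Ä"))  # U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
```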

Unicode contains all of the characters used in the following scripts: Latin; Greek; Cyrillic; Armenian; Hebrew; Arabic; Syriac; Thaana; Devanagari; Bengali; Gurmukhi; Oriya; Tamil; Telugu; Kannada; Malayalam; Sinhala; Thai; Lao; Tibetan; Myanmar; Georgian; Hangul; Ethiopic; Cherokee; Canadian-Aboriginal Syllabics; Ogham; Runic; Khmer; Mongolian; Han (Japanese, Chinese, Korean ideographs); Hiragana; Katakana; Bopomofo and Yi.

Question 2: What are the advantages of Unicode?

Unicode provides the solution to the problem of multiple, possibly incompatible code pages:

Unicode currently defines over 100,000 characters, with room for over 1 million characters.
Unicode defines each character only once.
Unicode can be used for the system code page, front end, and printing.
In a Unicode SAP system you can display and maintain character data from any language with any logon language. For example, you can log on to your system in Japanese and maintain Russian data.
The size and scope of Unicode have made it the default character encoding for Internet communication and for technologies such as XML, Java, and HTML, because Internet communication has to function no matter what the platform, the program, or the language.
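The "room for over 1 million characters" figure can be checked with a little arithmetic. The sketch below assumes the code space layout from the Unicode Standard (17 planes of 65,536 code points, with the surrogate range reserved for UTF-16), which this FAQ does not spell out.

```python
import sys

PLANES = 17                         # planes 0 through 16
CODE_POINTS_PER_PLANE = 0x10000     # 65,536 code points per plane
SURROGATES = 0xDFFF - 0xD800 + 1    # 2,048 code points reserved for UTF-16

code_points = PLANES * CODE_POINTS_PER_PLANE
scalar_values = code_points - SURROGATES

print(code_points)    # 1114112
print(scalar_values)  # 1112064

# Python's own upper limit for chr() is the last code point, U+10FFFF.
assert sys.maxunicode == code_points - 1 == 0x10FFFF
```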

Question 3: Where can I find an overview of Unicode support at SAP?

https://service.sap.com/unicode
http://scn.sap.com/community/internationalization-and-unicode

Question 4: What scripts does Unicode support?

Unicode attempts to support all of the scripts used in the world. Where more than one language shares a set of symbols that have a historically related derivation, the union of the set of symbols of each such language is unified into a single collection identified as a single script. These scripts then serve as inventories of symbols which are drawn upon to write particular languages. In many cases, a single script may serve to write tens or even hundreds of languages. The Latin script, for example, is used by English, German and Vietnamese; the Arabic script is used by Arabic, Farsi and Urdu. In other cases only one language employs a particular script (e.g., Hangul, which is used only for the Korean language). The writing systems for some languages may also make use of more than one script; for example, Japanese traditionally makes use of the Han (Kanji), Hiragana, and Katakana scripts, and modern Japanese usage commonly mixes in the Latin script as well.

Question 5: If all Unicode characters are 2-bytes long, doesn't that double all hardware requirements?

Not all Unicode characters are 2 bytes long. The Unicode encoding determines the length of a character. A character in one of the Unicode encodings can be longer than 1 byte, and therefore Unicode characters can be longer than characters defined in other standard code pages. This can lead to larger hardware demands.

The additional database space requirements depend on the script and the Unicode encoding you use. The following table shows the length in bytes of A and Ä in four different code pages:

(i) 1100, the SAP code page corresponding to ISO8859-1

(ii) 8000, the SAP code page corresponding to SJIS

(iii) UTF-8

(iv) UTF-16.

In 1100 and 8000, as well as in CESU-8/UTF-8, all 7-bit ASCII characters are one byte long, so they take the same space in non-Unicode and Unicode systems. Other characters from single-byte code pages, for example Ä, are however two bytes long in UTF-8.

     1100   8000   UTF-8   UTF-16

A    1      1      1       2

Ä    1      -      2       2

(- = the character is not available in this code page)

If the database contains only characters from a single-byte code page, then the length of all characters can double if the encoding UTF-16 is used on the database. In all other cases, the increase will depend on the encoding and the code pages used.
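The byte lengths discussed above can be reproduced outside an SAP system. The following sketch uses Python's built-in codecs as stand-ins for the SAP code pages; the mapping of 1100 to iso8859_1 and 8000 to shift_jis is an assumption made for illustration, and byte_length is a made-up helper name.

```python
# Python codec names standing in for the SAP code pages (an assumption).
CODECS = {
    "1100 (ISO8859-1)": "iso8859_1",
    "8000 (SJIS)": "shift_jis",
    "UTF-8": "utf_8",
    "UTF-16": "utf_16_le",  # explicit byte order, so no byte order mark is counted
}


def byte_length(ch, codec):
    """Bytes needed for ch in codec, or None if the codec cannot encode it."""
    try:
        return len(ch.encode(codec))
    except UnicodeEncodeError:
        return None


for ch in ("A", "Ä"):
    lengths = {label: byte_length(ch, codec) for label, codec in CODECS.items()}
    print(ch, lengths)
```

A result of None means the character does not exist in that code page, so its data could not be stored there at all.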

Question 6: Why is there more than one Unicode encoding?

ISO/IEC 10646 and the Unicode Consortium define the character set that is supported in Unicode. Various encoding methods are defined for the current set of supported characters and scripts. There are 8-bit, 16-bit and 32-bit encodings for Unicode characters.

UTF-8: Unicode Transformation Format based on 8-bit representation.

CESU-8: Compatibility Encoding Scheme of UTF-16 on an 8-bit base.

UTF-16: Unicode Transformation Format based on 16-bit representation.

UTF-32: Unicode Transformation Format based on 32-bit representation.

UCS-2: Universal Character Set, 2-byte variation.

UCS-4: Universal Character Set, 4-byte variation.

Each encoding offers advantages and disadvantages. The 8-bit encodings are well-suited for data transfer, because all 7-bit US-ASCII characters retain the same code points, and this makes communication with legacy, non-Unicode systems easier. The downside is variable character length. In the 32-bit encodings, UTF-32/UCS-4, all characters have a fixed length; this advantage is currently outweighed by the extensive memory requirements that result. The 16-bit encodings offer a compromise, because they do not require as much memory as UTF-32, but offer quasi-fixed character length. UCS-2 has a fixed character length, but it cannot define more than 65,536 characters (2^16); UTF-16 on the other hand can access all of the characters in Version 4.0 of the Unicode Standard by using the Surrogate Area. Both UTF-32 and UTF-16 are byte order, or "endian", dependent.

Each encoding uses a different base length, and the length of a character in a Unicode encoding can be either variable or fixed.

UTF-8, CESU-8 and UTF-16 contain the same character set and therefore no characters can get "lost" during conversion from one encoding to the other.
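To make the trade-offs concrete, the sketch below counts how many code units a few sample characters need in each encoding form and shows the byte order dependency of UTF-16. It relies only on Python's standard codecs; the sample characters and the helper name code_units are chosen for illustration.

```python
def code_units(text, codec, unit_bytes):
    """Number of code units of the given width needed to encode text."""
    return len(text.encode(codec)) // unit_bytes


samples = [
    "A",           # U+0041, 7-bit ASCII
    "\u00C4",      # U+00C4, Latin capital A with diaeresis
    "\u3042",      # U+3042, Hiragana letter A
    "\U00020000",  # U+20000, a Han ideograph outside the 16-bit range
]

for ch in samples:
    print(
        f"U+{ord(ch):04X}",
        "UTF-8 bytes:", code_units(ch, "utf_8", 1),
        "UTF-16 units:", code_units(ch, "utf_16_le", 2),
        "UTF-32 units:", code_units(ch, "utf_32_le", 4),
    )

# Byte order matters for the 16-bit and 32-bit forms, not for UTF-8.
print("A".encode("utf_16_le"))  # b'A\x00'
print("A".encode("utf_16_be"))  # b'\x00A'
```

UTF-8 varies between one and four bytes, UTF-16 needs two units only for the last sample (a surrogate pair), and UTF-32 always needs exactly one unit.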

Question 7: If I still have to convert between encodings, what is the advantage of Unicode?

UTF-8, CESU-8 and UTF-16 are simply different ways of encoding Unicode characters and they contain the same character set. In other words, all Unicode characters are in all of these encodings and therefore no characters can get "lost" during conversion from one encoding to the other. The conversion between these encodings is done algorithmically and is therefore fast.
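A round trip through each encoding shows that nothing is lost. Python ships codecs for UTF-8 and UTF-16 but not for CESU-8, so the sketch below hand-rolls a CESU-8 encoder and decoder (cesu8_encode and cesu8_decode are illustrative names, not library functions) by encoding each UTF-16 code unit separately as an 8-bit sequence.

```python
import struct


def cesu8_encode(text):
    """Encode each UTF-16 code unit as if it were a code point in UTF-8."""
    utf16 = text.encode("utf_16_be")
    units = struct.unpack(f">{len(utf16) // 2}H", utf16)
    # 'surrogatepass' lets the UTF-8 codec emit the 3-byte form of a surrogate.
    return b"".join(chr(u).encode("utf_8", "surrogatepass") for u in units)


def cesu8_decode(data):
    """Reverse cesu8_encode by rejoining surrogate pairs via UTF-16."""
    halves = data.decode("utf_8", "surrogatepass")
    return halves.encode("utf_16_be", "surrogatepass").decode("utf_16_be")


text = "A\u00C4\u3042\U00020000"

via_utf8 = text.encode("utf_8").decode("utf_8")
via_utf16 = text.encode("utf_16_le").decode("utf_16_le")
via_cesu8 = cesu8_decode(cesu8_encode(text))
assert via_utf8 == via_utf16 == via_cesu8 == text

# The two 8-bit forms differ only for characters outside the 16-bit range.
print(len(text.encode("utf_8")), len(cesu8_encode(text)))  # 10 12
```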

Question 8: Is it true that I have to convert to Unicode when upgrading to SAP ECC 6.0?

If you upgrade MDMP or Blended Code Page systems to SAP ECC 6.0, you must convert them to Unicode first or as part of the Upgrade procedure. Please read the FAQ for Upgrade to SAP ERP 6.0 for more information.

New SAP NetWeaver Releases (i.e. releases following SAP NetWeaver 7.40) and products based on it will be Unicode only. Upgrades of non-Unicode systems to releases higher than SAP NetWeaver 7.40 without prior Unicode conversion will not be supported.

Question 9: Which Unicode Standard does SAP support?

SAP supports all Unicode encodings. Of these, SAP uses UTF-16 on the application server and either UTF-8, CESU-8, or UTF-16 on the database.

UTF-16 includes support for surrogates, that is, characters represented by a pair of code points where the first code point is located within the hex interval [0xD800, 0xDBFF] and the second code point is located within the interval [0xDC00, 0xDFFF]. The conversion occurs algorithmically; see Question 7, "If I still have to convert between encodings, what is the advantage of Unicode?"
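The surrogate mechanism is plain arithmetic on the two intervals given above. The following sketch follows the standard UTF-16 rule (subtract 0x10000, then split the remaining 20 bits into two halves of 10 bits); the function names are illustrative.

```python
def to_surrogates(code_point):
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use surrogates")
    offset = code_point - 0x10000  # 20 significant bits remain
    high = 0xD800 + (offset >> 10)    # top 10 bits, in [0xD800, 0xDBFF]
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits, in [0xDC00, 0xDFFF]
    return high, low


def from_surrogates(high, low):
    """Recombine a surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x20000)
print(f"{high:04X} {low:04X}")            # D840 DC00
print(f"{from_surrogates(high, low):X}")  # 20000
```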

Question 10: Which languages does SAP support?

SAP can technically support all 2-character language keys in the ISO-639 standard. Technical support means that these languages can be used as language keys. It is possible to fill the language data with English menus etc. when no translation exists. Note that all data is encoded as Unicode characters, and all Unicode characters can be used in a Unicode system, regardless of the language key. You can find a detailed list of all supported languages and code pages in SAP Note 73606.

Question 11: On which platforms does SAP support Unicode systems?

Please see SAP Note 379940.

Question 12: Which SAP Business Suite components are available as a Unicode-based version?

All ABAP-based SAP solutions in standard maintenance are supported.

For older solutions:

SAP CRM 4.0: The Unicode version of SAP CRM 4.0 is available within unrestricted shipment.

SAP SCM 4.0: The Unicode version of SAP SCM 4.X is available within unrestricted shipment.

SAP SEM 4.0: The Unicode version of SAP SEM 4.X is available within unrestricted shipment.

SAP SRM 4.0: The Unicode version of SAP SRM 4.0 is available within unrestricted shipment. See also SAP Note 819426.

SAP R/3 Enterprise 4.70 Extension Set 2.00: The Unicode version of SAP R/3 Enterprise 4.70 Ext. 2.00 is available within unrestricted shipment.

Please note that once a solution is released for Unicode, all successor releases will be available in Unicode (for example all ERP releases).

Question 13: Which SAP NetWeaver components are available as a Unicode-based version?

All ABAP-based SAP solutions in standard maintenance are supported.

For older solutions:

SAP BW 3.5: The Unicode version of SAP BW 3.5 is available with SAP NetWeaver within unrestricted shipment. Read SAP Note 588480 for release-specific restrictions.

SAP XI 2.0: SAP Exchange Infrastructure 2.0 is available within unrestricted shipment. Note that SAP XI 2.0 is only available as a Unicode version!

SAP EP 6.0: SAP Enterprise Portal 6.0 is available within unrestricted shipment. Note that SAP EP 6.0 is only available as a Unicode version!

SAP KW 7.0: SAP Knowledge Warehouse 7.0 is available within unrestricted shipment.

Question 14: How can I convert to Unicode?

If your non-Unicode system is on a Unicode-enabled release (SAP_BASIS 6.20 onwards) and you do not plan to upgrade to a higher release, you can use the standard Unicode conversion path. This applies to all code page configurations (Single Code Page, Blended Code Page, MDMP).

Please read the Unicode Conversion Guide, which is the official documentation for this method. You can find the Unicode Conversion Guides as attachments to SAP Notes 551344 and 1051576.

If your non-Unicode system is on release SAP_BASIS 4.6C or higher and you plan to upgrade to SAP ERP 6.0, you can use the Combined Upgrade & Unicode Conversion path. This applies to all code page configurations (Single Code Page, Blended Code Page, MDMP).

Please read the Combined Upgrade & Unicode Conversion Guide, which is the official documentation for this method. This guide and further information are available in SAP Note 928729.

If your non-Unicode system is on SAP ERP 6.0 EhPx and you plan to upgrade to SAP ERP 6.0 EhPy in combination with a Unicode conversion, then the CU&UC process is not supported (see SAP Note 928729). However, a sequential approach (upgrade and then Unicode conversion) should be possible in most cases during one downtime.

Question 15: What documentation is required for a Unicode Conversion?

It depends on the Conversion Path you choose (conversion with or without upgrade), on the code page configuration (MDMP, Blended Code Page or Single Code Page), on the SAP_BASIS release, on the source release (if you decide to do a combined upgrade and conversion to Unicode), and on the platform, the DB settings, and some other factors.

For a Unicode Conversion without Upgrade you need:

A version of the Unicode Conversion Guide - choose the applicable version according to your code page configuration, and your SAP_BASIS release/support package.

A Homogeneous and Heterogeneous System Copy Guide - choose the applicable version according to your SAP_BASIS release/SR.

An Installation Guide for the installation of the target system after the System Copy - choose the applicable version according to the System Copy Guide.

If you decided to do a Combined Upgrade & Unicode Conversion (CU&UC) project, you need:

The CU&UC Guide - choose the applicable version according to your SAP_BASIS source release. Note: There is no Single Code Page Conversion Guide for CU&UC.

An Upgrade Guide - choose the applicable version according to your SAP_BASIS target release.

A Homogeneous and Heterogeneous System Copy Guide - choose the applicable version according to your SAP_BASIS release/SR.

An Installation Guide for the installation of the target system after the System Copy - choose the applicable version according to the System Copy Guide.

In addition, there are a number of SAP Notes to be taken into account. They are all mentioned in the documentation wherever required, as is the Unicode Conversion Troubleshooting Guide.

Question 16: Where can I find the documentation?

Unicode Conversion Guide: SAP Notes 551344 and 1051576.

CU&UC Guide: SAP Note 928729.

System Copy and Installation Guide: SAP Service Marketplace Quick Link /installnw70.

Upgrade Guide: SAP Service Marketplace Quick Link /upgradenw70.

Unicode Conversion Troubleshooting Guide: SAP Note 765475.

Question 17: System Downtime is a critical issue for me. Are there any optimization methods/tools?

Yes, there are a number of methods and tools for downtime estimation and optimization. You can find a general overview in the SAP Developer Network and in SAP Note 857081.

Question 18: Can I store Unicode data in a non-Unicode System?

It is not possible to store Unicode data in a non-Unicode system. Therefore it is not possible to store Java data in the database without first converting the data to the code page(s) in the non-Unicode system.

Question 19: What happens to my archived data during a Unicode conversion?

Data which have been archived in a non-Unicode system are not touched during the conversion. They are converted later, when they are read in the Unicode system. When you read archived data in the Unicode system, an automatic conversion is performed based on the code page which is saved in the header information of the file. For additional information about reading data which have been archived in MDMP systems, see SAP Note 449918.

Question 20: Does SAP warrant data consistency during and after a Unicode conversion? Is there an official statement on SOX-compliance?

Please read SAP statement: Data consistency in a Unicode Conversion for more information.

Link: http://service.sap.com/~form/sapnet?_SHORTKEY=01100035870000380759&_OBJECT=011000358700000748442007D

Question 21: Will my ABAP programs work in a Unicode system?

Most programs should work without any modification, but you need to ensure that all programs comply with the stricter ABAP 6.10 syntax and semantics, which improve program efficiency and enable Unicode support. Note that all programs must be 6.10 compliant to run in a Unicode system, and 6.10 compliant programs will also run in a non-Unicode system. In a non-Unicode system, programs do not have to be 6.10 compliant.

To check your programs, use the transaction UCCHECK to determine whether they are ABAP 6.10 compliant. In addition, programs should be tested to catch non-static errors that appear at run time. Use the transaction SCOV to monitor the testing. See the Media Library for more information, as well as the ABAP documentation.

Question 22: How do I find and enter Unicode characters?

When you log on to a Unicode system, the correct GUI settings ensure that you are entering Unicode characters. How to configure SAP GUI for Windows is described in the I18N User's Guide. See SAP Note 508854 for more information. If you have a system with different release levels, see SAP Note 195490 for information about how to set the front end code page in older releases.

There are several ways to find a Unicode character:

With an MS Front End

Windows 7: Start button > All Programs > Accessories > System Tools > Character Map. Select character set Unicode, and then enter/find the character you want. Place the cursor over the character, and the Unicode name and number will be displayed.

If you know the code point for the character in another code page, you can determine the Unicode ID in the transaction SCP. For example, you know that a character has the byte value 0xA9 in the code page ISO8859-2 (SAP Code Page 1401) and want to know the Unicode Scalar Value:

Selection: "Code page = 1401" and Byte sequence = "A9"

Show or Test: Checkbox: Display as table. The table will show the Unicode ID: U+0160

If you know a portion of the name, you can find a character within SCP. Place the cursor in the field character and select F4. Enter the portion of the name you know and add a wildcard * before and after. For example, if you are looking for a character that is an A "with a little tail", enter *CAPITAL LETTER*A*WITH* and then press Return.

Look at the character charts on the Unicode.org homepage.
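Outside an SAP system, the two SCP lookups described above can be approximated with Python's standard library. This sketch is not a replacement for transaction SCP: it assumes that Python's iso8859_2 codec corresponds to SAP code page 1401, and the function names and the search limit are chosen for illustration.

```python
import fnmatch
import unicodedata


def scalar_value(byte_value, codec):
    """Return the Unicode ID of a single byte interpreted in the given codec."""
    ch = bytes([byte_value]).decode(codec)
    return f"U+{ord(ch):04X}"


def find_by_name(pattern, limit=0x3000):
    """Yield (Unicode ID, name) for code points below limit matching pattern."""
    for cp in range(limit):
        name = unicodedata.name(chr(cp), "")  # unnamed code points give ""
        if name and fnmatch.fnmatchcase(name, pattern):
            yield f"U+{cp:04X}", name


print(scalar_value(0xA9, "iso8859_2"))  # U+0160

for unicode_id, name in find_by_name("*CAPITAL LETTER*A*WITH*"):
    print(unicode_id, name)
```

The wildcard search returns many accented capitals; the A "with a little tail" appears in the list as U+0104 LATIN CAPITAL LETTER A WITH OGONEK.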

Question 23: How do I print Unicode characters?

You can use all device types that SAP supports. Several device types also support UTF-8 natively, but generally only for a subset of all Unicode characters (for example only SJIS characters). Use a Lexmark or HP printer with Unicode support to print together characters that belong to one or more non-Unicode code pages, or to print all of the characters that are defined in Unicode.

Question 24: Which fonts can I use?

Currently, there is no single font that covers all Unicode characters. For the best appearance, use a font that corresponds to the character set you want to enter. If characters are not available in the font you select, the system automatically selects another font to display the remaining characters. Therefore, regardless of the font you select, all characters will appear correctly on the screen. Due to font substitution, some characters may not be as attractive as others, however.

As a solution, the Unicode SAP team developed the Cascading Font Configurator (CFC), a tool for printing Unicode text which normally requires different fonts in order to be displayed properly (for example German and Chinese characters in one text string). With the CFC you get a predefined set of font-to-script mappings. This mechanism (which is customizable) enables your Unicode SAP systems to switch automatically between the most useful fonts. The Cascading Font Configurator is available with:

Web AS 6.20 Support Package 57
Web AS 6.40 Support Package 15
Web AS 7.00 Support Package 06.

Question 25: Can a Unicode system communicate with a non-Unicode system?

RFC communication between a Unicode and a non-Unicode system requires a code page conversion between Unicode and the code page used by the RFC communication partner. TABLES parameters of an RFC involving an MDMP system are evaluated line by line to ensure that the correct conversion takes place; see SAP Note 547444 for details. If a character cannot be converted, the hash mark # (U+0023) is the default replacement character.
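The replacement behaviour can be imitated with Python's codec error handlers. The sketch below is not the SAP RFC layer; it only shows, under that caveat, what converting Unicode text to a single code page looks like when every unconvertible character becomes a hash mark. The handler name "hashmark" and the function to_code_page are made up for this example.

```python
import codecs


def _hash_mark(error):
    """Replace each unconvertible character with '#' (U+0023)."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    # One '#' per character in the failing run; resume after the run.
    return "#" * (error.end - error.start), error.end


# Register the handler under a hypothetical name for use with str.encode().
codecs.register_error("hashmark", _hash_mark)


def to_code_page(text, codec):
    """Convert Unicode text to a code page, replacing unknown characters."""
    return text.encode(codec, errors="hashmark")


# Latin-1 can represent the German word but not the Cyrillic one.
print(to_code_page("Ärger in Москва", "iso8859_1"))  # b'\xc4rger in ######'
```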

All conversion takes place in the Unicode system, and non-Unicode systems do not need to be modified. A non-Unicode system acting as RFC client passes the list of languages and corresponding code pages to the Unicode server system at connect time. When the Unicode system is RFC client it uses a list of languages and corresponding code pages stored in the RFC destination pointing to the server. For RFC communication with external, non-SAP, partners see the RFC documentation for more information. Go to SAP Service Marketplace -> Quick Link /rfc-library. Select Media Library -> RFC Library Guide.

Transports between Unicode and non-Unicode SAP systems are technically supported; however, they are subject to certain restrictions. See SAP Note 638357 for detailed information.