FAQ: Oracle I/O configuration
1. What does I/O configuration mean?
I/O configuration comprises the following components:
Layout of the Oracle database files (data files, redo logs, control files) on the I/O subsystem
Configuration of the I/O subsystem
Operating system parameter settings to control the I/O behavior
Oracle parameter settings to control the I/O behavior
2. What are the advantages associated with a good I/O configuration?
If you have a good I/O configuration, you can largely avoid bottlenecks at operating system level or at hardware level and it makes optimal use of existing resources. This results in improved database performance and a reduction in the average database response times.
3. What are the general recommendations in relation to Oracle's hard disk layout?
When you set up a database, note the following points in relation to the hard disk layout:
Redo logs and data files should be in different disk areas.
The two redo log copies of a group should be in different disk areas.
Two consecutive online redo logs should not be in the same disk area.
You should 'stripe' data across as many disks as possible, to distribute the load evenly (for example, all Oracle data files in one large stripe set across many disks).
Since the online redo logs are the most critical for database performance, make sure that these are on the fast disks and that no other change-intensive data areas are stored on the same disks. For online redo logs, you should also avoid using functions that have a negative effect on performance such as classic RAID
5. For high-end systems, we recommend that you use RAW devices, and hardware mirroring instead of software mirroring (see Note 888626).
The rollback tablespace data files or the undo tablespace data files are just as change-intensive, and in many cases the temporary tablespace files are also change-intensive. Therefore, you should also make sure that these tablespace files are distributed across fast disks and that areas of the disk are used intensively.
4. Which operating system mechanisms play a role in I/O?
The following mechanisms makes the operating system available for I/O:
File system cache: The file system cache acts as a buffer between the I/O subsystem and the application. Any blocks that are read or written are stored in the file system cache, which means that subsequent operations find the data in the cache and do not have to read the data from the I/O subsystem.
File lock: When you change a block in a file, operating systems set an exclusive lock on the file, which means that you cannot make a parallel change to the same data. On UNIX, this lock is also called an inode lock.
Cache synchronization: Dirty blocks are written from the file system cache to the hard disk at regular intervals. On UNIX, the sync daemon performs this task.
5. What are the different I/O methods?
The following types of I/O exist:
Synchronous I/O: A process must first wait for the result of an I/O request that it sent before it can continue. This means that a process cannot send more than one I/O request at the same time.
Asynchronous I/O: A process does not have to wait for the I/O request that it sent to be confirmed, but rather it can continue processing the program logic. This means that several asynchronous I/O requests can be active at the same time.
- Advantage: better performance than with synchronous I/O
Cached I/O: The file system cache is active for caching data between the application and the I/O subsystem.
- Advantage: In nondatabase environments, the file system cache allows faster access to the blocks that are frequently required.
Direct I/O: Data is read directly from the disk with no interim buffering at operating system level.
- Advantage: In the Oracle environment, this prevents blocks from being unnecessarily buffered twice (once in the file system cache and once in the Oracle buffer pool). This means that more memory is available for other tasks. In addition, the absence of interim buffering has a positive impact on performance.
Caution: In systems with a relatively small buffer pool and a large amount of memory area used by the file system cache, the activation of direct I/O can lead to a significant deterioration in performance, since the file system cache ceases to exist as an interim buffer. In this case, you should increase the Oracle buffer pool significantly so that released memory can still be used meaningfully.
Concurrent I/O: With this special implementation of direct I/O, the operating system dispenses with the usually necessary locking mechanism (inode locking) in order to avoid parallel changes to the same data. This is possible in the Oracle environment, since the database implementation uses its locking mechanisms to exclude parallel changes to the same data.
- Advantage: As a result of dispensing with inode locking, several processes can change data in the same file at the same time, which may considerably improve write performance.
Raw I/O: The operating system layer (file system cache, inode lock, sync daemon) is bypassed completely and the data is retrieved directly from raw devices and stored on raw devices.
- Advantage: Fastest access type by completely dispensing with operating system mechanisms
- Disadvantage: Increased administration work
6. Which I/O methods are better in terms of performance?
The following generally applies:
The best I/O performance can be achieved using raw devices.
The second best performance can be achieved using concurrent I/O.
Direct I/O follows in third place.
Async I/O is generally independent of other I/O types and should be activated whenever possible.
7. Which I/O methods are available on individual operating systems?
Notes 834343 (9i) and 999524 (10g) contain an overview of the options available in the individual environments.
8. How can I activate operating-system-dependent async I/O?
Async I/O can be used on most operating systems, without additional tasks.
For LINUX, refer to Oracle Metalink Notes 225751.1 (activation) and 279069.1 (compatibility matrix).
For HP-UX, refer to Metalink Note 139272.1.
9. Which Oracle parameters affect the I/O method?
The following parameters affect the I/O type:
DISK_ASYNCH_IO: If this parameter is set to TRUE and if the operating system provides asynchronous I/O, asynchronous I/O is used. Otherwise, synchronous I/O is used. The parameter is set to TRUE by default and should only be set to FALSE if you suspect that the asynchronous I/O implementation on the operating system is causing problems.
FILESYSTEMIO_OPTIONS (Oracle 9i or higher): This parameter overrides DISK_ASYNCH_IO in terms of file systems, but it has no influence on the access to raw devices. The following values are possible:
- NONE: no explicit activation of direct I/O or async I/O
- DIRECTIO: direct I/O
- ASYNCH: asynchronous I/O
- SETALL: Asynchronous direct I/O (incl. concurrent I/O, if possible)
10. How do I know if there are problems in the I/O area of an existing system?
Increased values for Oracle I/O wait events are significant indicators of problems in the I/O area:
Increased times for "db file sequential read" and "db file scattered read" indicate problems when the hard disk reads blocks.
Increased values for "log file sync" and "log buffer space" indicate problems with writing to the online redo logs.
The occurrence of the event "log file switch (checkpoint incomplete)" indicates problems with writing to the datafiles.
11. What should I do if there are problems in the I/O area?
Contact your operating system partner and hardware partner if there are problems in the I/O area, since the cause of problems of this type is usually to be found in the operating system or hardware area, and the partners have the relevant tools and experience to make a more detailed analysis.SAP also provides the optimization service "SMO Storage Subsystem Optimization Service" in collaboration with the relevant partners. Additional information about this service is available at http://service.sap.com/smo.
Otherwise, SAP and Oracle can only provide rough general recommendations:
Check to what extent the system concerned corresponds to the standard recommendations listed above.
Check to what extent the operating system configuration (for example, the async I/O maxserver on AIX) corresponds to the recommendations from the OS Dependencies Guide.
Check whether there are external tools that you can use to improve I/O peformance (for example, "VERITAS Storage Foundation" with Oracle Disk Manager).
If the RAID stripe size is too small (for example, 8 KB), this may sometimes have negative effects on the I/O performance because more I/O requests are required for the same volume of data. Therefore, check with your hardware supplier to see whether it is possible and useful to increase the stripe size.
Recommendations for AIX:
- For information about the optimal configuration of the operating system memory areas, see Notes 78498, 822896 and 902952.
- On AIX, the Oracle parameter FILESYSTEMIO_OPTIONS is partly delivered with the default setting NONE. Make sure that this parameter is set to ASYNCH or SETALL - depending on IBM's recommendation - (see Note 694810).
- Note 973227 contains a list of AIX recommendations for AIX 5.2 or higher.
- You should regularly check whether the number of configured AIO server processes (the maxservers parameter) is sufficient. You can use the following command to determine how many processes are currently running:
pstat -a | grep aio | wc -l
If the product "maxservers * CPUs" has already been reached, you should increase the size of the maxservers parameter.
Recommendations for HP-UX:
- By default, HP file systems do not support asynchronous I/O. However, you can access solutions such as Veritas Quick I/O, which enables async I/O on HP-UX.
- See also Note 1077887.
Recommendations for SOLARIS:
- Note 892631 explains the mount options that should be used for optimal database performance.
- If you are using DISM (that is, the activated Oracle parameters DB_CACHE_SIZE and SGA_MAX_SIZE), make sure that the SOLARIS prerequisites have been met. Otherwise, increased I/O times and an increased SYSTEM-CPU consumption are possible. For more information, see the corresponding section in Note 712624. If in doubt, deactivate the DB_CACHE_SIZE in favor of DB_BLOCK_BUFFERS.
Recommendations for WINDOWS:
- On WINDOWS platforms, check the disk alignment (see Note 886337).
To determine whether Oracle carries out the correct I/O operations and whether there are individual operating system commands with an increased runtime, you can create an operating system call trace with operating-system-dependent tools such as TRACE, TRUSS or TUSC. For example, the following extract, from an AIX trace ("truss -D -p <pid>") of the DBWR process, tells you which I/O commands are used, and that the time is mainly lost on fsync (0.5346 seconds until the next command):
0.0003: times(0x0FFFFFFFFFFFABC0) = 1704161
0.0004: kfcntl(22, F_SETFD, 0x0000000000000001) = 0
0.0004: listio64(0x0000000010000004, ...)
0.0004: aio_nwait(0x0000000000001000, ...)
0.0006: fsync(22) = 0
0.5346: close(22) = 0
0.0007: times(0x0FFFFFFFFFFFAD20) = 1704215
This information should enable the operating system partner to draw more conclusions.
To minimize the problems with inode locking, consider reducing the data files concerned. The smaller the files, the less likely it is that several processes will want exclusive access to the files at the same time.
If a DBWR process cannot process the changes in question fast enough, despite the system having a good I/O configuration, several DBWR processes can be defined using the DB_WRITER_PROCESSES Oracle parameter (for example, 2 or 4).
Changing raw devices for the critical parts of the database is the last option for optimizing the I/O behavior. However, the price you pay for the better performance provided by raw devices is a more difficult and error-prone maintainability.
12. Which errors indicate problems with async I/O?
Known errors are described in Notes 798194 (HP-UX) and 683396/210385 (AIX).
If, on AIX, there are trace files with the entries
Warning: lio_listio returned EAGAIN
Performance degradation may be seen.
the async I/O configuration is not optimal. In this case, check the settings for the AIX parameters such as maxservers or maxreqs. Contact your operating system partner if necessary.
13. How can I reduce the I/O quantity on Oracle?
Depending on the I/O type, the following optimization options exist:
Write I/O in the online redo logs
- The written volume of data is proportional to the number of changes to the database. The less data that is changed, the smaller the write load on the online redo logs.
- Make sure that the data files for online backups are set, so that they are as short as possible in backup mode since, during this time, complete 8K blocks are always written to the redo log when changes are being made, and not just the changes themselves.
- The I/O-intensive database operations described in Note 806554 are also responsible for placing a large load on the redo logs. If necessary, you can use NOLOGGING to prevent the system from writing changes to the redo log.
Read I/O from the data files
- Disk reads may be reduced by tuning processing-intensive SQL statements (Note 766349) or by increasing the size of the Oracle buffer pool.
- To reduce the read I/O due to LOB accesses, cache the LOBs (Note 563359).
- To reduce the read I/O with regard to the temporary tablespace, tune processing-intensive sortings and problematic execution paths with hash operations and bitmap operations.
Write I/O to the data files
- Use online redo logs that are sufficiently large to ensure that at least one minute or more passes between two log switches. Otherwise, the numerous log switches and the checkpoints that go with them cause an unnecessary write load on the data files.
- Use the Oracle parameters FAST_START_MTTR_TARGET or FAST_START_IO_TARGET to avoid configuring incremental checkpoints. Use the following query to check which overhead is triggered by incremental checkpoints.
S1.VALUE "PHYSICAL WRITES",
S2.VALUE "PHYSICAL WRITES NON CHECKPOINT",
ROUND(S1.VALUE / S2.VALUE * 100 - 100) "OVERHEAD (%)"
S1.NAME = 'physical writes' AND
S2.NAME = 'physical writes non checkpoint';
The parameters LOG_CHECKPOINT_TIMEOUT and LOG_CHECKPOINT_INTERVAL can also trigger incremental checkpoints. If you are in doubt, set both of these parameters to 0. However, you should not have to change these parameters.
- To reduce the write I/O with regard to the temporary tablespace, tune processing-intensive sortings and problematic execution paths with hash operations and bitmap operations.