Oracle DBA Interview Questions: Optimizing Performance of Direct Path Loads

Oracle 10 DBA Interview questions and answers

1. Is the following SQL statement syntactically correct? If not, please rewrite it correctly.

SELECT col1 FROM tableA WHERE NOT IN (SELECT col1 FROM tableB);

Ans. SQL is incorrect.

Correct SQL : SELECT col1 FROM tableA WHERE col1 NOT IN (SELECT col1 FROM tableB);

2. What is a more efficient way to write this query, to archive the same set?

Ans: SELECT col1 from tableA minus SELECT col1 from tableB

3.How would you determine that the new query is more efficient than the original query?

Ans: Run explain plan on both query and see the result .

4.How can we find the location of the database trace files from within the data dictionary?

Ans: Generally trace file on the database server machine is located in one of two locations:

1. If you are using a dedicated server connection, the trace file will be generated in the directory specified by

the USER_DUMP_DEST parameter.
2.If you are using a shared server connection, the trace file will be generated in the directory specified by the

BACKGROUND_DUMP_DEST parameter.

you can run sqlplus>SHOW PARAMETER DUMP_DEST
or
select name, value
from v$parameter
where name like '%dump_dest%'

5. What is the correct syntax for a UNIX endless WHILE loop?

while :
do
commands
done

6. Write the SQL statement that will return the name and size of the largest datafile in the database.

SQL> select name,bytes from v$datafile where bytes=(select max(bytes) from v$datafile);

7. What are the proper steps to changing the Oracle database block size?

cold backup all data files and backup controlfile to trace, recreate your database
with the new block size using the backup control file, and restore. It may be easier
with rman. You can not change datbase block size on fly.

8. Using awk, write a script to print the 3rd field of every line.

Ans:

awk '{print }'

awk '{print $3}

awk '{print $3}

9.Under what conditions, is a nested loop better than a merge join?

Ans:

Optimizer uses nested loop when we are joining tables containing small number of rows with an efficient driving

condition.
It is important to have an index on column of inner join table as this table is probed every time for a new value

from outer table.

Optimizer may not use nested loop in case:

1. No of rows of both the table is quite high
2. Inner query always results in same set of records
3. The access path of inner table is independent of data coming from outer table.

merge join is used to join two independent data sources. They perform better than nested loop when the volume of

data is big in tables
but not as good as hash joins in general.

10.Which database views would you use to ascertain the number of commits a user's session has performed?

Joining V$SESSTAT ,V$STATNAME

select * from V$SESSTAT a ,V$STATNAME b where b.CLASS=a.STATISTIC# and b.NAME='user commits' and a.sid=

11.What does the #!bin/ksh at the beginning of a shell script do? Why should it be there?

Ans: On the first line of an interpreter script, the "#!", is the name of a program which should be used to

interpret the contents of the file.
For instance, if the first line contains "#! /bin/ksh", then the contents of the file are executed as a korn shell

script.

12.What command is used to find the status of Oracle 10g Clusterware (CRS) and the various components it manages

(ONS, VIP, listener, instances, etc.)?

Ans:

$ocrcheck

13.Describe a scenario in which a vendor clusterware is required, in addition to the Oracle 10g Clusterware.

If you choose external redundancy for the OCR and voting disk, then to enable redundancy, the disk subsystem must be configurable for RAID mirroring/vendor clusterware. Otherwise, your system may be vulnerable because the OCR and voting disk are single points of failure.

14.How would you find the interconnect IP address from any node within an Oracle 10g RAC configuration?

using oifcfg command.

se the oifcfg -help command to display online help for OIFCFG. The elements of OIFCFG commands, some of which are

optional depending on the command, are:

*nodename—Name of the Oracle Clusterware node as listed in the output from the olsnodes command
*if_name—Name by which the interface is configured in the system
*subnet—Subnet address of the interface
*if_type—Type of interface: public or cluster_interconnect

You can use OIFCFG to list the interface names and the subnets of all of the interfaces available on the local node

by executing the iflist keyword as shown in this example:

oifcfg iflist
hme0 139.185.141.0
qfe0 204.152.65.16

You can also retrieve specific OIFCFG information with a getif command using the following syntax:
oifcfg getif [ [-global | -node nodename] [-if if_name[/subnet]] [-type if_type] ]

To store a new interface use the setif keyword. For example, to store the interface hme0, with the subnet

139.185.141.0, as a global interface (to be used as an interconnect for all of the RAC instances in your cluster),

you would use the command:

oifcfg setif -global hme0/139.185.141.0:cluster_interconnect

For a cluster interconnect that exists between only two nodes, for example rac1 and rac2, you could create the cms0

interface with the following commands, assuming 139.185.142.0 is the subnet addresses for the interconnect on rac1

and rac2 respectively:

oifcfg setif -global cms0/139.185.142.0:cluster_interconnect

Use the OIFCFG delif command to delete the stored configuration for global or node-specific interfaces. A specific

node-specific or global interface can be deleted by supplying the interface name, with an optional subnet, on the

command line. Without the -node or -global options, the delif keyword deletes either the given interface or all of

the global and node-specific interfaces on all of the nodes in the cluster. For example, the following command

deletes the global interface named qfe0 for the subnet 204.152.65.0:

oifcfg delif -global qfe0/204.152.65.0

On the other hand, the next command deletes all of the global interfaces stored with OIFCFG:

oifcfg delif -global

15.What is the Purpose of the voting disk in Oracle 10g Clusterware?

Voting disk record node membership information. Oracle Clusterware uses the voting disk to determine which instances are members of a cluster. The voting disk must reside on a shared disk. For high availability, Oracle recommends that you have a minimum of three voting disks. If you configure a single voting disk, then you should use external mirroring to provide redundancy. You can have up to 32 voting disks in your cluster.

16.What is the purpose of the OCR in Oracle 10g Clusterware?

Ans: Oracle Cluster Registry (OCR) is a component in 10g RAC used to store the cluster configuration information. It is a shared disk component, typically located in a shared raw volume that must be accessible to all nodes in the cluster.

The daemon OCSSd manages the configuration info in OCR and maintains the changes to cluster in the registry.

17. In Oracle Streams archived log downstream capture, which database view can be used to determine which archived

logs are no longer needed by the capture process?

Ans: V$ARCHIVE_DEST_STATUS

Sunday, September 7, 2008

Optimizing Performance of Direct Path Loads

Optimizing Performance of Direct Path Loads

You can control the time and temporary storage used during direct path loads.

To minimize time:

Preallocate storage space
Presort the data
Perform infrequent data saves
Minimize use of the redo log
Specify the number of column array rows and the size of the stream buffer
Specify a date cache value
To minimize space:

When sorting data before the load, sort data on the index that requires the most temporary storage space
Avoid index maintenance during the load
Preallocating Storage for Faster Loading

SQL*Loader automatically adds extents to the table if necessary, but this process takes time. For faster loads into a new table, allocate the required extents when the table is created.

To calculate the space required by a table, see the information about managing database files in the Oracle9i Database Administrator's Guide. Then use the INITIAL or MINEXTENTS clause in the SQL CREATE TABLE statement to allocate the required space.

Another approach is to size extents large enough so that extent allocation is infrequent.

Presorting Data for Faster Indexing

You can improve the performance of direct path loads by presorting your data on indexed columns. Presorting minimizes temporary storage requirements during the load. Presorting also allows you to take advantage of high-performance sorting routines that are optimized for your operating system or application.

If the data is presorted and the existing index is not empty, then presorting minimizes the amount of temporary segment space needed for the new keys. The sort routine appends each new key to the key list.

Instead of requiring extra space for sorting, only space for the keys is needed. To calculate the amount of storage needed, use a sort factor of 1.0 instead of 1.3. For more information on estimating storage requirements, see Temporary Segment Storage Requirements.

If presorting is specified and the existing index is empty, then maximum efficiency is achieved. The new keys are simply inserted into the index. Instead of having a temporary segment and new index existing simultaneously with the empty, old index, only the new index exists. So, temporary storage is not required, and time is saved.

SORTED INDEXES Clause

The SORTED INDEXES clause identifies the indexes on which the data is presorted. This clause is allowed only for direct path loads. See Case Study 6: Loading Data Using the Direct Path Load Method for an example.

Generally, you specify only one index in the SORTED INDEXES clause, because data that is sorted for one index is not usually in the right order for another index. When the data is in the same order for multiple indexes, however, all indexes can be specified at once.

All indexes listed in the SORTED INDEXES clause must be created before you start the direct path load.

Unsorted Data

If you specify an index in the SORTED INDEXES clause, and the data is not sorted for that index, then the index is left in an Index Unusable state at the end of the load. The data is present, but any attempt to use the index results in an error. Any index that is left in an Index Unusable state must be rebuilt after the load.

Multiple-Column Indexes

If you specify a multiple-column index in the SORTED INDEXES clause, the data should be sorted so that it is ordered first on the first column in the index, next on the second column in the index, and so on.

For example, if the first column of the index is city, and the second column is last name; then the data should be ordered by name within each city, as in the following list:

Albuquerque Adams
Albuquerque Hartstein
Albuquerque Klein
... ...
Boston Andrews
Boston Bobrowski
Boston Heigham
... ...
Choosing the Best Sort Order

For the best overall performance of direct path loads, you should presort the data based on the index that requires the most temporary segment space. For example, if the primary key is one numeric column, and the secondary key consists of three text columns, then you can minimize both sort time and storage requirements by presorting on the secondary key.

To determine the index that requires the most storage space, use the following procedure:

For each index, add up the widths of all columns in that index.
For a single-table load, pick the index with the largest overall width.
For each table in a multiple-table load, identify the index with the largest overall width for each table. If the same number of rows are to be loaded into each table, then again pick the index with the largest overall width. Usually, the same number of rows are loaded into each table.
If a different number of rows are to be loaded into the indexed tables in a multiple-table load, then multiply the width of each index identified in step 3 by the number of rows that are to be loaded into that index, and pick the index with the largest result.
Infrequent Data Saves

Frequent data saves resulting from a small ROWS value adversely affect the performance of a direct path load. Because direct path loads can be many times faster than conventional loads, the value of ROWS should be considerably higher for a direct load than it would be for a conventional load.

During a data save, loading stops until all of SQL*Loader's buffers are successfully written. You should select the largest value for ROWS that is consistent with safety. It is a good idea to determine the average time to load a row by loading a few thousand rows. Then you can use that value to select a good value for ROWS.

For example, if you can load 20,000 rows per minute, and you do not want to repeat more than 10 minutes of work after an interruption, then set ROWS to be 200,000 (20,000 rows/minute * 10 minutes).

Minimizing Use of the Redo Log

One way to speed a direct load dramatically is to minimize use of the redo log. There are three ways to do this. You can disable archiving, you can specify that the load is UNRECOVERABLE, or you can set the NOLOG attribute of the objects being loaded. This section discusses all methods.

Disabling Archiving

If archiving is disabled, direct path loads do not generate full image redo. Use the ARCHIVELOG and NOARCHIVELOG parameters to set the archiving mode. See the Oracle9i Database Administrator's Guide for more information about archiving.

Specifying the UNRECOVERABLE Parameter

To save time and space in the redo log file, use the UNRECOVERABLE parameter when you load data. An UNRECOVERABLE load does not record loaded data in the redo log file; instead, it generates invalidation redo.

The UNRECOVERABLE parameter applies to all objects loaded during the load session (both data and index segments). Therefore, media recovery is disabled for the loaded table, although database changes by other users may continue to be logged.

Note:
Because the data load is not logged, you may want to make a backup of the data after loading.

If media recovery becomes necessary on data that was loaded with the UNRECOVERABLE parameter, the data blocks that were loaded are marked as logically corrupted.

To recover the data, drop and re-create the data. It is a good idea to do backups immediately after the load to preserve the otherwise unrecoverable data.

By default, a direct path load is RECOVERABLE.

Setting the NOLOG Attribute

If a data or index segment has the NOLOG attribute set, then full image redo logging is disabled for that segment (invalidation redo is generated.) Use of the NOLOG attribute allows a finer degree of control over the objects that are not logged.

Specifying the Number of Column Array Rows and Size of Stream Buffers

The number of column array rows determines the number of rows loaded before the stream buffer is built. The STREAMSIZE parameter specifies the size (in bytes) of the data stream sent from the client to the server.

Use the COLUMNARRAYROWS parameter to specify a value for the number of column array rows.

Use the STREAMSIZE parameter to specify the size for direct path stream buffers.

The optimal values for these parameters vary, depending on the system, input datatypes, and Oracle column datatypes used. When you are using optimal values for your particular configuration, the elapsed time in the SQL*Loader log file should go down.

To see a list of default values for these and other parameters, invoke SQL*Loader without any parameters, as described in Invoking SQL*Loader.

Note:
You should monitor process paging activity, because if paging becomes excessive, performance can be significantly degraded. You may need to lower the values for READSIZE, STREAMSIZE, and COLUMNARRAYROWS to avoid excessive paging.

It can be particularly useful to specify the number of column array rows and size of the steam buffer when you perform direct path loads on multiple-CPU systems. See Optimizing Direct Path Loads on Multiple-CPU Systems for more information.

Specifying a Value for the Date Cache

If you are performing a direct path load in which the same date or timestamp values are loaded many times, a large percentage of total load time can end up being used for converting date and timestamp data. This is especially true if multiple date columns are being loaded. In such a case, it may be possible to improve performance by using the SQL*Loader date cache.

The date cache reduces the number of date conversions done when many duplicate values are present in the input data. It allows you to specify the number of unique dates anticipated during the load.

The date cache is enabled by default. To completely disable the date cache, set it to 0.

The default date cache size is 1000 elements. If the default is used and the number of unique input values loaded exceeds 1000, then the date cache is automatically disabled for that table. This prevents excessive and unnecessary lookup times that could affect performance. However, if instead of using the default, you specify a nonzero value for the date cache and it is exceeded, the date cache is not disabled. Instead, any input data that exceeded the maximum is explicitly converted using the appropriate conversion routines.

The date cache can be associated with only one table. No date cache sharing can take place across tables. A date cache is created for a table only if all of the following conditions are true:

The DATE_CACHE parameter is not set to 0
One or more date values, timestamp values, or both are being loaded that require datatype conversion in order to be stored in the table
The load is a direct path load
Date cache statistics are written to the log file. You can use those statistics to improve direct path load performance as follows:

If the number of cache entries is less than the cache size and there are no cache misses, then the cache size could safely be set to a smaller value.
If the number of cache hits (entries for which there are duplicate values) is small and the number of cache misses is large, then the cache size should be increased. Be aware that if the cache size is increased too much, it may cause other problems such as excessive paging or too much memory usage.
If most of the input date values are unique, the date cache will not enhance performance and therefore should not be used.

Note:
Date cache statistics are not written to the SQL*Loader log file if the cache was active by default and disabled because the maximum was exceeded.

If increasing the cache size does not improve performance, revert to the default behavior or set the cache size to 0. The overall performance improvement also depends on the datatypes of the other columns being loaded. Improvement will be greater for cases in which the total number of date columns loaded is large compared to other types of data loaded.

See Also:
DATE_CACHE
Table Load Information for an example of how date cache statistics are presented in the SQL*Loader log file

Oracle DBA Interview Questions

Oracle 10 DBA Interview questions and answers

Sunday, September 7, 2008

Optimizing Performance of Direct Path Loads

No comments:

Blog Archive

My Favourite Links

Followers

Oracle DBA Interview Questions

Oracle 10 DBA Interview questions and answers

Sunday, September 7, 2008

Optimizing Performance of Direct Path Loads

No comments:

Blog Archive

My Favourite Links

Followers

Subscribe