ORACLE/이중화(HA)

Heterogeneous Database Initial Load Using Oracle GoldenGate

argoLee 2017. 6. 28. 11:56

Oracle GoldenGate: Heterogeneous Database Initial Load Using Oracle GoldenGate

Introduction

Many times when working with customers, we hear “We were told never to use Oracle GoldenGate (OGG) to do initial loads, so how do we load our non-Oracle data to Oracle?” It is true that native database utilities can be used to make copies of large databases faster than OGG in a homogeneous (like-to-like) implementation. However, when loading data of dissimilar structure OGG may be the best, and possibly only, data load solution. The key is to use some of the more powerful, and advanced features of the product to perform parallel data extraction, conversion, and apply.

In this article we shall present examples of these features to perform an Oracle Database to MySQL initial load. Although the database queries and OGG configuration are specific for these two databases, the concepts presented may be used for any initial load process.

The scripts and information provided in this article are for educational purposes only. They are not supported by Oracle Development or Support, and come with no guarantee or warrant for functionality in any environment other than the test system used to prepare this article.

Main Article

Initial load is the cornerstone of all successful OGG implementations. As such, sufficient time for planning and testing must be allotted to determine the correct target database table definitions and optimal OGG settings.

Source to Target Table DDL

This important task can be cumbersome because there are no utilities provided by Oracle for converting DDL between the various types of databases. Things to double check include primary key or unique index definitions, foreign key constraints, data type support, reserved words, and default precisions.

For example, in our source Oracle 11g database we have several tables with columns defined as TIMESTAMP. The default for Oracle TIMESTAMP includes a six digit fractional seconds precision value, which correlates to TIMESTAMP(6).  The default for MySQL is to not include the fractional seconds precision, which correlates to TIMESTAMP(0).

To ensure the Oracle TIMESTAMP is correctly applied the target MySQL columns must be specified as TIMESTAMP(6), otherwise the fractional seconds data will be truncated resulting in a potential out of sync condition between the source and target databases.

Source Data Size

Query the database to get the source data size; that is; the number of rows per table, average row length, and total bytes per table. This information will be used to determine the number of OGG initial load data extraction processes required. To get this information for my source database I ran the following query (table statistics must be current for this query to work):

set lin 300;
COL TABLE_NAME FORMAT A30;
COL NUM_ROWS FORMAT 999,999,999;
COL AVG_ROW_LEN FORMAT 999,999,999;
COL TOTAL_BYTES FORMAT 999,999,999,999,999;
 
SELECT a.table_name, a.num_rows, a.avg_row_len, b.total_bytes FROM
(SELECT table_name, num_rows, avg_row_len FROM user_tables) a,
 (SELECT SUM (BYTES) total_bytes, segment_name AS table_name FROM user_extents
   WHERE segment_type = 'TABLE' GROUP BY segment_name) b
WHERE a.table_name = b.table_name order by num_rows;

Which returns the following:

TABLE_NAME                         NUM_ROWS  AVG_ROW_LEN          TOTAL_BYTES
------------------------------ ------------ ------------ --------------------
NEXT_ORDER                                1            8               65,536
NEXT_CUST                                 1            7               65,536
ORDERS_TOTAL                              3           54               65,536
CATEGORIES                               21           44               65,536
CATEGORIES_DESCRIPTION                   21           18               65,536
PRODUCTS                                 28           86               65,536
PRODUCTS_DESCRIPTION                     28          562               65,536
PRODUCTS_TO_CATEGORIES                   28            9               65,536
CUSTOMERS_LKUP                           36           46               65,536
CUSTOMERS_INFO                        1,089           37              131,072
CUSTOMERS                             1,089           74              131,072
ORDERS                               30,001          311           11,534,336
ORDERS_STATUS_HISTORY                30,002           63            3,145,728
ORDERS_PRODUCTS                     389,733           59           28,311,552
NOPKTBLE                            500,000           30           18,874,368
MILLITBLE                         1,999,833          408          965,738,496
MILLTBLEA                         2,000,000        1,188        3,019,898,880
 
17 rows selected.

TCP/IP Bandwidth Delay Product

The TCP/IP Bandwidth Delay Product (BDP) is used to determine the optimal setting of the OGG TCP Buffer.

Use the “ping” utility to send 10 data packets to the target. This will give us the average round trip time (RTT)

[lpenton@oelr5u7 ~]$ ping oelmysql -c 10
PING oelmysql (192.168.106.30) 56(84) bytes of data.
64 bytes from oelmysql (192.168.106.30): icmp_seq=1 ttl=64 time=0.667 ms
64 bytes from oelmysql (192.168.106.30): icmp_seq=2 ttl=64 time=0.550 ms
64 bytes from oelmysql (192.168.106.30): icmp_seq=3 ttl=64 time=0.805 ms
64 bytes from oelmysql (192.168.106.30): icmp_seq=4 ttl=64 time=0.811 ms
64 bytes from oelmysql (192.168.106.30): icmp_seq=5 ttl=64 time=0.656 ms
64 bytes from oelmysql (192.168.106.30): icmp_seq=6 ttl=64 time=0.873 ms
64 bytes from oelmysql (192.168.106.30): icmp_seq=7 ttl=64 time=0.892 ms
64 bytes from oelmysql (192.168.106.30): icmp_seq=8 ttl=64 time=0.684 ms
64 bytes from oelmysql (192.168.106.30): icmp_seq=9 ttl=64 time=0.868 ms
64 bytes from oelmysql (192.168.106.30): icmp_seq=10 ttl=64 time=0.894 ms
 
--- oelmysql ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9006ms
rtt min/avg/max/mdev = 0.550/0.770/0.894/0.115 ms

Compute the network bandwidth between the two servers, in bytes and multiply the quotient by the Average RTT to get the BDP value.

The link between my servers is 1,000 megabits, so my BDP value is computed as follows.

BDP = (1,000,000,000 / 8) * .00077

BDP = 96,250 bytes

The optimal OGG TCP Buffer setting is three times the BPD:

96,250 * 3 = 288,750 bytes.

It should be noted that for non-Windows systems, the source and target server Network Stack default settings may not be sufficient to support this buffer size. Additional tuning at the server OS level may be required to improve TCP/IP performance, which is beyond the scope of this article.

Initial Load Using OGG Trails

Since the majority of initial load problems are target side related, for recoverability, the best practice is to use OGG Trail files to hold the initial load data. In the the following sections we shall discuss configuration options for this load method.

Source Side OGG Extract Configuration

Since we have all of the basic information we need about the source side and communications bandwidth to the target, we can now do an initial OGG configuration. Based upon the number of rows and row length data of each table, we are going to create four Extracts that will read directly from the tables. The Extracts will be set to read from the tables as follows:

ExtractTable Read
TMYSQLCATEGORIES, CATEGORIES_DESCRIPTION, CUSTOMERS, CUSTOMERS_INFO, CUSTOMERS_LKUP, NEXT_CUST,
NEXT_ORDER, NOPKTBLE, ORDERS, ORDERS_PRODUCTS, ORDERS_STATUS_HISTORY, ORDERS_TOTAL,
PRODUCTS, PRODUCTS_DESCRIPTION, PRODUCTS_TO_CATEGORIES
TMYSQLAMILLTBLE
TMYSQLCMILLTBLEA
TMYSQLDMILLTBLEA

 

Common configuration settings used by all five Extracts will be:

userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
reportcount every 60 seconds, rate

In the above settings, the clear text password is displayed for demonstration purposes only. Best practice dictates that database access passwords be encrypted. The RMTHOST designation also contains settings for the Extract’s TCP  buffer size. REPORTCOUNT is set for demonstration purposes only. If set too low this parameter will have a negative performance impact because it induces a slight delay while each Extract process writes statistics to its report file.

Initial load is the only time OGG Extracts should be configured to immediately transmit data to the target via TCP/IP. Primary Extract processes that read database transaction logs must write to a local OGG Extract Trail, which is then read by an Extract Data Pump for transmission to the target. Configuring the Primary Extract to transmit captured data over TCP/IP will result in sever performance issues and other failures if your network is not up 100% of the time.

We will use an OGG RMTFILE to store records to be applied to the target table by Replicat. To determine the size requirements of each RMTFILE, compute the size requirements for the OGG record header by multiplying the number of rows by 50. Add the product to the total bytes value gathered for each table previously.

The RMTFILE parameter setting for each Extract will be:

ExtractParameter Setting
TMYSQLrmtfile ./dirdat/tm, maxfiles 999999, megabytes 190, append
TMYSQLArmtfile ./dirdat/tn, maxfiles 999999, megabytes 1100, append
TMYSQLCrmtfile ./dirdat/tq, maxfiles 999999, megabytes 1500, append
TMYSQLDrmtfile ./dirdat/tr, maxfiles 999999, megabytes 1500, append

In this test configuration, each file is sized to hold all of the records read by Extract from each defined table. Recoverability is defined so that in the event of an Extract failure data will be appended to the file, if it exists, upon restart and an allotment is made to ensure multiple files will be created if needed.

NOTE: If the target side Replicats are not going to be run concurrently with the Extract, specify PURGE instead of APPEND.

Putting this all together, our initial load source side Extract parameter files look like this:

TMYSQLTMYSQLA
extract tmysql
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
rmtfile ./dirdat/tm, maxfiles 999999, megabytes 190, append
reportcount every 60 seconds, rate
tableexclude east.millitble
tableexclude east.milltblea
table east.*;
extract tmysqla
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
rmtfile ./dirdat/tn, maxfiles 999999, megabytes 1100, append
reportcount every 60 seconds, rate
table east.millitble;

 

We want to split the data load for the MILLTBLEA table across two Extract processes. We could use the @RANGE column conversion function to perform this task; however, because this function uses the primary key column values to determine the hash value we will not get a true 50% split. Instead we set a FILTER on the primary key column, COLA, and specify a primary key value based upon the existing data. To get the median value of the data in the table, we query the database:

SQL> select median (cola) as COLA from milltblea;
                    COLA
------------------------
 499,289,425,723,612,740

 

TMYSQLCTMYSQLD
extract tmysqlc
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
rmtfile ./dirdat/tq, maxfiles 999999, megabytes 1500, append
reportcount every 60 seconds, rate
table east.milltblea, filter (cola < 499289425723612740);
extract tmysqld
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
rmtfile ./dirdat/tr, maxfiles 999999, megabytes 1500, append
reportcount every 60 seconds, rate
table east.milltblea, filter (cola >= 499289425723612740);

 

In GGSCI, add each Extract specifying SOURCEISTABLE to have each process read directly from database tables. Because a SOURCEISTABLE Extract does not maintain checkpoint information, when started it will read all of the table’s data starting from the first row.

GGSCI (oelr5u7) 37> add extract tmysql, sourceistable
EXTRACT added.
 
GGSCI (oelr5u7) 38> add extract tmysqla, sourceistable
EXTRACT added.
 
GGSCI (oelr5u7) 39> add extract tmysqlc, sourceistable
EXTRACT added.
 
GGSCI (oelr5u7) 40> add extract tmysqld, sourceistable
EXTRACT added.

Test The Extract Configuration

Use the GGSCI command “START EXTRACT” to start each group, test the configuration, and get the data read rates. The GGSCI commands “INFO * TASK” and “STATUS * TASK” maybe used to get process status information. Since we specified the  REPORTCOUNT parameter for each Extract, the command “VIEW REPORT [extract name]” maybe used to see real-time processing rates.

This initial configuration results in the following runtime results:

ExtractTotal Runtime (secs)Total BytesBytes Per Second
TMYSQL29182,975,9136,309,514.24
TMYSQLA701,078,996,36215,414,233.74
TMYSQLC5311,319,636,2592,485,190.69
TMYSQLD5291,318,984,4552,493,354.36

 

To compute the total runtime, look in each Extract’s report file for a line similar to the following:

Report at 2013-11-05 13:17:21 (activity since 2013-11-05 13:16:52)

Get the time difference by subtracting the activity since time stamp from the report at time stamp, and then convert the difference into seconds.

Total bytes is obtained from the report file section labelled “Bytes output” under “Redo Log Statistics”. Divide this value by the total runtime seconds to get the average bytes processed per second.

Given the resource challenges of my very small virtual machine (1 CPU, 4Gb RAM, single disk), these numbers are very good. On a production machine additional Extract processes could be run against the table MILLTBLEA to further decrease the overall total run time.

Complete The Source Side Configuration

To complete the source side set-up: (1) use the OGG DEFGEN utility to create source database defines, (2) copy the DEFGEN output to the target server, (3) configure the Primary Extract, and (4) configure the Extract Data Pump(s).

Prior to starting the initial load be sure to terminate any long running transactions and start the Primary Extract and Extract Data Pumps.

Configure The Target Side OGG Replicats

Because Replicat must build and execute DML statements from the data read by Extract, it is much slower. Therefore, we will need multiple Replicat processes to load the target database quickly. To illustrate this lets look at one load stream.

The source TMYSQL Extract reads all of the smaller tables:

TABLE_NAME                         NUM_ROWS  AVG_ROW_LEN          TOTAL_BYTES
------------------------------ ------------ ------------ --------------------
NEXT_ORDER                                1            8               65,536
NEXT_CUST                                 1            7               65,536
ORDERS_TOTAL                              3           54               65,536
CATEGORIES                               21           44               65,536
CATEGORIES_DESCRIPTION                   21           18               65,536
PRODUCTS                                 28           86               65,536
PRODUCTS_DESCRIPTION                     28          562               65,536
PRODUCTS_TO_CATEGORIES                   28            9               65,536
CUSTOMERS_LKUP                           36           46               65,536
CUSTOMERS_INFO                        1,089           37              131,072
CUSTOMERS                             1,089           74              131,072
ORDERS                               30,001          311           11,534,336
ORDERS_STATUS_HISTORY                30,002           63            3,145,728
ORDERS_PRODUCTS                     389,733           59           28,311,552
NOPKTBLE                            500,000           30           18,874,368

These sixteen tables contain a total of 952,109 rows of data; which the Extract was able to read and send to the target in 29 seconds. A single Replicat took 4 minutes and 13 seconds to apply the data, or 8.7 times the total read time. Using this figure as a baseline, we can extrapolate that our larger tables will require multiple Replicats to apply the data in a timely fashion. As we saw with the Extracts, this is where the MAP parameter option FILTER and column conversion function @RANGE comes to our aid.

Using this information, we can set a test target side configuration. The Replicat configuration used to apply all of the smaller table data is:

LMYSQL
replicat lmysql
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysql.dsc, purge
reportcount every 60 seconds, rate
overridedups
end runtime
map east.*, target east.*;

 

Because the data for the table MILLITBLE is contained within one OGG RMTFILE we can use the @RANGE function to create a hash based upon the table primary key value. In the parameter files below we are splitting the load across two Replicat processes. It should be noted that because the primary key definition is used to create the hash value, this will not be a true 50% split; but it will be pretty close.

LMYSQLALMYSQLB
replicat lmysqla
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqla.dsc, purge
reportcount every 60 seconds, rate
overridedups
end runtime
MAP EAST.MILLITBLE, TARGET east.MILLITBLE, filter (@RANGE (1,2));
replicat lmysqlb
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqlb.dsc, purge
reportcount every 60 seconds, rate
overridedups
end runtime
MAP EAST.MILLITBLE, TARGET east.MILLITBLE, filter (@RANGE (2,2));

 

The data for MILLTBLEA was already split into two RMTFILEs by the source side Extract. To speed up data apply, we want to have two Replicats process the data contained within each RMTFILE. Since @RANGE uses the table primary key to determine the hash value for data balancing, we cannot use that function in this case. Instead we must set a FILTER on the primary key column and define values based upon the source table median.

For the first two Replicats, the filter data is the median value divided by two. One Replicat will apply records with a primary key value less than this value and the other will apply records of greater than or equal to the value.

LMYSQLCLMYSQLD
replicat lmysqlc
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqlc.dsc, purge
reportcount every 60 seconds, rate
overridedups
end runtime
MAP EAST.MILLTBLEA, TARGET east.MILLTBLEA,
FILTER (COLA < 249644712861806000)
;
replicat lmysqld
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqld.dsc, purge
reportcount every 60 seconds, rate
overridedups
end runtime
MAP EAST.MILLTBLEA, TARGET east.MILLTBLEA,
FILTER (COLA >= 249644712861806000)
;

 

Since the second two Replicats will process data from an RMTFILE containing records greater than or equal to the median source data value, the FILTER values are computed as ((median value / 2) + median value), which is this case is 748934138585419000.

LMYSQLELMYSQLF
replicat lmysqle
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqle.dsc, purge
reportcount every 60 seconds, rate
overridedups
end runtime
MAP EAST.MILLTBLEA, TARGET east.MILLTBLEA,
FILTER (COLA < 748934138585419000)
;
replicat lmysqlf
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqlf.dsc, purge
reportcount every 60 seconds, rate
overridedups
end runtime
MAP EAST.MILLTBLEA, TARGET east.MILLTBLEA,
filter (COLA >= 748934138585419000)
;

 

The parameter OVERRIDEDUPS is used to provide recoverability should there be a need to restart the source side Extract or re-run the Replicat. This will allow the Replicat to run without first truncating the target tables. However, you should note that this will also slow down data apply in this case because the Replicat will perform two database operations whenever a duplicate row is detected. It is important to note that this setting should be made only if delete operations are not performed on the source tables. If deletes occur in the source, and the source side Extract was re-run, the target tables must be truncated before restarting the initial load process.

END RUNTIME will cause the Replicat to perform a graceful shutdown when it has no more data to process. For this article’s tests none of the Replicats were started until the source side Extracts completed so we could obtain concise timing data.

If a large database is being loaded, consider not specifying END RUNTIME. That way the Replicats can all be started, and each will wait for data to process. The caveat is that manual review of the Replicats is required to determine when each has completed their portion of the data load.

In GGSCI add the Replicats:

GGSCI (oelmysql) 1> dblogin sourcedb east@oelmysql, userid gguser, password Oracle1
Successfully logged into database.
 
GGSCI (oelmysql) 2> info checkpointtable
 
No checkpoint table specified, using GLOBALS specification (east.checkpoint)...
 
Checkpoint table east.checkpoint created 2013-11-06 04:44:30.
 
GGSCI (oelmysql) 3> add replicat lmysql, exttrail ./dirdat/tm
REPLICAT added.
 
GGSCI (oelmysql) 4> add replicat lmysqla, exttrail ./dirdat/tn
REPLICAT added.
 
GGSCI (oelmysql) 5> add replicat lmysqlb, exttrail ./dirdat/tn
REPLICAT added.
 
GGSCI (oelmysql) 6> add replicat lmysqlc, exttrail ./dirdat/tq
REPLICAT added.
 
GGSCI (oelmysql) 7> add replicat lmysqld, exttrail ./dirdat/tq
REPLICAT added.
 
GGSCI (oelmysql) 8> add replicat lmysqle, exttrail ./dirdat/tr
REPLICAT added.
 
GGSCI (oelmysql) 9> add replicat lmysqlf, exttrail ./dirdat/tr
REPLICAT added.

Test The Replicat Configuration

Use the GGSCI command “START EXTRACT” to start each source side group, then on the target side use the command “START REPLICAT” to start each of the Replicat load processes to test the configuration, and get the data read rates. Remember not to start any Replicats where @RANGE or FILTER have been specified until after the source side Extracts complete their processing. The GGSCI commands “INFO L*” and “STATUS L*” maybe used to get process status information. Since we specified the  REPORTCOUNT parameter for each Replicat, the command “VIEW REPORT [replicat name]” maybe used to see real-time processing rates.

This initial configuration results in the following runtime results:

ReplicatTotal Runtime (secs)Total RecordsRecords Per Second
LMYSQL124502,2584,050.47
LMYSQLA2,1771,002,367460.44
LMYSQLB2,158997,466462.22
LMYSQLC1,360500,331367.89
LMYSQLD1,354499,669369.03
LMYSQLE1,389497,912358.47
LMYSQLF1,390502,088361.21

Given the resource challenges of my very small virtual machine (1 CPU, 4Gb RAM, single disk), these numbers are very good. On a production machine additional Replicat processes could be used to further decrease the overall total run time.

OGG Direct Load

Using OGG Direct Load, where Extract reads from the source tables and sends this data via TCP/IP directly to the target Replicat, may be faster for smaller database loads. However, there are limitations with the Direct Load method:

  • LOB data types are not supported.
  • There is no recoverability. Any failure in the load mechanism requires a truncate of the target table and reload from the source.
  • TCPBUFSIZE and TCPFLUSHBYTES settings cannot be configured in the Extract. The default setting of 30,000 bytes will be used.
  • Because there is a one-to-one relationship between the Extract and Replicat, the FILTER and @RANGE options may only be used in Extract.

If Direct Load is the method chosen for performing the initial load, the key is to spread the data load process over multiple Extract and Replicat groups. The number of groups depends upon available server resources, disk subsystem speed, and source data.

Using this information, lets look at one possible Direct Load configuration for our test database:

ExtractReplicatTotal BytesLoad Time (secs)Bytes Per Second
extract tmysql
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600
rmttask replicat, group lmysql
tableexclude east.millitble
tableexclude east.milltblea
tableexclude east.nopktble
table east.*;
replicat lmysql
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysql.dsc, purge
overridedups
map east.*, target east.*;
112,297,398174408,752.71
extract tmysqla
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600
rmttask replicat, group lmysqla
table east.nopktble;
replicat lmysqla
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqla.dsc, purge
overridedups
map east.*, target east.*;
70,678,515 159 444,518.96
extract tmysqlb
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600
rmttask replicat, group lmysqlb
table east.millitble, filter (@RANGE (1,4));
replicat lmysqlb
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqlb.dsc, purge
overridedups
map east.*, target east.*;
271,046,102902 300,494.57
extract tmysqlc
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600
rmttask replicat, group lmysqlc
table east.millitble, filter (@RANGE (2,4));
replicat lmysqlc
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqlc.dsc, purge
overridedups
map east.*, target east.*;
269,056,311 898 299,617.27
extract tmysqld
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600
rmttask replicat, group lmysqld
table east.millitble, filter (@RANGE (3,4));
replicat lmysqld
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqld.dsc, purge
overridedups
map east.*, target east.*;
 269,854,784863312,693.84
extract tmysqle
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600
rmttask replicat, group lmysqle
table east.millitble, filter (@RANGE (4,4));
replicat lmysqle
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqle.dsc, purge
overridedups
map east.*, target east.*;
 269,039,165831 323,753.51

The load times for east.millitble could be reduced by increasing the @RANGE value to 6; however, this would exceed the limits of my test server. Likewise, for the table east.milltblea I would have initially configured and tested 10 Extract and Replicat pairs because of the size and complexity of the data.

The parameter OVERRIDEDUPS is used to provide recoverability should there be a need to restart the source side Extract. This will allow the Replicat to run without first truncating the target tables. However, you should note that this will also slow down data apply in this case because the Replicat will perform two database operations whenever a duplicate row is detected. It is important to note that this setting should be made only if delete operations are not performed on the source tables. If deletes occur in the source, and the source side Extract was re-run, the target tables must be truncated before restarting the initial load process.

Change The Extract Output

As demonstrated previously, Extract reads data directly from the source tables quickly, the bottleneck is Replicat. The Extract options FORMATASCII, FORMATSQL, and FORMATXML maybe used to change Extract’s output so data load mechanisms other than Replicat may be used.

When using the output formatter option, the Extract configuration would be similar to the following:

FORMATASCIIFORMATSQLFORMATXML
extract tmysqlb
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
formatascii
rmtfile ./dirdat/tn, maxfiles 999999, megabytes 1100, purge
reportcount every 60 seconds, rate
table east.millitble;
extract tmysqlb
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
formatsql
rmtfile ./dirdat/tn, maxfiles 999999, megabytes 1100, purge
reportcount every 60 seconds, rate
table east.millitble;
extract tmysqlb
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
formatxml
rmtfile ./dirdat/tn, maxfiles 999999, megabytes 1100, purge
reportcount every 60 seconds, rate
table east.millitble;

As seen in our prior tests, we can further reduce data read time by splitting the workload for large tables across multiple Extract processes via the FILTER and @RANGE options.

Target Database Instantiation

No matter which initial load procedure is chosen, the process for performing a target database instantiation will remain the same:

  • Make database changes required to support OGG transactional data capture.
  • Configure the OGG source and target components.
  • Stop all long running transactions.
  • Start the Primary Extract and Extract Data Pump.
  • If the source database is Oracle get the current SCN. If the source database is DB2, SQL Server, or Sybase get the current LSN.
  • Start the Primary Extract and Extract Data Pump.
  • Perform the initial load.
  • Wait for the initial load to complete.
  • If the source database is Oracle, DB2, SQL Server, or Sybase start each Replicat specifying the ATCSN option and the CSN/LSN value obtained previously from the source database.

If the source database is other than Oracle, DB2, SQL Server or Sybase:

  • Ensure the parameters HANDLECOLLISIONS and END RUNTIME exists in each Replicat parameter file.
  • Start each Replicat.
  • Monitor the Replicat activity.
  • As each Replicat enters the STOPPED state, remove the parameters HANDLECOLLISIONS and END RUNTIME from its parameter file.
  • Restart the Replicat.

Summary

Oracle GoldenGate may be configured to perform the initial load of heterogeneous data efficiently via the use of advanced configuration options.