Introduction
Many times when working with customers, we hear: “We were told never to use Oracle GoldenGate (OGG) for initial loads, so how do we load our non-Oracle data to Oracle?” It is true that native database utilities can copy large databases faster than OGG in a homogeneous (like-to-like) implementation. However, when loading data of dissimilar structure, OGG may be the best, and possibly the only, data load solution. The key is to use some of the more powerful and advanced features of the product to perform parallel data extraction, conversion, and apply.
In this article we shall present examples of these features to perform an Oracle Database to MySQL initial load. Although the database queries and OGG configuration are specific for these two databases, the concepts presented may be used for any initial load process.
The scripts and information provided in this article are for educational purposes only. They are not supported by Oracle Development or Support, and come with no guarantee or warranty of functionality in any environment other than the test system used to prepare this article.
Main Article
Initial load is the cornerstone of all successful OGG implementations. As such, sufficient time for planning and testing must be allotted to determine the correct target database table definitions and optimal OGG settings.
Source to Target Table DDL
This important task can be cumbersome because there are no utilities provided by Oracle for converting DDL between the various types of databases. Things to double check include primary key or unique index definitions, foreign key constraints, data type support, reserved words, and default precisions.
For example, in our source Oracle 11g database we have several tables with columns defined as TIMESTAMP. The default for Oracle TIMESTAMP includes a six digit fractional seconds precision value, which correlates to TIMESTAMP(6). The default for MySQL is to not include the fractional seconds precision, which correlates to TIMESTAMP(0).
To ensure the Oracle TIMESTAMP data is correctly applied, the target MySQL columns must be specified as TIMESTAMP(6); otherwise the fractional seconds data will be truncated, resulting in a potential out-of-sync condition between the source and target databases.
Source Data Size
Query the database to get the source data size; that is, the number of rows per table, the average row length, and the total bytes per table. This information will be used to determine the number of OGG initial load data extraction processes required. To get this information for my source database, I ran the following query (table statistics must be current for this query to work):
set lin 300;
COL TABLE_NAME FORMAT A30;
COL NUM_ROWS FORMAT 999,999,999;
COL AVG_ROW_LEN FORMAT 999,999,999;
COL TOTAL_BYTES FORMAT 999,999,999,999,999;

SELECT a.table_name, a.num_rows, a.avg_row_len, b.total_bytes
  FROM (SELECT table_name, num_rows, avg_row_len
          FROM user_tables) a,
       (SELECT SUM (bytes) total_bytes, segment_name AS table_name
          FROM user_extents
         WHERE segment_type = 'TABLE'
         GROUP BY segment_name) b
 WHERE a.table_name = b.table_name
 ORDER BY num_rows;
Which returns the following:
TABLE_NAME                         NUM_ROWS  AVG_ROW_LEN          TOTAL_BYTES
------------------------------ ------------ ------------ --------------------
NEXT_ORDER                                1            8               65,536
NEXT_CUST                                 1            7               65,536
ORDERS_TOTAL                              3           54               65,536
CATEGORIES                               21           44               65,536
CATEGORIES_DESCRIPTION                   21           18               65,536
PRODUCTS                                 28           86               65,536
PRODUCTS_DESCRIPTION                     28          562               65,536
PRODUCTS_TO_CATEGORIES                   28            9               65,536
CUSTOMERS_LKUP                           36           46               65,536
CUSTOMERS_INFO                        1,089           37              131,072
CUSTOMERS                             1,089           74              131,072
ORDERS                               30,001          311           11,534,336
ORDERS_STATUS_HISTORY                30,002           63            3,145,728
ORDERS_PRODUCTS                     389,733           59           28,311,552
NOPKTBLE                            500,000           30           18,874,368
MILLITBLE                         1,999,833          408          965,738,496
MILLTBLEA                         2,000,000        1,188        3,019,898,880

17 rows selected.
TCP/IP Bandwidth Delay Product
The TCP/IP Bandwidth Delay Product (BDP) is used to determine the optimal setting of the OGG TCP Buffer.
Use the “ping” utility to send 10 data packets to the target. This gives us the average round trip time (RTT).
[lpenton@oelr5u7 ~]$ ping oelmysql -c 10
PING oelmysql (192.168.106.30) 56(84) bytes of data.
64 bytes from oelmysql (192.168.106.30): icmp_seq=1 ttl=64 time=0.667 ms
64 bytes from oelmysql (192.168.106.30): icmp_seq=2 ttl=64 time=0.550 ms
64 bytes from oelmysql (192.168.106.30): icmp_seq=3 ttl=64 time=0.805 ms
64 bytes from oelmysql (192.168.106.30): icmp_seq=4 ttl=64 time=0.811 ms
64 bytes from oelmysql (192.168.106.30): icmp_seq=5 ttl=64 time=0.656 ms
64 bytes from oelmysql (192.168.106.30): icmp_seq=6 ttl=64 time=0.873 ms
64 bytes from oelmysql (192.168.106.30): icmp_seq=7 ttl=64 time=0.892 ms
64 bytes from oelmysql (192.168.106.30): icmp_seq=8 ttl=64 time=0.684 ms
64 bytes from oelmysql (192.168.106.30): icmp_seq=9 ttl=64 time=0.868 ms
64 bytes from oelmysql (192.168.106.30): icmp_seq=10 ttl=64 time=0.894 ms

--- oelmysql ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9006ms
rtt min/avg/max/mdev = 0.550/0.770/0.894/0.115 ms
Divide the network bandwidth between the two servers (in bits per second) by eight to convert it to bytes per second, then multiply the quotient by the average RTT (in seconds) to get the BDP value.
The link between my servers is 1,000 megabits, so my BDP value is computed as follows.
BDP = (1,000,000,000 / 8) * .00077
BDP = 96,250 bytes
The optimal OGG TCP Buffer setting is three times the BDP:
96,250 * 3 = 288,750 bytes.
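Putting this arithmetic into a small script makes it easy to rerun for other links. This is a hedged sketch; the function name is illustrative and not part of OGG:

```python
# Sketch of the article's BDP arithmetic for any link speed and
# measured round-trip time.

def ogg_tcp_buffer(link_bits_per_sec: int, avg_rtt_ms: float) -> tuple:
    """Return (bdp_bytes, buffer_bytes), where the buffer is 3x the BDP."""
    bytes_per_sec = link_bits_per_sec / 8            # convert bits to bytes
    bdp = round(bytes_per_sec * (avg_rtt_ms / 1000)) # bandwidth-delay product
    return bdp, bdp * 3                              # OGG TCP buffer = 3 x BDP

# The article's 1,000 megabit link with a 0.770 ms average RTT:
bdp, buf = ogg_tcp_buffer(1_000_000_000, 0.77)
print(bdp, buf)  # 96250 288750
```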
It should be noted that for non-Windows systems, the source and target server Network Stack default settings may not be sufficient to support this buffer size. Additional tuning at the server OS level may be required to improve TCP/IP performance, which is beyond the scope of this article.
Initial Load Using OGG Trails
Since the majority of initial load problems are target side related, for recoverability the best practice is to use OGG Trail files to hold the initial load data. In the following sections we shall discuss configuration options for this load method.
Source Side OGG Extract Configuration
Since we have all of the basic information we need about the source side and communications bandwidth to the target, we can now do an initial OGG configuration. Based upon the number of rows and row length data of each table, we are going to create four Extracts that will read directly from the tables. The Extracts will be set to read from the tables as follows:
Extract | Tables Read |
---|---|
TMYSQL | CATEGORIES, CATEGORIES_DESCRIPTION, CUSTOMERS, CUSTOMERS_INFO, CUSTOMERS_LKUP, NEXT_CUST, NEXT_ORDER, NOPKTBLE, ORDERS, ORDERS_PRODUCTS, ORDERS_STATUS_HISTORY, ORDERS_TOTAL, PRODUCTS, PRODUCTS_DESCRIPTION, PRODUCTS_TO_CATEGORIES |
TMYSQLA | MILLITBLE |
TMYSQLC | MILLTBLEA |
TMYSQLD | MILLTBLEA |
Common configuration settings used by all four Extracts will be:
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
reportcount every 60 seconds, rate
In the above settings, the clear text password is displayed for demonstration purposes only. Best practice dictates that database access passwords be encrypted. The RMTHOST designation also contains settings for the Extract’s TCP buffer size. REPORTCOUNT is set for demonstration purposes only. If set too low this parameter will have a negative performance impact because it induces a slight delay while each Extract process writes statistics to its report file.
Initial load is the only time OGG Extracts should be configured to immediately transmit data to the target via TCP/IP. Primary Extract processes that read database transaction logs must write to a local OGG Extract Trail, which is then read by an Extract Data Pump for transmission to the target. Configuring the Primary Extract to transmit captured data over TCP/IP will result in severe performance issues and other failures if your network is not up 100% of the time.
We will use an OGG RMTFILE to store records to be applied to the target table by Replicat. To determine the size requirements of each RMTFILE, compute the size requirements for the OGG record header by multiplying the number of rows by 50. Add the product to the total bytes value gathered for each table previously.
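The sizing rule above can be sketched as a small helper. This is a hedged illustration, not an OGG utility; the function name and the 50-byte header estimate are taken from the description above:

```python
# Sketch of the article's RMTFILE sizing rule: roughly 50 bytes of OGG
# record header per row, added to the table's segment size.

def rmtfile_megabytes(num_rows: int, total_bytes: int,
                      header_bytes_per_row: int = 50) -> float:
    """Estimated trail data volume in megabytes (1 MB = 1,048,576 bytes)."""
    return (num_rows * header_bytes_per_row + total_bytes) / 1_048_576

# MILLITBLE from the statistics query gathered earlier:
mb = rmtfile_megabytes(1_999_833, 965_738_496)
print(round(mb))  # 1016, so a "megabytes 1100" setting leaves headroom
```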
The RMTFILE parameter setting for each Extract will be:
Extract | Parameter Setting |
---|---|
TMYSQL | rmtfile ./dirdat/tm, maxfiles 999999, megabytes 190, append |
TMYSQLA | rmtfile ./dirdat/tn, maxfiles 999999, megabytes 1100, append |
TMYSQLC | rmtfile ./dirdat/tq, maxfiles 999999, megabytes 1500, append |
TMYSQLD | rmtfile ./dirdat/tr, maxfiles 999999, megabytes 1500, append |
In this test configuration, each file is sized to hold all of the records Extract reads from its assigned tables. For recoverability, data will be appended to the file (if it exists) when a failed Extract is restarted, and the MAXFILES allotment ensures additional files will be created if needed.
NOTE: If the target side Replicats are not going to be run concurrently with the Extract, specify PURGE instead of APPEND.
Putting this all together, our initial load source side Extract parameter files look like this:
TMYSQL:

extract tmysql
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
rmtfile ./dirdat/tm, maxfiles 999999, megabytes 190, append
reportcount every 60 seconds, rate
tableexclude east.millitble
tableexclude east.milltblea
table east.*;

TMYSQLA:

extract tmysqla
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
rmtfile ./dirdat/tn, maxfiles 999999, megabytes 1100, append
reportcount every 60 seconds, rate
table east.millitble;
We want to split the data load for the MILLTBLEA table across two Extract processes. We could use the @RANGE column conversion function to perform this task; however, because this function uses the primary key column values to determine the hash value we will not get a true 50% split. Instead we set a FILTER on the primary key column, COLA, and specify a primary key value based upon the existing data. To get the median value of the data in the table, we query the database:
SQL> select median (cola) as COLA from milltblea;
COLA
------------------------
499,289,425,723,612,740
TMYSQLC:

extract tmysqlc
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
rmtfile ./dirdat/tq, maxfiles 999999, megabytes 1500, append
reportcount every 60 seconds, rate
table east.milltblea, filter (cola < 499289425723612740);

TMYSQLD:

extract tmysqld
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
rmtfile ./dirdat/tr, maxfiles 999999, megabytes 1500, append
reportcount every 60 seconds, rate
table east.milltblea, filter (cola >= 499289425723612740);
In GGSCI, add each Extract specifying SOURCEISTABLE to have each process read directly from database tables. Because a SOURCEISTABLE Extract does not maintain checkpoint information, when started it will read all of the table’s data starting from the first row.
GGSCI (oelr5u7) 37> add extract tmysql, sourceistable
EXTRACT added.

GGSCI (oelr5u7) 38> add extract tmysqla, sourceistable
EXTRACT added.

GGSCI (oelr5u7) 39> add extract tmysqlc, sourceistable
EXTRACT added.

GGSCI (oelr5u7) 40> add extract tmysqld, sourceistable
EXTRACT added.
Test The Extract Configuration
Use the GGSCI command “START EXTRACT” to start each group, test the configuration, and get the data read rates. The GGSCI commands “INFO * TASK” and “STATUS * TASK” may be used to get process status information. Since we specified the REPORTCOUNT parameter for each Extract, the command “VIEW REPORT [extract name]” may be used to see real-time processing rates.
This initial configuration results in the following runtime results:
Extract | Total Runtime (secs) | Total Bytes | Bytes Per Second |
---|---|---|---|
TMYSQL | 29 | 182,975,913 | 6,309,514.24 |
TMYSQLA | 70 | 1,078,996,362 | 15,414,233.74 |
TMYSQLC | 531 | 1,319,636,259 | 2,485,190.69 |
TMYSQLD | 529 | 1,318,984,455 | 2,493,354.36 |
To compute the total runtime, look in each Extract’s report file for a line similar to the following:
Report at 2013-11-05 13:17:21 (activity since 2013-11-05 13:16:52)
Get the time difference by subtracting the activity since time stamp from the report at time stamp, and then convert the difference into seconds.
Total bytes is obtained from the report file section labelled “Bytes output” under “Redo Log Statistics”. Divide this value by the total runtime seconds to get the average bytes processed per second.
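The two calculations above can be sketched as follows. This is an illustrative helper, not part of OGG; the function name is an assumption:

```python
# Derive total runtime and throughput from the two timestamps in an
# Extract report header and the "Bytes output" statistic.
from datetime import datetime

def extract_rate(report_at: str, activity_since: str, total_bytes: int):
    """Return (runtime_seconds, bytes_per_second) for one Extract run."""
    fmt = "%Y-%m-%d %H:%M:%S"
    secs = (datetime.strptime(report_at, fmt)
            - datetime.strptime(activity_since, fmt)).total_seconds()
    return secs, total_bytes / secs

# The TMYSQL run from the report line shown above:
secs, rate = extract_rate("2013-11-05 13:17:21", "2013-11-05 13:16:52",
                          182_975_913)
print(secs, round(rate, 2))  # 29.0 6309514.24
```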
Given the resource challenges of my very small virtual machine (1 CPU, 4Gb RAM, single disk), these numbers are very good. On a production machine additional Extract processes could be run against the table MILLTBLEA to further decrease the overall total run time.
Complete The Source Side Configuration
To complete the source side set-up: (1) use the OGG DEFGEN utility to create source database defines, (2) copy the DEFGEN output to the target server, (3) configure the Primary Extract, and (4) configure the Extract Data Pump(s).
Prior to starting the initial load be sure to terminate any long running transactions and start the Primary Extract and Extract Data Pumps.
Configure The Target Side OGG Replicats
Because Replicat must build and execute DML statements from the data read by Extract, it is much slower. Therefore, we will need multiple Replicat processes to load the target database quickly. To illustrate this, let's look at one load stream.
The source TMYSQL Extract reads all of the smaller tables:
TABLE_NAME                         NUM_ROWS  AVG_ROW_LEN          TOTAL_BYTES
------------------------------ ------------ ------------ --------------------
NEXT_ORDER                                1            8               65,536
NEXT_CUST                                 1            7               65,536
ORDERS_TOTAL                              3           54               65,536
CATEGORIES                               21           44               65,536
CATEGORIES_DESCRIPTION                   21           18               65,536
PRODUCTS                                 28           86               65,536
PRODUCTS_DESCRIPTION                     28          562               65,536
PRODUCTS_TO_CATEGORIES                   28            9               65,536
CUSTOMERS_LKUP                           36           46               65,536
CUSTOMERS_INFO                        1,089           37              131,072
CUSTOMERS                             1,089           74              131,072
ORDERS                               30,001          311           11,534,336
ORDERS_STATUS_HISTORY                30,002           63            3,145,728
ORDERS_PRODUCTS                     389,733           59           28,311,552
NOPKTBLE                            500,000           30           18,874,368
These fifteen tables contain a total of 952,109 rows of data, which the Extract was able to read and send to the target in 29 seconds. A single Replicat took 4 minutes and 13 seconds to apply the data, or 8.7 times the total read time. Using this figure as a baseline, we can extrapolate that our larger tables will require multiple Replicats to apply the data in a timely fashion. As we saw with the Extracts, this is where the MAP parameter option FILTER and the column conversion function @RANGE come to our aid.
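As a hedged back-of-the-envelope (not OGG output), the baseline ratio suggests how many parallel apply streams would be needed to keep apply time close to read time:

```python
# A single Replicat took about 8.7x as long to apply the data as Extract
# took to read it (4m13s versus 29s).
import math

apply_secs, read_secs = 253, 29
apply_vs_read_ratio = apply_secs / read_secs
print(round(apply_vs_read_ratio, 1))  # 8.7

# Rough number of parallel Replicats needed to apply the data in roughly
# the same window it took Extract to read it:
replicats_needed = math.ceil(apply_vs_read_ratio)
print(replicats_needed)  # 9
```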
Using this information, we can set a test target side configuration. The Replicat configuration used to apply all of the smaller table data is:
LMYSQL:

replicat lmysql
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysql.dsc, purge
reportcount every 60 seconds, rate
overridedups
end runtime
map east.*, target east.*;
Because the data for the table MILLITBLE is contained within one OGG RMTFILE we can use the @RANGE function to create a hash based upon the table primary key value. In the parameter files below we are splitting the load across two Replicat processes. It should be noted that because the primary key definition is used to create the hash value, this will not be a true 50% split; but it will be pretty close.
LMYSQLA:

replicat lmysqla
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqla.dsc, purge
reportcount every 60 seconds, rate
overridedups
end runtime
MAP EAST.MILLITBLE, TARGET east.MILLITBLE, filter (@RANGE (1,2));

LMYSQLB:

replicat lmysqlb
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqlb.dsc, purge
reportcount every 60 seconds, rate
overridedups
end runtime
MAP EAST.MILLITBLE, TARGET east.MILLITBLE, filter (@RANGE (2,2));
The data for MILLTBLEA was already split into two RMTFILEs by the source side Extract. To speed up data apply, we want to have two Replicats process the data contained within each RMTFILE. Since @RANGE uses the table primary key to determine the hash value for data balancing, we cannot use that function in this case. Instead we must set a FILTER on the primary key column and define values based upon the source table median.
For the first two Replicats, the filter value is the median divided by two. One Replicat will apply records with a primary key value less than this number, and the other will apply records with values greater than or equal to it.
LMYSQLC:

replicat lmysqlc
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqlc.dsc, purge
reportcount every 60 seconds, rate
overridedups
end runtime
MAP EAST.MILLTBLEA, TARGET east.MILLTBLEA, FILTER (COLA < 249644712861806000);

LMYSQLD:

replicat lmysqld
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqld.dsc, purge
reportcount every 60 seconds, rate
overridedups
end runtime
MAP EAST.MILLTBLEA, TARGET east.MILLTBLEA, FILTER (COLA >= 249644712861806000);
Since the second two Replicats will process data from an RMTFILE containing records greater than or equal to the median source data value, the FILTER values are computed as ((median value / 2) + median value), which in this case is 748934138585419000.
LMYSQLE:

replicat lmysqle
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqle.dsc, purge
reportcount every 60 seconds, rate
overridedups
end runtime
MAP EAST.MILLTBLEA, TARGET east.MILLTBLEA, FILTER (COLA < 748934138585419000);

LMYSQLF:

replicat lmysqlf
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqlf.dsc, purge
reportcount every 60 seconds, rate
overridedups
end runtime
MAP EAST.MILLTBLEA, TARGET east.MILLTBLEA, filter (COLA >= 748934138585419000);
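The filter-boundary arithmetic above can be sketched in a few lines. This is a hedged illustration: it assumes, as the article's approach implies, that the key values are roughly uniformly distributed starting near zero, so the quartile boundaries can be estimated from the median alone:

```python
# Quartile boundary estimates from the source table median, as used in
# the LMYSQLC-LMYSQLF FILTER clauses (the article rounds these values
# down to 249644712861806000 and 748934138585419000).
median = 499_289_425_723_612_740

lower_split = median // 2           # boundary for LMYSQLC / LMYSQLD
upper_split = median // 2 + median  # boundary for LMYSQLE / LMYSQLF

print(lower_split)  # 249644712861806370
print(upper_split)  # 748934138585419110
```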
The parameter OVERRIDEDUPS is used to provide recoverability should there be a need to restart the source side Extract or re-run the Replicat. This will allow the Replicat to run without first truncating the target tables. However, you should note that this will also slow down data apply in this case because the Replicat will perform two database operations whenever a duplicate row is detected. It is important to note that this setting should be made only if delete operations are not performed on the source tables. If deletes occur in the source, and the source side Extract was re-run, the target tables must be truncated before restarting the initial load process.
END RUNTIME will cause the Replicat to perform a graceful shutdown when it has no more data to process. For this article’s tests none of the Replicats were started until the source side Extracts completed so we could obtain concise timing data.
If a large database is being loaded, consider not specifying END RUNTIME. That way the Replicats can all be started, and each will wait for data to process. The caveat is that manual review of the Replicats is required to determine when each has completed its portion of the data load.
In GGSCI add the Replicats:
GGSCI (oelmysql) 1> dblogin sourcedb east@oelmysql, userid gguser, password Oracle1
Successfully logged into database.

GGSCI (oelmysql) 2> info checkpointtable
No checkpoint table specified, using GLOBALS specification (east.checkpoint)...
Checkpoint table east.checkpoint created 2013-11-06 04:44:30.

GGSCI (oelmysql) 3> add replicat lmysql, exttrail ./dirdat/tm
REPLICAT added.

GGSCI (oelmysql) 4> add replicat lmysqla, exttrail ./dirdat/tn
REPLICAT added.

GGSCI (oelmysql) 5> add replicat lmysqlb, exttrail ./dirdat/tn
REPLICAT added.

GGSCI (oelmysql) 6> add replicat lmysqlc, exttrail ./dirdat/tq
REPLICAT added.

GGSCI (oelmysql) 7> add replicat lmysqld, exttrail ./dirdat/tq
REPLICAT added.

GGSCI (oelmysql) 8> add replicat lmysqle, exttrail ./dirdat/tr
REPLICAT added.

GGSCI (oelmysql) 9> add replicat lmysqlf, exttrail ./dirdat/tr
REPLICAT added.
Test The Replicat Configuration
Use the GGSCI command “START EXTRACT” to start each source side group, then on the target side use “START REPLICAT” to start each Replicat load process, test the configuration, and get the data apply rates. Remember not to start any Replicats where @RANGE or FILTER have been specified until after the source side Extracts complete their processing. The GGSCI commands “INFO L*” and “STATUS L*” may be used to get process status information. Since we specified the REPORTCOUNT parameter for each Replicat, the command “VIEW REPORT [replicat name]” may be used to see real-time processing rates.
This initial configuration results in the following runtime results:
Replicat | Total Runtime (secs) | Total Records | Records Per Second |
---|---|---|---|
LMYSQL | 124 | 502,258 | 4,050.47 |
LMYSQLA | 2,177 | 1,002,367 | 460.44 |
LMYSQLB | 2,158 | 997,466 | 462.22 |
LMYSQLC | 1,360 | 500,331 | 367.89 |
LMYSQLD | 1,354 | 499,669 | 369.03 |
LMYSQLE | 1,389 | 497,912 | 358.47 |
LMYSQLF | 1,390 | 502,088 | 361.21 |
Given the resource challenges of my very small virtual machine (1 CPU, 4Gb RAM, single disk), these numbers are very good. On a production machine additional Replicat processes could be used to further decrease the overall total run time.
OGG Direct Load
Using OGG Direct Load, where Extract reads from the source tables and sends this data via TCP/IP directly to the target Replicat, may be faster for smaller database loads. However, there are limitations with the Direct Load method:
- LOB data types are not supported.
- There is no recoverability. Any failure in the load mechanism requires a truncate of the target table and reload from the source.
- TCPBUFSIZE and TCPFLUSHBYTES settings cannot be configured in the Extract. The default setting of 30,000 bytes will be used.
- Because there is a one-to-one relationship between the Extract and Replicat, the FILTER and @RANGE options may only be used in Extract.
If Direct Load is the method chosen for performing the initial load, the key is to spread the data load process over multiple Extract and Replicat groups. The number of groups depends upon available server resources, disk subsystem speed, and source data.
Using this information, let's look at one possible Direct Load configuration for our test database:
For each Extract/Replicat pair, the parameter files and observed load statistics were:

TMYSQL / LMYSQL (112,297,398 total bytes, 174 seconds, 408,752.71 bytes per second):

extract tmysql
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600
rmttask replicat, group lmysql
tableexclude east.millitble
tableexclude east.milltblea
tableexclude east.nopktble
table east.*;

replicat lmysql
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysql.dsc, purge
overridedups
map east.*, target east.*;

TMYSQLA / LMYSQLA (70,678,515 total bytes, 159 seconds, 444,518.96 bytes per second):

extract tmysqla
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600
rmttask replicat, group lmysqla
table east.nopktble;

replicat lmysqla
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqla.dsc, purge
overridedups
map east.*, target east.*;

TMYSQLB / LMYSQLB (271,046,102 total bytes, 902 seconds, 300,494.57 bytes per second):

extract tmysqlb
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600
rmttask replicat, group lmysqlb
table east.millitble, filter (@RANGE (1,4));

replicat lmysqlb
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqlb.dsc, purge
overridedups
map east.*, target east.*;

TMYSQLC / LMYSQLC (269,056,311 total bytes, 898 seconds, 299,617.27 bytes per second):

extract tmysqlc
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600
rmttask replicat, group lmysqlc
table east.millitble, filter (@RANGE (2,4));

replicat lmysqlc
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqlc.dsc, purge
overridedups
map east.*, target east.*;

TMYSQLD / LMYSQLD (269,854,784 total bytes, 863 seconds, 312,693.84 bytes per second):

extract tmysqld
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600
rmttask replicat, group lmysqld
table east.millitble, filter (@RANGE (3,4));

replicat lmysqld
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqld.dsc, purge
overridedups
map east.*, target east.*;

TMYSQLE / LMYSQLE (269,039,165 total bytes, 831 seconds, 323,753.51 bytes per second):

extract tmysqle
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600
rmttask replicat, group lmysqle
table east.millitble, filter (@RANGE (4,4));

replicat lmysqle
sourcedefs ./dirdef/ora11_east.def
targetdb east@oelmysql, userid gguser, password Oracle1
discardfile ./dirrpt/lmysqle.dsc, purge
overridedups
map east.*, target east.*;
The load times for east.millitble could be reduced by increasing the @RANGE value to 6; however, this would exceed the limits of my test server. Likewise, for the table east.milltblea I would have initially configured and tested 10 Extract and Replicat pairs because of the size and complexity of the data.
The parameter OVERRIDEDUPS is used to provide recoverability should there be a need to restart the source side Extract. This will allow the Replicat to run without first truncating the target tables. However, you should note that this will also slow down data apply in this case because the Replicat will perform two database operations whenever a duplicate row is detected. It is important to note that this setting should be made only if delete operations are not performed on the source tables. If deletes occur in the source, and the source side Extract was re-run, the target tables must be truncated before restarting the initial load process.
Change The Extract Output
As demonstrated previously, Extract reads data directly from the source tables quickly; the bottleneck is Replicat. The Extract options FORMATASCII, FORMATSQL, and FORMATXML may be used to change Extract's output so data load mechanisms other than Replicat may be used.
When using the output formatter option, the Extract configuration would be similar to the following:
For example, with FORMATASCII:

extract tmysqlb
userid ggadmin, password Oracle1
rmthost oelmysql, mgrport 15600, tcpbufsize 288750, tcpflushbytes 288750
formatascii
rmtfile ./dirdat/tn, maxfiles 999999, megabytes 1100, purge
reportcount every 60 seconds, rate
table east.millitble;

The FORMATSQL and FORMATXML configurations are identical, except that the formatascii parameter is replaced by formatsql or formatxml, respectively.
As seen in our prior tests, we can further reduce data read time by splitting the workload for large tables across multiple Extract processes via the FILTER and @RANGE options.
Target Database Instantiation
No matter which initial load procedure is chosen, the process for performing a target database instantiation will remain the same:
- Make database changes required to support OGG transactional data capture.
- Configure the OGG source and target components.
- Stop all long running transactions.
- Start the Primary Extract and Extract Data Pump.
- If the source database is Oracle get the current SCN. If the source database is DB2, SQL Server, or Sybase get the current LSN.
- Perform the initial load.
- Wait for the initial load to complete.
- If the source database is Oracle, DB2, SQL Server, or Sybase start each Replicat specifying the ATCSN option and the CSN/LSN value obtained previously from the source database.
If the source database is other than Oracle, DB2, SQL Server or Sybase:
- Ensure the parameters HANDLECOLLISIONS and END RUNTIME exist in each Replicat parameter file.
- Start each Replicat.
- Monitor the Replicat activity.
- As each Replicat enters the STOPPED state, remove the parameters HANDLECOLLISIONS and END RUNTIME from its parameter file.
- Restart the Replicat.
Summary
Oracle GoldenGate may be configured to perform the initial load of heterogeneous data efficiently via the use of advanced configuration options.
All site content is the property of Oracle Corp. Redistribution not allowed without written permission