Options for Distributed Data

Oracle provides several methods for accessing data that is distributed among two or more database instances. All of these methods provide location transparency , which means that users and applications can manipulate data as though it were all in one single database instance. These various methods are summarized here and are described in detail throughout this book.

Export/Import

The Oracle export and import utilities (illustrated in Figure 1.4) are the most primitive method of sharing data among databases and are also used as part of a backup and recovery strategy. Export (exp) creates a file that is essentially a set of SQL statements that invoke the DDL (Data Description Language) and DML (Data Manipulation Language) required to create objects and insert data. Import (imp) is the utility that reads this file and executes the SQL statements to re-create the objects and populate tables. A full database export creates a file that you can use to re-create the entire database.

Export/import

Figure 1-4. Export/import

Unlike any of the other options, export and import are static. An export file contains the data from the time of the export and cannot be updated. In fact, an export file could easily be out of date before the export job is finished. In addition, you must specify the export option CONSISTENT=Y in order for all of the data in the export file to be consistent as of a single point in time. Exports are only one part of a comprehensive backup strategy.

Database Links

Database links are the invisible glue that makes location transparency possible. In more technical terms, a database link defines a connection from one database instance to another, and this definition is stored in the Oracle data dictionary. Since database link connections log in to a normal account in the remote database instance, you have complete control over its privileges and quotas.

Used in conjunction with synonyms, database links (shown in Figure 1.5) can make remote objects appear to be local as far as applications and users are concerned.

Database links

Figure 1-5. Database links

If your inventory application at a manufacturing site needs to reference the VENDORS table at headquarters, you could provide location transparency with the following three SQL statements:

CREATE PUBLIC DATABASE LINK D8CA.BIGWHEEL.COM
   USING 'hqaccounting.bigwheel.com'

CREATE PUBLIC SYNONYM vendors FOR vendors@D8CA.BIGWHEEL.COM

GRANT SELECT ON vendors TO inventory_reader

Since the CREATE DATABASE LINK statement in this example creates a PUBLIC link without specifying an account to connect to in the D8CA.BIGWHEEL.COM database, this particular implementation assumes that every application user in the inventory database has an account in the remote database with the same password and with privileges to see the VENDORS table. If the remote database is unavailable, the VENDORS table also will be unavailable.

Of course, there are several ways to provide location transparency; these are described in greater detail later in this book.

Read-Only Snapshots

If you have an application that cannot risk a dependency on the availability of a remote database, you could use a read-only snapshot (shown in Figure 1.6). A read-only snapshot is essentially a local table whose data is refreshed at specified intervals by performing a query against one or more remote tables. The inventory application could create the same functionality as the database link described in the previous section by following these steps:

CREATE PUBLIC DATABASE LINK D8CA.BIGWHEEL.COM
   USING 'hqaccounting.bigwheel.com'

CREATE SNAPSHOT vendors
   REFRESH COMPLETE
   START WITH SYSDATE
   NEXT TRUNC(sysdate + 1) + 10/1440
AS
SELECT	 vendor_id, company_name
FROM	 vendors@D8CA.BIGWHEEL.COM

CREATE PUBLIC SYNONYM vendors FOR vendors

GRANT SELECT ON vendors TO inventory_reader

This snapshot is populated when the CREATE SNAPSHOT statement executes, and is then refreshed every day from that point on at 10 minutes after midnight. Again, this is just one example of how the technique could be implemented; the details come later. Snapshots use the Oracle built-in package DBMS_JOB to schedule refreshes and require the INIT.ORA parameter JOB_QUEUE_PROCESSES to be greater than zero.

Read-only snapshot

Figure 1-6. Read-only snapshot

The benefit of read-only snapshots over database links and public synonyms is that the snapshot is available even when the remote site is not. The disadvantages are that the data is neither real time nor updateable.

Note

Oracle introduced read-only snapshots with Oracle Version 7.0. The infrastructure this feature required has been expanded with each subsequent release, with additional functionality such as updateable snapshots and advanced replication. The base components include the job queue and triggers. The feature set is continuing to expand.

Updateable Snapshots

If your application needs to change data in a snapshot and send the changes back to the master site, you can use updateable snapshots, shown in Figure 1.7. A trigger on the snapshot table logs updates that are applied at the master site when the snapshot refreshes. Updateable snapshots require the advanced replication facilities. A common use of updateable snapshots is an application that consolidates data from various sites into a single master site. For example, a bicycle company might collect sales transactions from its distributors every night, or travelling salespeople might enter customer leads on their laptops and upload this information to the headquarters database when they return to the office.

Updateable snapshots

Figure 1-7. Updateable snapshots

Two important characteristics of updateable snapshots, which distinguish them from multi-master replicated tables, are:

  • They update only the master site.

  • They can be disconnected from the master site for extended periods.

You also can configure an updateable snapshot such that the updates are not sent back to the master. You can use this configuration to perform “What if " analyses against the local data without fear of overwriting the definitive values at the master site.

Advanced Replication

Advanced (or multi-master) replication (shown in Figure 1.8) is the most powerful of the replication options. You can use it to maintain a table at numerous sites, with updates at any one location being applied at all the other locations. There is no single “master” table, although there is a master definition site, from which schema maintenance must be performed. Unlike the situation with snapshots, you can configure a multi-master environment to provide real-time data; this technique is known as synchronous replication. If you use asynchronous replication (by far the more common implementation), updates to a table are placed in the deferred queue and pushed to other participating sites at user-defined intervals.

Multi-master replication

Figure 1-8. Multi-master replication

Since updates can occur at several locations, these updates can conflict with one another. Oracle provides a number of built-in methods to assist in resolving these conflicts, such as Latest Timestamp and Site Priority, but these techniques must be selected carefully to guarantee that data always converges. Conflict resolution, described in detail in Chapter 15, is usually the biggest challenge to creating and maintaining a successful implementation.

Advanced replication also has some significant limitations:

  • No support for sequences

  • No support for LONG or LONG RAW or HHCODE data, although Oracle8 supports replication of binary large objects (BLOBs) and character large objects (CLOBs)

  • Not recommended for applications performing massive updates (i.e., updates to tens of thousands of rows per hour)

Procedural Replication

Procedural replication (shown in Figure 1.9) is the preferred way to perform the massive updates that are not recommended with advanced replication. Instead of queuing up row-level changes and sending them to the other database instances, procedural replication queues calls to procedures and sends them to the other participants. If, for example, you wanted to mark up the prices of all your products by five percent, you could replicate the procedure call UPDATE_PRICES(pct_increase => 5). The procedure will execute at every site with the same parameters.

Procedural replication

Figure 1-9. Procedural replication

Oracle does not provide any conflict handlers that work in conjunction with procedural replication, so any routines that you want to use in this way must account for conflicts. In the price increase example, suppose that a price for one item had been changed at a remote site, and the change had not yet propagated to the site initiating the UPDATE_PRICES call. The data would not converge to the same values at both sites. Table 1.2 summarizes the kinds of conflicts that may occur with procedural replication.

Table 1-2. Potential Conflicts with Procedural Replication

Time

Activity

CA Price

NY Price

12:00

Sites agree

$100

$100

12:05

CA calls UPDATE_PRICES(pct_increase => 5)

$105

$100

12:10

NY site updates price to $120 before procedure replicates

$105

$120

12:15

Procedure call replicates to NY site

$105

$126

12:20

Update from NY at 12:10 arrives at CA site

$120

$126

It is safest to perform procedural replication during periods of low or no activity.

Get Oracle Distributed Systems now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.