Perils of Distributed Databases

Nobody ever said that the administration of distributed databases is easy; it’s not. For one thing, it can be difficult to keep track of who needs what sort of access to a given database instance, and what access needs to be available from it to other instances. If users are experiencing difficulties or applications are unable to perform, how do you know which database is causing the problem? When you create a new user, what database instances should have the account? What is USER_A really seeing when he references the VENDORS table? None of these difficulties exist in a standalone system. Some of the more significant perils are summarized here and are discussed in detail in the chapters that follow.

Security

Didn’t this topic appear under the “Benefits” section, too? Yes, because there are two sides to the security story. Because it can be difficult to know and to control who is coming into a database via a database link, the accounts to which database links connect should be given no more access rights than absolutely necessary. Similarly, the CREATE PUBLIC DATABASE LINK system privilege should be granted sparingly because whoever has it can effectively create a public doorway into any system to which she has access. If you use operating system validated (OPS$) accounts, be extremely careful of using them in the CONNECT clause of database links. Be aware that holes to exploit do exist.

In an advanced replication environment, security issues can become complicated because the user community can be the sum of all users in all databases participating in replication. The maintenance of accounts in and of itself can become a full-time job. Oracle8 alleviates this chore somewhat, but you will need to decide if replicated transactions should be performed at remote sites by the original user or by a generic replication account.

It is possible to configure an extremely well controlled and robust distributed environment, but it takes care and planning as I’ll describe in Part II of this book.

Data Consistency

If you are using multi-master replication or procedural replication or if you have written your own code to perform DML on remote tables, one of your most formidable tasks will be to guarantee that data converges. This responsibility is shared among designers, developers, and DBAs (who should be coordinating their efforts). Designers must consider potential conflicts in their architecture; developers must code so that conflicts are addressed; and DBAs must resolve the unresolved conflicts. In general, the design and realization of a replicated system necessitates the solution of far more problems than does a standalone system, and the bulk of these problems concern data convergence.

Transaction Management

Do you want to update 15,000 records in the VENDORS table to reflect an area code change? Well, if that transaction needs to be replicated to five other sites, you’d better think twice about it because it’s going to queue up 15,000 × 5 = 75,000 transactions across your replicated environment. Do you want to use procedural replication to do it tonight at midnight California time? What about your site in Hong Kong where users are at work and updating the table? The point is that any batch updates in a replicated environment must be carefully coordinated with all sites in order to avoid massive conflicts and logjams.

The initial load and distribution of data among sites also requires coordination. For example, you might want to lock users out of all instances until you can guarantee that the data is identical everywhere.

Monitoring

The additional workload a distributed environment demands of the DBA can be considerable. In addition to the normal DBA responsibilities such as monitoring space utilization and extent allocation, the DBA must monitor objects such as snapshot logs, job queues, transaction queues, and error queues. If left unresolved, problems in a distributed environment can become so difficult to solve that it is easier to reload data from scratch than try to resolve specific errors.

For that reason, most people consider alert mechanisms to be essential in a replicated environment. For example, if unresolved conflicts put entries into the error queue (deferror ), the DBA should be notified as soon as possible. You will find utilities for this sort of automated notification in Appendix B, of this book .

Recovery

If a database that is part of a distributed environment fails, the recovery process must ensure not only the complete restoration of the local data but also the restoration of distributed data, such as snapshots and deferred transactions. It may be necessary to refresh snapshots at remote sites, to requeue deferred transactions, and so on. The point is that the recovery of the local system does not necessarily mean that the overall distributed database is recovered.

Performance

Several factors can affect performance in a distributed database. If the application references data over a database link, the performance of the network will have a direct bearing on performance. Replication components that utilize store-and-forward techniques, such as snapshots and multi-master replication, also exact their toll on overall system performance. If, for example, a snapshot master has a snapshot log, all DML on that table will cause a row-level trigger to fire that inserts records into the snapshot log. Similarly, DML against a replicated table will either put entries into the deftran queue (in the case of asynchronous replication) or require the successful delivery of every transaction to remote sites before completing (in the case of synchronous replication).

The storing and forwarding of transactions will impact overall system performance, and you should take this impact into consideration when specifying hardware requirements. In addition, activities such as snapshot refreshes and application of pushed transactions at destination sites impact performance. Oracle has taken great steps to minimize the impact of data distribution, but it still is a factor to consider.

Get Oracle Distributed Systems now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.

Start your free trial

Oracle Distributed Systems by Charles Dye