Distributed Database Systems

Vera Goebel, Department of Informatics, University of Oslo

2011

Contents
• Review: Layered DBMS Architecture
• Distributed DBMS Architectures
  – DDBMS Taxonomy
• Client/Server Models
• Key Problems of Distributed DBMS
  – Distributed data modeling
  – Distributed query processing & optimization
  – Distributed transaction management
    • Concurrency Control
    • Recovery

Functional Layers of a DBMS (diagram)
Applications sit on top of the layered DBMS; Transaction Management spans the layers.
• Interface: Data Model / View Management
• Control: Semantic Integrity Control / Authorization
• Compilation: Query Processing and Optimization
• Execution: Storage Structures
• Data Access: Buffer Management
• Consistency: Concurrency Control / Logging
• Database (persistent storage)

Dependencies among DBMS components (diagram)
Central components: Transaction Mgmt, Access Path Mgmt, Sorting component, Lock component, Log component (with savepoint mgmt), and System Buffer Mgmt; the arrows in the diagram indicate dependencies among these components.

Centralized DBS
• logically integrated
• physically centralized
(Diagram: terminals T1–T5 access a single DBS over a network.)
Traditionally: one large mainframe DBMS + n “stupid” terminals

Distributed DBS
• Data logically integrated (i.e., access based on one schema)
• Data physically distributed among multiple database nodes
• Processing is distributed among multiple database nodes
(Diagram: transactions T1–T3 access DBS1, DBS2, and DBS3 over a network.)

Why a Distributed DBS?
• Performance via parallel execution – more users, quick response
• More data volume – maximum disks on multiple nodes
Traditionally: m mainframes for the DBMSs + n terminals

DBMS Implementation Alternatives (taxonomy diagram)
DBMSs are classified along three dimensions: Distribution, Heterogeneity, and Autonomy (from tightly integrated via federated to multi-DBMS).
• Distributed & homogeneous: Distributed Homogeneous DBMS, Distributed Homog. Federated DBMS, Distributed Multi-DBMS
• Distributed & heterogeneous: Distributed Heterogeneous DBMS, Distributed Heterog. Federated DBMS, Distributed Heterog. Multi-DBMS
• Centralized & homogeneous: Centralized Homogeneous DBMS, Centralized Homog. Federated DBMS, Centralized Homog. Multi-DBMS
• Centralized & heterogeneous: Centralized Heterogeneous DBMS, Centralized Heterog. Federated DBMS, Centralized Heterog. Multi-DBMS
• Client/Server Distribution sits on the distribution axis between centralized and fully distributed DBMSs.

Common DBMS Architectural Configurations (diagram)

No Autonomy (nodes N1–N4):
Fully integrated nodes with complete cooperation on:
- Data Schema
- Transaction Mgmt
Each node is fully aware of all other nodes in the DDBS.

Federated (DBMSs D1–D4):
Independent DBMSs that implement some “cooperation functions”:
- Transaction Mgmt
- Schema mapping
Each DBMS is aware of the other DBSs in the federation.

Multi-DBMS (DBMSs D1–D4 under a global manager, GTM & SM):
Fully independent DBMSs. An independent, global manager tries to coordinate the DBMSs using only existing DBMS services. The DBMSs are unaware of the GTM and of the other DBSs in the Multi-DBS.

Parallel Database System Platforms (Non-autonomous Distributed DBMS)

Shared Everything (diagram):
Processors Proc1 … ProcN and memories Mem1 … MemM are connected by a fast interconnection network and share all disks (Disk 1 … Disk p).
Typically a special-purpose hardware configuration, but this can be emulated in software.

Shared Nothing (diagram):
Each node pairs its own processor, memory, and disk (Proc1/Mem1/Disk 1 … ProcN/MemN/Disk N); nodes communicate only over a fast interconnection network.
Can be a special-purpose hardware configuration, or a network of workstations.

Distributed DBMS
• Advantages:
  – Improved performance
  – Efficiency
  – Extensibility (addition of new nodes)
  – Transparency of distribution
    • Storage of data
    • Query execution
  – Autonomy of individual nodes
• Problems:
  – Complexity of design and implementation
  – Data consistency
  – Safety
  – Failure recovery

Client/Server Database Systems

The “Simple” Case of Distributed Database Systems?


Client/Server Environments
Data (object) server + n smart clients (workstations)
• objects are stored and administered on the server
• objects are processed (accessed and modified) on the workstations [sometimes on the server too]
• CPU-time-intensive applications run on the workstations
  – GUIs, design tools
• the clients’ local storage capabilities are used
• combination with distributed DBS services => distributed server architecture

Clients with Centralized Server Architecture (diagram)
Workstations are connected via a local communication network to a single Data Server, which provides the interface and the database functions and manages the database on disks Disk 1 … Disk n.

Clients with Distributed Server Architecture (diagram)
Workstations are connected via a local communication network to Data Server 1 … Data Server m. Each data server provides an interface, a distributed DBMS layer, and local data management functions, and manages its own database on local disks.

Clients with Data Server Approach – “3-tier client/server” (diagram)
Clients provide the user interface and talk to an Application Server, which performs query parsing and provides the data server interface. The Application Server communicates over a channel with the Data Server, which provides the application server interface and the database functions and manages the database on disks Disk 1 … Disk n.

In the Client/Server DBMS Architecture, how are the DB services organized?

There are several architectural options!


Client/Server Architectures (diagram)
• Relational: the client process runs the application and cursor management and sends SQL to the server process (the DBMS).
• Object-Oriented: the client process runs the application together with object/page cache management and exchanges objects, pages, or files with the server process (the DBMS).

Object Server Architecture (diagram)
• Client process: Application, Object Manager, Object Cache.
• Server process: Object Manager with Object Cache, File/Index Manager, Page Cache Manager with Page Cache, Storage Allocation and I/O, and Log/Lock Manager, on top of the database.
• Client and server exchange objects, object references, queries, method calls, locks, and log records.

Object Server Architecture - summary
• Unit of transfer: object(s)
• Server understands the object concept and can execute methods
• Most DBMS functionality is replicated on client(s) and server(s)
• Advantages:
  - server and client can run methods, workload can be balanced
  - simplifies concurrency control design (centralized in server)
  - implementation of object-level locking
  - low cost for enforcing “constraints” on objects
• Disadvantages/problems:
  - remote procedure calls (RPC) for object references
  - complex server design
  - client cache consistency problems
  - page-level locking, objects get copied multiple times, large objects

Page Server Architecture (diagram)
• Client process: Application, Object Manager with Object Cache, File/Index Manager, Page Cache Manager with Page Cache.
• Server process: Log/Lock Manager, Page Cache Manager with Page Cache, Storage Allocation and I/O, on top of the database.
• Client and server exchange pages, page references, locks, and log records.

Page Server Architecture - summary
• unit of transfer: page(s)
• server deals only with pages (understands no object semantics)
• server functionality: storage/retrieval of page(s), concurrency control, recovery
• advantages:
  - most DBMS functionality at client
  - page transfer -> server overhead minimized
  - more clients can be supported
  - object clustering improves performance considerably
• disadvantages/problems:
  - method execution only on clients
  - object-level locking
  - object clustering
  - large client buffer pool required

File Server Architecture (diagram)
• Client process: Application, Object Manager with Object Cache, File/Index Manager, Page Cache Manager with Page Cache; pages and page references flow through NFS.
• Server process: Log/Lock Manager and Space Allocation on top of the database; locks and log records are exchanged directly between client and server.

File Server Architecture - summary
• unit of transfer: page(s)
• simplification of the page server: clients use a remote file system (e.g., NFS) to read and write DB page(s) directly
• server functionality: I/O handling, concurrency control, recovery
• advantages:
  - same as for page server
  - NFS: user-level context switches can be avoided
  - NFS is widely used -> stable SW, will be improved
• disadvantages/problems:
  - same as for page server
  - NFS writes are slow
  - read operations bypass the server -> no request combination
  - object clustering
  - coordination of disk space allocation

Cache Consistency in Client/Server Architectures (table)

Avoidance-Based Algorithms:
• Synchronous: Client sends 1 msg per lock to the server; client waits; server replies with ACK or NACK.
• Asynchronous: Client sends 1 msg per lock to the server; client continues; server invalidates cached copies at other clients.
• Deferred: Client sends all write lock requests to the server at commit time; client waits; server replies when all cached copies are freed.

Detection-Based Algorithms:
• Synchronous: Client sends an object status query to the server for each access; client waits; server replies.
• Asynchronous: Client sends 1 msg per lock to the server; client continues; after commit, the server sends updates to all cached copies.
• Deferred: Client sends all write lock requests to the server at commit time; client waits; server replies based on W-W conflicts only.

(The original diagram marks the best-performing algorithms.)

Comparison of the 3 Client/Server Architectures

Page & File Server:
• Simple server design
• Complex client design
• Fine-grained concurrency control difficult
• Very sensitive to client buffer pool size and clustering

Object Server:
• Complex server design
• “Relatively” simple client design
• Fine-grained concurrency control
• Reduces data movement, relatively insensitive to clustering
• Sensitive to client buffer pool size

Conclusions:
• No clear winner
• Depends on object size and the application’s object access pattern
• File server ruled out by poor NFS performance

Problems in Distributed DBMS Services
• Distributed Database Design
• Distributed Directory/Catalogue Mgmt
• Distributed Query Processing and Optimization
• Distributed Transaction Mgmt
  – Distributed Concurrency Control
  – Distributed Deadlock Mgmt
  – Distributed Recovery Mgmt

(Diagram: distributed DB design, directory management, query processing, concurrency control (lock), deadlock management, reliability (log), and transaction management all influence one another.)

Distributed Storage in Relational DBMSs
• horizontal fragmentation: distribution of “rows”, selection
• vertical fragmentation: distribution of “columns”, projection
• hybrid fragmentation: “projected columns” from “selected rows” (the three styles are sketched below)
• allocation: which fragment is assigned to which node?
• replication: multiple copies at different nodes; how many copies?
• Design factors:
  – Most frequent query access patterns
  – Available distributed query processing algorithms
• Evaluation criteria:
  – Cost metrics for: network traffic, query processing, transaction mgmt
  – A system-wide goal: maximize throughput or minimize latency
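
The three fragmentation styles above can be illustrated with a minimal Python sketch over an in-memory relation; the relation, attribute names, and predicates below are invented for the example and are not from the slides.

# Minimal sketch of relational fragmentation on an in-memory "relation".
# The EMPLOYEE-like data, attributes, and predicates are made up.

employees = [
    {"eid": 1, "name": "Ann",  "dept": "Sales", "salary": 50000},
    {"eid": 2, "name": "Bob",  "dept": "R&D",   "salary": 60000},
    {"eid": 3, "name": "Cleo", "dept": "Sales", "salary": 55000},
]

def horizontal_fragment(relation, predicate):
    """Selection: the fragment holds the full rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def vertical_fragment(relation, attributes):
    """Projection: the fragment holds selected columns (keep the key for reconstruction)."""
    return [{a: row[a] for a in attributes} for row in relation]

def hybrid_fragment(relation, predicate, attributes):
    """Projected columns from selected rows."""
    return vertical_fragment(horizontal_fragment(relation, predicate), attributes)

node1 = horizontal_fragment(employees, lambda r: r["dept"] == "Sales")    # rows for node 1
node2 = vertical_fragment(employees, ["eid", "salary"])                   # columns for node 2
node3 = hybrid_fragment(employees, lambda r: r["salary"] > 52000, ["eid", "name"])
print(node1, node2, node3, sep="\n")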

Distributed Storage in OODBMSs
• Must fragment, allocate, and replicate object data among nodes
• Complicating factors:
  – Encapsulation hides object structure
  – Object methods (bound to class, not instance)
  – Users dynamically create new classes
  – Complex objects
  – Effects on garbage collection algorithm and performance
• Approaches:
  – Store objects in relations and use RDBMS strategies to distribute data
    • All objects are stored in binary relations [OID, attr-value]
    • One relation per class attribute
    • Use nested relations to store complex (multi-class) objects
  – Use object semantics
• Evaluation criteria:
  – Affinity metric (maximize)
  – Cost metric (minimize)

Horizontal Partitioning using Object Semantics
• Divide instances of a class into multiple groups
  – Based on selected attribute values (e.g., Node1: Attr A < 100; Node2: Attr A >= 100)
  – Based on subclass designation (e.g., Node1: Subclass X; Node2: Subclass Y)
• Fragment an object’s complex attributes from its simple attributes
• Fragment an object’s attributes based on the typical method invocation sequence
  – Keep sequentially referenced attributes together

Vertical Partitioning using Object Semantics
• Fragment object attributes based on the class hierarchy
  – Fragment #1: instances of Class X
  – Fragment #2: instances of Subclass subX
  – Fragment #3: instances of Subclass subsubX
• Breaks object encapsulation?

Path Partitioning using Object Semantics
• Use the object composition graph for complex objects
• A path partition is the set of objects corresponding to the instance variables in the subtree rooted at the composite object
  – Typically represented in a “structure index”:
    Composite Object OID -> {SubC#1-OID, SubC#2-OID, … SubC#n-OID}

More Issues in OODBMS Fragmentation

Local methods / local object data: good performance.
Issue: Replicate methods so they are local to all instances?

Remote methods / local object data: send the data to the remote site, execute, and return the result, OR fetch the method and execute it locally.
Issues: Time/cost for two transfers? Ability to execute locally?

Local methods / remote object data: fetch the remote data and execute the methods locally.
Issue: Time/cost for data transfer?

Remote methods / remote object data: send additional input values via RPC, execute remotely, and return the result.
Issues: RPC time? Execution load on the remote node?

• Replication options:
  – Objects
  – Classes (collections of objects)
  – Methods
  – Class/Type specifications

Distributed Directory and Catalogue Management
• Directory information:
  – Description and location of records/objects
    • Size, special data properties (e.g., executable, DB type, user-defined type, etc.)
    • Fragmentation scheme
  – Definitions for views, integrity constraints
• Options for organizing the directory:
  – Centralized – issues: bottleneck, unreliable
  – Fully replicated – issues: consistency, storage overhead
  – Partitioned – issues: complicated access protocol, consistency
  – Combination of partitioned and replicated – e.g., zoned, replicated zoned

Distributed Query Processing and Optimization
• Construction and execution of query plans, query optimization
• Goals: maximize parallelism (response time optimization); minimize network data transfer (throughput optimization)
• Basic approaches to distributed query processing:

Pipelining – functional decomposition (diagram):
Node 1 scans Rel A and applies Select A.x > 100, streaming its output to Node 2, which joins it with Rel B on y; the processing timelines of Node 1 and Node 2 overlap.

Parallelism – data decomposition (diagram):
Node 1 and Node 2 each apply Select A.x > 100 to their own fragment (Frag A.1, Frag A.2) in parallel; Node 3 unions the results; the timelines of Node 1 and Node 2 run concurrently, followed by Node 3.
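
A minimal sketch of the data-decomposition approach, where an in-memory list stands in for each fragment and a thread pool stands in for the nodes; all names and data are invented for illustration.

# Sketch: data decomposition - run the same selection on each fragment
# in parallel, then union the partial results.
from concurrent.futures import ThreadPoolExecutor

frag_a1 = [{"id": 1, "x": 50}, {"id": 2, "x": 150}]   # fragment at "node 1" (invented data)
frag_a2 = [{"id": 3, "x": 200}, {"id": 4, "x": 80}]   # fragment at "node 2" (invented data)

def select_x_gt_100(fragment):
    """Local operator executed at each node: Select A.x > 100."""
    return [row for row in fragment if row["x"] > 100]

with ThreadPoolExecutor(max_workers=2) as pool:       # the two "nodes"
    partial_results = list(pool.map(select_x_gt_100, [frag_a1, frag_a2]))

# "Node 3": union of the partial results
result = [row for part in partial_results for row in part]
print(result)   # [{'id': 2, 'x': 150}, {'id': 3, 'x': 200}]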

Creating the Distributed Query Processing Plan
• Factors to be considered:
  - distribution of data
  - communication costs
  - lack of sufficient locally available information
• 4 processing steps:
  (1) query decomposition   (control site, uses global information)
  (2) data localization     (control site, uses global information)
  (3) global optimization   (control site, uses global information)
  (4) local optimization    (local sites, use local information)

Generic Layered Scheme for Planning a Distributed Query (diagram)
• Input: calculus query on distributed objects
• Query decomposition (control site, uses the global schema) -> algebraic query on distributed objects
• Data localization (control site, uses the fragment schema) -> fragment query
• Global optimization (control site, uses statistics on fragments) -> optimized fragment query with communication operations
• Local optimization (local sites, use the local schema) -> optimized local queries
This approach was initially designed for distributed relational DBMSs. It also applies to distributed OODBMSs, Multi-DBMSs, and distributed Multi-DBMSs.

Distributed Query Processing Plans in OODBMSs
• Two forms of queries (both reduce to logical algebraic expressions):
  – Explicit queries written in OQL
  – Object navigation/traversal
• Planning and optimizing:
  – Additional rewriting rules for sets of objects
    • Union, Intersection, and Selection
  – Cost function
    • Should consider object size, structure, location, indexes, etc.
    • Breaks object encapsulation to obtain this info?
    • Objects can “reflect” their access cost estimate
  – Volatile object databases can invalidate optimizations
    • Dynamic plan selection (compile N plans; select 1 at runtime)
    • Periodic replanning during long-running queries

Distributed Query Optimization
• Information needed for optimization (fragment statistics):
  - size of objects, image sizes of attributes
  - transfer costs
  - workload among nodes
  - physical data layout
  - access paths, indexes, clustering information
  - properties of the result (objects)
  - formulas for estimating the cardinalities of operation results
• Execution cost is expressed as a weighted combination of I/O, CPU, and communication costs (communication is usually dominant):

  total-cost = C_CPU * #insts + C_I/O * #I/Os + C_MSG * #msgs + C_TR * #bytes

• Optimization goals: response time of a single transaction, or system throughput
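
A small, hypothetical worked example of this weighted cost formula; the coefficient values and operation counts below are invented, not from the slides.

# Sketch: comparing two candidate plans with the weighted cost formula
#   total_cost = C_CPU*#insts + C_IO*#IOs + C_MSG*#msgs + C_TR*#bytes
# All coefficients and counts are made-up illustration values.

def total_cost(n_insts, n_ios, n_msgs, n_bytes,
               c_cpu=1e-7, c_io=1e-3, c_msg=1e-2, c_tr=1e-6):
    return c_cpu * n_insts + c_io * n_ios + c_msg * n_msgs + c_tr * n_bytes

# Plan A: ship a large fragment to the query site (few messages, many bytes)
plan_a = total_cost(n_insts=5_000_000, n_ios=200, n_msgs=4,  n_bytes=8_000_000)
# Plan B: execute the selection remotely and ship only the result
plan_b = total_cost(n_insts=7_000_000, n_ios=250, n_msgs=12, n_bytes=400_000)

print(f"plan A: {plan_a:.3f}  plan B: {plan_b:.3f}")
print("choose:", "B" if plan_b < plan_a else "A")   # here the byte-transfer term dominates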

Distributed Transaction Management
• Transaction Management (TM) in a centralized DBS
  – Achieves the transaction ACID properties by using:
    • concurrency control (CC)
    • recovery (logging)
• TM in a DDBS
  – Achieves the transaction ACID properties by using (diagram):
    • a Concurrency Control protocol (Isolation)
    • a Recovery Protocol (Durability), built on the Log Manager and Buffer Manager
    • a Commit/Abort Protocol (Atomicity)
    • a Replica Control Protocol (Mutual Consistency)
  – There are strong algorithmic dependencies among these components.

Classification of Concurrency Control Approaches (Isolation)
• Pessimistic:
  – Locking: Centralized, Primary Copy, Distributed
  – Timestamp Ordering: Basic, Multi-Version, Conservative
  – Hybrid
• Optimistic:
  – Locking
  – Timestamp Ordering

Phases of pessimistic transaction execution: validate – read – compute – write
Phases of optimistic transaction execution: read – compute – validate – write

Two-Phase-Locking Approach (2PL) – Isolation

2PL lock graph (diagram): over the transaction duration from BEGIN to END, the number of locks held grows as locks are obtained up to the LOCK POINT and then shrinks as locks are released.

Strict 2PL lock graph (diagram): locks are obtained during the period of data item use, and all locks are released only at END.
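
A minimal sketch of the strict-2PL discipline at a single lock manager; it is a toy in-memory model with exclusive locks only (a real lock manager would also support shared locks, blocking/queueing, and deadlock handling), not the slides' algorithm.

# Sketch: strict two-phase locking with a toy in-memory lock table.

class StrictTwoPhaseLocking:
    def __init__(self):
        self.lock_table = {}          # item -> transaction id holding the X-lock

    def lock(self, txn, item):
        """Growing phase: acquire a lock before every access."""
        holder = self.lock_table.get(item)
        if holder is not None and holder != txn:
            return False              # conflict: caller must wait or abort
        self.lock_table[item] = txn
        return True

    def commit(self, txn):
        """Strictness: release all of txn's locks only at end of transaction."""
        self.lock_table = {i: t for i, t in self.lock_table.items() if t != txn}

lm = StrictTwoPhaseLocking()
assert lm.lock("T1", "x")             # T1 locks x and may now use it
assert not lm.lock("T2", "x")         # T2 must wait until T1 finishes
lm.commit("T1")                       # all of T1's locks released at END
assert lm.lock("T2", "x")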

Communication Structure of Centralized 2PL (Isolation) – diagram
Participants: the coordinating TM, the central-site LM, and the data processors at the participating sites.
1. Lock Request (coordinating TM -> central LM)
2. Lock Granted (central LM -> data processors)
3. Operation (executed at the data processors)
4. End of Operation (data processors -> coordinating TM)
5. Release Locks (coordinating TM -> central LM)

Communication Structure of Distributed 2PL (Isolation) – diagram
Participants: the coordinating TM, the participating LMs, and the data processors at the participating sites.
1. Lock Request (coordinating TM -> participating LMs)
2. Operation (participating LMs -> their data processors)
3. End of Operation (data processors -> coordinating TM)
4. Release Locks (coordinating TM -> participating LMs)

Distributed Deadlock Management (Isolation)

Example (diagram):
• Site X: T1x holds lock Lc and T2x holds lock Lb; T2x waits for T1x to release Lc; T1x needs item a and waits for T1 on site Y to complete.
• Site Y: T1y holds lock La and T2y holds lock Ld; T1y waits for T2y to release Ld; T2y needs item b and waits for T2 on site X to complete.

Distributed waits-for graph:
- requires many messages to update lock status
- combine the “waits-for” tracing message with lock status update messages
- still an expensive algorithm
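
A minimal sketch of detecting the deadlock once the per-site waits-for edges have been collected into a global graph; the graph below encodes the example above, and the cycle detection is a generic depth-first search, not the slides' algorithm.

# Sketch: cycle detection in a global waits-for graph built from the
# per-site edges of the example (T2x -> T1x -> T1y -> T2y -> T2x).

waits_for = {
    "T2x": ["T1x"],   # site X: T2x waits for T1x to release Lc
    "T1x": ["T1y"],   # T1x waits for T1 on site Y to complete
    "T1y": ["T2y"],   # site Y: T1y waits for T2y to release Ld
    "T2y": ["T2x"],   # T2y waits for T2 on site X to complete
}

def has_deadlock(graph):
    """Standard DFS cycle detection over the waits-for graph."""
    visiting, done = set(), set()

    def dfs(node):
        if node in visiting:
            return True               # back edge -> cycle -> deadlock
        if node in done:
            return False
        visiting.add(node)
        if any(dfs(n) for n in graph.get(node, [])):
            return True
        visiting.remove(node)
        done.add(node)
        return False

    return any(dfs(node) for node in graph)

print(has_deadlock(waits_for))        # True: a victim transaction must be aborted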

Communication Structure of the Centralized 2-Phase Commit Protocol (Atomicity)

Messages between the coordinating TM and the participating sites (diagram):
1. Prepare (coordinator -> participants). Each participant makes its local commit/abort decision and writes a log entry.
2. Vote-Commit or Vote-Abort (participants -> coordinator). The coordinator counts the votes: if any vote is missing or any vote is Vote-Abort, then Global-Abort, else Global-Commit.
3. Global-Abort or Global-Commit (coordinator -> participants). Each participant writes a log entry.
4. ACK (participants -> coordinator).

Who participates? Depends on the CC algorithm.
Other communication structures are possible: Linear, Distributed.
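
A minimal sketch of the coordinator's decision rule in step 2, assuming the collected votes arrive as a list of strings (or None for a missing/timed-out vote); the function and value names are invented for illustration.

# Sketch: the coordinator's global decision in centralized 2PC.
# A missing vote (None) or any Vote-Abort forces Global-Abort.

def global_decision(votes, expected_participants):
    """votes: list of 'commit' / 'abort' / None collected from the participants."""
    all_present = len(votes) == expected_participants and None not in votes
    if all_present and all(v == "commit" for v in votes):
        return "GLOBAL-COMMIT"
    return "GLOBAL-ABORT"

print(global_decision(["commit", "commit", "commit"], 3))  # GLOBAL-COMMIT
print(global_decision(["commit", "abort", "commit"], 3))   # GLOBAL-ABORT
print(global_decision(["commit", None, "commit"], 3))      # GLOBAL-ABORT (missing vote)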

State Transitions for the 2-Phase Commit Protocol (Atomicity) – diagram

Coordinating TM: Initial -> (1 Prepare) -> Wait -> (3 Global-Abort or Global-Commit) -> Abort or Commit.
Participating sites: Initial -> (2 Vote-Commit or Vote-Abort) -> Ready -> (on the global decision, then 4 ACK) -> Abort or Commit.

Advantages:
- Preserves atomicity
- All processes are “synchronous within one state transition”

Disadvantages:
- Many, many messages
- If a failure occurs, the 2PC protocol blocks!

Attempted solutions for the blocking problem:
1) Termination Protocol
2) 3-Phase Commit
3) Quorum 3P Commit

Failures in a Distributed System

Types of failure:
– Transaction failure
– Node failure
– Media failure
– Network failure
  • Partitions, each containing 1 or more sites

Issues to be addressed – and who addresses the problem:
– How to continue service -> Termination Protocols
– How to maintain ACID properties while providing continued service -> Modified Concurrency Control & Commit/Abort Protocols
– How to ensure ACID properties after recovery from the failure(s) -> Recovery Protocols, Termination Protocols, & Replica Control Protocols

Termination Protocols (Atomicity, Consistency)

Use timeouts to detect potential failures that could block protocol progress.
Timeout states:
– Coordinator: wait, commit, abort
– Participant: initial, ready
(The 2PC state diagram is as on the previous slide: Prepare, Vote-Commit/Vote-Abort, Global-Abort/Global-Commit, ACK.)

Coordinator termination protocol:
– Wait – send Global-Abort
– Commit or Abort – BLOCKED!

Participant termination protocol:
– Ready – query the coordinator;
  if timeout, then query the other participants;
  if Global-Abort or Global-Commit is learned, then proceed and terminate, else BLOCKED!
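
A minimal sketch of the participant side of this termination logic; ask_coordinator and ask_participants are hypothetical stand-ins for the real message exchanges, not a protocol implementation.

# Sketch: a participant blocked in READY decides how to terminate.
# ask_coordinator / ask_participants return "global-commit", "global-abort",
# or None (no answer / timeout).

def participant_termination(ask_coordinator, ask_participants):
    outcome = ask_coordinator()
    if outcome is None:                  # coordinator timed out
        outcome = ask_participants()     # ask the other participants
    if outcome in ("global-commit", "global-abort"):
        return outcome                   # proceed and terminate
    return "BLOCKED"                     # nobody knows the decision

# Example: the coordinator is down, but another participant saw Global-Commit.
print(participant_termination(lambda: None, lambda: "global-commit"))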

Replica Control Protocols (Consistency)
• Update propagation of committed write operations (diagram: lock graph from BEGIN / obtain locks, through the period of data use and the 2-phase commit, to the COMMIT POINT and END / release locks):
  – STRICT UPDATE: updates are propagated before the commit point, as part of the 2-phase commit
  – LAZY UPDATE: updates are propagated after the commit point

Strict Replica Control Protocol (Consistency)
• Read-One-Write-All (ROWA)
• Part of the Concurrency Control Protocol and the 2-Phase Commit Protocol
  – CC locks all copies
  – 2PC propagates the updated values with the 2PC messages (or an update propagation phase is inserted between the wait and commit states for those nodes holding an updateable value)
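
A minimal sketch of the ROWA rule over a set of in-memory replicas, invented for illustration; the locking of all copies and the 2PC propagation are omitted, so this only shows which copies each operation touches.

# Sketch: Read-One-Write-All over a dict of replica copies.

replicas = {"node1": {}, "node2": {}, "node3": {}}   # invented replica sites

def write_all(item, value):
    """A write must install the new value at every copy."""
    for copy in replicas.values():
        copy[item] = value

def read_one(item, preferred="node1"):
    """A read may use any single (e.g., the local/preferred) copy."""
    return replicas[preferred].get(item)

write_all("x", 42)
print(read_one("x", preferred="node3"))   # 42 - under ROWA any copy is current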

Lazy Replica Control Protocol (Consistency)
• Propagates updates from a primary node.
• The Concurrency Control algorithm locks the primary copy node (the same node as the primary lock node).
• To preserve single-copy semantics, it must ensure that a transaction reads a current copy.
  – Changes the CC algorithm for read locks
  – Adds an extra communication cost for reading data
• Extended transaction models may not require single-copy semantics.

Recovery in Distributed Systems (Atomicity, Durability)

Select COMMIT or ABORT (or blocked) for each interrupted subtransaction.

Commit approaches:
– Redo – use the undo/redo log to perform all the write operations again
– Retry – use the transaction log to redo the entire subtransaction (R + W)

Abort approaches:
– Undo – use the undo/redo log to back out all the writes that were actually performed
– Compensation – use the transaction log to select and execute “reverse” subtransactions that semantically undo the write operations

Implementation requires knowledge of:
– Buffer manager algorithms for writing updated data from volatile storage buffers to persistent storage
– Concurrency Control algorithm
– Commit/Abort protocols
– Replica Control protocol

Network Partitions in Distributed Systems (diagram)
A network failure splits the nodes N1–N8 into partitions (e.g., Partition #1, #2, and #3), each containing one or more sites.

Issues:
• Termination of interrupted transactions
• Partition integration upon recovery from a network failure
• Data availability while the failure is ongoing

Data Availability in Partitioned Networks
• The Concurrency Control model impacts data availability.
• ROWA – data replicated in multiple partitions is not available for reading or writing.
• Primary Copy Node CC – can execute transactions only if the primary copy nodes for all of the read-set and all of the write-set are in the client’s partition.
Availability is still very limited . . . we need a new idea!

Quorums (ACID)
• Quorum – a special type of majority
• Use quorums in the Concurrency Control, Commit/Abort, Termination, and Recovery Protocols
  – CC uses a read-quorum & a write-quorum
  – C/A, Termination, & Recovery use a commit-quorum & an abort-quorum
• Advantages:
  – More transactions can be executed during site failures and network failures (and still retain the ACID properties)
• Disadvantages:
  – Many messages are required to establish a quorum
  – The necessity for a read-quorum slows down read operations
  – Not quite sufficient (failures are not “clean”)

Read-Quorums and Write-Quorums (Isolation)
• The Concurrency Control algorithm serializes valid transactions in a partition. It must obtain:
  – a read-quorum for each read operation
  – a write-quorum for each write operation
• Let N = total number of nodes in the system.
• Define the size of the read-quorum Nr and the size of the write-quorum Nw such that:
  – Nr + Nw > N
  – Nw > N/2
• Simple example: N = 8, Nr = 4, Nw = 5.
• When failures occur, it is possible to have a valid read-quorum and no valid write-quorum.
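
A minimal sketch that checks the two quorum conditions and then tests whether the reachable nodes in a partition can still form each quorum; the node names and the partition are invented.

# Sketch: read/write quorum rules  Nr + Nw > N  and  Nw > N/2.

def valid_quorum_sizes(n, nr, nw):
    """True iff the chosen sizes rule out conflicting concurrent quorums."""
    return (nr + nw > n) and (nw > n / 2)

def can_read(reachable, nr):
    return len(reachable) >= nr

def can_write(reachable, nw):
    return len(reachable) >= nw

N, Nr, Nw = 8, 4, 5                       # the slide's simple example
print(valid_quorum_sizes(N, Nr, Nw))      # True

partition = {"N1", "N2", "N3", "N4"}      # 4 of 8 nodes reachable after a failure
print(can_read(partition, Nr))            # True  - a read-quorum exists
print(can_write(partition, Nw))           # False - no write-quorum: writes must wait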

Commit-Quorums and Abort-Quorums (ACID)
• The Commit/Abort Protocol requires votes from all participants to commit or abort a transaction.
  – Commit a transaction if the accessible nodes can form a commit-quorum
  – Abort a transaction if the accessible nodes can form an abort-quorum
  – Elect a new coordinator node (if necessary)
  – Try to form a commit-quorum before attempting to form an abort-quorum
• Let N = total number of nodes in the system.
• Define the size of the commit-quorum Nc and the size of the abort-quorum Na such that:
  – Na + Nc > N;  0 <= Na, Nc <= N
• Simple examples: N = 7, Nc = 4, Na = 4;  N = 7, Nc = 5, Na = 3.
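
A small sketch of the termination decision under these quorum sizes: try to form a commit-quorum first, otherwise an abort-quorum, otherwise stay blocked. The function and node names are invented, and the real protocol of course also depends on the collected votes.

# Sketch: quorum-based termination decision.  Requires Na + Nc > N.

def terminate(accessible_nodes, nc, na):
    if len(accessible_nodes) >= nc:       # prefer forming a commit-quorum
        return "commit"
    if len(accessible_nodes) >= na:
        return "abort"
    return "blocked"                      # cannot form either quorum

N, Nc, Na = 7, 4, 4                       # the slide's first example
print(terminate({"N1", "N2", "N3", "N4"}, Nc, Na))   # commit
print(terminate({"N1", "N2"}, Nc, Na))               # blocked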

Conclusions
• Nearly all commercial relational database systems offer some form of distribution
  – Client/server at a minimum
• Only a few commercial object-oriented database systems support distribution beyond N clients and 1 server
• Future research directions:
  – Distributed data architecture and placement schemes (application-influenced)
  – Distributed databases as part of Internet applications
  – Continued service during disconnection or failure
    • Mobile systems accessing databases
    • Relaxed transaction models for semantically correct continued operation
