>> Ressourcen > Archiv > Maurer H. (Ed.)[..] > 10 The Hyper-G [..] > 10.6 Maintainin[..]

ErstesErstesVorherigesNächstesLetztes 6/9

10.6 Maintaining referential integrity

 

By now, it should be apparent that referential integrity (consistency of the links and collection hierarchy) is guaranteed by the Hyper-G server in the local case, i.e. both endpoints of a link are on the same server, or both the collection and child object are on the same server. This section presents a scalable architecture for automatic maintenance of referential integrity in large (thousands of servers) distributed information systems.

A central feature of the proposed architecture is the p-flood algorithm, which is a scalable, robust, prioritizable, probabilistic server-server protocol for efficient distribution of update information to a large collection of servers. We give only a short overview (see `Further reading' at the end of this chapter).

  figure4015
Figure 10.4: Partitioning the Web among servers

Figure 10.4 illustrates this situation. The hyperweb is partitioned by server boundaries (the servers are labeled A, B and C in the figure). Links which span server boundaries are shown as thicker edges. We will call these links surface links, and documents connected to other servers by such links shall be called surface document (the others are called core links and core documents, respectively). Although not apparent from Figure 10.4, a server's surface will typically be small compared to its core.

In order to keep the useful property of bidirectional links, the link information of surface links must be stored in both affected servers. For increased performance, the servers also keep replicas of the other surface document's meta-information. In Figure 10.4, server A stores document 1 plus a replica of document 2's meta-information and the link between them, while server B stores document 2 plus replicas of documents 1's and 3's meta-information and the links from 1 to 2 and from 2 to 3.

In this setup, documents on different servers are interconnected as tightly as the documents on a single server. The bidirectional links enable more advanced navigation techniques (such as Harmony's local map, collection browser and information landscape), but also simplifies maintenance of the hyperweb.

The problem which remains is how to inform the other servers that document 2 has been removed. An earlier implementation of Hyper-G used the knowledge about what documents are affected to directly engage the other servers in a multi-server transaction, in order to remove document 2 and all links to and from it. However, this approach has scalability problems when many servers must participate in the transaction (because many links point to the document).

Therefore, it was decided to adopt a weak consistency approach, whereby it is accepted that the hyperweb may be inconsistent for a certain period of time, but is guaranteed to converge to a consistent state eventually. Of course, we would like to keep the duration of the inconsistency as short as possible.

Updates may only take place at a well-defined server. This server is not the same for all operations, but depends on the document or link being modified (or removed or inserted): For documents, it is the server which holds the document; for links, it is the server which holds the document where the link emanates (in our example, server B would be responsible for updates of document 2, while the link from 1 to 2 would be updated by server A). This eliminates the problem of conflicting updates (they are handled one after the other). Of course, the server must be available at update time. However, since for security reasons users wishing to update document 2 must have write permission for document 2 (this is checked by server B which holds document 2), this fact is inevitable and we have to live it, anyway.

Updates of core documents or core links require no further action (integrity is maintained by the dbserver). However, other servers need to be notified of updates happening at a server's surface (i.e. updates of surface documents or surface links). We chose to use a flood algorithm to propagate updates from the master to the slaves (i.e. all other servers), because of its scalability (the traffic generated does not depend on the number of references to the object in question), because it does not require that the recipients are available at update time, and because it can be used for other purposes as well (like distributing server addresses and statistics, and maintaining the consistency of replicas and caches).


10.6.1 The p-flood algorithm 10.6.1 The p-flood algorithm
10.6.2 Simulation results 10.6.2 Simulation results
10.6.3 The p-flood daemon 10.6.3 The p-flood daemon