
Diploma Thesis

 

 

Groupware on the Internet

 

 

 

Diploma Thesis by

 

Thomas Koch

 

August 1998

 

 

 

 

 

 

 

 

Technical University Graz

Institut für Informationsverarbeitung und

Computergestützte neue Medien

 

 

 

 

 

Supervisor: H. Maurer

 

 

Abstract:

 

This is a diploma thesis about groupware and the Internet as a platform for creating, deploying and running applications which support teamwork.

For all kinds of organisations (business companies, universities, etc.) it is becoming increasingly important to have flexible teams and computer-supported teamwork. Teams need to be established across organisational, political and geographical boundaries. Organisations want to integrate and work together with partner organisations and business clients.

Rapid Application Development (RAD) is starting to play an important role for software developers due to the demand for shorter development time of less expensive and more stable software. For more flexibility, the software should run cross-platform. That way the software can not only be used in a clearly defined environment of an Intranet but by millions of Internet users. Software deployment and maintenance have to be made easier.

Groupware needs to provide a familiar context and working environment for members of a distributed team. The software must provide necessary social and task-oriented information to facilitate coordination tasks and to make each member’s work more efficient. However, since groupware also interferes with and changes the subtle and complex social dynamics that are common to teams, these social conflicts, which exist in all human teams, need to be recognised and resolved with the support of the groupware system.

This flexibility and interference with existing social structures poses new challenges for software developers.

 

Summary:

This diploma thesis deals with groupware and with the Internet as a platform for programming, distributing and running applications that are meant to simplify and support teamwork.

For all kinds of organisations (companies, universities, etc.) it is becoming ever more important to have a flexible team organisation and computer-supported teamwork. Teams increasingly work across organisational, political and geographical boundaries. Organisations want to integrate partner organisations and their customers more closely into their own structure.

Rapid Application Development (RAD) is playing a larger role for software vendors because development time is getting shorter and software cheaper, while higher quality must nevertheless be ensured. Software should be platform-independent so that it can be deployed more flexibly. That way the software can be used not only in the clearly defined and known environment of an Intranet, but by all users of the Internet. In addition, software distribution and maintenance would have to become simpler and cheaper.

One task of groupware is to create a familiar working environment for the people in a team. The software must supply the necessary social and work-related information to make team coordination simpler and the work more efficient. This also influences and changes the social structure of a team. Social conflicts, as they exist in all teams, must be recognised and also resolved with the help of the groupware.

This necessary flexibility and the influence on the social structure present a new challenge for software vendors.


I would like to take this opportunity to thank my professor and the supervisor of this diploma thesis, Hermann Maurer. His advice helped me greatly in completing this thesis. He also helped me to establish the contacts with the university Unimas in Sarawak, Borneo, so that I could begin my thesis there.

I dedicate this work to my parents in gratitude. They supported me financially and encouraged me along my way. I would also like to thank my siblings and grandparents, as well as all my relatives, for their support.

Special thanks also go to my girlfriend Barbara, who stood by me during the difficult time of my stay abroad, and to my long-time friends Helmut and Wolfgang.

 


 

 

 

I hereby certify that the work reported in this thesis is my own and that work performed by others is appropriately cited.

 

 

 

 


Signature of Author:

 

 

 

 

Chapter 1 Introduction and Motivation
1.1 Introduction
1.2 Organisation of this thesis

Chapter 2 The Internet
2.1 History of the Internet
2.2 The Internet Collaboration Platform
2.2.1 Information access
2.2.2 Communication and collaboration
2.2.3 Costs
2.3 Focus of the Internet

Chapter 3 Internet Applications
3.1 First Generation Hypermedia Systems
3.1.1 World Wide Web
3.1.2 Problems of 1st Generation Systems
3.2 2nd Generation Hypermedia System: Towards a Workplace for Collaboration
3.2.1 Maintenance support
3.2.2 Structured Hypermedia
3.2.3 Meta-Information for Objects
3.2.4 Advanced Links
3.2.5 Access Control and Logging
3.2.6 Versioning of documents
3.2.7 More Precise Information
3.3 Hyperwave – the first full-scale implementation of a 2nd generation system
3.3.1 Objects and object attributes
3.3.2 Meta-Information
3.3.3 Navigational Concepts
3.3.4 Search in Hyperwave
3.3.5 Document Management
3.3.6 Access Control

Chapter 4 Computer Supported Cooperative Work (CSCW)
4.1 Definition
4.1.1 CSCW (Computer-Supported Cooperative Work)
4.1.2 Groupware
4.1.3 Workgroup Computing
4.1.4 Workflow Management
4.2 General Problems with Groupware
4.2.1 Technical Problems
4.2.2 Social Problems
4.3 Different Categorisation of Groupware
4.3.1 Focus of the Cooperative Activity
4.3.2 Amount of Structure Involved
4.3.3 Degree of Embedded Semantics of the Collaborative Task
4.3.4 Levels of Sharing
4.3.5 Location of Users
4.3.6 Time of Collaboration

Chapter 5 Workgroup Computing
5.1 Architectures
5.1.1 Distributed or Client-to-Client
5.1.2 Selected Client as a Serialisation Point
5.1.3 Central or Client-Server
5.1.4 Paradigm for Enabling Large-Scale Group Collaboration
5.2 Awareness
5.2.1 Types of Awareness
5.2.2 Filtering Awareness Information
5.2.3 Issues and problems
5.3 System-Controlled Concurrency Control
5.3.1 Drawbacks of Traditional Database Concurrency Control
5.3.2 Different Categorisation of Concurrency Control
5.3.3 Requirements for Dynamism
5.4 Social Conflict Management
5.4.1 Information Flow
5.4.2 System-Supported Social Management
5.5 Communication
5.5.1 Parameters for Groupware Communication Protocols
5.5.2 Reasons for Implementing Communication Facilities
5.6 Collaboration
5.6.1 Example: Common Text Editing
5.7 Lotus NSTP 1.0
5.7.1 Basic Conceptual Model
5.7.2 Support for awareness
5.7.3 Support for Communication
5.7.4 Examples for the Usage of NSTP
5.7.5 Common Text Editor

Chapter 6 Workflow Management
6.1 Features of Workflow Systems
6.2 General Problems with Workflow Systems
6.3 Generations
6.3.1 Application-specific
6.3.2 Factored Application
6.3.3 Tailorable Service (now)
6.3.4 Embedded Enabler
6.4 Differentiation of Systems
6.4.1 Development Methods
6.4.2 Process Modes

Chapter 7 Creation of Internet-Based Groupware
7.1 The WWW as a Platform for Collaboration
7.2 Requirements for Applications
7.3 Programming Interfaces
7.3.1 Location of Software Execution
7.3.2 Security Issues
7.4 Programming Languages
7.4.1 JavaScript
7.4.2 Java
7.5 Component Models
7.5.1 JavaBeans Framework Model
7.5.2 ActiveX Framework Model
7.6 XML (Extensible Markup Language)
7.6.1 Limitations of HTML
7.6.2 XML Differs from HTML
7.6.3 Web Applications with XML

Chapter 8 Functionality of an Asynchronous Conference System
8.1 Using the WWW as Platform for Asynchronous Conferencing
8.2 Functionality of Current Conferencing Systems
8.3 Problems of Current Conferencing Systems
8.4 Functionality of a Full-Featured WWW-Based Conference System
8.4.1 Full support of a rich text format (HTML)
8.4.2 Searchable Meta Information
8.5 Hierarchy of Documents
8.6 List of Keywords (LoK)
8.6.1 Adding a New Listword
8.6.2 Moving a Listword in the Hierarchy
8.6.3 Removing a Listword
8.7 Searching for Documents
8.7.1 Browsing the Hierarchy of Listwords
8.7.2 Queries
8.7.3 Similarity of Documents
8.7.4 Forum Dynamics
8.8 Maintenance
8.9 Problems

Chapter 9 Summary

Chapter 10 References

 



Chapter 1 Introduction and Motivation

1.1 Introduction

In the last 20 years, the Internet has evolved into a global marketplace for information. Education was based on "teaching facts" for a very long time. Therefore, it is not surprising that the Internet has become important for information dissemination and global accessibility. In business, too, people need to cooperate with other people and rely on business processes to get their work done. Much like the step from the telegraph to the telephone, the step to Internet technology connects people to a richer flow of information.

However, in the last few years interaction and collaboration within teams have become increasingly important. Therefore, the Web is changing from being merely an information source to a platform for applications and, more recently, to a workplace for global collaboration.

Collaboration is happening in increasingly heterogeneous environments. The trend shows a movement from traditional internal organisational collaboration towards open and global workgroups. This implies that traditional collaboration tools need to move towards open standards to attract customers. It will be shown that the Internet is based on such open standards and is used by millions of people every day.

1.2 Organisation of this thesis

The following is a brief outline of the organisation of the thesis.

  • Chapter 1 – Introduction
  • Chapter 2 – The Internet: The Internet, its history and its ability to serve as a collaborative platform are described in this chapter.
  • Chapter 3 – Internet Applications: The most common applications for the everyday use of the Internet are explained in this chapter.
  • Chapter 4 – Computer Supported Cooperative Work: Important terminology and general problems concerning groupware are explained. Different ways of categorisation are introduced. The difference between workgroup computing and workflow is shown.
  • Chapter 5 – Workgroup Computing: Workgroup computing, different architectures and necessary components (awareness, concurrency control, and social conflict management) are explained. The two interaction paradigms, communication and collaboration, are introduced.
  • Chapter 6 – Workflow Management: Features, general problems, different generations of workflow systems and ways to differentiate them are shown.
  • Chapter 7 – Creation of Internet-Based Groupware: Tools, programming languages, interfaces and their requirements to make them suitable to create applications for the Internet are explained.
  • Chapter 8 – Functionality of an Asynchronous Conference System: The functionality of current discussion forums and necessary improvements are investigated.
  • Chapter 9 – Summary
  • Chapter 10 – References


Chapter 2 The Internet

2.1 History of the Internet

L. Kleinrock at MIT published the first paper on packet switching theory in July 1961 and the first book on the subject in 1964. J.C.R. Licklider of MIT discussed his "Galactic Network" concept in August 1962. He envisioned a globally interconnected set of computers through which everyone could quickly access data and programs from any site. In 1965, the first (however small) wide-area computer network was built using a circuit-switched telephone system, which was totally inadequate for the job. This confirmed the need for packet switching.

The plan for the "ARPANET" was published in 1967. As the ARPANET sites completed implementing NCP (a predecessor protocol of TCP) during the period 1971-1972, the network users were finally able to begin to develop applications. In 1972, the first "hot" application, electronic mail, was introduced.

DARPA supported UC Berkeley in investigating modifications to the Unix operating system, including incorporating TCP/IP. The transition of the ARPANET host protocol from NCP to TCP/IP was completed in 1983 – and the Internet was born.

This summary is taken from [Leiner97]. For further information see [Chambe97].

2.2 The Internet Collaboration Platform

2.2.1 Information access
  • The Internet is everywhere: Since the late 80’s, the Web has become widespread. Thus, documents and applications can be accessed and shared by millions of people. Moreover, for anyone who develops applications for the Internet and deploys them there, these people are potential customers.
  • Information storage and retrieval: With the appearance of the WWW, the Internet became the largest information repository in the world. Hundreds of millions of documents with multimedia content are available for browsing.
  • Fast file exchange: At the beginning, the Internet was just a convenient way to interchange files and documents very quickly. With FTP, files can be distributed all over the world in a very short time. Email is much faster than conventional letter delivery services and the documents stay digital, so there is no need to key in the information again. Information can be exchanged efficiently between organisations if they use the same protocols.
  • The Internet promises ubiquitous, universally unified access to information: Today’s browsers can access and display different kinds of data without user intervention. The user just needs to know how to handle the browser. The browser recognises the format and displays the information properly, provided the format is known to the browser.

2.2.2 Communication and collaboration
  • More and more people are permanently connected to the Internet: This makes a shift from asynchronous (e.g. Email) to synchronous (e.g. MS NetMeeting, Netscape Conference) communication and collaboration possible. That way the Internet can be used as a platform for interaction with other people. [McGr97] shows an example of the collaborative potential of bringing so many experts into real-time contact.
  • Cross-platform interoperability based on open standards: Universities in particular (but also other organisations), with their heterogeneous networks and different platforms, need to focus on open standards to remain open to computer-based groupwork. There are several standards defined for the Internet that are widely accepted. Creating applications which use these open standards ensures an open door to the big marketplace of the whole Internet with its millions of users.

2.2.3 Costs
  • Thin clients: The computing power of CPUs keeps doubling at short intervals, but companies and universities cannot afford to replace their hardware every year. The Internet is mostly based on a client-server architecture. Clients just install the browser, and applications can be downloaded on demand. Therefore, the cost of ownership will be much lower, and software deployment and maintenance become easier.

 

This increasing popularity has motivated both research and industrial environments to investigate the Internet’s potential for groupwork and collaboration support. Consequently, various WWW tools have been developed whose purpose is to enhance and enable communication and collaboration via WWW.

Some tendencies can be observed:

  • Tools based on 1st generation hypermedia systems, like Netscape Suitespot, add more features to address the need for collaboration and communication support. Email and discussion servers are added to the product line. Browser packages not only retrieve information from Web servers, but also provide support for email and news servers. Even tools for synchronous communication and collaboration are integrated.
  • Highly integrated and proprietary groupware products like Lotus Notes move towards open standards or provide gateways for communication and data access.
  • 2nd generation hypermedia systems like Hyperwave are coming into existence. They already integrate features for collaboration and communication.
  • Programming interfaces are added on the client and server side. This will make the medium even richer and more interactive.

2.3 Focus of the Internet

The evolution has been from Web publishing to implementing Web-based applications, and continues to grow into collaboration and workflow projects.

  • Publishing and authoring: Since the addition of the World Wide Web to the Internet, often the first use of this medium is to publish information much like an electronic billboard.
  • Web-based applications: Soon schools and universities discovered that they could use the Internet as a platform for applications to deliver more interactivity. This for example was used to create interactive learning software for learning on demand. In addition, companies wanted to provide direct access to their corporate data and applications to consumers. In fact, some companies’ only storefront is on the Web. Building a Web-based user interface to a business application also makes it feasible to quickly and inexpensively extend corporate applications and data to remote employees and business partners. That way the Internet helps to deliver information that is more accurate.
  • Workflow and collaboration: The next step is to use the Internet not only as an application platform but even as a workplace for distributed teams. Collaboration and communication can occur between co-workers, but also with customers and suppliers. Organisations find ways to provide broader access to business processes and to facilitate more effective communication. When person-to-person communication can occur quickly, or without the need to travel, costs go down. When manual business processes are automated, quality often increases while costs decrease.


Chapter 3 Internet Applications

3.1 First Generation Hypermedia Systems

3.1.1 World Wide Web

The World Wide Web (WWW) can be described as an Internet-wide distributed heterogeneous hypermedia information retrieval system.

3.1.1.1 HTTP (Hypertext Transfer Protocol)

HTTP is a stateless protocol. The client/server connection is maintained only for the duration of one transaction: for every transaction, the client establishes a connection with the server and closes it when the transaction is complete.

This has one major drawback. Opening a TCP/IP connection is time-consuming, and when a document with several pictures is downloaded, a separate connection has to be established for each object. Luckily, modern browsers can download several objects simultaneously.

The life cycle of a connection consists of four parts:

  • Connecting of the client to the server’s port
  • Request of certain information by the client
  • Response of the server to the request
  • Closing of the connection
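
The four steps of this life cycle can be traced in a few lines of code. The following is only a minimal sketch in Python, assuming an HTTP/1.0-style exchange; the small local server (`HelloHandler`) and the helper names `fetch_once` and `demo` are illustrative assumptions that exist only so the example runs without network access.

```python
import http.server
import socket
import socketserver
import threading


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Tiny stand-in server (an assumption, used only for this example)."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_once(host, port, path):
    """Run one HTTP/1.0 transaction and return the raw response bytes."""
    # 1. Connect to the server's port.
    sock = socket.create_connection((host, port), timeout=5)
    try:
        # 2. Request the document.
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # 3. Read the response until the server closes its side.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks)
    finally:
        # 4. Close the connection.
        sock.close()


def demo():
    """Start the local server, fetch one page, and shut the server down."""
    server = socketserver.TCPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_once("127.0.0.1", server.server_address[1], "/")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo().decode("ascii"))
```

A page with several inline images would repeat all four steps once per object, which is the overhead described above.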
3.1.1.2 HTML (Hypertext Mark-up Language)

HTML is an SGML-like mark-up language. Tags are used to format the text and to include multimedia objects such as inline images, as well as hyperlinks. Hyperlinks are therefore embedded in the document and not extracted by the server. Nor is meta-information explicitly extracted and kept in separate storage.

3.1.1.3 URL (Uniform Resource Locator)

URLs consist mainly of three parts:

  • The protocol used to access the document, e.g. "http", "https", "gopher", "ftp"
  • The host name (or IP address) of the server hosting the document, e.g. "www.iicm.edu"
  • The location and name of the document on the server, e.g. "/myspace/test.html"

Thus the URL not only points to the document, but also provides information about the protocol to use. That way, WWW can be used to access documents on a heterogeneous network using different protocols.
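
As a small illustration, the three parts can be separated with Python's standard `urllib.parse` module. The URL below is simply assembled from the example values given in the list above, and the helper name `split_url` is an assumption made for this sketch.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return (protocol, server, document location) for a URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path


protocol, server, location = split_url("http://www.iicm.edu/myspace/test.html")
print(protocol)  # the protocol used to access the document
print(server)    # the server hosting the document
print(location)  # the location and name of the document on the server
```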

3.1.2 Problems of 1st Generation Systems

First generation Web servers work fine for small Web sites of around 50 to 200 pages. However, today’s Web sites are growing. 10 years ago Web sites served for the dissemination of documents. Nowadays, whole dictionaries, electronic books and newspapers are published on the Web. People are starting to develop applications for the Web and to interact with the applications and with other people. Therefore, information is changing dynamically and becoming more interactive. Automatic maintenance, navigation support and security are becoming more important [Andrew94].

At the beginning, the Web was mainly used for the dissemination of text and pictures. Now there is also a shift towards multimedia such as (streaming) audio and video.

3.1.2.1 Disorientation: Lost in Hyperspace

While browsing through the Web, the user just sees one page at a time. There is no overview of the server’s information structure. After following several links, the user often feels "lost".

  • Flat storage model: The documents on WWW servers are not structured; the storage model is flat. Hierarchies can be recognised only by examining URLs, and there are no features like "go to the parent directory". Of course, today’s browsers offer a "Back" button, but what if there is more than one parent?
  • Unidirectional hyperlinks: Navigation is based only on unidirectional hyperlinks. The user sees links pointing from this document to others, but not links pointing to the document. Therefore, it is impossible or very difficult to show a local map of where the user came from and where he can go.
  • No global navigation map: There is no way to find out in which "part of the information space" of the server one is located at a given moment, or to get an overview of the hierarchical structure of the Web site. It is much easier for people to memorise hierarchical structures or 3-dimensional maps of the server’s info-space than the spaghetti-bowl links of the Web.
3.1.2.2 Links

It was mentioned in the last section that links are unidirectional, which makes it difficult to generate local maps. Another point is that links are embedded and in principle ignored by the server. It is up to the user to maintain the Web’s integrity:

  • Link maintenance: [Andrew94] Links are stored embedded in the document. Therefore, Web servers store no information about which links point to a certain document. If a document is deleted, moved, or just renamed, the links pointing to it become dangling. When the user tries to follow such a link, he just gets the famous "404 not found" response. It is up to the operator to maintain the Web’s integrity, which is manageable for 50 to 100 Web pages with the help of 3rd party tools like Microsoft FrontPage, which checks link integrity before uploading the pages to the Web. However, today’s Web sites often contain several thousand pages, which are impossible to maintain by hand.
  • Links break the document integrity: Because links are embedded in documents, one has to change the document to insert a new link in the document. That makes it impossible to add an annotation or a link to a read-only document or a document one does not own.
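
The link maintenance problem can be made concrete with a small checker. This is a hedged sketch in Python, not the method of any particular tool: it assumes a site held in memory as a mapping from absolute paths to HTML text, and it assumes that all links are written as absolute paths. The names `LinkExtractor`, `extract_links` and `find_dangling_links` are hypothetical.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href targets of all <a> tags in one document."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.targets.append(value)


def extract_links(html):
    """Return the link targets embedded in an HTML document."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.targets


def find_dangling_links(site):
    """Return (source, target) pairs whose target is not part of the site."""
    dangling = []
    for path, html in sorted(site.items()):
        for target in extract_links(html):
            if target not in site:
                dangling.append((path, target))
    return dangling


# Hypothetical two-page site in which "/old.html" has been deleted.
site = {
    "/index.html": '<a href="/about.html">About</a> <a href="/old.html">Old</a>',
    "/about.html": '<a href="/index.html">Home</a>',
}
print(find_dangling_links(site))
```

Because the links live inside the documents, the checker has to parse every page again to answer the question; a server that stored links separately would not need to.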
3.1.2.3 Security

Early Web servers were used merely for accessing and downloading documents, not for an information flow in both directions. Hence security was not of much concern. If the Web is also used for disseminating confidential information or running applications on the Intranet or Internet, security and access control become important issues.

  • Access control: At the very beginning of the WWW, there was no access control. Of course, there is still the access control of the operating system, but the server usually runs under a privileged account, which effectively bypasses it. This is a problem if the user can make the server start a program (as happens with CGI scripts). Today’s WWW servers, such as Netscape’s Enterprise Server 3.0, are starting to address this problem.
  • Different views for different people: It could be helpful to have different visibility of objects on the Web site instead of constructing two or more sites for Intra-, Internet or other user groups.
3.1.2.4 The Web as a Platform for Collaboration
  • Stateless HTTP protocol: As discussed above, an HTTP connection is opened for every file downloaded, and HTTP is basically a simple client-request/server-response protocol, so the server has no way to keep track of changes in the state of a client. This drawback has been overcome by introducing cookies. The client stores cookie data in flat files on the local hard drive and sends it back to the server with later requests. The disadvantage is that this information is lost if the user switches to another computer.
  • Client-request/server-respond: This protocol is sufficient for simple information delivery, but not for interaction of the user with the server.
  • Precise information: The lack of meta information makes it difficult to automate precise information retrieval. Search engines rely on descriptive meta data, or information about a document's content. The same online resource can be described in many ways, depending on the criteria used. Because the need for machine-usable descriptions of collections of distributed information is increasing rapidly there have been a number of proposals in the recent past that have made significant steps toward this goal, including MCF using XML (see [Guha97]) and PICS (see [Resnick96], [Resnick]).
  • Static HTML documents: Usually the server just delivers static information from a file on its hard discs. CGI (Common Gateway Interface) was designed as an interface between server and applications. Users can call these applications and the server just forks a new process and starts the application. The application runs on the platform of the operating system and provides a Web page created "on the fly" for the server. This technique could be used to provide dynamic HTML pages. Furthermore, CGI could be used to access databases and store client’s state centrally on the server.
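
The interplay of cookies and pages created "on the fly" can be sketched as follows. This is a minimal illustration in Python and not a complete CGI program: the function `handle_request` and the cookie name `visits` are assumptions, and the only CGI convention relied upon is that the server passes the request's Cookie header to the application in the `HTTP_COOKIE` environment variable.

```python
from http.cookies import SimpleCookie


def handle_request(environ):
    """Build a CGI-style response: header lines, a blank line, then HTML."""
    # State sent back by the client, if any.
    cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    previous = cookie["visits"].value if "visits" in cookie else "0"
    visits = int(previous) + 1 if previous.isdigit() else 1

    # Updated state that the client is asked to store.
    reply = SimpleCookie()
    reply["visits"] = str(visits)

    headers = [
        "Content-Type: text/html",
        reply.output(header="Set-Cookie:"),
    ]
    # The page itself is generated for this request only.
    body = f"<html><body>Visit number {visits}</body></html>"
    return "\r\n".join(headers) + "\r\n\r\n" + body


if __name__ == "__main__":
    import os
    print(handle_request(os.environ))
```

Calling `handle_request({})` simulates a first visit, and `handle_request({"HTTP_COOKIE": "visits=1"})` a returning client. Storing the counter in a server-side database instead would keep the state when the user changes computers, at the price of identifying the user in some other way.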
3.2 2nd Generation Hypermedia System: Towards a Workplace for Collaboration

Now let’s take a more careful look at the requirements of 2nd generation hypermedia systems to support the shift from simple information serving to a full-featured workplace. For a comparison of WWW and Hyperwave, see [Pam95].

3.2.1 Maintenance support

If one shifts the workplace from the desktop environment to the Web, one has, on the one hand, to provide access to a huge number of documents such as the user’s repositories, libraries and other background information. On the other hand, some of these documents are constantly changing. So there is a need for systematic support for automatic maintenance of the Web’s integrity.

  • Link management in a changing environment: Automatic link management [Andrew94] means that the server checks which links point to deleted documents and deletes these as well. If documents are moved, the links pointing to this document should still be valid.
  • Abstraction of resource identification: In 1st generation systems, the document is addressed by a unique URL. The URL points to a static place and a static name of the document. With an abstraction of the resource identification, the document could be moved or renamed and the links would still point to this document.
  • Visibility of documents: In 1st generation systems, documents become visible to the user if a link from another document points to this one. Thus, just uploading documents does not make them visible (visible doesn’t mean accessible; a document with no link pointing to it can be accessed, if the URL of the document is known). Documents should automatically be integrated in a hierarchy. This way, users don’t need to change several other documents to make a new one visible.
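
The idea of abstracting resource identification can be illustrated with a small registry. The sketch below is written in Python under simplifying assumptions: IDs are plain integers handed out by a single registry, and the class `ObjectRegistry` with its methods is hypothetical rather than the interface of any existing server.

```python
import itertools


class ObjectRegistry:
    """Maps stable object IDs to the current location of each document."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._locations = {}

    def insert(self, location):
        """Register a document and return its permanent ID."""
        object_id = next(self._ids)
        self._locations[object_id] = location
        return object_id

    def move(self, object_id, new_location):
        """Move or rename a document; its ID stays the same."""
        if object_id not in self._locations:
            raise KeyError(object_id)
        self._locations[object_id] = new_location

    def resolve(self, object_id):
        """Return the current location for an ID."""
        return self._locations[object_id]


registry = ObjectRegistry()
report = registry.insert("/drafts/report.html")
link_target = report  # a link stores the ID, not the path
registry.move(report, "/published/1998/report.html")
print(registry.resolve(link_target))
```

Because the link holds the ID, moving the document does not break it; only the registry entry changes.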
3.2.2 Structured Hypermedia

Today’s Web with its links is often compared to a spaghetti bowl. An intuitive structure such as multiple directed acyclic graphs (DAGs) could help the user to avoid the "Lost in Hyperspace" syndrome and provide a subspace for semantically similar documents.

  • Navigational support: like local and global maps of the Web [Andrew94].
  • Gathering of semantically similar documents: That’s what most people do now in their local file system. There is one directory for personal data, one for programs, and one for drivers...
  • Setting search scope: Instead of searching the whole Web to find something about mushrooms, the search scope could be limited to the Fungi-hierarchy. This would decrease the workload of the server and help to avoid the retrieval of unrelated documents, such as documents about the Internet mushroom project.
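
A scoped search over such a hierarchy might look like the following sketch in Python. The `Collection` class, the naive substring matching and the sample documents are assumptions made for illustration; a collection may have several parents, so the traversal remembers which collections it has already visited.

```python
class Collection:
    """A node in the hierarchy; it may be a child of several parents."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.documents = {}  # title -> text

    def add_child(self, child):
        self.children.append(child)
        return child


def documents_in_scope(root):
    """Yield (title, text) for each document reachable from root."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue  # shared sub-collection already visited
        seen.add(id(node))
        yield from node.documents.items()
        stack.extend(node.children)


def search(root, word):
    """Return sorted titles of documents in scope that contain the word."""
    word = word.lower()
    return sorted(
        title for title, text in documents_in_scope(root) if word in text.lower()
    )


library = Collection("Library")
fungi = library.add_child(Collection("Fungi"))
projects = library.add_child(Collection("Internet projects"))
fungi.documents["Chanterelles"] = "An edible mushroom found in forests."
projects.documents["Internet mushroom project"] = "Status of the mushroom project."

print(search(fungi, "mushroom"))    # scope limited to the Fungi hierarchy
print(search(library, "mushroom"))  # scope is the whole hierarchy
```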
3.2.3 Meta-Information for Objects

One major problem Internet users are facing nowadays is the retrieval of useful information. The Internet provides access to Terabytes of data, but people spend more and more time searching for it. Meta-information not only makes it easier to retrieve useful information but also helps to manage the Web.

As noted above, the growing need for machine-usable descriptions of collections of distributed information has led to a number of proposals, including MCF using XML (see [Guha97]) and PICS (see [Resnick96], [Resnick]).

  • Objects can be identified by their properties: Often people know the author or the date of creation of a document. This makes searching more efficient than performing full text searches.
  • Objects can become valid or invalid: Some announcements and other information are valid only for a limited time. After that time the document or part of it can be hidden or automatically removed to avoid outdated information.
  • Provides information about an object before downloading or viewing it: The system can provide information like the MIME type of the document...
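
These three uses of meta-information can be sketched with a small record type. The following Python example is illustrative only: the attribute names, the helper `find_by_author` and the sample records are assumptions, not the attribute set of any particular system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ObjectRecord:
    """Meta-information kept alongside a document."""

    title: str
    author: str
    created: date
    mime_type: str
    valid_until: Optional[date] = None  # None means no expiry

    def is_valid(self, today):
        """An object is hidden once its validity period has ended."""
        return self.valid_until is None or today <= self.valid_until


def find_by_author(records, author, today):
    """Search by a property instead of scanning the full text."""
    return [r for r in records if r.author == author and r.is_valid(today)]


# Hypothetical sample records.
records = [
    ObjectRecord("Lecture notes", "Koch", date(1998, 3, 1), "text/html"),
    ObjectRecord("Room change", "Koch", date(1998, 5, 2), "text/plain",
                 valid_until=date(1998, 5, 31)),
    ObjectRecord("Course video", "Maurer", date(1998, 4, 10), "video/mpeg"),
]

for record in find_by_author(records, "Koch", date(1998, 8, 1)):
    # The MIME type is known before the document itself is downloaded.
    print(record.title, record.mime_type)
```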
3.2.4 Advanced Links

Automatic link management and meta-information for links have already been mentioned. In 2nd generation hypermedia systems links should be open for various kinds of multimedia documents and sources.

  • Keeping document integrity: Adding a link to a document in a traditional system means that the document itself has to be changed. So it is impossible to add links to documents without write permission or to read-only sources.
  • Extensible media and link support: Today’s Web servers deliver not only HTML documents, but also Postscript and PDF files, videos and audio files. Storing links separately makes it easier to link various kinds of file types.
  • Bi-directional links: This makes it easier to generate, on the fly, a local map of the documents pointing to a file, or to show the parents of a file. This feature is also necessary to maintain link consistency. If a document is deleted, links pointing to it can easily be found and removed as well.
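
Storing links outside the documents and indexing them in both directions can be sketched as follows. The Python class `LinkStore` and its method names are hypothetical; the sketch only illustrates the data structure that the points above rely on.

```python
from collections import defaultdict


class LinkStore:
    """Keeps links separate from documents, indexed in both directions."""

    def __init__(self):
        self._outgoing = defaultdict(set)  # source -> targets
        self._incoming = defaultdict(set)  # target -> sources

    def add_link(self, source, target):
        """Add a link without modifying either document."""
        self._outgoing[source].add(target)
        self._incoming[target].add(source)

    def links_from(self, document):
        return sorted(self._outgoing.get(document, ()))

    def links_to(self, document):
        """The 'parents' needed for a local map."""
        return sorted(self._incoming.get(document, ()))

    def remove_document(self, document):
        """Delete a document together with every link that touches it."""
        for target in self._outgoing.pop(document, set()):
            self._incoming[target].discard(document)
        for source in self._incoming.pop(document, set()):
            self._outgoing[source].discard(document)


links = LinkStore()
links.add_link("index", "report")
links.add_link("news", "report")
links.add_link("report", "appendix")
print(links.links_to("report"))  # documents pointing to "report"

links.remove_document("report")
print(links.links_from("index"))  # no dangling link is left behind
```

Since `add_link` never touches the documents, a link or annotation can also be attached to a read-only source.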
3.2.5 Access Control and Logging
  • Different visibility for different users: Different user groups need different views (structure, amount of information...) of the same information without storing it multiple times.
  • Encryption of confidential information: Because information is sent over the network, the encryption of confidential data such as passwords is crucial.
  • Statistical information about the system: This makes it easier for system administrators to decide when to buy a new computer, for example when they see that response times are too long.
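
The first point, different visibility without multiple storage, can be illustrated in a few lines of Python. The function `visible_titles`, the group names and the sample objects are assumptions; each object is stored once together with the set of groups allowed to see it.

```python
def visible_titles(objects, user_groups):
    """Return the titles a user may see, given the user's groups."""
    groups = set(user_groups)
    return [title for title, readers in objects if readers & groups]


# Hypothetical objects: (title, groups allowed to see the object).
objects = [
    ("Press release", {"public", "staff"}),
    ("Internal price list", {"staff"}),
    ("Partner roadmap", {"staff", "partners"}),
]

print(visible_titles(objects, {"public"}))
print(visible_titles(objects, {"partners"}))
```

One site thus serves Internet visitors, partners and Intranet users alike; only the filter result differs.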
3.2.6 Versioning of documents

When the Web is used as an application platform, version control, including restoration of older versions and document locking, becomes important.
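
How locking and restoration fit together can be sketched as follows. This Python example is a simplified assumption of how such a feature might work: the class `VersionedDocument` is hypothetical, every saved version is kept in memory, and a single exclusive lock is used.

```python
class VersionedDocument:
    """Keeps every saved version and allows one editor at a time."""

    def __init__(self, content):
        self._versions = [content]
        self._locked_by = None

    def lock(self, user):
        if self._locked_by not in (None, user):
            raise PermissionError(f"document is locked by {self._locked_by}")
        self._locked_by = user

    def unlock(self, user):
        if self._locked_by != user:
            raise PermissionError("only the lock holder may unlock")
        self._locked_by = None

    def save(self, user, content):
        """Store a new version and return its number (starting at 1)."""
        if self._locked_by != user:
            raise PermissionError("lock the document before saving")
        self._versions.append(content)
        return len(self._versions)

    def current(self):
        return self._versions[-1]

    def restore(self, user, version):
        """Make an older version current by saving it as a new version."""
        if not 1 <= version <= len(self._versions):
            raise ValueError("no such version")
        return self.save(user, self._versions[version - 1])


doc = VersionedDocument("first draft")
doc.lock("anna")
doc.save("anna", "second draft")
doc.restore("anna", 1)  # the history is extended, never rewritten
doc.unlock("anna")
print(doc.current())
```

Restoring by appending keeps the full history, so a restoration can itself be undone.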

      3. More Precise Information

Use of the Web is becoming increasingly automated. Agent technologies in particular are used to retrieve useful information, to fill out forms, and so on. For such applications to work efficiently, however, the kind of information needs to be clearly defined. This would also make the exchange of information more efficient.

    1. Hyperwave – the first full-scale implementation of a 2nd generation system

Hyperwave (formerly called Hyper-G) claims to be the first second-generation hypermedia system among Internet web servers. The server is described in [Maurer97] as a "distributed database system that is WWW transparent", as a "WWW oriented document management system" or as an "advanced WWW server with integrated database facilities".

The use of Hyperwave for Web-based collaboration between several authors writing a book is shown in chapter 5.6.1.1.

      1. Objects and object attributes

Hyperwave uses an object-oriented approach to store documents, links, etc. Every object (document, collection, link...) is stored with meta-information for name, title(s) in different languages, keywords, author, date of last modification, etc.

Every object gets a global ID on insertion. Therefore, even if the title, the location, or the name changes, Hyperwave still references the right object. This avoids the disadvantages of static URLs. Other Hyperwave servers can use these global IDs to point to a remote object on another Hyperwave server.

      1. Meta-Information

Indexed meta-information [Kappe97] provides faster access to relevant information and is customisable. Thus the search for information is faster and the search result can be more relevant.

      1. Navigational Concepts

Hyperwave offers the navigational concept of Hyperlinks, like in WWW, and of hierarchies, like in Gopher. Therefore, the server helps the user in getting an overview of the structure of a web site and helps to avoid the "lost in Hyperspace" syndrome.

  • Collection hierarchy: Every uploaded document has to be inserted into at least one collection (or cluster). Therefore, the document is accessible by browsing the hierarchies even if no link points to it. The advantages of this concept are:
      • It avoids having to create and maintain many navigational links.
      • Every document is visible on insertion even if no link points to it, because it is integrated in the hierarchy and can be accessed through it.
      • If a document or collection belongs semantically to more than one group, it can be inserted into more than one collection but is still physically stored just once. That way similar objects can be gathered semantically in one sub-graph.
      • The collections can be used for defining search scopes. This improves the use of search tools (which is discussed later).
  • Hyperlinks are objects like collections or documents, with their own meta-information and access rights. They are not embedded in the document but stored externally. Links consist of a source anchor and a destination anchor. The object-oriented approach and the fact that links are stored externally make links more flexible in Hyperwave. Therefore, Hyperwave can even link to Postscript files or frames of movies. The advantages are:
      • Links are bi-directional: The user can follow them in both directions. It is easier for the server to create a local map of a document (parents and children of a document).
      • Different visibility: Links can have different visibility for different users (access rights). Predefined trails, for example, can be provided for a special group of users.
      • Link types: Like any object in Hyperwave, links have meta-information attached. Therefore, different link types can be defined and even be searched for.
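How a collection hierarchy can store a document once and still offer it in several collections, and how a collection can serve as a search scope, is sketched below in Python. This is a simplified illustration and not Hyperwave's actual data model: the class `Collection`, the function `search` and the sample documents are all hypothetical.

```python
class Collection:
    """Hypothetical collection node; documents are referenced by ID only."""

    def __init__(self, name):
        self.name = name
        self.children = []      # sub-collections
        self.document_ids = []  # references to documents, not copies

    def add_collection(self, child):
        self.children.append(child)
        return child

    def documents_in_scope(self):
        """Return the IDs of all documents reachable below this collection."""
        found = set(self.document_ids)
        for child in self.children:
            found |= child.documents_in_scope()
        return found


def search(scope, documents, term):
    """Search only the documents inside the given collection (the search scope)."""
    return sorted(
        doc_id for doc_id in scope.documents_in_scope()
        if term.lower() in documents[doc_id].lower()
    )


# Each document is stored exactly once, keyed by an ID.
documents = {1: "Groupware overview", 2: "Workflow basics", 3: "Groupware on the Web"}

root = Collection("root")
cscw = root.add_collection(Collection("cscw"))
web = root.add_collection(Collection("web"))
cscw.document_ids += [1, 2, 3]
web.document_ids += [3]  # document 3 belongs to two collections

print(search(web, documents, "groupware"))
print(search(root, documents, "groupware"))
```

Searching with `web` as the scope only considers document 3, while searching from `root` covers the whole hierarchy without returning document 3 twice.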
      1. Search in Hyperwave

Hyperwave helps to find information through enhanced navigation paradigms and through a built-in (and thus highly integrated) search engine. Since version 2.5, administrators can even choose between a native and an external (Verity) search engine.

Hyperwave is more powerful than other Web servers combined with search engines due to the following features:

  • Context based: As mentioned before, the user can flexibly select the search scope. Even in the newest versions of Netscape (Netscape’s Enterprise Server 3.0) users can only use predefined search scopes.
  • Built-in full-text: Every HTML and text document is automatically full-text-indexed on insertion. Using Verity’s search engine, even PDF and various MS Office files are recognised and included in the index.
  • Meta-data: Traditional search engines have the (unsolved) problem of not recognising the semantics of a document. Hyperwave lets the user define various kinds of meta-information and keywords. The search for this data is much faster and enhances the quality of retrieval.
      1. Document Management
  • Access control: every object has its access attributes, so Hyperwave handles access control on the object level.
  • Locking: Locking of a version of a document enables concurrency control and organises collaboration.
  • Version control: It’s important to keep different versions of a document to see how the document has changed and to go back to a prior version.
  • Check-in: Makes the experimental version the last committed version.
  • Check-out: Creates an experimental version by copying the last committed version (including anchors and hyperlinks). The experimental version is stored on the server.
  • Revert to version: Makes an arbitrary (older) version the last committed version. Newer versions are deleted.
  • The experimental version is visible only to the lock owner (the user who checked out the document) and to "system" users. All other users see the last committed version.
  • Link consistency is maintained automatically: Dynamic hyperlinks always point to the correct version.
  • Version numbers consist of major and minor version numbers.
  • Document retrieval: See chapter 3.3.4
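The check-out, check-in and revert operations listed above can be sketched as follows. This is a simplified Python illustration and not Hyperwave's interface: the class `VersionedDocument` and its method names are hypothetical, versions are identified by a plain list index instead of major and minor numbers, and "system" users are left out.

```python
class VersionedDocument:
    """Hypothetical document with committed versions and one experimental copy."""

    def __init__(self, content):
        self.versions = [content]  # committed versions, oldest first
        self.experimental = None   # working copy of the lock owner
        self.lock_owner = None

    def check_out(self, user):
        if self.lock_owner is not None:
            raise PermissionError(f"locked by {self.lock_owner}")
        self.lock_owner = user
        self.experimental = self.versions[-1]  # copy the last committed version

    def edit(self, user, content):
        if user != self.lock_owner:
            raise PermissionError("only the lock owner may edit")
        self.experimental = content

    def check_in(self, user):
        if user != self.lock_owner:
            raise PermissionError("only the lock owner may check in")
        self.versions.append(self.experimental)
        self.experimental, self.lock_owner = None, None

    def read(self, user):
        # The lock owner sees the experimental version; everyone else
        # sees the last committed version.
        if self.lock_owner is not None and user == self.lock_owner:
            return self.experimental
        return self.versions[-1]

    def revert_to(self, index):
        # Make an older version the last committed one; newer ones are dropped.
        del self.versions[index + 1:]


doc = VersionedDocument("draft 1")
doc.check_out("alice")
doc.edit("alice", "draft 2")
print(doc.read("alice"), "|", doc.read("bob"))
doc.check_in("alice")
print(doc.read("bob"))
```

While the document is checked out, a second `check_out` by another user fails, which is the turn-taking that locking provides. Other users keep reading the last committed version until the check-in.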
      1. Access Control

Access control in Hyperwave happens at the object level. Rights can be defined for individual users or groups. With those rights, the visibility of objects like links or documents can be controlled. Different groups of users thus get a different view of the information.
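Object-level visibility can be pictured with a short Python sketch. The data layout (a mapping from object name to the set of users and groups with read rights) and the function `visible_objects` are assumptions made for this illustration and do not describe Hyperwave's actual rights model.

```python
def visible_objects(objects, user, groups):
    """Return the names of the objects the user may see.

    `objects` maps an object name to the set of users and groups that
    have read rights (a layout assumed for this sketch).
    """
    principals = {user} | set(groups)
    return sorted(name for name, readers in objects.items() if readers & principals)


objects = {
    "public-report": {"staff", "guests"},
    "budget-link": {"managers"},
    "draft-chapter": {"alice"},
}

print(visible_objects(objects, "alice", ["staff"]))
print(visible_objects(objects, "bob", ["guests"]))
```

The same set of stored objects yields a different view for each user, without storing any object more than once.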



  1. Computer Supported Cooperative Work (CSCW)
    1. Definition

In this thesis, the term CSCW is used as an abbreviation for the subject area, and the term groupware for the products or applications that support work groups.

      1. CSCW (Computer-Supported Cooperative Work)

The term "computer-supported cooperative work (CSCW)" was coined by Irene Greif and Paul Cashman in 1984 as a marketing tool for a vision of integrated office IT support -- "...A shorthand way of referring to a set of concerns about supporting multiple individuals working together with computer systems." [Whit96].

There are three areas of research:

  • Development of a general understanding of teamwork and coordination.
  • Development of concepts and tools for the support of distributed work processes.
  • Evaluation of these concepts and tools.

CSCW can be seen as a result of three key background factors [Whit96]:
  • The general trend away from manufacturing toward service industries in the Western economies, combined with...
  • The reliance of these service industries on information flows (analogous to the materials flows critical to heavy industries); and
  • The increasingly diversified organisational scope (interorganisational computing; decentralisation) and extended geographical scope (globalisation) of operations in enterprises of all types.
      1. Groupware

Groupware is software that supports and augments group work. It is a technical term meant to differentiate "group-oriented" products, explicitly designed to assist groups of people working together, from "single-user" products that help people pursue only their isolated tasks [Greenb91].

The goal of groupware is to make the process of people working together more effective. This compares with previous desktop computing innovations – word processing, spreadsheets and the like – that made individual users more productive. CSCW-supporting software is called Groupware.

 

Examples of groupware components are:

  • Desktop conferencing systems,
  • Videoconferencing systems,
  • Co-authoring features and applications,
  • Electronic mail systems and bulletin boards,
  • Meeting support systems,
  • Workflow systems, and
  • Group calendars (automatic meeting scheduling).

 

Groupware is influenced by several factors:

  • The person
  • The task
  • The organisational structure of the group
  • The technology used

 

The next two chapters introduce two different kinds of groupware, "Workgroup Computing" and "Workflow Management". There are several reasons to differentiate between the two. [Prinz], for example, uses circulation folders as workflow tools to support structured work processes, and shared workspaces for workgroup computing to provide a working environment for less structured processes.

  • The most significant difference is the focus. Workgroup computing focuses on the information being processed, enhancing the user’s ability to share information within workgroups. Workflow emphasises the importance of the process, which acts as a container for information (see [Koulop]).
  • Workflow systems need a set of rules that define the steps for solving the problem. Workgroup computing is more flexible and spontaneous.
  • The user controls "workgroup computing" tools. The user initiates the interaction. Workflow is defined at the beginning of the process and then the Workflow system initiates the necessary actions to finish the task (computer-mediated communication).
  • The basic idea of Workflow is to divide the problem into several smaller sub-problems, which can be solved by different people. "Workgroup Computing" focuses on people working together at the same time to solve one big problem.
  • The number of participants in Workflow systems can be large, but in workgroup systems, the number of people involved in the solution of a problem is still limited because of the difficulties in concurrency control and coordination.
      1. Workgroup Computing

Workgroup Computing is the application of a computer-based and commonly usable environment for the support of teams to fulfil their common tasks. Supported are primarily:

  • Coordination of tasks (see chapter 5.2 for awareness, chapter 5.3 for concurrency control and chapter 5.4 for social conflict management),
  • Communication (see chapter 5.5) and
  • Collaboration (see chapter 5.6).

 

Workgroup Computing tries to create a virtual enhanced office or work space (see [Rosema96], [Fitz96], [Fahlen93]) where people can meet and work together to solve a problem in a group. The shared workspace is a communication medium (see [Mitche95]).

Workgroup Computing systems don’t define strict rules for the cooperation of these people but leave it mostly to the people themselves to coordinate their tasks. A strict concurrency control like that used for databases would be more restrictive than necessary (see [Munson96]). This is flexible and straightforward for asynchronous software such as email, where people usually don’t work on the same problem at the same time. It wouldn’t make much sense to write an answer to a letter one does not have yet.

However, it gets more complicated with synchronous software, where people work on the same document at the same time. In a real office, people are usually aware of each other’s actions and focus of interest. The systems should encourage awareness by providing similar information about the shared context.

Workgroup Computing systems usually provide two ways of working together (also called two ways of communication): direct communication and collaboration (indirect communication through shared artefacts and common workplaces).

      1. Workflow Management

Workflow Management is the planning, simulation, execution and control of business processes and the provision of the necessary tools and information. Studies of working behaviour have increasingly observed that the "coordination" of work is, itself, work (see [Dourish96]). Workflow systems offer to relieve users of the burden of coordination by managing task coordination within the system, so that the user can focus on the work activities. Workflow technologies are increasingly going hand in hand with the popularity of business process re-engineering.

The emphasis in workflow management is on using computers to help manage business processes and boost productivity by eliminating overhead time spent in collecting and disseminating the information needed for performing tasks. Furthermore, workflow systems can be used to monitor the task.

Although usually used for clearly defined business processes, workflow technology can also be applied to highly individualised processes. Ad hoc workflow requires graphical workflow development tools with which workflows can easily be created and modified by the end user (see [Koulop]).

 

Components of workflow systems are:

  • Workflow editor: for graphical planning of processes.
  • Workflow simulator: for the simulation and verification of business processes.
  • Workflow engine: for the execution of business processes.
  • Workflow monitor: for the controlling and monitoring of running processes.
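The role of the workflow engine and monitor can be illustrated with a deliberately small Python sketch. It only shows a process that is defined in advance and then driven by the system rather than by the user. The class `WorkflowEngine`, the step functions and the sample case are hypothetical, and the editor and simulator components are not modelled.

```python
class WorkflowEngine:
    """Hypothetical engine that runs a predefined sequence of steps."""

    def __init__(self, steps):
        self.steps = steps  # list of (step name, handler) pairs, defined in advance
        self.log = []       # simple monitor: records each completed step

    def run(self, case):
        for name, handler in self.steps:
            case = handler(case)  # the engine, not the user, initiates each step
            self.log.append(name)
        return case


def register(case):
    return {**case, "registered": True}


def review(case):
    # Assumed example rule: amounts up to 1000 are approved.
    return {**case, "approved": case["amount"] <= 1000}


def archive(case):
    return {**case, "archived": True}


engine = WorkflowEngine([("register", register), ("review", review), ("archive", archive)])
result = engine.run({"amount": 250})
print(result)
print(engine.log)
```

Each step could be carried out by a different person or program. The engine passes the case from one step to the next, and the log gives the monitor a record of the progress.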
    1. General Problems with Groupware

Jonathan Grudin describes eight problems related to groupware in [Grudin]. Volker Wulf writes about conflict management in [Wulf97].

A key to successful groupware is flexibility. Different kinds of users with different working styles use the software. Working styles also change over the time the software is used and across the different phases of the process. The software should run in different environments and should be usable for various tasks.

      1. Technical Problems

This chapter describes problems related to the creation and testing of software, and to the need for flexible groupware that adapts to the environment, to the user and to the task.

  • Difficulty of evaluating groupware: Groupware must often interface simultaneously to users with different and sometimes shifting roles, preferences, and backgrounds. Users can be tested in a laboratory on the perceptual, tactile, and cognitive aspects of human-computer interaction that are central to single-user applications, but lab situations and partial prototypes cannot reliably capture complex but important social, motivational, economic, and political dynamics.
  • Complexity of software: Groupware is either specialised for a specific work task and/or a specific group of users, or it is designed as flexible software. Some of the aspects of flexible software are:
      • Different communication paradigms: Groupware can support different kinds of data exchange, such as video/audio (e.g. video/audio on demand), large files and chat texts. These data sources need different kinds of transmission (the parameters are "amount of data" and "amount of data per time period"). Another important parameter is the QoS (Quality of Service): sometimes it is acceptable to lose some of the data. This is most probably acceptable for online video, but certainly not for the transmission of a file containing important data. A further question is whether the data source needs to be sure of successful transmission (e.g. an automatic observation point which needs to report a significant rise in the water level of a river). Sometimes the QoS even differs between clients (the rise must be reported to the crisis team but not necessarily to the statistics team).
      • Different levels of concurrency control: The concurrency control mechanism has to adapt to the different work styles and phases of work. For example, two people decide to co-write a book. They decide to define each chapter as a unit. In the brainstorming phase, both people want to add notes and ideas to all of the chapters at the same time. In the second phase, each author starts to write his chapters (without interference from the other author). In the third phase, they want to edit the same chapters at the same time again, but only different paragraphs (the paragraph is defined as a semantic unit): one author reviews the other’s chapters while the other is already implementing the necessary changes. Therefore, they need to change the concurrency control policy over the different phases of the project.
      • Different levels of awareness of co-workers: [Lee96] describes the use of Portholes (video awareness tools) to identify, develop and maintain work and social relationships for distributed groups by fostering a sense of proximity, accessibility and community-hood. Awareness creates the context of a virtual office or place to work. A problem is the sense of loss of privacy and the feeling of surveillance and monitoring. Therefore, the software lets users control the resolution of their image. In the example of co-writing a book (mentioned above), the two authors also need different levels of awareness of the other’s focus in the text. Too much awareness could result in an overflow of information and distraction; too little awareness leads to more concurrency conflicts and social conflicts. In the first phase, for example, the two people work highly in parallel and need to know the other’s exact position in the text. In the second phase, they just need to know which chapter the other one is currently working on. In the third phase, they need to know which paragraphs the co-worker is editing.
      • Different hardware environments: Most networks are heterogeneous. Organisations often use workstations and PCs in the same network. There are even different kinds of LAN or WAN. Some networks support multicasting, others need software emulation of that function. The QoS, transmission rate and time also play an essential role. The software needs a robust protocol, graceful degradation and extensibility to be able to react to changes in the hardware environment.
      • Different software environments: Not just the hardware environment but also the software environment is heterogeneous. People use UNIX, MS Windows, VMS or other operating systems. Thus, for an organisation to allow flexible user and group management, the software may need to be deployed on different platforms, i.e. it needs to be platform-independent (e.g. Java). In addition, groupware often needs to be integrated into, or integrates itself into, other software packages like word processors. This way people can still use their own software, which again helps to achieve higher user acceptance of groupware. To be flexible, open protocols need to be established and used.
      • Different group organisations: Groupware also needs to adapt to different group organisations and roles in the group. A hierarchical group, for example, often needs different mechanisms to resolve conflicts than a flat group. In a hierarchical group, conflicts could be resolved by the decision of the group leader. However, if the people are at the same level, they need to find a solution together.
  • Exception handling in workgroups: Work processes can usually be described in two ways: the way things are supposed to work and the way they do work. A wide range of error handling, exception handling, and improvisation are characteristic of human activity. We have to recognise a large amount of ad hoc problem solving in human activity. This especially makes the creation of workflow tools a very difficult task because workflow tries to find a "standard process" for using it as a model.
      1. Social Problems
  • The disparity between who does the work and who gets the benefit: Most groupware applications require some people to do additional work to enter or process information required or produced by the application. A company, for example, introduces a new electronic calendar system for automatic meeting scheduling. The direct beneficiary is the meeting convenor, typically a manager or secretary, but for the feature to work efficiently, everyone in the group must maintain a personal calendar. If the calendars are not properly maintained, the system will not work. By contrast, one reason why email was widely accepted is that the user was the direct beneficiary. Another example is the automatic monitoring and logging of the user’s actions: the manager can observe the progress of work, but the user experiences a loss of privacy and a feeling of surveillance.
  • Critical mass: Most groupware is only useful if a high percentage of group members use it. An automatic scheduling system can only be used if all the members of a group use the system and maintain the data properly. This makes the introduction of a new system even more complicated. The early adopters cannot use the system to its full extent, because not everybody uses it. They may well abandon it before the critical mass of users is reached.
  • Social, political and motivational factors: Groupware may be resisted if it interferes with the subtle and complex social dynamics that are common to groups. Often unconsciously, our actions are guided by social conventions and by our awareness of the personalities and priorities of people around us, knowledge not available to the computer.
  • Designing for infrequently used features: Infrequently used groupware features must not obstruct more frequently used features, yet they must be known and accessible to users. Most writing is done alone, whether single-authored or on a section of a jointly written document. Who would abandon their favourite word processor to use a co-authorship application? The next generation of workgroup tools is said to address the issue and embed the service in other applications through standardised interfaces and interchange formats [Abbott94].
  • The breakdown of intuitive decision-making: Decisions to develop unworkable applications are frequent. The problem often lies not in the detailed design but in the conception, in the nature of decision-making in development environments. Most product development experience is based on single-user applications. In particular, decision-makers are drawn to applications that selectively benefit one subset of the user population: managers. Project management applications primarily benefit project managers; meeting schedulers and meeting management systems benefit those who convene meetings; decision support systems primarily benefit decision-makers.
  • Managing acceptance: A New Challenge For Product Developers: A word processor that is immediately liked by one in five prospective customers and disliked by the rest could be a big success. A groupware application to support teams of five people that initially appeals to only one person in five is a big disaster. Groupware must be introduced very carefully.
  • Latent conflict management: Single-user software creates the illusion of being alone in the system. Conflicts (e.g. simultaneous access of database records or documents) are resolved by the system itself. However, people react positively to a more open conflict management (see [Wulf97]). That way, latent conflicts can be resolved at an early stage without escalating.
    1. Different Categorisation of Groupware

The diversity of groupware applications is enormous, largely due to the lack of agreement as to the exact boundaries of the field. However, these applications can be categorised along a number of axes, as done here (see [Mitche95]).

      1. Focus of the Cooperative Activity
  • Focus on the user: The focus can be on communication between users. Information is delivered from one point to another through one-way channels. Synchronous communication with a large amount of data, like video or audio conferencing, is mostly point-to-point. Asynchronous communication, as in email systems and discussion groups, usually uses a server for delivering data.
  • Focus on the document: People are working on the same document. Concurrency control becomes important. Examples are whiteboards or collaborative editors.
  • Focus on the process: In workflow systems the focus is on the process. Workflow systems are discussed in the next chapter.
      1. Amount of Structure Involved
  • Unstructured work: In an unstructured work environment like a brainstorming session, concurrency control by the system is often unwanted. Social protocols are often used for the basic control of co-operation. Groupware only supports the team members by creating a common virtual workplace (even if the members are geographically dispersed) with basic communication and collaboration functions and by improving team coordination (see chapter 4.1.3).
  • Highly structured work: In applications dealing with structured data, people often have clearly defined roles, like mediators and observers. The application can take over the role of concurrency control and can even control the flow of information and monitor the progress of work.

Also of influence is the structure of data and semantic objects:

  • Unstructured data: A whiteboard can be based on the manipulation of pixels. Single pixels are not semantic units. So concurrency control is either based on the whole picture (one painter and one or more observers) or the application relies on social protocols to coordinate concurrent tasks.
  • Structured data: If a whiteboard defines objects, like squares and circles, to create the image, concurrency control can be based on the object level.
  • Hierarchical data: Books, for example, are divided into chapters and again into sub-chapters. Applications can adapt their granularity of concurrency control to the level of concurrency needed.
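How the granularity of concurrency control can follow the hierarchy of the data is sketched below in Python. The class `GranularLockManager` and the path layout (book, chapter, paragraph) are hypothetical and are only meant to make the idea concrete. Releasing and handing over locks is kept to a minimum.

```python
class GranularLockManager:
    """Hypothetical lock manager for hierarchical documents.

    A unit is addressed by a path such as ("book", "ch1", "p2"). The depth
    at which locks are taken sets the granularity of concurrency control.
    """

    def __init__(self, depth):
        self.depth = depth  # 1 = whole book, 2 = chapter, 3 = paragraph
        self.locks = {}     # locked unit -> owner

    def _unit(self, path):
        return tuple(path[:self.depth])

    def acquire(self, user, path):
        # Succeeds if the unit is free or already held by the same user.
        owner = self.locks.setdefault(self._unit(path), user)
        return owner == user

    def release(self, user, path):
        unit = self._unit(path)
        if self.locks.get(unit) == user:
            del self.locks[unit]


chapter_locks = GranularLockManager(depth=2)
print(chapter_locks.acquire("alice", ("book", "ch1", "p1")))
print(chapter_locks.acquire("bob", ("book", "ch1", "p2")))

paragraph_locks = GranularLockManager(depth=3)
print(paragraph_locks.acquire("alice", ("book", "ch1", "p1")))
print(paragraph_locks.acquire("bob", ("book", "ch1", "p2")))
```

With chapter-level locks the second author is refused access to a different paragraph of the same chapter, whereas paragraph-level locks let both authors work in the same chapter at once. Switching the `depth` value corresponds to changing the concurrency control policy between project phases.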
      1. Degree of Embedded Semantics of the Collaborative Task
  • Collaboration-awareness: Systems which are aware of the fact that several people are working in cooperation are called collaboration-aware.
  • Collaboration-transparency: Systems which do not contain semantics for collaborative tasks are called collaboration-transparent. Single-user applications could, for example, simply lock documents before working on them. This leads to a simple turn-taking mechanism, which is just a limited kind of collaboration.
      1. Levels of Sharing

[Bentley94] identifies three levels of sharing:

  • Presentation-level sharing: Each user looks at the same display of information from a common information space, also called WYSIWIS (What You See Is What I See).
  • View-level sharing: Each user has a presentation of the same information, but the presentation may differ, also called "relaxed" WYSIWIS.
  • Object-level sharing: Each user is working in the same information space, but different information is drawn from it (for example because of different access rights or preferences).
      1. Location of Users

The location of the user can be either remote or local, but this distinction is not so important for collaboration software, especially for software based on the Internet. However, it is important for scalability and reliability. For software which is only used in the Intranet of a company or university, it can easily be estimated how many people will use the system; a system which is accessible from the Internet has to scale very well. Furthermore, the Intranet network is usually more reliable than the Internet.

      1. Time of Collaboration
  • Synchronously: If highly concurrent work is desirable, concurrency control has to be fine-grained or relaxed.
  • Asynchronously: With asynchronous collaboration, if concurrent work is unlikely, concurrency control can be at a very coarse-grained level without disturbing people’s work.


  1. Workgroup Computing

Figure 1

When projects become more global, there is a need for powerful new tools to organise and manage the work of groups which may be spread all over the world (for an example see [Fielding97]). A group of 5 to 10 people distributed over the whole world can be managed by using the telephone and fax. A group of 50 people in a company can be managed by regular meetings. However, a large group of people distributed in place and time needs better tools for management and information dissemination.

Think of writing a book with several guest authors, like [Maurer96]. Of course, the group members could be called by telephone and information could be exchanged by fax, leaving it to the recipient to key in the information again. Alternatively, one could use Internet-based tools like email and a document management system like Hyperwave.

This chapter first looks at different kinds of system architecture, because the architecture defines some characteristics of the system like reliability, response time, ability for locking and scalability. It then takes a closer look at the topics of awareness, concurrency control, social conflicts, communication and collaboration. The chapter concludes with a look at two examples: Lotus NSTP and Hyperwave’s Document Management System.

    1. Architectures

If an application needs to be created for the Web, some important points need to be observed:

  • 24-hour availability: If people use the application all over the world, 24-hour availability is important because of the time difference.
  • Scalability: The Internet makes it possible for the number of people using a service to grow by more than 100 percent in just one day. With accessibility from all over the world, millions of people may end up using the system. Therefore, the system has to grow on demand.
  • Response time: Response time is another important factor in satisfying the users of a system. This point is closely related to the second one. With synchronous collaboration, like sharing a whiteboard, the system depends on the client interfaces reacting quickly to changes.

 

Calculation example: video/audio transmission with network bandwidth limitation

In a video conference, the amount of data created per time unit stays approximately the same (the compression ratio could change over time). The same amount of data is therefore created each second and has to be transmitted within a certain time limit (real-time criterion), usually within the time it takes to generate it. This makes sure that no more data is created than can be transmitted. In this example, there is no hardware-supported multicasting.

The bottleneck in this example is the bandwidth of the network (most likely when the Internet is used). Another bottleneck could be the maximum amount of data the client is able to send per second.

The QoS (quality of service) is not so important in this example. If some packets of data (frames or parts of frames) are lost, communication is still possible. Therefore, in this example, confirmation of the successful transmission of data is not necessary.

  • #m: Number of group members
  • bn: Average bandwidth of the network for point-to-point communication [bit/s]
  • dt: Amount of data created per second [bit/s]
  • tp: Average transmission time of one bit. It is neglected in this calculation, because it does not change the maximum number of group members or the amount of data which can be transmitted. This parameter must be considered in a hard real-time system, because it increases the delay in getting the information.
  • tu: Time unit to transmit the data
  • tt: Maximum transmission time of the whole information to the group [s]

      1. Distributed or Client-to-Client

Figure 2 (one transmission)

This is a totally distributed system. Usually there is a central entry point for obtaining the information needed to join a group, like the addresses of other group members. After that, the group members communicate directly with each other.

Microsoft’s NetMeeting and Netscape’s Conference are examples of such an architecture. When these programs start, they optionally notify a public Internet directory, which is used by other people to look up "telephone numbers". However, the rest of the communication (when they connect to somebody) is purely client-to-client.

This architecture is often used when large amounts of data have to be transferred with a short delay (like video and audio data) and locking and causal multicast are not so important. The bottleneck of a server is avoided and there is no additional transfer time to the server.

If the group gets too big, the workload for the clients increases, because they have to send the information to every other client. To overcome this problem, the network could support multicasting protocols.

 

Characteristics of this architecture

  • Fat client: The client has to deal with communication, locking and error handling because there is no server to deal with it. In addition, the client has to keep the list of other members of the group. So each client carries an overhead of program logic, data and information.
  • Locking: Distributed locking is more difficult than central locking.
  • No central component: This makes the system very reliable. Even if some clients crash, the others can keep on working (after error recovery). The only central component would be a service for getting the group’s entry points. As soon as the client has that information, it doesn’t need the service anymore.
  • Fast transmission of large amounts of data: there is no bottleneck such as a central server, and the additional time for the transmission of data from client to server is not necessary. Thus, this architecture is well suited for real-time applications if the group is not too big.
  • Scalability: the maximum size of the group depends on the amount of data to transmit per time unit, the bandwidth of the network and the amount of data the client is actually able to transmit per time unit. The number of groups is virtually unlimited.
  • Response time: the response time is very short, because the information is sent from point to point. The response time gets longer the bigger the group is.

 

Calculation example:

Number of bits per second to transmit:

Every bit created has to be sent to all the other group members.

Transmission time for the information created in one second:

The amount of information created in a time unit has to be transmitted within that same time unit or faster; otherwise, the information source would create more data than it can transmit.

Maximum number of group members:

#m is the maximum number of group members so that the amount of data created per second can be transmitted to all the members in a second.

 

Thus, the maximum number of users in a group is limited by the maximum amount of data the client can transmit and the amount of data created per time unit.
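As a minimal sketch of the calculation above, the following Python function restates the limit under two assumptions that are not spelled out in the text: each client can send at most bn bits per second in total, and the time unit tu is one second. The function name and the example values are illustrative only.

```python
def max_members_distributed(bn: int, dt: int) -> int:
    """Largest group size #m for the client-to-client architecture.

    Assumption: every member sends its dt bit/s to the (#m - 1) other
    members over a link carrying at most bn bit/s, so the condition is
    dt * (#m - 1) <= bn.
    """
    if bn <= 0 or dt <= 0:
        raise ValueError("bn and dt must be positive")
    # Integer division avoids rounding problems with floats.
    return bn // dt + 1


# Assumed example values: 1 Mbit/s link, 64 kbit/s media stream.
print(max_members_distributed(1_000_000, 64_000))  # 16
```

With these values a sender can serve 15 peers (960 kbit/s), so the group can have 16 members; a 17th member would push the outgoing load above the assumed link capacity.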

      1. Selected Client as a Serialisation Point

Figure 3 (one transmission)

This architecture is partly distributed, but has central components. One client is selected to be the server. Client-to-client systems often implement this type of architecture as well, to have a central point for storing some information (e.g. for locking). This central point is used as a serialisation point.

  • Distributed mode: If the transmission of video/audio data is still done client-to-client, the system behaves like a distributed or client-to-client system (see the previous section). In this mode, the central client, which serves as the group server, just stores general group information and locking data. This simply makes the locking easier.
  • Client-server mode: If all the data transmission is done through the group server (it also serves as a multicasting point), the number of clients is limited by the amount of data created in the whole group and the amount of data the group server is able to transmit in a time unit. In this mode, the system behaves like a central or client-server system (see the next section). Only the error recovery and reliability are better.

 

Characteristics of this architecture

  • Fat client: as in the purely distributed architecture discussed before, the software of these clients is rather complicated. However, just one client has to store general information like team members and locking information.
  • Reliability: Every client is able to be a server, so if the group server crashes or is not available, another client can take its part and perform the error recovery.
  • Locking: The "central" client is used to store locking information and other central data. This makes locking much easier, because the central client serves as a serialisation point. The decision making is done centrally by this group server and not distributed in the group.
  • Scalability: the maximum size of the group depends on the amount of data the central client has to and is able to transmit per time unit and the bandwidth of the network. The number of groups is virtually unlimited. The server as the central point of entry just has to serve the information of group entry points.
  • Response time: the response time is very short, because the information is sent from point to point. The response time gets longer the bigger the group is.

 

Calculation example:

Distributed mode:

Number of bits per second to transmit:

Every bit created has to be sent to all the other group members.

Transmission time for the information created in one second:

The amount of information created in a time unit has to be transmitted within that same time unit; otherwise, the information source would create more data than it can transmit.

Maximum number of group members:

#m is the maximum number of group members so that the amount of data created per second can be transmitted to all the members in a second.

Client-server mode:

Number of bits per second to transmit:

Every group client (except for the client that plays the role of a server) has to send its information to the server.

The server has to send the information it got from the clients to all the other clients. In this example, the server sends the information to the whole group (except itself).

Transmission time for the information created in one second:

This is the transmission time of data from the clients to the server. Here the bandwidth of the network is the bottleneck.

The data from all #m clients (including the server) must be sent to (#m - 1) clients (not to the group server) from the server.

The amount of information created in a time unit has to be transmitted within that same time unit. Otherwise, the information source would create more data than it can transmit.

Maximum number of group members:

Consequently, the maximum number of group members grows only with the square root of the ratio between the maximum amount of data the group server can transmit and the amount of data created per time unit. This protocol does not scale well with the number of group members.
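The square-root behaviour of the client-server mode can be illustrated with a short Python sketch. It assumes that the group server can send at most bn bits per second, that the time unit is one second, and that the server's outgoing traffic is the limiting factor; the function name and the example values are illustrative.

```python
def max_members_group_server(bn: int, dt: int) -> int:
    """Largest group size #m when a selected client relays all traffic.

    Assumption: the group server forwards the streams of all #m members
    (its own included) to the (#m - 1) other clients, so the condition is
    #m * (#m - 1) * dt <= bn.
    """
    if bn <= 0 or dt <= 0:
        raise ValueError("bn and dt must be positive")
    m = 1
    # Grow the group while the server could still serve one more member.
    while (m + 1) * m * dt <= bn:
        m += 1
    return m


# Assumed example values: 1 Mbit/s link, 64 kbit/s media stream.
print(max_members_group_server(1_000_000, 64_000))  # 4
```

Because the load grows with #m * (#m - 1), quadrupling the server bandwidth roughly doubles the possible group size.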

      1. Central or Client-Server

Figure 4 (one transaction)

In this architecture, different software is used for the client and server. So every part can be specialised in its task.

 

Characteristics of this architecture

  • Thin client: the client just has to implement the communication protocol to the server. Most information is kept with the server.
  • Easy locking: requests are automatically serialised; locking is easy because of the data being kept centrally and central decision making.
  • Atomicity of message delivery could be easily implemented.
  • One central process: If the server fails, the clients cannot continue with their work unless there is a backup server. This server is also a bottleneck of the system.
  • Scalability: the central server is the bottleneck of the system. The maximum size of the group depends on the amount of data the central server has to and is able to transmit per time unit and the bandwidth of the network. The number of groups is limited, if they all use the same server.
  • Response time: the response time is longer than in the distributed system, because the information has to be sent to the server and the server forwards it to the clients.

 

Calculation example:

Number of bits per second to transmit:

Every group client has to send its information to the server, which is not a group member.

The server has to send the information it got from the clients to all the other clients. In this example, the server sends the information to the whole group, even to the client from which it got the information.

Transmission time for the information created in one second:

This is the transmission time of data from the clients to the server. Here the bandwidth of the network is the bottleneck.

The data from all #m clients must be sent from the server to all #m clients. In this example, the server sends the information to the whole group, even to the client from which it got the information.

The amount of information created in a time unit has to be transmitted within that same time unit. Otherwise, the information source would create more data than it can transmit.

Maximum number of group members:

Consequently, the maximum number of group members grows only with the square root of the ratio between the maximum amount of data the server can transmit and the amount of data created per time unit. This protocol does not scale well with the number of group members.
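A minimal Python sketch of this limit follows. It assumes that the server can send at most bn bits per second, that the time unit is one second, and that the server echoes every stream to all #m clients as described above; the function name and the example values are illustrative.

```python
import math


def max_members_central(bn: int, dt: int) -> int:
    """Largest group size #m for the central (client-server) architecture.

    Assumption: the server sends the streams of all #m clients to all #m
    clients, so the condition is #m * #m * dt <= bn.
    """
    if bn <= 0 or dt <= 0:
        raise ValueError("bn and dt must be positive")
    # #m**2 <= bn / dt; integer square root keeps the result exact.
    return math.isqrt(bn // dt)


# Assumed example values: 1 Mbit/s link, 64 kbit/s media stream.
print(max_members_central(1_000_000, 64_000))  # 3
```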

      1. Paradigm for Enabling Large-Scale Group Collaboration

The necessity of different communication paradigms for groupware was discussed in chapter 4.2. The demands can change in the different stages of work. In addition, different kinds of data need different sending paradigms [Gall].

 

Publish/subscribe paradigm

Figure 5 (publisher, distributor, subscriber)

This paradigm ([Mathur95]) is characterised by one or more data sources or publishers sending data to multiple recipients or subscribers by way of distributors. A publisher multicasts data to a set of intermediate nodes, referred to as distributors. The distributors then route the data to other distributors or local subscribers. The direction of communication is just one way (from publisher to subscriber) and anonymous. The publishers are aware of their recipients, but the subscribers are unaware of each other and just aware of the publisher that they are receiving data from.

This paradigm supports a weak form of reliability for the subscribers. If one publisher crashes, the subscriber just searches for the next publisher and subscribes again. However, that way the subscriber could lose some information.
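The one-way routing from publisher through distributors to subscribers can be sketched in a few lines of Python. The class and method names are illustrative and not taken from [Mathur95]; the sketch assumes an acyclic distributor topology and ignores network transport and failure handling.

```python
class Subscriber:
    def __init__(self, name):
        self.name = name
        self.received = []

    def deliver(self, item):
        self.received.append(item)


class Distributor:
    """Intermediate node: routes data to local subscribers and on to
    other distributors (assumed to form a tree, so routing terminates)."""

    def __init__(self):
        self.subscribers = []
        self.downstream = []

    def subscribe(self, subscriber):
        self.subscribers.append(subscriber)

    def route(self, item):
        for subscriber in self.subscribers:
            subscriber.deliver(item)
        for distributor in self.downstream:
            distributor.route(item)


class Publisher:
    def __init__(self, distributors):
        self.distributors = list(distributors)

    def publish(self, item):
        # Data flows one way only: publisher -> distributors -> subscribers.
        for distributor in self.distributors:
            distributor.route(item)


root, leaf = Distributor(), Distributor()
root.downstream.append(leaf)
alice, bob = Subscriber("alice"), Subscriber("bob")
root.subscribe(alice)
leaf.subscribe(bob)
Publisher([root]).publish("frame-1")
print(alice.received, bob.received)  # ['frame-1'] ['frame-1']
```

The subscribers hold no reference to each other, which mirrors the anonymity described above.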

    1. Awareness

      One of the specific design goals of multiprocessor operating systems has been to give each user the look and feel of being the only one on the system. One prints to queues to be able to print documents even if the printer is busy at that moment. Databases try to serialise concurrent operations of users to ensure that they have the same effect as operations which are performed one after the other. Information about other people using the system must be explicitly requested.

      However, in groupware the awareness of other people is crucial. Nowadays the members of a human team are often spread among several departments of an organisation or even live in different countries. Groupware interferes with and changes the subtle and complex social dynamics that are common to teams. Often unconsciously, actions of group members are guided by social conventions and by the awareness of the personalities and priorities of other people, knowledge not available to the computer [Grudin]. If collaborating teams use distributed applications for their work, coordination tasks are also carried out through (and should be supported by) the software system.

      Workspace awareness creates a common working environment for the team members [Greenb96]. So the effort needed to coordinate tasks and resources can be reduced, people move easily between individual and shared activities, and a context is provided to interpret other people’s activities [Gutwin96]. Group awareness can be defined as "an understanding of the activities of others, which provides a context of your own activity" [Dourish92].

      1. Types of Awareness

[Schlichter97] mentions four types of awareness:

  • Informal awareness of a work community is basic knowledge about who is around in general (but perhaps out of sight) or who is "physically" in the same room.
  • Group-structural awareness involves knowledge about such things as people’s roles and responsibilities, their positions on an issue, their status, and group processes.
  • Social awareness is the information that a person maintains about others in a social or conversational context: things like whether another person is paying attention, their emotional state, or their level of interest.
  • Workspace awareness is the up-to-the-minute knowledge a person requires about another group member’s interaction with a shared workspace if they are to collaborate effectively.
      1. Filtering Awareness Information

        There are some reasons why the groupware system should filter the awareness information before it is brought to the user’s attention. In different phases of collaborative work it is necessary to switch between individual and shared activities. In these phases the level of awareness needed for efficient teamwork changes.

        1. Outgoing filters

          It is very important for most people to have a certain level of privacy. Let’s take the example of a telephone as a tool for enabling groupwork. If the call is not accepted, the calling person does not know whether the person being called is not in the office or is just too busy to pick up the receiver. This information would help the caller decide whether to call again in a few minutes. If the person being called is busy, it is very disturbing to be called every five minutes. If the person being called is just not in the office, that is no problem. For the person being called, this lack of information means more privacy [Lee96]. These different interests of people with different roles can be called a social conflict (see chapter 5.4).

          In other phases of their work, when the people need more active collaboration to finish their tasks, team members could decide to provide more information about themselves to facilitate collaborative activities.

        3. Incoming filters

        In a real office, people constantly receive information about what is going on at that moment. This can be disturbing. In distributed collaborative software, filters could reduce the amount of awareness information to avoid information overload. Again, in some phases the level of awareness needs to be higher.
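The interplay of outgoing and incoming filters can be illustrated with a small Python sketch. The event structure, the kinds of awareness information and the function names are assumptions made for illustration; they are not taken from the cited literature.

```python
from dataclasses import dataclass


@dataclass
class AwarenessEvent:
    sender: str
    kind: str    # hypothetical categories, e.g. "presence", "busy", "editing"
    detail: str


def outgoing_filter(event, shared_kinds):
    """Sender side (privacy): pass only the kinds the sender agreed to share."""
    return event if event.kind in shared_kinds else None


def incoming_filter(events, wanted_kinds):
    """Receiver side (overload): keep only the kinds the receiver wants to see."""
    return [e for e in events if e is not None and e.kind in wanted_kinds]


events = [
    AwarenessEvent("anna", "presence", "in office"),
    AwarenessEvent("anna", "busy", "on the phone"),
    AwarenessEvent("anna", "editing", "chapter 3"),
]
# Anna does not share whether she is busy; the receiver only wants presence.
sent = [outgoing_filter(e, {"presence", "editing"}) for e in events]
shown = incoming_filter(sent, {"presence"})
print([e.detail for e in shown])  # ['in office']
```

Changing the two sets at run time would correspond to raising or lowering the level of awareness in different phases of work.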

      3. Issues and problems

[Gutwin97] mentions some general issues that complicate the search for general and transferable awareness requirements.

  • Domain specificity: Much of what a person needs to know about others depends heavily on the application domain and the person’s own role in that domain (e.g. information distributor or observer).
  • Information importance: Some awareness information is crucial for the completion of a shared task. Other information is beneficial but not critical. Team members need to be aware of critical information, but they should be able to decide whether to be informed of additional information. Usually it is easy for system designers to find out which information is critical (e.g. a countdown for a system shutdown), but it is more difficult to find out which additional information supports the collaboration.
  • Changing requirements: As mentioned in chapter 5.2.2 the optimal level of awareness changes over time, because people shift their focus between individual and shared tasks. Adaptable filters could provide flexible amounts of information.
  • Effects of expertise: As people become more familiar with a domain, a task, the software, and a group of collaborators, they are able to infer more and more about other people’s activities from smaller and more subtle perceptual signals.
  • Evaluation: Awareness is not a quality that can be easily measured, and showing the benefits of awareness support in groupware is difficult at the best of times. Evaluation is complicated by the lack of a clear cognitive theory of what awareness is and how it works. Studies of awareness support in groupware cannot rely only on time and errors. ([Greenb97], [Gutwin96])
    1. System-Controlled Concurrency Control

      When two or more users work together on a shared object, their actions need to be synchronised to ensure the consistency of the object. Conflicts which can be resolved by the system itself will be called "software conflicts" in contrast to "social conflicts". Software conflicts arise from simultaneous access by several users to an object or semantic unit, from network bandwidth and transmission problems, or from other hardware and software problems.

      Concurrency control has been used for a long time in the area of database systems. First, the drawbacks of the restrictive concurrency control of database systems when applied to collaboration and groupware are discussed. Secondly, different categories of concurrency control are presented. Then the requirements for dynamism in groupware concurrency control are outlined.

      1. Drawbacks of Traditional Database Concurrency Control

[Munson96] addresses four drawbacks of traditional database concurrency control when used in collaboration systems.

  • Traditional database concurrency control is generally too restrictive for collaboration systems. Database transactions consist of simple read/write operations, whereas the semantics of the shared objects of collaboration are usually more complex. Database-like concurrency control is therefore often more conservative than necessary. If, for example, people work concurrently on writing a book, then blocking the whole bibliography chapter for an insertion is unnecessary. Concurrent entries could be allowed there (risking redundant entries that can be removed in a consolidation phase), whereas locking could still be necessary for the other chapters of the book.
  • Traditional database systems do not allow concurrent transactions to mutually depend on each other. Users of a groupware system may be expected to influence each other. Let’s take again the example of jointly writing a book. If one author needs to insert a new bibliography citation and observes another author doing the same, this author can see, before finishing, whether the other one is inserting the same citation.
  • Users of collaborative systems may wish to temporarily allow conflicting actions and delay their resolution until some later time. Conventional database systems do not allow the database to remain in an inconsistent state for indefinite periods. In a brainstorming session, this could be desired; or joint authors of a book may independently add bibliography citations and leave removal of duplicates to a later stage of their work.
  • When a conflict is identified, a conventional database system will throw away all work that led to the conflict and return the database to a prior consistent state. For a user about to commit a large number of changes to a document, this would be unpleasant and unnecessary. Maybe only the changes to some paragraphs need to be discarded.
      1. Different Categorisation of Concurrency Control
  • Pessimistic vs. optimistic: Pessimistic concurrency control first ensures that the user’s action causes no inconsistency. If, for example, locking is used for concurrency control, the action is not performed until the lock is granted. Pessimistic concurrency control reduces concurrency. Optimistic concurrency control, on the other hand, lets the user go on with his action, "hoping" that this action will not cause inconsistency of the data. If an inconsistency does occur, the system has to undo the action to bring the system back into a consistent state. This way of concurrency control ensures a maximum of concurrency but is more complicated to implement because of the need for an undo function.
  • Social vs. system-based control: Concurrency control based on social protocols means that the users themselves ensure that the system stays consistent. Email is a typical groupware tool which is based on social protocols. Users are not prevented from answering an email they have not received yet, but it would make no sense, so people usually don’t do it. To make concurrency control based on social protocols possible, the user interface has to provide some kind of awareness of the other users’ actions. In a shared editor, the author could for example see which chapters are being worked on by other users. The author avoids interference by moving to chapters which are not currently in use. System-based control means that the concurrency control is integrated into the system, which is usually less flexible. The system would lock chapters or paragraphs (depending on the granularity) which are currently in use by other authors.
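The difference between pessimistic and optimistic control can be sketched in Python for a single shared chapter. The class names are illustrative. The optimistic variant shown here uses a version check and rejects a stale edit at commit time, which is a simplification: the caller then has to undo or merge the local change, rather than the system undoing an action that was already applied.

```python
class PessimisticChapter:
    """The lock must be granted before the edit is performed."""

    def __init__(self, text=""):
        self.text = text
        self.locked_by = None

    def acquire(self, user):
        if self.locked_by in (None, user):
            self.locked_by = user
            return True
        return False  # somebody else holds the lock; the user has to wait

    def write(self, user, text):
        if self.locked_by != user:
            raise PermissionError(f"{user} does not hold the lock")
        self.text = text

    def release(self, user):
        if self.locked_by == user:
            self.locked_by = None


class OptimisticChapter:
    """Edits proceed without locks; conflicts are detected at commit time."""

    def __init__(self, text=""):
        self.text = text
        self.version = 0

    def read(self):
        return self.text, self.version

    def commit(self, new_text, based_on_version):
        if based_on_version != self.version:
            return False  # conflict: the caller must undo or merge its edit
        self.text = new_text
        self.version += 1
        return True


p = PessimisticChapter()
print(p.acquire("anna"), p.acquire("ben"))  # True False

o = OptimisticChapter("draft")
_, v = o.read()  # both authors start from the same version
print(o.commit("anna's edit", v), o.commit("ben's edit", v))  # True False
```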
      1. Requirements for Dynamism
  • Different kinds of data require different levels of consistency: Brainstorming sessions for example need highly interactive work. If there is a document, which is passed from one public servant to the next one to be completed, it needs to be assured that people who have finished their work are not allowed to change anything afterwards.
  • Different modes of collaboration require different kinds of awareness: Too much awareness distracts people from their work. Not knowing enough could lead to misunderstanding and conflicts. The level of awareness often depends on the level of concurrency.
    1. Social Conflict Management

      The last chapter was about system-controlled concurrency control. However, there is another type of conflict potential besides "software conflicts": "social conflicts" (see [Wulf97] for more details).

      Groupware affects and is affected by the social structure of the group. Let’s take group calendar software for meeting scheduling. The problem is to find a free time slot of a group for a meeting. The usual procedure of the group for example is to ask the group manager to organise a meeting and he asks the group members about possible time slots. The group manager doesn’t just look at their calendars, he also asks everyone about preferred times and finds the optimum time this way.

      If the software allows everybody to enter new meetings in other calendars, this would change the group’s procedure. Thus, the group has to agree on certain procedures and stick to them. However, it would be helpful if different ways of controlling are supported by the system.

      Another aspect is the necessity to exchange informal information. If there are two possible time slots for a meeting, people might prefer one of them for various reasons (less fragmentation of one’s own time schedule, for example). It could be helpful to the group if the system supports (in-)formal negotiation of different solutions.

      1. Information Flow

        Let’s look at groupwork as group members activating functions which affect other group members. A function could be the changing of a text (which could be used by other members at the same time), the calling of a meeting with calendar software, or the changing of a record which could be used by others as well. To generalise, let’s say that one activator tries to execute a function which affects one or more group members.

        This could cause conflicts of interest. Let’s take NYNEX Portholes ([Lee96]) as an example. Every computer has a video camera to observe its user. For the other group members this additional information could be useful to see if this person is too busy to be interrupted or if this person is not there at all. However, the observed person experiences a loss of privacy and a sense of surveillance.

        Another conflict could be that the group leader calls for a meeting at a time when one or more group members are too busy.

        The system should support a way for communication and information flow to resolve these social conflicts. Information is necessary to better understand the position of the activator and the affected person. The activator for example could inform the group members that a meeting is necessary because the goals of the project changed. The affected person could inform the activator about problems concerning the execution.

        If the system supports a two-way information flow between activator and affected person, they could negotiate a solution through communication.

        The information flow between the activator and the affected person can be formal (choosing one option out of n) or informal (chatting with the other one). Formal information is limited but easier for the system to analyse. Informal information could explain the problem more exactly but takes more time to create and evaluate.

        If information flow is necessary to resolve a broad range of problems, a mixture of formal and informal information could be the most efficient way.

      3. System-Supported Social Management

        Below is a description of six different ways of system-supported (not system-controlled!) social controlling. For more details see [Wulf97].

        1. One-Sided Controlling

          One person checks if the execution of a function violates the group’s rules and executes it if possible.

          A typical function of group calendar software could be the insertion of a meeting into the calendar of another group member. The person who inserts the meeting is the activator of the function. In group calendar software this means that everybody can add new meetings to other calendars as long as there is no other meeting at the same time (a necessary criterion) and as long as the person has the right to activate the function.

          The affected person is not notified of the activation of the function. There is neither communication nor information flow. The system does not support the resolution of a social conflict.

        3. Countermeasurement

          One person (the activator) activates a function. The affected person is not notified of the activation (before or after the execution of the function itself), but the affected person can pre-define an automated reaction of the system to the activation.

          For example, as long as a group member is not sure whether he will be available next Tuesday, he might block entries to his electronic appointment book for that day.

          Again, the system supports neither a communication channel nor the information flow between activator and affected person. However, the person is able to pre-define certain preferences.

        5. Activation-Related Transparency

          One person checks if the execution of a function violates the group’s rules and executes it if possible. Again, this person is called the activator. The affected person is notified of the activation (before or after the execution of the function itself), but the affected person cannot control or prevent the execution of the function.

          The system does not support a communication channel between activator and affected person, but there is a one-way information flow. This flow of information could be used to start further non-system-supported social conflict management (like making a telephone call to the activator to complain about the time of the meeting).

        7. Intervention

          One person (the activator) activates a function. The affected person is notified by the system and decides if the execution of the function should be cancelled or performed. There is a timeout for the reaction to make sure that the conflict is resolved in a certain period.

          Here the system supports the one-way information flow from the activator to the affected person, but there is still no communication channel between these two. The activator just gets feedback which says whether the function was actually executed or not. If the affected person aborts the execution, the activator does not know the reason.

          Again, this system-feedback can start further non-system-supported social conflict management.
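The four mechanisms described so far differ in whether the affected person is notified and whether that person can prevent the execution. The following Python sketch summarises this; the names are illustrative and not taken from [Wulf97]. Two points are assumptions that the text leaves open: nobody is notified if the group's rules already forbid the function, and under Intervention the function is executed if the affected person does not react before the timeout.

```python
from enum import Enum


class Mode(Enum):
    ONE_SIDED = 1
    COUNTERMEASURE = 2
    TRANSPARENCY = 3
    INTERVENTION = 4


def activate(mode, rules_ok, predefined_block=False, veto=None):
    """Return (executed, affected_person_notified).

    rules_ok:         the activation does not violate the group's rules
    predefined_block: automated reaction set up in advance by the affected person
    veto:             True = cancel, False = accept, None = no reaction before timeout
    """
    if not rules_ok:
        return (False, False)  # assumption: no notification in this case
    if mode is Mode.ONE_SIDED:
        return (True, False)
    if mode is Mode.COUNTERMEASURE:
        return (not predefined_block, False)
    if mode is Mode.TRANSPARENCY:
        return (True, True)
    if mode is Mode.INTERVENTION:
        # Assumption: silence until the timeout counts as acceptance.
        return (veto is not True, True)
    raise ValueError(f"unknown mode: {mode}")


print(activate(Mode.COUNTERMEASURE, True, predefined_block=True))  # (False, False)
print(activate(Mode.INTERVENTION, True, veto=True))                # (False, True)
```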

        9. Annotation Support

          One person (the activator) activates a function. The affected person is notified by the system and may send back (in-)formal information concerning the activation of the function.

          The system supports the one-way information flow from the affected