Hasso-Plattner-Institut Potsdam Operating Systems and Middleware Group at HPI University of Potsdam, Germany

Dependable Systems

Dr. Peter Tröger

Summer 2010

Please note: Oral exam takes place in A1.1.

Description

Continuous service provisioning is a key feature of modern hardware and software server systems. These systems achieve their level of user-perceived availability through a set of formal and technical approaches, commonly summarized under the term dependability.

Dependability is defined as the trustworthiness of hardware and software systems, so that reliance can be placed on the service they provide. The main dependability attributes commonly known and accepted are availability, reliability, safety, and security.
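As a rough illustration of the availability attribute named above, steady-state availability of a repairable system is commonly computed from the mean time to failure (MTTF) and the mean time to repair (MTTR). The following Python sketch is not part of the course material; the function names and the example numbers are assumptions chosen for illustration only.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_hours_per_year(availability: float) -> float:
    """Expected yearly downtime implied by a given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical server: fails every 999 hours on average, 1 hour to repair.
    a = steady_state_availability(mttf_hours=999.0, mttr_hours=1.0)
    print(f"availability: {a:.4f}")
    print(f"downtime per year: {downtime_hours_per_year(a):.2f} h")
```

With these assumed numbers the availability is 0.999 ("three nines"), which corresponds to roughly 8.76 hours of downtime per year.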

The Dependable Systems course gives an introduction to the theoretical foundations, common building blocks, and example implementations of dependable IT components and systems. The focus is on reliability and availability aspects of dependable systems, such as reliability analysis, fault tolerance, fault models, and failure prediction. Among other things, the following topics are covered:

  • Dependability definitions and metrics
  • Design patterns for fault tolerance
  • Analytical evaluation of system dependability
  • Hardware dependability approaches
  • Software dependability approaches
  • Latest research topics

This course is an extended and adjusted version of the 'Dependable Systems' course by Prof. M. Malek and Dr. F. Salfner at Humboldt University, Computer Architecture and Communication Group.

Regulations

Students taking this course need basic knowledge of operating systems and middleware technology. On request of at least one participant, the course will be given in English. The course consists of two modules: lectures and group projects. Successful completion of the project work requires practical experiments in one of the given topics. The results of these experiments must be described in a written report. A passing grade on the report is a mandatory precondition for taking the oral exam. The final course grade is the oral exam grade.

Slides

Project Reports

Dates

  • Lectures: Tue, 13:00 - 14:30 / Wed, 13:00 - 14:30
  • Project Decision / Final Course Enrollment: May 14
  • Project Presentation: July 13-14 (see below)
  • Project Report Submission: July 31 (5-25 pages)

Oral Exam Date              Student
3.8.2010, 09:00 - 09:30     Edgar Naether
3.8.2010, 09:30 - 10:00     Sven Wagner-Boysen
3.8.2010, 10:00 - 10:30     Benjamin Karran
3.8.2010, 10:30 - 11:00     Marko Röder
3.8.2010, 11:00 - 11:30     Christopher Schuster
3.8.2010, 11:30 - 12:00     Richard Metzler
3.8.2010, 13:30 - 14:00     Matthias Richly
3.8.2010, 14:30 - 15:00
30.9.2010, 09:00 - 09:30    Paul Römer
30.9.2010, 09:30 - 10:00    Martin Schütte
30.9.2010, 10:00 - 10:30    Norman Kluge
30.9.2010, 10:30 - 11:00    Ingo Jaeckel
30.9.2010, 11:00 - 11:30    Jan Schütze
30.9.2010, 11:30 - 12:00    Frank Zschockelt
30.9.2010, 13:30 - 14:00    Jan Brunnert
30.9.2010, 14:30 - 15:00

Project Work

The knowledge gained from the lectures has to be applied in practical project work. Students need to form groups of 2-3 people and work jointly on a dependability experiment from one of the topics given below. Each of the topics is supervised by a member of the Operating Systems and Middleware Group. The study results have to be presented at the end of the semester and documented in a report. Every group needs to answer the following questions in its oral / written presentation of results:

  • How does the product / solution compare to similar solutions?
  • What are the installation / operational experiences?
  • What is the supported fault model on the different hardware / software layers in the investigated solution?
  • What error states are supported / reported?
  • What is the chosen redundancy approach? In case of data replication, what is the functional extent and the consistency model?
  • Is there any specified downtime during error recovery or compensation?
  • What is the performance impact of the chosen fault tolerance technique?

Project Topic List

Clustering of OpenVMS installations for high availability (Norman Kluge)

  • Supervisor: Bernhard Rabe
  • Presentation: July 13th, 13:00

Comparing clustering solutions for J2EE application servers (Edgar Näther, Sven Wagner-Boysen)

  • Products: JBoss and GlassFish application servers
  • Analysis of FT-clustering capabilities for web tier and business tier
  • Documentation of installation experience
  • How well is the clustered deployment of applications solved?
  • Performance comparison of clustered solutions (2-tier J2EE application, no data tier)
  • Supervisors: Frank Feinbube, Robert Wierschke
  • Presentation: July 13th, 13:30

Virtualization Fault Tolerance - Feature analysis (Matthias Richly, Christopher Schuster)

  • Analyze available solutions for fault-tolerant operation of virtual machines
  • In-depth investigation of one particular product, preferably VMware
  • Analysis of failover capabilities (e.g. open network connections)
  • Supervisor: Bernhard Rabe
  • Presentation: July 13th, 14:00

Dependability analysis of railroad design alternatives (free)

  • Supervisors: Uwe Hentschel, Jan-Arne Sobarnia

Software-implemented fault injection in Windows (Benjamin Karran)

  • Implementation of an operating system-based fault injector as driver
  • Interception of system calls for chosen processes and threads (e.g. disk full notification)
  • Comparative test with different applications regarding their error handling capabilities
  • Supervisor: Peter Tröger
  • Presentation: July 13th, 14:30

Software-implemented fault injection in Linux (Frank Zschockelt, Paul Römer)

  • Implementation of an operating system-based fault injector as driver
  • Injection of single bit stuck-at and bit-flip faults at chosen locations
  • Comparative test with different fault locations under the fail-stop fault model
  • Presentation: July 14th, 13:00
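
To make the two fault types named in this topic concrete, the following Python sketch applies them to a plain integer. It is only a user-space illustration under stated assumptions, not the driver-based injector the project asks for; the function names flip_bit and stuck_at are hypothetical.

```python
def flip_bit(value: int, bit: int) -> int:
    """Bit-flip fault: invert one bit of a non-negative value once."""
    if value < 0 or bit < 0:
        raise ValueError("value and bit index must be non-negative")
    return value ^ (1 << bit)


def stuck_at(value: int, bit: int, level: int) -> int:
    """Stuck-at fault: force one bit to a fixed level (0 or 1).

    A real stuck-at fault is permanent, so this function would have to be
    applied to every value read from the faulty location.
    """
    if value < 0 or bit < 0:
        raise ValueError("value and bit index must be non-negative")
    if level not in (0, 1):
        raise ValueError("level must be 0 or 1")
    mask = 1 << bit
    return value | mask if level == 1 else value & ~mask


if __name__ == "__main__":
    word = 0b1010
    print(bin(flip_bit(word, 0)))     # bit 0 inverted
    print(bin(stuck_at(word, 1, 1)))  # bit 1 already 1: fault is masked
    print(bin(stuck_at(word, 1, 0)))  # bit 1 forced to 0
```

Note that a stuck-at fault only becomes visible when the stored bit differs from the forced level, whereas a bit flip always changes the value. This difference is one reason to compare results across fault locations.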

FT CORBA (free)

  • Supervisor: Martin v. Löwis

Windows Cluster Services for High Availability (free)

  • Supervisor: Alexander Schmidt

Linux HA Cluster - Available solutions and their properties (Jan Brunnert, Ingo Jaeckel)

  • Supervisor: Peter Tröger
  • Presentation: July 14th, 13:30

Non-relational databases (Richard Metzler, Jan Schütze)

  • Comparison of different NoSQL database products (e.g. Cassandra, CouchDB, Riak, Gizzard)
  • Deeper comparison of chosen subset with respect to: fault model, replication approach, data lookup handling, consistency model
  • Supervisor: Peter Tröger
  • Presentation: July 14th, 14:00

Distributed Fault-Tolerant File Systems (Martin Schütte, Marko Roeder)

  • Comparison of different distributed file systems with respect to the questions above
  • Examples: Coda, Microsoft DFS, dCache, IBM GPFS, Hadoop HDFS
  • Presentation: July 14th, 14:30

Recommended Readings

Recommended readings (external link)