Distributed Systems, Fall 2001, 3 cu

(Hajautetut järjestelmät, Syksy 2001, 3 ov)

Coursework

General information

Material

Exercises

Literature

The Distributed Transaction Management course starts up on 31.10. The lecture times and places are the same as in the Distributed Systems course.

New instructions on how to return your coursework can be found in the coursework section.

The lectures are finished. The last exercise session will be on Tuesday 30.10. As altogether 20 exercise problems have been given, the required fourth of the exercises is 5 exercises. You can still do them all on the last week, if you wish to do so, as there are five problems, and they are not that difficult.

The exam for this term will be on 15.11.2001. You have to register to the exam. You get to the registration page from here. You can take material to the exam with you (no computers, no-one to assist you). You may answer in English or in Finnish.


General information

If there are foreing students present, then the teaching will be given in English. The course material will be in English in any case.

In principle, this is a fairly traditional course. There are lectures, weekly exercises, and courseworks (probably two). However, the pattern is slightly modernised. Usually, I have something I want to show or explain. For that, there will be traditional lecturing with software demonstrations. You should primarily learn by doing, however. That is why there will be weekly exercises.

I will try to give the exercises on Wednesdays and you will have to do them by the following Wednesday. This way, you can ask about them on Tuesdays, if there are some problems about the exercises. There are no separate exercise groups or sessions - we will do them at the start of the Wednesday sessions.

The course will include a coursework, which is a programming project. In some ways, this is a fairly traditional computer science course. There are lectures (which may not always be like traditional lectures, though), and weekly exercises. There is also a coursework, which is a programming project, and an exam.

The lectures for this course will run for 6 weeks. There are no lectures on 18.9. and 19.9., which means that we can estimate that the last lecture time will be 24.10. My current plan is to do a course on distributed transaction management after this - same lecture time.

Material

I will put leftover copies of the notes in the bookshelf opposite to room 3082 (not far from my office).

Check out the distributed commit simulation applet!

This is the place for the cryptographic examples

One file, Protection.java, was initally missing from the cryptographic examples. It is now there. It is used in StrongServer/StrongClient example.

JCE installations to CS lab 3070 are done and I have checked them by compiling and running "Masher". If you have problems, please contact me.

If the jdk is not in the path, you will either have to add it, like PATH=C:\jdk1.3\bin;%PATH% assuming, of course, that that is the path.

Here is the algorithm missing from Note 4:

Reliable multicast algorithm

On intialization
  Received:={};

For process p to R-multicast message m to group g
  B-multicast(g,m); // also send to self, that is, p

On B-deliver(m) at process q ith g=group(m)
  if (m not in Received) then 
    Received:=Received union {m};
    if (q<>p) then 
      B-multicast(g,m);
    endif;
    R-deliver(m);
  endif;

Exercise information

The answer to the first exercise on 26.9. is 45 and you can find here a program to compute the answer.

Exercises on 26.9.

The first exercises are on 26.9. and the exercises assigned for that day are exercises 1, 2, and 3 from Note 2. A clarification for Exercise 3: First show that if vector timestamps are assigned by the algorithm given in the note and compared with the operations given in the note, then if e happens before e', then the vector timestamp of e is less than that of e'. Then study whether the following holds: If the vector timestamp of e is less than the vector timestamp of e', then e happens before e'.

Exercises on 3.10.

  1. Reformulate the Bully algorithm so that it is only necessary to decide on the coordinator and there is no need to distribute a new task to the participating processes, and if a site times out (does not hear from the others), it re-starts the election.
  2. How does the Bully algorithm behave when a network partitions?
  3. Suppose that a distribute snapshot is taken using the method given in the lecture notes. If the causal order of all events is the same, do we always get the same snapshot?

Exercises on 10.10.

  1. Demonstrate by example that in multicasting, delivery in causal order, FIFO order, and total order can all be genuinely different.
  2. Solve Byzantine Generals for three generals in the case where the generals sign digitally their messages, ie if a general passes on, say, top generals message, you will know if it was genuinely from the top general.
  3. Construct a solution to a reliable and totally ordered multicast in a synchronous system, using a reliable multicast and a solution to the consensus problem. You may just assume that these two solutions exist.

Exercises on 17.10.

  1. Demonstrate by example, that if you just collect (no snapshot algorithm) local waits-for graphs (graphs showing which processes are waiting for which processes' locks), then they might show a deadlock which does not really happen.
  2. Is it true that if the coordinator does not crash, then the two-phase commit protocol (2PC) does not block ie the correct processes do not need to wait for the recovery of the crashed ones. Why?
  3. Discuss the difficulties in solving the consensus using an atomic commit protocol, or doing atomic commitment using consensus.

Exercises on 24.10.

These exercises are based on the cryptographic example programs.
  1. Make a program, which writes one message (fixed-length, whatever) to a socket along with a digest code of the message, and another program, which waits for the message and the digest code, and when it retrives them, it checks the message agains the digest code, and prints out if the code matches the message. You can get the digest code part from the Masher example, and the communication part from the StrongClient/StrongServer example.
  2. Make a program, which writes one encrypted message to a socket, and another program, which waits for one encrypted message and when it retrives it, it decrypts it and prints it. Use the cipher class to do the encryption and decryption. You can get the encryption/decryption part from the SecretWriting example, and the communication part from the StrongClient/StrongServer example. For the encryption and decryption, you need to have the same keyfile available for both programs.
  3. Write a set of programs, with which you can try out the solution for Byzantine Generals problem with three generals. You need a Top General program and an Ordinary General program (of which you run two instances). The programs are to communicate using sockets. You may use command-line arguments (or some other way) to tell the program if it is to act as a faulty general. When you start the programs, you tell one of them to be the faulty one.

Exercises on 30.10. (Tuesday!)

  1. How would you try to synchronise the clocks of two computers connected by a network.
  2. Select a distributed computing problem from the course, where completely synchronised clocks help to solve the problem. How do they help?
  3. Why can adjusting a too fast clock be more of a problem than adjusting a too slow clock?
  4. Select a distributed computing problem from the course, where completely synchronised clocks do not help to solve the problem. Why is that?
  5. In Note 7, several ways were given to optimize file multicast for a large number of recipients. Would it be meaningful to combine any of these ways? Why?

Literature

- Chow & Johnson: Distributed Operating Systems & Algorithms, Addison-Wesley.
- Farley: JAVA Distributed Computing, O'Reilly & Associates.
- Knudsen: JAVA Cryptography, O'Reilly & Associates.
- Mullende: Distributed Systems, ACM Press.
- Coulouris, Dollimore and Kindberg: Distributed Systems - Concepts and Design, Addison-Wesley.

 

Coursework

The coursework will be a programming project, in which leader election is to be implemented. Follow this space for important notices and corrections. When you have finished your implementation, write a document, 1-2 pages, which includes
  1. instructions how to install and run the coursework, and
  2. a short explanation of the structure of your software (e.g. what is the role of each class).
I want to see a demo about the system. For that, book a time to see me in my office. If the implementation uses something non-standard (ie. does not run on a PC with a standard Java implementation sdk 1.3 or so with the cryptographic extension installed), then let me know.

I would like to see the demos on 3.-5.12. I will be away for the last week of November, but up to 23.11. seems possible. The demo times will be on between 12.00 and 16.30 on each of these days. The demo is expected to last 15 minutes. When you have done the coursework, send me e-mail telling which of these times are suitable for you, of if you want to do this before 23.11.

You are to implement a participant, which works as follows. All participants know of all other participants. When a new participant is started, it is given as input (command-line parameters are just fine for this):

If the participant is the first one, then you have to somehow tell this to the participant at start-up. Then this participant will also become the coordinator.

The new participant will then contact the old participant, and and the old participant will tell it everyone's IDs, IP addresses and ports, and the IP address and port and ID of the current coordinator. After this, the new participant will contact everyone to let them know that they can add it to the set of participants.

If someone of these sites does not answer, it will inform all the other sites about this, and the other sites can update their list of existing sites.

If the coordinator does not answer, then the new participant will start a coordinator election (Bully Algorithm or Invitation Election will do).

The programs do not need to accept input from keyboard after they have got the starting parameters, but they should print out some information about the events. We can kill processes with a simple ctrl-C.

All processes sign their messages digitally, and all message signatures are checked. All keys are stored in a java keystore file, a copy of which all programs have in their working directory. This also means that you have a predefined set of participants (say, maximum 20 participants) for which you have created a key in the keystore file.

You may co-operate when doing this, but everyone returns their individual coursework.

The deadline for the coursework is 30.11.