CS153 lab: Eulerian Path

In this lab we will solve the Eulerian Path Problem (from Chapter 3) on Rosalind. This should help prepare you for the next homework problem (ba3h).

We will build on ba3f.py, a solution to the "Eulerian Cycle Problem" that we considered last week.

Recall from the textbook, a directed graph is Eulerian if there is a cycle using every edge exactly once.

In fact, a directed graph is Eulerian if it has these two properties:

Now in this problem, we want to know if there is an Eulerian path. That is, we still have to use every edge exactly once, but unlike a cycle, the path does not have to end where it started. The "TODO" items explain how we will solve this problem.

Note we will not have a lab next week, instead we will review on Monday, and have a midterm exam on Wednesday.

SCRIPT (your steps)

  1. First find this web page.
    If you delete 'SCRIPT.html' in the address bar, you should see a web listing of the lab files.

  2. Make your personal copy of the lab files. If you are on your own machine, you can download them (as needed) via your web browser. In the CS lab, you can just copy-and-paste these commands to a terminal: The last line should open your copy of ba3g.py in the IDLE3 editor.

  3. As usual, you should sign the honor statement, read the comments, and look for the "TODO" items.

  4. As part of this lab, I will first go over ba3f.py, a full solution to the Rosalind Eulerian Cycle problem, which will be the starting point for your solution to this lab.

    Next, I will edit balanced.py, and put it in working order. This program is meant to detect whether all the vertices in the input graph are balanced. That will be useful to you, as you can see in the "TODO" items of your code.

    Note we have several sample input files, and I have written all these programs so that we can specify the input filename on the command line. Furthermore, ba3f.py writes its output to file out.txt, since it can be rather long. So in Rosalind, you can just submit this output file rather than trying to copy-and-paste the output.

  5. As usual, you should submit your code via Rosalind, and also leave a copy of your working program in your personal cs153/lab6/ directory. It is due in a week (night of 10/18). If you miss the deadline, you can still try to finish it late for partial credit (via email or office hours).

Further Remarks

The textbook proposes finding an Eulerian path in the de Bruijn graph as a way to assemble genomes. In class I claimed this was impractical, because it requires "perfect coverage" (every k-mer represented the correct number of times, and no errors). However, Pevzner did help develop a practical genome assembler, EULER, based on such ideas (plus a lot more). His papers [2001, 2008] give more details, but we won't get into them.