CS153 lab: Eulerian Path
In this lab we will solve the
Eulerian
Path Problem (from Chapter 3) on Rosalind.
This should help prepare you for the next homework problem (ba3h).
We will build on ba3f.py, a solution to
the "Eulerian Cycle Problem" that we considered last week.
Recall from the textbook, a directed graph is Eulerian if
there is a cycle using every edge exactly once.
In fact, a directed graph is Eulerian if it has these two properties:
 Every vertex is balanced (indegree = outdegree).
 The graph is strongly connected (you can get from any
vertex to any other).
Now in this problem, we want to know if there is an Eulerian path.
That is, we still have to use every edge exactly once, but unlike
a cycle, the path does not have to end where it started. The
"TODO" items explain how we will solve this problem.
Note we will not have a lab next week, instead we
will review on Monday, and have a midterm exam on Wednesday.
SCRIPT (your steps)
 First find this web page.
If you delete 'SCRIPT.html' in the address bar, you should see a web listing
of the lab files.

Make your personal copy of the lab files. If you are on your own
machine, you can download them (as needed) via your web browser. In the CS lab,
you can just copyandpaste these commands to a terminal:
mkdir ~/cs153/lab6
cd ~/cs153/lab6
cp /home/cs153001/share/lab6/* .
ls l
idle3 ba3g.py &
The last line should open your copy
of ba3g.py in the IDLE3 editor.

As usual, you should sign the honor statement, read the comments, and look for the "TODO" items.

As part of this lab, I will first go over ba3f.py,
a full solution to the Rosalind Eulerian Cycle problem, which will
be the starting point for your solution to this lab.
Next, I will edit balanced.py, and put
it in working order. This program is meant to detect whether all
the vertices in the input graph are balanced. That will be useful
to you, as you can see in the "TODO" items of your code.
Note we have several sample input files, and I have written all these
programs so that we can specify the input filename on the command
line.
Furthermore, ba3f.py writes its output to
file out.txt, since it can be rather long. So in Rosalind,
you can just submit this output file rather than trying to
copyandpaste the output.

As usual, you should submit your code via Rosalind, and also leave a
copy of your working program in your personal cs153/lab6/
directory. It is due in a week (night of 10/18). If you miss the
deadline, you can still try to finish it late for partial credit (via
email or office hours).
Further Remarks
The textbook proposes finding an Eulerian path in the de Bruijn graph
as a way to assemble genomes. In class I claimed this was
impractical, because it requires "perfect coverage" (every kmer
represented the correct number of times, and no errors). However,
Pevzner did help develop a practical genome assembler, EULER, based on
such ideas (plus a lot more).
His papers
[2001,
2008]
give more details, but we won't get into them.