CS153 lab: Consensus and Profile

Today we solve the Consensus and Profile problem (cons) on Rosalind. This is not a textbook problem, but it is closely related to Chapter 2. The starter program is mostly done, and this lab is easier than the last one. It also gives us some practice with arrays, which should help with hw2.
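For orientation, here is a minimal sketch of the two array-heavy steps the problem asks for: tallying a Counts array (what Rosalind calls the profile) and reading off a consensus string. It is not the lab's cons.py. The function names, the dict-of-lists layout, and the toy input are hypothetical, and the sketch assumes all strings have the same length and contain only A, C, G, T.

```python
# Hypothetical sketch, not the lab's cons.py.
# Assumes equal-length DNA strings over the alphabet A, C, G, T.
BASES = "ACGT"

def counts_array(dnas):
    """Return a dict mapping each base to a list of per-column counts."""
    n = len(dnas[0])
    counts = {b: [0] * n for b in BASES}
    for s in dnas:
        for j, ch in enumerate(s):
            counts[ch][j] += 1  # one more occurrence of base ch in column j
    return counts

def consensus(counts):
    """For each column, pick a base with the largest count.

    Python's max() returns the first maximal item, so ties go to the
    base that comes earliest in "ACGT".
    """
    n = len(counts["A"])
    return "".join(max(BASES, key=lambda b: counts[b][j]) for j in range(n))

if __name__ == "__main__":
    toy = ["ATCG", "ATGG", "ACCG"]  # hypothetical toy input
    c = counts_array(toy)
    print(consensus(c))
    # One line per base, in the "A: 3 0 0 0" style of Rosalind's sample output.
    for b in BASES:
        print(b + ":", " ".join(str(x) for x in c[b]))
```

On the toy input this prints the consensus ATCG followed by the four count lines. Breaking ties by alphabetical order is just one choice; Rosalind accepts any one consensus string when several are possible.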

SCRIPT (your steps)

  1. First, find this web page.
    If you remove 'SCRIPT.html' from the address in the address bar, you should see a listing of today's files.

  2. Open a terminal (start 'bash' if necessary). To make your initial copy of the lab files, copy and paste these commands:

    After running them, you should see a listing of your lab files. File cons.py is the main program for you to edit. File rosalind_cons.txt is a sample input; you are welcome to overwrite it when you download a larger test input from Rosalind.

  3. Edit your copy of cons.py and look for the "TODO" items.

  4. When you think you are done with cons.py, you should try to solve the problem on Rosalind. When Rosalind gives you a test input to download, just overwrite your copy of rosalind_cons.txt.

    REMARK: when solving problems for CS153 on Rosalind, make sure you see ?class=435 at the end of the address in the address bar. That means you are solving the problem in the context of our class, not just as an independent learner.

  5. Finally, be sure to leave a copy of your working program in your lab4 directory; I will look for your ~/cs153/lab4/cons.py file. This should happen automatically if you finish in the CS lab, but it may require an extra step if you submit to Rosalind from outside the lab (for example, from a laptop). Let me know if you need advice on copying files into the CS lab.

Counts array versus Profile array

The 'Profile' array computed in this problem is called the 'Counts' array in our textbook. To get the textbook 'Profile' array, divide each entry by its column sum (without pseudocounts, that sum is just the number of strings), so that each column becomes a probability distribution summing to one. Optionally, we might first add a pseudocount to each count, to avoid zero probabilities.
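A minimal sketch of that conversion follows, assuming the Counts array is stored as a dict from each base to a list of per-column counts (a hypothetical layout; your cons.py may store it differently). The function name `profile_from_counts` and its `pseudocount` parameter are also hypothetical.

```python
# Hypothetical sketch: convert a Counts array into a textbook-style Profile
# array in which every column sums to one.
# Assumes counts is a dict {base: [count per column]} with at least one string.
BASES = "ACGT"

def profile_from_counts(counts, pseudocount=0):
    """Divide each column by its sum, after adding `pseudocount` to every entry."""
    n = len(counts["A"])
    profile = {b: [0.0] * n for b in BASES}
    for j in range(n):
        column = [counts[b][j] + pseudocount for b in BASES]
        total = sum(column)  # number of strings, plus 4 * pseudocount
        for b, c in zip(BASES, column):
            profile[b][j] = c / total
    return profile

if __name__ == "__main__":
    counts = {"A": [3, 0], "C": [0, 1], "G": [0, 0], "T": [0, 2]}  # toy counts
    print(profile_from_counts(counts))                 # zeros stay zero
    print(profile_from_counts(counts, pseudocount=1))  # no zero probabilities
```

With `pseudocount=1`, a column that had counts 3, 0, 0, 0 becomes 4/7, 1/7, 1/7, 1/7, so no base gets probability zero.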