### Finding the Longest Common Subsequence

• Definition: subsequence

• Subsequence:

 Subsequence = a sequence that can be derived from another sequence by deleting zero or more elements without changing the order of the remaining elements.

• Examples of subsequences

```
String:  X = CGATAATTGAGA

S1 = CGAT is a subsequence of X because:

    CGATAATTGAGA
    ^^^^           (keep the marked characters, delete the rest)

S2 = ATTTA is a subsequence of X because:

    CGATAATTGAGA
      ^^  ^^ ^     (keep the marked characters, delete the rest)
```

• Examples that are not subsequences

```
String:  X = CGATAATTGAGA

S1 = ACGA is not a subsequence of X:

    CGATAATTGAGA      No C after A

S2 = CAAC is not a subsequence of X:

    CGATAATTGAGA      No C after A
```
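The definition can be checked mechanically by scanning X from left to right and matching the characters of the candidate in order. The following is a minimal sketch of that idea; the function name `is_subsequence` is chosen here for illustration and does not come from the notes.

```python
def is_subsequence(s: str, x: str) -> bool:
    """Return True if s can be obtained from x by deleting zero or more characters."""
    pos = 0  # next position in x that is still available for matching
    for ch in s:
        pos = x.find(ch, pos)  # leftmost occurrence of ch at or after pos
        if pos == -1:
            return False       # ch does not occur in the remaining part of x
        pos += 1               # characters must be used in order, so move past the match
    return True


X = "CGATAATTGAGA"
print(is_subsequence("CGAT", X))   # True
print(is_subsequence("ATTTA", X))  # True
print(is_subsequence("ACGA", X))   # False: no C after A
print(is_subsequence("CAAC", X))   # False: no C after A
```

Always taking the leftmost possible match is safe: it leaves as much of X as possible for the characters that still have to be matched.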

• Common subsequence

• Common subsequence:

• C is a common subsequence of A and B iff:

 C is a subsequence of A and C is a subsequence of B

• Examples: common subsequence

```
String 1:  X = ABCABCABC
String 2:  Y = BABACBAB

(^ marks one possible way to match the characters)

Common subsequence 1:  AAA

    X = ABCABCABC
        ^  ^  ^
    Y = BABACBAB
         ^ ^  ^

Common subsequence 2:  ABAA

    X = ABCABCABC
        ^^ ^  ^
    Y = BABACBAB
         ^^^  ^

Common subsequence 3:  ABABA

    X = ABCABCABC
        ^^ ^^ ^
    Y = BABACBAB
         ^^^ ^^

Common subsequence 4:  ABABAB

    X = ABCABCABC
        ^^ ^^ ^^
    Y = BABACBAB
         ^^^ ^^^
```
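The "iff" condition translates directly into two subsequence tests. This is a small sketch under that reading; `is_subsequence` and `is_common_subsequence` are illustrative names, not part of the notes.

```python
def is_subsequence(s: str, x: str) -> bool:
    """Return True if s is a subsequence of x."""
    remaining = iter(x)
    # "ch in remaining" advances the iterator until ch is found,
    # so every later character must be matched further to the right.
    return all(ch in remaining for ch in s)


def is_common_subsequence(c: str, a: str, b: str) -> bool:
    """C is a common subsequence of A and B iff it is a subsequence of both."""
    return is_subsequence(c, a) and is_subsequence(c, b)


X = "ABCABCABC"
Y = "BABACBAB"
for c in ("AAA", "ABAA", "ABABA", "ABABAB"):
    print(c, is_common_subsequence(c, X, Y))  # each line ends with True

print(is_common_subsequence("CC", X, Y))      # False: Y contains only one C
```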

• Longest Common Subsequence (LCS)

• The LCS problem:

 Given 2 strings X and Y, find the longest common subsequence of X and Y.

• Applications

• Text similarity testing:

• Given 2 text documents

• The longer their common subsequence, the better the match between the 2 text documents:

 When you copy a text and then change the original, the copy and the original are still highly likely to share a long common subsequence.

• The LCS algorithm can help you detect plagiarized documents

• Gene similarity:

• When 2 gene sequences (strings) have a large (long) common subsequence, the genes are more alike:

 When you copy a gene sequence and then mutate the original, the copy and the original are still highly likely to share a long common subsequence.

• The LCS algorithm will allow you to discover genes that are similar

• Naive solution: the brute force approach

• Find longest common subsequence:

```
X = BAA
Y = BBAB
```

• Brute force solution:

• Generate every subsequence of X:

```
X = BAA

All subsequences:

   Delete 0 characters:   BAA
   Delete 1 character:    AA  BA  BA
   Delete 2 characters:   A   A   B

Check (distinct, longest first):   BAA  AA  BA  A  B
```

• Find the longest subsequence that is also a subsequence of Y:

```
Y = BBAB

   BAA    Not a subsequence
   AA     Not a subsequence
   BA     YES! Subsequence   (no need to check shorter ones)
```
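The two brute force steps above can be sketched as follows. This is only an illustration of the naive method: the names `brute_force_lcs` and `is_subsequence` are made up for this sketch, and `itertools.combinations` is used here as one convenient way to pick which positions of X to keep.

```python
from itertools import combinations


def is_subsequence(s: str, y: str) -> bool:
    """Return True if s is a subsequence of y."""
    remaining = iter(y)
    return all(ch in remaining for ch in s)


def brute_force_lcs(x: str, y: str) -> str:
    """Try every subsequence of x, longest first; return the first one found in y."""
    for length in range(len(x), -1, -1):
        # Each combination is an increasing tuple of positions of x to keep.
        for kept in combinations(range(len(x)), length):
            candidate = "".join(x[i] for i in kept)
            if is_subsequence(candidate, y):
                return candidate  # no need to check shorter ones
    return ""  # not reached: the empty string is always a common subsequence


print(brute_force_lcs("BAA", "BBAB"))  # BA
```

Trying the longest candidates first means the search can stop at the first hit, but in the worst case every subsequence of X is still generated.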

• Running time for the Brute Force method

• How many subsequences can you make from a string:

```
X = x0 x1 x2 .... xn-1

Subsequence:   pick x0    OR  do not pick x0
               pick x1    OR  do not pick x1
               ....
               pick xn-1  OR  do not pick xn-1

=====>  2 choices for each of the n characters: 2^n subsequences !!!
```

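• Running time of the brute force method for LCS:

 O(2^n), where n is the length of X (every one of the 2^n subsequences of X may have to be tested against Y)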
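To make the pick / do-not-pick count concrete, the sketch below enumerates every combination of choices for a short string and then prints how fast 2^n grows. The helper name `all_subsequences` is an assumption made for this example; note that the count includes the empty subsequence (nothing picked) and counts duplicates such as the two ways of forming BA from BAA.

```python
from itertools import product


def all_subsequences(x: str) -> list[str]:
    """Build one subsequence for every pick / do-not-pick choice per character."""
    result = []
    for choice in product([False, True], repeat=len(x)):
        # Keep x[i] exactly when the i-th choice is "pick".
        result.append("".join(ch for ch, keep in zip(x, choice) if keep))
    return result


subs = all_subsequences("BAA")
print(len(subs))       # 8, which is 2**3
print(len(set(subs)))  # 6 distinct strings, including the empty one

# The number of choices doubles with every extra character:
for n in (10, 20, 40, 80):
    print(n, 2 ** n)
```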

So: do not even try !!!

You won't live long enough to see the answer....