### Linear-space algorithm to find the length of the LCS

• Introduction: space usage for the LCS problem

• We have previously studied the LCS algorithm to find the length of the LCS (longest common subsequence):

```
public static int solveLCS(String x, String y)
{
   int i, j;
   // Table of sub-problem answers: L[i][j] = LCS length of
   // the first i characters of x and the first j characters of y
   int[][] L = new int[x.length()+1][y.length()+1];

   /* ===============================================
      Initialize the base cases
      =============================================== */
   for (i = 0; i < x.length()+1; i++)
      L[i][0] = 0;                      // y = "" ===> LCS = 0

   for (j = 0; j < y.length()+1; j++)
      L[0][j] = 0;                      // x = "" ===> LCS = 0

   /* =====================================================
      Bottom-up (smaller to larger) computation of L[i][j]
      ===================================================== */
   for (i = 1; i < x.length()+1; i++)
   {
      for (j = 1; j < y.length()+1; j++)
      {
         if ( x.charAt(i-1) == y.charAt(j-1) )
         {
            L[i][j] = L[i-1][j-1] + 1;
         }
         else
         {
            L[i][j] = Math.max( L[i-1][j] , L[i][j-1] );
         }
      }
   }

   return L[x.length()][y.length()];
}
```

The method will return the length of the LCS of the input strings x and y.

• Notice that this algorithm uses a 2-dimensional array of size (m+1) × (n+1), where m and n are the numbers of characters in the two input strings.

• Therefore:

 The amount of computer memory (used for the variables) needed to solve the LCS problem is O(m × n).

• This memory requirement puts a severe limit on the usefulness of the LCS algorithm.

Example:

 To find the LCS in 2 strings of length 100, we need a 2-dimensional array with 100 × 100 = 10,000 elements

 To find the LCS in 2 strings of length 1,000, we need a 2-dimensional array with 1,000 × 1,000 = 1,000,000 elements

 To find the LCS in 2 strings of length 10,000, we need a 2-dimensional array with 10,000 × 10,000 = 100,000,000 (100M) elements

 To find the LCS in 2 strings of length 100,000, we need a 2-dimensional array with 100,000 × 100,000 = 10,000,000,000 (10G) elements
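To turn the element counts above into bytes, here is a small sketch. It assumes each table entry is a Java `int` (4 bytes) and ignores array header overhead; the class and method names are made up for this illustration.

```java
final class LcsTableMemory {
    /** Bytes for an (m+1) x (n+1) table of 4-byte ints (array headers ignored). */
    static long tableBytes(long m, long n) {
        return (m + 1) * (n + 1) * Integer.BYTES;
    }

    public static void main(String[] args) {
        long[] lengths = {100, 1_000, 10_000, 100_000};
        for (long len : lengths) {
            // Both input strings are assumed to have the same length here.
            System.out.printf("length %,d: about %,d bytes%n", len, tableBytes(len, len));
        }
    }
}
```

Under these assumptions, two strings of length 100,000 already need roughly 40 GB for the table alone.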

Facts:

 The LCS algorithm is used in bioinformatics to compare DNA sequences

 A DNA sequence can contain millions of "letters"

 An LCS algorithm that needs O(m×n) memory is impractical for sequences of that size (not enough memory to run the algorithm) !!!

 We must reduce the memory utilization of the LCS algorithm for it to be useful !!!

• Constructing a Linear space algorithm

• In the LCS algorithm, the values L[i][j] are computed in order of increasing i (row by row):

```
public static int solveLCS(String x, String y)
{
   int i, j;
   int[][] L = new int[x.length()+1][y.length()+1];

   /* ===============================================
      Initialize the base cases
      =============================================== */
   for (i = 0; i < x.length()+1; i++)
      L[i][0] = 0;                      // y = "" ===> LCS = 0

   for (j = 0; j < y.length()+1; j++)
      L[0][j] = 0;                      // x = "" ===> LCS = 0

   /* =====================================================
      Bottom-up (smaller to larger) computation of L[i][j]
      ===================================================== */
   for (i = 1; i < x.length()+1; i++)              // Row i
   {
      for (j = 1; j < y.length()+1; j++)
      {
         if ( x.charAt(i-1) == y.charAt(j-1) )
         {
            L[i][j] = L[i-1][j-1] + 1;             // Uses a value in row i-1
         }
         else
         {
            L[i][j] = Math.max( L[i-1][j] , L[i][j-1] );  // Uses row i-1 and row i
         }
      }
   }

   return L[x.length()][y.length()];
}
```

• Observation:

• When the values in row i (i.e., the values L[i][j]) are being computed, the algorithm only uses values from row i-1 (i.e., L[i-1][j-1] and L[i-1][j]) and the value just computed in row i itself (i.e., L[i][j-1])

• What I mean is this:

 When the values in row i (i.e., the values L[i][j]) are being computed, the algorithm does not use the values in the rows before row i-1 (i.e., the values L[i-2][j], L[i-3][j], ...)

• Graphically illustrated:

To compute L[i][j], we will use:

```
L[ i-1 ][ j-1 ] + 1                        (when x.charAt(i-1) == y.charAt(j-1))
or:
max ( L[ i-1 ][ j ] , L[ i ][ j-1 ] )      (otherwise)
```
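The same rule can be written as a recurrence. The notation is an assumption made for this sketch: $x_i$ and $y_j$ denote the $i$-th and $j$-th characters of x and y, counted from 1 (so $x_i$ is `x.charAt(i-1)` in the code).

```latex
L[i][j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0 \\
L[i-1][j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j \\
\max\bigl(L[i-1][j],\; L[i][j-1]\bigr) & \text{otherwise}
\end{cases}
```

Every term on the right-hand side has row index i-1 or i, which is why no earlier row is ever read.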

• Conclusion:

• We do not need to preserve the prior rows to compute the next row

• We only need to preserve the last computed row !!!

• Therefore:

• We can recycle (re-use) the rows of the array !!!

• We will use:

 An array K that consists of 2 rows:

 Row 0 of array K represents the values in the last computed row (i.e., L[i-1][j])

 Row 1 of array K holds the newly computed values (i.e., L[i][j])

• The linear space LCS algorithm

• A problem in the original algorithm:

```
public static int solveLCS(String x, String y)
{
   int i, j;
   int[][] L = new int[x.length()+1][y.length()+1];

   /* ===============================================
      Initialize the base cases
      =============================================== */
   for (i = 0; i < x.length()+1; i++)   // ***** Problem ! *****
      L[i][0] = 0;                      // We cannot initialize ALL rows at once !

   for (j = 0; j < y.length()+1; j++)
      L[0][j] = 0;                      // x = "" ===> LCS = 0

   /* =====================================================
      Bottom-up (smaller to larger) computation of L[i][j]
      ===================================================== */
   for (i = 1; i < x.length()+1; i++)
   {
      for (j = 1; j < y.length()+1; j++)
      {
         if ( x.charAt(i-1) == y.charAt(j-1) )
         {
            L[i][j] = L[i-1][j-1] + 1;
         }
         else
         {
            L[i][j] = Math.max( L[i-1][j] , L[i][j-1] );
         }
      }
   }

   return L[x.length()][y.length()];
}
```

• We must first move the initialization of L[i][0] into the main for-loop over i (because we will change the data structure, and the rows will no longer all exist at once !) and obtain:

```
public static int solveLCS(String x, String y)
{
   int i, j;
   int[][] L = new int[x.length()+1][y.length()+1];

   /* ===============================================
      Initialize the base cases
      =============================================== */
   for (j = 0; j < y.length()+1; j++)
      L[0][j] = 0;                      // x = "" ===> LCS = 0

   /* =====================================================
      Bottom-up (smaller to larger) computation of L[i][j]
      ===================================================== */
   for (i = 1; i < x.length()+1; i++)
   {
      L[i][0] = 0;      // <<<--- moved HERE first ******
                        // Because we can't initialize ALL rows at once
                        // The 0-th element in row i is now
                        // initialized when row i is USED

      for (j = 1; j < y.length()+1; j++)
      {
         if ( x.charAt(i-1) == y.charAt(j-1) )
         {
            L[i][j] = L[i-1][j-1] + 1;
         }
         else
         {
            L[i][j] = Math.max( L[i-1][j] , L[i][j-1] );
         }
      }
   }

   return L[x.length()][y.length()];
}
```

• Next we replace:

```
L[ i-1 ] [j-1]   ----->   K[ 0 ] [j-1]
L[ i-1 ] [j]     ----->   K[ 0 ] [j]
L[ i ]   [j]     ----->   K[ 1 ] [j]
L[ i ]   [j-1]   ----->   K[ 1 ] [j-1]
```

in the modified algorithm above, and we obtain a linear-space algorithm for the LCS problem:

```
public static int solveLCS(String x, String y)
{
   int i, j;
   // Only 2 rows: K[0] = last computed row, K[1] = row being computed
   int[][] K = new int[2][y.length()+1];

   /* ===============================================
      Initialize the base cases
      =============================================== */
   for (j = 0; j < y.length()+1; j++)
      K[0][j] = 0;                      // x = "" ===> LCS = 0

   for (i = 1; i < x.length()+1; i++)
   {
      K[1][0] = 0;        // Piecemeal initialization of L[i][0]

      for (j = 1; j < y.length()+1; j++)
      {
         if ( x.charAt(i-1) == y.charAt(j-1) )
         {
            K[1][j] = K[0][j-1] + 1;
         }
         else
         {
            K[1][j] = Math.max( K[0][j] , K[1][j-1] );
         }
      }

      /* =====================================================
         Recycle phase: copy row K[1][...] to row K[0][...]
         ===================================================== */
      for ( j = 0; j < y.length()+1; j++)
         K[0][j] = K[1][j];
   }

   // The value of LCS is in K[1][y.length()]
   // (still 0 if x is empty, because Java zero-fills new int arrays)
   return K[1][y.length()];
}
```
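As a sanity check, the two-row method can be compared against the full-table method on a few inputs. The following is a minimal self-contained sketch, not the course's demo program: the class name `LcsLinearSpaceDemo`, the method names, and the test strings are all chosen for this illustration, and `System.arraycopy` stands in for the hand-written copy loop.

```java
final class LcsLinearSpaceDemo {
    /** Reference version: full (m+1) x (n+1) table. */
    static int lcsFullTable(String x, String y) {
        int[][] L = new int[x.length() + 1][y.length() + 1]; // zero-filled by Java
        for (int i = 1; i <= x.length(); i++) {
            for (int j = 1; j <= y.length(); j++) {
                if (x.charAt(i - 1) == y.charAt(j - 1)) {
                    L[i][j] = L[i - 1][j - 1] + 1;
                } else {
                    L[i][j] = Math.max(L[i - 1][j], L[i][j - 1]);
                }
            }
        }
        return L[x.length()][y.length()];
    }

    /** Two-row version: K[0] is the last computed row, K[1] the row being computed. */
    static int lcsTwoRows(String x, String y) {
        int n = y.length();
        int[][] K = new int[2][n + 1];                   // K[0] starts as the all-zero base row
        for (int i = 1; i <= x.length(); i++) {
            K[1][0] = 0;                                 // base case for column 0
            for (int j = 1; j <= n; j++) {
                if (x.charAt(i - 1) == y.charAt(j - 1)) {
                    K[1][j] = K[0][j - 1] + 1;
                } else {
                    K[1][j] = Math.max(K[0][j], K[1][j - 1]);
                }
            }
            System.arraycopy(K[1], 0, K[0], 0, n + 1);   // recycle: row 1 becomes row 0
        }
        return K[0][n];   // equals K[1][n] after the last copy; 0 if x is empty
    }

    public static void main(String[] args) {
        String x = "ABCBDAB", y = "BDCABA";              // an LCS is "BCBA"
        System.out.println(lcsFullTable(x, y));          // prints 4
        System.out.println(lcsTwoRows(x, y));            // prints 4
    }
}
```

Returning `K[0][n]` instead of `K[1][n]` is a small design choice in this sketch: both hold the same value after the copy step, and row 0 is also correct when the outer loop never runs.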

• Example Program: (Demo above code)

How to run the program:

 Right click on link and save in a scratch directory

 To compile:   javac LCS_lin_space.java

 To run:          java LCS_lin_space

• Note: why is the algorithm linear space ?

• The algorithm uses a 2-dimensional array K[ ][ ] of size 2 × (n+1), where n is the length of the second string.

 Because 2 × (n+1) is O(n), the memory used by the algorithm grows linearly with the length of the second string. That's why the algorithm is linear space.

(The previous algorithm uses a 2-dimensional array L[ ][ ] of size (m+1) × (n+1), where m and n are the lengths of the first and the second string.)
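Two optional refinements follow from this analysis; neither is part of the algorithm above, and the sketch below is only one possible way to write them. First, since the row length depends only on the second string, passing the shorter string as y brings the space down to O(min(m, n)), which is valid because the LCS length does not depend on the order of the two strings. Second, swapping two array references replaces the copy loop of the recycle phase. The names `LcsRowSwap`, `prev`, and `curr` are made up for this example.

```java
final class LcsRowSwap {
    static int lcsLength(String x, String y) {
        // Make y the shorter string so the rows are as short as possible.
        if (y.length() > x.length()) {
            String t = x; x = y; y = t;
        }
        int n = y.length();
        int[] prev = new int[n + 1];   // plays the role of K[0] (row i-1), starts all zero
        int[] curr = new int[n + 1];   // plays the role of K[1] (row i)

        for (int i = 1; i <= x.length(); i++) {
            curr[0] = 0;               // base case for column 0
            for (int j = 1; j <= n; j++) {
                if (x.charAt(i - 1) == y.charAt(j - 1)) {
                    curr[j] = prev[j - 1] + 1;
                } else {
                    curr[j] = Math.max(prev[j], curr[j - 1]);
                }
            }
            // Recycle by swapping references instead of copying n+1 values.
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[n];                // after the swap, prev is the last computed row
    }

    public static void main(String[] args) {
        System.out.println(lcsLength("ABCBDAB", "BDCABA"));   // prints 4
    }
}
```

The stale values left in `curr` after a swap are harmless: every entry is overwritten before it is read in the next pass.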