
Input set:      11 21 24 61 81 39 89 56 12 51
After sorting:  11 12 21 24 39 51 56 61 81 89

The 0.1-quantile = 11
The 0.2-quantile = 12
etc.
Special case: the median = 0.5-quantile = 39
Then the sorted elements are scanned to find the one at position ⌊ φ × N ⌋ (positions counted from 1)
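As a minimal sketch of this exact (sort, then index) approach, the Python below sorts a copy of the input and returns the element at 1-based position ⌊φ × N⌋. The function name `exact_quantile` and the clamping of position 0 to the first element are illustrative assumptions, not part of the notes.

```python
import math

def exact_quantile(values, phi):
    """Exact phi-quantile: the element at 1-based position floor(phi * N) of the sorted input.

    Assumption: position 0 (phi < 1/N) is clamped to the first element.
    """
    ordered = sorted(values)                  # O(N log N) time, O(N) memory
    pos = math.floor(phi * len(ordered))
    pos = min(max(pos, 1), len(ordered))      # keep the position inside 1..N
    return ordered[pos - 1]                   # convert to a 0-based index

data = [11, 21, 24, 61, 81, 39, 89, 56, 12, 51]
print(exact_quantile(data, 0.1), exact_quantile(data, 0.2), exact_quantile(data, 0.5))
# prints: 11 12 39
```

The whole input must be stored, which is exactly what the streaming summary below tries to avoid.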



Input set:      11 21 24 61 81 39 89 56 12 51
After sorting:  11 12 21 24 39 51 56 61 81 89



Example: ε = 0.1

Input set:      11 21 24 61 81 39 89 56 12 51
After sorting:  11 12 21 24 39 51 56 61 81 89
                      ^ #3
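The set of acceptable answers for an ε-approximate query can be listed directly. This sketch assumes the usual definition (an element whose rank r satisfies |r − φN| ≤ εN); the helper name `acceptable_answers` is hypothetical.

```python
import math

def acceptable_answers(values, phi, eps):
    """All elements whose 1-based rank r satisfies |r - phi*N| <= eps*N.

    Any of them is a valid eps-approximate phi-quantile (assumed definition).
    """
    ordered = sorted(values)
    n = len(ordered)
    lo = max(1, math.ceil(phi * n - eps * n))    # smallest acceptable rank
    hi = min(n, math.floor(phi * n + eps * n))   # largest acceptable rank
    return ordered[lo - 1:hi]

data = [11, 21, 24, 61, 81, 39, 89, 56, 12, 51]
print(acceptable_answers(data, 0.2, 0.1))   # [11, 12, 21]
print(acceptable_answers(data, 0.5, 0.1))   # [24, 39, 51]
```

With ε = 0.1 and N = 10, the 0.2-quantile (exact rank 2) may be answered with any element of rank 1 to 3.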

Input:          45 89 98 12 13 55 14 24 26
After sorting:
  Rank:      1  2  3  4  5  6  7  8  9
  Input:    12 13 14 24 26 45 55 89 98
Goal:

(Because every possible element can be queried and you cannot make any error!)
Original input:
  Rank:      1  2  3  4  5  6  7  8  9
  Ranked:   12 13 14 24 26 45 55 89 98
Retain only these elements:
  Rank:      1  2  3  4  5  6  7  8  9
  Ranked:      13       26       89
Approximate answers to quantile queries:
  Rank:      1  2  3  4  5  6  7  8  9
  Ranked:   13 13 13 26 26 26 89 89 89
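One offline way to produce a summary like the one above is to keep the middle element of every block of 2k + 1 consecutive ranks, where k = εN is the allowed rank error (k = 1 here). This is a sketch under that assumption with hypothetical names; it needs the whole sorted input, so it is not yet a streaming algorithm.

```python
def build_static_summary(values, k):
    """Keep the middle element of each block of 2k+1 ranks (k = allowed rank error)."""
    ordered = sorted(values)
    n = len(ordered)
    block = 2 * k + 1                         # ranks represented by one kept element
    kept = []
    for start in range(0, n, block):          # start is a 0-based index
        end = min(start + block, n)           # the last block may be shorter
        kept.append(ordered[(start + end - 1) // 2])   # middle of the block
    return kept, block

def approximate_answer(kept, block, r):
    """Answer a query for 1-based rank r using only the kept elements."""
    return kept[(r - 1) // block]

data = [45, 89, 98, 12, 13, 55, 14, 24, 26]
kept, block = build_static_summary(data, k=1)
answers = [approximate_answer(kept, block, r) for r in range(1, len(data) + 1)]
print(kept)     # [13, 26, 89]
print(answers)  # [13, 13, 13, 26, 26, 26, 89, 89, 89]
```

Because each kept element sits in the middle of its block, the rank of every answer is off by at most k.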
Example usage:

Conclusion:

(you will see this fact when we discuss the algorithm)
( [v_{1}, min_{1}, max_{1}], [v_{2}, min_{2}, max_{2}], ..., [v_{m}, min_{m}, max_{m}] )

where:
  v_{i}   = the value that covers the φ-quantile range
  min_{i} = start position of the φ-quantile range
  max_{i} = end position of the φ-quantile range
Suppose the input stream is:

  12 13 14 24 26 45 55 89 98 ... (more data coming)

(For ease of understanding, here is the sorted list of the input numbers: 12 13 14 24 26 45 55 89 98)

The algorithm represents the current state with:

  [13, 1, 3]  [26, 4, 6]  [89, 7, 9]
Now suppose the next arriving value is 17:
The input stream is now:

  12 13 14 24 26 45 55 89 98 17 ... (more data coming)

(For ease of understanding, here is the sorted list of the input numbers: 12 13 14 17 24 26 45 55 89 98)

The algorithm must modify the state in the data structure to:

  [13, 1, 3]  [17, 4, 4]  [26, 5, 7]  [89, 8, 10]
              ^^^^^^^^^^       ^^^^        ^^^^^
              inserts 17, but must also change the indices in later entries !!!
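The cost of the [v, min, max] representation shows up in a small sketch. It assumes, as in the example, that the new value takes the position right after the range of the last entry whose value is ≤ v; `insert_minmax` is a hypothetical name.

```python
def insert_minmax(state, v):
    """Insert v into a sorted list of [value, min_pos, max_pos] entries.

    Assumption (as in the example): v lands right after the range of the
    last entry whose value is <= v. Returns how many entries were written.
    """
    i = 0
    while i < len(state) and state[i][0] <= v:
        i += 1
    pos = state[i - 1][2] + 1 if i > 0 else 1
    state.insert(i, [v, pos, pos])
    written = 1
    for entry in state[i + 1:]:     # every later entry shifts by one position
        entry[1] += 1
        entry[2] += 1
        written += 1
    return written

state = [[13, 1, 3], [26, 4, 6], [89, 7, 9]]
written = insert_minmax(state, 17)
print(state)    # [[13, 1, 3], [17, 4, 4], [26, 5, 7], [89, 8, 10]]
print(written)  # 3
```

In the worst case (a new minimum) every entry must be rewritten, so one insertion costs O(m) updates.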
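This data structure requires a large number of operations per inserted value: every entry after the insertion point must have its min and max positions updated, so a single insertion can touch all m entries
Although it is useful, it is not efficient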
( [v_{1}, g_{1}], [v_{2}, g_{2}], ..., [v_{m}, g_{m}] )

where:
  v_{i} = the value that covers the φ-quantile range
  g_{i} = number of positions covered by the value
Now suppose the next arriving value is 17:
The input stream is now:

  12 13 14 24 26 45 55 89 98 17 ... (more data coming)

(For ease of understanding, here is the sorted list of the input numbers: 12 13 14 17 24 26 45 55 89 98)

The algorithm must modify the state in the data structure to:

  [13, 3]  [17, 1]  [26, 3]  [89, 3]
           ^^^^^^^      ^^^      ^^^
           inserts 17, but the other information does not need to be updated !!!
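The same insertion with the [v, g] representation, sketched in Python with hypothetical names: one entry is added and no stored count changes, and the start/end positions are recovered on demand from prefix sums of g. (Python's `list.insert` still moves memory; the point is that no stored count has to be rewritten.)

```python
def insert_gap(state, v):
    """Insert v into a sorted list of [value, g] entries: one new entry, nothing else changes."""
    i = 0
    while i < len(state) and state[i][0] <= v:
        i += 1
    state.insert(i, [v, 1])

def ranges(state):
    """Recover [value, start, end] positions from the g counts via prefix sums."""
    out, end = [], 0
    for v, g in state:
        out.append([v, end + 1, end + g])
        end += g
    return out

state = [[13, 3], [26, 3], [89, 3]]
insert_gap(state, 17)
print(state)          # [[13, 3], [17, 1], [26, 3], [89, 3]]
print(ranges(state))  # [[13, 1, 3], [17, 4, 4], [26, 5, 7], [89, 8, 10]]
```

The recovered ranges match the [v, min, max] state of the previous example.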
How to read the data structure:


Original input:
  Rank:      1  2  3  4  5  6  7  8  9
  Ranked:   12 13 14 24 26 45 55 89 98
Retain only these elements:
  Rank:      1  2  3  4  5  6  7  8  9
  Ranked:      13       26       89
Coverage provided by each entry:
  Rank:      1  2  3  4  5  6  7  8  9
  Ranked:   13 13 13 26 26 26 89 89 89

Example: in the above summary
Graphically:

( [v_{0}, g_{0}, Δ_{0}], [v_{1}, g_{1}, Δ_{1}], [v_{2}, g_{2}, Δ_{2}], ..., [v_{s-1}, g_{s-1}, Δ_{s-1}] )

where:
  v_{i} = the value that covers the φ-quantile range
  g_{i} = r_{min}(v_{i}) - r_{min}(v_{i-1})   (see definition above)
  Δ_{i} = r_{max}(v_{i}) - r_{min}(v_{i})     (see definition above)




(v_{0}, g_{0}, Δ_{0})    (v_{1}, g_{1}, Δ_{1})    (v_{2}, g_{2}, Δ_{2})
(5, 1, 4)                (7, 3, 3)                (10, 4, 0)

r_{min}(v_{0}) = 1              r_{max}(v_{0}) = 1 + 4 = 5
r_{min}(v_{1}) = 1 + 3 = 4      r_{max}(v_{1}) = 4 + 3 = 7
r_{min}(v_{2}) = 4 + 4 = 8      r_{max}(v_{2}) = 8 + 0 = 8
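The rank bounds can be computed in one pass with prefix sums, r_{min}(v_{i}) = g_{0} + ... + g_{i} and r_{max}(v_{i}) = r_{min}(v_{i}) + Δ_{i}. A minimal sketch (the function name is hypothetical) that reproduces the example:

```python
def rank_bounds(summary):
    """For tuples (v, g, delta): r_min = g_0 + ... + g_i and r_max = r_min + delta."""
    bounds, r_min = [], 0
    for v, g, delta in summary:
        r_min += g                              # running prefix sum of g
        bounds.append((v, r_min, r_min + delta))
    return bounds

print(rank_bounds([(5, 1, 4), (7, 3, 3), (10, 4, 0)]))
# [(5, 1, 5), (7, 4, 7), (10, 8, 8)]
```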



Case 1: r > ne

Case 2: r ≤ ne



N = 0;

while ( not EOS )
{
   /* Delete phase */
   if ( N mod ⌊1/(2×ε)⌋ == 0 )
      delete elements from summary;

   v = next value in stream;

   /* Insert phase */
   insert v into summary;
   N++;
}
Important:

v = next value in input;

/* Find insert position for v in S */
Find a tuple (v_{i}, g_{i}, Δ_{i}) ∈ S such that: v_{i-1} ≤ v < v_{i}

if ( v < v_{0} || v ≥ v_{s-1} )
   Δ = 0;                       // New min or max value
else
   Δ = g_{i} + Δ_{i} - 1;

INSERT "(v, 1, Δ)" into S between v_{i-1} and v_{i};
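The insert step above can be sketched in Python. The summary is assumed to be a Python list of [v, g, Δ] lists kept in sorted order, and `gk_insert` is a hypothetical name; `bisect.bisect_right` finds the first index i with v_{i} > v, so a value equal to the current maximum is appended at the tail with Δ = 0.

```python
import bisect

def gk_insert(summary, v):
    """Insert v into a sorted list of [v_i, g_i, delta_i] tuples."""
    values = [t[0] for t in summary]
    i = bisect.bisect_right(values, v)       # v_{i-1} <= v < v_i
    if i == 0 or i == len(summary):
        delta = 0                            # new minimum or maximum: rank is exact
    else:
        _, g_i, delta_i = summary[i]
        delta = g_i + delta_i - 1            # allowable "wiggle room"
    summary.insert(i, [v, 1, delta])         # goes between v_{i-1} and v_i

S = [[10, 1, 0], [20, 3, 2], [30, 1, 0]]     # hypothetical summary
gk_insert(S, 15)                             # interior: delta = 3 + 2 - 1 = 4
gk_insert(S, 5)                              # new minimum: delta = 0
print(S)  # [[5, 1, 0], [10, 1, 0], [15, 1, 4], [20, 3, 2], [30, 1, 0]]
```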

Proof:


Proof:


So as long as we maintain this property (g_{i} + Δ_{i} ≤ 2εN for every tuple), the information in the summary will allow us to answer any φ-quantile query with ε accuracy
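A query can be answered by scanning the tuples, accumulating r_{min}, and returning the first v_{i} with r - r_{min}(v_{i}) ≤ εN and r_{max}(v_{i}) - r ≤ εN, where r = ⌈φN⌉. This is a sketch of that standard query rule, not necessarily the exact procedure of these notes; `gk_query` is a hypothetical name.

```python
import math

def gk_query(summary, n, phi, eps):
    """Return some v_i whose rank bounds lie within eps*n of r = ceil(phi*n)."""
    r = max(1, math.ceil(phi * n))
    slack = eps * n
    r_min = 0
    for v, g, delta in summary:
        r_min += g                            # r_min(v_i) = g_0 + ... + g_i
        r_max = r_min + delta                 # r_max(v_i) = r_min(v_i) + delta_i
        if r - r_min <= slack and r_max - r <= slack:
            return v
    raise ValueError("no tuple within eps*N of rank r: invariant violated")

exact = [(x, 1, 0) for x in [12, 13, 14, 24, 26, 45, 55, 89, 98]]
print(gk_query(exact, 9, 0.5, 0.1))           # 26
```

With an exact summary (all g = 1, Δ = 0) the rule returns the element of rank exactly ⌈φN⌉.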
But it is also the most complex part of the algorithm
I will discuss deleting one tuple first...
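Deleting one tuple, as it appears in the delete phase of the full algorithm (the case j = i - 1), folds tuple k into its successor: g_{k+1} becomes g_{k} + g_{k+1}, which leaves r_{min} and r_{max} of every remaining tuple unchanged. A minimal sketch with the hypothetical name `try_delete`, using the same merge test as the delete phase:

```python
def try_delete(summary, k, n, eps):
    """Try to delete tuple k by folding its g into tuple k+1.

    Allowed only if g_k + g_{k+1} + delta_{k+1} < 2*eps*n; the first tuple
    (the minimum) and the last tuple (the maximum) are never deleted.
    Returns True if tuple k was removed.
    """
    if k <= 0 or k >= len(summary) - 1:
        return False
    _, g_k, _ = summary[k]
    v_next, g_next, d_next = summary[k + 1]
    if g_k + g_next + d_next < 2 * eps * n:
        summary[k + 1] = [v_next, g_k + g_next, d_next]   # r_min(v_next) is unchanged
        del summary[k]
        return True
    return False

S = [[1, 1, 0], [2, 1, 0], [3, 1, 0], [4, 1, 0]]
print(try_delete(S, 1, n=4, eps=0.5))   # True: 1 + 1 + 0 < 4
print(S)                                # [[1, 1, 0], [3, 2, 0], [4, 1, 0]]
print(try_delete(S, 0, n=4, eps=0.5))   # False: the minimum is kept
```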

Proof:


Proof:




*** ε is the margin of error (a parameter of the algorithm)

S = { };      // S contains the summary structure, which is:
              //   < (v_{0}, g_{0}, Δ_{0}), (v_{1}, g_{1}, Δ_{1}), ... >
              // NOTE: S is an ordered list !!!
N = 0;        // Number of items processed

while ( not EOS )
{
   /* Delete phase: executed once every ⌊1/(2×ε)⌋ insertions */
   if ( N % ⌊1/(2×ε)⌋ == 0 )
   {
      /* Delete unnecessary entries in summary
         (while keeping the smallest and largest elements) */
      for ( i = s-1; i ≥ 2; i = j - 1 )
      {
         j = i-1;
         while ( j ≥ 1 && g_{j} + ... + g_{i} + Δ_{i} < 2εN )
         {
            j--;
         }
         j++;     // We went one index too far in the while...

         if ( j < i )
         {
            replace entries j, .., i with the entry (v_{i}, g_{j} + ... + g_{i}, Δ_{i});
         }
      }
   }

   /* Insert phase */
   v = next value in input;

   /* Find insert position for v in S */
   Find a tuple (v_{i}, g_{i}, Δ_{i}) ∈ S such that: v_{i-1} ≤ v < v_{i}

   if ( v is inserted at the head or tail of S )
      Δ = 0;
   else
      Δ = g_{i} + Δ_{i} - 1;     // This is the allowable "wiggle room"

   INSERT "(v, 1, Δ)" into S between v_{i-1} and v_{i};
   N++;
}
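For experimentation, the pseudocode above can be transcribed into Python. This is a sketch under stated assumptions: S is a sorted Python list of [v, g, Δ] lists, the class and method names are hypothetical, and the `query` method uses the standard rule of returning a tuple whose rank bounds lie within εN of ⌈φN⌉. It follows the pseudocode's merge test literally.

```python
import bisect
import math
import random

class GKSummary:
    """Sketch of the streaming algorithm above: S is a sorted list of [v, g, delta]."""

    def __init__(self, eps):
        self.eps = eps
        self.s = []                                   # the summary S
        self.n = 0                                    # N = number of items processed
        self.period = max(1, math.floor(1 / (2 * eps)))

    def _delete_phase(self):
        """Replace runs j..i by (v_i, g_j + ... + g_i, delta_i) while the sum stays < 2*eps*N."""
        s, limit = self.s, 2 * self.eps * self.n
        i = len(s) - 1
        while i >= 2:                                 # s[0] (the minimum) is never merged
            j, total = i - 1, s[i][1]                 # total = g_{j+1} + ... + g_i
            while j >= 1 and total + s[j][1] + s[i][2] < limit:
                total += s[j][1]
                j -= 1
            j += 1                                    # we went one index too far
            if j < i:
                s[j:i + 1] = [[s[i][0], total, s[i][2]]]
            i = j - 1

    def add(self, v):
        if self.n % self.period == 0:                 # delete phase every floor(1/(2*eps)) items
            self._delete_phase()
        values = [t[0] for t in self.s]
        i = bisect.bisect_right(values, v)            # v_{i-1} <= v < v_i
        if i == 0 or i == len(self.s):
            delta = 0                                 # new minimum or maximum
        else:
            delta = self.s[i][1] + self.s[i][2] - 1   # allowable "wiggle room"
        self.s.insert(i, [v, 1, delta])
        self.n += 1

    def query(self, phi):
        """Return v_i with r - r_min(v_i) <= eps*N and r_max(v_i) - r <= eps*N."""
        r = max(1, math.ceil(phi * self.n))
        slack = self.eps * self.n
        r_min = 0
        for v, g, delta in self.s:
            r_min += g
            if r - r_min <= slack and r_min + delta - r <= slack:
                return v
        raise ValueError("summary invariant violated")

stream = list(range(1, 1001))                         # the true rank of x is x
random.Random(1).shuffle(stream)
gk = GKSummary(0.01)
for x in stream:
    gk.add(x)
print(len(gk.s), [gk.query(p) for p in (0.1, 0.5, 0.9)])
```

Since every tuple keeps r_{min} ≤ true rank ≤ r_{max}, each answer here should have a true rank within εN = 10 of ⌈φN⌉, while S holds far fewer than N tuples.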

