CS554 Syllabus & Progress


CS554 - Advanced Database Systems
Syllabus and Progress

  1. Introduction



  2. Review of undergraduate database material (my CS377 class notes is here: click here )

    • Review of Relational Algebra:
      • Relations: click here
      • The Relational (Data) Model click here

      • Overview of Relational Algebra
        • Intro: click here
        • The high school set operators (, , ): click here
        • The operators σ, π, × and : click here                                                  
        • The Aggregate (= set) Functions: click here                                                 (cs377 material)

      • Examples of Relational Algebra: click here                                       (cs377 material)


      • Bags, sets and the δ operator: click here

      • Notation for the grouping operator: (γ): click here


        Homework 1: click here



    • Review of SQL:
      • Accessing the SQL database software:

      • Intro to SQL: click here                                                                       (cs377 material)

      • Intro to the SELECT command                                                       (cs377 material)

      • Qualifying attributes and aliasing: click here                                             (cs377 material)

      • Set operations (∪, ∩, −) in SQL: click here                                       (cs377 material)
      • The tuple conditions that can be used in the where clause: click here                 (cs377 material)

      • Nested queries:                                                 (cs377 material)

      • Correlated nested queries:

      • Useful trick to get other attributes using a set of foreign keys in a nested query: click here


      • Set functions:

      • Forming groups based on grouping attributes and conditions on a group:

      • Virtual table (relation) or view: click here


        Homework 2: click here





    Accessing data stored on disk                      

  3. Secondary Storage (Disks) management





  4. Indexing





    Query processing                      

  5. Cost and constraint on query execution (= processing a physical query plan)

    • Overview of query processing: click here

    • Processing (SQL) queries:
      • The Physical Query Plan operators: click here

      • The cost and constraint in query processing:

      • The cost of the basic (relation) access operators (table-scan and index-scan): click here

      • Passing results using memory buffers -- Iterators: click here


    • Categories of query processing algorithms: click here



  6. One-pass Algorithms for Query execution (processing a physical query plan)




  7. The nested-loop Algorithms for Cartesian Product and Join

    • Intro: click here
    • The block-based nested-loop cartesian product (×) algorithm: click here
    • The tuple-based nested-loop join (⋈) algorithm (suitable as an iterator): click here
    • The block-based nested-loop join join (⋈) algorithm: click here




  8. 2-pass Algorithms for Query execution that are based on hashing

    • Intro to two-pass algorithms: click here

    • Introduction to 2-pass hashing-based algorithms: click here

    • Unary operators:
      • Selection σ: click here     ***
      • Projection π: click here    ***
      • Duplicate elimination δ: click here     +++ 1st presentation
        • Partition R using hashing
        • Process each partition using the one-pass algorithm
      • Grouping γ: click here     +++ 2nd presentation
        • Partition R using hashing
        • Process each partition using the one-pass algorithm


    • Binary operators:
      • Union ∪
        • Bag union: click here    ***
        • Set union: click here     +++ 1st presentation
          • Partition R and S using hashing
          • Process each partition using the one-pass algorithm
      • Intersection ∩
        • Bag intersection: click here     +++ 2nd presentation
          • Partition R and S using hashing
          • Process each partition using the one-pass algorithm
        • Set intersection: click here     +++ 3rd presentation
          • Partition R and S using hashing
          • Process each partition using the one-pass algorithm
      • Difference −
        • Bag difference: --- same procedure as in the +++ presentation
          • Partition R and S using hashing
          • Process each partition using the one-pass algorithm
        • Set difference: --- same procedure as in the +++ presentation
          • Partition R and S using hashing
          • Process each partition using the one-pass algorithm

      • Product (cartesian product) × click here    ***

      • Join ⋈: click here --- same procedure as in the +++ presentation
        • Partition R and S using hashing
        • Process each partition using the one-pass algorithm

    • Summary: click here




  9. 2-pass Algorithms for Query execution that are based on (TPMMS) sorting





  10. Multi-pass algorithms




  11. Algorithms that are based on indexing:





    Query optimization                      

  12. Overview: query optimization: click here




  13. Parsing and pre-processing




  14. Converting a Parse Tree into an initial logical query plan (tree)

    • Converting an SQL command that does not contain a sub-query: click here

    • Converting an SQL command that contains a sub-query:
      • Converting an SQL command with sub-query using 2-argument selection: click here
      • Replacing a 2-argument selection in an uncorrelated sub-query: click here
      • Replacing a 2-argument selection in an correlated sub-query: click here





  15. Algebraic Laws used to transform/optimize logical query plans




  16. Heuristic-based logical query plan optimization based on (only) Algebraic Laws




  17. Cost estimation of a Logical Query Plan (for join optimization)


    Homework 6: click here



  18. Ordering the join (and cartesian product) operations




  19. The physical query plan


    Homework 7: click here






    Ensuring database consistency                      

  20. Recoverability: protecting database from system failure (logging)




  21. Serializability: correctness of concurrent execution of transactions




  22. Serializability and Recoverability






  23. Deadlock





    High-performance (parallel and distributed) database systems                      

  24. Parallel Data Processing Algorithms




  25. The Map-Reduce Parallelism framework




  26. Distributed Databases: query processing




  27. Commiting Distributed Transactions




  28. "Big Data" systems (and NOSQL)