Week 1 Exercises

Week 1 Exercises

  1. Major issues RDBMSs need to concern itself with:
    • Storing/modifying data and meta-data.
    • Constraint definition/storage/maintenance/checking.
    • Declarative manipulation of data (SQL).
    • Extensibility via views, triggers, procedures.
    • Query re-writing rules, optimization (indexes).
    • Transaction processing, concurrency/recovery.
  2. Major stages in answering an SQL query:
    1. Obtain text string of SQL.
    2. Parsing and translation:
      • Converts SQL query into relational algebra expression.
      • Input: text string of SQL.
      • Output: expression tree for relational algebra expression.
    3. Optimization:
      • Converts RA expression into query plan.
      • Input: relational algebra expression tree.
      • Output: sequence of DBMS-specific relational operations.
    4. Execution:
      • Performs relational operations, via chained intermediate results.
      • Input: query plan (sequence of DBMS-specific relational operations).
      • Output: set of result tuples, stored in memory or disk.
    5. Output:
      • Convert result tuples into format useful for client.
      • Input: tuple data in memory buffers (and possibly on disk as well).
      • Output: stream of formatted tuples (format defined by library). .
  3. PostgreSQL is an “object-relational database management system”. Differences between PostgreSQL and conventional RDBMS:
    • Every Postgres database tuple has an associated object identifier (oid).
    • Postgres supports table inheritance:
        create table cities (
            name text,
            population real,
            altitude int
        );
      
        create table capitals (
            state char(2)
        ) inherits (cities);
      
        -- find all cities, both capital or not.
        select * from cities;
      
        -- find all cities, non-capitals only.
        select * from only cities;
      
    • Postgres supports defining new data types and operations on those types.
    • Postgres supports function overloading:
        create function test(int, real) returns ...
        create function test(smallint, double precision) returns ...
      
  4. Postgres scopes:
    • Databases (or catalogs):
      • Logical scope that*collects together a number of schemas.
      • Details held in pg_database.
    • Schemas (or namespaces):
      • Logical scope that contains a collection of database objects (tables, views, functions, etc).
      • Has no effect on the way data is actually stored.
      • Datails stored in pg_namespace.
    • Tablespaces:
      • Physical scope identifying a region of the host filesystem where data files are stored.
      • Has no effect on the logical organization of data.
      • Details stored in pg_tablespace.
  5. Command line arguments:
    • -l: Obtains a list of all databases.
      • Similar to meta-command \list.
    • -f: Takes an SQL file as input and executes it.
      • Equivalent to meta-command \i.
    • -a: Prints all nonempty input lines to standard output as they are read.
      • Equivalent to setting variable ECHO to all.
    • -E: Echos the actual queries generated by \d and other backslash commands.
      • Equivalent to setting variable ECHO_HIDDEN to on.
  6. Difference between insert and copy:
    • insert is standard SQL and takes a tuple of the required values. Each insert statement inserts a single row and each row is inserted in separate transactions.
    • copy is Postgres specific and takes a list of tab separated values to insert as rows. All the rows together form one single transaction.
  7. Timing questions: a. The time variations are likely caused by caching. The results of recently executed queries are cached and hence the database does not need to access secondary storage. These cached results are much much faster to access because of this. b. There is no meaningful time value to ascribe. Both the cold and hot executions of the queries are useful for different purposes. c. If the contexts (hot or cold) is the same at the start of each timing, the results will be plausibly close. Tests should be run on the same lightly-loaded machine (to minimize differences caused by Unix buffering) and the tests should be the only user of the database server. If multiple users are competing for the buffer pool, the times could vary substantially, depending on how much of the buffered data had been swapped out to service queries from other users. d. For comparable executions of the query (either buffers empty or buffers fully-loaded), it looks like it’s no more accurate than +/- 10ms. It may be better to forget precise time measures, and simply fit queries into “ball-park” categories.

  8. pg_catalog vs information_schema
    • pg_catalog: a global schema that contains low level representation for sysadmins. Contains a set of tables and views in that schema (e.g. pg_catalog.pg_tables).
                    View "pg_catalog.pg_tables"
        Column    |  Type   | Collation | Nullable | Default 
        -------------+---------+-----------+----------+---------
        schemaname  | name    |           |          | 
        tablename   | name    |           |          | 
        tableowner  | name    |           |          | 
        tablespace  | name    |           |          | 
        hasindexes  | boolean |           |          | 
        hasrules    | boolean |           |          | 
        hastriggers | boolean |           |          | 
        rowsecurity | boolean |           |          | 
      
    • information_schema: a system catalog available to users. Usually implemented as a collection of views on the native catalog schema (e.g. information_schema.tables).
                                               View "information_schema.tables"
                   Column            |               Type                | Collation | Nullable | Default 
       ------------------------------+-----------------------------------+-----------+----------+---------
       table_catalog                | information_schema.sql_identifier |           |          | 
       table_schema                 | information_schema.sql_identifier |           |          | 
       table_name                   | information_schema.sql_identifier |           |          | 
       table_type                   | information_schema.character_data |           |          | 
       self_referencing_column_name | information_schema.sql_identifier |           |          | 
       reference_generation         | information_schema.character_data |           |          | 
       user_defined_type_catalog    | information_schema.sql_identifier |           |          | 
       user_defined_type_schema     | information_schema.sql_identifier |           |          | 
       user_defined_type_name       | information_schema.sql_identifier |           |          | 
       is_insertable_into           | information_schema.yes_or_no      |           |          | 
       is_typed                     | information_schema.yes_or_no      |           |          | 
       commit_action                | information_schema.character_data |           |          | 
      
  9. Postgres OIDs are an (optional) system attribute hidden from the end user. These hidden attributes are not matched in standard SQL select * and must be manually specified. All pg_catalog tables contain an OID. Other system attributes include xmin and xmax which are NOT optional.

  10. SQL view: list of table names and table OIDs from the public namespace.
    select 
    cls.oid,
    relname,
    nspname
    from 
    pg_catalog.pg_class cls
    join pg_catalog.pg_namespace ns on cls.relnamespace = ns.oid
    where nspname = 'public';
    
  11. create or replace function tablePath(tablename name) returns text
    as $$
    declare
    table_id oid;
    tablespace_id oid;
    database_id oid;
    basepath text;
    begin
    -- Paths of the form: basepath/database_id/table_id
    
    select cls.oid, cls.reltablespace
    into table_id, tablespace_id
    from pg_class cls
    join pg_namespace nsp on cls.relnamespace = nsp.oid 
    where 
        cls.relname = tablename and
        cls.relkind = 'r' and
        nsp.nspname = 'public';
    
    if tablespace_id = 0 then
        basepath := 'PGDATA/data';
    else
        select spcname into basepath
        from pg_tablespace
        where oid = tablespace_id;
    
        if basepath is null then
            basepath := '???';
        end if;
    end if;
    
    if table_id is null then
        return 'Table does not exist: ' || tablename;
    else
        select db.oid into database_id
        from pg_database db
        where db.datname = current_database();
    
        return basepath || '/' || database_id::text || '/' || table_id::text;
    end if;
    end;
    $$ language plpgsql;
    
  12. create or replace function tableSchemas() returns setof text
    as $$
    declare
    c pg_class%rowtype;
    a pg_attribute%rowtype;
    begin
    for c in (
        select *
        from pg_class cls
        join pg_namespace nsp on cls.relnamespace = nsp.oid 
        where 
            cls.relkind = 'r' and
            nsp.nspname = 'public'
        order by relname
    ) loop
        return next (
            select c.relname || '(' || string_agg(attname, ', ' order by attnum) || ')'
            from pg_attribute
            where attrelid = c.oid and attnum > 0
        );
    end loop;
    end;
    $$ language plpgsql;
    
  13. ```sql create or replace function tableSchemas2() returns setof text as $$ declare c pg_class%rowtype; a pg_attribute%rowtype; begin – pg_attribute.atttypid: data type of this column, references pg_type.oid. – pg_attribute.atttypmod: – Records type specific data supplied at table creation time. – For example: maximum length of varchar. – Will generally by -1 for types that do not need this value.

    for c in ( select * from pg_class cls join pg_namespace nsp on cls.relnamespace = nsp.oid where cls.relkind = ‘r’ and nsp.nspname = ‘public’ order by relname ) loop return next ( select c.relname || ‘(‘ || string_agg(attname || ‘:’ || typeString(atttypid, atttypmod), ‘, ‘ order by attnum) || ‘)’ from pg_attribute where attrelid = c.oid and attnum > 0 ); end loop; end; $$ language plpgsql;

create or replace function typeString(typid oid, typmod integer) returns text as \(declare typ text; begin typ := pg_catalog.format_type(typid,typmod); if (substr(typ,1,17) = 'character varying') then typ := replace(typ, 'character varying', 'varchar'); elsif (substr(typ,1,9) = 'character') then typ := replace(typ, 'character', 'char'); end if; return typ; end;\) language plpgsql;


14.
```sql
-- Update pg_attribute.atttypmod.

update pg_attribute
set atttypmod = N
from pg_class
where 
    pg_attribute.attrelid = pg_class.oid and 
    pg_class.relname = TableName and 
    attname = 'ColumnName';