Week 1 Exercises

Major issues RDBMSs need to concern itself with:
- Storing/modifying data and meta-data.
- Constraint definition/storage/maintenance/checking.
- Declarative manipulation of data (SQL).
- Extensibility via views, triggers, procedures.
- Query re-writing rules, optimization (indexes).
- Transaction processing, concurrency/recovery.
Major stages in answering an SQL query:
1. Obtain text string of SQL.
2. Parsing and translation:
  - Converts SQL query into relational algebra expression.
  - Input: text string of SQL.
  - Output: expression tree for relational algebra expression.
3. Optimization:
  - Converts RA expression into query plan.
  - Input: relational algebra expression tree.
  - Output: sequence of DBMS-specific relational operations.
4. Execution:
  - Performs relational operations, via chained intermediate results.
  - Input: query plan (sequence of DBMS-specific relational operations).
  - Output: set of result tuples, stored in memory or disk.
5. Output:
  - Convert result tuples into format useful for client.
  - Input: tuple data in memory buffers (and possibly on disk as well).
  - Output: stream of formatted tuples (format defined by library). .

PostgreSQL is an “object-relational database management system”. Differences between PostgreSQL and conventional RDBMS:

Every Postgres database tuple has an associated object identifier (oid).

Postgres supports table inheritance:

  create table cities (
      name text,
      population real,
      altitude int
  );

  create table capitals (
      state char(2)
  ) inherits (cities);

  -- find all cities, both capital or not.
  select * from cities;

  -- find all cities, non-capitals only.
  select * from only cities;

Postgres supports defining new data types and operations on those types.

Postgres supports function overloading:

  create function test(int, real) returns ...
  create function test(smallint, double precision) returns ...

Postgres scopes:
- Databases (or catalogs):
  - Logical scope that*collects together a number of schemas.
  - Details held in pg_database.
- Schemas (or namespaces):
  - Logical scope that contains a collection of database objects (tables, views, functions, etc).
  - Has no effect on the way data is actually stored.
  - Datails stored in pg_namespace.
- Tablespaces:
  - Physical scope identifying a region of the host filesystem where data files are stored.
  - Has no effect on the logical organization of data.
  - Details stored in pg_tablespace.
Command line arguments:
- -l: Obtains a list of all databases.
  - Similar to meta-command \list.
- -f: Takes an SQL file as input and executes it.
  - Equivalent to meta-command \i.
- -a: Prints all nonempty input lines to standard output as they are read.
  - Equivalent to setting variable ECHO to all.
- -E: Echos the actual queries generated by \d and other backslash commands.
  - Equivalent to setting variable ECHO_HIDDEN to on.
Difference between insert and copy:
- insert is standard SQL and takes a tuple of the required values. Each insert statement inserts a single row and each row is inserted in separate transactions.
- copy is Postgres specific and takes a list of tab separated values to insert as rows. All the rows together form one single transaction.
Timing questions: a. The time variations are likely caused by caching. The results of recently executed queries are cached and hence the database does not need to access secondary storage. These cached results are much much faster to access because of this. b. There is no meaningful time value to ascribe. Both the cold and hot executions of the queries are useful for different purposes. c. If the contexts (hot or cold) is the same at the start of each timing, the results will be plausibly close. Tests should be run on the same lightly-loaded machine (to minimize differences caused by Unix buffering) and the tests should be the only user of the database server. If multiple users are competing for the buffer pool, the times could vary substantially, depending on how much of the buffered data had been swapped out to service queries from other users. d. For comparable executions of the query (either buffers empty or buffers fully-loaded), it looks like it’s no more accurate than +/- 10ms. It may be better to forget precise time measures, and simply fit queries into “ball-park” categories.

pg_catalog vs information_schema

pg_catalog: a global schema that contains low level representation for sysadmins. Contains a set of tables and views in that schema (e.g. pg_catalog.pg_tables).

              View "pg_catalog.pg_tables"
  Column    |  Type   | Collation | Nullable | Default 
  -------------+---------+-----------+----------+---------
  schemaname  | name    |           |          | 
  tablename   | name    |           |          | 
  tableowner  | name    |           |          | 
  tablespace  | name    |           |          | 
  hasindexes  | boolean |           |          | 
  hasrules    | boolean |           |          | 
  hastriggers | boolean |           |          | 
  rowsecurity | boolean |           |          | 

information_schema: a system catalog available to users. Usually implemented as a collection of views on the native catalog schema (e.g. information_schema.tables).

                                         View "information_schema.tables"
             Column            |               Type                | Collation | Nullable | Default 
 ------------------------------+-----------------------------------+-----------+----------+---------
 table_catalog                | information_schema.sql_identifier |           |          | 
 table_schema                 | information_schema.sql_identifier |           |          | 
 table_name                   | information_schema.sql_identifier |           |          | 
 table_type                   | information_schema.character_data |           |          | 
 self_referencing_column_name | information_schema.sql_identifier |           |          | 
 reference_generation         | information_schema.character_data |           |          | 
 user_defined_type_catalog    | information_schema.sql_identifier |           |          | 
 user_defined_type_schema     | information_schema.sql_identifier |           |          | 
 user_defined_type_name       | information_schema.sql_identifier |           |          | 
 is_insertable_into           | information_schema.yes_or_no      |           |          | 
 is_typed                     | information_schema.yes_or_no      |           |          | 
 commit_action                | information_schema.character_data |           |          | 

Postgres OIDs are an (optional) system attribute hidden from the end user. These hidden attributes are not matched in standard SQL select * and must be manually specified. All pg_catalog tables contain an OID. Other system attributes include xmin and xmax which are NOT optional.

SQL view: list of table names and table OIDs from the public namespace.

select 
cls.oid,
relname,
nspname
from 
pg_catalog.pg_class cls
join pg_catalog.pg_namespace ns on cls.relnamespace = ns.oid
where nspname = 'public';

create or replace function tablePath(tablename name) returns text
as $$
declare
table_id oid;
tablespace_id oid;
database_id oid;
basepath text;
begin
-- Paths of the form: basepath/database_id/table_id

select cls.oid, cls.reltablespace
into table_id, tablespace_id
from pg_class cls
join pg_namespace nsp on cls.relnamespace = nsp.oid 
where 
    cls.relname = tablename and
    cls.relkind = 'r' and
    nsp.nspname = 'public';

if tablespace_id = 0 then
    basepath := 'PGDATA/data';
else
    select spcname into basepath
    from pg_tablespace
    where oid = tablespace_id;

    if basepath is null then
        basepath := '???';
    end if;
end if;

if table_id is null then
    return 'Table does not exist: ' || tablename;
else
    select db.oid into database_id
    from pg_database db
    where db.datname = current_database();

    return basepath || '/' || database_id::text || '/' || table_id::text;
end if;
end;
$$ language plpgsql;

create or replace function tableSchemas() returns setof text
as $$
declare
c pg_class%rowtype;
a pg_attribute%rowtype;
begin
for c in (
    select *
    from pg_class cls
    join pg_namespace nsp on cls.relnamespace = nsp.oid 
    where 
        cls.relkind = 'r' and
        nsp.nspname = 'public'
    order by relname
) loop
    return next (
        select c.relname || '(' || string_agg(attname, ', ' order by attnum) || ')'
        from pg_attribute
        where attrelid = c.oid and attnum > 0
    );
end loop;
end;
$$ language plpgsql;

```sql create or replace function tableSchemas2() returns setof text as $$ declare c pg_class%rowtype; a pg_attribute%rowtype; begin – pg_attribute.atttypid: data type of this column, references pg_type.oid. – pg_attribute.atttypmod: – Records type specific data supplied at table creation time. – For example: maximum length of varchar. – Will generally by -1 for types that do not need this value.
for c in ( select * from pg_class cls join pg_namespace nsp on cls.relnamespace = nsp.oid where cls.relkind = ‘r’ and nsp.nspname = ‘public’ order by relname ) loop return next ( select c.relname || ‘(‘ || string_agg(attname || ‘:’ || typeString(atttypid, atttypmod), ‘, ‘ order by attnum) || ‘)’ from pg_attribute where attrelid = c.oid and attnum > 0 ); end loop; end; $$ language plpgsql;

create or replace function typeString(typid oid, typmod integer) returns text as $declare typ text; begin typ := pg_catalog.format_type(typid,typmod); if (substr(typ,1,17) = 'character varying') then typ := replace(typ, 'character varying', 'varchar'); elsif (substr(typ,1,9) = 'character') then typ := replace(typ, 'character', 'char'); end if; return typ; end;$ language plpgsql;

14.
```sql
-- Update pg_attribute.atttypmod.

update pg_attribute
set atttypmod = N
from pg_class
where 
    pg_attribute.attrelid = pg_class.oid and 
    pg_class.relname = TableName and 
    attname = 'ColumnName';

Tony Yong He

Week 1 Exercises