Relational Modeling

Logical view: 'relation'

Relational tables

Keys

"determines"

Determinants determine dependents [via] dependencies :)

'Full' functional dependence is a "good" thing.

Functional dependency

STU_ID[determinant] ->[functionally determines] STU_LNAME[dependent]

STU_ID,STU_LNAME -> GPA is NOT a 'full functional dependency' because the determinant contains an extra (unwanted) attr (STU_LNAME)

STU_LNAME,STU_FNAME -> GPA is a 'full functional dependency' (assuming lastname,firstname is unique)

Solemnly swear: "The key, the whole key, and nothing but the key, so help me Codd." :) :)

Composite key; entity integrity

A table 'cannot not' have entity integrity!!

Nulls; referential integrity

Whereas entity integrity has to do with a single table, referential integrity relates to two tables (loosely, 'don't allow invalid pointers').

Types (categories) of keys

Keys: many types

* primary (foreign) keys are a subset of candidate keys are a subset of superkeys (note - superkeys could be 'wasteful', ie. contain superfluous, non-needed attrs)

* simple keys vs compound keys vs composite keys

* natural keys - keys that are created from real-world entities (eg. for a US resident, their SSN could be a natural key)

* surrogate keys (just make up brand new unique keys)

* secondary, or 'alternate' keys

You can read a bit more keys here.

Example relation

Nulls - avoid where possible!

In RL, NULLs can't be entirely avoided (look here, for 'interpreted as any of the following').

Relational 'algebra' [fun with one, two or more tables]

What if a table were a datatype (similar to an int, Vec3D, ComplexNumber, etc)?! Specifically, what operations could be perform on them (eg. similar to addition, square root on doubles)?!

Operations on tables [table(s) in, table out, ie. "closure"]

There are (only) EIGHT 'relational set operators' (defined by Ed Codd, at IBM, in 1970), which are all used to operate ("perform relational algebra") on tables: Select, Project, Union, Intersect, Difference, Product, Join, Divide.

This is no exaggeration: these operators are the basis for SQL and the entire relational DB industry!

SELECT; PROJECT; UNION; INTERSECT

SELECT [outputs a subset of rows]

PROJECT [outputs a subset of cols]

UNION [eqvt to 'cat a b > c']

INTERSECT [rows common to a and b]

Difference; Product

Difference [a - b]

Product [multiply rows, add columns]

JOIN (several kinds); DIVIDE (?!)

JOIN

JOIN [cont'd]

Tables to illustrate JOIN operations

Natural join

A natural join links tables by selecting from two tables, only those rows that have common (identical) values for common attributes.

These three steps result in a natural join: create product, select, project.

Natural join: product

Cartesian product of the two tables (product of rows, juxtaposition of columns):

Natural join: select

Select only rows with identical values in the common (joining) columns:

Natural join: project

Project away, ie. remove one of the two duplicate columns:

Result (the table above): natural join.

Left outer join

Output all rows of the left (CUSTOMER) table, including ones for which there are no matching values in the join column in the other (AGENT) table:

Note that an outer join is an "inner join plus" [it is NOT an opposite of inner join].

Right outer join, full outer join

Output all rows of the right (AGENT) table, including ones for which there are no matching values in the join column in the other (CUSTOMER) table:

Outer joins are useful in exposing missing information [in our example, customers who don't seem to have an agent, and, agents who don't seem to have customers].

A 'full outer join' is a union of left outer join and right outer join - output all the rows from both tables, including ones for which there are no matches in the other table - this could result in nulls on the left side of some rows, as well as nulls on the right side of others.

This clip shows the various types of joins [thanks, Yash Gupta, for sending this]:

DIVIDE

We're dividing by A and B in the divisor (bottom) table. There's (A,5) and (B,5) in the dividend (top) table, so we output 5 as the result; if the dividend were to contain (A,9) and (B,9) also, then we'd output 5 9 as the result.

Dictionaries [hold metadata]

A data dictionary is metadata about tables (only); a system catalog, that includes (is a supeset of, although confusingly, the two are conflated in RL) the data dictionary, and more.