Relational Modeling |
'Full' functional dependence is a "good" thing.
STU_ID[determinant] ->[functionally determines] STU_LNAME[dependent]
STU_ID,STU_LNAME -> GPA is NOT a 'full functional dependency' because the determinant contains an extra (unwanted) attr (STU_LNAME)
STU_LNAME,STU_FNAME -> GPA is a 'full functional dependency' (assuming lastname,firstname is unique)
Solemnly swear: "The key, the whole key, and nothing but the key, so help me Codd." :) :)
A table 'cannot not' have entity integrity!!
Whereas entity integrity has to do with a single table, referential integrity relates to two tables (loosely, 'don't allow invalid pointers').
* primary (foreign) keys are a subset of candidate keys are a subset of superkeys (note - superkeys could be 'wasteful', ie. contain superfluous, non-needed attrs)
* simple keys vs compound keys vs composite keys
* natural keys - keys that are created from real-world entities (eg. for a US resident, their SSN could be a natural key)
* surrogate keys (just make up brand new unique keys)
* secondary, or 'alternate' keys
You can read a bit more keys here.
In RL, NULLs can't be entirely avoided (look here, for 'interpreted as any of the following').
What if a table were a datatype (similar to an int, Vec3D, ComplexNumber, etc)?! Specifically, what operations could be perform on them (eg. similar to addition, square root on doubles)?!
There are (only) EIGHT 'relational set operators' (defined by Ed Codd, at IBM, in 1970), which are all used to operate ("perform relational algebra") on tables: Select, Project, Union, Intersect, Difference, Product, Join, Divide.
This is no exaggeration: these operators are the basis for SQL and the entire relational DB industry!
A natural join links tables by selecting from two tables, only those rows that have common (identical) values for common attributes.
These three steps result in a natural join: create product, select, project.
Cartesian product of the two tables (product of rows, juxtaposition of columns):
Select only rows with identical values in the common (joining) columns:
Project away, ie. remove one of the two duplicate columns:
Result (the table above): natural join.
Output all rows of the left (CUSTOMER) table, including ones for which there are no matching values in the join column in the other (AGENT) table:
Note that an outer join is an "inner join plus" [it is NOT an opposite of inner join].
Output all rows of the right (AGENT) table, including ones for which there are no matching values in the join column in the other (CUSTOMER) table:
Outer joins are useful in exposing missing information [in our example, customers who don't seem to have an agent, and, agents who don't seem to have customers].
A 'full outer join' is a union of left outer join and right outer join - output all the rows from both tables, including ones for which there are no matches in the other table - this could result in nulls on the left side of some rows, as well as nulls on the right side of others.
This clip shows the various types of joins [thanks, Yash Gupta, for sending this]:
We're dividing by A and B in the divisor (bottom) table. There's (A,5) and (B,5) in the dividend (top) table, so we output 5 as the result; if the dividend were to contain (A,9) and (B,9) also, then we'd output 5 9 as the result.
A data dictionary is metadata about tables (only); a system catalog, that includes (is a supeset of, although confusingly, the two are conflated in RL) the data dictionary, and more.