SQL |
As mentioned earlier, SQL is an implementation of Ed Codd's relational set operators:
1. SELECT [formerly known as RESTRICT] 2. PROJECT 3. JOIN 4. PRODUCT 5. UNION 6. INTERSECT 7. DIFFERENCE 8. DIVIDE
Interestingly, Ed's ideas were ignored by IBM, his employer. Only after Oracle debuted (with SQL support right off the bat!) did IBM create DB2, its first relational DB.
char vs VARCHAR (reserved for future use by SQL - so don't use!) vs VARCHAR2 [versus NVARCHAR2 (for Unicode)]: http://www.orafaq.com/faq/what_is_the_difference_between_varchar_varchar2_and_char_data_types
Plus: ON UPDATE CASCADE and ON DELETE CASCADE - both affect [change] a secondary table (that has an FK), when a change is made in the primary table (with the corresponding PK). ON UPDATE will update the values in the secondary table when corresp. values in the primary table are changed; ON DELETE will delete rows in the secondary table, when linked rows are deleted in the primary table.
Vaguely similar to parameter passing/matching during a function call [where arguments need to match in position, type, count].
SELECT operates on 1 or more columns, and 0 or more rows from 1 or more tables - like many SQL commands, it is set-oriented and non procedural.
UPDATE modifies one or more columns of a table (on all rows, or on specific rows based on a condition specified by WHERE). Here is more.
Another ex: DELETE FROM PRODUCT WHERE P_MIN=5; (can be any condition on any attribute).
Q: what would DELETE do, if there isn't a WHERE condition?
Another example for pattern-matching: load this page, and enter and execute:
SELECT * FROM Customers WHERE CustomerName LIKE 'C%';
The EXISTS clause is used with a query, and returns TRUE if the subquery results in any output (non-zero # of rows being returned), or FALSE if the subquery results in no data. The rest of the query (the 'main' query) will (or will not) run, based on EXIST's output - if EXISTS returns false, the main query will get skipped.
You can loosely think of EXISTS as "ONLY WHEN". We use it to 'defensively' update (insert, modify, delete) parts of a table (after we determine it is updatable).
You can also think of 'WHERE EXISTS' as "such that there exists". Eg. in the first example below (SELECT * FROM VENDOR..), it reads as "Find all vendors such that there exists records for them in the PRODUCT TABLE (via V_CODE) where P_QOH<=P_MIN" [in other words, we are looking for vendors we need to re-order from].
In the second example (INSERT INTO CONTACTS..), read it as "get the supplier ID and name for all suppliers such that there exists order IDs for them, then insert them into the contacts table".
Exercise: what does #3 below do?
Sometimes we can update a table, by filling it with output from a query. The table to be filled in has to exist first (so we need to create it if necessary), and have type-compatible columns that can receive values from our data-fetching query.
The table creation and updating can happen in separate steps, or even be combined for compactness of expression.
Oftentimes the need to do this occurs when a table (eg. PART) is created via bulk-update, from another table (eg. PRODUCT).
Below is an example of ORDER BY, where by default, values are listed (sorted) in ascending order. To list the values in descending order, we'd do this: ORDER BY P_PRICE DESC;
The sequence (listing) obtained by specifying several comma-separated attributes for ORDER BY is called a cascading order sequence.
The 'DISTINCT' keyword is used to count unique/distinct occurences of an attribute:
COUNT, MIN, MAX, SUM, AVG are all functions that operate on a numerical attr/column, and produce a single scalar result (not a table).
How many different vendors supply our products? How many supply cheap (PRICE < 10) products?
Note the third query below: COUNT(*) counts the # of rows returned by a query (rows where the product costs < 10 units). In contrast, count(
What is the value for the most expensive item in the product table? Least expensive?
What is the most expensive item (details)? We can't do: WHERE P_PRICE = MAX(P_PRICE). This is because MAX() can only be used in a SELECT statement.
You can think of 'GROUP BY' to mean "CATEGORIZE BY" or "ITEMIZE BY": rather than just a single MIN, MAX, SUM, COUNT or AVG, we're asking for a value PER OCCURRENCE of a GROUP BY value, eg. minimum price PER VENDOR, max GPA PER DEPARTMENT, average earnings PER MAJOR, etc. Specifically, when used with SUM(), GROUP BY is used to request subtotals.
'GROUP BY' used without an aggregate function is meaningless, since there is no aggregate value (MIN, COUNT etc.) to itemize.
Here is a grocery store receipt where subtotals are displayed itemized, aggregated by product types DELI, GROCERY, MIXED NUTS and PRODUCE (presumably using SUM(), together with GROUP BY):
While you're at it, see what other DATA you can spot in the receipt! Even a single trip to the store can generate a LOT of data.
The HAVING clause is used to filter rows in a GROUP BY specification (only those rows HAVING met the specified condition will appear in the result).
Note that HAVING can only occur in a query that has a GROUP BY, which in turn can only occur when there is an aggregate function (MIN, MAX, SUM, COUNT or AVG).
Need different aliases for the table being queried so that we can use such aliases as namespaces for attributes.
Now that you're familiar with SQL syntax, you will appreciate knowing how it compares with a regular programming language such as C++. Shown below is such a comparison, using C# which lets us write code in an imperative (command-oriented) manner as well a declarative (result-oriented) one [this is from a book on functional programming]:
SQL is a data creation+manipulation language, so it's best learned HANDS ON (not just by looking at slides and reading about the syntax) - you need access to a relational database where you can create tables, enter data in them and do queries on the data (tables ← data ← queries).
There are three ways to get your hands on a DB so that you can learn/practice SQL using it:
1. Browser-based SQL IDEs
An easy way to practice running SQL queries is to use browser-based interfaces (nothing to download, no login needed) to create/query databases. Here are some sites that provide this form of access:
* SQL Tutorial: http://www.w3schools.com/sql/default.asp
* ideone: http://www.ideone.com
* sqlfiddle: http://sqlfiddle.com/
* Code School's Try SQL: http://campus.codeschool.com/courses/try-sql/contents
* SQLZOO: http://sqlzoo.net/ - has extensive tutorials
* another tutorial: http://www.sqltutorial.org/
* w3resource: http://www.w3resource.com/sql-exercises/ - more tutorials
* Khan Academy: https://www.khanacademy.org/computer-programming/new/sql
* 'Online SQL interpreter': https://sql-js.github.io/sql.js/examples/GUI/ - JavaScript-based! Also, a 'xem' version is here...
Below are examples for three of the above links.
Bring up the w3schools Tryit Editor , enter and run the following pair of sql command sets one at a time (these commands run on a set of pre-installed (by w3schools.com) collection of tables called the Northwind database) :
SELECT City FROM Customers WHERE Country="Germany"; SELECT City, CustomerID FROM Customers WHERE Country="Germany" AND CustomerID>60;
In ideone , select SQLite (for the choice of language), then enter and run this:
-- warmup create table tble(str varchar(20)); insert into tble values ('Hello world!'),('SQL is fun!!'); select * from tble; -- recipes CREATE TABLE recipes ( recipe_id INT NOT NULL, recipe_name VARCHAR(30) NOT NULL, PRIMARY KEY (recipe_id), UNIQUE (recipe_name) ); INSERT INTO recipes (recipe_id, recipe_name) VALUES (1,"Tacos"), (2,"Tomato Soup"), (3,"Grilled Cheese"); -- quick check SELECT recipe_name FROM recipes; -- ingredients CREATE TABLE ingredients ( ingredient_id INT NOT NULL, ingredient_name VARCHAR(30) NOT NULL, ingredient_price INT NOT NULL, PRIMARY KEY (ingredient_id), UNIQUE (ingredient_name) ); INSERT INTO ingredients (ingredient_id, ingredient_name, ingredient_price) VALUES (1, "Beef", 5), (2, "Lettuce", 1), (3, "Tomatoes", 2), (4, "Taco Shell", 2), (5, "Cheese", 3), (6, "Milk", 1), (7, "Bread", 2); -- recipe_ingredients CREATE TABLE recipe_ingredients ( recipe_id int NOT NULL, ingredient_id INT NOT NULL, amount INT NOT NULL, PRIMARY KEY (recipe_id,ingredient_id) ); INSERT INTO recipe_ingredients (recipe_id, ingredient_id, amount) VALUES (1,1,1), (1,2,2), (1,3,2), (1,4,3), (1,5,1), (2,3,2), (2,6,1), (3,5,1), (3,7,2); SELECT * FROM recipes ORDER BY recipe_id;
Here is the recipes db table and its queries again, this time in sqlfiddle. After the page loads (with our SQL commands above, filled in!), click 'Build Schema' on the left, then click 'Run SQL' on the right [and wait for a few seconds for the output].
2. Locally installed DBs
You can install Oracle on your machine, or install the light-weight but powerful DB Browser for SQLite, aka SQLite Browser. On Windows, this offers a 'notebook' environment: https://sqlnotebook.com/ [a mix of SQL and non-SQL commands]. It is also OK to use Postgres, or MySQL; or you could use some other DB...
Here is a quickstart guide for SQLite Browser - we create a table called CosineTable, and fill it with rows containing [angle, cos(angle)] pairs. And here is another clip that shows how to create a simple sqlite DB and query it.
The second clip above, also shows that the .db file that is created, contains data in a SQLite-native binary format [which is public and well-documented]. FYI, there are sqlite API calls in multiple languages that let you access the (binary) data using functions/methods that accept SQL commands as regular strings. This is mixed-language programming that lets us accomplish a lot, because we leverage the power of Java, Python etc., as well as that of SQL. Eg. here's how it works in Python:
>>> for row in c.execute('SELECT * FROM stocks ORDER BY price'): print row # or we could email this, put it up on an LED display, buy/sell the cheap/pricey stocks.. (u'2006-01-05', u'BUY', u'RHAT', 100, 35.14) (u'2006-03-28', u'BUY', u'IBM', 1000, 45.0) (u'2006-04-06', u'SELL', u'IBM', 500, 53.0) (u'2006-04-05', u'BUY', u'MSFT', 1000, 72.0)
You could also write a binary reader+parser in C++, Java etc. and extract the sqlite-native data that way - in other words, you'd never lose your data if you store it in a sqlite .db file!
As you can see, sqlite is very popular :)
And: you can locally run Docker 'images' [software, eg. a DB image such as MySQL], inside a Docker 'container' (a mini-OS, ie. an isolated process tree).
3. Cloud-based DBs
You can also work with a relational DB via the cloud, eg. using Amazon's RDS offering. RDS offers a way to create databases such as Amazon's own Aurora, Oracle, SQL Server, MariaDB, MySQL, Postgres.