CS186 SQL2 Berkeley

写在前面

Berkeley CS186 Intro to DB Systems

视频地址 课程记录

SQL 2

Conceptual SQL Evaluation

SQL查询从原始数据表提取和转换数据的执行顺序

  1. FROM Identify table 确定表

  2. WHERE Apply selections(eliminate rows) 应用选择条件(筛选行)

  3. SELECT Project away columns(keep only those used in SELECT, GBY, HAVING) SELECT子句指定了需要返回哪些列,或通过聚合函数进行计算

  4. GROUP BY Form groups & aggregate 用GROUP BY将数据分组,并对每个组应用聚合函数

  5. HAVING Eliminate groups 过滤不满足条件的组

  6. DISTINCT Eliminate duplicates 在最终结果中去除重复的行

  7. (Optinal) ORDER BY LIMIT applied at end, to “format” output 格式化输出 ORDER BY 用于控制升序/降序输出, LIMIT 用于获取特定数量的排序后结果,通常与ORDER BY一起使用

Putting it all together

1
2
3
4
5
6
SELECT S.dept, AVG(S.gpa), COUNT(*)
FROM Students S
WHERE S.gender = 'F'
GROUP BY S.dept
HAVING COUNT(*) >= 2
ORDER BY S.dept

解释:从别名为S的Students学生表中,筛选出性别为Female的学生,按照所在部门分组,并输出符合条件的学生的部门信息、平均GPA和学生总数,至少有两名女性学生的部门才会被输出,最终结果按照部门字典序排列

Join Queries

1
2
3
4
5
SELECT [DISTINCT] <column expression list>
FROM <table1 [AS t1], ..., tableN [AS tn]>
[WHERE <predicate>]
[GROUP BY <column list> [HAVING <predicate>] ]
[ORDER BY <column list>];

多表查询

区别就是FROM那里有多个表

Cross(Cartesian) Product

  • All pairs of tuples, concentrated

Example: Sailors who have reserved a boat

Sailors

sidsnameratingage
1Popeye1022
2OliveOyl1139
3Garfield127
4Bob519

Reserveds

SIDBIDDAY
11029//12
21029/13
110110/01

有效合并表的SQL语句 “预订了船的水手”

1
2
3
SELECT Sailors.sid, Sailors.sname, Reserves.bid
FROM Sailors, Reserves
WHERE Sailors.sid = Reserves.sid

合并后的表—— “预订了船的水手”

sidBIDday
11029/12
21029/13
110110/01

Column Names and Table Aliases

原来的:

1
2
3
SELECT Sailors.sid, Sailors.sname, Reserve.bid
FROM Sailors, Reserves
WHERE Sailors.sid = Reserve.sid

改进之后:

1
2
3
SELECT Sailors.sid, sname, bid
FROM Sailors AS S, Reserves AS R
WHERE S.sid = R.sid

因为sname只有Sailors里面有,bid只有Reserves里面有,所以可以省略前缀,但是sid是两个表都有,所以需要加前缀来区分是哪个表的

然后给表名取别名,有助于代码精简+可读性↑

More Aliases

别名也可以在输出中使用

1
2
3
SELECT x.sname, x.age, y.sname AS sname2, y.age AS age2
FROM Sailors AS x, Sailors AS y
WHERE x.age > y.age

给同一个表两个别名,把一个表分成两个表,其中左表的年龄比右表大

  • Table aliases in the FROM clause - Needed when the same table used multiple times (“self-join”) 当同一个表被多次使用(“自连接”)的时候需要使用表别名来区分
  • Column aliases in the SELECT clause 为列指定别名,以便在查询结果中使用更有意义的列名

Arithmetic Expressions

可以在``SELECT`语句中指定算术运算符和表达式

1
2
3
SELECT S.age, S.age - 5 AS age1, 2 * S.ages AS age2
FROM Sailors AS S
WHERE S.sname = 'Popeye'

在``WHERE`语句中也可以使用算术运算符和表达式

1
2
3
SELECT S1.sname AS name1, S2.sname AS name2
FROM Sailors AS S1, Sailors AS S2
WHERE 2*S1.rating = S2.rating-1

SQL Calculator!

1
2
3
4
5
SELECT
log(1000) as three,
exp(ln(2)) as two,
cos(0) as one,
ln(2*3) = ln(2) + ln(3) as sanity;

没有FROM语句,且只返回一行。也可以吧SQL语言当作计算器

以上语句会输出一行四列的表

String Comparisons

SQL还支持字符串函数,特别是WHERE子句中的字符串比较函数,以下是两种不同的方式来查找格式为以B开头、且有任意数量其他字符的字符串:

  • Old-school SQL
1
2
3
SELECT S.sname
FROM Sailors S
WHERE S.sname like 'B_%'
  • Standard Regular Expressions
1
2
3
SELECT S.sname
FROM Sailors S
WHERE S.sname ~ 'B.*'

好像用的是正则表达式 ~代表的是正则式匹配

Combining Predicates

使用布尔逻辑来组合多个谓词

  • Subtle connections between:
    • Boolean logic in WHERE(i.e., AND, OR)
    • Traditional Set operations(i.e., INTERSECT, UNION) 传统集合操作

Example1: Sid’s of sailors who reserved a red or green boat

1
2
3
SELECT R.sid
FROM Boats B, Reserves R
WHERE R.bid = B.bid AND (B.color = 'red' OR B.color = 'green')

另一种方式:

1
2
3
4
5
6
7
8
9
SELECT R.sid
FROM Boats B, Reserves R
WHERE R.bid = B.bid AND B.color = 'red'

UNION ALL

SELECT R.sid
FROM Boats B, Reserves R
WHERE R.bid = B.bid AND B.color = 'green'

UNION ALL 不移除重复项,相当于取并集,红or绿

Example2: Sid’s of sailors who reserved a red and green boat

1
2
3
SELECT R.sid
FROM Boats B, Reserves R
WHERE R.bid = B.bid AND (B.color = 'red' AND B.color = 'green')

但是此处第一种方式将不会返回任何东西,因为color不可能同时='red'='green'

所以此处只能使用另一种方式:

1
2
3
4
5
6
7
8
9
SELECT R.sid
FROM Boats B, Reserves R
WHERE R.bid = B.bid AND B.color = 'red'

INTERSECT

SELECT R.sid
FROM Boats B, Reserves R
WHERE R.bid = B.bid AND B.color = 'green'

INTERSECT 相当于取交集,红and绿

Set Semantics

  • Set: a collection of distinct elements 集合是由不同元素组成的集合
  • Standard ways of manipulating / combining sets
    • Union 并集
    • Intersect 交集
    • Except 差集
  • Treat tuples within a relation as elements of a set 将关系中的元组视为集合中的元素

Default: Set Semantics

R = {A, A, A, A, B, B, C, D}

S = {A, A, B, B, B, C, E}

  • UNION {A, B, C, D, E}
  • INTERSECT {A, B, C}
  • EXCEPT {D}

Note: Think of each other as being a tuple in relation

“ALL”: Multiset Semantics

R = {A, A, A, A, B, B, C, D} = {A(4), B(2), C(1), D(1)}

S = {A, A, B, B, B, C, E} = {A(2), B(3), C(1), E(1)}

  • UNION ALL: sum of cardinalities {A(4+2), B(2+3), C(1+1), D(1+0), E(0+1)} = {A, A, A, A, A, A, B, B, B, B, B, C, C, D, E}
  • INTERSECT ALL: min of cardinalities {A(min(4, 2)), B(min(2, 3)), C(min(1, 1)), D(min(1, 0)), E(min(0, 1))} = {A, A, B, B, C}
  • EXCEPT ALL: difference of cardinalities {A(4-2), B(2-3), C(1-1), D(1-0), E(0-1)} = {A, A, D}

Nested Queries

  • 嵌套查找

Nested Queries: IN

  • Example: Names of sailors who’ve reserved boat #102:
1
2
3
4
5
6
SELECT S.sname
FROM Sailors S
WHERE S.sid IN
(SELECT R.sid
FROM Reserves R
WHERE R.bid = 102)
  • 括号里的是subquery

Nested Queries: NOT IN

  • Example: Names of sailors who’ve not reserved boat #103:
1
2
3
4
5
6
SELECT S.sname
FROM Sailors S
WHERE S.sid NOT IN
(SELECT R.sid
FROM Reserves R
WHERE R.bid = 103)

Nested Queries: EXISTS

  • This is a bit odd, but it is legal:
1
2
3
4
5
6
SELECT S.sname
FROM Sailors S
WHERE EXISTS
(SELECT R.sid
FROM Reserves R
WHERE R.bid = 103)
  • 如果subquery有东西返回,就输出其中所有的sname,如果subquery返回为空,则query输出也为空

Nested Queries with Correlation

  • 相关子查询
  • Names of sailors who 've reserved boat #102:
1
2
3
4
5
6
SELECT S.sname
FROM Sailors S
WHERE EXISTS
(SELECT *
FROM Reserves R
WHERE R.bid = 102 AND S.sid = R.sid)
  • Correlated subquery is recomputed for each Sailors tuple 相关子查询针对每个Sailors元组(元组代表表中的一行 记录)重新计算
  • 这个相关子查询有点像函数

More on Set-Comparison Operators

  • We 've seen: IN EXISTS
  • Can also have: NOT IN NOT EXISTS
  • Other forms: op ANY op ALL
Example: Find sailors whose rating is greater than that of some sailor called Popeye
1
2
3
4
5
6
SELECT *
FROM Sailors S
WHERE S.rating > ANY
(SELECT S2.rating
FROM Sailors S2
WHERE S2.sname = 'Popeye')
  • 从Sailors表中,找出比Popeye的rating更高的人,输出其所有信息

A Tough One: “Division”——Relational Division

  • Relational Division: “Find sailors who 've reserved all boats.” Said differently: “sailors with no counterexample missing boats” 关系除法:“找出预订了所有船只的水手。” 以下有点像双重否定之后输出了正确的结果
1
2
3
4
5
6
7
8
9
SELECT S.sname
FROM Sailors S
WHERE NOT EXISTS
(SELECT B.bid
FROM Boats B
WHERE NOT EXISTS (SELECT R.bid
FROM Reserves R
WHERE R.bid = B.bid
AND R.sid = S.sid))

ARGMAX?这块没太懂

Pt 1

  • The sailor with the highest rating
  • Correct or Incorrect?
1
2
3
4
5
SELECT MAX(S.rating)
FROM Sailors S;

SELECT S.*, MAX(S.rating)
FROM Sailors S;
  • 第一个写法不对,因为只输出的最大值,而没有输出最大值对应的水手的姓名等信息
  • 第二个写法有S.* ,包含了S的所有列,包括姓名、ID、最大值等。但是第二个写法是非法的,该查询会被拒绝。因为不知道该输出哪个水手的信息
  • MAX是一个聚合函数,必须与GROUP NY子句一起使用,或者在子查询中使用,而不能直接与单个记录的列一起选择(?

Pt 2

以下两个query的结果一致,都是输出rating最大值的水手的信息

1
2
3
4
5
SELECT *
FROM Sailors S
WHERE S.rating >= ALL
(SELECT S2.rating
FROM Sailors S2)
1
2
3
4
SELECT *
FROM Sailors S
ORDER BY rating DESC
LIMIT 1

第一种写法可能会返回一大堆评级为最高级的水手信息,相比之下第二种写法只会返回一种水手信息

“Inner” Joins: Another Syntax

1
2
3
4
SELECT s.*, r.bid
FROM Sailors s, Reserves r
WHERE s.sid = r.sid
AND ...

以上是第一种写法

1
2
3
4
SELECT s.*, r.bid
FROM Sailors s INNER JOIN Reserves r
ON s.sid = r.sid
WHERE ...

以上是第二种写法,两种写法的效果一样

Inner Join是一种中缀表示法,位于两个表的名称之间,用于两个表联接

Join Variants

1
2
3
4
5
6
7
SELECT <column expression list>
FROM table_name
[INNER | NATURAL
| {LEFT | RIGHT | FULL } {OUTER}] JOIN
table_name
ON <qualification list>
WHERE ...
  • INNER is default 默认为内部连接

  • Inner join what we 've learned so far -Same thing, just with different syntax

Inner/Natural Joins

  • 内部连接/自然连接

  • 以下三种写法等效

  • 第一种:最开始的两表联合查询

1
2
3
4
SELECT s.sid, s.sname, r.bid
FROM Sailors s, Reserves r
WHERE s.sid = r.sid
AND s.age > 20
  • 第二种:Inner Join
1
2
3
4
SELECT s.sid, s.sname, r.bid
FROM Sailors s INNER JOIN Reserves r
ON s.sid = r.sid
WHERE s.age > 20
  • 第三种:Natural Join
1
2
3
SELECT s.sid, s.sname, r.bid
FROM Sailors s NATURAL JOIN Reserves r
WHERE s.age > 20
  • "NATURAL" means equi-join for pairs of attributes with the same name
  • Natural Join 是``Inner Join的一种特殊情况,Natural Joinon子句在后台被自动计算,自动匹配相等的列名称,也就是第二种写法中的s.sid = r.sid`
  • 所以谨慎选择使用natural join,因为这种方法会选择联合的表中所有列名称相同的匹配字段;而且随着向表中添加或删除列或重命名列,``natural join`会以不可预测的方式中断,所以一般不推荐使用

Left Outer Join

  • returns all matched rows, and preserves all unmatched rows from the table on the left of the join clause -(use nulls in fields of non-matching tuples) 返回所有匹配的行,并保留连接子句左侧表中未匹配的所有行 (在不匹配元组的字段中使用空值)
1
2
3
SELECT s.sid, s.sname, r.bid
FROM Sailors2 s LEFT OUTER JOIN Reserves2 r
On s.sid = r.sid;
  • returns all sailors & bid for boat in any of their reservations 返回了所有水手及其在Reserves中的船只编号r.bid
  • note: no match for s.sid? r.bid IS NULL 如果s.sid没有匹配项则r.bid = NULL 结果集将包含左侧表的所有记录,以及右侧表中对应的记录

Right Outer Join

  • returns all matched rows, and preserves all unmatched rows from the table on the rightof the join clause -(use nulls in fields of non-matching tuples) 返回所有匹配的行,并保留连接子句右侧表中未匹配的所有行 (在不匹配元组的字段中使用空值)
1
2
3
SELECT r.sid, b.bid, b.bname
FROM Reserves2 r RIGHT OUTER JOIN Boats2 b
On r.bid = b.bid
  • returns all boats& sid for any sailor associated with the reservation 返回了所有水手及其在Reserves中的船只编号r.bid
  • note: no match for s.sid? r.bid IS NULL 如果s.sid没有匹配项则r.bid = NULL 结果集将包含右侧表的所有记录,以及左侧表中对应的记录

Full Outer Join

  • Returns all (matched or unmatched) rows from the tables on both sides of the join clause 返回连接子句两侧表中的所有行(不管匹不匹配)
1
2
3
SELECT r.sid, b.bid, b.bname
FROM Reserves2 r FULL OUTER JOIN Boats2 b
ON r.bid = b.bid
  • returns all boats & all information on reservations 返回左右表所有信息 此处即所有船只和所有预订信息
  • No match for r.bid? - b.bid IS NULL AND b.bname IS NULL 如果r.bid没有匹配项?b.bidb.bname 都是NULL
  • No match for b.bid? - r.sid IS NULL

Views: Named Queries

  • 视图是一个虚拟的表,其内容由SQL查询定义。视图不存储数据,而是在查询视图时动态生成数据
1
2
CREATE VIEW view_name   // 创建视图
AS select_statement // 定义视图内容
  • Makes development simpler 简化复杂的查询,通过简单的视图名称来代替复杂的SQL语句
  • Often used for security 视图可以限制用户对某些数据的访问,隐藏其余数据
  • Not “materialized” 非物化,即视图不存储查询结果,每次访问视图时都会执行底层的SQL查询
  • 应用场景:数据聚合、数据过滤、简化复杂查询等
1
2
3
4
5
CREATE VIEW Redcount
AS SELECT B.bid, COUNT(*) AS scount
FROM Boats2 B, Reserves2 R
WHERE R.bid = B.bid AND B.color = 'red'
GROUP BY B.bid
  • Redcount视图通过查询Boats2Reserves2表,统计了颜色为红色的船只的预订数量,并按船只编号分组。如果B.bidReserves2表中没有匹配项,那么对应的计数将为0

Views Instead of Relations in Queries 这个没太懂

1
2
3
4
5
6
7
8
9
10
11
12
13
CREATE VIEW Redcount
AS SELECT B.bid, COUNT(*) AS scount
FROM Boats2 B, Reserves2 R
WHERE R.bid = B.bid AND B.color = 'red'
GROUP BY B.bid;

// 因为视图是一个虚拟的表,所以这条语句执行的是定义视图时的SQL查询
SELECT * from Redcount

SELECT bname, scount
FROM Redcount R, Boats2 B // 将视图和Boats2表进行查询
WHERE R.bid = B.bid
AND scount < 10
  • 基于Boats2Reserves2表,统计了颜色为红色的船只的预订数量,并按船只编号分组

Subqueries in FROM

  • 想创建一个只使用一次的“视图” “即时视图”
  • define some query in the FROM while define the outer query 定义外部查询时在FROM语句中定义子查询
  • Like a “view on the fly”
1
2
3
4
5
6
7
8
SELECT bname, scount
FROM Boats2 B,
(SELECT B.bid, COUNT(*)
FROM Boats2 B, Reserves2 R
WHERE R.bid = B.bid AND B.color = 'red'
GROUP BY B.bid) AS Reds(bid, scount)
WHERE Reds.bid = B.bid // 连接子查询
AND scount < 10

With a.k.a common table expression(CTE)

  • CTE Common Table Expression 公用表表达式
  • Another “view on the fly” syntax:
  • 在常规查询之前定义子查询,这个子查询可在后面的查询复用
1
2
3
4
5
6
7
8
9
10
11
// 定义了Reds这个子查询
WITH Reds(bid, scount) AS
(SELECT B.bid, COUNT (*)
FROM Boats2 B, Reserves2 R
WHERE R.bid = B.bid AND B.color = 'red'
GROUP BY B.bid)

SELECT bname, scount
FROM Boats2 B, Reds
WHERE Reds.bid = B.bid
AND scount < 10

Can have many queries in WITH

  • Another “view on the fly” syntax
  • 在常规查询前定义多个子查询,这些子查询也可以相互引用
1
2
3
4
5
6
7
8
9
10
11
12
13
	WITH Reds(bid, scount) AS
(SELECT B.bid, COUNT (*)
FROM Boats2 B, Reserves2 R
WHERE R.bid = B.bid AND B.color = 'red'
GROUP BY B.bid),

UnpopularReds AS
SELECT bname, scount
FROM Boats2 B, Reds
WHERE Reds.bid = B.bid
AND scount < 10

SELECT * FROM UnpopularReds;

Example: ARGMAX GROUP BY?

  • The sailor with the highest rating per age 找出每个年龄段最高的rating值
1
2
3
4
5
6
7
8
9
	WITH maxratings(age, maxrating) AS
(SELECT age, max(rating)
FROM Sailors
GROUP BY age)

SELECT S.*
FROM Sailors S, maxratings m
WHERE S.age = m.age
AND S.rating = m.maxrating

Testing SQL Queries

  • Not every database instance will reveal every bug in your query
    • Eg: database instance without any rows in it
  • Need to debug your queries
  • reasoning about them carefully
  • constructing test data

Brief Detour: Null Values

  • Field values are sometimes unknown
    • SQL provides a special value NULL for such situations
    • every data type can be NULL
  • The presence of null complicates many issues. E.g.: 有时有NULL会变得复杂
    • Selection predicates(WHERE) NULL如何与WHERE子句交互
    • Aggregation 聚合
  • But NULLs comes naturally from Outer joins NULL无法避免,NULL来自外部连接

NULL in the WHERE clause

Example
  • Consider a tuple where rating IS NULL

  • INSERT INTO sailors VALUES
    (11, 'Jack Sparrow', NULL, 35);
    
    SELECT * FROM sailors
    WHERE rating > 8;
    
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12

    - Is Jack Sparrow in the output? --The result is yes/no 有/没有,答案不能是含糊的

    #### NULL in comparators

    - 比较器中的NULL
    - Rules: (x op NULL) evaluates to ... NULL!

    ```sql
    SELECT rating = NULL FROM sailors;
    SELECT rating < NULL FROM sailors;
    SELECT rating >= NULL FROM sailors;
  • 每行SELECT语句返回的都是NULL

Explicit NULL Checks

1
2
SELECT * FROM sailors WHERE rating IS NULL;
SELECT * FROM sailors WHERE rating IS NOT NULL;
  • 显式NULL检查
  • 当将字段与NULL进行比较时,使用IS NULL/IS NOT NULL 而不是等号,返回值为true/false

NULL at top of WHERE

  • Rule: Do not output a tuple WHERE NULL
1
2
3
SELECT * FROM sailors;
SELECT * FROM sailors WHERE rating > 8; // NULL > 8
SELECT * FROM sailors WHERE rating <= 8; // NULL <= 8
  • 二三句查询语句的结果都是NULL,因此结果不会输出

NULL in boolean Logic

  • 布尔逻辑中的NULL
  • SQL遵循三值逻辑:True / False / NULL
nottfn
FTN
ANDTFN
TTFN
FFFF
NNFN
ORTFN
TTTT
FTFN
NTNN
  • General rule:
    • NULL can take on either T or F, so answer needs to accommodate either value NULL可以代表TRUE或FALSE,因此任何包含NULL的布尔表达式的结果都需要 能够适应这两种可能的值(好奇妙啊

NULL and Aggregation

  • General rule:
    • NULL column values are ignored by aggregate functions 聚合函数在处理输出时会忽略列中的NULL值。如果某个列的值是NULL,则其不会对聚合函数的结果产生影响(值都不知道啊不知道干脆忽视吧^^

NULLs: Summary

  • (x op NULL) is NULL
  • WHERE NULL: do not send to output
  • Boolean connectives: 3-valued logic
  • Aggregates ignore NULL-valued inputs

Summary

  • SQL is a declarative language 声明式语言
    • Somebody has to translate to algorithms though 需要将其翻译成算法
    • The RDBMS implementor … i.e. you 自分=数据库管理系统的实现者
  • The data structure and algorithms that make SQL possible also power: SQL背后的数据结构和算法也支持:
    • NoSQL, data mining, scalable ML, network routing NoSQL、数据挖掘、可扩展的机器学习、网络路由
    • A toolbox for scalable computing 一套用于可扩展计算的工具箱
  • Temporarily skirted questions of good database(schema) design 后续继续讨论数据库(架构)设计的问题
    • a topic we ‘ll consider in greater depth later’

写在后面

课的知识密度很高,算算时长两节课才两个多小时可能,我tm看了两三天。。。需要停一停沉淀一下了

哔哔:

感觉没什么东西能真正留住我,是我想留下,所以才给自己硬栓了一条绳子。但是刚刚,产生了五分钟拴不住的念头

可能是学累了吧嘿嘿

  • Copyrights © 2024-2025 brocademaple
  • 访问人数: | 浏览次数:

      请我喝杯咖啡吧~

      支付宝
      微信