CS186 SQL3 Berkeley

2024-07-08

notes / Database notes

字数统计: 3k | 阅读时长≈ 13 分钟

¶写在前面

Berkeley CS186 Intro to DB Systems

视频地址课程记录

SQL 3

¶Architecture of a DBMS: SQL Client

数据库管理系统的架构：SQL客户端
Database Management System

¶DBMS: Parsing & Optimization

数据库管理系统（DBMS）：解析与优化

¶Purpose:

Parse, check, and verify the SQL, and translate into an efficient relational query plan
解析、检查和验证SQL语句，将其转换为高效的关系查询方案

SELECT S.sid, S.sname, R.bid
FROM Sailors S, Reserves R
WHERE S.sid = R.sid and S.age > 30
GROUP BY age

¶DBMS: Relational Operators

数据库管理系统（DBMS）：关系操作符

¶Purpose:

Execute a dataflow by operating on records and files
通过在记录和文件上执行数据流来操作数据

¶DBMS: Files and Index Management

文件和索引管理

¶Purpose:

Organize tables and Records as groups of pages in a logical file
将表和记录组织成逻辑文件中的页面组

¶DBMS: Buffer Management

缓冲区管理层

¶Purpose:

Provide the illusion of operating in memory
提供一种操作内存的假象

¶DBMS: Disk Space Management

磁盘空间管理

¶Purpose:

Translate page requests into physical bytes on one or more device(s)
将缓冲区管理器的页面请求转换为一个或多个设备（如磁盘、固态磁盘）上的物理字节

¶Architecture of a DBMS

数据库管理系统的架构
- Organized in layers 层次架构
- Each layer abstracts the layer below 每一层都抽象了其下的层次
  - Manage complexity 管理复杂性
  - Performance assumptions 性能假设
- Example of good systems design DBMS是良好的系统设计示例

SQL CLIENT
Query Parsing & Optimization 解析与优化
Relational Operators 关系操作符
Files and Index Management 文件和索引管理
Buffer Management 缓冲区管理
Disk Space Management 磁盘空间管理
DBMS的架构通常分为几个层次，每一层都为上层提供服务，同时抽象了下层的复杂性。这种分层的设计有助于提高系统的可维护性和可扩展性
DATABASE

¶DBMS: Concurrency & Recovery

并发控制与恢复（？
Two cross-cutting issues related to storage and memory management

¶Before we begin: storage media

¶Disks

Most database systems were originally designed for magnetic disks 大多数数据库系统最初是为磁盘设计的
- Disk are a mechanical anachronism 磁盘是一种机械性的过时技术
- Instilled design ideas that apply to using solid state disks as well 它所灌输的设计思想也适用于固态硬盘的使用
Major implications 主要影响
- No “pointer derefs” .Instead, an API: 没有“指针解”引用
  - READ: transfer “page” of data from disk to RAM （读取）将数据的“页面”从磁盘传输到RAM
  - WRITE: transfer “page” of data from RAM to disk （写入）将数据的“页面”从RAM传输到磁盘
- Both API calls are very, very slow 这两个API调用都非常非常慢
  - Plan carefully 仔细规划
- An explicit API can be a good thing 明确的API
  - Minimizes the kind of pointer errors you see in C 最小化了你在C语言中看到的指针错误类型

¶Storage Hierarchy

存储层次结构
存储层次结构是计算机系统中不同类型存储设备的层级排列，从最快的到最慢的【从上到下】

registers 寄存器
L1 Cache L1缓存
L2 Cache L2缓存
RAM 随机存取存储器
SSD 固态硬盘
Disk 磁盘

emmm，理解下来就是读写到磁盘很慢（？

¶Components of a Disk, Pt. 6

Platters spin(say 15000 rpm)
Arm assembly moved in or out to position a head on a desired track
- Tracks under heads make a “cylinder”
Only one head reads/writes at any one time
Block/page size is a multiple of (fixed) sector size
这个地方是磁盘结构的解释，所以待插入一张结构图，就一览了然了，当然目前我还不会插图片，后面来

Untitled|400

¶Accessing a Disk page

访问磁盘界面
Time to access (read/write) a disk block: 读写磁盘块的时间以下也涉及到磁盘结构那张图
- seek time (moving arms to position disk head on track)
  - ~2-3 ms on average 寻道时间磁盘头读取所需的磁道的位置
- rotational delay (waiting for block to rotate under head)
  - ~0-4 ms (15000 RPM) 旋转延迟时间
- transfer time (actually moving data to/from disk surface)
  - ~0.25 ms per 64KB page 将数据从磁盘表面移动到磁盘头/从磁盘头移动到磁盘表面得到时间
Key to lower I/O cost: reduce seek/rotational delays 降低I/O成本的关键：减少寻道和旋转延迟的时间

¶Notes on Flash(SSD)

闪存/固态磁盘Solid State Disks，闪存通常组合在一起形成称为固态磁盘的磁盘驱动器
Issues in current generation (NAND)
- Fine-grain reads (4-8K reads), coarse-grain writes (1-2 MB writes) 细粒度读取（Fine-grain reads）：可以进行4-8KB的细粒度读取; 粗粒度写入（Coarse-grain writes）：写入操作通常是1-2MB的粗粒度
- Only 2k-3k erasures before failure, so keep moving hot write units around (“wear leveling”) 闪存的寿命限制：通常在2000到3000次擦除周期后可能发生故障，因此需要不断移动热点写入单元（“磨损均衡” wear leveling）
- write amplification: big units, need to reorg for wear & garbage collection 写入放大（Write amplification）：由于写入单元较大，需要重新组织数据以进行磨损均衡和垃圾收集
So read is fast and predictable
- Single read access time: 0.03 ms
- 4 KB random reads: ~500MB/sec
- Sequential reads: ~525MB/sec
- 64K: 0.48 ms
- 在闪存中，顺序读取和随机读取的带宽速度大致一样
But write is not . Slower for random
- Single write access time: 0.03 ms 单次写入挺快的
- 4 KB random writes: ~120 MB/sec
- Sequential writes: ~480 MB/sec
- 顺序写入相比随机写入速度快很多
Flash is faster than disk. But disk offers about 10x the capacity per $就是disk在相等价钱下提供的容量更大（自分の理解）

¶Storage Pragmatics & Trends

Many significant DBs are not big, but data sizes grow faster than Moore’s Law (“Big Data” is real) , so the role of disk, flash, RAM → to debate

¶Bottom Line (last few years)

Very large DBs: relatively traditional 非常大的数据库（Very large DBs）: 相对传统
- Disk still the best cost/MB by a lot 磁盘仍然是成本/每兆字节（cost/MB）比最高的存储介质
- SSDs improve performance and performance variance 固态硬盘（SSDs）提高了性能和性能的一致性
Smaller DB story is changing quickly 小型数据库
- Entry cost for disk is not cheap, so flash wins at the low end 磁盘的入门成本高，在低端市场闪存（flash）√
- Many interesting databases fit in RAM 许多有趣的数据库可以完全装入RAM

¶Disks and Files

Recall, most DBMSs stores information on Disks and SSDs 对以上Disk和SSD讨论的总结
- Disk are a mechanical anachronism(slow)
- SSDs faster, slow relative to memory, costly writes

¶Block Level Storage

块级存储
Read and Write large chunks of sequential bytes 读写大顺序字节块：块级存储系统设计为一次读取或写入大的连续字节块
Sequentially: “Next” disk block is fastest 顺序性：在顺序访问中，读取或写入“下一个”磁盘块是最快的
Maximize usage of data per Read/Write 最大化每次读写的数据使用量
- “Amortize” seek delays(HDDs) and writes (SSDs) 分摊硬盘驱动器上的寻道延迟以及旋转延迟翻译器：通过分摊硬盘驱动器（HDD）上的寻道延迟和固态硬盘（SSD）上的写入放大，来提高效率
Predict future behavior 预测未来行为
- Cache popular blocks 缓存热门块，将频繁访问的数据块存储在快速访问的缓存中
- Pre-fetch likely-to-be-accessed blocks 预取可能访问的块，根据访问模式预测并提前加载数据块
- Buffer writes to sequential blocks 缓冲顺序块的写入，将写入操作暂时存储，然后一次性写入到顺序的磁盘块中

¶A Note on Terminology

术语解释
Block = Unit of transfer for disk read/write 块：磁盘读写操作的数据传输单元
- 64 KB - 128 KB is a good number today 64KB到128KB是一个常见的块大小
- Book says 4 KB 书籍中可能提到4KB作为块大小的标准
Page: a common synonym for “block” 页面：通常是“块”的同义词
- In some texts, “page” = a block-sized chunk of RAM 在某些文献中，“页面”指的是RAM中与块大小相等的数据块
treat “block” and “page” as synonyms 将“块”和“页面”视为可以互换的同义词

¶Arranging Blocks on Disk

在磁盘上排列块
‘Next’ block concept:
- sequential blocks on same track, followed by 位于同一磁道上的顺序块
- blocks on same cylinder, followed by 位于同一柱面上的块
- blocks on adjacent cylinder 相邻柱面上的块
Arrange file pages sequentially by ‘next’ on disk 按磁盘上的“下一个”顺序排列文件页面
- minimize seek and rotational delay 最小化寻道和旋转延迟
For a sequential scan, pre-fetch 对于顺序扫描，进行预取
- several blocks at a time 一次预取多个块
Read large consecutive blocks 读取大的连续块

¶Disk Space Management

磁盘空间管理
Lowest layer of DBMS, manages space on disk
Purpose:
- Map pages to locations on disk 将页面映射到磁盘上的位置
- Load pages from disk to memory 从磁盘加载页面到内存
- Save pages back to disk & ensuring writes 将页面保存回磁盘并确保写入
Higher levels call upon this layer to : 更高层级调用这一层来：
- Read/write a page 读写页面
- Allocate/de-allocate logical pages 分配/释放逻辑页面

¶Requesting Pages

请求页面
Request for s sequences of pages best satisfied by pages stored sequentially on disk 对页面序列的请求最好由磁盘上顺序存储的页面满足
- Physical details hidden from higher levels of systems 物理细节对系统更高层级隐藏
- Higher levels may “safely” assume Next Page is fast, so they will simply expect sequential runs of pages to be quick to scan 更高层级可能“安全地”假设下一页很快，因此他们将期望页面的顺序运行快速扫描
- 尼玛好抽象……

¶Implementation

实现
Proposal 1：Talk to the storage device directly 数据库存储软件直接与存储设备对话，存储设备绕过操作系统的任何功能导出的协议
- could be very fast if u knew the device well 数据库系统直接连接到硬件，速度尽可能的快
- when devices change 设备可能变换，所以这个proposal不适用了
Proposal 2: Run over filesystems (FS)
- Allocate single large “contiguous” file on a nice empty disk and assume sequential / nearby byte access are fast 在一个干净的空磁盘上分配一个大的“连续”文件，并假设顺序/附近的字节访问是快速的
- Most FS optimize disk layout for sequential access 大多数文件系统优化磁盘布局以进行顺序访问
  - Gives us more or less what we want if we start with an empty disk
- DBMS “file” may span multiple FS files on multiple disks/machines DBMS的“文件”可能跨越多个磁盘/机器上的多个文件系统文件

¶Summary: Disk Space Management

provide API to read and write pages to device 提供API以向设备读写页面
Pages: block level organization of bytes on disk 页面：磁盘上字节的块级组织
Provides “next” locality and abstracts FS/device details 提供“下一个”局部性并抽象文件系统/设备细节

¶Disks and Files: Summary

Magnetic (hard) disks and SSDs 磁性（硬）磁盘和固态硬盘（SSDs）
- Basic HDD mechanics 基本的HDD机械特性
- SSD write amplification SSD写入放大
- Concept of “near” pages and how it relates to cost of access 近邻页面的概念以及相对成本
- Relative cost of
  - Random vs. sequential disk access (10x) 随机访问与顺序磁盘访问
  - Disk (pluto 冥王星) vs RAM (sacramento) vs. registers (your head) 磁盘（以冥王星为例）与RAM（以萨克拉门托为例？不懂）与寄存器（你的头脑）
    - Big differences
    - 不同存储介质的成本和性能有显著差异，寄存器最快但容量最小，RAM容量较大且速度很快，而磁盘提供了巨大的存储容量但速度相对较慢
DB File storage
- Typically over FS file(s) 通常通过文件系统（FS）上的一个或多个文件来实现
Disk space manager loads and stores pages
- Block level reasoning 基于块级逻辑进行操作
- Abstracts device and file system; provide fast “next” 抽象化了设备和文件系统的细节；提供快速的“下一个”页面访问

写在后面

抽象啊（流口水

不过也算有所收获