CS186 SQL3 Berkeley

写在前面

Berkeley CS186 Intro to DB Systems

视频地址 课程记录

SQL 3

Architecture of a DBMS: SQL Client

  • 数据库管理系统的架构:SQL客户端
  • Database Management System

DBMS: Parsing & Optimization

  • 数据库管理系统(DBMS):解析与优化

Purpose:

  • Parse, check, and verify the SQL, and translate into an efficient relational query plan

  • 解析、检查和验证SQL语句,将其转换为高效的关系查询方案

  • SELECT S.sid, S.sname, R.bid
    FROM Sailors S, Reserves R
    WHERE S.sid = R.sid and S.age > 30
    GROUP BY age
    

DBMS: Relational Operators

  • 数据库管理系统(DBMS):关系操作符

Purpose:

  • Execute a dataflow by operating on records and files
  • 通过在记录和文件上执行数据流来操作数据

DBMS: Files and Index Management

  • 文件和索引管理

Purpose:

  • Organize tables and Records as groups of pages in a logical file
  • 将表和记录组织成逻辑文件中的页面组

DBMS: Buffer Management

  • 缓冲区管理层

Purpose:

  • Provide the illusion of operating in memory
  • 提供一种操作内存的假象

DBMS: Disk Space Management

  • 磁盘空间管理

Purpose:

  • Translate page requests into physical bytes on one or more device(s)

  • 将缓冲区管理器的页面请求转换为一个或多个设备(如磁盘、固态磁盘)上的物理字节

Architecture of a DBMS

  • 数据库管理系统的架构
    • Organized in layers 层次架构
    • Each layer abstracts the layer below 每一层都抽象了其下的层次
      • Manage complexity 管理复杂性
      • Performance assumptions 性能假设
    • Example of good systems design DBMS是良好的系统设计示例
SQL CLIENT
Query Parsing & Optimization 解析与优化
Relational Operators 关系操作符
Files and Index Management 文件和索引管理
Buffer Management 缓冲区管理
Disk Space Management 磁盘空间管理
DBMS的架构通常分为几个层次,每一层都为上层提供服务,同时抽象了下层的复杂性。这种分层的设计有助于提高系统的可维护性和可扩展性
DATABASE

DBMS: Concurrency & Recovery

  • 并发控制与恢复(?

  • Two cross-cutting issues related to storage and memory management

Before we begin: storage media

Disks

  • Most database systems were originally designed for magnetic disks 大多数数据库系统最初是为磁盘设计的
    • Disk are a mechanical anachronism 磁盘是一种机械性的过时技术
    • Instilled design ideas that apply to using solid state disks as well 它所灌输的设计思想也适用于固态硬盘的使用
  • Major implications 主要影响
    • No “pointer derefs” .Instead, an API: 没有“指针解”引用
      • READ: transfer “page” of data from disk to RAM (读取)将数据的“页面”从磁盘传输到RAM
      • WRITE: transfer “page” of data from RAM to disk (写入)将数据的“页面”从RAM传输到磁盘
    • Both API calls are very, very slow 这两个API调用都非常非常慢
      • Plan carefully 仔细规划
    • An explicit API can be a good thing 明确的API
      • Minimizes the kind of pointer errors you see in C 最小化了你在C语言中看到的指针错误类型

Storage Hierarchy

  • 存储层次结构
  • 存储层次结构是计算机系统中不同类型存储设备的层级排列,从最快的到最慢的【从上到下】
registers 寄存器
L1 Cache L1缓存
L2 Cache L2缓存
RAM 随机存取存储器
SSD 固态硬盘
Disk 磁盘
  • emmm,理解下来就是读写到磁盘很慢(?

Components of a Disk, Pt. 6

  • Platters spin(say 15000 rpm)

  • Arm assembly moved in or out to position a head on a desired track

    • Tracks under heads make a “cylinder”
  • Only one head reads/writes at any one time

  • Block/page size is a multiple of (fixed) sector size

  • 这个地方是磁盘结构的解释,所以待插入一张结构图,就一览了然了,当然目前我还不会插图片,后面来

Untitled|400

Accessing a Disk page

  • 访问磁盘界面

  • Time to access (read/write) a disk block: 读写磁盘块的时间 以下也涉及到磁盘结构那张图

    • seek time (moving arms to position disk head on track)
      • ~2-3 ms on average 寻道时间 磁盘头读取所需的磁道的位置
    • rotational delay (waiting for block to rotate under head)
      • ~0-4 ms (15000 RPM) 旋转延迟时间
    • transfer time (actually moving data to/from disk surface)
      • ~0.25 ms per 64KB page 将数据从磁盘表面移动到磁盘头/从磁盘头移动到磁盘表面得到时间
  • Key to lower I/O cost: reduce seek/rotational delays 降低I/O成本的关键:减少寻道和旋转延迟的时间

Notes on Flash(SSD)

  • 闪存/固态磁盘Solid State Disks,闪存通常组合在一起形成称为固态磁盘的磁盘驱动器
  • Issues in current generation (NAND)
    • Fine-grain reads (4-8K reads), coarse-grain writes (1-2 MB writes) 细粒度读取(Fine-grain reads):可以进行4-8KB的细粒度读取; 粗粒度写入(Coarse-grain writes):写入操作通常是1-2MB的粗粒度
    • Only 2k-3k erasures before failure, so keep moving hot write units around (“wear leveling”) 闪存的寿命限制:通常在2000到3000次擦除周期后可能发生故障,因此需要不断移动热点写入单元(“磨损均衡” wear leveling)
    • write amplification: big units, need to reorg for wear & garbage collection 写入放大(Write amplification):由于写入单元较大,需要重新组织数据以进行磨损均衡和垃圾收集
  • So read is fast and predictable
    • Single read access time: 0.03 ms
    • 4 KB random reads: ~500MB/sec
    • Sequential reads: ~525MB/sec
    • 64K: 0.48 ms
    • 在闪存中,顺序读取和随机读取的带宽速度大致一样
  • But write is not . Slower for random
    • Single write access time: 0.03 ms 单次写入挺快的
    • 4 KB random writes: ~120 MB/sec
    • Sequential writes: ~480 MB/sec
    • 顺序写入相比随机写入速度快很多
  • Flash is faster than disk. But disk offers about 10x the capacity per $就是disk在相等价钱下提供的容量更大(自分の理解)
  • Many significant DBs are not big, but data sizes grow faster than Moore’s Law (“Big Data” is real) , so the role of disk, flash, RAM → to debate

Bottom Line (last few years)

  • Very large DBs: relatively traditional 非常大的数据库(Very large DBs): 相对传统
    • Disk still the best cost/MB by a lot 磁盘仍然是成本/每兆字节(cost/MB)比最高的存储介质
    • SSDs improve performance and performance variance 固态硬盘(SSDs)提高了性能和性能的一致性
  • Smaller DB story is changing quickly 小型数据库
    • Entry cost for disk is not cheap, so flash wins at the low end 磁盘的入门成本高,在低端市场闪存(flash)√
    • Many interesting databases fit in RAM 许多有趣的数据库可以完全装入RAM

Disks and Files

  • Recall, most DBMSs stores information on Disks and SSDs 对以上Disk和SSD讨论的总结
    • Disk are a mechanical anachronism(slow)
    • SSDs faster, slow relative to memory, costly writes

Block Level Storage

  • 块级存储

  • Read and Write large chunks of sequential bytes 读写大顺序字节块:块级存储系统设计为一次读取或写入大的连续字节块

  • Sequentially: “Next” disk block is fastest 顺序性:在顺序访问中,读取或写入“下一个”磁盘块是最快的

  • Maximize usage of data per Read/Write 最大化每次读写的数据使用量

    • “Amortize” seek delays(HDDs) and writes (SSDs) 分摊 硬盘驱动器上的寻道延迟 以及旋转延迟 翻译器:通过分摊硬盘驱动器(HDD)上的寻道延迟和固态硬盘(SSD)上的写入放大,来提高效率
  • Predict future behavior 预测未来行为

    • Cache popular blocks 缓存热门块,将频繁访问的数据块存储在快速访问的缓存中
    • Pre-fetch likely-to-be-accessed blocks 预取可能访问的块,根据访问模式预测并提前加载数据块
    • Buffer writes to sequential blocks 缓冲顺序块的写入,将写入操作暂时存储,然后一次性写入到顺序的磁盘块中

A Note on Terminology

  • 术语解释
  • Block = Unit of transfer for disk read/write :磁盘读写操作的数据传输单元
    • 64 KB - 128 KB is a good number today 64KB到128KB是一个常见的块大小
    • Book says 4 KB 书籍中可能提到4KB作为块大小的标准
  • Page: a common synonym for “block” 页面:通常是“块”的同义词
    • In some texts, “page” = a block-sized chunk of RAM 在某些文献中,“页面”指的是RAM中与块大小相等的数据块
  • treat “block” and “page” as synonyms 将“块”和“页面”视为可以互换的同义词

Arranging Blocks on Disk

  • 在磁盘上排列块

  • ‘Next’ block concept:

    • sequential blocks on same track, followed by 位于同一磁道上的顺序块
    • blocks on same cylinder, followed by 位于同一柱面上的块
    • blocks on adjacent cylinder 相邻柱面上的块
  • Arrange file pages sequentially by ‘next’ on disk 按磁盘上的“下一个”顺序排列文件页面

    • minimize seek and rotational delay 最小化寻道和旋转延迟
  • For a sequential scan, pre-fetch 对于顺序扫描,进行预取

    • several blocks at a time 一次预取多个块
  • Read large consecutive blocks 读取大的连续块

Disk Space Management

  • 磁盘空间管理

  • Lowest layer of DBMS, manages space on disk

  • Purpose:

    • Map pages to locations on disk 将页面映射到磁盘上的位置
    • Load pages from disk to memory 从磁盘加载页面到内存
    • Save pages back to disk & ensuring writes 将页面保存回磁盘并确保写入
  • Higher levels call upon this layer to : 更高层级调用这一层来:

    • Read/write a page 读写页面
    • Allocate/de-allocate logical pages 分配/释放逻辑页面

Requesting Pages

  • 请求页面

  • Request for s sequences of pages best satisfied by pages stored sequentially on disk 对页面序列的请求最好由磁盘上顺序存储的页面满足

    • Physical details hidden from higher levels of systems 物理细节对系统更高层级隐藏
    • Higher levels may “safely” assume Next Page is fast, so they will simply expect sequential runs of pages to be quick to scan 更高层级可能“安全地”假设下一页很快,因此他们将期望页面的顺序运行快速扫描
    • 尼玛好抽象……

Implementation

  • 实现

  • Proposal 1:Talk to the storage device directly 数据库存储软件直接与存储设备对话,存储设备绕过操作系统的任何功能导出的协议

    • could be very fast if u knew the device well 数据库系统直接连接到硬件,速度尽可能的快
    • when devices change 设备可能变换,所以这个proposal不适用了
  • Proposal 2: Run over filesystems (FS)

    • Allocate single large “contiguous” file on a nice empty disk and assume sequential / nearby byte access are fast 在一个干净的空磁盘上分配一个大的“连续”文件,并假设顺序/附近的字节访问是快速的
    • Most FS optimize disk layout for sequential access 大多数文件系统优化磁盘布局以进行顺序访问
      • Gives us more or less what we want if we start with an empty disk
    • DBMS “file” may span multiple FS files on multiple disks/machines DBMS的“文件”可能跨越多个磁盘/机器上的多个文件系统文件

Summary: Disk Space Management

  • provide API to read and write pages to device 提供API以向设备读写页面
  • Pages: block level organization of bytes on disk 页面:磁盘上字节的块级组织
  • Provides “next” locality and abstracts FS/device details 提供“下一个”局部性并抽象文件系统/设备细节

Disks and Files: Summary

  • Magnetic (hard) disks and SSDs 磁性(硬)磁盘和固态硬盘(SSDs)
    • Basic HDD mechanics 基本的HDD机械特性
    • SSD write amplification SSD写入放大
    • Concept of “near” pages and how it relates to cost of access 近邻页面的概念以及相对成本
    • Relative cost of
      • Random vs. sequential disk access (10x) 随机访问与顺序磁盘访问
      • Disk (pluto 冥王星) vs RAM (sacramento) vs. registers (your head) 磁盘(以冥王星为例)与RAM(以萨克拉门托为例?不懂)与寄存器(你的头脑)
        • Big differences
        • 不同存储介质的成本和性能有显著差异,寄存器最快但容量最小,RAM容量较大且速度很快,而磁盘提供了巨大的存储容量但速度相对较慢
  • DB File storage
    • Typically over FS file(s) 通常通过文件系统(FS)上的一个或多个文件来实现
  • Disk space manager loads and stores pages
    • Block level reasoning 基于块级逻辑进行操作
    • Abstracts device and file system; provide fast “next” 抽象化了设备和文件系统的细节;提供快速的“下一个”页面访问

写在后面

抽象啊(流口水

不过也算有所收获

  • Copyrights © 2024-2025 brocademaple
  • 访问人数: | 浏览次数:

      请我喝杯咖啡吧~

      支付宝
      微信