数据库学习,其一

微信2017年分享了他们客户端数据库的经验:

之前的文章中也提到过,我在使用WCDB中的 SQLiteRepairKit时遇到的一个问题。备份未损坏的数据库的sqlite_master,并利用备份恢复数据。即便数据库未损坏,恢复过程中仍然有Btree的节点解析(parse)失败。详情见例子。这个失败导致我的一个简单的单元测试无法通过,让我对于自己的实现没有信心,极为郁闷。文档和代码里注释有限,所以我只是参考他们的demo,所以对于出现这个问题,不知道是我实现错误没有正确使用SQLiteRepairKit,还是SQLiteRepairKit里面有bugs。

这个问题未能解决,始终是一个心病。之前几年没接触过数据库,所以工作之余开始自己一点点看起来 。

微信提到过有实用两个pragma。我们逐一看一下:

synchronous = FULL

这个设置可以保证系统崩溃和突然断电不会导致系统崩溃。很安全,但是很慢。

那常说的数据库同步sync是指的什么呢?

官方文档说:

In order to guarantee that database files are always consistent, SQLite will occasionally ask the operating system to flush all pending writes to persistent storage then wait for that flush to complete. This is accomplished using the fsync() system call under unix and FlushFileBuffers() under Windows. We call this flush of pending writes a "sync".
Actually, if one is only concerned with atomic and consistent writes and is willing to forego durable writes, the sync operation does not need to wait until the content is completely stored on persistent media. Instead, the sync operation can be thought of as an I/O barrier. As long as all writes that occur before the sync are completed before any write that happens after the sync, no database corruption will occur. If sync is operating as an I/O barrier and not as a true sync, then a power failure or system crash might cause one or more previously committed transactions to roll back (in violation of the "durable" property of "ACID") but the database will at least continue to be consistent, and that is what most people care about.

翻译一下就是:所谓的同步就是利用操作系统的系统调用fsync方法,将没有写入本地存储的写操作真实的写入本地文件。

文档还提到,绝大多数情况不需要等待写入完成,可以将同步理解成I/O barrier。只需要新的写操作写入文件之前,同步之前的写操作真实写入本地文件即可。这样虽然系统崩溃或者突然断电会导致丢失,但数据库中已经写入文件数据是完整的。提高了性能,损失部分容错性(durable)。

对于I/O barrier的理解,可以对比GCD的barrier,原理是相通的。

使用WAL模式时,推荐的是NORMAL:

  1. checkpoint操作之前,WAL文件同步
  2. checkpoint完成时,数据库文件完成同步
top Created with Sketch.