Redis 源码阅读 --- dict

Redis dict 实现相关源码阅读笔记，源码文件 dict.h & dict.c。

1. 小结

在 Redis 中，dict 不仅用于实现不同的对象类型，还用于数据库的实现。

可通过自定义哈希、复制、比较和析构函数实现不同的 dictType
哈希表使用链地址法处理哈希冲突
每个 dict 中含有两个哈希表 dictht，以便于实现渐进式 Rehash
在每次尝试往字典中新增 key-val pair 时，可能会触发扩容 Rehash:
- 一般情况下，载荷因子大于 $1$ 即触发扩容，但在使用 dict 实现键空间时还需考虑 Rehash 带来的内存使用量突增的情况
- 当 Redis 有子进程在做 RDB saving、AOF rewriting 或 module 相关的任务时，载荷因子需大于 $5$ 才会触发扩容
一般情况下，当哈希表的载荷因子小于 $0.1$ 时会触发缩容操作
- 当 Redis 有子进程在做 RDB saving、AOF rewriting 或 module 相关的任务时，不会进行缩容操作

Tips

对 $b = 2^n$ 取余
1

a % b = a & (b - 1)

2. 数据结构

dict 的数据结构如下所示：

1
2
3
4
5
6
7


typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    int16_t pauserehash; /* If >0 rehashing is paused (<0 indicates coding error) */
} dict;

rehashidx 记录了当前 Rehash 的进度，小于该值的 bucket 均已被迁移至 ht[1]

dictType

其中字典类型 dictType 的结构如下，可以看出，用户可以传入各种自定义的操作函数以实现不同的字典类型。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16


typedef struct dictType {
    /* 自定义哈希值计算函数 */
    uint64_t (*hashFunction)(const void *key);
    /* 自定义键复制函数 */
    void *(*keyDup)(void *privdata, const void *key);
    /* 自定义值复制函数 */
    void *(*valDup)(void *privdata, const void *obj);
    /* 自定义键比较函数 */
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    /* 自定义键析构函数 */
    void (*keyDestructor)(void *privdata, void *key);
    /* 自定义值析构函数 */
    void (*valDestructor)(void *privdata, void *obj);
    /* 自定义 Rehash 控制函数 */
    int (*expandAllowed)(size_t moreMem, double usedRatio);
} dictType;

dictht

可以看到在 dict 中有两个哈希表 dictht，方便实现渐进式 Rehash，以免在字典中数据量较大时 Rehash 操作带来较大延时。

dictht 的结构如下：

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17


typedef struct dictht {
    dictEntry **table;
    unsigned long size; /* size of table */
    unsigned long sizemask; /* size - 1 */
    unsigned long used; /* total dictEntries in table */
} dictht;

typedef struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next;
} dictEntry;

Redis 哈希表使用单向链表解决哈希冲突，新加入的节点 dictEntry 置于 table 中相应 bucket 的头部。

在 dictht 中，table 的长度 size 始终为 2 的整数倍(在 _dictNextPower 函数中实现)，而 sizemask = size - 1，从而可以将取余操作转化为位运算
在 dictEntry 中，使用 union 来存储不同类型的值 v

Tips: 对 $b = 2^n$ 取余

1

a % b = a & (b - 1)

迭代器

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


/* If safe is set to 1 this is a safe iterator, that means, you can call
 * dictAdd, dictFind, and other functions against the dictionary even while
 * iterating. Otherwise it is a non safe iterator, and only dictNext()
 * should be called while iterating. */
typedef struct dictIterator {
    dict *d;
    long index;
    int table, safe;
    dictEntry *entry, *nextEntry;
    /* unsafe iterator fingerprint for misuse detection. */
    long long fingerprint;
} dictIterator;

在 dict 结构中的 pauserehash 表示的是当前正在运行的安全(safe=1)迭代器个数，其值不为 0 时，意味着此时不能进行 move bucket 操作。

3. 相关操作

3.1. 扩容

1

dictEntry *dictAddRaw(dict *d, void *key, dictEntry **existing);

在调用 dictAddRaw 尝试往哈希表中添加元素时，会先根据 key 及其哈希值调用 _dictKeyIndex 获取哈希表的 index。_dictKeyIndex 在查找 index 之前会先调用 _dictExpandIfNeeded 判断是否需要扩容。扩容需要同时满足以下条件：

dict_can_resize 为 $1$ 时，载荷因子大于 $1$；dict_can_resize 为 $0$ 时，载荷因子需大于 $5$
- 从 updateDictResizePolicy 可知当 Redis 有子进程在做 RDB saving、AOF rewriting 或 module 相关的任务时，dict_can_resize 为 0
- 当前的实现除了键空间以外的哈希表，仅需满足该条件即可扩容
实现键空间相关的哈希表考虑到 Rehash 时的内存使用量突增的情况，还需调用dictExpandAllowed判断是否可以扩容

如果正处于 Rehash 状态，新增的 dictEntry 会添加至 ht[1]。修改已存在 dictEntry 的值会就地修改。

PS: 从 redis/pull/7954 可以知道在该 PR 之前的 Rehash 操作后，哈希表的大小每次都是翻 $4$ 倍，而不是 $2$ 倍。

3.2. 缩容

1
2
3


/* Resize the table to the minimal size that contains all the elements,
 * but with the invariant of a USED/BUCKETS ratio near to <= 1 */
int dictResize(dict *d);

当 Redis 有子进程在做 RDB saving、AOF rewriting 或 module 相关的任务时，不会进行缩容操作
缩容后确保哈希表 size 大于等于 DICT_HT_INITIAL_SIZE (4)
缩容后确保载荷因子小于等于 $1$
从 htNeedsResize 可知，当哈希表的载荷因子小于 $0.1$ 时会触发缩容操作

在以下几种情况下会调用该函数：

Redis Server 在 databasesCron 中会调用 tryResizeHashTables 尝试对键空间相关哈希表进行缩容
哈希数据类型在调用 hashTypeDelete 删除一个元素后会判断是否需要缩容
集合数据类型在调用 setTypeRemove 删除一个元素后会判断是否需要缩容
有序集合数据类型在调用 zsetDel or zremrangeGenericCommand or zdiffAlgorithm2 删除元素后会判断是否需要缩容

3.3. Rehash

1

int dictRehash(dict *d, int n);

在每次调用 dictRehash 时，最多迁移 $n$ 个非空 bucket，最多迁移 $10 \cdot n$ 个空 bucket
以 bucket 为单位进行迁移，每次迁移完一个 bucket 会相应地增加 d->rehashidx 值
返回 $1$ 时表示 Rehash 操作还未完成，未处于 Rehash 状态或 Rehash 完成时返回 $0$

1
2
3


static void _dictRehashStep(dict *d) {
    if (d->pauserehash == 0) dictRehash(d,1);
}

在添加(dictAddRaw)、查找(dictFind, dictGetRandomKey, dictGetSomeKeys)、删除(dictGenericDelete) 时，会调用该函数执行渐进式 Rehash，表示如果此时无安全迭代器在遍历哈希表，则会调用 dictRehash(d,1) 迁移 $1$ 个非空 bucket (或 $10$ 个空 bucket)

1
2


/* Rehash for an amount of time between ms milliseconds and ms+1 milliseconds */
int dictRehashMilliseconds(dict *d, int ms);

在 databasesCron 中会调用该函数尝试对键空间相关的哈希表执行渐进式 Rehash，每次执行时间 ms 设置为 $1ms$，但由于在该函数的循环中是每次先调用 dictRehash(d,100) 再判断当前耗时是否超过 ms，因此这个执行时间限制只是一个软限制。

3.4. 访问

1
2
3


/* Return a random entry from the hash table. Useful to
 * implement randomized algorithms */
dictEntry *dictGetRandomKey(dict *d);

首先随机选择一个非空的 bucket，然后从该 bucket 中随机选择一个 dictEntry 作为返回值。可以看出，长度越短的 bucket 中的 dictEntry 被选中的概率越大。

1
2
3


/* This function samples the dictionary to return a few keys from random
 * locations. */
unsigned int dictGetSomeKeys(dict *d, dictEntry **des, unsigned int count);

函数返回时，在 des 中存储采样所得的 dictEntry 数组结果，不保证元素不会重复，des 所指的内存空闲需能容纳 count 个 dictEntry 指针
函数返回值为 des 中包含的 dictEntry 个数，其可能小于 count
比调用 count 次 dictGetRandomKey 高效

1
2
3


/* This is like dictGetRandomKey() from the POV of the API, but will do more
 * work to ensure a better distribution of the returned element. */
dictEntry *dictGetFairRandomKey(dict *d);

会先尝试调用 dictGetSomeKeys 获取 GETFAIR_NUM_ENTRIES 个 dictEntry，再随机挑选一个返回
调用 dictGetSomeKeys 返回值为 0 时，直接调用 dictGetRandomKey，返回其结果

1
2
3
4
5
6


/* dictScan() is used to iterate over the elements of a dictionary. */
unsigned long dictScan(dict *d,
                       unsigned long cursor,
                       dictScanFunction *fn,
                       dictScanBucketFunction *bucketfn,
                       void *privdata);

Scan 开始时，设置 cursor = 0 调用该函数
该函数每次迭代后，返回 cursor 用于下次调用
当返回的 cursor 等于 0 时，表示 Scan 结束
保证返回所有 dictEntry，但不保证不重复
如下所示，cursor 并不是按序递增，而是按位反转之后递增，利用 dictht->size 是 $2$ 的整数倍的条件，这保证了在 Rehash 时不会漏掉 bucket。
1 2 3

cursor = rev(cursor); cursor++; cursor = rev(cursor);

Table of Contents