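# Skip List

## How the Data Structure Works

### What is a skip list?

A skip list is a probabilistic data structure that maintains sorted linked lists on several levels to provide efficient search, insertion, and deletion. It sits between a balanced binary search tree and a plain linked list: it is simple to implement and gives expected O(log n) operations.

### Core concepts

1. **Levels**: a skip list is made of several levels; the top level is sparse and the bottom level (level 0) contains every element
2. **Nodes**: each node holds one forward pointer for every level it takes part in
3. **Index**: each higher level acts as an index over the level below it, so a search can skip ahead quickly
4. **Probabilistic balance**: node heights are chosen at random, which keeps the levels balanced in expectation

### How the operations work

1. **Search**: start at the top level and move right; when the next key would overshoot the target (or there is no next node), drop down one level and continue
2. **Insert**: pick a random level for the new node and splice it into every level from 0 up to that level
3. **Delete**: unlink the node from every level it appears on
4. **Balance**: no rotations or restructuring are needed; the random node heights keep the structure balanced in expectation
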
## Illustrated Example

```
Skip list example (4 levels, Level 0 to Level 3):

Level 3: 1 ---------------------------------------> 70
Level 2: 1 --------------> 10 --------------------> 70
Level 1: 1 --------------> 10 ---------> 40 ------> 70
Level 0: 1->2->3->4->5->6->7->8->9->10->11->...->40->...->70

Searching for 7:
- Level 3: start at 1; the next node (70) is larger than 7, so drop to Level 2
- Level 2: at 1; the next node (10) is larger than 7, so drop to Level 1
- Level 1: at 1; the next node (10) is larger than 7, so drop to Level 0
- Level 0: walk right from 1 until reaching 7
```

### Skip list node structure

```
SkipListNode {
    value: 10
    next[0] -> 11
    next[1] -> 40
    next[2] -> 70
}
```

### Choosing a node's level

```java
// Pick a random level: promote the node one more level with probability 0.5
int level = 0;
while (Math.random() < 0.5 && level < MAX_LEVEL) {
    level++;
}
```

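
To see what this loop produces, the sketch below draws many levels with a fixed seed and tallies them. The class name `LevelDistributionDemo`, the seed, and the sample count are illustrative assumptions, not part of the implementation in this document. With p = 0.5, roughly half of the nodes should stay at level 0, a quarter should stop at level 1, and so on, so a node should carry close to 2 forward pointers on average.

```java
import java.util.Random;

// Illustrative sketch (names and constants are assumptions): samples the
// level-selection loop many times and reports how the levels are distributed.
final class LevelDistributionDemo {
    static final int MAX_LEVEL = 16;
    static final double PROBABILITY = 0.5;

    // Same loop as above, with an explicit Random so runs are repeatable
    static int randomLevel(Random random) {
        int level = 0;
        while (level < MAX_LEVEL && random.nextDouble() < PROBABILITY) {
            level++;
        }
        return level;
    }

    // counts[i] = number of samples whose level was exactly i
    static int[] sampleLevels(int samples, long seed) {
        Random random = new Random(seed);
        int[] counts = new int[MAX_LEVEL + 1];
        for (int i = 0; i < samples; i++) {
            counts[randomLevel(random)]++;
        }
        return counts;
    }

    // A node of level L owns L + 1 forward pointers
    static double averagePointers(int[] counts) {
        long pointers = 0;
        long nodes = 0;
        for (int lvl = 0; lvl < counts.length; lvl++) {
            pointers += (long) counts[lvl] * (lvl + 1);
            nodes += counts[lvl];
        }
        return (double) pointers / nodes;
    }

    public static void main(String[] args) {
        int[] counts = sampleLevels(1_000_000, 42L);
        for (int lvl = 0; lvl <= 5; lvl++) {
            System.out.printf("level %d: %d nodes%n", lvl, counts[lvl]);
        }
        System.out.printf("average pointers per node: %.3f%n", averagePointers(counts));
    }
}
```

Each printed count should be about half of the one before it, which is the geometric thinning that makes the upper levels work as an index.
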
## Java Implementation

### Node class

```java
class SkipListNode<T extends Comparable<T>> {
    T value;
    SkipListNode<T>[] next; // next[i] is the successor on level i

    @SuppressWarnings("unchecked")
    public SkipListNode(T value, int level) {
        this.value = value;
        // A node of level L takes part in levels 0..L, so it needs L + 1 pointers
        this.next = new SkipListNode[level + 1];
    }
}
```

### Skip list implementation

```java
import java.util.Random;

public class SkipList<T extends Comparable<T>> {
    private static final int MAX_LEVEL = 16;
    private static final double PROBABILITY = 0.5;

    private final SkipListNode<T> header; // sentinel node; its value is never compared
    private int level;                    // highest level currently in use
    private int size;
    private final Random random;

    public SkipList() {
        this.header = new SkipListNode<>(null, MAX_LEVEL);
        this.level = 0;
        this.size = 0;
        this.random = new Random();
    }

    // Pick a random level in [0, MAX_LEVEL]
    private int randomLevel() {
        int lvl = 0;
        while (lvl < MAX_LEVEL && random.nextDouble() < PROBABILITY) {
            lvl++;
        }
        return lvl;
    }

    // Insert a value; duplicates are allowed
    @SuppressWarnings("unchecked")
    public void insert(T value) {
        SkipListNode<T>[] update = new SkipListNode[MAX_LEVEL + 1];
        SkipListNode<T> current = header;

        // Starting from the top level, record the predecessor of the new node on each level
        for (int i = level; i >= 0; i--) {
            while (current.next[i] != null &&
                   current.next[i].value.compareTo(value) < 0) {
                current = current.next[i];
            }
            update[i] = current;
        }

        // Choose the level of the new node
        int newLevel = randomLevel();

        // If the new node is taller than the list, the header precedes it on the new levels
        if (newLevel > level) {
            for (int i = level + 1; i <= newLevel; i++) {
                update[i] = header;
            }
            level = newLevel;
        }

        // Create the node and splice it into levels 0..newLevel
        SkipListNode<T> newNode = new SkipListNode<>(value, newLevel);
        for (int i = 0; i <= newLevel; i++) {
            newNode.next[i] = update[i].next[i];
            update[i].next[i] = newNode;
        }

        size++;
    }

    // Search
    public boolean contains(T value) {
        SkipListNode<T> current = header;

        for (int i = level; i >= 0; i--) {
            while (current.next[i] != null &&
                   current.next[i].value.compareTo(value) < 0) {
                current = current.next[i];
            }
        }

        current = current.next[0];
        return current != null && current.value.compareTo(value) == 0;
    }

    // Delete one occurrence of value, if present
    @SuppressWarnings("unchecked")
    public void delete(T value) {
        SkipListNode<T>[] update = new SkipListNode[MAX_LEVEL + 1];
        SkipListNode<T> current = header;

        // Find the node to delete and its predecessor on each level
        for (int i = level; i >= 0; i--) {
            while (current.next[i] != null &&
                   current.next[i].value.compareTo(value) < 0) {
                current = current.next[i];
            }
            update[i] = current;
        }

        current = current.next[0];
        if (current != null && current.value.compareTo(value) == 0) {
            // Unlink the node from every level it appears on
            for (int i = 0; i <= level; i++) {
                if (update[i].next[i] != current) {
                    break;
                }
                update[i].next[i] = current.next[i];
            }

            // Lower the list level while the top levels are empty
            while (level > 0 && header.next[level] == null) {
                level--;
            }

            size--;
        }
    }

    // Smallest value, or null if the list is empty
    public T getMin() {
        SkipListNode<T> current = header.next[0];
        return current != null ? current.value : null;
    }

    // Largest value, or null if the list is empty.
    // There is no tail pointer, so this descends from the top level in O(log n) expected time.
    public T getMax() {
        SkipListNode<T> current = header;
        for (int i = level; i >= 0; i--) {
            while (current.next[i] != null) {
                current = current.next[i];
            }
        }
        return current.value;
    }

    // Number of elements
    public int size() {
        return size;
    }

    // Whether the list is empty
    public boolean isEmpty() {
        return size == 0;
    }

    // Print the structure, one line per level
    public void printList() {
        for (int i = level; i >= 0; i--) {
            SkipListNode<T> node = header.next[i];
            System.out.print("Level " + i + ": ");
            while (node != null) {
                System.out.print(node.value + " ");
                node = node.next[i];
            }
            System.out.println();
        }
    }

    // Print all values in ascending order by walking level 0
    public void traverse() {
        SkipListNode<T> current = header.next[0];
        while (current != null) {
            System.out.print(current.value + " ");
            current = current.next[0];
        }
        System.out.println();
    }
}
```

### Extended implementation (with range queries)

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Skip list with range queries and neighbour lookups
public class EnhancedSkipList<T extends Comparable<T>> {
    private static final int MAX_LEVEL = 16;
    private static final double PROBABILITY = 0.5;

    private static class Node<E> {
        E value;
        Node<E>[] next;

        @SuppressWarnings("unchecked")
        Node(E value, int level) {
            this.value = value;
            this.next = new Node[level + 1];
        }
    }

    private final Node<T> header;
    private int level;
    private int size;
    private final Random random;

    public EnhancedSkipList() {
        this.header = new Node<>(null, MAX_LEVEL);
        this.level = 0;
        this.size = 0;
        this.random = new Random();
    }

    // Insert (same algorithm as SkipList.insert)
    @SuppressWarnings("unchecked")
    public void insert(T value) {
        Node<T>[] update = new Node[MAX_LEVEL + 1];
        Node<T> current = header;
        for (int i = level; i >= 0; i--) {
            while (current.next[i] != null &&
                   current.next[i].value.compareTo(value) < 0) {
                current = current.next[i];
            }
            update[i] = current;
        }

        int newLevel = 0;
        while (newLevel < MAX_LEVEL && random.nextDouble() < PROBABILITY) {
            newLevel++;
        }
        for (int i = level + 1; i <= newLevel; i++) {
            update[i] = header;
        }
        level = Math.max(level, newLevel);

        Node<T> node = new Node<>(value, newLevel);
        for (int i = 0; i <= newLevel; i++) {
            node.next[i] = update[i].next[i];
            update[i].next[i] = node;
        }
        size++;
    }

    // Last node whose value is strictly less than target (the header if there is none)
    private Node<T> lastLessThan(T target) {
        Node<T> current = header;
        for (int i = level; i >= 0; i--) {
            while (current.next[i] != null &&
                   current.next[i].value.compareTo(target) < 0) {
                current = current.next[i];
            }
        }
        return current;
    }

    // Search
    public boolean contains(T value) {
        Node<T> candidate = lastLessThan(value).next[0];
        return candidate != null && candidate.value.compareTo(value) == 0;
    }

    // Range query: all values in [start, end], in ascending order
    public List<T> rangeQuery(T start, T end) {
        List<T> result = new ArrayList<>();
        if (start.compareTo(end) > 0) {
            return result;
        }

        // Jump to the first value >= start, then walk level 0 until passing end
        Node<T> current = lastLessThan(start).next[0];
        while (current != null && current.value.compareTo(end) <= 0) {
            result.add(current.value);
            current = current.next[0];
        }
        return result;
    }

    // Predecessor: largest value strictly less than value, or null if there is none
    public T predecessor(T value) {
        Node<T> node = lastLessThan(value);
        return node == header ? null : node.value;
    }

    // Successor: smallest value strictly greater than value;
    // null if value is not in the list or is the maximum
    public T successor(T value) {
        if (!contains(value)) {
            return null;
        }

        Node<T> current = header;
        for (int i = level; i >= 0; i--) {
            while (current.next[i] != null &&
                   current.next[i].value.compareTo(value) <= 0) {
                current = current.next[i];
            }
        }

        current = current.next[0];
        return current != null ? current.value : null;
    }

    // Count the elements strictly less than value.
    // This walks level 0, so it costs O(k) for k smaller elements. An O(log n) rank
    // query needs a span counter on every pointer, which this class does not maintain.
    public int countLessThan(T value) {
        int count = 0;
        Node<T> current = header.next[0];
        while (current != null && current.value.compareTo(value) < 0) {
            count++;
            current = current.next[0];
        }
        return count;
    }

    // Collect statistics
    public SkipListStats getStats() {
        SkipListStats stats = new SkipListStats();
        stats.size = size;
        stats.height = level + 1;

        // Count the nodes present on each level
        int[] levelCounts = new int[MAX_LEVEL + 1];
        Node<T> current = header.next[0];
        while (current != null) {
            for (int i = 0; i < current.next.length; i++) {
                levelCounts[i]++;
            }
            current = current.next[0];
        }

        stats.levelCounts = levelCounts;
        return stats;
    }

    public static class SkipListStats {
        public int size;
        public int height;
        public int[] levelCounts;
    }
}
```

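## Time Complexity Analysis

### Time complexity of operations

| Operation | Time complexity | Notes |
|-----------|-----------------|-------|
| Search | O(log n) expected | Visits about log n levels, with a constant expected number of steps on each |
| Insert | O(log n) expected | Locate the position, then splice the node in at a random level |
| Delete | O(log n) expected | Locate the node, then unlink it from each of its levels |
| Range query | O(log n + k) expected | k is the size of the result |
| Min / max | O(1) / O(log n) | The minimum is `header.next[0]`; `getMax()` as written descends from the top level, and becomes O(1) only if a tail pointer is maintained |

### Space complexity

- O(n) expected: with p = 0.5 each node carries about 2 forward pointers on average, so the list holds about 2n pointers in total
- The worst case for a single node is `MAX_LEVEL + 1` pointers, and the header always has that many

### Probabilistic analysis

1. **Expected height**: for n elements the list is expected to use about log2 n levels (with p = 0.5)
2. **Expected pointer count**: each node has 2 forward pointers in expectation
3. **Search cost**: O(log n) in expectation and with high probability; the O(n) worst case is possible but very unlikely
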
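
The figures above follow directly from the level-selection loop. A short derivation, assuming each promotion succeeds independently with probability $p$ and ignoring the `MAX_LEVEL` cap:

```latex
% A node has at least k pointers only if its first k-1 promotions all succeeded
P(\text{pointers} \ge k) = p^{\,k-1}, \qquad k \ge 1

% Summing the tail probabilities gives the expected pointer count per node
E[\text{pointers per node}] = \sum_{k=1}^{\infty} p^{\,k-1} = \frac{1}{1-p}
\qquad\Longrightarrow\qquad \frac{1}{1-0.5} = 2

% Level i is expected to hold n p^i nodes, which falls to about one node at
E[\text{nodes on level } i] = n\,p^{\,i}
\qquad\Longrightarrow\qquad n\,p^{\,i} = 1 \;\text{ at }\; i = \log_{1/p} n
```

For p = 0.5 this gives 2 pointers per node, about 2n pointers in total, and roughly log2 n levels in use.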
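## Real-World Applications

### 1. Redis sorted sets

- **zset implementation**: Redis sorted sets are backed by a skip list, paired with a hash table for member lookup (small sets use a more compact encoding)
- **Range queries**: efficient queries over a score interval
- **Ordering by score**: members are kept sorted by their score

```java
import java.util.*;
import java.util.concurrent.ConcurrentSkipListMap;

// Toy model of a Redis sorted set. The JDK's ConcurrentSkipListMap stands in for the
// skip list so the example compiles on its own; a hash map gives O(1) member lookup.
public class RedisSortedSet {
    private final ConcurrentSkipListMap<Double, TreeSet<String>> byScore = new ConcurrentSkipListMap<>();
    private final Map<String, Double> scores = new HashMap<>();

    public void add(double score, String member) {
        Double old = scores.put(member, score);
        if (old != null) {
            byScore.get(old).remove(member); // drop the entry filed under the previous score
        }
        byScore.computeIfAbsent(score, s -> new TreeSet<>()).add(member);
    }

    public List<String> rangeByScore(double min, double max) {
        List<String> result = new ArrayList<>();
        if (min <= max) {
            byScore.subMap(min, true, max, true).values().forEach(result::addAll);
        }
        return result;
    }

    public boolean contains(String member) {
        return scores.containsKey(member);
    }
}
```
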
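### 2. Database indexes

- **In-memory indexes**: used as the index structure of in-memory databases
- **Range queries**: efficient range lookups
- **Inserts**: no node splitting or rebalancing, so the insert path is simpler to implement than a B-tree's

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// In-memory index sketch; K is the key type and R the row type.
// The JDK's ConcurrentSkipListMap is used as the key-value skip list.
public class DatabaseIndex<K extends Comparable<K>, R> {
    private final ConcurrentSkipListMap<K, R> index = new ConcurrentSkipListMap<>();

    public void insert(K key, R row) {
        index.put(key, row);
    }

    // Rows whose keys fall in [start, end], in key order
    public List<R> rangeQuery(K start, K end) {
        if (start.compareTo(end) > 0) {
            return new ArrayList<>();
        }
        return new ArrayList<>(index.subMap(start, true, end, true).values());
    }

    public R find(K key) {
        return index.get(key);
    }
}
```
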
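### 3. Network routing

- **Routing tables**: lookups over IP address ranges
- **ACLs**: matching against access control lists

```java
import java.util.*;
import java.util.concurrent.ConcurrentSkipListMap;

// Routing table sketch; keys are numeric IPv4 values because dotted strings do not sort numerically
public class RoutingTable<R> {
    private final ConcurrentSkipListMap<Long, R> routes = new ConcurrentSkipListMap<>();

    static long toLong(String ip) {
        long value = 0;
        for (String octet : ip.split("\\.")) {
            value = (value << 8) | Integer.parseInt(octet);
        }
        return value;
    }

    public R findRoute(String ip) {
        return routes.get(toLong(ip));
    }

    // prefix holds the first three octets, e.g. "192.168.1"
    public List<R> findRoutesInSubnet(String prefix) {
        long low = toLong(prefix + ".0");
        long high = toLong(prefix + ".255");
        return new ArrayList<>(routes.subMap(low, true, high, true).values());
    }
}
```
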
### 4. Cache systems

- **Multi-level caches**: building a layered cache
- **Cache lookup**: locating a cache entry quickly

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Two-level cache sketch: look in L1 first, fall back to L2 and promote hits into L1
public class MultiLevelCache {
    private final ConcurrentSkipListMap<String, Object> l1Cache = new ConcurrentSkipListMap<>();
    private final ConcurrentSkipListMap<String, Object> l2Cache = new ConcurrentSkipListMap<>();

    public void put(String key, Object value) {
        l2Cache.put(key, value);
    }

    public Object get(String key) {
        Object value = l1Cache.get(key);
        if (value != null) {
            return value;
        }
        value = l2Cache.get(key);
        if (value != null) {
            l1Cache.put(key, value); // promote to L1
        }
        return value;
    }
}
```

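## Comparison with Other Data Structures

| Data structure | Search | Insert | Delete | Typical use |
|----------------|--------|--------|--------|-------------|
| Skip list | O(log n) expected | O(log n) expected | O(log n) expected | In-memory data, range queries |
| Balanced binary tree | O(log n) | O(log n) | O(log n) | In-memory data, lookup-heavy workloads |
| Hash table | O(1) average | O(1) average | O(1) average | Exact-match lookups, in-memory data |
| B-tree | O(log n) | O(log n) | O(log n) | Disk storage, indexes |
| Sorted array | O(log n) | O(n) | O(n) | Small, mostly static sorted data |

### Advantages of skip lists

1. **Simple to implement**: simpler than a balanced binary tree
2. **Concurrency friendly**: an update touches only a few neighbouring pointers, so some operations can run concurrently
3. **Tunable memory use**: the promotion probability controls the pointer count (about 2 per node at p = 0.5, about 1.33 at p = 0.25)
4. **Probabilistic balance**: no rotation logic is needed
5. **Range queries**: the bottom-level linked list supports range scans naturally

### Disadvantages of skip lists

1. **Memory use**: more memory than a plain linked list
2. **Worst case**: as a probabilistic structure, operations can degrade to O(n), although this is very unlikely
3. **Constant factor**: usually a larger constant factor than a balanced tree
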
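## Common Interview Questions

### Q1: How does a skip list differ from a balanced binary tree?

**Answer**:

**Main differences**:

1. **Implementation complexity**: a skip list is simple to implement; a balanced tree needs rotation logic
2. **Memory use**: a skip list node stores a variable number of forward pointers (2 on average at p = 0.5, with tall nodes costing more), while a tree node has a fixed, compact layout
3. **Concurrency**: some skip list operations are easier to run concurrently
4. **Balancing**: a skip list is balanced probabilistically, a balanced tree deterministically
5. **Search performance**: balanced trees usually have the better constant factor; skip lists are slightly slower

### Q2: How is the maximum level chosen, and why?

**Answer**:

Choosing the maximum level:

1. **Rule of thumb**: usually 16 to 32, which is enough for most workloads
2. **Probabilistic bound**: for n elements the expected height is about log n (base 1/p), so the maximum level should cover the largest n expected
3. **Space**: the maximum level bounds the worst-case memory use per node
4. **Trade-off**: too high wastes memory, too low crowds the top level and slows searches
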
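
One way to turn these points into a number is to pick the smallest level count whose top level is expected to hold about one node, which is roughly the logarithm of the expected element count in base 1/p. The helper below is a sketch under that assumption. `MaxLevelCalculator`, `levelsFor`, and the integer `fanout` parameter (1/p, so 2 for p = 0.5 and 4 for p = 0.25) are hypothetical names introduced for illustration.

```java
// Illustrative helper (hypothetical names): smallest number of levels L such that
// fanout^L >= expectedElements, where fanout = 1/p is the average gap between
// promoted nodes (2 for p = 0.5, 4 for p = 0.25).
final class MaxLevelCalculator {
    static int levelsFor(long expectedElements, int fanout) {
        if (expectedElements < 1 || fanout < 2) {
            throw new IllegalArgumentException("need expectedElements >= 1 and fanout >= 2");
        }
        int levels = 0;
        long capacity = 1; // element count that `levels` levels index comfortably
        // Integer arithmetic avoids floating-point rounding in log(n) / log(1/p);
        // the second condition stops before `capacity` could overflow
        while (capacity < expectedElements && capacity <= Long.MAX_VALUE / fanout) {
            capacity *= fanout;
            levels++;
        }
        return Math.max(levels, 1);
    }

    public static void main(String[] args) {
        System.out.println(levelsFor(65_536, 2));    // 16
        System.out.println(levelsFor(1_000_000, 2)); // 20
        System.out.println(levelsFor(1_000_000, 4)); // 10
    }
}
```

By this estimate, 16 levels with p = 0.5 index about 65,536 elements comfortably. A larger list still works, but its top level becomes crowded and searches slow down gradually.
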
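### Q3: How is the time complexity of a skip list justified?

**Answer**:

Outline of the analysis:

1. **Search path**: a search visits each level once, from the top down, and the distance covered per step shrinks as it descends
2. **Geometric series**: the expected number of nodes per level decreases geometrically (n, n/2, n/4, ... for p = 0.5)
3. **Expected node height**: each node appears on 2 levels in expectation
4. **Total steps**: about log n levels times a constant expected number of steps per level gives an expected search cost of O(log n)
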
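
The argument can also be checked empirically. The sketch below is a stripped-down integer skip list (a hypothetical `SearchStepDemo` class, separate from the `SkipList` class in this document, with a fixed seed chosen only for repeatability) that counts how many moves right and moves down a search makes. If the analysis holds, the average count should grow roughly with log2 n and not with n.

```java
import java.util.Random;

// Illustrative sketch: minimal int skip list that counts the steps of a search.
final class SearchStepDemo {
    static final int MAX_LEVEL = 24;

    static final class Node {
        final int value;
        final Node[] next;

        Node(int value, int level) {
            this.value = value;
            this.next = new Node[level + 1];
        }
    }

    final Node header = new Node(Integer.MIN_VALUE, MAX_LEVEL); // sentinel, never compared
    final Random random;
    int level = 0;

    SearchStepDemo(long seed) {
        this.random = new Random(seed);
    }

    void insert(int value) {
        Node[] update = new Node[MAX_LEVEL + 1];
        Node current = header;
        for (int i = level; i >= 0; i--) {
            while (current.next[i] != null && current.next[i].value < value) {
                current = current.next[i];
            }
            update[i] = current;
        }
        int newLevel = 0;
        while (newLevel < MAX_LEVEL && random.nextBoolean()) { // p = 0.5
            newLevel++;
        }
        for (int i = level + 1; i <= newLevel; i++) {
            update[i] = header;
        }
        level = Math.max(level, newLevel);
        Node node = new Node(value, newLevel);
        for (int i = 0; i <= newLevel; i++) {
            node.next[i] = update[i].next[i];
            update[i].next[i] = node;
        }
    }

    // Number of steps (moves right plus level visits) needed to locate value,
    // or -1 if the value is absent
    int searchSteps(int value) {
        int steps = 0;
        Node current = header;
        for (int i = level; i >= 0; i--) {
            while (current.next[i] != null && current.next[i].value < value) {
                current = current.next[i];
                steps++; // move right
            }
            steps++; // done with this level
        }
        Node candidate = current.next[0];
        return (candidate != null && candidate.value == value) ? steps : -1;
    }

    // Average search cost over all keys 0..n-1
    static double averageSteps(int n, long seed) {
        SearchStepDemo list = new SearchStepDemo(seed);
        for (int v = 0; v < n; v++) {
            list.insert(v);
        }
        long total = 0;
        for (int v = 0; v < n; v++) {
            total += list.searchSteps(v);
        }
        return (double) total / n;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 10_000, 100_000}) {
            System.out.printf("n=%d  average steps=%.1f%n", n, averageSteps(n, 7L));
        }
    }
}
```

Growing n by a factor of 10 should add a roughly constant number of steps per search, which is the signature of logarithmic growth.
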
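### Q4: Why does Redis use a skip list for sorted sets?

**Answer**:

Reasons for choosing a skip list:

1. **Simple to implement**: easier to implement, debug, and extend than a balanced tree
2. **Memory use**: tunable through the promotion probability (Redis uses p = 0.25, about 1.33 pointers per node)
3. **Concurrency**: skip lists are easier to make concurrent than balanced trees, although Redis executes commands on a single thread, so this matters less there
4. **Range queries**: range operations are supported naturally
5. **Good enough performance**: fast enough for the vast majority of workloads

### Q5: How does a skip list keep searches efficient?

**Answer**:

The key points:

1. **Level design**: higher levels act as indexes over lower levels
2. **Probabilistic balance**: random node heights keep the levels balanced in expectation
3. **Fast positioning**: the upper levels narrow the search down to the right neighbourhood quickly
4. **Height control**: each node appears only on a suitable number of levels

### Q6: How can the memory use of a skip list be reduced?

**Answer**:

Memory optimization strategies:

1. **Dynamic levels**: adjust the maximum level to the actual data size
2. **Level compaction**: merge levels that have become nearly empty
3. **Node reuse**: recycle nodes that are no longer in use
4. **Cache friendliness**: optimize the memory layout to improve cache hit rates
5. **Lazy deletion**: defer deletions to reduce memory fragmentation
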
### Q7: How is a skip list handled in a concurrent environment?

**Answer**:

Approaches to concurrency:

1. **Fine-grained locking**: use separate locks for separate parts of the structure (for example per level or per node) instead of one global lock
2. **Lock-free design**: implement a lock-free skip list with CAS operations
3. **Versioning**: use version numbers for optimistic concurrency control
4. **Segment locking**: split the skip list into segments and lock each one independently
5. **Lock-free reads**: readers take no lock; only writers lock

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified thread-safe skip list: one read-write lock guards the whole structure.
// Hashing values onto separate locks is not safe here, because two inserts with
// different values can still rewrite the same neighbouring pointers.
public class ConcurrentSkipList<T extends Comparable<T>> {
    private final SkipList<T> list = new SkipList<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public void insert(T value) {
        lock.writeLock().lock();
        try {
            list.insert(value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public boolean contains(T value) {
        lock.readLock().lock();
        try {
            return list.contains(value);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

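
In Java, a concurrent skip list is also available off the shelf: `java.util.concurrent.ConcurrentSkipListSet` and `ConcurrentSkipListMap` are skip lists in the JDK that are safe for concurrent use without external locking. The sketch below, with a hypothetical `ConcurrentInsertDemo` class and arbitrary thread and element counts, has several threads insert disjoint ranges and then checks that no insert was lost.

```java
import java.util.concurrent.ConcurrentSkipListSet;

// Illustrative sketch (hypothetical class): several threads insert disjoint ranges
// into the JDK's concurrent skip list without any external locking.
final class ConcurrentInsertDemo {
    static int insertConcurrently(int threads, int perThread) {
        ConcurrentSkipListSet<Integer> set = new ConcurrentSkipListSet<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread; // each thread owns its own value range
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    set.add(base + i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(insertConcurrently(4, 10_000)); // 40000 if no insert was lost
    }
}
```

For most application code this is a safer starting point than a hand-written concurrent skip list, which is hard to get right.
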
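### Q8: How does skip list performance change with data size?

**Answer**:

How performance scales:

1. **Time complexity**: stays at O(log n)
2. **Space complexity**: grows linearly with the data size
3. **Constant factor**: the larger the data set, the less the constant factor matters
4. **Cache effects**: cache hit rates drop as the data set grows
5. **Memory pressure**: at large data sizes memory use becomes the bottleneck

### Q9: How are duplicate values handled in a skip list?

**Answer**:

Ways to handle duplicates:

1. **Allow duplicates**: let the insert logic store equal values side by side
2. **Reject duplicates**: check on insert whether the value already exists
3. **Multi-value nodes**: store a collection of values in each node
4. **Counters**: add an occurrence counter to each node
5. **Choose by need**: pick the approach that matches the business requirements
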
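
As one concrete option, the counter strategy (item 4) can be sketched on top of the JDK's `java.util.concurrent.ConcurrentSkipListMap`, which is itself a skip list. The `CountingSkipList` class and its method names are hypothetical. The same idea applies to a hand-written skip list by adding a `count` field to each node.

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Illustrative sketch (hypothetical class): a sorted multiset that stores each
// distinct value once, together with an occurrence counter.
// The size field is a plain int, so this wrapper is meant for single-threaded use.
final class CountingSkipList<T extends Comparable<T>> {
    private final ConcurrentSkipListMap<T, Integer> counts = new ConcurrentSkipListMap<>();
    private int size = 0; // total occurrences, duplicates included

    void insert(T value) {
        counts.merge(value, 1, Integer::sum); // start at 1 or increment
        size++;
    }

    // Removes one occurrence; returns false if the value was not present
    boolean delete(T value) {
        if (!counts.containsKey(value)) {
            return false;
        }
        // Returning null from the remapping function removes the entry
        counts.computeIfPresent(value, (k, c) -> c == 1 ? null : c - 1);
        size--;
        return true;
    }

    int count(T value) {
        return counts.getOrDefault(value, 0);
    }

    int size() {
        return size;
    }

    int distinct() {
        return counts.size();
    }

    public static void main(String[] args) {
        CountingSkipList<Integer> list = new CountingSkipList<>();
        list.insert(5);
        list.insert(5);
        list.insert(3);
        System.out.println(list.count(5)); // 2
        list.delete(5);
        System.out.println(list.count(5)); // 1
    }
}
```

Compared with storing equal values side by side, the counter keeps the list short when the same value repeats often, at the cost of not being able to tell the occurrences apart.
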
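### Q10: What do skip lists and B-trees have in common?

**Answer**:

Similarities:

1. **Layered structure**: both are multi-level structures in which the upper levels act as an index
2. **Search cost**: both search in O(log n) time
3. **Range queries**: both support efficient range queries
4. **Balance**: both keep search paths short, a B-tree deterministically and a skip list probabilistically
5. **Locality**: both limit how many nodes a search touches, though a B-tree's packed nodes give it much better cache and disk locality than a skip list's scattered nodes

**Main differences**:

- A skip list is a probabilistic structure built on linked lists
- A B-tree is a deterministic structure built on array blocks
- Skip lists suit in-memory data; B-trees suit disk storage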