interview/questions/08-算法与数据结构/跳表原理.md

# 跳表 (Skip List)

## 数据结构原理

### 什么是跳表？
跳表是一种概率性的数据结构，通过在多个层级上维护有序的链表来提供高效的查找、插入和删除操作。它是一种在平衡二叉搜索树和链表之间的折中方案，实现简单且性能优异。

### 跳表的核心概念

1. **层级**：跳表由多个层级组成，最高层是稀疏的，最低层是稠密的
2. **节点**：每个节点在不同层级中有多个指针
3. **索引**：高层级作为低层级的索引，快速定位
4. **概率性平衡**：通过随机算法保证树的平衡性

### 跳表的工作原理

1. **查找**：从最高层开始，向右查找，无法向右时向下继续
2. **插入**：随机决定插入的层级，在相应层级插入节点
3. **删除**：在所有层级中删除对应节点
4. **平衡**：通过随机概率保持树的平衡性

## 图解说明

```
跳表结构示例（最大层级 4）：

Level 3:  ---1---10---40---70---
Level 2:  ---1-----10-----70---
Level 1:  ---1------10-----70---
Level 0:  1->2->3->4->5->6->7->8->9->10->11->...->70

查找过程（查找 7）：
- Level 3: 从 1 开始，无法向右，向下到 Level 2
- Level 2: 从 1 开始，无法向右，向下到 Level 1
- Level 1: 从 1 开始，无法向右，向下到 Level 0
- Level 0: 从 1 开始，遍历到 7
```

### 跂表节点结构

```
SkipListNode {
    value: 7
    next[0] -> 8
    next[1] -> 10
    next[2] -> 10
    next[3] -> 40
}
```

### 层级选择算法

```java
// 随机生成层级
int level = 0;
while (random() < 0.5 && level < MAX_LEVEL) {
    level++;
}
```

## Java 代码实现

### 节点类定义

```java
class SkipListNode<T extends Comparable<T>> {
    T value;
    SkipListNode<T>[] next;

    @SuppressWarnings("unchecked")
    public SkipListNode(T value, int level) {
        this.value = value;
        this.next = new SkipListNode[level + 1];
    }
}
```

### 跳表实现

```java
import java.util.Random;

public class SkipList<T extends Comparable<T>> {
    private static final int MAX_LEVEL = 16;
    private static final double PROBABILITY = 0.5;

    private SkipListNode<T> header;
    private int level;
    private int size;
    private Random random;

    @SuppressWarnings("unchecked")
    public SkipList() {
        this.header = new SkipListNode<>(null, MAX_LEVEL);
        this.level = 0;
        this.size = 0;
        this.random = new Random();
    }

    // 随机生成层级
    private int randomLevel() {
        int lvl = 0;
        while (lvl < MAX_LEVEL && random.nextDouble() < PROBABILITY) {
            lvl++;
        }
        return lvl;
    }

    // 插入操作
    public void insert(T value) {
        SkipListNode<T>[] update = new SkipListNode[MAX_LEVEL + 1];
        SkipListNode<T> current = header;

        // 从最高层开始查找插入位置
        for (int i = level; i >= 0; i--) {
            while (current.next[i] != null &&
                   current.next[i].value.compareTo(value) < 0) {
                current = current.next[i];
            }
            update[i] = current;
        }

        // 确定新节点的层级
        int newLevel = randomLevel();

        // 如果新层级大于当前层级，更新高层级的指针
        if (newLevel > level) {
            for (int i = level + 1; i <= newLevel; i++) {
                update[i] = header;
            }
            level = newLevel;
        }

        // 创建新节点并插入
        SkipListNode<T> newNode = new SkipListNode<>(value, newLevel);
        for (int i = 0; i <= newLevel; i++) {
            newNode.next[i] = update[i].next[i];
            update[i].next[i] = newNode;
        }

        size++;
    }

    // 查找操作
    public boolean contains(T value) {
        SkipListNode<T> current = header;

        for (int i = level; i >= 0; i--) {
            while (current.next[i] != null &&
                   current.next[i].value.compareTo(value) < 0) {
                current = current.next[i];
            }
        }

        current = current.next[0];
        return current != null && current.value.compareTo(value) == 0;
    }

    // 删除操作
    public void delete(T value) {
        SkipListNode<T>[] update = new SkipListNode[MAX_LEVEL + 1];
        SkipListNode<T> current = header;

        // 查找要删除的节点
        for (int i = level; i >= 0; i--) {
            while (current.next[i] != null &&
                   current.next[i].value.compareTo(value) < 0) {
                current = current.next[i];
            }
            update[i] = current;
        }

        current = current.next[0];
        if (current != null && current.value.compareTo(value) == 0) {
            // 从所有层级中删除
            for (int i = 0; i <= level; i++) {
                if (update[i].next[i] != current) {
                    break;
                }
                update[i].next[i] = current.next[i];
            }

            // 更新当前层级
            while (level > 0 && header.next[level] == null) {
                level--;
            }

            size--;
        }
    }

    // 获取最小值
    public T getMin() {
        SkipListNode<T> current = header.next[0];
        return current != null ? current.value : null;
    }

    // 获取最大值
    public T getMax() {
        SkipListNode<T> current = header;
        for (int i = level; i >= 0; i--) {
            while (current.next[i] != null) {
                current = current.next[i];
            }
        }
        return current.value;
    }

    // 跳表大小
    public int size() {
        return size;
    }

    // 是否为空
    public boolean isEmpty() {
        return size == 0;
    }

    // 打印跳表结构
    public void printList() {
        for (int i = level; i >= 0; i--) {
            SkipListNode<T> node = header.next[i];
            System.out.print("Level " + i + ": ");
            while (node != null) {
                System.out.print(node.value + " ");
                node = node.next[i];
            }
            System.out.println();
        }
    }

    // 中序遍历
    public void traverse() {
        SkipListNode<T> current = header.next[0];
        while (current != null) {
            System.out.print(current.value + " ");
            current = current.next[0];
        }
        System.out.println();
    }
}
```

### 完整的实现（包括范围查询）

```java
// 跳表完整实现
public class EnhancedSkipList<T extends Comparable<T>> {
    private static final int MAX_LEVEL = 16;
    private static final double PROBABILITY = 0.5;

    private static class Node<T> {
        T value;
        Node<T>[] next;

        @SuppressWarnings("unchecked")
        public Node(T value, int level) {
            this.value = value;
            this.next = new Node[level + 1];
        }
    }

    private Node<T> header;
    private int level;
    private int size;
    private Random random;

    public EnhancedSkipList() {
        this.header = new Node<>(null, MAX_LEVEL);
        this.level = 0;
        this.size = 0;
        this.random = new Random();
    }

    // 范围查询
    public List<T> rangeQuery(T start, T end) {
        List<T> result = new ArrayList<>();
        if (start.compareTo(end) > 0) {
            return result;
        }

        Node<T> current = header;

        // 找到 start 的位置
        for (int i = level; i >= 0; i--) {
            while (current.next[i] != null &&
                   current.next[i].value.compareTo(start) < 0) {
                current = current.next[i];
            }
        }

        current = current.next[0];
        while (current != null && current.value.compareTo(end) <= 0) {
            result.add(current.value);
            current = current.next[0];
        }

        return result;
    }

    // 获取前驱节点
    public T predecessor(T value) {
        Node<T> current = header;
        T predecessor = null;

        for (int i = level; i >= 0; i--) {
            while (current.next[i] != null &&
                   current.next[i].value.compareTo(value) < 0) {
                current = current.next[i];
                if (i == 0) {
                    predecessor = current.value;
                }
            }
        }

        return predecessor;
    }

    // 获取后继节点
    public T successor(T value) {
        if (!contains(value)) {
            return null;
        }

        Node<T> current = header;

        for (int i = level; i >= 0; i--) {
            while (current.next[i] != null &&
                   current.next[i].value.compareTo(value) <= 0) {
                current = current.next[i];
            }
        }

        current = current.next[0];
        return current != null ? current.value : null;
    }

    // 统计小于某值的元素个数
    public int countLessThan(T value) {
        Node<T> current = header;
        int count = 0;

        for (int i = level; i >= 0; i--) {
            while (current.next[i] != null &&
                   current.next[i].value.compareTo(value) < 0) {
                count += Math.pow(2, i); // 近似计算
                current = current.next[i];
            }
        }

        return count;
    }

    // 获取统计信息
    public SkipListStats getStats() {
        SkipListStats stats = new SkipListStats();
        stats.size = size;
        stats.height = level + 1;

        // 计算每个层级的节点数
        int[] levelCounts = new int[MAX_LEVEL + 1];
        Node<T> current = header;

        for (int i = 0; i <= level; i++) {
            levelCounts[i] = 0;
        }

        current = header.next[0];
        while (current != null) {
            for (int i = 0; i <= level && i <= current.next.length - 1; i++) {
                levelCounts[i]++;
            }
            current = current.next[0];
        }

        stats.levelCounts = levelCounts;
        return stats;
    }

    public static class SkipListStats {
        public int size;
        public int height;
        public int[] levelCounts;
    }
}
```

## 时间复杂度分析

### 操作时间复杂度

| 操作 | 时间复杂度 | 说明 |
|------|------------|------|
| 查找 | O(log n) | 最多遍历 log n 层 |
| 插入 | O(log n) | 查找位置 + 随机决定层级 |
| 删除 | O(log n) | 查找节点 + 从所有层级删除 |
| 范围查询 | O(log n + k) | k 是结果集大小 |
| 最值查找 | O(1) | 直接访问首尾节点 |

### 空间复杂度

- O(n log n) - 每个节点平均存在 log n 层
- 需要额外空间维护多层级指针

### 概率分析

1. **期望层级**：对于 n 个元素，期望层级为 log n
2. **期望指针数量**：每个节点期望有 2 个指针
3. **查找效率**：O(log n) 概率保证

## 实际应用场景

### 1. Redis 有序集合
- **zset 实现**：Redis 的有序集合使用跳表实现
- **范围查询**：支持高效的区间查询
- **分数排序**：基于分数进行排序

```java
// Redis 有序集合模拟
public class RedisSortedSet {
    private EnhancedSkipList<Double, String> skipList;

    public void add(double score, String member) {
        skipList.insert(score, member);
    }

    public List<String> rangeByScore(double min, double max) {
        return skipList.rangeQuery(min, max);
    }

    public boolean contains(String member) {
        return skipList.contains(member);
    }
}
```

### 2. 数据库索引
- **内存索引**：用于内存数据库的索引
- **范围查询**：支持高效的范围查找
- **插入性能**：比 B 树实现简单

```java
// 数据库索引实现
public class DatabaseIndex {
    private EnhancedSkipList<Object, Row> index;

    public void insert(Object key, Row row) {
        index.insert(key, row);
    }

    public List<Row> rangeQuery(Object start, Object end) {
        return index.rangeQuery(start, end);
    }

    public Row find(Object key) {
        return index.find(key);
    }
}
```

### 3. 网络路由
- **路由表**：IP 地址范围查找
- **ACL 控制**：访问控制列表匹配

```java
// 路由表实现
public class RoutingTable {
    private EnhancedSkipList<String, Route> routes;

    public Route findRoute(String ip) {
        return routes.find(ip);
    }

    public List<Route> findRoutesInSubnet(String subnet) {
        return routes.rangeQuery(subnet, subnet + "255");
    }
}
```

### 4. 缓存系统
- **多级缓存**：实现分层缓存
- **缓存查找**：快速定位缓存项

```java
// 多级缓存实现
public class MultiLevelCache {
    private EnhancedSkipList<String, Object> l1Cache;
    private EnhancedSkipList<String, Object> l2Cache;

    public Object get(String key) {
        Object value = l1Cache.find(key);
        if (value != null) {
            return value;
        }
        value = l2Cache.find(key);
        if (value != null) {
            l1Cache.insert(key, value);
        }
        return value;
    }
}
```

## 与其他数据结构的对比

| 数据结构 | 查找时间 | 插入时间 | 删除时间 | 适用场景 |
|----------|----------|----------|----------|----------|
| 跳表 | O(log n) | O(log n) | O(log n) | 内存数据、范围查询 |
| 平衡二叉树 | O(log n) | O(log n) | O(log n) | 内存数据、查找密集 |
| 哈希表 | O(1) | O(1) | O(1) | 精确查找、内存数据 |
| B 树 | O(log n) | O(log n) | O(log n) | 磁盘存储、索引 |
| 数组 | O(n) | O(n) | O(n) | 小规模、有序数据 |

### 跳表的优势

1. **实现简单**：相比平衡二叉树，实现更简单
2. **并发友好**：部分操作可以并发执行
3. **内存效率**：空间使用比平衡树更合理
4. **概率平衡**：不需要复杂的旋转操作
5. **支持范围查询**：链表结构天然支持范围操作

### 跳表的劣势

1. **内存使用**：比普通链表使用更多内存
2. **最坏情况**：概率性数据结构，最坏情况较差
3. **常数因子**：比平衡树的常数因子大

## 常见面试问题

### Q1: 跳表和平衡二叉树有什么区别？
**答**：
**主要区别**：
1. **实现复杂度**：跳表实现简单，平衡树需要复杂的旋转操作
2. **内存使用**：跳表使用更多内存（多层级指针），平衡树内存使用更紧凑
3. **并发性能**：跳表某些操作更容易并发执行
4. **平衡机制**：跳表是概率性平衡，平衡树是确定性平衡
5. **查找性能**：平衡树常数因子更好，跳表略差

### Q2: 跳表的最大层级如何确定？为什么？
**答**：
最大层级的确定：
1. **经验值**：通常设置为 16-32，足够处理大多数情况
2. **概率保证**：对于 n 个元素，期望层级为 log n
3. **空间考虑**：最大层级决定最坏情况的内存使用
4. **性能平衡**：太高浪费内存，太低影响性能

### Q3: 跳表的时间复杂度如何证明？
**答**：
时间复杂度分析：
1. **查找分析**：每层需要遍历的节点数逐渐减少
2. **几何级数**：每层节点数呈几何级数减少
3. **期望层数**：每个节点的期望层数为 2
4. **总查找步数**：期望查找步数为 O(log n)

### Q4: Redis 为什么选择跳表实现有序集合？
**答**：
选择跳表的原因：
1. **实现简单**：相比平衡树更容易实现
2. **内存效率**：相比平衡树内存使用更合理
3. **并发性能**：跳表某些操作可以并发执行
4. **范围查询**：天然支持范围查询操作
5. **性能足够**：对于大多数场景性能足够好

### Q5: 跳表如何保证查找效率？
**答**：
保证效率的关键：
1. **层级设计**：高层级作为低层级的索引
2. **概率平衡**：通过随机算法保证树的平衡性
3. **快速定位**：从高层级快速定位到大致位置
4. **层数控制**：每个节点只存在于适当数量的层级中

### Q6: 如何优化跳表的内存使用？
**答**：
内存优化策略：
1. **动态层级**：根据实际数据动态调整最大层级
2. **压缩层级**：合并过空的层级
3. **节点复用**：复用不再使用的节点
4. **缓存友好**：优化内存布局，提高缓存命中率
5. **惰性删除**：延迟删除，减少内存碎片

### Q7: 跳表在并发环境下如何处理？
**答**：
并发处理方案：
1. **细粒度锁**：对不同层使用不同的锁
2. **无锁设计**：使用 CAS 操作实现无锁跳表
3. **版本控制**：使用版本号实现乐观并发控制
4. **分段锁**：将跳表分段，每段独立加锁
5. **读无锁**：读操作不加锁，写操作加锁

```java
// 并发跳表简化实现
public class ConcurrentSkipList<T extends Comparable<T>> {
    private final ReentrantLock[] locks;
    private final int lockCount;

    public ConcurrentSkipList() {
        this.lockCount = 16;
        this.locks = new ReentrantLock[lockCount];
        for (int i = 0; i < lockCount; i++) {
            locks[i] = new ReentrantLock();
        }
    }

    private int getLockIndex(T value) {
        return Math.abs(value.hashCode() % lockCount);
    }

    public void insert(T value) {
        int lockIndex = getLockIndex(value);
        locks[lockIndex].lock();
        try {
            // 插入逻辑
        } finally {
            locks[lockIndex].unlock();
        }
    }
}
```

### Q8: 跳表的性能如何随数据量变化？
**答**：
性能变化规律：
1. **时间复杂度**：保持 O(log n) 不变
2. **空间复杂度**：随数据量线性增长
3. **常数因子**：数据量越大，常数因子影响越小
4. **缓存影响**：数据量较大时，缓存命中率下降
5. **内存压力**：大数据量时内存使用成为瓶颈

### Q9: 如何处理跳表中的重复数据？
**答**：
重复数据处理：
1. **允许重复**：修改插入逻辑，允许相同值存在
2. **去重处理**：在插入时检查是否已存在
3. **多值节点**：在节点中存储值的集合
4. **计数器**：在节点中增加计数器
5. **策略选择**：根据业务需求选择合适的处理方式

### Q10: 跳表和 B 树有什么相似之处？
**答**：
相似之处：
1. **分层结构**：都是多层结构，高层级作为索引
2. **查找效率**：都是 O(log n) 时间复杂度
3. **范围查询**：都支持高效的范围查询
4. **平衡性**：都维护数据的平衡性
5. **空间局部性**：都考虑数据的局部性原理

**主要区别**：
- 跳表是基于链表的概率结构
- B 树是基于数组块的确定性结构
- 跳表适用于内存，B 树适用于磁盘