
# Longest Consecutive Sequence

LeetCode 128. Medium

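## Problem Description

Given an unsorted integer array `nums`, return the length of the longest run of consecutive integers that can be formed from its elements. The elements do not need to be adjacent in `nums`.

**Requirement**: design an algorithm that runs in O(n) time.

**Example 1**:
```
Input: nums = [100, 4, 200, 1, 3, 2]
Output: 4
Explanation: the longest consecutive sequence is [1, 2, 3, 4], which has length 4.
```

**Example 2**:
```
Input: nums = [0, 3, 7, 2, 5, 8, 4, 6, 0, 1]
Output: 9
Explanation: the longest consecutive sequence is [0, 1, 2, 3, 4, 5, 6, 7, 8].
```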

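## Deriving the Approach

### Brute-Force Baseline

**The most intuitive idea**: sort the array, then scan it once to find the longest consecutive run.

```python
def longestConsecutive(nums):
    if not nums:
        return 0

    nums.sort()  # sorts in place, so the caller's list is modified
    max_len = 1
    current_len = 1

    for i in range(1, len(nums)):
        if nums[i] == nums[i-1] + 1:
            # the current run grows by one
            current_len += 1
        elif nums[i] != nums[i-1]:
            # a gap starts a new run; equal neighbours are duplicates and are skipped
            current_len = 1

        max_len = max(max_len, current_len)

    return max_len
```

**Time complexity**: O(n log n)
- Sorting: O(n log n)
- Scan: O(n)
- Total: O(n log n) + O(n) = O(n log n)

**Space complexity**: O(1) to O(n) extra, depending on the sorting algorithm

**Problems**:
1. It does not meet the requirement: the problem asks for O(n).
2. Sorting is the simplest route, but a comparison sort costs O(n log n), which is still too slow.
3. We need a solution that avoids sorting.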

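### Optimization, Step 1: Hash Set Lookups

**Observation**: in a consecutive sequence, neighbouring values differ by exactly 1.

**Question**: how can we quickly check whether a given number exists?

**Solution**: use a hash table (a set).

```python
num_set = set(nums)

for num in nums:
    # check whether num + 1 is in the set
    if num + 1 in num_set:
        # keep checking num + 2, num + 3, ...
        pass
```

**Why think this way?**
- A hash set lookup takes O(1) time on average
- We can quickly test whether a number exists
- No sorting is needed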

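### Optimization, Step 2: Find the Start of Each Sequence

**Key optimization**: how do we avoid recomputing the same sequence?

**Observation**: a sequence needs to be counted only from its starting number.

```python
# num starts a sequence because num - 1 is absent
if num - 1 not in num_set:
    # walk forward from num
    current_num = num
    current_len = 1

    while current_num + 1 in num_set:
        current_num += 1
        current_len += 1
```

**Why think this way?**
- If `num-1` exists, `num` is not the start of its sequence
- Only starting numbers trigger a count, which avoids repeated work
- Each sequence is counted exactly once

**Time complexity**: O(n)
- Outer loop: O(n)
- Inner `while`: O(n) in total (across all iterations of the outer loop, each element is reached by the `while` at most once)
- Total: O(n) + O(n) = O(n)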
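
The amortized argument above can be checked empirically. The following is a minimal sketch, not part of the original solution: `count_steps` is a hypothetical helper that runs the start-check approach over the de-duplicated set and counts how often each loop body executes.

```python
def count_steps(nums):
    """Return (longest, outer_iterations, inner_iterations) for the start-check approach."""
    num_set = set(nums)
    longest = outer = inner = 0
    for num in num_set:
        outer += 1
        if num - 1 not in num_set:      # num starts a run
            current = num
            while current + 1 in num_set:
                current += 1
                inner += 1              # each non-start element is reached here exactly once
            longest = max(longest, current - num + 1)
    return longest, outer, inner


print(count_steps([100, 4, 200, 1, 3, 2]))  # (4, 6, 3)
```

For the example input there are 6 distinct values and 3 runs, so the inner loop executes 6 - 3 = 3 times in total. The inner count can never exceed the number of distinct values.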

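### Optimization, Step 3: Trading Space for Time

**Trade-off**:
- Time complexity: O(n)
- Space complexity: O(n)
- Extra space buys the faster running time

**Why is this acceptable?**
- The problem requires O(n) time
- O(n) extra space is acceptable here
- The hash set's O(1) average lookups are what make the O(n) bound reachable without sorting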

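## Solution Approach

### Core Idea

**Hash set + sequence-start check**: store every number in a hash set, and count a sequence's length only from its starting number.

**Why think this way?**

1. **Advantages of the hash set**:
   - Membership checks take O(1) time on average
   - No sorting is needed, so the original data stays untouched

2. **Sequence-start check**:
   - If `num-1` is not in the set, `num` is a starting number
   - Only starting numbers trigger a count
   - This avoids recomputing the same sequence

3. **Time complexity guarantee**:
   - Each element is visited at most twice
   - Once by the outer loop
   - At most once by the inner `while` loop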

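### Detailed Algorithm

**Step 1: Build the hash set**

```python
num_set = set(nums)
```

**Purpose**:
- Check quickly whether an element exists
- O(1) average time per lookup

**Step 2: Iterate over all numbers**

```python
longest = 0

for num in num_set:
    # check whether num starts a sequence
    if num - 1 not in num_set:
        # count the sequence length from its start
        current_num = num
        current_len = 1

        # walk forward through consecutive numbers
        while current_num + 1 in num_set:
            current_num += 1
            current_len += 1

        # update the maximum length
        longest = max(longest, current_len)
```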
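
Putting the two steps together gives a complete function. This is a minimal sketch; the name `longest_consecutive` and the type hints are illustrative choices, not something the original text prescribes.

```python
from typing import List


def longest_consecutive(nums: List[int]) -> int:
    """Length of the longest run of consecutive integers in nums."""
    num_set = set(nums)          # step 1: build the hash set
    longest = 0

    for num in num_set:          # step 2: iterate over distinct values
        if num - 1 not in num_set:
            # num is the smallest value of its run, so count from here
            current_num = num
            current_len = 1
            while current_num + 1 in num_set:
                current_num += 1
                current_len += 1
            longest = max(longest, current_len)

    return longest


print(longest_consecutive([100, 4, 200, 1, 3, 2]))           # 4
print(longest_consecutive([0, 3, 7, 2, 5, 8, 4, 6, 0, 1]))   # 9
```

The two calls reproduce Example 1 and Example 2 from the problem description.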

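**Key points in detail**:

1. **Why check `num - 1 not in num_set`?**
   - If `num-1` exists, `num` is not a starting number
   - Only starting numbers trigger a count
   - This avoids repeated work

   **Example**:
   ```
   nums = [1, 2, 3, 4]

   num=1: 1-1=0 is not in the set → start, count [1,2,3,4]
   num=2: 2-1=1 is in the set → not a start, skip
   num=3: 3-1=2 is in the set → not a start, skip
   num=4: 4-1=3 is in the set → not a start, skip
   ```

2. **Why `while` rather than `for`?**
   - The length of the sequence is not known in advance
   - Each step has to test whether the next number exists
   - A `while` loop expresses "continue until the next number is missing" directly

3. **Why is O(n) guaranteed?**
   - Outer `for` loop: O(n)
   - Inner `while` loop: O(n) in total
     - Each element is reached by the `while` at most once
     - This holds because the `while` runs only from starting numbers, and each element belongs to exactly one sequence
   - Total: O(n) + O(n) = O(n)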

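### Key Details

**Detail 1: Why a `set` rather than a `list`?**

```python
# Recommended: use a set
num_set = set(nums)
if num - 1 in num_set:  # O(1) on average
    ...

# Not recommended: use a list
if num - 1 in nums:  # O(n)
    ...
```

**Reason**:
- A `set` lookup is O(1) on average
- A `list` lookup is O(n)
- With a `list`, the overall complexity becomes O(n²)

**Detail 2: Why iterate over `num_set` rather than `nums`?**

```python
# Recommended: iterate over num_set
for num in num_set:  # duplicates already removed
    ...

# Not recommended: iterate over nums
for num in nums:  # may contain duplicates
    ...
```

**Reason**:
- `nums` may contain duplicates
- A duplicated starting number makes the same sequence get counted again
- `num_set` removes duplicates automatically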
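
To see how much duplicates can cost, the sketch below counts inner-loop steps for both iteration choices on a deliberately bad input: many copies of a starting number followed by one long run. The helper `inner_steps` and the input shape are assumptions made for illustration.

```python
def inner_steps(values, num_set):
    """Count inner-loop steps when the outer loop iterates over `values`."""
    steps = 0
    for num in values:
        if num - 1 not in num_set:          # start check still applies
            current = num
            while current + 1 in num_set:
                current += 1
                steps += 1
    return steps


k = 1000
nums = [1] * k + list(range(2, k + 1))      # k copies of the start, then 2..k
num_set = set(nums)

print(inner_steps(nums, num_set))           # 999000
print(inner_steps(num_set, num_set))        # 999
```

Both variants return the same answer, but iterating over `nums` re-walks the whole run once per duplicate of its start, which is quadratic in the input size here.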

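**Detail 3: Why is the `longest` variable needed?**

```python
longest = 0

for num in num_set:
    current_len = 1  # placeholder for the length of the run starting at num
    longest = max(longest, current_len)
```

**Reason**:
- The global maximum has to be tracked across sequences
- It is updated each time a sequence has been counted
- `longest` is what the function finally returns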

### Edge Case Analysis

**Edge case 1: Empty array**

```
Input: nums = []
Output: 0
Handling:
  num_set = set()
  the for loop does not run
  longest = 0
```

**Edge case 2: Single element**

```
Input: nums = [1]
Output: 1
Trace:
  num_set = {1}

  num=1: 1-1=0 is not in the set → start
    current_num=1, current_len=1
    1+1=2 is not in the set → stop
    longest = max(0, 1) = 1

Output: 1
```

**Edge case 3: All duplicates**

```
Input: nums = [1, 1, 1, 1]
Output: 1
Trace:
  num_set = {1}

  num=1: 1-1=0 is not in the set → start
    current_num=1, current_len=1
    1+1=2 is not in the set → stop
    longest = 1

Output: 1
```

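**Edge case 4: Several consecutive sequences**

The trace visits elements in input order for readability. A real set may iterate in a different order, which does not change the result.

```
Input: nums = [100, 4, 200, 1, 3, 2]
Output: 4
Trace:
  num_set = {100, 4, 200, 1, 3, 2}

  num=100: 100-1=99 is not in the set → start
    current_num=100, current_len=1
    101 is not in the set → stop
    longest = 1

  num=4: 4-1=3 is in the set → not a start, skip

  num=200: 200-1=199 is not in the set → start
    current_num=200, current_len=1
    201 is not in the set → stop
    longest = max(1, 1) = 1

  num=1: 1-1=0 is not in the set → start
    current_num=1, current_len=1
    2 is in the set → current_len=2
    3 is in the set → current_len=3
    4 is in the set → current_len=4
    5 is not in the set → stop
    longest = max(1, 4) = 4

  num=3: 3-1=2 is in the set → not a start, skip

  num=2: 2-1=1 is in the set → not a start, skip

Output: 4
```

**Edge case 5: Negative numbers**

```
Input: nums = [-1, -2, 0, 1]
Output: 4
Trace:
  num_set = {-1, -2, 0, 1}

  num=-1: -1-1=-2 is in the set → not a start, skip

  num=-2: -2-1=-3 is not in the set → start
    current_num=-2, current_len=1
    -1 is in the set → current_len=2
    0 is in the set → current_len=3
    1 is in the set → current_len=4
    2 is not in the set → stop
    longest = 4

  num=0: 0-1=-1 is in the set → not a start, skip

  num=1: 1-1=0 is in the set → not a start, skip

Output: 4
```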
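
The five edge cases above translate directly into assertions. The sketch below bundles a compact version of the algorithm with those checks; the function name `longest_consecutive` is illustrative.

```python
def longest_consecutive(nums):
    """Compact version of the hash set + start check approach."""
    num_set = set(nums)
    longest = 0
    for num in num_set:
        if num - 1 not in num_set:
            end = num
            while end + 1 in num_set:
                end += 1
            longest = max(longest, end - num + 1)
    return longest


# (input, expected) pairs taken from the edge cases above
cases = [
    ([], 0),                          # empty array
    ([1], 1),                         # single element
    ([1, 1, 1, 1], 1),                # all duplicates
    ([100, 4, 200, 1, 3, 2], 4),      # several sequences
    ([-1, -2, 0, 1], 4),              # negative numbers
]

for nums, expected in cases:
    assert longest_consecutive(nums) == expected, (nums, expected)

print("all edge cases pass")
```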

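### Complexity Analysis (Detailed)

**Time complexity**:
```
- Building the hash set: O(n)
- Outer loop: O(n), visits every element
- Inner while: O(n) in total
  - Each element is reached by the while at most once
  - Because the while runs only from starting numbers
- Total: O(n) + O(n) + O(n) = O(n)

Why O(n)?
- The loops are nested, but each element is visited at most twice
- Once by the outer loop
- At most once by the inner while loop
- Total operations ≤ 2n = O(n)
```

**Space complexity**:
```
- Hash set: O(n), stores every distinct element
- Variables: O(1)
- Total: O(n)
```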

---

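## Step-by-Step Illustration

```
nums = [100, 4, 200, 1, 3, 2]

Build the hash set:
num_set = {100, 4, 200, 1, 3, 2}

Iterate:

Step 1: num = 100
       100-1=99 is not in the set → start
       sequence: [100]
       101 is not in the set → stop
       longest = 1

Step 2: num = 4
       4-1=3 is in the set → not a start, skip

Step 3: num = 200
       200-1=199 is not in the set → start
       sequence: [200]
       201 is not in the set → stop
       longest = 1

Step 4: num = 1
       1-1=0 is not in the set → start
       sequence: [1, 2, 3, 4]
       5 is not in the set → stop
       longest = 4

Step 5: num = 3
       3-1=2 is in the set → not a start, skip

Step 6: num = 2
       2-1=1 is in the set → not a start, skip

Result: longest = 4
```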

---

## Code Implementation

```go
func longestConsecutive(nums []int) int {
    // Build the hash set
    numSet := make(map[int]bool)
    for _, num := range nums {
        numSet[num] = true
    }

    longest := 0

    // Iterate over the distinct numbers
    for num := range numSet {
        // Check whether num starts a sequence
        if !numSet[num-1] {
            currentNum := num
            current := 1

            // Walk forward through consecutive numbers
            for numSet[currentNum+1] {
                currentNum++
                current++
            }

            // Update the maximum length
            if current > longest {
                longest = current
            }
        }
    }

    return longest
}
```

**Key points**:
1. A `map` serves as the set
2. Check whether `num-1` exists
3. Count the sequence length only from a starting number

---

## Execution Walkthrough

**Input**: nums = [100, 4, 200, 1, 3, 2]

Go does not specify map iteration order, so the order shown in step 2 is only one possibility. The result is the same for any order.

```
Initialize: numSet = {}, longest = 0

Step 1: build the hash set
numSet = {
  100: true,
  4: true,
  200: true,
  1: true,
  3: true,
  2: true
}

Step 2: iterate over the hash set

num=100:
  100-1=99 is not in numSet → start
  currentNum=100, current=1
  101 is not in numSet → stop
  longest = max(0, 1) = 1

num=4:
  4-1=3 is in numSet → not a start, skip

num=200:
  200-1=199 is not in numSet → start
  currentNum=200, current=1
  201 is not in numSet → stop
  longest = max(1, 1) = 1

num=1:
  1-1=0 is not in numSet → start
  currentNum=1, current=1
  2 is in numSet → currentNum=2, current=2
  3 is in numSet → currentNum=3, current=3
  4 is in numSet → currentNum=4, current=4
  5 is not in numSet → stop
  longest = max(1, 4) = 4

num=3:
  3-1=2 is in numSet → not a start, skip

num=2:
  2-1=1 is in numSet → not a start, skip

Result: longest = 4
```

---

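## Common Mistakes

### Mistake 1: Forgetting to de-duplicate

❌ **Incorrect code**:
```go
// Iterates over nums directly, which may contain duplicates
for _, num := range nums {
    // ...
}
```

✅ **Correct code**:
```go
// Build the set first; duplicates collapse automatically
numSet := make(map[int]bool)
for _, num := range nums {
    numSet[num] = true
}

for num := range numSet {
    // ...
}
```

**Reason**:
- `nums` may contain duplicates
- Every duplicate of a starting number walks the same sequence again
- The answer stays correct, but the worst-case time degrades to O(n²)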

---

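### Mistake 2: Skipping the sequence-start check

❌ **Incorrect code**:
```go
// Counts the sequence length from every number
for num := range numSet {
    currentNum := num
    current := 1
    for numSet[currentNum+1] {
        // ...
    }
}
```

✅ **Correct code**:
```go
// Counts the sequence length only from starting numbers
for num := range numSet {
    if !numSet[num-1] {  // check whether num is a start
        // ...
    }
}
```

**Reason**:
- Without the start check, every number re-walks the rest of its sequence
- The time complexity becomes O(n²) in the worst case
- Example: for [1,2,3,4] the walk runs 4 times (from 1, 2, 3, and 4) instead of once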
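
The quadratic blow-up is easy to measure by counting inner-loop steps. The sketch below deliberately omits the start check; `steps_without_start_check` is a hypothetical helper written for this comparison.

```python
def steps_without_start_check(nums):
    """Return (longest, inner_steps) when every number walks forward."""
    num_set = set(nums)
    longest = steps = 0
    for num in num_set:
        current = num                      # no check for num - 1 here
        while current + 1 in num_set:
            current += 1
            steps += 1
        longest = max(longest, current - num + 1)
    return longest, steps


print(steps_without_start_check([1, 2, 3, 4]))       # (4, 6)
print(steps_without_start_check(range(1, 1001)))     # (1000, 499500)
```

For a single run of length m the inner loop executes (m - 1) + (m - 2) + ... + 0 = m(m - 1)/2 times, compared with m - 1 times when the start check is in place.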

---

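### Mistake 3: Searching a list instead of a set

❌ **Incorrect code**:
```go
// Linear search through the slice (contains stands for any such helper)
if contains(nums, num-1) {  // O(n)
    // ...
}
```

✅ **Correct code**:
```go
// Lookup in the set
if numSet[num-1] {  // O(1)
    // ...
}
```

**Reason**:
- Searching a list (a slice in Go) is O(n)
- A set lookup is O(1) on average
- With a list, the overall complexity becomes O(n²)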

---

## Follow-up Questions

### Q1: What if the longest sequence itself must be returned?

```go
func longestConsecutiveSequence(nums []int) []int {
    numSet := make(map[int]bool)
    for _, num := range nums {
        numSet[num] = true
    }

    longest := 0
    start := 0  // start of the longest sequence found so far

    for num := range numSet {
        if !numSet[num-1] {
            currentNum := num
            current := 1

            for numSet[currentNum+1] {
                currentNum++
                current++
            }

            if current > longest {
                longest = current
                start = num
            }
        }
    }

    // Build the result from the recorded start and length
    result := make([]int, longest)
    for i := 0; i < longest; i++ {
        result[i] = start + i
    }

    return result
}
```
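
When two sequences tie for the longest length, the Go version above returns whichever one the map happens to yield first, so the result can vary between runs. The Python sketch below shows one way to make the choice deterministic by preferring the smaller start; this tie-breaking rule and the name `longest_consecutive_run` are assumptions for illustration, not requirements of the problem.

```python
def longest_consecutive_run(nums):
    """Return the longest run itself; ties go to the run with the smaller start."""
    num_set = set(nums)
    best_start, best_len = 0, 0

    for num in num_set:
        if num - 1 not in num_set:
            end = num
            while end + 1 in num_set:
                end += 1
            length = end - num + 1
            # strictly longer wins; on a tie, keep the smaller start
            if length > best_len or (length == best_len and num < best_start):
                best_start, best_len = num, length

    return list(range(best_start, best_start + best_len))


print(longest_consecutive_run([100, 4, 200, 1, 3, 2]))   # [1, 2, 3, 4]
print(longest_consecutive_run([10, 11, 1, 2]))           # [1, 2]
print(longest_consecutive_run([]))                       # []
```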

---

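### Q2: How can memory usage be reduced when the data set is very large?

**Idea**: use a Bloom filter

```go
// A Bloom filter saves memory but has a false-positive rate
// Suited to large-data scenarios
```

**Notes**:
- A Bloom filter can report a number as present when it is not (a false positive), so the computed length may be overestimated
- Its parameters need tuning for the scenario
- It fits cases where an approximate answer is acceptable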
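
If an exact answer is required and the values are known to fall inside a bounded range, a bitmap is another memory-saving option. The sketch below assumes the caller supplies that range as `lo` and `hi` (both hypothetical parameters); it uses one bit per possible value and then scans the bits for the longest run of ones.

```python
def longest_consecutive_bitmap(nums, lo, hi):
    """Exact answer using one bit per value; assumes lo <= value <= hi for all values."""
    size = hi - lo + 1
    bits = bytearray((size + 7) // 8)        # size bits, rounded up to whole bytes

    for num in nums:
        idx = num - lo
        bits[idx >> 3] |= 1 << (idx & 7)     # mark the value as present

    longest = current = 0
    for idx in range(size):
        if bits[idx >> 3] & (1 << (idx & 7)):
            current += 1                      # extend the current run of set bits
            longest = max(longest, current)
        else:
            current = 0                       # a missing value ends the run
    return longest


print(longest_consecutive_bitmap([100, 4, 200, 1, 3, 2], 1, 200))   # 4
print(longest_consecutive_bitmap([-1, -2, 0, 1], -2, 1))            # 4
```

The trade-off is that time and memory now depend on the width of the range rather than on the number of elements, so this only pays off when the range is not much larger than the data.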

---

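## P7 Bonus Points

### Deeper Understanding
- **Role of the hash set**: O(1) lookups, which make O(n) total time possible
- **Sequence-start check**: avoids repeated work and keeps the time at O(n)
- **Trading space for time**: O(n) extra space buys O(n) time

### Practical Extensions
- **Large-data scenarios**: distributed computation, sharded processing
- **Memory optimization**: Bloom filters, bitmaps
- **Business scenarios**: user-activity analysis, time-window statistics

### Variations
1. Longest consecutive sequence (duplicates allowed)
2. Longest arithmetic sequence
3. Longest increasing subsequence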

---

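## Summary

**Key takeaways**:
1. **Hash set**: O(1) lookups to check quickly whether an element exists
2. **Sequence-start check**: avoids repeated work and keeps the time at O(n)
3. **Trading space for time**: O(n) extra space buys O(n) time

**Common pitfalls**:
- Forgetting to de-duplicate
- Skipping the sequence-start check
- Searching a list instead of a set

**Best solution**: hash set + sequence-start check, O(n) time and O(n) space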