Organized 50 interview questions into 12 categories: - 01-分布式系统 (9 files): 分布式事务, 分布式锁, 一致性哈希, CAP理论, etc. - 02-数据库 (2 files): MySQL索引优化, MyBatis核心原理 - 03-缓存 (5 files): Redis数据结构, 缓存问题, LRU算法, etc. - 04-消息队列 (1 file): RocketMQ/Kafka - 05-并发编程 (4 files): 线程池, 设计模式, 限流策略, etc. - 06-JVM (1 file): JVM和垃圾回收 - 07-系统设计 (8 files): 秒杀系统, 短链接, IM, Feed流, etc. - 08-算法与数据结构 (4 files): B+树, 红黑树, 跳表, 时间轮 - 09-网络与安全 (3 files): TCP/IP, 加密安全, 性能优化 - 10-中间件 (4 files): Spring Boot, Nacos, Dubbo, Nginx - 11-运维 (4 files): Kubernetes, CI/CD, Docker, 可观测性 - 12-面试技巧 (1 file): 面试技巧和职业规划 All files renamed to Chinese for better accessibility and organized into categorized folders for easier navigation. Generated with [Claude Code](https://claude.com/claude-code) via [Happy](https://happy.engineering) Co-Authored-By: Claude <noreply@anthropic.com> Co-Authored-By: Happy <yesreply@happy.engineering>
1092 lines
25 KiB
Markdown
1092 lines
25 KiB
Markdown
# MySQL 索引优化
|
||
|
||
## 问题
|
||
|
||
**背景**:MySQL 索引是后端面试的必考题,也是实际工作中最常遇到的性能优化点。
|
||
|
||
**问题**:
|
||
1. 聚簇索引和非聚簇索引的区别是什么?
|
||
2. 什么是回表?如何通过覆盖索引避免回表?
|
||
3. 什么是最左前缀原则?为什么 `(a, b, c)` 联合索引无法支持 `WHERE b = 1`?
|
||
4. 如何分析 SQL 是否使用了索引?有哪些常见的索引失效场景?
|
||
5. 在实际项目中,你是如何设计索引的?有没有优化过慢 SQL?
|
||
|
||
---
|
||
|
||
## 标准答案
|
||
|
||
### 1. 聚簇索引 vs 非聚簇索引
|
||
|
||
#### **存储引擎差异**
|
||
|
||
```
|
||
InnoDB:
|
||
- 有聚簇索引
|
||
- 数据文件本身就是索引文件(.ibd)
|
||
- 一个表只能有一个聚簇索引(主键索引)
|
||
|
||
MyISAM:
|
||
- 没有聚簇索引
|
||
- 索引文件(.MYI) 和数据文件(.MYD) 分离
|
||
- 所有索引都是非聚簇索引
|
||
```
|
||
|
||
---
|
||
|
||
#### **聚簇索引 (Clustered Index)**
|
||
|
||
**定义**:
|
||
索引结构的叶子节点**直接存储了整行数据**。
|
||
|
||
**特点**:
|
||
- InnoDB 的**主键索引**就是聚簇索引
|
||
- 数据行的物理顺序与索引的逻辑顺序一致
|
||
- **一个表只能有一个聚簇索引**(因为数据只能按一种顺序存储)
|
||
|
||
**图解**:
|
||
```
|
||
聚簇索引 B+ 树结构:
|
||
|
||
[主键 100]
|
||
/ \
|
||
[主键 50] [主键 150]
|
||
/ \ / \
|
||
[主键 25] [主键 75] [主键 125] [主键 175]
|
||
↓ ↓ ↓ ↓
|
||
整行数据 整行数据 整行数据 整行数据
|
||
(id=25) (id=75) (id=125) (id=175)
|
||
```
|
||
|
||
**优点**:
|
||
- 范围查询效率高(数据在物理上连续)
|
||
- 主键查询速度最快(一次索引查找即可)
|
||
|
||
**缺点**:
|
||
- 主键更新代价大(需要移动数据行)
|
||
- 插入速度依赖主键顺序(UUID 会导致大量页分裂)
|
||
|
||
---
|
||
|
||
#### **非聚簇索引 (Secondary Index / 二级索引)**
|
||
|
||
**定义**:
|
||
叶子节点存储的是**主键值**,而不是整行数据。
|
||
|
||
**特点**:
|
||
- InnoDB 的**普通索引**(非主键索引)都是非聚簇索引
|
||
- 需要回表才能获取完整数据
|
||
- 一个表可以有多个非聚簇索引
|
||
|
||
**图解**:
|
||
```
|
||
二级索引 B+ 树结构:
|
||
|
||
[name '张三']
|
||
/ \
|
||
[name '李四'] [name '王五']
|
||
/ \ / \
|
||
[name 'A'] [name 'C'] [name 'D'] [name 'E']
|
||
↓ ↓ ↓ ↓
|
||
主键: 10 主键: 30 主键: 50 主键: 70
|
||
(需要回表) (需要回表) (需要回表) (需要回表)
|
||
```
|
||
|
||
---
|
||
|
||
#### **MyISAM 的索引结构**
|
||
|
||
```
|
||
MyISAM 主键索引(非聚簇):
|
||
|
||
[主键 100]
|
||
/ \
|
||
[主键 50] [主键 150]
|
||
/ \ / \
|
||
[主键 25] [主键 75] [主键 125] [主键 175]
|
||
↓ ↓ ↓ ↓
|
||
数据文件地址指针 (指向 .MYD 文件的物理位置)
|
||
```
|
||
|
||
**对比总结**:
|
||
|
||
| 特性 | InnoDB 聚簇索引 | InnoDB 二级索引 | MyISAM 索引 |
|
||
|------|----------------|----------------|-------------|
|
||
| 叶子节点存储 | 整行数据 | 主键值 | 数据文件地址指针 |
|
||
| 回表 | 不需要 | 需要 | 不需要(直接指针) |
|
||
| 数量 | 1 个 | 多个 | 多个 |
|
||
| 主键查询 | 最快 | 需要回表 | 快 |
|
||
|
||
---
|
||
|
||
### 2. 回表与覆盖索引
|
||
|
||
#### **什么是回表?**
|
||
|
||
**定义**:
|
||
通过二级索引找到主键值后,再回到聚簇索引(主键索引)中查找完整数据的过程。
|
||
|
||
**示例**:
|
||
```sql
|
||
-- 表结构
|
||
CREATE TABLE users (
|
||
id INT PRIMARY KEY, -- 聚簇索引
|
||
name VARCHAR(50), -- 二级索引 idx_name
|
||
age INT,
|
||
email VARCHAR(100)
|
||
);
|
||
|
||
-- 二级索引:idx_name
|
||
CREATE INDEX idx_name ON users(name);
|
||
|
||
-- 查询语句
|
||
SELECT * FROM users WHERE name = '张三';
|
||
```
|
||
|
||
**执行流程**(有回表):
|
||
```
|
||
1. 在 idx_name 索引树中查找 name = '张三'
|
||
└─ 找到主键 id = 100
|
||
|
||
2. 在聚簇索引(主键索引)中查找 id = 100
|
||
└─ 获取整行数据(id, name, age, email)
|
||
|
||
3. 返回结果
|
||
|
||
两次索引查找 = 回表
|
||
```
|
||
|
||
**性能问题**:
|
||
- 每次回表都是一次额外的磁盘 I/O
|
||
- 大量回表会导致性能下降
|
||
|
||
---
|
||
|
||
#### **什么是覆盖索引?**
|
||
|
||
**定义**:
|
||
一个索引包含了查询所需的所有字段,**无需回表**即可返回结果。
|
||
|
||
**示例**:
|
||
```sql
|
||
-- 使用覆盖索引优化
|
||
SELECT id, name FROM users WHERE name = '张三';
|
||
```
|
||
|
||
**执行流程**(无回表):
|
||
```
|
||
1. 在 idx_name 索引树中查找 name = '张三'
|
||
└─ 找到主键 id = 100 和 name = '张三'
|
||
|
||
2. 直接返回 id 和 name
|
||
|
||
一次索引查找 = 无回表 ✅
|
||
```
|
||
|
||
**原因**:
|
||
- `idx_name` 索引已经包含了 `name` 和 `id`(InnoDB 二级索引自动包含主键)
|
||
- 查询只需要 `id` 和 `name`,无需回表
|
||
|
||
---
|
||
|
||
#### **联合索引实现覆盖索引**
|
||
|
||
```sql
|
||
-- 创建联合索引
|
||
CREATE INDEX idx_name_age ON users(name, age);
|
||
|
||
-- 查询 1:使用覆盖索引(无回表)
|
||
SELECT id, name, age FROM users WHERE name = '张三' AND age = 25;
|
||
|
||
-- 查询 2:无法使用覆盖索引(需要回表)
|
||
SELECT id, name, age, email FROM users WHERE name = '张三' AND age = 25;
|
||
-- email 不在 idx_name_age 索引中,需要回表
|
||
```
|
||
|
||
**联合索引结构**:
|
||
```
|
||
idx_name_age 索引 B+ 树:
|
||
|
||
[('张三', 25)]
|
||
/ \
|
||
[('李四', 20)] [('王五', 30)]
|
||
/ \ / \
|
||
[('A', 18)] [('C', 22)] [('D', 28)] [('E', 35)]
|
||
↓ ↓ ↓ ↓
|
||
主键: 10 主键: 30 主键: 50 主键: 70
|
||
(name, age) 都在索引中
|
||
```
|
||
|
||
---
|
||
|
||
#### **覆盖索引的优势**
|
||
|
||
1. **避免回表**:减少磁盘 I/O
|
||
2. **减少随机 I/O**:索引扫描是顺序 I/O,回表是随机 I/O
|
||
3. **提升性能**:通常能提升 50% - 90% 的查询性能
|
||
|
||
**EXPLAIN 验证**:
|
||
```sql
|
||
EXPLAIN SELECT id, name FROM users WHERE name = '张三';
|
||
```
|
||
|
||
**输出**:
|
||
```
|
||
+----+-------------+-------+------+---------------+----------+---------+-------+------+-------------+
|
||
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
||
+----+-------------+-------+------+---------------+----------+---------+-------+------+-------------+
|
||
| 1 | SIMPLE | users | ref | idx_name | idx_name | 153 | const | 1 | Using index |
|
||
+----+-------------+-------+------+---------------+----------+---------+-------+------+-------------+
|
||
↑
|
||
Extra: Using index = 使用了覆盖索引
|
||
```
|
||
|
||
---
|
||
|
||
### 3. 最左前缀原则
|
||
|
||
#### **定义**
|
||
|
||
对于联合索引 `(a, b, c)`,查询必须从**最左侧**的列开始,才能使用索引。
|
||
|
||
**示例索引**:
|
||
```sql
|
||
CREATE INDEX idx_abc ON users(a, b, c);
|
||
```
|
||
|
||
---
|
||
|
||
#### **支持索引的查询**
|
||
|
||
```sql
|
||
-- ✅ 使用完整索引
|
||
WHERE a = 1 AND b = 2 AND c = 3
|
||
|
||
-- ✅ 使用索引前两列
|
||
WHERE a = 1 AND b = 2
|
||
|
||
-- ✅ 使用索引第一列
|
||
WHERE a = 1
|
||
|
||
-- ✅ 使用索引第一列 + 范围查询
|
||
WHERE a = 1 AND b > 2
|
||
|
||
-- ✅ 范围查询第一列
|
||
WHERE a > 1 AND a < 10
|
||
|
||
-- ✅ 只使用第一列排序
|
||
ORDER BY a
|
||
|
||
-- ✅ 使用前两列排序
|
||
ORDER BY a, b
|
||
|
||
-- ✅ 覆盖索引优化
|
||
SELECT a, b, c FROM users WHERE a = 1;
|
||
```
|
||
|
||
---
|
||
|
||
#### **无法使用索引的查询**
|
||
|
||
```sql
|
||
-- ❌ 跳过第一列,直接查询第二列
|
||
WHERE b = 2;
|
||
|
||
-- ❌ 跳过前两列,直接查询第三列
|
||
WHERE c = 3;
|
||
|
||
-- ❌ 跳过第二列
|
||
WHERE a = 1 AND c = 3;
|
||
|
||
-- ❌ 无法使用索引排序
|
||
ORDER BY b, c;
|
||
|
||
-- ❌ 范围查询后无法使用后续列
|
||
WHERE a = 1 AND b > 2 AND c = 3;
|
||
-- 只有 a 和 b 能使用索引,c 无法使用
|
||
```
|
||
|
||
---
|
||
|
||
#### **为什么无法支持 `WHERE b = 1`?**
|
||
|
||
**联合索引的排序方式**:
|
||
```
|
||
索引 idx_abc 按照 (a, b, c) 的顺序排序:
|
||
|
||
a=1: (1,1,1), (1,1,2), (1,2,1), (1,2,2), (1,3,1)
|
||
a=2: (2,1,1), (2,1,2), (2,2,1), (2,2,2), (2,3,1)
|
||
a=3: (3,1,1), (3,1,2), (3,2,1), (3,2,2), (3,3,1)
|
||
```
|
||
|
||
**查询 `WHERE b = 1`**:
|
||
- b 的值是**乱序**的(1,1,2,2,3,1,1,2,2,3...)
|
||
- 无法利用索引的有序性进行二分查找
|
||
- 只能全表扫描
|
||
|
||
**图解**:
|
||
```
|
||
idx_abc 索引:
|
||
|
||
Row 1: a=1, b=1, c=1
|
||
Row 2: a=1, b=1, c=2
|
||
Row 3: a=1, b=2, c=1 ← b 跳变
|
||
Row 4: a=1, b=2, c=2
|
||
Row 5: a=1, b=3, c=1 ← b 跳变
|
||
Row 6: a=2, b=1, c=1 ← a 跳变,b 回到 1
|
||
Row 7: a=2, b=1, c=2
|
||
...
|
||
|
||
查询 WHERE b = 1:
|
||
- 无法定位起始位置(b 的值不连续)
|
||
- 必须扫描所有行
|
||
```
|
||
|
||
---
|
||
|
||
#### **最左前缀原则的底层原理**
|
||
|
||
**B+ 树的比较逻辑**:
|
||
```sql
|
||
-- 索引 idx_abc 的比较顺序
|
||
(a1, b1, c1) < (a2, b2, c2)
|
||
等价于:
|
||
a1 < a2 OR (a1 = a2 AND b1 < b2) OR (a1 = a2 AND b1 = b2 AND c1 < c2)
|
||
```
|
||
|
||
**查询 `WHERE b = 2`**:
|
||
- 第一列 a 没有提供,无法确定索引的起始位置
|
||
- MySQL 优化器无法利用索引
|
||
|
||
**查询 `WHERE a = 1 AND c = 3`**:
|
||
- 可以利用 a = 1 定位到索引的起始位置
|
||
- 但 c 无法使用索引(因为 b 被跳过)
|
||
|
||
---
|
||
|
||
#### **实际优化案例**
|
||
|
||
**表结构**:
|
||
```sql
|
||
CREATE TABLE orders (
|
||
id INT PRIMARY KEY,
|
||
user_id INT,
|
||
status INT,
|
||
amount DECIMAL(10,2),
|
||
created_at DATETIME,
|
||
KEY idx_user_status_time (user_id, status, created_at)
|
||
);
|
||
```
|
||
|
||
**查询优化**:
|
||
|
||
```sql
|
||
-- ❌ 无法使用索引
|
||
SELECT * FROM orders WHERE status = 1 AND created_at > '2024-01-01';
|
||
|
||
-- ✅ 使用索引
|
||
SELECT * FROM orders WHERE user_id = 100 AND status = 1;
|
||
|
||
-- ✅ 部分使用索引(user_id)
|
||
SELECT * FROM orders WHERE user_id = 100 AND created_at > '2024-01-01';
|
||
```
|
||
|
||
**索引设计建议**:
|
||
1. 将区分度高的列放在前面
|
||
2. 将常用于查询条件的列放在前面
|
||
3. 将范围查询列放在最后
|
||
|
||
---
|
||
|
||
### 4. SQL 索引分析
|
||
|
||
#### **EXPLAIN 命令**
|
||
|
||
**基本用法**:
|
||
```sql
|
||
EXPLAIN SELECT * FROM users WHERE name = '张三';
|
||
```
|
||
|
||
**输出字段详解**:
|
||
|
||
```
|
||
+----+-------------+-------+------+---------------+----------+---------+-------+------+-------------+
|
||
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
||
+----+-------------+-------+------+---------------+----------+---------+-------+------+-------------+
|
||
```
|
||
|
||
**核心字段**:
|
||
|
||
| 字段 | 说明 | 好的值 |
|
||
|------|------|--------|
|
||
| **type** | 访问类型 | const > eq_ref > ref > range > index > ALL |
|
||
| **key** | 实际使用的索引 | 显示索引名称 |
|
||
| **key_len** | 使用的索引长度 | 越长越好(使用更多列) |
|
||
| **rows** | 预估扫描行数 | 越少越好 |
|
||
| **Extra** | 额外信息 | Using index(覆盖索引) |
|
||
|
||
---
|
||
|
||
#### **type 字段详解(从好到坏)**
|
||
|
||
```sql
|
||
-- 1. const:主键或唯一索引等值查询
|
||
EXPLAIN SELECT * FROM users WHERE id = 100;
|
||
-- type = const(最优,最多返回一行)
|
||
|
||
-- 2. eq_ref:连接时使用主键或唯一索引
|
||
EXPLAIN SELECT * FROM users u JOIN orders o ON u.id = o.user_id;
|
||
-- type = eq_ref(每个表只读取一行)
|
||
|
||
-- 3. ref:非唯一索引等值查询
|
||
EXPLAIN SELECT * FROM users WHERE name = '张三';
|
||
-- type = ref(使用二级索引)
|
||
|
||
-- 4. range:范围查询
|
||
EXPLAIN SELECT * FROM users WHERE id > 100 AND id < 200;
|
||
-- type = range(索引范围扫描)
|
||
|
||
-- 5. index:索引全扫描
|
||
EXPLAIN SELECT id FROM users;
|
||
-- type = index(遍历索引树)
|
||
|
||
-- 6. ALL:全表扫描(最差)
|
||
EXPLAIN SELECT * FROM users WHERE age = 25;
|
||
-- type = ALL(没有使用索引,需要优化)
|
||
```
|
||
|
||
**性能对比**:
|
||
```
|
||
const > eq_ref > ref > range > index > ALL
|
||
↑ ↑
|
||
最快 最慢(需要优化)
|
||
```
|
||
|
||
---
|
||
|
||
#### **Extra 字段详解**
|
||
|
||
| Extra 值 | 说明 | 建议 |
|
||
|----------|------|------|
|
||
| **Using index** | 使用覆盖索引,无需回表 | ✅ 优秀 |
|
||
| **Using where** | 使用 WHERE 过滤 | ⚠️ 可能需要优化 |
|
||
| **Using filesort** | 文件排序(内存或磁盘) | ❌ 需要优化 |
|
||
| **Using temporary** | 使用临时表 | ❌ 需要优化 |
|
||
| **Using index condition** | 索引条件下推 | ✅ 较好 |
|
||
| **Using join buffer** | 使用连接缓冲 | ⚠️ 可能需要优化 |
|
||
|
||
---
|
||
|
||
#### **常见索引失效场景**
|
||
|
||
##### **场景 1:使用函数**
|
||
|
||
```sql
|
||
-- ❌ 索引失效
|
||
SELECT * FROM users WHERE YEAR(created_at) = 2024;
|
||
|
||
-- ✅ 使用索引
|
||
SELECT * FROM users WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
|
||
```
|
||
|
||
**原因**:
|
||
- 对索引列使用函数后,索引存储的值被改变
|
||
- 无法利用索引的有序性
|
||
|
||
---
|
||
|
||
##### **场景 2:隐式类型转换**
|
||
|
||
```sql
|
||
-- 表结构:phone VARCHAR(20)
|
||
-- 索引:idx_phone
|
||
|
||
-- ❌ 索引失效(字符串字段使用数字查询)
|
||
SELECT * FROM users WHERE phone = 13800138000;
|
||
|
||
-- ✅ 使用索引
|
||
SELECT * FROM users WHERE phone = '13800138000';
|
||
```
|
||
|
||
**原因**:
|
||
- MySQL 隐式将 phone 转换为数字(相当于使用函数)
|
||
- 索引失效
|
||
|
||
---
|
||
|
||
##### **场景 3:前缀模糊查询**
|
||
|
||
```sql
|
||
-- ❌ 索引失效
|
||
SELECT * FROM users WHERE name LIKE '%张三';
|
||
|
||
-- ❌ 索引失效
|
||
SELECT * FROM users WHERE name LIKE '%张三%';
|
||
|
||
-- ✅ 使用索引
|
||
SELECT * FROM users WHERE name LIKE '张三%';
|
||
```
|
||
|
||
**原因**:
|
||
- `%` 在开头无法利用索引的有序性
|
||
- 只能全表扫描
|
||
|
||
**优化方案**:
|
||
- 使用全文索引(FULLTEXT)
|
||
- 使用 Elasticsearch
|
||
|
||
---
|
||
|
||
##### **场景 4:OR 连接非索引列**
|
||
|
||
```sql
|
||
-- ❌ 索引失效(age 没有索引)
|
||
SELECT * FROM users WHERE name = '张三' OR age = 25;
|
||
|
||
-- ✅ 使用索引(两个条件都有索引)
|
||
SELECT * FROM users WHERE name = '张三' OR email = 'test@example.com';
|
||
|
||
-- ✅ 改写为 UNION
|
||
SELECT * FROM users WHERE name = '张三'
|
||
UNION
|
||
SELECT * FROM users WHERE age = 25;
|
||
```
|
||
|
||
---
|
||
|
||
##### **场景 5:不等于(!= 或 <>)**
|
||
|
||
```sql
|
||
-- ❌ 索引失效
|
||
SELECT * FROM users WHERE status != 1;
|
||
|
||
-- ✅ 使用索引
|
||
SELECT * FROM users WHERE status IN (2, 3, 4);
|
||
```
|
||
|
||
**原因**:
|
||
- 不等于无法利用索引的有序性
|
||
- 范围太广,优化器选择全表扫描
|
||
|
||
---
|
||
|
||
##### **场景 6:IS NULL 或 IS NOT NULL**
|
||
|
||
```sql
|
||
-- ❌ 索引可能失效(取决于 MySQL 版本和数据分布)
|
||
SELECT * FROM users WHERE name IS NULL;
|
||
|
||
-- ✅ 解决方案:设置默认值
|
||
ALTER TABLE users MODIFY name VARCHAR(50) NOT NULL DEFAULT '';
|
||
SELECT * FROM users WHERE name = '';
|
||
```
|
||
|
||
---
|
||
|
||
##### **场景 7:负向查询**
|
||
|
||
```sql
|
||
-- ❌ 索引失效
|
||
SELECT * FROM users WHERE NOT (status = 1);
|
||
|
||
-- ✅ 使用索引
|
||
SELECT * FROM users WHERE status IN (2, 3, 4);
|
||
```
|
||
|
||
---
|
||
|
||
##### **场景 8:排序优化**
|
||
|
||
```sql
|
||
-- 索引:idx_name_age (name, age)
|
||
|
||
-- ✅ 使用索引排序
|
||
SELECT * FROM users WHERE name = '张三' ORDER BY age;
|
||
|
||
-- ❌ 索引失效 + 文件排序
|
||
SELECT * FROM users WHERE age > 25 ORDER BY name;
|
||
-- 原因:跳过了 name,无法使用索引排序
|
||
|
||
-- ❌ 索引失效 + 文件排序
|
||
SELECT * FROM users WHERE name = '张三' ORDER BY age, email;
|
||
-- 原因:email 不在索引中
|
||
```
|
||
|
||
---
|
||
|
||
#### **慢查询分析工具**
|
||
|
||
##### **1. 慢查询日志**
|
||
|
||
```sql
|
||
-- 开启慢查询日志
|
||
SET GLOBAL slow_query_log = ON;
|
||
SET GLOBAL long_query_time = 2; -- 超过 2 秒记录
|
||
SET GLOBAL log_queries_not_using_indexes = ON; -- 记录未使用索引的查询
|
||
|
||
-- 查看慢查询日志位置
|
||
SHOW VARIABLES LIKE 'slow_query_log_file';
|
||
-- 输出:/var/lib/mysql/mysql-slow.log
|
||
```
|
||
|
||
**日志示例**:
|
||
```
|
||
# Time: 2024-02-28T10:30:00.123456Z
|
||
# User@Host: app[app] @ [192.168.1.100]
|
||
# Query_time: 5.234567 Lock_time: 0.000123 Rows_sent: 10 Rows_examined: 1000000
|
||
SET timestamp=1709100600;
|
||
SELECT * FROM orders WHERE user_id = 100;
|
||
```
|
||
|
||
---
|
||
|
||
##### **2. mysqldumpslow 分析工具**
|
||
|
||
```bash
|
||
# 查看最慢的 10 条 SQL
|
||
mysqldumpslow -s t -t 10 /var/lib/mysql/mysql-slow.log
|
||
|
||
# 查看访问次数最多的 10 条 SQL
|
||
mysqldumpslow -s c -t 10 /var/lib/mysql/mysql-slow.log
|
||
|
||
# 查看返回行数最多的 10 条 SQL
|
||
mysqldumpslow -s r -t 10 /var/lib/mysql/mysql-slow.log
|
||
```
|
||
|
||
**输出示例**:
|
||
```
|
||
Count: 150 Time=5.23s (784s) Lock=0.00s (0s) Rows=1000.0 (150000), app[app]@2hosts
|
||
SELECT * FROM orders WHERE user_id = N
|
||
```
|
||
|
||
---
|
||
|
||
### 5. 索引设计与实战优化
|
||
|
||
#### **索引设计原则**
|
||
|
||
##### **1. 选择合适的列建立索引**
|
||
|
||
```sql
|
||
-- ✅ 适合建立索引的列:
|
||
-- 1. 经常用于 WHERE、JOIN、ORDER BY 的列
|
||
-- 2. 区分度高的列(基数大)
|
||
CREATE INDEX idx_email ON users(email); -- 区分度高,适合索引
|
||
CREATE INDEX idx_gender ON users(gender); -- 区分度低(只有男/女),不适合索引
|
||
|
||
-- 3. 索引的选择性(唯一值数 / 总行数)
|
||
-- 计算选择性:
|
||
SELECT COUNT(DISTINCT email) / COUNT(*) FROM users; -- 接近 1,适合索引
|
||
SELECT COUNT(DISTINCT gender) / COUNT(*) FROM users; -- 接近 0.5,不适合索引
|
||
```
|
||
|
||
---
|
||
|
||
##### **2. 联合索引的列顺序**
|
||
|
||
**原则**:
|
||
1. **区分度高的列放在前面**
|
||
2. **常用于查询条件的列放在前面**
|
||
3. **范围查询列放在最后**
|
||
|
||
**示例**:
|
||
```sql
|
||
-- 表结构
|
||
CREATE TABLE orders (
|
||
id INT PRIMARY KEY,
|
||
user_id INT,
|
||
status INT, -- 1:待支付, 2:已支付, 3:已完成
|
||
amount DECIMAL(10,2),
|
||
created_at DATETIME
|
||
);
|
||
|
||
-- 分析选择性
|
||
SELECT
|
||
COUNT(DISTINCT user_id) / COUNT(*) AS user_id_selectivity,
|
||
COUNT(DISTINCT status) / COUNT(*) AS status_selectivity,
|
||
COUNT(DISTINCT DATE(created_at)) / COUNT(*) AS date_selectivity
|
||
FROM orders;
|
||
|
||
-- 输出:
|
||
-- user_id_selectivity: 0.9(高)
|
||
-- status_selectivity: 0.01(低)
|
||
-- date_selectivity: 0.3(中)
|
||
|
||
-- 索引设计
|
||
CREATE INDEX idx_user_status_date ON orders(user_id, status, created_at);
|
||
-- ↑ ↑ ↑
|
||
-- 高 低 中
|
||
-- user_id 优先,status 次之,date 最后
|
||
```
|
||
|
||
**常见查询优化**:
|
||
```sql
|
||
-- 查询 1:用户待支付订单
|
||
-- ✅ 使用索引
|
||
SELECT * FROM orders WHERE user_id = 100 AND status = 1;
|
||
|
||
-- 查询 2:用户某时间段订单
|
||
-- ✅ 部分使用索引(user_id)
|
||
SELECT * FROM orders WHERE user_id = 100 AND created_at > '2024-01-01';
|
||
```
|
||
|
||
---
|
||
|
||
##### **3. 覆盖索引优化**
|
||
|
||
```sql
|
||
-- 表结构
|
||
CREATE TABLE users (
|
||
id INT PRIMARY KEY,
|
||
name VARCHAR(50),
|
||
age INT,
|
||
email VARCHAR(100),
|
||
created_at DATETIME
|
||
);
|
||
|
||
-- 需求:频繁查询用户列表(只需要 id, name, age)
|
||
-- ❌ 原始 SQL(需要回表)
|
||
SELECT id, name, age FROM users WHERE age > 18;
|
||
|
||
-- ✅ 创建覆盖索引
|
||
CREATE INDEX idx_age_name ON users(age, name);
|
||
-- 索引已包含 age, name, id(id 自动包含)
|
||
-- 无需回表,性能提升 50% - 90%
|
||
```
|
||
|
||
---
|
||
|
||
##### **4. 前缀索引优化**
|
||
|
||
**场景**:
|
||
- 对长字符串(VARCHAR、TEXT)建立索引
|
||
- 索引占用空间过大,影响性能
|
||
|
||
**示例**:
|
||
```sql
|
||
-- 表结构
|
||
CREATE TABLE articles (
|
||
id INT PRIMARY KEY,
|
||
title VARCHAR(200),
|
||
content TEXT,
|
||
created_at DATETIME
|
||
);
|
||
|
||
-- ❌ 全长索引(占用空间大)
|
||
CREATE INDEX idx_title ON articles(title);
|
||
|
||
-- ✅ 前缀索引(只索引前 20 个字符)
|
||
CREATE INDEX idx_title_prefix ON articles(title(20));
|
||
|
||
-- 计算合适的前缀长度
|
||
SELECT
|
||
COUNT(DISTINCT LEFT(title, 10)) / COUNT(*) AS prefix_10,
|
||
COUNT(DISTINCT LEFT(title, 20)) / COUNT(*) AS prefix_20,
|
||
COUNT(DISTINCT LEFT(title, 30)) / COUNT(*) AS prefix_30
|
||
FROM articles;
|
||
|
||
-- 输出:
|
||
-- prefix_10: 0.75
|
||
-- prefix_20: 0.95 ← 足够高,选择 20
|
||
-- prefix_30: 0.98
|
||
```
|
||
|
||
**注意事项**:
|
||
- 前缀索引无法用于 ORDER BY 和 GROUP BY
|
||
- 前缀索引无法用于覆盖索引
|
||
|
||
---
|
||
|
||
##### **5. 避免冗余索引**
|
||
|
||
```sql
|
||
-- ❌ 冗余索引
|
||
CREATE INDEX idx_a ON users(a);
|
||
CREATE INDEX idx_a_b ON users(a, b);
|
||
-- idx_a 是冗余的(idx_a_b 已经包含 a)
|
||
|
||
-- ✅ 优化后
|
||
CREATE INDEX idx_a_b ON users(a, b);
|
||
-- 删除 idx_a
|
||
DROP INDEX idx_a ON users;
|
||
|
||
-- ❌ 冗余索引
|
||
CREATE INDEX idx_a ON users(a);
|
||
CREATE INDEX idx_a_id ON users(a, id);
|
||
-- InnoDB 二级索引自动包含主键,idx_a_id 是冗余的
|
||
```
|
||
|
||
---
|
||
|
||
#### **实战优化案例**
|
||
|
||
##### **案例 1:分页查询优化**
|
||
|
||
**问题 SQL**:
|
||
```sql
|
||
-- 查询第 100 万页的数据
|
||
SELECT * FROM orders ORDER BY id LIMIT 1000000, 20;
|
||
-- 执行时间:30 秒
|
||
```
|
||
|
||
**问题分析**:
|
||
- MySQL 需要扫描 1000000 + 20 行,然后抛弃前 1000000 行
|
||
- 大量无效扫描
|
||
|
||
**优化方案 1:延迟关联**
|
||
```sql
|
||
-- ✅ 优化后(0.5 秒)
|
||
SELECT o.* FROM orders o
|
||
JOIN (SELECT id FROM orders ORDER BY id LIMIT 1000000, 20) tmp
|
||
ON o.id = tmp.id;
|
||
-- 原理:先在覆盖索引中快速定位 ID,再回表查询完整数据
|
||
```
|
||
|
||
**优化方案 2:使用上次最大 ID**
|
||
```sql
|
||
-- ✅ 优化后(0.01 秒)
|
||
SELECT * FROM orders WHERE id > 1000000 ORDER BY id LIMIT 20;
|
||
-- 原理:利用主键索引直接定位起始位置
|
||
```
|
||
|
||
---
|
||
|
||
##### **案例 2:JOIN 优化**
|
||
|
||
**问题 SQL**:
|
||
```sql
|
||
-- 执行时间:10 秒
|
||
EXPLAIN SELECT *
|
||
FROM orders o
|
||
JOIN users u ON o.user_id = u.id
|
||
WHERE o.status = 1;
|
||
```
|
||
|
||
**EXPLAIN 输出**:
|
||
```
|
||
type: ALL(全表扫描)
|
||
rows: 1000000
|
||
```
|
||
|
||
**优化方案**:
|
||
```sql
|
||
-- 1. 在 join 列上建立索引
|
||
CREATE INDEX idx_user_id ON orders(user_id);
|
||
|
||
-- 2. 在过滤列上建立索引
|
||
CREATE INDEX idx_status ON orders(status);
|
||
|
||
-- 3. 优化后(0.2 秒)
|
||
EXPLAIN SELECT *
|
||
FROM orders o
|
||
JOIN users u ON o.user_id = u.id
|
||
WHERE o.status = 1;
|
||
|
||
-- 输出:
|
||
-- type: ref
|
||
-- key: idx_user_id, idx_status
|
||
-- rows: 100
|
||
```
|
||
|
||
---
|
||
|
||
##### **案例 3:COUNT 优化**
|
||
|
||
**问题 SQL**:
|
||
```sql
|
||
-- 执行时间:5 秒
|
||
SELECT COUNT(*) FROM orders WHERE status = 1;
|
||
```
|
||
|
||
**优化方案 1:使用索引**
|
||
```sql
|
||
-- ✅ 创建索引(0.05 秒)
|
||
CREATE INDEX idx_status ON orders(status);
|
||
SELECT COUNT(*) FROM orders WHERE status = 1;
|
||
```
|
||
|
||
**优化方案 2:使用覆盖索引**
|
||
```sql
|
||
-- ✅ 创建覆盖索引(0.02 秒)
|
||
CREATE INDEX idx_status_id ON orders(status, id);
|
||
SELECT COUNT(*) FROM orders WHERE status = 1;
|
||
-- 只需要扫描索引,无需回表
|
||
```
|
||
|
||
**优化方案 3:使用计数表**
|
||
```sql
|
||
-- ✅ 使用计数表(0.001 秒)
|
||
CREATE TABLE order_stats (
|
||
id INT PRIMARY KEY,
|
||
status INT,
|
||
count INT,
|
||
updated_at DATETIME
|
||
);
|
||
|
||
-- 定时更新计数
|
||
INSERT INTO order_stats (id, status, count, updated_at)
|
||
VALUES (1, 1, (SELECT COUNT(*) FROM orders WHERE status = 1), NOW())
|
||
ON DUPLICATE KEY UPDATE count = VALUES(count), updated_at = NOW();
|
||
|
||
-- 查询时直接读取计数
|
||
SELECT count FROM order_stats WHERE status = 1;
|
||
```
|
||
|
||
---
|
||
|
||
##### **案例 4:索引下推优化(ICP)**
|
||
|
||
**MySQL 5.6+ 的索引条件下推优化**
|
||
|
||
```sql
|
||
-- 表结构
|
||
CREATE TABLE users (
|
||
id INT PRIMARY KEY,
|
||
name VARCHAR(50),
|
||
age INT,
|
||
address VARCHAR(200),
|
||
KEY idx_name_age (name, age)
|
||
);
|
||
|
||
-- 查询
|
||
SELECT * FROM users WHERE name LIKE '张%' AND age > 25;
|
||
```
|
||
|
||
**无 ICP(MySQL 5.6 之前)**:
|
||
```
|
||
1. 使用索引定位 name LIKE '张%' 的所有记录
|
||
2. 回表查询完整数据
|
||
3. 过滤 age > 25
|
||
```
|
||
|
||
**有 ICP(MySQL 5.6+)**:
|
||
```
|
||
1. 使用索引定位 name LIKE '张%'
|
||
2. 在索引中直接过滤 age > 25(无需回表)
|
||
3. 只对符合条件的记录回表
|
||
```
|
||
|
||
**验证 ICP**:
|
||
```sql
|
||
EXPLAIN SELECT * FROM users WHERE name LIKE '张%' AND age > 25;
|
||
|
||
-- 输出:
|
||
-- Extra: Using index condition
|
||
-- ↑ 表示使用了索引条件下推
|
||
```
|
||
|
||
---
|
||
|
||
#### **索引维护**
|
||
|
||
##### **1. 分析索引使用情况**
|
||
|
||
```sql
|
||
-- 查看未使用的索引(MySQL 5.7+)
|
||
SELECT
|
||
object_schema AS database_name,
|
||
object_name AS table_name,
|
||
index_name
|
||
FROM performance_schema.table_io_waits_summary_by_index_usage
|
||
WHERE index_name IS NOT NULL
|
||
AND count_star = 0
|
||
AND object_schema = 'your_database'
|
||
ORDER BY object_schema, object_name;
|
||
|
||
-- 查看索引基数(选择性)
|
||
SHOW INDEX FROM users;
|
||
-- Cardinality 列:唯一值数量,越高越好
|
||
```
|
||
|
||
---
|
||
|
||
##### **2. 删除冗余索引**
|
||
|
||
```sql
|
||
-- 查看重复索引
|
||
SELECT
|
||
a.table_schema,
|
||
a.table_name,
|
||
a.index_name AS index1,
|
||
b.index_name AS index2,
|
||
a.column_name
|
||
FROM information_schema.statistics a
|
||
JOIN information_schema.statistics b
|
||
ON a.table_schema = b.table_schema
|
||
AND a.table_name = b.table_name
|
||
AND a.column_name = b.column_name
|
||
AND a.index_name != b.index_name
|
||
WHERE a.table_schema = 'your_database';
|
||
|
||
-- 删除冗余索引
|
||
DROP INDEX idx_user_id ON orders;
|
||
```
|
||
|
||
---
|
||
|
||
##### **3. 定期 ANALYZE TABLE**
|
||
|
||
```sql
|
||
-- 更新表的统计信息,帮助优化器选择正确的索引
|
||
ANALYZE TABLE users;
|
||
ANALYZE TABLE orders;
|
||
```
|
||
|
||
---
|
||
|
||
### 6. 总结
|
||
|
||
#### **索引优化检查清单**
|
||
|
||
- [ ] 是否创建了必要的索引(WHERE、JOIN、ORDER BY 列)
|
||
- [ ] 联合索引是否遵循最左前缀原则
|
||
- [ ] 是否使用了覆盖索引避免回表
|
||
- [ ] 索引的选择性是否足够高(> 0.1)
|
||
- [ ] 是否存在冗余索引
|
||
- [ ] 是否有索引失效的场景(函数、类型转换、%前导)
|
||
- [ ] 慢查询是否都经过 EXPLAIN 分析
|
||
- [ ] 是否定期 ANALYZE TABLE 更新统计信息
|
||
|
||
---
|
||
|
||
#### **EXPLAIN 判断标准**
|
||
|
||
```
|
||
✅ 优秀
|
||
- type: const, eq_ref, ref
|
||
- Extra: Using index
|
||
- rows: < 1000
|
||
|
||
⚠️ 需要关注
|
||
- type: range, index
|
||
- rows: 1000 - 10000
|
||
|
||
❌ 需要优化
|
||
- type: ALL
|
||
- Extra: Using filesort, Using temporary
|
||
- rows: > 10000
|
||
```
|
||
|
||
---
|
||
|
||
### 7. 阿里 P7 加分项
|
||
|
||
**深度理解**:
|
||
- 理解 B+ 树的底层实现(页分裂、页合并)
|
||
- 理解索引统计信息的更新机制
|
||
- 了解 MySQL 优化器的成本计算模型
|
||
|
||
**实战经验**:
|
||
- 有将慢 SQL 从秒级优化到毫秒级的案例
|
||
- 有处理千万级数据索引设计的经验
|
||
- 有索引重构和数据迁移的经验
|
||
|
||
**架构能力**:
|
||
- 能设计分库分表后的索引策略
|
||
- 有读写分离的索引同步经验
|
||
- 能设计索引变更的灰度方案
|
||
|
||
**监控和工具**:
|
||
- 搭建慢查询监控和告警系统
|
||
- 编写自动化索引分析工具
|
||
- 有 pt-index-usage、pt-duplicate-key-checker 等工具使用经验
|