# MySQL Master-Slave Replication Lag: Interview Guide
## 1. How Master-Slave Replication Works
### Replication Architectures
**Three common replication topologies**
1. **Master-slave** (Master-Slave)
```
Master → Slave1
       → Slave2
       → Slave3
```
2. **Master-master** (Master-Master)
```
Master1 ↔ Master2
   ↓         ↓
Slave1    Slave2
```
3. **Cascading replication** (Master-Slave-Slave)
```
Master → Slave1 → Slave2
       → Slave3
```
### The Replication Flow
MySQL replication is based on the binary log (binlog) and proceeds in three steps:
```
1. Write on the master:
   the master executes the SQL transaction,
   writes it to the binlog,
   and the dump thread ships the binlog to the replicas.
2. Receive on the replica:
   the replica's I/O thread reads the binlog stream,
   writes it to the relay log,
   and updates master-info.
3. Apply on the replica:
   the replica's SQL thread replays the relay log,
   updates relay-log.info,
   and the transaction is applied.
```
### Core Components
**On the master**
- **binlog**: records all data-changing operations
- **dump thread**: streams the binlog to replicas
**On the replica**
- **I/O thread**: receives the binlog stream
- **SQL thread**: replays the relay log
- **relay log**: local copy of the received binlog events
- **master.info**: connection metadata for the master
- **relay-log.info**: current apply position in the relay log
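The interplay of these components can be sketched as a toy model (illustrative only, not MySQL internals): the I/O thread fills a relay-log queue, the SQL thread drains it, and the backlog between the two is what surfaces as replication lag.

```python
from collections import deque

class ToyReplica:
    """Toy model of a replica: an I/O-thread queue (relay log) and a SQL thread."""
    def __init__(self):
        self.relay_log = deque()   # events received but not yet applied
        self.applied = []          # events the SQL thread has executed

    def io_thread_receive(self, events):
        # I/O thread: append binlog events to the relay log
        self.relay_log.extend(events)

    def sql_thread_apply(self, max_events):
        # SQL thread: apply up to max_events from the relay log, in order
        for _ in range(min(max_events, len(self.relay_log))):
            self.applied.append(self.relay_log.popleft())

    @property
    def backlog(self):
        # Unapplied events -- the rough analogue of replication lag
        return len(self.relay_log)

replica = ToyReplica()
replica.io_thread_receive(["tx1", "tx2", "tx3"])
replica.sql_thread_apply(2)
print(replica.applied)   # ['tx1', 'tx2']
print(replica.backlog)   # 1
```

The model makes the key point visible: lag grows whenever the master produces events faster than the single (or insufficiently parallel) SQL thread can apply them.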
### Example Configuration
**Master (my.cnf)**
```ini
[mysqld]
# Enable the binary log
server-id = 1
log-bin = mysql-bin
binlog-format = ROW
binlog-row-image = FULL
expire_logs_days = 7
max_binlog_size = 1G
# GTID settings
gtid_mode = ON
enforce_gtid_consistency = ON
# Replication filters (note: these are replica-side options)
replicate-wild-ignore-table = mysql.%
replicate-wild-ignore-table = test.%
```
**Replica (my.cnf)**
```ini
[mysqld]
# Replica settings
server-id = 2
relay-log = mysql-relay-bin
read-only = 1
# GTID settings
gtid_mode = ON
enforce_gtid_consistency = ON
# Purge relay logs automatically
relay_log_purge = 1
```
**Setting up replication**
```sql
-- On the master: create a replication user
CREATE USER 'repl'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
-- On the replica: point it at the master (MySQL 8.0.23+ syntax)
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = '192.168.1.100',
  SOURCE_PORT = 3306,
  SOURCE_USER = 'repl',
  SOURCE_PASSWORD = 'password',
  SOURCE_AUTO_POSITION = 1; -- use GTID-based positioning
-- Start replication
START REPLICA;
-- Check replication status
SHOW REPLICA STATUS \G
```
## 2. What Causes Replication Lag
### Hardware
**1. Disk I/O bottlenecks**
```bash
# Inspect disk performance
iostat -x 1 10
# Check disk usage
df -h
```
**2. Network latency**
```bash
# Latency tests
ping 192.168.1.100
traceroute 192.168.1.100
mtr 192.168.1.100
# Bandwidth monitoring
iftop
nload
```
**3. High CPU load**
```bash
# CPU utilization
top
htop
mpstat 1 10
# MySQL processes
ps aux | grep mysql
```
### Configuration
**1. Poorly tuned replication parameters**
```ini
# Master tuning
[mysqld]
# Binary log
sync_binlog = 1              # 1: fsync on every commit; 0: leave it to the OS
binlog_cache_size = 4M       # per-session binlog cache
binlog_stmt_cache_size = 4M  # statement cache size

# Replica tuning (a separate my.cnf on the replica)
[mysqld]
# SQL (applier) threads
slave_parallel_workers = 4           # parallel replication (MySQL 5.7+)
slave_parallel_type = LOGICAL_CLOCK  # parallelization policy
slave_pending_jobs_size_max = 2G     # queue size for pending events
# Relay log
relay_log_space_limit = 8G           # cap on total relay log size
```
**2. Storage engine settings**
```sql
-- Master: InnoDB settings
SET GLOBAL innodb_flush_log_at_trx_commit = 1;                -- 1: flush the redo log on every commit
SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;  -- in bytes; typically 50-70% of RAM
SET GLOBAL innodb_io_capacity = 2000;                         -- tune to the disk's IOPS
SET GLOBAL innodb_io_capacity_max = 4000;                     -- burst I/O capacity
-- Replica settings
SET GLOBAL read_only = 1;                                     -- read-only mode
SET GLOBAL innodb_flush_log_at_trx_commit = 1;                -- keep durability consistent with the master
```
### Workload
**1. Large transactions**
```sql
-- Problem: one huge transaction stalls the replica
BEGIN;
-- a massive batch of updates
UPDATE order_table SET status = 'completed' WHERE create_time < '2023-01-01';
UPDATE order_table SET status = 'shipped' WHERE create_time < '2023-02-01';
... -- many more statements
COMMIT;
-- Fix: process in small batches
BEGIN;
UPDATE order_table SET status = 'completed' WHERE create_time < '2023-01-01' LIMIT 1000;
COMMIT;
-- Or drive the batches with the event scheduler
DELIMITER //
CREATE EVENT batch_update_order_status
ON SCHEDULE EVERY 1 MINUTE
DO
BEGIN
  UPDATE order_table SET status = 'completed'
  WHERE create_time < '2023-01-01'
  LIMIT 1000;
END //
DELIMITER ;
```
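The batching idea above reduces to a simple loop: repeat a LIMIT-ed UPDATE until a batch comes back short. A Python sketch of the control flow — `execute` is a hypothetical callable standing in for running one UPDATE batch and returning its affected-row count:

```python
def batched_update(execute, batch_size=1000):
    """Run a LIMIT-ed UPDATE repeatedly until no full batch remains.

    `execute(batch_size)` is assumed to run one batch and return the number
    of rows it changed (e.g. cursor.rowcount after UPDATE ... LIMIT n).
    """
    total = 0
    while True:
        affected = execute(batch_size)
        total += affected
        if affected < batch_size:  # last, partial batch -- nothing left to do
            break
    return total

# Fake executor standing in for the database: 2500 pending rows
pending = [2500]
def fake_execute(limit):
    done = min(limit, pending[0])
    pending[0] -= done
    return done

print(batched_update(fake_execute))  # 2500, applied in three small transactions
```

Each iteration is its own small transaction, so the replica's SQL thread never has to replay one giant event and lag stays bounded.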
**2. Expensive queries blocking the applier**
```sql
-- A complex query can starve the replica's SQL thread of resources
-- (`order` is a reserved word in MySQL, so quote it)
SELECT o.* FROM `order` o
JOIN user u ON o.user_id = u.id
WHERE o.amount > 10000
  AND u.create_time > '2023-01-01'
  AND o.status IN ('pending', 'processing')
ORDER BY o.create_time DESC
LIMIT 1000;
-- Fix: add supporting indexes
CREATE INDEX idx_order_user_status ON `order`(user_id, status, create_time);
CREATE INDEX idx_user_create_time ON user(create_time);
-- Or maintain a summary table (MySQL has no materialized views)
CREATE TABLE mv_order_user_status AS
SELECT o.*, u.name
FROM `order` o
JOIN user u ON o.user_id = u.id;
-- Refresh it periodically
DELIMITER //
CREATE EVENT refresh_mv_order_user_status
ON SCHEDULE EVERY 5 MINUTE
DO
BEGIN
  TRUNCATE TABLE mv_order_user_status;
  INSERT INTO mv_order_user_status
  SELECT o.*, u.name FROM `order` o JOIN user u ON o.user_id = u.id;
END //
DELIMITER ;
```
### Impact of the Binlog Format
**1. Statement-based replication (STATEMENT)**
```sql
-- The problem with statement-based replication
CREATE PROCEDURE update_order_amount(IN p_user_id INT, IN p_factor DECIMAL)
BEGIN
  DECLARE done INT DEFAULT FALSE;
  DECLARE v_order_id INT;
  DECLARE v_amount DECIMAL;
  DECLARE cur CURSOR FOR SELECT id, amount FROM `order` WHERE user_id = p_user_id;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
  OPEN cur;
  read_loop: LOOP
    FETCH cur INTO v_order_id, v_amount;
    IF done THEN
      LEAVE read_loop;
    END IF;
    UPDATE `order` SET amount = v_amount * p_factor WHERE id = v_order_id;
  END LOOP;
  CLOSE cur;
END;
-- Nondeterministic statements re-executed on the replica can produce
-- different results, so master and replica drift apart
CALL update_order_amount(1, 1.1);
```
**2. Row-based replication (ROW)**
```ini
# Row-based replication settings
[mysqld]
binlog_format = ROW
binlog_row_image = FULL
```
Pros: strong data consistency. Cons: a much larger binlog, which can slow replication.
**3. Mixed replication (MIXED)**
```ini
[mysqld]
binlog_format = MIXED
```
## 3. Monitoring Replication Lag
### Basic Commands
**1. Check replication status**
```sql
-- MySQL 8.0.22+
SHOW REPLICA STATUS\G
-- Key fields:
--   Seconds_Behind_Source: estimated lag in seconds
--   Replica_IO_Running:    whether the I/O thread is running
--   Replica_SQL_Running:   whether the SQL thread is running
-- MySQL 5.7 (and 8.0 before 8.0.22)
SHOW SLAVE STATUS\G
```
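When scripting around these commands, the vertical `\G` output has to be parsed. A small helper for that (illustrative; exact field names vary between MySQL versions):

```python
def parse_vertical_status(text):
    """Parse the vertical (\\G) output of SHOW REPLICA STATUS into a dict."""
    status = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip the "*************** 1. row ***************" header
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()
    return status

sample = """*************** 1. row ***************
             Replica_IO_Running: Yes
            Replica_SQL_Running: Yes
          Seconds_Behind_Source: 12"""
s = parse_vertical_status(sample)
print(s["Seconds_Behind_Source"])  # 12
```

This is the same parsing most shell monitors do with `grep`/`awk`, just less brittle about column alignment.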
**2. GTID-based lag monitoring**
```sql
-- Compare what the replica has received against what it has applied
SELECT RECEIVED_TRANSACTION_SET
FROM performance_schema.replication_connection_status
WHERE CHANNEL_NAME = '';

SELECT @@GLOBAL.gtid_executed;

-- GTIDs received from the source but not yet applied
SELECT GTID_SUBTRACT(
  (SELECT RECEIVED_TRANSACTION_SET
   FROM performance_schema.replication_connection_status
   WHERE CHANNEL_NAME = ''),
  @@GLOBAL.gtid_executed) AS not_yet_applied;
```
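Server-side, `GTID_SUBTRACT()` does this set arithmetic for you. As a client-side illustration of what it computes, the same idea can be sketched in Python (simplified GTID-set grammar; real sets are richer):

```python
def parse_gtid_set(s):
    """Parse a GTID set like 'uuid:1-5:7,uuid2:1-3' into {uuid: set_of_txn_numbers}."""
    result = {}
    for part in s.replace("\n", "").split(","):
        if not part:
            continue
        uuid, *ranges = part.split(":")
        txns = result.setdefault(uuid.strip(), set())
        for r in ranges:
            lo, _, hi = r.partition("-")
            txns.update(range(int(lo), int(hi or lo) + 1))
    return result

def gtid_backlog(received, executed):
    """Count transactions received from the source but not yet applied."""
    rec, exe = parse_gtid_set(received), parse_gtid_set(executed)
    return sum(len(txns - exe.get(uuid, set())) for uuid, txns in rec.items())

print(gtid_backlog("3e11fa47:1-100", "3e11fa47:1-90"))  # 10
```

A backlog measured in transactions complements `Seconds_Behind_Source`, which can read 0 while the I/O thread is stalled.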
### Lag-Monitoring Scripts
**1. Python monitor**
```python
#!/usr/bin/env python3
import time
from datetime import datetime

import pymysql

class MySQLReplicationMonitor:
    def __init__(self, host, user, password, port=3306):
        self.host = host
        self.user = user
        self.password = password
        self.port = port

    def get_replication_status(self):
        conn = None
        try:
            conn = pymysql.connect(host=self.host, user=self.user,
                                   password=self.password, port=self.port,
                                   cursorclass=pymysql.cursors.DictCursor)
            with conn.cursor() as cursor:
                # MySQL 8.0.22+; use "SHOW SLAVE STATUS" (and the
                # Slave_*/Seconds_Behind_Master fields) on older versions
                cursor.execute("SHOW REPLICA STATUS")
                row = cursor.fetchone()
                if row is None:
                    return None  # not configured as a replica
                return {
                    'delay': row.get('Seconds_Behind_Source') or 0,
                    'io_running': row.get('Replica_IO_Running') == 'Yes',
                    'sql_running': row.get('Replica_SQL_Running') == 'Yes',
                    'io_error': row.get('Last_IO_Error'),
                    'sql_error': row.get('Last_SQL_Error'),
                }
        except Exception as e:
            print(f"Error: {e}")
            return None
        finally:
            if conn:
                conn.close()

    def monitor(self, interval=60, threshold=300):
        while True:
            status = self.get_replication_status()
            if status:
                print(f"[{datetime.now()}] Delay: {status['delay']}s, "
                      f"IO: {status['io_running']}, SQL: {status['sql_running']}")
                if status['delay'] > threshold:
                    print(f"ALERT: replication delay exceeds threshold: {status['delay']}s")
                if not status['io_running']:
                    print("ERROR: IO thread stopped")
                if not status['sql_running']:
                    print("ERROR: SQL thread stopped")
                if status['io_error']:
                    print(f"IO Error: {status['io_error']}")
                if status['sql_error']:
                    print(f"SQL Error: {status['sql_error']}")
            time.sleep(interval)

# Usage example
if __name__ == "__main__":
    monitor = MySQLReplicationMonitor(
        host="192.168.1.200",
        user="monitor",
        password="password"
    )
    monitor.monitor(interval=30, threshold=60)
```
**2. Shell monitor**
```bash
#!/bin/bash
# replication_monitor.sh
MYSQL_HOST="192.168.1.200"
MYSQL_USER="monitor"
MYSQL_PASSWORD="password"
MYSQL_PORT="3306"
THRESHOLD=300
while true; do
    # MySQL 8.0.22+ reports Seconds_Behind_Source; older versions use Seconds_Behind_Master
    DELAY=$(mysql -h"$MYSQL_HOST" -u"$MYSQL_USER" -p"$MYSQL_PASSWORD" -P"$MYSQL_PORT" \
        -e "SHOW REPLICA STATUS\G" | grep -E "Seconds_Behind_(Source|Master)" | awk '{print $2}')
    # Treat empty or NULL (I/O thread down) as 0 so the numeric comparison below is safe
    case "$DELAY" in
        ''|NULL) DELAY=0 ;;
    esac
    TIMESTAMP=$(date "+%Y-%m-%d %H:%M:%S")
    echo "[$TIMESTAMP] Replication Delay: $DELAY seconds"
    if [ "$DELAY" -gt "$THRESHOLD" ]; then
        echo "ALERT: Replication delay exceeds threshold: $DELAY seconds"
        # Send an alert, e.g.:
        # curl -X POST -H "Content-Type: application/json" -d '{"text":"Replication delay: '$DELAY' seconds"}' https://your-webhook-url
    fi
    sleep 30
done
```
### Integrating with Monitoring Systems
**1. Prometheus + Grafana**
```yaml
# prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'mysql_replication'
    static_configs:
      - targets: ['192.168.1.100:9104', '192.168.1.200:9104']
```
**2. Exporter configuration**

With the commonly used `prometheus/mysqld_exporter`, replication metrics come from its `slave_status` collector, which runs `SHOW REPLICA STATUS` / `SHOW SLAVE STATUS` under the hood. Note that `Seconds_Behind_Master` is not a status variable, so it cannot be read from `performance_schema.global_status`.
```bash
# Run mysqld_exporter with the relevant collectors enabled
mysqld_exporter \
  --collect.slave_status \
  --collect.global_status
# Exposed metrics include (names may vary by exporter version):
#   mysql_slave_status_seconds_behind_master
#   mysql_slave_status_slave_io_running
#   mysql_slave_status_slave_sql_running
```
**3. Grafana dashboard**
```json
{
  "dashboard": {
    "title": "MySQL Replication Monitor",
    "panels": [
      {
        "title": "Replication Delay",
        "type": "graph",
        "targets": [
          {
            "expr": "mysql_slave_status_seconds_behind_master",
            "legendFormat": "{{instance}}"
          }
        ]
      },
      {
        "title": "IO Thread Status",
        "type": "singlestat",
        "targets": [
          {
            "expr": "mysql_slave_status_slave_io_running",
            "legendFormat": "{{instance}}"
          }
        ]
      },
      {
        "title": "SQL Thread Status",
        "type": "singlestat",
        "targets": [
          {
            "expr": "mysql_slave_status_slave_sql_running",
            "legendFormat": "{{instance}}"
          }
        ]
      }
    ]
  }
}
```
### Lag Alerting
**1. Prometheus alerting rules**
```yaml
# mysql_replication_rules.yml (a Prometheus rule file, routed through Alertmanager)
groups:
  - name: mysql_replication
    rules:
      - alert: MySQLReplicationLag
        expr: mysql_slave_status_seconds_behind_master > 300
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MySQL replication lag is {{ $value }} seconds"
          description: "Replication delay exceeds 5 minutes"
      - alert: MySQLReplicationStopped
        expr: mysql_slave_status_slave_io_running == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "MySQL replication IO thread stopped"
          description: "IO thread is not running"
      - alert: MySQLSQLThreadStopped
        expr: mysql_slave_status_slave_sql_running == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "MySQL replication SQL thread stopped"
          description: "SQL thread is not running"
```
**2. Enterprise-grade alerting**
```python
# Enterprise monitoring service with pluggable alert channels
from datetime import datetime

class EnterpriseReplicationMonitor:
    def __init__(self, config):
        self.config = config
        self.alert_channels = []

    def add_alert_channel(self, channel):
        self.alert_channels.append(channel)

    def check_replication_health(self):
        status = self.get_replication_status()
        alerts = []
        if status['delay'] > self.config['threshold']:
            alerts.append({
                'level': 'warning',
                'message': f"Replication delay: {status['delay']}s",
                'timestamp': datetime.now()
            })
        if not status['io_running']:
            alerts.append({
                'level': 'critical',
                'message': "IO thread stopped",
                'timestamp': datetime.now()
            })
        if not status['sql_running']:
            alerts.append({
                'level': 'critical',
                'message': "SQL thread stopped",
                'timestamp': datetime.now()
            })
        # Fan the alerts out to every registered channel
        for alert in alerts:
            self.send_alert(alert)

    def send_alert(self, alert):
        for channel in self.alert_channels:
            channel.send(alert)

# Email alerts
class EmailAlertChannel:
    def send(self, alert):
        pass  # send an email

# DingTalk alerts
class DingTalkAlertChannel:
    def send(self, alert):
        pass  # post a DingTalk message

# WeChat Work alerts
class WeChatAlertChannel:
    def send(self, alert):
        pass  # post a WeChat Work message
```
## 4. Fixing Replication Lag
### Read/Write Splitting
**1. Basic read/write splitting**
```java
// Read/write splitting in Java
public class DataSourceRouter {
    private final DataSource masterDataSource;
    private final List<DataSource> slaveDataSources;
    private final AtomicInteger counter = new AtomicInteger(0);

    public DataSource getDataSource(boolean isWrite) {
        if (isWrite) {
            return masterDataSource;
        }
        // Round-robin across the replicas for reads
        int index = counter.getAndIncrement() % slaveDataSources.size();
        return slaveDataSources.get(index);
    }
}

// Marker annotation: route a method's queries to a replica
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface ReadOnly {
}

@Service
public class UserService {
    @ReadOnly
    public List<User> getUsers() {
        // served from a replica
    }

    public void createUser(User user) {
        // written to the master
    }
}
```
**2. Dynamic datasource routing**
```java
@Configuration
public class DynamicDataSourceConfig {
    @Bean
    @ConfigurationProperties("spring.datasource.master")
    public DataSource masterDataSource() {
        return DataSourceBuilder.create().build();
    }

    @Bean
    @ConfigurationProperties("spring.datasource.slave1")
    public DataSource slaveDataSource1() {
        return DataSourceBuilder.create().build();
    }

    @Bean
    @ConfigurationProperties("spring.datasource.slave2")
    public DataSource slaveDataSource2() {
        return DataSourceBuilder.create().build();
    }

    @Bean
    public DataSource dynamicDataSource() {
        Map<Object, Object> targetDataSources = new HashMap<>();
        targetDataSources.put("master", masterDataSource());
        targetDataSources.put("slave1", slaveDataSource1());
        targetDataSources.put("slave2", slaveDataSource2());
        DynamicDataSource dynamicDataSource = new DynamicDataSource();
        dynamicDataSource.setDefaultTargetDataSource(masterDataSource());
        dynamicDataSource.setTargetDataSources(targetDataSources);
        return dynamicDataSource;
    }
}

public class DynamicDataSource extends AbstractRoutingDataSource {
    private static final ThreadLocal<String> dataSourceKey = new ThreadLocal<>();

    public static void setDataSourceKey(String key) {
        dataSourceKey.set(key);
    }

    public static String getDataSourceKey() {
        return dataSourceKey.get();
    }

    public static void clearDataSourceKey() {
        dataSourceKey.remove(); // avoid ThreadLocal leaks on pooled threads
    }

    @Override
    protected Object determineCurrentLookupKey() {
        return getDataSourceKey();
    }
}
```
**3. Read/write splitting via middleware**
```xml
<!-- MyCat schema.xml -->
<schema name="business_db" sqlMaxLimit="100">
    <table name="user" dataNode="dn1,dn2,dn3" rule="sharding-by-mod"/>
    <table name="order" dataNode="dn1,dn2,dn3" rule="sharding-by-mod"/>
</schema>
<dataNode name="dn1" dataHost="host1" database="business_db_1"/>
<dataNode name="dn2" dataHost="host1" database="business_db_2"/>
<dataNode name="dn3" dataHost="host2" database="business_db_3"/>
<dataHost name="host1" maxCon="1000" minCon="10" balance="1" writeType="0" dbType="mysql">
    <heartbeat>select user()</heartbeat>
    <writeHost host="master" url="192.168.1.100:3306" user="root" password="password">
        <readHost host="slave" url="192.168.1.101:3306" user="root" password="password"/>
    </writeHost>
</dataHost>
<dataHost name="host2" maxCon="1000" minCon="10" balance="1" writeType="0" dbType="mysql">
    <heartbeat>select user()</heartbeat>
    <writeHost host="master" url="192.168.1.102:3306" user="root" password="password">
        <readHost host="slave" url="192.168.1.103:3306" user="root" password="password"/>
    </writeHost>
</dataHost>
```
### Parallel Replication
**1. Parallel replication in MySQL 5.7+**
```sql
-- On the replica
SET GLOBAL slave_parallel_workers = 4;
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL slave_pending_jobs_size_max = 1073741824; -- bytes (1 GiB); suffixes like "1024M" are not valid in SET
-- Inspect parallel replication settings and worker threads
SHOW VARIABLES LIKE '%parallel%';
SELECT * FROM performance_schema.replication_applier_status_by_worker;
```
**2. Per-database parallel replication**
```ini
# my.cnf
[mysqld]
# MySQL 5.6+ supports database-level parallel apply
slave_parallel_workers = 8
slave_parallel_type = DATABASE
replicate_wild_ignore_table = mysql.%
replicate_wild_ignore_table = test.%
```
**3. Group-commit-based parallel replication**
```ini
# Master
[mysqld]
# Delay group commit slightly so more transactions batch together
binlog_group_commit_sync_delay = 1000        # microseconds
binlog_group_commit_sync_no_delay_count = 10
# Durable binlog settings
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1

# Replica
[mysqld]
# Parallel apply
slave_parallel_workers = 16
slave_parallel_type = LOGICAL_CLOCK
slave_preserve_commit_order = 1
```
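LOGICAL_CLOCK works because the binlog stamps every transaction with a `(last_committed, sequence_number)` pair; transactions that share a `last_committed` value were part of the same group commit on the source and are safe to apply concurrently. A simplified grouping sketch:

```python
from itertools import groupby

def parallel_groups(events):
    """Group transactions that may be applied in parallel.

    events: list of (last_committed, sequence_number) pairs, in binlog order.
    Simplification: transactions with the same last_committed value committed
    together on the source, so the replica's workers may run them concurrently.
    """
    return [
        [seq for _, seq in group]
        for _, group in groupby(events, key=lambda e: e[0])
    ]

# last_committed=0 for txns 1-3 (one group commit), =3 for txns 4-5
events = [(0, 1), (0, 2), (0, 3), (3, 4), (3, 5)]
print(parallel_groups(events))  # [[1, 2, 3], [4, 5]]
```

This is why raising `binlog_group_commit_sync_delay` on the master can help the replica: larger commit groups mean wider parallel batches for the workers.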
### Semi-Synchronous Replication
**1. Enabling semi-sync**
```sql
-- On the master: install and enable the plugin
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
-- On the replica: install and enable the plugin
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
-- Check semi-sync status
SHOW STATUS LIKE 'Rpl_semi_sync%';
```
**2. Semi-sync timeouts**
```sql
-- How long the master waits for a replica ACK before falling back to async
SET GLOBAL rpl_semi_sync_master_timeout = 1000; -- milliseconds
-- How many replica ACKs the master requires per commit
SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = 1;
-- Whether to keep waiting when the connected replica count drops below that number
SET GLOBAL rpl_semi_sync_master_wait_no_slave = 1;
```
**3. Monitoring semi-sync**
```java
// Semi-sync monitoring component
@Component
public class SemiSyncMonitor {
    @Scheduled(fixedRate = 5000)
    public void monitorSemiSync() {
        // Check semi-sync status on both ends
        boolean isMasterSemiSync = checkMasterSemiSyncStatus();
        boolean isSlaveSemiSync = checkSlaveSemiSyncStatus();
        // Replication delay
        long delay = getReplicationDelay();
        // Time spent waiting for replica ACKs
        long waitTime = getSemiSyncWaitTime();
        // Alerting
        if (!isMasterSemiSync) {
            alert("Master semi-sync disabled");
        }
        if (delay > 60) {
            alert("Replication delay too high: " + delay + "s");
        }
        if (waitTime > 1000) {
            alert("Semi-sync wait time too long: " + waitTime + "ms");
        }
    }
}
```
### Tuning the Master
**1. Master configuration**
```ini
# my.cnf on the master
[mysqld]
# Buffer pool
innodb_buffer_pool_size = 16G
innodb_buffer_pool_instances = 8
# Redo log
innodb_log_file_size = 4G
innodb_log_buffer_size = 64M
innodb_flush_log_at_trx_commit = 1
# I/O
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000
innodb_read_io_threads = 16
innodb_write_io_threads = 16
# Binary log
binlog_format = ROW
sync_binlog = 1
binlog_cache_size = 32M
binlog_stmt_cache_size = 32M
expire_logs_days = 7
max_binlog_size = 1G
# Connections
max_connections = 1000
thread_cache_size = 100
```
**2. SQL tuning on the master**
```sql
-- Avoid full table scans (`order` is a reserved word, so quote it)
CREATE INDEX idx_user_id_status ON `order`(user_id, status);
CREATE INDEX idx_order_time ON `order`(create_time);
-- Update big tables in batches
UPDATE `order` SET status = 'completed'
WHERE status = 'pending'
  AND create_time < NOW() - INTERVAL 1 DAY
LIMIT 1000;
-- Use a temporary table for large operations
CREATE TEMPORARY TABLE temp_order_update AS
SELECT id, user_id FROM `order`
WHERE status = 'pending'
  AND create_time < NOW() - INTERVAL 1 DAY
LIMIT 1000;
UPDATE temp_order_update t
JOIN `order` o ON t.id = o.id
SET o.status = 'completed';
-- Refresh statistics and reclaim space periodically
ANALYZE TABLE `order`, user;
OPTIMIZE TABLE `order`;
```
### Tuning the Replica
**1. Replica configuration**
```ini
# my.cnf on the replica
[mysqld]
# Read-only mode
read_only = 1
super_read_only = 1
# Buffer pool (often larger than the master's, since replicas serve reads)
innodb_buffer_pool_size = 24G
innodb_buffer_pool_instances = 8
# Read-heavy I/O
innodb_read_io_threads = 32
innodb_write_io_threads = 16
# Relay log
relay_log_space_limit = 8G
relay_log_purge = 1
# Parallel apply
slave_parallel_workers = 16
slave_parallel_type = LOGICAL_CLOCK
slave_pending_jobs_size_max = 2G
# Note: the query cache was removed in MySQL 8.0; do not set query_cache_size there
```
**2. SQL tuning on the replica**
```sql
-- Replica-only indexes for reporting queries
CREATE INDEX idx_query_user ON user(create_time);
CREATE INDEX idx_report_order ON `order`(create_time, amount);
-- Prefer covering indexes for hot queries
SELECT id, name, email
FROM user
WHERE create_time > '2023-01-01'
  AND status = 'active';
-- Maintain a summary table (MySQL has no materialized views)
CREATE TABLE mv_user_active AS
SELECT id, name, email, create_time
FROM user
WHERE status = 'active'
  AND create_time > '2023-01-01';
-- Refresh it periodically
DELIMITER //
CREATE EVENT refresh_mv_user_active
ON SCHEDULE EVERY 5 MINUTE
DO
BEGIN
  TRUNCATE TABLE mv_user_active;
  INSERT INTO mv_user_active
  SELECT id, name, email, create_time
  FROM user
  WHERE status = 'active' AND create_time > '2023-01-01';
END //
DELIMITER ;
```
### Failover
**1. Automated master-replica failover**
```java
// Failover service
@Service
public class MasterSlaveFailoverService {
    @Autowired
    private DataSource masterDataSource;
    @Autowired
    private List<DataSource> slaveDataSources;
    @Autowired
    private NotificationService notificationService;

    public void failover() {
        // 1. Detect a master failure
        if (!checkMasterHealth()) {
            // 2. Pick a new master
            DataSource newMaster = selectNewMaster();
            // 3. Promote it
            executeFailover(newMaster);
            // 4. Point the application at it
            notifyApplication(newMaster);
            // 5. Raise an alert
            notificationService.sendAlert("Master-Slave failover completed");
        }
    }

    private boolean checkMasterHealth() {
        try (Connection conn = masterDataSource.getConnection()) {
            return conn.isValid(5);
        } catch (Exception e) {
            return false;
        }
    }

    private DataSource selectNewMaster() {
        // Choose the healthiest replica
        return slaveDataSources.stream()
                .filter(this::checkSlaveHealth)
                .findFirst()
                .orElseThrow(() -> new RuntimeException("No healthy slave available"));
    }

    private void executeFailover(DataSource newMaster) {
        // 1. Stop replication on the chosen replica
        stopReplication(newMaster);
        // 2. Reconfigure it as the master
        reconfigureAsMaster(newMaster);
        // 3. Update the application's datasource config
        updateDataSourceConfig(newMaster);
    }
}
```
**2. Failover script**
```bash
#!/bin/bash
# master_failover.sh
MASTER_HOST="192.168.1.100"
SLAVE_HOSTS=("192.168.1.101" "192.168.1.102" "192.168.1.103")
NEW_MASTER="192.168.1.101"
# 1. Check the master
if ! mysql -h"$MASTER_HOST" -uadmin -padmin -e "SELECT 1" >/dev/null 2>&1; then
    echo "Master database is down"
    # 2. Pick a new master from the healthy replicas
    for slave in "${SLAVE_HOSTS[@]}"; do
        if mysql -h"$slave" -uadmin -padmin -e "SHOW REPLICA STATUS\G" | grep -q "Replica_SQL_Running: Yes"; then
            NEW_MASTER=$slave
            break
        fi
    done
    echo "New master selected: $NEW_MASTER"
    # 3. Stop replication on it
    mysql -h"$NEW_MASTER" -uadmin -padmin -e "STOP REPLICA"
    # 4. Promote it: clear its replica configuration entirely
    mysql -h"$NEW_MASTER" -uadmin -padmin -e "RESET REPLICA ALL"
    # 5. Tell the application about the new master
    curl -X POST -H "Content-Type: application/json" \
        -d '{"master_host": "'$NEW_MASTER'"}' \
        http://localhost:8080/api/change-master
    echo "Failover completed"
else
    echo "Master is healthy, no failover needed"
fi
```
## 5. Real-World Solutions
### Large E-Commerce Platform
**Scenario**
- 1M+ orders per day
- MySQL 8.0
- 1 master, 16 replicas
**Approach**
1. **Sharding + replication**
```java
// Sharding configuration
@Configuration
public class EcommerceShardingConfig {
    @Bean
    public DataSource shardingDataSource() {
        // 8 shards, each with 1 master and 3 replicas
        Map<String, DataSource> dataSourceMap = new HashMap<>();
        for (int i = 0; i < 8; i++) {
            // Master
            HikariDataSource master = createDataSource("192.168.1." + (100 + i), 3306, "master");
            dataSourceMap.put("master_" + i, master);
            // Replicas
            for (int j = 1; j <= 3; j++) {
                HikariDataSource slave = createDataSource("192.168.1." + (200 + i * 3 + j), 3306, "slave");
                dataSourceMap.put("slave_" + i + "_" + j, slave);
            }
        }
        ShardingRuleConfiguration ruleConfig = new ShardingRuleConfiguration();
        // Shard the order table
        TableRuleConfiguration orderRule = new TableRuleConfiguration("order",
                "order_ds_$->{0..7}.order_$->{order_id % 8}");
        ruleConfig.getTableRuleConfigs().add(orderRule);
        return ShardingDataSourceFactory.createDataSource(dataSourceMap, ruleConfig);
    }
}
```
2. **Multi-level caching**
```java
// Multi-level caching strategy
@Service
public class OrderService {
    @Autowired
    private OrderRepository orderRepository;

    @Cacheable(value = "order", key = "#orderId")
    public OrderDTO getOrder(Long orderId) {
        // Cache miss: load from the database; @Cacheable stores the result
        Order order = orderRepository.findById(orderId);
        return convertToDTO(order);
    }

    @Cacheable(value = "order_list", key = "#userId + '_' + #page")
    public List<OrderDTO> getUserOrders(Long userId, int page) {
        // Served from a replica; the list is cached by @Cacheable
        List<Order> orders = orderRepository.findByUserId(userId, page);
        return convertToDTOList(orders);
    }
}
```
### Social Media Platform
**Scenario**
- 50M+ daily active users
- TB-scale data volume
- Read-heavy, write-light workload
**Approach**
1. **Lag-aware read/write routing**
```java
// Lag-aware read/write routing
public class SmartDataSourceRouter {
    private final DataSource masterDataSource;
    private final List<DataSource> slaveDataSources;
    private final LoadBalancer loadBalancer;

    public DataSource getDataSource() {
        // Fall back to the master if no replica is healthy
        List<DataSource> healthySlaves = getHealthySlaves();
        if (healthySlaves.isEmpty()) {
            return masterDataSource;
        }
        // Otherwise pick the replica with the smallest lag
        return selectBestSlave(healthySlaves);
    }

    private DataSource selectBestSlave(List<DataSource> slaves) {
        return slaves.stream()
                .min(Comparator.comparingDouble(this::getSlaveDelay))
                .orElse(slaves.get(0));
    }

    private double getSlaveDelay(DataSource dataSource) {
        // Read the replica's lag; treat errors as "infinitely far behind"
        try (Connection conn = dataSource.getConnection();
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW REPLICA STATUS")) {
            if (rs.next()) {
                return rs.getLong("Seconds_Behind_Source");
            }
            return Double.MAX_VALUE;
        } catch (Exception e) {
            return Double.MAX_VALUE;
        }
    }
}
```
2. **Cache warming**
```java
// Cache warming service
@Component
public class DataWarmupService {
    @Autowired
    private UserService userService;
    @Autowired
    private PostService postService;

    @Scheduled(cron = "0 0 1 * * ?") // 1 a.m. daily
    public void warmupData() {
        // Warm hot users
        List<User> hotUsers = userService.getHotUsers();
        hotUsers.forEach(user -> userService.getUserCache(user.getId()));
        // Warm hot posts
        List<Post> hotPosts = postService.getHotPosts();
        hotPosts.forEach(post -> postService.getPostCache(post.getId()));
    }
}
```
### Financial System
**Scenario**
- Strict data-consistency requirements
- No data loss tolerated
- Low-latency requirements
**Approach**
1. **Semi-sync replication + transaction synchronization**
```java
// Financial transaction service
@Service
public class FinancialTransactionService {
    @Autowired
    private DataSource masterDataSource;
    @Autowired
    private DataSource auditDataSource; // a separate audit store; never write business data directly to a replica, as that breaks replication

    @Transactional
    public void transferMoney(String fromAccount, String toAccount, BigDecimal amount) {
        try {
            // 1. Perform the transfer on the master
            transfer(fromAccount, toAccount, amount);
            // 2. Record it in the audit store
            writeAuditRecord(fromAccount, toAccount, amount);
            // 3. Log the transaction
            logTransaction(fromAccount, toAccount, amount);
        } catch (Exception e) {
            // Roll the transaction back
            throw new FinancialException("Transfer failed", e);
        }
    }

    private void writeAuditRecord(String fromAccount, String toAccount, BigDecimal amount) {
        try (Connection conn = auditDataSource.getConnection()) {
            String sql = "INSERT INTO transaction_log (from_account, to_account, amount, status) VALUES (?, ?, ?, 'SUCCESS')";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, fromAccount);
                ps.setString(2, toAccount);
                ps.setBigDecimal(3, amount);
                ps.executeUpdate();
            }
        } catch (Exception e) {
            throw new RuntimeException("Audit write failed", e);
        }
    }
}
```
2. **Data verification**
```java
// Master/replica consistency checking
@Service
public class DataConsistencyService {
    @Autowired
    private NotificationService notificationService;

    @Scheduled(fixedRate = 300000) // every 5 minutes
    public void checkConsistency() {
        // 1. Find inconsistencies between master and replicas
        List<Inconsistency> inconsistencies = findInconsistencies();
        // 2. Repair them
        for (Inconsistency issue : inconsistencies) {
            fixInconsistency(issue);
        }
        // 3. Alert if anything was found
        if (!inconsistencies.isEmpty()) {
            notificationService.sendAlert("Data consistency issues found: " + inconsistencies.size());
        }
    }

    private List<Inconsistency> findInconsistencies() {
        List<Inconsistency> result = new ArrayList<>();
        // Compare row counts on master and replica (a coarse check;
        // tools like pt-table-checksum do this properly)
        String masterQuery = "SELECT COUNT(*) FROM account";
        String slaveQuery = "SELECT COUNT(*) FROM account";
        if (!compareQueryResults(masterQuery, slaveQuery)) {
            result.add(new Inconsistency("account", "COUNT_MISMATCH"));
        }
        return result;
    }

    private boolean compareQueryResults(String masterQuery, String slaveQuery) {
        // Run each query against its datasource and compare the results
        return true; // implementation omitted
    }
}
```
### Summary
Tackling MySQL replication lag takes work at several layers:
1. **Hardware**: faster disks, network, and CPU
2. **Configuration**: sensible MySQL parameter tuning
3. **Architecture**: a well-designed read/write-splitting strategy
4. **Workload**: optimized queries and no oversized transactions
5. **Monitoring**: a complete monitoring and alerting pipeline
6. **Operations**: automated failover and recovery
In practice, pick the combination that fits the workload, and keep iterating on it.