# Database Sharding Interview Guide

## 1. Vertical vs. Horizontal Database Sharding

### Vertical Sharding

**Definition**: tables are split across different databases by business module.

**Characteristics**:

- Each database holds the tables of a different business domain
- Does *not* shrink any individual table — an oversized table still needs a horizontal split
- Simplifies data governance and permission control
- Reduces connection pressure on any single database

**Diagram**:

```
  Single database           After vertical split
  ┌────────────┐     ┌─────────┐    ┌──────────┐
  │ user       │     │ user DB │    │ order DB │
  │ order      │  →  └─────────┘    └──────────┘
  │ product    │     ┌────────────┐ ┌────────────┐
  │ payment    │     │ product DB │ │ payment DB │
  └────────────┘     └────────────┘ └────────────┘
```

**Code example** (ShardingSphere-JDBC 4.x style; API details vary by version):

```java
// Vertical sharding: each business table lives in exactly one data source.
@Configuration
public class VerticalShardingConfig {

    @Bean
    @ConfigurationProperties("spring.datasource.user")
    public DataSource userDataSource() {
        return DataSourceBuilder.create().build();
    }

    @Bean
    @ConfigurationProperties("spring.datasource.order")
    public DataSource orderDataSource() {
        return DataSourceBuilder.create().build();
    }

    @Bean
    public DataSource shardingDataSource(
            @Qualifier("userDataSource") DataSource userDataSource,
            @Qualifier("orderDataSource") DataSource orderDataSource) throws SQLException {
        Map<String, DataSource> dataSourceMap = new HashMap<>();
        dataSourceMap.put("user_ds", userDataSource);
        dataSourceMap.put("order_ds", orderDataSource);

        ShardingRuleConfiguration shardingRuleConfig = new ShardingRuleConfiguration();
        // Route each logical table to the single database that owns it.
        // ("order" is a reserved word in MySQL, hence t_order.)
        shardingRuleConfig.getTableRuleConfigs()
            .add(new TableRuleConfiguration("user", "user_ds.user"));
        shardingRuleConfig.getTableRuleConfigs()
            .add(new TableRuleConfiguration("t_order", "order_ds.t_order"));

        return ShardingDataSourceFactory.createDataSource(
            dataSourceMap, shardingRuleConfig, new Properties());
    }
}
```

### Horizontal Sharding

**Definition**: rows of the *same* table are distributed across different databases according to some rule.

**Characteristics**:

- Every database has the same table structure
- Solves the oversized-single-table problem
- Improves query throughput and concurrency
- Removes the single-machine storage ceiling

**Diagram**:

```
  Single database           After horizontal split (user table)
  ┌───────────────┐     ┌────────┐ ┌────────┐
  │ user     1M   │     │ user_0 │ │ user_1 │
  │ order    5M   │  →  └────────┘ └────────┘
  │ product  2M   │     ┌────────┐ ┌────────┐
  └───────────────┘     │ user_2 │ │ user_3 │
                        └────────┘ └────────┘
```

**Code example**:

```java
// Horizontal sharding: user rows are spread over 4 databases by user_id % 4.
@Configuration
public class HorizontalShardingConfig {

    @Bean
    public DataSource horizontalShardingDataSource() throws SQLException {
        Map<String, DataSource> dataSourceMap = new HashMap<>();
        // Create 4 shard databases.
        for (int i = 0; i < 4; i++) {
            HikariDataSource dataSource = new HikariDataSource();
            dataSource.setJdbcUrl(String.format("jdbc:mysql://127.0.0.1:3306/user_%d", i));
            dataSource.setUsername("root");
            dataSource.setPassword("password");
            dataSourceMap.put(String.format("user_ds_%d", i), dataSource);
        }

        ShardingRuleConfiguration shardingRuleConfig = new ShardingRuleConfiguration();
        // Logical table "user" maps to user_ds_0..3.user_0..3.
        TableRuleConfiguration tableRule = new TableRuleConfiguration(
            "user", "user_ds_$->{0..3}.user_$->{user_id % 4}");
        shardingRuleConfig.getTableRuleConfigs().add(tableRule);

        // Database-level sharding strategy on user_id.
        shardingRuleConfig.setDefaultDatabaseShardingStrategyConfig(
            new StandardShardingStrategyConfiguration("user_id", new CustomModShardingAlgorithm()));

        return ShardingDataSourceFactory.createDataSource(
            dataSourceMap, shardingRuleConfig, new Properties());
    }

    public static class CustomModShardingAlgorithm implements PreciseShardingAlgorithm<Long> {
        @Override
        public String doSharding(Collection<String> availableTargetNames,
                                 PreciseShardingValue<Long> shardingValue) {
            // floorMod keeps the index in [0, 4) even for negative keys
            long index = Math.floorMod(shardingValue.getValue(), 4L);
            return availableTargetNames.stream()
                .filter(target -> target.endsWith("_" + index))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("no database available"));
        }
    }
}
```
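Stripped of framework details, horizontal routing boils down to mapping a sharding-key value to a data-source name. A minimal, dependency-free sketch (the `user_ds_*` names are illustrative, mirroring the layout above):

```java
import java.util.List;

// Minimal horizontal-routing sketch: map a user_id to one of N databases via modulo.
public class ModRouter {

    private final List<String> dataSources;

    public ModRouter(List<String> dataSources) {
        this.dataSources = dataSources;
    }

    // user_id % shardCount picks the database; the table suffix uses the same value,
    // so a given row always lands in user_ds_i.user_i for i = user_id % 4.
    public String route(long userId) {
        int index = (int) Math.floorMod(userId, dataSources.size());
        return dataSources.get(index);
    }

    public static void main(String[] args) {
        ModRouter router = new ModRouter(
            List.of("user_ds_0", "user_ds_1", "user_ds_2", "user_ds_3"));
        System.out.println(router.route(10086L)); // 10086 % 4 = 2 → user_ds_2
    }
}
```

`Math.floorMod` rather than `%` matters once keys can be negative (for example hashed composite keys): `%` would produce a negative index.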
## 2. Sharding Strategies

### Range Sharding

**Characteristics**:

- Shards by ID or time range
- Efficient lookups; range queries are easy
- Uneven data distribution — the newest range becomes a hotspot

**Example** (shown with MySQL table partitioning, which applies the same idea within a single instance):

```sql
-- Shard the user table by ID range
CREATE TABLE user (
    id BIGINT,
    name VARCHAR(50),
    age INT
)
PARTITION BY RANGE (id) (
    PARTITION p0 VALUES LESS THAN (1000000),
    PARTITION p1 VALUES LESS THAN (2000000),
    PARTITION p2 VALUES LESS THAN (3000000),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);
```

### Hash Sharding

**Characteristics**:

- Even data distribution
- Range queries must fan out to every shard
- Scaling out is hard: changing the shard count moves most rows

**Example**:

```java
// Modulo-hash sharding algorithm (ShardingSphere PreciseShardingAlgorithm)
public class HashShardingAlgorithm implements PreciseShardingAlgorithm<Long> {

    private final int shardingCount;

    public HashShardingAlgorithm(int shardingCount) {
        this.shardingCount = shardingCount;
    }

    @Override
    public String doSharding(Collection<String> availableTargetNames,
                             PreciseShardingValue<Long> shardingValue) {
        // floorMod keeps the result in [0, shardingCount) even for negative keys
        long hash = Math.floorMod(shardingValue.getValue(), (long) shardingCount);
        return availableTargetNames.stream()
            .filter(name -> name.endsWith("_" + hash))
            .findFirst()
            .orElseThrow(() -> new IllegalArgumentException("no database available"));
    }
}
```

### Consistent-Hash Sharding

**Characteristics**:

- Small data movement when scaling
- Adding or removing a node only remaps the keys between it and its neighbor on the ring
- More complex to implement

**Diagram**:

```
          Consistent-hash ring

      node1 (1000) ──► node2 (3000)
           ▲                │
           │                ▼
      node4 (7000) ◄── node3 (5000)

  A key hashes to a point on the ring and is served by the first
  node found walking clockwise from that point:
  e.g. hash 1500 → node2, hash 7500 → node1 (wrapping around).
```

**Code example**:

```java
// Consistent hashing with virtual nodes
public class ConsistentHashSharding {

    private final SortedMap<Integer, String> circle = new TreeMap<>();
    private final int virtualNodeCount;

    public ConsistentHashSharding(List<String> nodes, int virtualNodeCount) {
        this.virtualNodeCount = virtualNodeCount;
        for (String node : nodes) {
            addNode(node);
        }
    }

    private void addNode(String node) {
        // Each physical node is placed on the ring virtualNodeCount times
        // to smooth out the key distribution.
        for (int i = 0; i < virtualNodeCount; i++) {
            String virtualNode = node + "#" + i;
            circle.put(hash(virtualNode), virtualNode);
        }
    }

    public String getNode(String key) {
        if (circle.isEmpty()) {
            return null;
        }
        int hash = hash(key);
        // First virtual node clockwise from the key's position …
        SortedMap<Integer, String> tailMap = circle.tailMap(hash);
        String virtualNode = tailMap.isEmpty()
            ? circle.get(circle.firstKey())   // … wrapping around the ring
            : tailMap.get(tailMap.firstKey());
        // Strip the "#i" suffix to recover the physical node name.
        return virtualNode.substring(0, virtualNode.indexOf('#'));
    }

    private int hash(String key) {
        // FNV-1a with extra avalanche mixing
        final int p = 16777619;
        int hash = (int) 2166136261L;
        for (int i = 0; i < key.length(); i++) {
            hash = (hash ^ key.charAt(i)) * p;
        }
        hash += hash << 13;
        hash ^= hash >>> 7;
        hash += hash << 3;
        hash ^= hash >>> 17;
        hash += hash << 5;
        return hash & 0x7fffffff; // force non-negative (Math.abs(MIN_VALUE) would overflow)
    }
}
```

### Geo (GeoHash) Sharding

**Characteristics**:

- Shards by geographic location
- Suits businesses with a strong location dimension
- Location-scoped queries stay on one shard

**Example**:

```java
// GeoHash sharding sketch — assumes a geohash library exposing encode(lat, lng, precision)
public class GeoHashSharding {

    private final Geohash geohash = new Geohash();
    private final Map<String, String> geoToShard = new HashMap<>();

    public String getShardByLocation(double lat, double lng) {
        String geoCode = geohash.encode(lat, lng, 8);
        // The first two characters cover a coarse region and select the shard.
        return geoToShard.get(geoCode.substring(0, 2));
    }
}
```

## 3. Problems Introduced by Sharding

### Cross-Database JOIN

**Problem**: JOINs cannot be executed directly across databases.

**Solutions**:

1. **Application-level JOIN**

```java
@Service
public class OrderService {

    public OrderDTO getOrderWithUser(Long orderId) {
        // 1. Fetch the order (order shard)
        Order order = orderMapper.selectById(orderId);
        // 2. Fetch the user (user shard)
        User user = userMapper.selectById(order.getUserId());
        // 3. Assemble the result in the application
        OrderDTO dto = new OrderDTO();
        BeanUtils.copyProperties(order, dto);
        dto.setUser(user);
        return dto;
    }
}
```

2. **Automatic routing by middleware**

```java
// Illustrative rule configuration: the middleware routes order_detail to the
// same shard as its parent order, so the JOIN never leaves one database.
@Configuration
public class OrderShardingConfig {

    @Bean
    public TableRule orderDetailRule() {
        TableRule rule = new TableRule();
        rule.setLogicTable("order_detail");
        rule.setActualDataNodes("order_ds_$->{0..3}.order_detail_$->{order_id % 4}");
        return rule;
    }
}
```
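The `order_ds_$->{0..3}` notation above is a Groovy-flavored inline expression that the middleware expands into concrete data nodes. A toy expander for the single-range case (real implementations also handle enumerations, nesting, and arithmetic):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Expands one "prefix$->{a..b}suffix" range expression into concrete names,
// e.g. "order_ds_$->{0..3}" → [order_ds_0, order_ds_1, order_ds_2, order_ds_3].
public class InlineExpression {

    private static final Pattern RANGE = Pattern.compile("\\$->\\{(\\d+)\\.\\.(\\d+)}");

    public static List<String> expand(String expression) {
        Matcher m = RANGE.matcher(expression);
        if (!m.find()) {
            return List.of(expression); // no range → literal name
        }
        int from = Integer.parseInt(m.group(1));
        int to = Integer.parseInt(m.group(2));
        List<String> result = new ArrayList<>();
        for (int i = from; i <= to; i++) {
            result.add(expression.substring(0, m.start()) + i + expression.substring(m.end()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(expand("order_ds_$->{0..3}"));
    }
}
```
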
3. **ER sharding (binding tables)**

```sql
-- ER sharding: shard parent and child by the same key (user_id) so that a
-- user's orders always live in the same shard as the user, and
-- user ⋈ order JOINs never cross databases.
-- ("order" is a reserved word in MySQL, hence t_order.)
CREATE TABLE user (
    id   BIGINT AUTO_INCREMENT,
    name VARCHAR(50),
    PRIMARY KEY (id)
);

CREATE TABLE t_order (
    id      BIGINT AUTO_INCREMENT,
    user_id BIGINT,
    amount  DECIMAL(10,2),
    PRIMARY KEY (id)
);
-- Both tables are sharded by user_id at the middleware level. Co-location is
-- enforced by the sharding rule, not by a FOREIGN KEY (MySQL partitioned
-- tables cannot carry foreign-key constraints anyway).
```

### Distributed Transactions

**Problem**: keeping a transaction consistent across multiple databases.

**Solutions**:

1. **TCC (Try-Confirm-Cancel)**

```java
@Service
public class OrderService {

    // Try phase: reserve every resource, commit nothing final
    @Transactional
    public void createOrder(OrderDTO orderDTO) {
        orderRepository.createOrder(orderDTO);                  // draft order
        inventoryService.reserveInventory(orderDTO.getItems()); // pre-deduct stock
        paymentService.reserveAmount(orderDTO.getUserId(),      // pre-deduct balance
                                     orderDTO.getTotalAmount());
    }

    // Confirm phase: all Try calls succeeded → make the reservations final
    @Transactional
    public void confirmOrder(Long orderId) {
        orderRepository.confirmOrder(orderId);
    }

    // Cancel phase: any Try call failed → release every reservation
    @Transactional
    public void cancelOrder(Long orderId) {
        orderRepository.cancelOrder(orderId);
        inventoryService.cancelReserve(orderId);
        paymentService.cancelReserve(orderId);
    }
}
```

2. **XA distributed transactions**

```java
// XA transactions via Atomikos (JTA)
@Component
public class XATransactionManager {

    @Resource
    private UserTransaction userTransaction;

    public void execute(Runnable operation) throws SystemException {
        try {
            userTransaction.begin();
            operation.run();
            userTransaction.commit();
        } catch (Exception e) {
            userTransaction.rollback();
            throw new SystemException("XA transaction failed: " + e.getMessage());
        }
    }
}
```

### Pagination

**Problem**: paging and sorting across shards.

**Solutions**:

1. **Merge-then-limit pagination**

```java
@Service
public class UserService {

    public PageResult<User> getPageUsers(int page, int size) {
        // Each shard must return its FIRST page*size rows (not just one page),
        // otherwise rows belonging to the global page can be missed.
        // (selectTopN / countAllShards are illustrative mapper methods.)
        List<User> allUsers = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            allUsers.addAll(userMapper.selectTopN(i, page * size));
        }
        // Re-sort the merged rows, then cut out the requested page in memory.
        allUsers.sort(Comparator.comparingLong(User::getId));
        int fromIndex = (page - 1) * size;
        int toIndex = Math.min(fromIndex + size, allUsers.size());
        List<User> pageUsers = fromIndex >= allUsers.size()
            ? List.of()
            : allUsers.subList(fromIndex, toIndex);
        // The true total comes from summing COUNT(*) per shard, not from this merge.
        long total = userMapper.countAllShards();
        return new PageResult<>(pageUsers, total);
    }
}
```
2. **Cursor (keyset) pagination**

```java
@Service
public class CursorPagingService {

    public List<OrderVO> getOrdersByCursor(Long lastId, int limit) {
        List<OrderVO> orders = new ArrayList<>();
        // Each shard returns its first `limit` rows with id > lastId, ordered by id.
        for (int i = 0; i < 4; i++) {
            orders.addAll(orderMapper.selectByIdGreaterThan(lastId, limit, i));
        }
        // Merge-sort by id and keep the global first `limit` rows;
        // the last id returned becomes the cursor for the next call.
        orders.sort(Comparator.comparingLong(OrderVO::getId));
        return orders.stream().limit(limit).collect(Collectors.toList());
    }
}
```

## 4. Sharding Middleware

### MyCAT

**Characteristics**:

- SQL routing and read/write splitting
- Pluggable sharding rules and algorithms
- Speaks the MySQL wire protocol, so clients connect to it like a regular MySQL server

**Configuration example** (rule.xml / schema.xml fragments, reconstructed):

```xml
<!-- rule.xml: shard rows by id, modulo 3 -->
<tableRule name="mod-rule">
    <rule>
        <columns>id</columns>
        <algorithm>mod-long</algorithm>
    </rule>
</tableRule>
<function name="mod-long" class="io.mycat.route.function.PartitionByMod">
    <property name="count">3</property>
</function>

<!-- schema.xml: heartbeat SQL MyCAT uses to probe backend health -->
<dataHost name="host1" dbType="mysql" dbDriver="native" maxCon="1000" minCon="10">
    <heartbeat>select user()</heartbeat>
    <writeHost host="hostM1" url="localhost:3306" user="root" password="password"/>
</dataHost>
```

### ShardingSphere

**Characteristics**:

- Lightweight and pluggable
- Supports multiple databases
- Ships governance and observability features

**Configuration example**:

```java
// Java API configuration (ShardingSphere 4.x style)
@Configuration
public class ShardingSphereConfig {

    @Bean
    public DataSource shardingDataSource() throws SQLException {
        ShardingRuleConfiguration shardingRuleConfig = new ShardingRuleConfiguration();

        // Sharding rule: user spread over 4 databases × 4 tables
        TableRuleConfiguration userTableRule = new TableRuleConfiguration(
            "user", "user_ds_$->{0..3}.user_$->{user_id % 4}");
        shardingRuleConfig.getTableRuleConfigs().add(userTableRule);

        // Binding tables: user and t_order share a sharding key, so JOINs stay local
        shardingRuleConfig.getBindingTableGroups().add("user,t_order");

        // Broadcast table: dict is replicated to every shard
        shardingRuleConfig.getBroadcastTables().add("dict");

        return ShardingDataSourceFactory.createDataSource(
            createDataSourceMap(), shardingRuleConfig, new Properties());
    }

    private Map<String, DataSource> createDataSourceMap() {
        Map<String, DataSource> result = new HashMap<>();
        // 4 shard data sources (one MySQL instance per port here, purely illustrative)
        for (int i = 0; i < 4; i++) {
            result.put("user_ds_" + i,
                createDataSource("localhost", 3306 + i, "root", "password", "user_" + i));
        }
        return result;
    }

    private DataSource createDataSource(String host, int port,
                                        String username, String password, String database) {
        HikariDataSource dataSource = new HikariDataSource();
        dataSource.setJdbcUrl(String.format(
            "jdbc:mysql://%s:%d/%s?useUnicode=true&characterEncoding=utf-8",
            host, port, database));
        dataSource.setUsername(username);
        dataSource.setPassword(password);
        return dataSource;
    }
}
```

### Proxy Mode vs. JDBC Mode

**Proxy mode**:

- SQL is forwarded through a separate proxy layer
- No application code changes; any language can connect
- Extra network hop, so higher latency overhead

**JDBC mode**:

- Sharding logic runs inside the application's JDBC layer
- Better performance (no extra hop)
- Requires application configuration changes; JVM-only
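What JDBC mode actually does in-process can be shown with a toy router that picks a physical data source and rewrites the logical table name before the SQL is sent. This is a deliberately naive string rewrite — real drivers parse the SQL properly — and the `user_ds_*` / `user_N` names are illustrative:

```java
// Toy illustration of JDBC-mode routing: choose a data source and rewrite the
// logical table to its physical shard, all inside the application process.
public class JdbcModeRouter {

    private final int shardCount;

    public JdbcModeRouter(int shardCount) {
        this.shardCount = shardCount;
    }

    /** Physical data-source name for a user id, e.g. user_ds_2. */
    public String dataSource(long userId) {
        return "user_ds_" + Math.floorMod(userId, shardCount);
    }

    /** Rewrite the logical table to the physical one, e.g. user → user_2. */
    public String rewrite(String sql, long userId) {
        return sql.replace(" user ", " user_" + Math.floorMod(userId, shardCount) + " ");
    }

    public static void main(String[] args) {
        JdbcModeRouter router = new JdbcModeRouter(4);
        System.out.println(router.dataSource(10086L)); // user_ds_2
        System.out.println(router.rewrite("SELECT * FROM user WHERE user_id = ?", 10086L));
    }
}
```

A proxy performs the same two steps, but on a separate server between the client and the databases — which is exactly where its extra network hop comes from.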
## 5. Sharding Design in Real Projects

### E-commerce Sharding Design

**Business scenario**:

- User table: hash-sharded by user ID
- Order table: sharded by user ID, so a user's orders stay on one shard
- Product table: sharded by product ID
- Trade table: sharded by time (month)

**Configuration example**:

```java
// E-commerce sharding configuration
@Configuration
public class EcommerceShardingConfig {

    @Bean
    public DataSource ecommerceShardingDataSource() throws SQLException {
        Map<String, DataSource> dataSourceMap = new HashMap<>();
        // 8 shard databases
        for (int i = 0; i < 8; i++) {
            HikariDataSource dataSource = new HikariDataSource();
            dataSource.setJdbcUrl(String.format("jdbc:mysql://127.0.0.1:3306/ecommerce_%d", i));
            dataSource.setUsername("root");
            dataSource.setPassword("password");
            dataSourceMap.put("ecommerce_ds_" + i, dataSource);
        }

        ShardingRuleConfiguration ruleConfig = new ShardingRuleConfiguration();

        // User table — hash sharding on user_id
        ruleConfig.getTableRuleConfigs().add(new TableRuleConfiguration(
            "user", "ecommerce_ds_$->{0..7}.user_$->{user_id % 8}"));

        // Order table — sharded by user_id so user and order rows are co-located
        ruleConfig.getTableRuleConfigs().add(new TableRuleConfiguration(
            "t_order", "ecommerce_ds_$->{0..7}.t_order_$->{user_id % 8}"));

        // Product table — sharded by product_id
        ruleConfig.getTableRuleConfigs().add(new TableRuleConfiguration(
            "product", "ecommerce_ds_$->{0..7}.product_$->{product_id % 8}"));

        // Trade table — monthly tables trade_0..11. Inline expressions cannot call
        // SQL functions such as YEAR()/MONTH(), so the month is derived from
        // create_time by a custom sharding algorithm (implementation not shown).
        TableRuleConfiguration tradeRule = new TableRuleConfiguration(
            "trade", "ecommerce_ds_$->{0..7}.trade_$->{0..11}");
        tradeRule.setTableShardingStrategyConfig(
            new StandardShardingStrategyConfiguration("create_time", new MonthShardingAlgorithm()));
        ruleConfig.getTableRuleConfigs().add(tradeRule);

        // Binding tables
        ruleConfig.getBindingTableGroups().add("user,t_order");
        ruleConfig.getBindingTableGroups().add("product,trade_detail");

        // Broadcast table
        ruleConfig.getBroadcastTables().add("sys_config");

        return ShardingDataSourceFactory.createDataSource(
            dataSourceMap, ruleConfig, new Properties());
    }
}
```

### Social-Network Sharding Design

**Business scenario**:

- User table: sharded by user ID
- Friendship table: hash-sharded by user ID
- Feed/post table: sharded by user ID
- Comment table: sharded by the business (subject) ID

**Design principles**:

1. **Match the access pattern**: derive the sharding key from the dominant query pattern
2. **Balance data volume**: keep shard sizes roughly equal
3. **Minimize cross-shard queries**: design so that most queries hit a single shard
4. **Sharding key selection**: pick a high-cardinality, evenly distributed column
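Time-based routing, as used for the trade table above, boils down to computing a table suffix from a timestamp. A minimal sketch — the `trade_` naming is illustrative, and a real algorithm must agree with the DDL on exactly how months map to suffixes:

```java
import java.time.LocalDateTime;

// Compute the physical monthly table for a trade record from its create_time.
// Month 1..12 maps to suffix 0..11 (trade_0 .. trade_11).
public class MonthTableRouter {

    public static String tableFor(LocalDateTime createTime) {
        return "trade_" + (createTime.getMonthValue() - 1);
    }

    public static void main(String[] args) {
        System.out.println(tableFor(LocalDateTime.of(2024, 3, 15, 10, 0))); // trade_2
    }
}
```

Note that a month-only scheme reuses the same 12 tables every year; schemes that must keep history distinct usually fold the year in as well (e.g. `trade_202403`).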
### Scaling Plans

**Scale up (vertical)**:

- Upgrade the hardware of the existing shard servers
- No data migration, but there is a hard ceiling

**Scale out (horizontal)**:

- Increase the number of shards and redistribute data
- Requires data migration; consistent hashing keeps the migrated volume small

**Code example**:

```java
// Hash sharding algorithm whose shard count can be updated for scale-out.
// Note: changing the count only changes routing — the matching data migration
// (e.g. via double-write) must be completed before switching over.
public class DynamicShardingAlgorithm implements PreciseShardingAlgorithm<Long> {

    private volatile int shardingCount = 4;
    private final ReadWriteLock rwLock = new ReentrantReadWriteLock();

    public void updateShardingCount(int newCount) {
        rwLock.writeLock().lock();
        try {
            this.shardingCount = newCount;
        } finally {
            rwLock.writeLock().unlock();
        }
    }

    @Override
    public String doSharding(Collection<String> availableTargetNames,
                             PreciseShardingValue<Long> shardingValue) {
        rwLock.readLock().lock();
        try {
            long index = Math.floorMod(shardingValue.getValue(), (long) shardingCount);
            return availableTargetNames.stream()
                .filter(name -> name.endsWith("_" + index))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("no database available"));
        } finally {
            rwLock.readLock().unlock();
        }
    }
}
```

## 6. Senior-Level (Alibaba P7) Bonus Points

### Sharding Monitoring

```java
// Sharding monitoring component (Micrometer; the data-source lookup API varies
// by ShardingSphere version — shown here as getDataSourceMap()).
@Component
public class ShardingMonitor {

    private final MeterRegistry meterRegistry;
    private final ShardingDataSource shardingDataSource;
    private final JdbcTemplate jdbcTemplate;

    public ShardingMonitor(MeterRegistry meterRegistry,
                           ShardingDataSource shardingDataSource,
                           JdbcTemplate jdbcTemplate) {
        this.meterRegistry = meterRegistry;
        this.shardingDataSource = shardingDataSource;
        this.jdbcTemplate = jdbcTemplate;
    }

    @Scheduled(fixedRate = 5000)
    public void monitorShardingMetrics() {
        for (int i = 0; i < 8; i++) {
            DataSource dataSource =
                shardingDataSource.getDataSourceMap().get("ecommerce_ds_" + i);
            // Connection-pool metrics per shard
            HikariPoolMXBean pool = ((HikariDataSource) dataSource).getHikariPoolMXBean();
            meterRegistry.gauge("sharding.pool.active",
                Tags.of("shard", String.valueOf(i)), pool, HikariPoolMXBean::getActiveConnections);
            meterRegistry.gauge("sharding.pool.idle",
                Tags.of("shard", String.valueOf(i)), pool, HikariPoolMXBean::getIdleConnections);
            // Slow-query metrics per shard
            monitorSlowQueries(i);
        }
    }

    private void monitorSlowQueries(int shardIndex) {
        // Illustrative: read from a self-maintained slow_log table
        List<Map<String, Object>> slowQueries = jdbcTemplate.queryForList(
            "SELECT * FROM slow_log WHERE execution_time > 1000 "
            + "ORDER BY execution_time DESC LIMIT 10");
        slowQueries.forEach(query ->
            meterRegistry.counter("sharding.slow.query",
                    "shard", String.valueOf(shardIndex),
                    "sql", (String) query.get("query"))
                .increment());
    }
}
```

### Automated Operations Platform

```java
// Automated resharding/migration tool (sketch)
@Service
public class ShardingMigrationService {

    private final ShardingDataSource shardingDataSource;
    private final ExecutorService executorService;
    private final JdbcTemplate jdbcTemplate; // bound to the source cluster

    public void migrateData(String table, int oldShardCount, int newShardCount) {
        List<Future<?>> futures = new ArrayList<>();
        // Migrate each source shard in parallel
        for (int oldShard = 0; oldShard < oldShardCount; oldShard++) {
            final int shard = oldShard;
            futures.add(executorService.submit(
                () -> migrateShardData(table, shard, newShardCount)));
        }
        // Wait for all shards to finish
        for (Future<?> future : futures) {
            try {
                future.get();
            } catch (Exception e) {
                log.error("Migration failed", e);
            }
        }
    }

    private void migrateShardData(String table, int oldShard, int newShardCount) {
        // Rows currently stored in the source shard table
        List<Map<String, Object>> sourceData = jdbcTemplate.queryForList(
            "SELECT * FROM " + table + "_" + oldShard);
        // Group rows by their new placement: id % newShardCount
        Map<Integer, List<Map<String, Object>>> byNewShard = sourceData.stream()
            .collect(Collectors.groupingBy(
                row -> (int) (((Number) row.get("id")).longValue() % newShardCount)));

        byNewShard.forEach((newShard, rows) -> {
            DataSource target = shardingDataSource.getDataSourceMap()
                .get("ecommerce_ds_" + newShard);
            JdbcTemplate targetJdbcTemplate = new JdbcTemplate(target);
            targetJdbcTemplate.batchUpdate(
                "INSERT INTO " + table + "_" + newShard + " VALUES (?, ?, ?)",
                new BatchPreparedStatementSetter() {
                    @Override
                    public void setValues(PreparedStatement ps, int i) throws SQLException {
                        Map<String, Object> row = rows.get(i);
                        // set column values from row (schema-specific)
                    }

                    @Override
                    public int getBatchSize() {
                        return rows.size();
                    }
                });
        });
    }
}
```

### Advanced Sharding Strategies

```java
// Composite sharding strategy driven by business rules
// (BusinessKey must be Comparable to serve as a PreciseShardingValue)
@Component
public class BusinessRuleShardingAlgorithm
        implements PreciseShardingAlgorithm<BusinessKey> {

    @Override
    public String doSharding(Collection<String> availableTargetNames,
                             PreciseShardingValue<BusinessKey> shardingValue) {
        BusinessKey businessKey = shardingValue.getValue();
        // Composite sharding key: user ID + creation time + business type
        String shardKey = businessKey.getUserId() + "_"
            + businessKey.getCreateTime() + "_"
            + businessKey.getBusinessType();
        // A well-mixed hash keeps the distribution even
        int hash = murmurHash(shardKey);
        int index = Math.floorMod(hash, availableTargetNames.size());
        return availableTargetNames.stream()
            .filter(name -> name.endsWith("_" + index))
            .findFirst()
            .orElseThrow(() -> new IllegalArgumentException("no database available"));
    }

    private int murmurHash(String key) {
        // Placeholder — substitute a real MurmurHash implementation
        return key.hashCode();
    }
}

// Sharding-key entity
@Data
@AllArgsConstructor
public class BusinessKey implements Comparable<BusinessKey> {
    private Long userId;
    private LocalDateTime createTime;
    private String businessType;

    @Override
    public int compareTo(BusinessKey other) {
        return this.userId.compareTo(other.userId);
    }
}
```

### Summary

Sharding is an unavoidable step in scaling large database architectures. The key points:

1. **Choose the sharding strategy deliberately**: match the algorithm to the business's access patterns
2. **Solve the hard problems head-on**: cross-database JOINs, distributed transactions, and pagination
3. **Build a monitoring system**: metrics and alerting for every shard
4. **Automate operations**: automated resharding and scale-out migration
5. **Keep optimizing**: continuously tune query performance and system stability

In an interview, beyond the technical details, demonstrate business understanding, architectural judgment, and hands-on performance-tuning experience.