diff --git a/questions/cache-problems.md b/questions/cache-problems.md
new file mode 100644
index 0000000..fe5b721
--- /dev/null
+++ b/questions/cache-problems.md
@@ -0,0 +1,1093 @@
+# Redis 缓存穿透、击穿、雪崩
+
+## 问题
+
+**背景**：Redis 在高并发系统中广泛应用，但不当的使用会导致严重的线上问题。
+
+**问题**：
+1. 什么是缓存穿透、缓存击穿、缓存雪崩？它们的区别是什么？
+2. 分别有哪些解决方案？请详细说明每种方案的优缺点
+3. 布隆过滤器解决缓存穿透的原理是什么？有什么局限性？
+4. 你在实际项目中遇到过这些问题吗？如何监控和预警？
+
+---
+
+## 标准答案
+
+### 1. 三者的定义和区别
+
+#### **缓存穿透 (Cache Penetration)**
+
+**定义**：
+客户端请求的数据**在缓存和数据库中都不存在**，导致每次请求都会穿透缓存直接打到数据库。
+
+**场景示例**：
+- 恶意攻击：构造大量不存在的 key（如 user_id = -1、-2、-3...）
+- 业务逻辑：查询未发布的商品 ID、已删除的用户信息
+
+**影响**：
+- 数据库压力剧增，可能被打挂
+- 缓存命中率下降，失去了缓存的意义
+
+---
+
+#### **缓存击穿 (Cache Breakdown)**
+
+**定义**：
+某个**热点 key**（被高频访问）在缓存中**突然过期**，此时大量并发请求同时击穿缓存，直接访问数据库。
+
+**场景示例**：
+- 秒杀活动的商品信息缓存刚好过期
+- 热点新闻缓存过期
+- 明星微博缓存失效瞬间
+
+**影响**：
+- 数据库瞬间压力激增
+- 可能导致数据库响应变慢甚至宕机
+- 用户体验下降
+
+**与缓存穿透的区别**：
+- 缓存击穿：**数据存在**，只是缓存过期
+- 缓存穿透：数据**根本不存在**
+
+---
+
+#### **缓存雪崩 (Cache Avalanche)**
+
+**定义**：
+**大量 key** 在**同一时间集中过期**，或者 Redis 宕机，导致大量请求直接打到数据库。
+
+**场景示例**：
+- 系统重启后，大量缓存同时失效
+- 批量导入缓存时设置了相同的过期时间
+- Redis 实例故障
+
+**影响**：
+- 数据库承受巨大压力
+- 可能级联故障，导致整个系统崩溃
+
+**与缓存击穿的区别**：
+- 缓存击穿：**单个热点 key** 过期
+- 缓存雪崩：**大量 key** 集中过期
+
+---
+
+### 2. 解决方案详解
+
+#### **缓存穿透的解决方案**
+
+##### **方案一：缓存空对象（Null Cache）**
+
+**原理**：
+当查询数据库发现数据不存在时，仍然将这个 key 缓存起来，值设为 null 或特殊值。
+
+**代码示例**：
+```java
+public User getUserById(Long userId) {
+    // 1. 查询缓存
+    String userKey = "user:" + userId;
+    User user = redis.get(userKey);
+
+    if (user != null) {
+        // 判断是否是空对象
+        if (user == NULL_VALUE) {
+            return null;
+        }
+        return user;
+    }
+
+    // 2. 查询数据库
+    user = db.findById(userId);
+
+    // 3. 缓存结果（包括 null）
+    if (user == null) {
+        redis.set(userKey, NULL_VALUE, 5); // 短时间缓存
+    } else {
+        redis.set(userKey, user, 30);
+    }
+
+    return user;
+}
+```
+
+**优点**：
+- 实现简单
+- 立即生效
+
+**缺点**：
+- 如果恶意攻击构造大量不同 key，会占用大量内存
+- 需要设置过期时间，否则可能导致数据不一致
+
+**优化**：
+- 空对象的过期时间设置较短（如 30 秒 - 5 分钟）
+- 使用布隆过滤器预判
+
+---
+
+##### **方案二：布隆过滤器 (Bloom Filter)**
+
+**原理**：
+布隆过滤器是一个**空间效率极高的概率型数据结构**，用于判断一个元素是否在一个集合中。
+
+**核心特点**：
+- **可能误判**：可能说存在，但实际不存在（假阳性）
+- **不会漏判**：如果说不存在，那一定不存在
+
+**工作流程**：
+```
+1. 系统启动时，将所有有效的 key 加载到布隆过滤器
+2. 用户请求到达时，先查布隆过滤器
+   - 判断 key 不存在 → 直接返回，不查数据库
+   - 判断 key 存在 → 继续查缓存和数据库
+```
+
+**代码示例**（使用 Redisson）：
+```java
+// 初始化布隆过滤器
+RBloomFilter<Long> bloomFilter = redisson.getBloomFilter("userFilter");
+bloomFilter.tryInit(1000000L, 0.01); // 预计100万元素，误判率1%
+
+// 加载所有有效用户ID到布隆过滤器
+List<Long> allUserIds = db.getAllUserIds();
+for (Long userId : allUserIds) {
+    bloomFilter.add(userId);
+}
+
+// 查询逻辑
+public User getUserById(Long userId) {
+    // 1. 布隆过滤器判断
+    if (!bloomFilter.contains(userId)) {
+        log.warn("Illegal user id: {}", userId);
+        return null;
+    }
+
+    // 2. 查询缓存
+    User user = redis.get("user:" + userId);
+    if (user != null) {
+        return user;
+    }
+
+    // 3. 查询数据库
+    user = db.findById(userId);
+    if (user != null) {
+        redis.set("user:" + userId, user, 30);
+    }
+
+    return user;
+}
+```
+
+**优点**：
+- 内存占用极小（1000 万数据只需约 15MB）
+- 查询效率高（O(k)，k 是哈希函数个数）
+- 完全拦截无效请求
+
+**缺点**：
+- **不支持删除**：传统布隆过滤器无法删除元素
+- **有误判率**：需要根据业务设置合理的误判率（通常 1% - 3%）
+- **需要预热**：系统启动时加载所有有效 key
+- **数据变更需要更新**：新增数据需要加入布隆过滤器
+
+**布隆过滤器参数计算**：
+```
+n = 预计元素数量
+p = 期望误判率（如 0.01）
+m = 需的位数组大小（bit） = -n * ln(p) / (ln(2)^2)
+k = 哈希函数个数 = m/n * ln(2)
+
+示例：n = 100万，p = 1%
+m = -1000000 * ln(0.01) / (ln(2)^2) ≈ 9,585,059 bit ≈ 1.14 MB
+k = 9585059 / 1000000 * ln(2) ≈ 7
+```
+
+**支持删除的方案**：
+- **Counting Bloom Filter**：每个位改为计数器
+- **RedisBf**：Redis 模块提供了支持删除的布隆过滤器
+
+**适用场景**：
+- 数据量大的场景（百万级以上）
+- 能接受极少数误判
+- 数据相对稳定，变更频率低
+
+---
+
+##### **方案三：接口校验**
+
+**原理**：
+在接口层做基础校验，拦截明显非法的请求。
+
+**示例**：
+```java
+public User getUserById(Long userId) {
+    // 基础校验
+    if (userId == null || userId <= 0) {
+        throw new IllegalArgumentException("Invalid user id");
+    }
+
+    // ... 其他逻辑
+}
+```
+
+**优点**：
+- 简单直接
+- 零成本
+
+**缺点**：
+- 只能拦截明显非法请求
+- 无法拦截合法范围内但不存在的数据（如 user_id = 10001，但最大只有 10000）
+
+---
+
+#### **缓存击穿的解决方案**
+
+##### **方案一：热点数据永不过期**
+
+**原理**：
+对于热点 key，不设置过期时间，或者设置很长的过期时间。
+
+**代码示例**：
+```java
+// 热点数据永不过期
+public User getHotUser(Long userId) {
+    User user = redis.get("user:" + userId);
+    if (user != null) {
+        return user;
+    }
+
+    // 数据库查询后永不过期
+    user = db.findById(userId);
+    if (user != null) {
+        redis.set("user:" + userId, user); // 不设置过期时间
+    }
+
+    return user;
+}
+```
+
+**优点**：
+- 实现简单
+- 完全避免击穿
+
+**缺点**：
+- 内存占用增加
+- 数据更新时需要主动更新缓存
+- 不适合变化频繁的数据
+
+**数据一致性处理**：
+- 数据更新时，先更新数据库再删除缓存
+- 使用 Canal 监听 MySQL binlog，自动更新缓存
+
+---
+
+##### **方案二：互斥锁 (Mutex Lock / Distributed Lock)**
+
+**原理**：
+当缓存失效时，只允许一个线程查询数据库，其他线程等待。
+
+**代码示例**：
+```java
+public User getUserById(Long userId) {
+    String key = "user:" + userId;
+
+    // 1. 查询缓存
+    User user = redis.get(key);
+    if (user != null) {
+        return user;
+    }
+
+    // 2. 缓存不存在，获取分布式锁
+    String lockKey = "lock:user:" + userId;
+    try {
+        boolean locked = redis.tryLock(lockKey, 10, TimeUnit.SECONDS);
+        if (!locked) {
+            // 获取锁失败，等待 100ms 后重试
+            Thread.sleep(100);
+            return getUserById(userId); // 递归重试
+        }
+
+        // 3. 双重检查（其他线程可能已经重建缓存）
+        user = redis.get(key);
+        if (user != null) {
+            return user;
+        }
+
+        // 4. 查询数据库
+        user = db.findById(userId);
+        if (user != null) {
+            redis.set(key, user, 30);
+        }
+
+        return user;
+
+    } finally {
+        redis.unlock(lockKey);
+    }
+}
+```
+
+**Redisson 实现**：
+```java
+public User getUserById(Long userId) {
+    String key = "user:" + userId;
+    User user = redis.get(key);
+    if (user != null) {
+        return user;
+    }
+
+    RLock lock = redisson.getLock("lock:user:" + userId);
+    try {
+        // 尝试加锁，最多等待 5 秒，锁自动释放时间 10 秒
+        if (lock.tryLock(5, 10, TimeUnit.SECONDS)) {
+            try {
+                // 双重检查
+                user = redis.get(key);
+                if (user != null) {
+                    return user;
+                }
+
+                // 查询数据库
+                user = db.findById(userId);
+                if (user != null) {
+                    redis.set(key, user, 30);
+                }
+            } finally {
+                lock.unlock();
+            }
+        }
+    } catch (InterruptedException e) {
+        Thread.currentThread().interrupt();
+    }
+
+    return user;
+}
+```
+
+**优点**：
+- 避免大量请求打到数据库
+- 保证数据一致性
+
+**缺点**：
+- 增加了系统复杂度
+- 可能导致部分请求等待
+- 锁超时时间设置需要权衡
+
+**注意事项**：
+- 锁的过期时间要大于数据库查询时间
+- 需要考虑锁释放失败的情况（使用 Redisson 的 watchdog 机制自动续期）
+
+---
+
+##### **方案三：逻辑过期 (Logical Expiration)**
+
+**原理**：
+缓存中不设置 Redis 的 TTL，而是在 value 中存储逻辑过期时间。
+
+**数据结构**：
+```json
+{
+  "data": { "id": 1, "name": "张三" },
+  "expireTime": 1672531200000
+}
+```
+
+**代码示例**：
+```java
+public class CacheData<T> {
+    private T data;
+    private long expireTime;
+
+    public boolean isExpired() {
+        return System.currentTimeMillis() > expireTime;
+    }
+}
+
+public User getUserById(Long userId) {
+    String key = "user:" + userId;
+    CacheData<User> cacheData = redis.get(key);
+
+    // 1. 缓存不存在
+    if (cacheData == null) {
+        // 启动异步线程重建缓存
+        rebuildCacheAsync(userId);
+        return null;
+    }
+
+    // 2. 逻辑未过期
+    if (!cacheData.isExpired()) {
+        return cacheData.getData();
+    }
+
+    // 3. 逻辑已过期，获取锁
+    String lockKey = "lock:user:" + userId;
+    if (redis.tryLock(lockKey)) {
+        // 获取到锁，开启异步线程重建缓存
+        rebuildCacheAsync(userId);
+        redis.unlock(lockKey);
+    }
+
+    // 4. 返回旧数据（允许短期不一致）
+    return cacheData.getData();
+}
+
+// 异步重建缓存
+private void rebuildCacheAsync(Long userId) {
+    executorService.submit(() -> {
+        User user = db.findById(userId);
+        CacheData<User> newCacheData = new CacheData<>();
+        newCacheData.setData(user);
+        newCacheData.setExpireTime(System.currentTimeMillis() + 30 * 60 * 1000);
+        redis.set("user:" + userId, newCacheData);
+    });
+}
+```
+
+**优点**：
+- 请求永远不会等待
+- 性能最优
+
+**缺点**：
+- 实现复杂
+- 允许短期数据不一致
+- 需要维护额外的线程池
+
+**适用场景**：
+- 对一致性要求不高的业务（如推荐列表）
+- 高并发读场景
+
+---
+
+#### **缓存雪崩的解决方案**
+
+##### **方案一：过期时间加随机值**
+
+**原理**：
+在设置过期时间时，增加一个随机值，避免大量 key 同时过期。
+
+**代码示例**：
+```java
+public User getUserById(Long userId) {
+    String key = "user:" + userId;
+    User user = redis.get(key);
+    if (user != null) {
+        return user;
+    }
+
+    user = db.findById(userId);
+
+    // 过期时间 = 基础时间 + 随机时间（如 30 分钟 + 随机 0-5 分钟）
+    int baseExpire = 30 * 60;
+    int randomExpire = ThreadLocalRandom.current().nextInt(0, 5 * 60);
+    redis.set(key, user, baseExpire + randomExpire);
+
+    return user;
+}
+```
+
+**优点**：
+- 实现简单
+- 有效分散过期时间
+
+**缺点**：
+- 无法完全避免雪崩（只是降低概率）
+- 需要合理设置随机范围
+
+---
+
+##### **方案二：缓存预热 (Cache Warm-up)**
+
+**原理**：
+系统启动时或低峰期，提前加载热点数据到缓存。
+
+**实现方式**：
+```java
+@Component
+public class CacheWarmUpService implements ApplicationRunner {
+
+    @Autowired
+    private UserService userService;
+
+    @Autowired
+    private RedisTemplate redisTemplate;
+
+    @Override
+    public void run(ApplicationArguments args) {
+        log.info("开始缓存预热...");
+
+        // 1. 加载热点用户数据
+        List<User> hotUsers = userService.getHotUsers();
+        for (User user : hotUsers) {
+            String key = "user:" + user.getId();
+            redisTemplate.opsForValue().set(key, user, 30, TimeUnit.MINUTES);
+        }
+
+        // 2. 加载热点商品数据
+        List<Product> hotProducts = productService.getHotProducts();
+        for (Product product : hotProducts) {
+            String key = "product:" + product.getId();
+            redisTemplate.opsForValue().set(key, product, 30, TimeUnit.MINUTES);
+        }
+
+        log.info("缓存预热完成，加载用户数: {}, 商品数: {}", hotUsers.size(), hotProducts.size());
+    }
+}
+```
+
+**优点**：
+- 避免系统刚启动时的缓存雪崩
+- 提升用户体验
+
+**缺点**：
+- 增加启动时间
+- 需要识别热点数据
+
+---
+
+##### **方案三：高可用架构**
+
+**原理**：
+构建 Redis 高可用集群，避免单点故障。
+
+**方案对比**：
+
+| 方案 | 可用性 | 复杂度 | 成本 |
+|------|--------|--------|------|
+| **Redis 主从** | 99.9% | 低 | 低 |
+| **Redis Sentinel（哨兵）** | 99.95% | 中 | 中 |
+| **Redis Cluster（集群）** | 99.99% | 高 | 高 |
+
+**Redis Sentinel 示例**：
+```
+Master: 127.0.0.1:6379
+Slave1: 127.0.0.1:6380
+Slave2: 127.0.0.1:6381
+Sentinel1: 127.0.0.1:26379
+Sentinel2: 127.0.0.1:26380
+Sentinel3: 127.0.0.1:26381
+```
+
+**Redis Cluster 示例**：
+```
+3 主 3 从：
+- Master1 (Slots: 0-5460) + Slave1
+- Master2 (Slots: 5461-10922) + Slave2
+- Master3 (Slots: 10923-16383) + Slave3
+```
+
+---
+
+##### **方案四：多级缓存**
+
+**原理**：
+使用本地缓存 + Redis 缓存的多级架构。
+
+**架构图**：
+```
+请求 → 本地缓存 (Caffeine) → Redis 缓存 → 数据库
+```
+
+**代码示例**：
+```java
+@Configuration
+public class CacheConfig {
+
+    @Bean
+    public CacheManager caffeineCacheManager() {
+        CaffeineCacheManager cacheManager = new CaffeineCacheManager();
+        cacheManager.setCaffeine(Caffeine.newBuilder()
+            .maximumSize(10000)
+            .expireAfterWrite(5, TimeUnit.MINUTES)
+            .build());
+        return cacheManager;
+    }
+}
+
+@Service
+public class UserService {
+
+    @Autowired
+    private RedisTemplate redisTemplate;
+
+    @Cacheable(value = "users", key = "#userId") // Caffeine 本地缓存
+    public User getUserById(Long userId) {
+        // 1. 查询 Redis
+        String key = "user:" + userId;
+        User user = redisTemplate.opsForValue().get(key);
+        if (user != null) {
+            return user;
+        }
+
+        // 2. 查询数据库
+        user = db.findById(userId);
+        if (user != null) {
+            redisTemplate.opsForValue().set(key, user, 30, TimeUnit.MINUTES);
+        }
+
+        return user;
+    }
+}
+```
+
+**优点**：
+- 本地缓存速度更快（微秒级）
+- Redis 宕机时本地缓存仍可用
+- 降低 Redis 压力
+
+**缺点**：
+- 本地缓存数据不一致（分布式环境下）
+- 占用应用内存
+
+**适用场景**：
+- 读多写少的场景
+- 允许短期数据不一致
+
+---
+
+##### **方案五：熔断降级**
+
+**原理**：
+当检测到数据库压力过大时，触发熔断，暂时停止重建缓存。
+
+**代码示例**（使用 Sentinel）：
+```java
+@SentinelResource(
+    value = "getUserById",
+    blockHandler = "handleBlock",
+    fallback = "handleFallback"
+)
+public User getUserById(Long userId) {
+    // 正常逻辑
+}
+
+// 限流降级
+public User handleBlock(Long userId, BlockException ex) {
+    // 返回默认值或缓存的旧数据
+    return getDefaultUser();
+}
+
+// 异常降级
+public User handleFallback(Long userId, Throwable ex) {
+    log.error("查询用户失败: {}", userId, ex);
+    return null;
+}
+```
+
+**降级策略配置**：
+```java
+// 规则：QPS 超过 1000 时限流
+List<FlowRule> rules = new ArrayList<>();
+FlowRule rule = new FlowRule();
+rule.setResource("getUserById");
+rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
+rule.setCount(1000);
+rules.add(rule);
+FlowRuleManager.loadRules(rules);
+```
+
+---
+
+### 3. 布隆过滤器深入解析
+
+#### **原理详解**
+
+布隆过滤器由两个核心部分组成：
+1. **位数组**（Bit Array）
+2. **k 个哈希函数**（Hash Functions）
+
+**工作流程**：
+
+```
+添加元素 "hello"：
+1. hash1("hello") = 3 → 设置 bit[3] = 1
+2. hash2("hello") = 5 → 设置 bit[5] = 1
+3. hash3("hello") = 7 → 设置 bit[7] = 1
+
+查询元素 "hello"：
+1. hash1("hello") = 3 → 检查 bit[3] == 1 ✓
+2. hash2("hello") = 5 → 检查 bit[5] == 1 ✓
+3. hash3("hello") = 7 → 检查 bit[7] == 1 ✓
+结果：可能存在
+
+查询元素 "world"：
+1. hash1("world") = 2 → 检查 bit[2] == 0 ✗
+结果：一定不存在
+```
+
+---
+
+#### **为什么会有误判？**
+
+假设：
+- 添加 "hello" → bit[3, 5, 7] = 1
+- 添加 "world" → bit[2, 5, 9] = 1
+- 查询 "test" → hash 后得到 [2, 5, 7]
+- 检查发现 bit[2, 5, 7] 都是 1
+- 但 "test" 实际上不存在！
+
+这就是 **哈希冲突** 导致的假阳性。
+
+---
+
+#### **Guava 布隆过滤器示例**
+
+```java
+import com.google.common.hash.BloomFilter;
+import com.google.common.hash.Funnels;
+import com.google.common.hash.PrimitiveSink;
+
+public class BloomFilterExample {
+
+    public static void main(String[] args) {
+        // 创建布隆过滤器：预计100万数据，1%误判率
+        BloomFilter<Long> bloomFilter = BloomFilter.create(
+            Funnels.longFunnel(),
+            1000000,
+            0.01
+        );
+
+        // 添加数据
+        for (long i = 0; i < 1000000; i++) {
+            bloomFilter.put(i);
+        }
+
+        // 测试
+        int falsePositive = 0;
+        for (long i = 1000000; i < 2000000; i++) {
+            if (bloomFilter.mightContain(i)) {
+                falsePositive++;
+            }
+        }
+
+        System.out.println("误判数量: " + falsePositive);
+        System.out.println("误判率: " + (falsePositive / 1000000.0));
+        // 输出：误判率约 1%
+    }
+}
+```
+
+---
+
+#### **Redis 布隆过滤器 (RedisBloom)**
+
+**安装**：
+```bash
+# Docker
+docker run -p 6379:6379 --name redis-redisbloom redislabs/rebloom:latest
+
+# 编译安装
+git clone https://github.com/RedisBloom/RedisBloom.git
+cd RedisBloom
+make setup
+make build
+```
+
+**使用**：
+```bash
+# 创建布隆过滤器
+BF.ADD userFilter 1001
+BF.ADD userFilter 1002
+BF.ADD userFilter 1003
+
+# 查询
+BF.EXISTS userFilter 1001  # 1 (存在)
+BF.EXISTS userFilter 9999  # 0 (不存在)
+
+# 批量添加
+BF.MADD userFilter 1004 1005 1006
+
+# 批量查询
+BF.MEXISTS userFilter 1001 9999 1004
+# 输出：1) (integer) 1 2) (integer) 0 3) (integer) 1
+```
+
+**Java 客户端（Redisson）**：
+```java
+// 获取布隆过滤器
+RBloomFilter<Long> bloomFilter = redisson.getBloomFilter("userFilter");
+
+// 初始化（预计100万数据，1%误判率）
+bloomFilter.tryInit(1000000L, 0.01);
+
+// 添加元素
+bloomFilter.add(1001L);
+bloomFilter.add(1002L);
+
+// 查询
+boolean exists = bloomFilter.contains(1001L); // true
+boolean notExists = bloomFilter.contains(9999L); // false
+
+// 删除布隆过滤器
+bloomFilter.delete();
+```
+
+---
+
+#### **布隆过滤器的局限性**
+
+1. **不支持删除**（传统）
+   - 删除一个 bit 可能影响其他元素
+   - 解决方案：使用 Counting Bloom Filter（但会增加内存占用）
+
+2. **误判率随数据量增加**
+   - 需要根据业务设置合理的大小
+   - 可以通过重建解决
+
+3. **需要预热**
+   - 系统启动时需要加载所有有效数据
+   - 数据变更时需要更新
+
+4. **不支持范围查询**
+   - 只能判断精确的 key 是否存在
+   - 无法查询 "user:1000 到 user:2000 之间有哪些"
+
+---
+
+### 4. 监控和预警
+
+#### **关键指标监控**
+
+```java
+@Component
+public class CacheMonitor {
+
+    @Autowired
+    private RedisTemplate redisTemplate;
+
+    private final AtomicLong cacheHits = new AtomicLong(0);
+    private final AtomicLong cacheMisses = new AtomicLong(0);
+
+    // 记录缓存命中
+    public void recordHit() {
+        cacheHits.incrementAndGet();
+    }
+
+    // 记录缓存未命中
+    public void recordMiss() {
+        cacheMisses.incrementAndGet();
+    }
+
+    // 计算缓存命中率
+    public double getHitRate() {
+        long hits = cacheHits.get();
+        long misses = cacheMisses.get();
+        long total = hits + misses;
+        return total == 0 ? 0 : (double) hits / total;
+    }
+
+    // 定时上报监控数据
+    @Scheduled(fixedRate = 60000)
+    public void reportMetrics() {
+        double hitRate = getHitRate();
+        long dbQueryCount = cacheMisses.get();
+
+        // 上报到监控系统（如 Prometheus、Grafana）
+        Metrics.gauge("cache.hit.rate", hitRate);
+        Metrics.gauge("cache.db.query.count", dbQueryCount);
+
+        // 告警判断
+        if (hitRate < 0.8) {
+            alert("缓存命中率过低: " + hitRate);
+        }
+
+        if (dbQueryCount > 10000) {
+            alert("数据库查询量过大: " + dbQueryCount);
+        }
+    }
+}
+```
+
+---
+
+#### **Prometheus + Grafana 监控**
+
+**依赖**：
+```xml
+<dependency>
+    <groupId>io.micrometer</groupId>
+    <artifactId>micrometer-registry-prometheus</artifactId>
+</dependency>
+```
+
+**配置**：
+```yaml
+# application.yml
+management:
+  endpoints:
+    web:
+      exposure:
+        include: prometheus
+  metrics:
+    export:
+      prometheus:
+        enabled: true
+```
+
+**自定义指标**：
+```java
+@Component
+public class CacheMetrics {
+
+    private final Counter cacheHits;
+    private final Counter cacheMisses;
+    private final Timer dbQueryTimer;
+
+    public CacheMetrics(MeterRegistry registry) {
+        this.cacheHits = Counter.builder("cache.hits")
+            .description("Cache hit count")
+            .register(registry);
+
+        this.cacheMisses = Counter.builder("cache.misses")
+            .description("Cache miss count")
+            .register(registry);
+
+        this.dbQueryTimer = Timer.builder("db.query.duration")
+            .description("Database query duration")
+            .register(registry);
+    }
+
+    public void recordCacheHit() {
+        cacheHits.increment();
+    }
+
+    public void recordCacheMiss() {
+        cacheMisses.increment();
+    }
+
+    public void recordDbQuery(Runnable query) {
+        dbQueryTimer.record(query);
+    }
+}
+```
+
+**Grafana 监控面板**：
+```
+1. 缓存命中率趋势图
+   - Query: rate(cache_hits[5m]) / (rate(cache_hits[5m]) + rate(cache_misses[5m]))
+
+2. 数据库 QPS 趋势图
+   - Query: rate(cache_misses[5m])
+
+3. Redis 慢查询统计
+   - Query: redis_slowlog_length
+
+4. Redis 内存使用率
+   - Query: redis_memory_used_bytes / redis_memory_max_bytes
+```
+
+---
+
+#### **告警规则**
+
+```yaml
+# Prometheus 告警规则
+groups:
+  - name: cache_alerts
+    rules:
+      # 缓存命中率过低
+      - alert: LowCacheHitRate
+        expr: |
+          rate(cache_hits[5m]) / (rate(cache_hits[5m]) + rate(cache_misses[5m])) < 0.8
+        for: 5m
+        labels:
+          severity: warning
+        annotations:
+          summary: "缓存命中率低于 80%"
+          description: "当前命中率: {{ $value }}"
+
+      # 数据库压力过大
+      - alert: HighDatabaseLoad
+        expr: rate(cache_misses[5m]) > 1000
+        for: 3m
+        labels:
+          severity: critical
+        annotations:
+          summary: "数据库压力过大"
+          description: "数据库 QPS: {{ $value }}"
+
+      # Redis 宕机
+      - alert: RedisDown
+        expr: up{job="redis"} == 0
+        for: 1m
+        labels:
+          severity: critical
+        annotations:
+          summary: "Redis 实例宕机"
+```
+
+---
+
+### 5. 实际项目经验（加分项）
+
+#### **案例 1：电商首页缓存雪崩**
+
+**问题**：
+系统重启后，大量商品缓存同时失效，导致数据库 QPS 从 500 飙升到 50000+。
+
+**解决方案**：
+1. 过期时间加随机值（30 分钟 ± 5 分钟）
+2. 实现缓存预热机制（启动时加载 Top 1000 热点商品）
+3. 引入 Sentinel 限流（数据库 QPS 限制在 5000 以内）
+
+**效果**：
+- 缓存命中率稳定在 95% 以上
+- 数据库 QPS 下降到正常水平
+
+---
+
+#### **案例 2：恶意爬虫导致缓存穿透**
+
+**问题**：
+恶意爬虫使用递增的用户 ID（从 1 到 1000 万）疯狂爬取用户信息，导致数据库被打挂。
+
+**解决方案**：
+1. 接口限流（单 IP 每秒最多 10 次请求）
+2. 布隆过滤器过滤无效 ID（预加载 500 万真实用户 ID）
+3. 缓存空对象（TTL = 5 分钟）
+
+**效果**：
+- 99% 的非法请求被布隆过滤器拦截
+- 数据库压力下降 95%
+
+---
+
+#### **案例 3：秒杀活动缓存击穿**
+
+**问题**：
+秒杀活动开始前，商品信息缓存过期，10 万 QPS 同时打到数据库。
+
+**解决方案**：
+1. 热点数据永不过期（使用逻辑过期）
+2. 异步重建缓存（允许返回 5 秒内的旧数据）
+3. 分布式锁保护数据库查询
+
+**效果**：
+- 数据库 QPS 从 10 万下降到 100
+- 用户体验几乎无影响
+
+---
+
+### 6. 总结对比表
+
+| 问题 | 原因 | 核心方案 | 适用场景 |
+|------|------|----------|----------|
+| **缓存穿透** | 查询不存在的数据 | 布隆过滤器 + 空对象缓存 | 防恶意攻击、数据量大的场景 |
+| **缓存击穿** | 热点 key 过期 | 互斥锁 / 逻辑过期 / 热点永不过期 | 秒杀、热点新闻等高并发场景 |
+| **缓存雪崩** | 大量 key 同时过期 | 过期时间随机 + 预热 + 高可用集群 | 系统重启、批量导入场景 |
+
+---
+
+### 7. 阿里 P7 加分项
+
+**深度理解**：
+- 理解布隆过滤器的数学原理和参数计算
+- 能设计多级缓存架构（本地缓存 + Redis + 数据库）
+- 理解缓存一致性的各种方案和权衡
+
+**实战能力**：
+- 有处理线上缓存雪崩/击穿/穿透的实际案例
+- 能设计完整的监控和告警体系
+- 有性能调优经验（如 Redis 连接池优化、Pipeline 批量操作）
+
+**架构设计**：
+- 能设计支持热点数据自动识别和预热的系统
+- 有灰度发布和降级预案
+- 考虑缓存数据的版本管理和回滚机制