
yasinshaw d80d1cf553 feat: add infrastructure interview questions

Add comprehensive interview materials for:
- Service Mesh (Istio, Linkerd)
- RPC Framework (Dubbo, gRPC)
- Container Orchestration (Kubernetes)
- CI/CD (Jenkins, GitLab CI, GitHub Actions)
- Observability (Monitoring, Logging, Tracing)

Each file includes:
- 5-10 core questions
- Detailed standard answers
- Code examples
- Real-world project experience
- Alibaba P7 bonus points

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-03-01 00:06:28 +08:00


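RPC Frameworks

Questions

Background: In a distributed system, services need an efficient and reliable way to call each other. An RPC (Remote Procedure Call) framework hides the complexity of network communication so that a remote call looks as simple as a local one.

Questions:

What is RPC, and how does it differ from HTTP REST?
What is Dubbo's core architecture, and how does it work?
What are gRPC's advantages, and how does it achieve high performance?
Describe Dubbo's load balancing strategies.
How does Dubbo's service registration and discovery work?
How does an RPC framework serialize data, and what are the common serialization protocols?
How do you choose an RPC framework for a real project?
How does an RPC framework handle timeouts, retries, and circuit breaking?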

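Standard Answers

1. RPC vs HTTP REST

Definition of RPC:

A Remote Procedure Call (RPC) is a communication protocol that lets a program running on one computer call a subroutine on another computer without the developer explicitly coding the network interaction.

Comparison:

Feature	RPC (Dubbo/gRPC)	HTTP REST
Transport	Long-lived TCP connections (Dubbo protocol) / HTTP/2 (gRPC)	HTTP/1.1 (short-lived or keep-alive connections) / HTTP/2
Serialization	Binary (Hessian2/Protobuf)	Text (JSON/XML)
Performance	High (compact encoding, cheap parsing)	Moderate (text parsing overhead)
Ease of use	Requires an interface definition (Java interface or .proto)	No stub needed; can be called from a browser or curl
Coupling	Tighter (clients need the interface jar or generated stubs)	Looser
Traffic management	Needs RPC-aware gateways or proxies for external traffic	Works with standard HTTP infrastructure (Nginx, etc.)
Typical use	Internal microservice communication	Public APIs, cross-language integration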

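Code comparison:

RPC call (Dubbo):

// Service interface, shared by provider and consumer (the provider implements it)
public interface UserService {
    User getUserById(Long id);
}

// Service consumer: call the remote service as if it were a local method
@Component
public class UserClient {
    @DubboReference  // Dubbo 2.7.7+; older versions use @Reference
    private UserService userService;

    public void process() {
        User user = userService.getUserById(1L);
    }
}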

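HTTP REST call:

// Service provider
@RestController
@RequestMapping("/api/users")
public class UserController {
    private final UserService userService;

    public UserController(UserService userService) {
        this.userService = userService;
    }

    @GetMapping("/{id}")
    public User getUserById(@PathVariable Long id) {
        return userService.getById(id);
    }
}

// Service consumer
// Resolving the logical host "user-service" requires a @LoadBalanced RestTemplate
// (Spring Cloud); a plain RestTemplate needs a real host:port.
public class UserClient {
    private final RestTemplate restTemplate;

    public UserClient(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    public void process() {
        User user = restTemplate.getForObject("http://user-service/api/users/{id}", User.class, 1L);
    }
}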
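
To see the "remote call that reads like a local call" idea end to end without Dubbo, Python's standard library ships a minimal RPC stack, `xmlrpc`. It sends XML over HTTP instead of a binary protocol over long-lived TCP connections, so this sketch only illustrates the programming model, not Dubbo's performance. The name `getUserById` simply mirrors the Java example, and the threaded server class is an assumption added so the demo shuts down cleanly.

```python
import socketserver
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class ThreadedXMLRPCServer(socketserver.ThreadingMixIn, SimpleXMLRPCServer):
    daemon_threads = True  # handler threads never block interpreter exit


def get_user_by_id(user_id: int) -> dict:
    # Provider-side implementation (stand-in for a real lookup)
    return {"id": user_id, "name": "Alice"}


# Provider: port 0 asks the OS for any free port
server = ThreadedXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(get_user_by_id, "getUserById")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Consumer: the proxy turns a method call into an HTTP POST with an XML body
with ServerProxy(f"http://127.0.0.1:{port}/") as proxy:
    user = proxy.getUserById(1)  # reads exactly like a local call
print(user)  # {'id': 1, 'name': 'Alice'}

server.shutdown()
server.server_close()
```

The consumer never touches sockets or XML; the proxy object does the marshalling, which is the same role Dubbo's generated proxy plays for `@DubboReference` fields.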

2. Dubbo Core Architecture

Architecture diagram:

┌──────────────────────────────────────────┐
│       Registry (Zookeeper / Nacos)       │
└──────────────────────────────────────────┘
      ▲                           ▲    │
      │ 1. register  2. subscribe │    │ 3. notify
      │                           │    ▼
┌────────────┐                ┌────────────┐
│  Provider  │◄─ 4. invoke ───│  Consumer  │
└────────────┘                └────────────┘
      │                             │
      └───────► Monitor ◄───────────┘
                5. count (call statistics)

Internal layers (top to bottom; a call travels down the consumer's stack and up the provider's):

Service     business interfaces and implementations
Config      ServiceConfig / ReferenceConfig
Proxy       client-side proxy and server-side skeleton
Registry    service registration and discovery
Cluster     Directory, Router, LoadBalance, fault tolerance
Monitor     call counts and latency statistics
Protocol    Invoker / Exporter (RPC invocation)
Exchange    request/response semantics on top of the transport
Transport   network I/O, usually Netty (Channel)
Serialize   Hessian2, Protobuf, and other serializers
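
The layering in the diagram above can be illustrated with a toy in-process RPC: a consumer proxy turns a method call into a request (Proxy), picks a provider (Cluster), serializes the call (Serialize), and hands bytes to an endpoint (Protocol/Transport); the provider side reverses the steps. Everything here is hypothetical: JSON stands in for Hessian2, a dictionary of in-process handlers stands in for Netty connections, and the dictionary itself plays the role of the registry-fed Directory.

```python
import json
import random
from typing import Any, Callable, Dict

Handler = Callable[[bytes], bytes]  # stand-in for a network endpoint


def export(impl: object) -> Handler:
    """Provider side: Transport -> Serialize (decode) -> Service -> Serialize (encode)."""
    def handle(payload: bytes) -> bytes:
        request = json.loads(payload)
        result = getattr(impl, request["method"])(*request["args"])
        return json.dumps({"result": result}).encode()
    return handle


class UserServiceImpl:
    def get_user_by_id(self, user_id: int) -> dict:
        return {"id": user_id, "name": "Alice"}


class ConsumerProxy:
    """Consumer side: Proxy -> Cluster (pick a provider) -> Serialize -> Transport."""

    def __init__(self, directory: Dict[str, Handler], rng: random.Random) -> None:
        self.directory = directory  # address -> endpoint, normally fed by the registry
        self.rng = rng

    def __getattr__(self, method: str) -> Callable[..., Any]:
        def invoke(*args: Any) -> Any:
            address = self.rng.choice(sorted(self.directory))  # Cluster: random load balancing
            payload = json.dumps({"method": method, "args": list(args)}).encode()
            response = self.directory[address](payload)         # Protocol/Transport
            return json.loads(response)["result"]
        return invoke


directory = {
    "192.168.1.10:20880": export(UserServiceImpl()),
    "192.168.1.11:20880": export(UserServiceImpl()),
}
user_service = ConsumerProxy(directory, random.Random(0))
print(user_service.get_user_by_id(1))  # {'id': 1, 'name': 'Alice'}
```

Real Dubbo generates the proxy class (Javassist or JDK dynamic proxies) rather than relying on `__getattr__`, but the division of labor between layers is the same.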

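Core roles:

1. Container

Starts, loads, and runs the service provider
Usually a Spring container

2. Provider

The application that exposes services
Registers its services with the registry on startup

3. Consumer

The application that calls remote services
Subscribes to the services it needs on startup

4. Registry

Service registration and discovery
Not on the request path: consumers call providers directly, using the provider list pushed by the registry
Common implementations: Zookeeper, Nacos, Redis

5. Monitor

Collects call counts and call durations
Common implementations: Dubbo Admin, Prometheus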

Code example:

Provider configuration:

<?xml version="1.0" encoding="UTF-8"?>
<!-- provider.xml -->
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:dubbo="http://dubbo.apache.org/schema/dubbo"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd
       http://dubbo.apache.org/schema/dubbo
       http://dubbo.apache.org/schema/dubbo/dubbo.xsd">

    <!-- Provider application info -->
    <dubbo:application name="user-provider"/>

    <!-- Use the Zookeeper registry -->
    <dubbo:registry address="zookeeper://127.0.0.1:2181"/>

    <!-- Expose services over the dubbo protocol on port 20880 -->
    <dubbo:protocol name="dubbo" port="20880"/>

    <!-- Declare the service interface to expose -->
    <dubbo:service interface="com.example.UserService"
                   ref="userService" version="1.0.0"/>

    <!-- Service implementation -->
    <bean id="userService" class="com.example.UserServiceImpl"/>
</beans>

Consumer configuration:

<?xml version="1.0" encoding="UTF-8"?>
<!-- consumer.xml -->
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:dubbo="http://dubbo.apache.org/schema/dubbo"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd
       http://dubbo.apache.org/schema/dubbo
       http://dubbo.apache.org/schema/dubbo/dubbo.xsd">

    <!-- Consumer application info -->
    <dubbo:application name="user-consumer"/>

    <!-- Use the Zookeeper registry -->
    <dubbo:registry address="zookeeper://127.0.0.1:2181"/>

    <!-- Create a proxy for the remote service: 3000 ms timeout, up to 2 retries -->
    <dubbo:reference id="userService"
                     interface="com.example.UserService"
                     version="1.0.0"
                     timeout="3000"
                     retries="2"/>
</beans>

3. How gRPC Achieves High Performance

Key features:

1. HTTP/2 multiplexing

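HTTP/1.1 (one in-flight request per connection; parallelism needs more connections):
Request 1 ──► TCP Connection 1 ──► Response 1
Request 2 ──► TCP Connection 2 ──► Response 2
Request 3 ──► TCP Connection 3 ──► Response 3

HTTP/2 (concurrent streams share one connection; their frames are interleaved):
Request 1 ──┐                       ┌─► Response 1
Request 2 ──┼─► TCP Connection ─────┼─► Response 2
Request 3 ──┘                       └─► Response 3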
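
A toy model of multiplexing: each response is cut into frames tagged with a stream ID, frames from all streams are interleaved on one connection, and the receiver reassembles them by ID. This only illustrates the framing idea; real HTTP/2 frames also carry a 9-byte header, flow-control windows, and HPACK-compressed headers, none of which are modeled, and the 4-byte frame size is an arbitrary assumption.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import Dict, List, Tuple

Frame = Tuple[int, bytes]  # (stream ID, payload chunk)


def to_frames(stream_id: int, body: bytes, max_frame: int = 4) -> List[Frame]:
    # Split one message into DATA-like frames tagged with its stream ID
    return [(stream_id, body[i:i + max_frame]) for i in range(0, len(body), max_frame)]


# Three concurrent responses; client-initiated HTTP/2 streams use odd IDs
responses: Dict[int, bytes] = {1: b"response-one", 3: b"response-three", 5: b"resp-5"}

# Sender: interleave frames from all streams onto a single connection
per_stream = [to_frames(sid, body) for sid, body in responses.items()]
connection: List[Frame] = [
    frame for group in zip_longest(*per_stream) for frame in group if frame is not None
]

# Receiver: reassemble each message by its stream ID
received: Dict[int, bytes] = defaultdict(bytes)
for stream_id, chunk in connection:
    received[stream_id] += chunk

print([sid for sid, _ in connection[:6]])  # [1, 3, 5, 1, 3, 5]
```

Because a slow response only delays its own stream's frames, one large message no longer blocks the others at the HTTP layer (TCP-level head-of-line blocking still exists).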

2. Protobuf binary serialization

// user.proto
syntax = "proto3";

package user;

service UserService {
  rpc GetUser(GetUserRequest) returns (User);
  rpc ListUsers(ListUsersRequest) returns (stream User);  // server streaming, matching the Python examples below
}

message User {
  int64 id = 1;
  string name = 2;
  string email = 3;
}

message GetUserRequest {
  int64 id = 1;
}

message ListUsersRequest {
  int32 page = 1;
  int32 size = 2;
}

message ListUsersResponse {
  repeated User users = 1;
  int32 total = 2;
}

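Size comparison (same User record):

JSON: {"id":1,"name":"Alice","email":"alice@example.com"}
     └─ 51 bytes

Protobuf: [0x08 0x01 0x12 0x05 0x41 0x6C 0x69 0x63 0x65 0x1A 0x11 ...]
        └─ 28 bytes (about 45% smaller)

Protobuf sends small field numbers instead of field names and encodes integers as varints, which is where most of the savings come from.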
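
The Protobuf size for this record can be checked by hand-encoding the wire format with only the standard library. This sketch handles just the two wire types `User` needs (varint and length-delimited) and non-negative integers only; it is an illustration of the encoding, not a replacement for the protobuf library.

```python
import json


def varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte except the last."""
    if n < 0:
        raise ValueError("this sketch only encodes non-negative integers")
    out = bytearray()
    while n > 0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def key(field_number: int, wire_type: int) -> bytes:
    # Each field starts with (field_number << 3) | wire_type, itself encoded as a varint
    return varint((field_number << 3) | wire_type)


def encode_user(user_id: int, name: str, email: str) -> bytes:
    name_b, email_b = name.encode("utf-8"), email.encode("utf-8")
    return (key(1, 0) + varint(user_id)                    # int64 id = 1 (wire type 0: varint)
            + key(2, 2) + varint(len(name_b)) + name_b     # string name = 2 (wire type 2: length-delimited)
            + key(3, 2) + varint(len(email_b)) + email_b)  # string email = 3


pb = encode_user(1, "Alice", "alice@example.com")
js = json.dumps({"id": 1, "name": "Alice", "email": "alice@example.com"},
                separators=(",", ":")).encode("utf-8")
print(len(js), len(pb))  # 51 28
print(pb[:9].hex(" "))   # 08 01 12 05 41 6c 69 63 65
```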

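3. Streaming

# Server-streaming RPC (grpc.aio servicer methods; `database`, `UploadStatus`, and
# `process_message` are placeholders, and only ListUsers is defined in user.proto)
async def ListUsers(self, request, context):
    for user in database.iter_users():
        yield user  # send each message as soon as it is ready; no need to buffer all data

# Client-streaming RPC
async def UploadUsers(self, request_iterator, context):
    async for user_request in request_iterator:
        database.save(user_request.user)
    return UploadStatus(success=True)

# Bidirectional-streaming RPC
async def Chat(self, request_iterator, context):
    async for msg in request_iterator:
        response = process_message(msg)
        yield response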
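
The benefit of server streaming can be seen with an in-process analogy that uses no network and no grpc: a handler written as an async generator yields each message as soon as it is ready, and the client processes it before the stream has finished. `list_users` and `client` are hypothetical names; grpc.aio server-streaming handlers are written as async generators in the same style.

```python
import asyncio
from typing import AsyncIterator, Dict, List

events: List[str] = []


async def list_users(count: int) -> AsyncIterator[Dict[str, int]]:
    # Server-streaming handler: each message is yielded as soon as it is ready
    for i in range(count):
        await asyncio.sleep(0)  # stands in for per-row I/O, e.g. a database cursor
        events.append(f"sent {i}")
        yield {"id": i}


async def client() -> List[Dict[str, int]]:
    received = []
    async for user in list_users(3):
        events.append(f"got {user['id']}")  # processed before the stream has finished
        received.append(user)
    return received


users = asyncio.run(client())
print(events)  # ['sent 0', 'got 0', 'sent 1', 'got 1', 'sent 2', 'got 2']
```

The interleaved event log is the point: neither side ever holds the full result set in memory.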

Code example (Python):

Server:

from concurrent import futures

import grpc
import user_pb2
import user_pb2_grpc

# In-memory stand-in for a real database (illustration only)
USERS = {
    1: {"id": 1, "name": "Alice", "email": "alice@example.com"},
    2: {"id": 2, "name": "Bob", "email": "bob@example.com"},
}

class UserServiceImpl(user_pb2_grpc.UserServiceServicer):
    def GetUser(self, request, context):
        # Look up the user; return NOT_FOUND instead of crashing on a missing row
        user = USERS.get(request.id)
        if user is None:
            context.abort(grpc.StatusCode.NOT_FOUND, f"user {request.id} not found")
        return user_pb2.User(**user)

    def ListUsers(self, request, context):
        # Server-streaming response: yield one User per entry on the requested page
        start = request.page * request.size
        for user in list(USERS.values())[start:start + request.size]:
            yield user_pb2.User(**user)

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    user_pb2_grpc.add_UserServiceServicer_to_server(UserServiceImpl(), server)
    server.add_insecure_port('[::]:50051')
    server.start()
    server.wait_for_termination()

if __name__ == '__main__':
    serve()

Client:

import grpc
import user_pb2
import user_pb2_grpc

def run():
    with grpc.insecure_channel('localhost:50051') as channel:
        stub = user_pb2_grpc.UserServiceStub(channel)

        # Unary RPC with a 3-second deadline
        response = stub.GetUser(user_pb2.GetUserRequest(id=1), timeout=3.0)
        print(f"User: {response.name}")

        # Server-streaming RPC
        for user in stub.ListUsers(user_pb2.ListUsersRequest(page=0, size=10)):
            print(f"User: {user.name}")

if __name__ == '__main__':
    run()

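4. Dubbo Load Balancing Strategies

Strategy comparison:

Strategy	Description	Typical use
Random (default)	Weighted random selection of a provider	Instances with similar performance
RoundRobin	Weighted round robin	Instances whose capacity differs (set weights accordingly)
LeastActive	Prefers the provider with the fewest in-flight calls; ties broken by weighted random	Large performance differences between instances
ConsistentHash	Requests with the same arguments go to the same provider	Stateful services, cache affinity
ShortestResponse	Prefers the provider with the shortest estimated response time	Latency-sensitive calls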
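
Weighted random and weighted round robin can be sketched as follows. The round-robin variant is the "smooth" algorithm popularized by Nginx, which spreads a heavy provider's turns out instead of sending it a burst; recent Dubbo versions follow the same idea in RoundRobinLoadBalance, but this Python code is an illustration, not Dubbo's source, and the weights 5/1/1 are arbitrary.

```python
import random
from collections import Counter
from typing import Dict


def weighted_random(weights: Dict[str, int], rng: random.Random) -> str:
    # Pick an offset in [0, total) and walk the providers, subtracting weights
    offset = rng.randrange(sum(weights.values()))
    for name, weight in weights.items():
        offset -= weight
        if offset < 0:
            return name
    raise AssertionError("unreachable when all weights are positive")


class SmoothWeightedRoundRobin:
    """Smooth weighted round robin (the scheme popularized by Nginx)."""

    def __init__(self, weights: Dict[str, int]) -> None:
        self.weights = dict(weights)
        self.current = {name: 0 for name in weights}

    def select(self) -> str:
        total = sum(self.weights.values())
        for name, weight in self.weights.items():
            self.current[name] += weight  # every provider gains its weight
        chosen = max(self.current, key=self.current.__getitem__)
        self.current[chosen] -= total     # the winner pays back the total
        return chosen


weights = {"A": 5, "B": 1, "C": 1}
rr = SmoothWeightedRoundRobin(weights)
print([rr.select() for _ in range(7)])  # ['A', 'A', 'B', 'A', 'C', 'A', 'A']

rng = random.Random(42)
counts = Counter(weighted_random(weights, rng) for _ in range(7000))
```

Over one full cycle (7 picks here) both schemes honor the 5:1:1 weights; round robin does so exactly, random only on average.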

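Code example:

Configuring load balancing:

<!-- Weighted round robin; other values: random (default), leastactive, consistenthash, shortestresponse -->
<dubbo:reference id="userService"
                 interface="com.example.UserService"
                 loadbalance="roundrobin"
                 timeout="3000"/>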

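Custom load balancer:

import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

import org.apache.dubbo.common.URL;
import org.apache.dubbo.rpc.Invocation;
import org.apache.dubbo.rpc.Invoker;
import org.apache.dubbo.rpc.cluster.loadbalance.AbstractLoadBalance;

public class CustomLoadBalance extends AbstractLoadBalance {
    @Override
    protected <T> Invoker<T> doSelect(List<Invoker<T>> invokers, URL url, Invocation invocation) {
        // Custom logic, e.g. location-aware balancing: prefer a provider in the caller's location.
        // "location" is a hypothetical request attachment and provider URL parameter.
        String location = invocation.getAttachment("location");
        return invokers.stream()
            .filter(invoker -> location != null
                && location.equals(invoker.getUrl().getParameter("location")))
            .findFirst()
            // Otherwise pick a random provider instead of always the first one
            .orElse(invokers.get(ThreadLocalRandom.current().nextInt(invokers.size())));
    }
}

// Register through Dubbo SPI: create META-INF/dubbo/org.apache.dubbo.rpc.cluster.LoadBalance
// containing the line  custom=com.example.CustomLoadBalance
// and select it with loadbalance="custom"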

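How LeastActive works:

Provider A: Active = 5 (5 requests in flight)
Provider B: Active = 2 (2 requests in flight)
Provider C: Active = 8 (8 requests in flight)

Preference order: B > A > C
Reason: B has the lightest load, so it gets the next request. The active count is tracked on the consumer side, so a slow provider accumulates in-flight calls and automatically receives less traffic.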
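
The selection rule for the A/B/C example above can be sketched like this. The `Provider` record and the 100 default weight are assumptions for illustration; in this sketch the caller increments and decrements `active` around each call, whereas Dubbo maintains the count internally.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Provider:
    name: str
    weight: int = 100
    active: int = 0  # calls this consumer currently has in flight to the provider


def least_active(providers: List[Provider], rng: random.Random) -> Provider:
    fewest = min(p.active for p in providers)
    candidates = [p for p in providers if p.active == fewest]
    if len(candidates) == 1:
        return candidates[0]
    # Tie: weighted random among the least-active providers
    return rng.choices(candidates, weights=[p.weight for p in candidates])[0]


providers = [Provider("A", active=5), Provider("B", active=2), Provider("C", active=8)]
chosen = least_active(providers, random.Random(0))
chosen.active += 1  # increment when the call starts...
# ...and decrement when it completes (success or failure)
chosen.active -= 1
print(chosen.name)  # B
```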

5. Service Registration and Discovery

Zookeeper implementation:

Node layout (provider entries are ephemeral nodes):

/dubbo
  └─ com.example.UserService
      ├─ providers
      │   ├─ dubbo://192.168.1.10:20880/...?version=1.0.0
      │   ├─ dubbo://192.168.1.11:20880/...?version=1.0.0
      │   └─ dubbo://192.168.1.12:20880/...?version=1.0.0
      └─ consumers
          └─ consumer://192.168.1.20/...?version=1.0.0

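Workflow:

1. Provider starts
   ↓
2. Creates an ephemeral node /dubbo/.../providers/dubbo://ip:port/...
   ↓
3. Consumer starts
   ↓
4. Subscribes to (watches) /dubbo/.../providers/
   ↓
5. Fetches the provider list
   ↓
6. Watches for provider changes (new or offline instances)
   ↓
7. Updates its local cache dynamically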
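
The workflow can be simulated with a toy in-memory registry: providers register ephemeral entries owned by a session, consumers subscribe and keep a local copy, and expiring a session deletes its entries and notifies watchers. `ToyRegistry` and `Consumer` are hypothetical classes; a real ZooKeeper client adds sessions with timeouts, one-shot watches, and retries that are not modeled here.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Listener = Callable[[List[str]], None]


class ToyRegistry:
    """In-memory registry: provider URLs are ephemeral entries owned by a session."""

    def __init__(self) -> None:
        self.providers: DefaultDict[str, Dict[str, str]] = defaultdict(dict)  # service -> {url: session}
        self.listeners: DefaultDict[str, List[Listener]] = defaultdict(list)

    def register(self, service: str, url: str, session: str) -> None:
        self.providers[service][url] = session
        self._notify(service)

    def subscribe(self, service: str, listener: Listener) -> None:
        self.listeners[service].append(listener)
        listener(sorted(self.providers[service]))  # push the current list right away

    def expire_session(self, session: str) -> None:
        # Like ZooKeeper: when a session expires, its ephemeral nodes are deleted
        for service, urls in list(self.providers.items()):
            stale = [url for url, owner in urls.items() if owner == session]
            for url in stale:
                del urls[url]
            if stale:
                self._notify(service)

    def _notify(self, service: str) -> None:
        for listener in self.listeners[service]:
            listener(sorted(self.providers[service]))


class Consumer:
    def __init__(self, registry: ToyRegistry, service: str) -> None:
        self.cached_providers: List[str] = []  # local copy used on every call
        registry.subscribe(service, self._on_change)

    def _on_change(self, urls: List[str]) -> None:
        self.cached_providers = urls


registry = ToyRegistry()
service = "com.example.UserService"
registry.register(service, "dubbo://192.168.1.10:20880", session="s1")
consumer = Consumer(registry, service)
registry.register(service, "dubbo://192.168.1.11:20880", session="s2")
registry.expire_session("s1")  # provider 192.168.1.10 went away
print(consumer.cached_providers)  # ['dubbo://192.168.1.11:20880']
```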

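Code example (Zookeeper):

import org.apache.dubbo.config.RegistryConfig;

// Registry configuration
RegistryConfig registry = new RegistryConfig();
registry.setAddress("zookeeper://127.0.0.1:2181");
registry.setTimeout(5000);  // timeout for registry operations, in ms

// Or use Nacos (a separate variable, so both snippets compile together)
RegistryConfig nacosRegistry = new RegistryConfig();
nacosRegistry.setAddress("nacos://127.0.0.1:8848");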

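Health checking:

// Simplified sketch of Dubbo's heartbeat idea (not the actual Dubbo source).
// Channel and HeartbeatRequest are placeholder types.
public class HeartbeatTask implements Runnable {
    private final Channel channel;

    public HeartbeatTask(Channel channel) {
        this.channel = channel;
    }

    @Override
    public void run() {
        // Scheduled periodically for idle connections (Dubbo's default heartbeat interval is 60 s)
        channel.send(new HeartbeatRequest());
    }
}

// Zookeeper ephemeral-node behavior:
// - When the provider's ZooKeeper session expires (after the session timeout,
//   not at the instant the connection drops), its ephemeral node is deleted
// - Consumers receive a watch notification and remove that provider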
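
The decision a periodic heartbeat timer makes for each connection can be written as a pure function. The numbers are assumptions: a 60-second interval (Dubbo's default heartbeat setting) and treating a peer as dead after about three silent intervals; `Connection` and `heartbeat_action` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    last_read: float   # when we last received anything from the peer (seconds)
    last_write: float  # when we last sent anything to the peer (seconds)


def heartbeat_action(conn: Connection, now: float, interval: float = 60.0) -> str:
    """Decide what a periodic heartbeat timer should do with one connection."""
    if now - conn.last_read > 3 * interval:
        return "reconnect"       # silent for ~3 intervals: treat the peer as dead
    if now - conn.last_read > interval or now - conn.last_write > interval:
        return "send_heartbeat"  # idle in either direction: probe the peer
    return "ok"


print(heartbeat_action(Connection(last_read=0, last_write=0), now=30))     # ok
print(heartbeat_action(Connection(last_read=0, last_write=0), now=90))     # send_heartbeat
print(heartbeat_action(Connection(last_read=0, last_write=150), now=200))  # reconnect
```

Heartbeats cover what ZooKeeper cannot: a provider can keep its registry session alive while the consumer-to-provider connection is broken.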

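6. Serialization Protocols

Common serialization protocols:

Protocol	Pros	Cons	Typical use
Hessian2	Simple, efficient	Cross-language support is limited in practice (mostly Java)	Default in Dubbo 2.x
Protobuf	High performance, cross-language	Requires a .proto definition	gRPC
JSON	Human-readable, cross-language	Verbose, slower to parse	HTTP REST
Kryo	High performance	Java only	Dubbo (optional)
Avro	Dynamic schemas, cross-language	Somewhat slower	Hadoop ecosystem
FST	High performance, drop-in replacement for JDK serialization	Java only	Dubbo (optional)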

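Performance comparison (a rough ranking; actual results depend on data shape, library versions, and the benchmark):

Serialization speed, fastest to slowest:
Kryo > FST > Protobuf > Hessian > Avro > JSON

Serialized size, smallest to largest:
Protobuf ≈ Kryo < Hessian < Avro < JSON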

Code example (Protobuf):

// user.proto
syntax = "proto3";

message User {
    int64 id = 1;
    string name = 2;
    string email = 3;
    repeated string tags = 4;
}

# Compile the schema (generates user_pb2.py)
protoc --python_out=. user.proto

# Serialize in Python
import user_pb2

user = user_pb2.User()
user.id = 1
user.name = "Alice"
user.email = "alice@example.com"
user.tags.extend(["vip", "active"])

# Serialize
serialized = user.SerializeToString()  # binary bytes

# Deserialize
user2 = user_pb2.User()
user2.ParseFromString(serialized)
assert user2 == user

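7. Choosing an RPC Framework

Decision tree:

Need cross-language calls?
├─ Yes → gRPC (mature Protobuf code generation for many languages)
└─ No → keep going

Need high performance?
├─ Yes → Dubbo (long-lived TCP connections, Hessian serialization)
└─ No → keep going

Is simplicity the top priority?
├─ Yes → Spring Cloud OpenFeign (based on HTTP REST)
└─ No → Dubbo

Existing tech stack?
├─ Spring Cloud → OpenFeign/Dubbo
├─ Kubernetes → gRPC (service-mesh friendly)
└─ Dubbo → keep using Dubbo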

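Project experience:

Scenario 1: Internal e-commerce services

Choice: Dubbo
Reasons:
- Internal services, all on a Java stack
- High performance requirements (high-concurrency order placement)
- Needs load balancing, circuit breaking, and degradation

Configuration:
- Hessian serialization
- Zookeeper registry
- LeastActive load balancing

Scenario 2: Cross-language microservices

Choice: gRPC
Reasons:
- Backend in Java, data analysis in Python, AI services in Go
- Needs a single protocol for service-to-service communication
- Protobuf is fast and cross-language

Configuration:
- Interfaces defined in Protobuf
- HTTP/2 transport
- Code generated for each language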

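8. Timeouts, Retries, and Circuit Breaking

Timeout configuration:

<!-- Dubbo interface-level timeout: 3 seconds -->
<dubbo:reference id="userService"
                 interface="com.example.UserService"
                 timeout="3000"/>

<!-- Method-level timeouts override the interface-level value (an alternative to the declaration above; both use the same id) -->
<dubbo:reference id="userService"
                 interface="com.example.UserService">
    <dubbo:method name="getUserById" timeout="1000"/>
    <dubbo:method name="listUsers" timeout="5000"/>
</dubbo:reference>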
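
Dubbo resolves the effective timeout by precedence: method level beats interface level, which beats global settings, and at the same level the consumer's value beats the provider's. A small function makes the rule explicit; the function and parameter names are hypothetical, and the 1000 ms fallback is Dubbo's built-in default.

```python
from typing import Optional

DEFAULT_TIMEOUT_MS = 1000  # Dubbo's built-in default timeout


def effective_timeout(consumer_method: Optional[int] = None,
                      provider_method: Optional[int] = None,
                      consumer_interface: Optional[int] = None,
                      provider_interface: Optional[int] = None,
                      consumer_global: Optional[int] = None,
                      provider_global: Optional[int] = None) -> int:
    """Finer granularity wins; at the same granularity, the consumer's value wins."""
    for value in (consumer_method, provider_method,
                  consumer_interface, provider_interface,
                  consumer_global, provider_global):
        if value is not None:
            return value
    return DEFAULT_TIMEOUT_MS


# As in the method-level example above: 1000 ms for getUserById beats the 3000 ms default
print(effective_timeout(consumer_method=1000, consumer_interface=3000))  # 1000
print(effective_timeout(provider_method=500, consumer_interface=3000))   # 500
```

The second call shows the less obvious case: a provider's method-level value still beats a consumer's interface-level value.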

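Retries:

<!-- Retry up to 2 more times after a failure (3 attempts in total) -->
<dubbo:reference id="userService"
                 interface="com.example.UserService"
                 retries="2"/>

Flow (Dubbo's default failover cluster retries on a different provider when possible):
1st call → fails
    ↓
2nd call → fails
    ↓
3rd call → succeeds or fails

Note: only retry idempotent operations (such as queries). Never retry non-idempotent operations (such as creating an order): a call that timed out may still have succeeded on the provider, so a retry can create a duplicate.

<!-- Disable retries for non-idempotent methods -->
<dubbo:method name="createOrder" retries="0"/>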
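
The failover behavior can be sketched as a retry loop that prefers providers not yet tried and refuses to retry non-idempotent calls. `RpcTimeout`, `failover_invoke`, and `flaky` are hypothetical names; this is the general idea behind a failover cluster, not Dubbo's FailoverClusterInvoker.

```python
import random
from typing import Callable, List, Optional


class RpcTimeout(Exception):
    """Hypothetical error raised by the transport when a call times out."""


def failover_invoke(providers: List[str], call: Callable[[str], str], *,
                    retries: int = 2, idempotent: bool = True,
                    rng: Optional[random.Random] = None) -> str:
    """Failover-style retry: up to 1 + retries attempts, each on an untried provider."""
    rng = rng or random.Random()
    attempts = 1 + (retries if idempotent else 0)  # never retry non-idempotent calls
    tried: List[str] = []
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        untried = [p for p in providers if p not in tried] or providers
        provider = rng.choice(untried)
        tried.append(provider)
        try:
            return call(provider)
        except RpcTimeout as exc:  # note: the provider may still have executed the call
            last_error = exc
    assert last_error is not None
    raise last_error


calls: List[str] = []


def flaky(provider: str) -> str:
    # Hypothetical call that times out twice, then succeeds
    calls.append(provider)
    if len(calls) < 3:
        raise RpcTimeout(provider)
    return f"ok from {provider}"


result = failover_invoke(["10.0.0.1", "10.0.0.2", "10.0.0.3"], flaky, rng=random.Random(1))
print(len(calls), len(set(calls)))  # 3 3
```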

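Circuit breaking and fallback (Dubbo + Sentinel):

import com.alibaba.csp.sentinel.annotation.SentinelResource;
import com.alibaba.csp.sentinel.slots.block.BlockException;

// Circuit breaking with Sentinel (requires the SentinelResourceAspect bean in Spring);
// by default the handler methods must live in the same class as the annotated method
@SentinelResource(value = "getUserById",
    blockHandler = "handleBlock",
    fallback = "handleFallback")
public User getUserById(Long id) {
    return userService.getUserById(id);
}

// Called when a flow-control or circuit-breaking rule blocks the call
public User handleBlock(Long id, BlockException ex) {
    // Return a default value while the breaker is open
    return new User(-1L, "Default", "default@example.com");
}

// Called when the method itself throws an exception
public User handleFallback(Long id, Throwable ex) {
    // Return degraded data on errors
    return new User(-1L, "Fallback", "fallback@example.com");
}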

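Circuit-breaker rule configuration:

import java.util.ArrayList;
import java.util.List;
import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.degrade.DegradeRule;
import com.alibaba.csp.sentinel.slots.block.degrade.DegradeRuleManager;

// Sentinel degrade rule (Sentinel 1.8+ semantics)
List<DegradeRule> rules = new ArrayList<>();
DegradeRule rule = new DegradeRule();
rule.setResource("getUserById");
rule.setGrade(RuleConstant.DEGRADE_GRADE_RT);  // slow-call-ratio strategy
rule.setCount(100);               // a call slower than 100 ms counts as slow
rule.setSlowRatioThreshold(0.5);  // open when the slow-call ratio exceeds 50%
rule.setMinRequestAmount(5);      // ...and at least 5 calls were seen in the window
rule.setTimeWindow(10);           // stay open for 10 s, then allow a probe call
rules.add(rule);
DegradeRuleManager.loadRules(rules);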
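
The state machine behind a rule like this can be sketched in a few lines. This is a toy: it counts calls since the last state change instead of using Sentinel's sliding statistics window, and it admits calls while HALF_OPEN until the next recorded result decides the state. `CircuitBreaker` is a hypothetical class; its parameters loosely mirror the rule above (a 100 ms slow-call threshold and a 10 s open window) plus an assumed 50% ratio and 5-call minimum.

```python
class CircuitBreaker:
    """Toy slow-call-ratio breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED or OPEN."""

    def __init__(self, slow_ms: float, ratio: float, min_calls: int, open_seconds: float) -> None:
        self.slow_ms = slow_ms
        self.ratio = ratio
        self.min_calls = min_calls
        self.open_seconds = open_seconds
        self.state = "CLOSED"
        self.opened_at = 0.0
        self.total = 0
        self.slow = 0

    def allow(self, now: float) -> bool:
        if self.state == "OPEN" and now - self.opened_at >= self.open_seconds:
            self.state = "HALF_OPEN"  # the next recorded call acts as a probe
        return self.state != "OPEN"

    def record(self, rt_ms: float, now: float) -> None:
        is_slow = rt_ms > self.slow_ms
        if self.state == "HALF_OPEN":
            # The probe decides: slow -> open again, fast -> close
            if is_slow:
                self._trip(now)
            else:
                self._reset()
            return
        self.total += 1
        self.slow += int(is_slow)
        if self.total >= self.min_calls and self.slow / self.total >= self.ratio:
            self._trip(now)

    def _trip(self, now: float) -> None:
        self.state, self.opened_at = "OPEN", now
        self.total = self.slow = 0

    def _reset(self) -> None:
        self.state = "CLOSED"
        self.total = self.slow = 0


breaker = CircuitBreaker(slow_ms=100, ratio=0.5, min_calls=5, open_seconds=10)
for t in range(5):
    if breaker.allow(t):
        breaker.record(rt_ms=200, now=t)  # five slow calls in a row
print(breaker.state, breaker.allow(5), breaker.allow(14))  # OPEN False True
```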

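9. Project Experience

Scenario 1: Order system performance optimization

Problem: high latency (2 s) on the order-creation endpoint
Investigation:
1. Distributed tracing showed that the inventory service took the longest
2. The inventory service used HTTP REST with slow JSON serialization
3. A new connection was created for every call

Solution:
1. Migrated the inventory service from HTTP REST to Dubbo
2. Switched to Hessian serialization
3. Reused long-lived connections
4. Configured LeastActive load balancing

Result: latency dropped to 300 ms (an 85% reduction)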

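Scenario 2: Registry outage

Problem: the Zookeeper cluster failed, and service calls failed with it
Investigation:
The consumers relied on the registry for every provider lookup (no usable local cache), so they could not discover any services

Solution:
1. Dubbo caches the provider list locally by default
2. Configure the cache file
   <dubbo:registry address="zookeeper://127.0.0.1:2181"
                  file="${user.home}/output/dubbo.cache"/>
3. While the registry is down, running consumers keep using their in-memory provider list, and restarted consumers load it from the cache file

Result: the registry outage no longer affected existing service calls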
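
The cache-file fallback can be sketched as follows. `CachedDirectory` and `registry_down` are hypothetical, and the JSON file format is an assumption made for brevity; Dubbo writes its own properties-style cache file.

```python
import json
import os
import tempfile
from typing import Callable, List


class CachedDirectory:
    """Keeps the last known provider list on disk and falls back to it when the registry is down."""

    def __init__(self, cache_file: str) -> None:
        self.cache_file = cache_file
        self.providers: List[str] = []

    def on_registry_update(self, providers: List[str]) -> None:
        self.providers = list(providers)
        tmp = self.cache_file + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.providers, f)
        os.replace(tmp, self.cache_file)  # rename over the old file (atomic on POSIX)

    def start(self, fetch_from_registry: Callable[[], List[str]]) -> None:
        try:
            self.on_registry_update(fetch_from_registry())
        except ConnectionError:
            if os.path.exists(self.cache_file):
                with open(self.cache_file, encoding="utf-8") as f:
                    self.providers = json.load(f)


def registry_down() -> List[str]:
    raise ConnectionError("registry unavailable")


cache_path = os.path.join(tempfile.mkdtemp(), "dubbo.cache")
before_outage = CachedDirectory(cache_path)
before_outage.on_registry_update(["dubbo://192.168.1.10:20880"])

after_restart = CachedDirectory(cache_path)  # consumer restarted during the outage
after_restart.start(registry_down)
print(after_restart.providers)  # ['dubbo://192.168.1.10:20880']
```

Writing to a temporary file and renaming it means a crash mid-write never leaves a truncated cache behind.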

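Scenario 3: Serialization compatibility

Problem: after a service upgrade, calls from old clients failed
Causes:
- A newly added field used a type that could not be serialized
- Client and server versions were incompatible

Solution:
1. Protobuf is compatible by default: old readers skip fields they do not know, as long as field numbers are never reused or changed
2. With Hessian2, every transferred class must implement Serializable, and fields are matched by name, so add new fields instead of renaming existing ones or changing their types
3. Use version numbers to separate incompatible service versions
   <dubbo:service interface="..." version="1.0.0"/>
   <dubbo:service interface="..." version="2.0.0"/>
4. Upgrade gradually (canary release), shifting traffic step by step

Result: a smooth upgrade with zero downtime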
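
Why Protobuf tolerates added fields can be shown with a tiny decoder: an old reader parses each field's key, uses the wire type to know how many bytes to consume, and skips field numbers it does not know. This hand-written sketch covers only wire types 0 and 2 and non-negative integers; `decode_v1` and the helper functions are hypothetical, and the real protobuf runtimes go further by preserving unknown fields so they survive re-serialization.

```python
from typing import Dict, Tuple, Union


def encode_varint(n: int) -> bytes:
    out = bytearray()
    while n > 0x7F:  # non-negative integers only in this sketch
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def varint_field(number: int, value: int) -> bytes:
    return encode_varint(number << 3) + encode_varint(value)  # wire type 0


def string_field(number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return encode_varint((number << 3) | 2) + encode_varint(len(data)) + data  # wire type 2


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


V1_FIELDS = {1: "id", 2: "name", 3: "email"}  # what an old (v1) reader knows about


def decode_v1(buf: bytes) -> Dict[str, Union[int, str]]:
    out: Dict[str, Union[int, str]] = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            value: Union[int, bytes]
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        if number in V1_FIELDS:  # unknown field numbers are skipped, not an error
            out[V1_FIELDS[number]] = value.decode("utf-8") if isinstance(value, bytes) else value
    return out


# A v2 writer added `repeated string tags = 4`
v2_message = (varint_field(1, 1) + string_field(2, "Alice")
              + string_field(3, "alice@example.com")
              + string_field(4, "vip") + string_field(4, "active"))
print(decode_v1(v2_message))  # {'id': 1, 'name': 'Alice', 'email': 'alice@example.com'}
```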

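10. Alibaba P7 Bonus Points

Architecture design:

Has designed cluster architectures for large-scale RPC (millions of QPS)
Has built a custom RPC framework
Has integrated a service mesh with a traditional RPC framework

Deep understanding:

Knows the Dubbo source code (SPI mechanism, proxy design, cluster fault tolerance)
Understands the internals of gRPC's HTTP/2 and Protobuf
Has experience choosing and optimizing serialization protocols

Performance tuning:

Has tuned TCP parameters (connection pools, KeepAlive, buffer sizes)
Has tuned the JVM to reduce GC (fewer allocations, off-heap memory)
Has tuned network parameters (MTU, TCP_NODELAY)

Production practice:

Has solved TCP message framing problems (sticky and split packets)
Has handled serialization security vulnerabilities (e.g., Hessian deserialization RCE)
Has implemented graceful service onboarding and shutdown (registration warm-up, graceful shutdown)

Observability:

Has integrated distributed tracing (SkyWalking, Jaeger)
Has implemented RPC call-chain monitoring
Has designed service performance dashboards (QPS, latency, success rate)

Cross-language calls:

Has gRPC experience in multiple languages (Java, Go, Python)
Has solved Protobuf cross-language compatibility issues
Has implemented dynamic proxy generation (e.g., Python calling Java services)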
