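Envoy: High-Performance Proxy
Architecture
Envoy is a high-performance L4/L7 proxy open-sourced by Lyft. It is the core data-plane component of Istio and can also be deployed standalone.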
Downstream ──► Listener ──► Filter Chain ──► Router ──► Cluster ──► Upstream
HTTP Filters (attached to the Filter Chain):
- JWT Auth
- Rate Limit
- CORS
- Lua
- WASM
xDS API (Dynamic Configuration)
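Envoy fetches its configuration dynamically from a control plane (such as Istiod) through the xDS APIs:
| API | Description |
|---|---|
| LDS | Listener Discovery Service: listener configuration |
| RDS | Route Discovery Service: route configuration |
| CDS | Cluster Discovery Service: cluster configuration |
| EDS | Endpoint Discovery Service: endpoint configuration |
| SDS | Secret Discovery Service: certificates and keys |
| ADS | Aggregated Discovery Service: all xDS resource types over a single stream |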
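When the resource types are delivered independently, the xDS protocol documentation recommends a make-before-break sequence so that a route never points at a cluster Envoy has not received yet: CDS first, then EDS, LDS, and RDS. The sketch below sorts a batch of pending updates into that sequence. `ordered_updates` and the `(api, resource_name)` tuples are hypothetical names used only for illustration, not part of any Envoy or Istio API, and placing SDS last is an assumption of this sketch.

```python
# Hypothetical helper: sort pending xDS updates so dependencies are pushed first.

# v3 resource type URL served by each discovery service.
TYPE_URLS = {
    "CDS": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
    "EDS": "type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",
    "LDS": "type.googleapis.com/envoy.config.listener.v3.Listener",
    "RDS": "type.googleapis.com/envoy.config.route.v3.RouteConfiguration",
    "SDS": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.Secret",
}

# Recommended sequence: clusters, their endpoints, listeners, then routes.
PUSH_ORDER = ["CDS", "EDS", "LDS", "RDS"]


def ordered_updates(pending):
    """Sort (api, resource_name) tuples by PUSH_ORDER.

    The sort is stable, so updates for the same API keep their relative order.
    APIs outside PUSH_ORDER (here SDS) are placed last, which is an assumption.
    """
    rank = {api: index for index, api in enumerate(PUSH_ORDER)}
    return sorted(pending, key=lambda update: rank.get(update[0], len(PUSH_ORDER)))


if __name__ == "__main__":
    batch = [("RDS", "local_route"), ("LDS", "listener_0"), ("CDS", "order_service")]
    for api, name in ordered_updates(batch):
        print(api, name, TYPE_URLS[api])
```

With ADS, a control plane can enforce this sequence on one stream, which is a common reason to prefer it over separate per-type streams.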
Static Configuration Example
```yaml
# envoy.yaml
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          access_log:
          - name: envoy.access_loggers.stdout
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/api/orders"
                route:
                  cluster: order_service
                  timeout: 5s
                  retry_policy:
                    retry_on: "5xx,connect-failure"
                    num_retries: 3
  clusters:
  - name: order_service
    connect_timeout: 1s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: order_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: order-service
                port_value: 8080
    health_checks:
    - timeout: 1s
      interval: 5s
      unhealthy_threshold: 3
      healthy_threshold: 2
      http_health_check:
        path: /health
admin:
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901
```
Circuit Breaker Configuration
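```yaml
clusters:
- name: order_service
  circuit_breakers:
    thresholds:
    - priority: DEFAULT
      max_connections: 100        # maximum number of connections
      max_pending_requests: 100   # maximum number of queued (pending) requests
      max_requests: 1000          # maximum number of concurrent requests
      max_retries: 3              # maximum number of concurrent retries
  outlier_detection:
    consecutive_5xx: 5            # eject a host after 5 consecutive 5xx responses
    interval: 30s
    base_ejection_time: 30s
    max_ejection_percent: 50
```
TLS Configuration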
```yaml
# Downstream TLS (accept HTTPS)
filter_chains:
- transport_socket:
    name: envoy.transport_sockets.tls
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
      common_tls_context:
        tls_certificates:
        - certificate_chain:
            filename: /etc/ssl/certs/server.crt
          private_key:
            filename: /etc/ssl/private/server.key
# Upstream mTLS (mutual authentication with the upstream service)
clusters:
- name: order_service
  transport_socket:
    name: envoy.transport_sockets.tls
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
      common_tls_context:
        tls_certificates:
        - certificate_chain:
            filename: /etc/ssl/certs/client.crt
          private_key:
            filename: /etc/ssl/private/client.key
        validation_context:
          trusted_ca:
            filename: /etc/ssl/certs/ca.crt
```
Observability
Access Log Format
```yaml
access_log:
- name: envoy.access_loggers.stdout
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
    log_format:
      json_format:
        timestamp: "%START_TIME%"
        method: "%REQ(:METHOD)%"
        path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
        protocol: "%PROTOCOL%"
        response_code: "%RESPONSE_CODE%"
        duration: "%DURATION%"
        upstream_host: "%UPSTREAM_HOST%"
        upstream_cluster: "%UPSTREAM_CLUSTER%"
        request_id: "%REQ(X-REQUEST-ID)%"
        bytes_sent: "%BYTES_SENT%"
        bytes_received: "%BYTES_RECEIVED%"
```
Admin API
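```bash
# Show cluster status
curl http://localhost:9901/clusters
# Show listeners
curl http://localhost:9901/listeners
# Show the route configuration
curl http://localhost:9901/config_dump | python3 -m json.tool | grep -A 20 "route_config"
# Show metrics in Prometheus format
curl http://localhost:9901/stats/prometheus
# Change the log level at runtime
curl -X POST "http://localhost:9901/logging?level=debug"
# Mark the health check as passing (undoes /healthcheck/fail; mutating admin endpoints require POST)
curl -X POST http://localhost:9901/healthcheck/ok
```
Envoy as a Sidecar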
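In Kubernetes, Istio automatically injects an Envoy sidecar into each pod.
Pod traffic interception (iptables):
- All inbound traffic → port 15006 (Envoy) → application
- All outbound traffic → port 15001 (Envoy) → destination service
Exceptions:
- Traffic to 127.0.0.1 is not intercepted (local communication).
- Port 15090 (Envoy metrics) is not intercepted.
Troubleshooting Cases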
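Case 1: Upstream Connection Timeout
Symptom: the Envoy log shows `upstream_reset_before_response_started{connection_termination}`, meaning the upstream connection was reset before any response was received.
Troubleshooting:
```bash
# Check the health of the upstream cluster
curl http://localhost:9901/clusters | grep order_service
# Check the connection pool statistics
curl http://localhost:9901/stats | grep "order_service.upstream_cx"
```
Resolution:
- Verify that the upstream service is healthy.
- Tune `connect_timeout` and `circuit_breakers`.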
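The two admin queries in this case can also be scripted rather than filtered with `grep`. The sketch below assumes the plain-text formats those endpoints return: `cluster::host::key::value` lines from `/clusters` and `name: value` lines from `/stats`. `fetch`, `unhealthy_hosts`, and `connection_stats` are hypothetical helper names, and the sample lines in the usage section are made-up data in that format.

```python
from urllib.request import urlopen

ADMIN = "http://localhost:9901"  # admin address used throughout this document


def fetch(path):
    """Return the body of an admin endpoint such as /clusters or /stats."""
    with urlopen(ADMIN + path, timeout=5) as resp:
        return resp.read().decode()


def unhealthy_hosts(clusters_text, cluster):
    """Map host -> health flags for hosts of `cluster` that are not healthy."""
    result = {}
    for line in clusters_text.splitlines():
        # Split from the right so an IPv6 host containing "::" stays intact.
        parts = line.rsplit("::", 2)
        if len(parts) != 3 or parts[1] != "health_flags":
            continue
        owner, _, host = parts[0].partition("::")
        if owner == cluster and parts[2] != "healthy":
            result[host] = parts[2]
    return result


def connection_stats(stats_text, cluster):
    """Collect the integer upstream_cx_* counters and gauges for `cluster`."""
    scope = f"cluster.{cluster}."
    stats = {}
    for line in stats_text.splitlines():
        name, sep, value = line.partition(": ")
        # Histogram lines have non-numeric values and are skipped.
        if sep and name.startswith(scope + "upstream_cx") and value.strip().isdigit():
            stats[name[len(scope):]] = int(value)
    return stats


if __name__ == "__main__":
    # Illustrative sample lines, not real Envoy output.
    sample = (
        "order_service::10.0.0.5:8080::health_flags::healthy\n"
        "order_service::10.0.0.6:8080::health_flags::/failed_active_hc\n"
    )
    print(unhealthy_hosts(sample, "order_service"))
```

Against a live proxy, `unhealthy_hosts(fetch("/clusters"), "order_service")` lists the hosts that health checking or outlier detection has taken out of rotation, and a rising `upstream_cx_connect_timeout` value from `connection_stats` points at `connect_timeout` being too short or the upstream being unreachable.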
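Case 2: Requests Rejected by the Circuit Breaker (Overflow)
Symptom: requests return 503 and the Envoy log shows upstream_overflow (response flag `UO` in the access log).
Cause: the number of concurrent requests exceeds `max_pending_requests` or `max_requests`.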
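As a rough mental model of how these thresholds interact, the sketch below decides what happens to one new request given the current load. It is a simplification under stated assumptions, not Envoy's implementation: it assumes an HTTP/1.1-style pool where each connection carries one request at a time, and `Thresholds` and `admit` are hypothetical names. The default values are the ones from the circuit breaker configuration earlier in this document.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Values from the circuit breaker configuration shown earlier.
    max_connections: int = 100
    max_pending_requests: int = 100
    max_requests: int = 1000


def admit(active_requests, busy_connections, pending_requests, limits):
    """Return 'dispatch', 'queue', or 'overflow' for one new request.

    Simplified model (assumption): one request per connection, and a request
    that cannot get a connection waits in the pending queue if there is room.
    """
    if active_requests >= limits.max_requests:
        return "overflow"  # too many requests in flight
    if busy_connections < limits.max_connections:
        return "dispatch"  # a connection can be used or opened
    if pending_requests < limits.max_pending_requests:
        return "queue"     # wait for a connection to free up
    return "overflow"      # pending queue is full: rejected with 503


if __name__ == "__main__":
    limits = Thresholds()
    print(admit(active_requests=100, busy_connections=100,
                pending_requests=100, limits=limits))
```

In this model, a burst that saturates all 100 connections and fills the 100 pending slots makes the next request overflow, which matches the 503 symptom. The `upstream_rq_pending_overflow` counter of the cluster shows how often that has happened.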
Resolution: raise the circuit breaker thresholds, for example:
```yaml
circuit_breakers:
  thresholds:
  - max_connections: 1000
    max_pending_requests: 1000
    max_requests: 10000
```
Case 3: Expired Certificate
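Symptom: the mTLS handshake fails and the log shows CERTIFICATE_VERIFY_FAILED.
Troubleshooting:
```bash
# Show the certificates loaded by Envoy
curl http://localhost:9901/certs
# Check the certificate validity period
openssl x509 -in /etc/ssl/certs/server.crt -noout -dates
```
Resolution: renew the certificate. Istio delivers certificates dynamically through SDS, so Envoy does not need to be restarted.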
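To catch expiry before the handshake starts failing, the `openssl x509 -noout -dates` output can be checked on a schedule. The sketch below parses the `notBefore=` and `notAfter=` lines that command prints and reports the remaining days. `parse_openssl_dates` and `days_left` are hypothetical helper names, the dates in the usage section are made-up sample values, and the parser assumes the default C locale for month names.

```python
from datetime import datetime, timezone


def parse_openssl_dates(output):
    """Parse `openssl x509 -noout -dates` output into UTC datetimes."""
    dates = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key in ("notBefore", "notAfter"):
            stamp = value.strip()
            if stamp.endswith(" GMT"):
                stamp = stamp[:-4]  # openssl prints these timestamps in GMT
            parsed = datetime.strptime(stamp, "%b %d %H:%M:%S %Y")
            dates[key] = parsed.replace(tzinfo=timezone.utc)
    return dates


def days_left(output, now=None):
    """Whole days until notAfter; negative once the certificate has expired."""
    now = now or datetime.now(timezone.utc)
    return (parse_openssl_dates(output)["notAfter"] - now).days


if __name__ == "__main__":
    # Illustrative sample in the format printed by openssl.
    sample = "notBefore=Jan  1 00:00:00 2024 GMT\nnotAfter=Mar 31 23:59:59 2024 GMT\n"
    print(days_left(sample, now=datetime(2024, 3, 1, tzinfo=timezone.utc)))
```

A cron job or readiness script could feed the real command output into `days_left` and alert below a chosen threshold. With Istio and SDS this is mainly a safety net, since certificates are rotated automatically.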