# Kubernetes Core Architecture

## Architecture Overview

```
Control Plane:
├── API Server          → entry point for every operation; exposes the RESTful API
├── etcd                → stores all cluster state
├── Scheduler           → makes Pod scheduling decisions
└── Controller Manager  → runs the controllers (Deployment/ReplicaSet/Service, ...)

Worker Node:
├── kubelet             → node agent; manages the Pod lifecycle
├── kube-proxy          → network proxy; implements Service load balancing
└── Container Runtime   → runs the containers (containerd/CRI-O)
```

## Core Resource Objects
### Pod

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: order-pod
  labels:
    app: order-service
    version: v1.2.0
spec:
  containers:
  - name: order-service
    image: order-service:1.2.0
    ports:
    - containerPort: 8080
    resources:
      requests:
        cpu: "100m"
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"
    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 5
      failureThreshold: 3
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      failureThreshold: 3
    env:
    - name: SPRING_PROFILES_ACTIVE
      value: "production"
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: db-secret
          key: password
    volumeMounts:
    - name: config
      mountPath: /app/config
  volumes:
  - name: config
    configMap:
      name: order-config
```

### Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most 1 extra Pod during the rollout
      maxUnavailable: 0  # no Pod may become unavailable during the rollout
  template:
    metadata:
      labels:
        app: order-service
        version: v1.2.0
    spec:
      # Pod anti-affinity: spread replicas across different nodes
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: order-service
              topologyKey: kubernetes.io/hostname
      containers:
      - name: order-service
        image: order-service:1.2.0
        # ... same container settings as the Pod example above
```
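With `maxSurge: 1` and `maxUnavailable: 0`, the rollout bounds follow directly: at most `replicas + maxSurge` Pods exist at any moment, and at least `replicas - maxUnavailable` must stay available. A quick sketch of the arithmetic:

```shell
# Rolling update bounds for the Deployment above
replicas=3
max_surge=1        # extra Pods allowed above the desired count
max_unavailable=0  # Pods allowed to be unavailable during the rollout
echo "peak Pods during rollout: $((replicas + max_surge))"       # 4
echo "minimum available Pods:   $((replicas - max_unavailable))" # 3
```

Because `maxUnavailable` is 0, every new Pod must pass its readiness probe before an old one is terminated, which makes the rollout zero-downtime at the cost of briefly running one extra replica.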
### Service

```yaml
# ClusterIP (in-cluster access only)
apiVersion: v1
kind: Service
metadata:
  name: order-service
spec:
  selector:
    app: order-service
  ports:
  - port: 80         # Service port
    targetPort: 8080 # container port
  type: ClusterIP
---
# NodePort (reachable via <NodeIP>:30080);
# only the fields that differ from the ClusterIP example are shown
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30080
---
# LoadBalancer (cloud provider provisions an external load balancer);
# again, only the differing fields are shown
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
```
### Ingress

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/limit-rps: "100"  # requests per second per client IP (ingress-nginx)
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.example.com
    secretName: api-tls
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /orders
        pathType: Prefix
        backend:
          service:
            name: order-service
            port:
              number: 80
      - path: /users
        pathType: Prefix
        backend:
          service:
            name: user-service
            port:
              number: 80
```

## Configuration and Secrets Management
### ConfigMap

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: order-config
data:
  application.yaml: |
    server:
      port: 8080
    order:
      timeout: 5000
      max-retry: 3
  LOG_LEVEL: "INFO"
```
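Besides the volume mount shown in the Pod example, flat keys such as `LOG_LEVEL` can also be injected as environment variables with `envFrom`. A container-spec fragment sketching this:

```yaml
# Fragment of a Pod's container spec: import every key of
# order-config as an environment variable (application.yaml
# becomes an env var too, so envFrom suits flat key/value data).
containers:
- name: order-service
  image: order-service:1.2.0
  envFrom:
  - configMapRef:
      name: order-config
```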
### Secret

```bash
# Create a Secret from literals
kubectl create secret generic db-secret \
  --from-literal=username=admin \
  --from-literal=password=secret123

# Create a Secret from files
kubectl create secret generic tls-secret \
  --from-file=tls.crt=server.crt \
  --from-file=tls.key=server.key
```
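Note that Secret `data` values are base64-encoded, not encrypted; anyone who can read the Secret can decode them. A quick sketch using the literals from the commands above:

```shell
# Secret values in the `data` field are base64-encoded, not encrypted.
# Encoding the literals from the kubectl example:
printf '%s' admin | base64       # YWRtaW4=
printf '%s' secret123 | base64   # c2VjcmV0MTIz
# Decoding what the API server stores:
printf '%s' c2VjcmV0MTIz | base64 -d   # secret123
```

For real protection, restrict Secret access with RBAC and enable encryption at rest in etcd.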
## HPA (Horizontal Pod Autoscaler)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  # Custom metric (requires the Prometheus Adapter)
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Pods
        value: 4
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300  # scale down more conservatively
```
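Under the hood, the HPA derives its target as `desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)`. A sketch of that arithmetic against the 70% CPU target above (the current values are hypothetical):

```shell
# HPA core formula: desired = ceil(currentReplicas * currentMetric / targetMetric)
# Example: 4 replicas averaging 90% CPU against the 70% target.
current_replicas=4
current_cpu=90
target_cpu=70
# Integer ceiling division:
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "$desired"   # 6 -> scale out from 4 to 6 replicas
```

When several metrics are configured, as here, each one produces a proposal and the HPA acts on the largest.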
## Resource Quotas and Limits

```yaml
# Namespace-level resource quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"
    services: "20"
---
# LimitRange (default per-container resource limits)
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:            # default limits
      cpu: "200m"
      memory: "256Mi"
    defaultRequest:     # default requests
      cpu: "100m"
      memory: "128Mi"
    max:                # hard per-container ceiling
      cpu: "2"
      memory: "2Gi"
```
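With this LimitRange in place, a container submitted without any `resources` section is mutated at admission time. A sketch of the effect (the Pod name is illustrative):

```yaml
# Submitted with no resources section at all ...
apiVersion: v1
kind: Pod
metadata:
  name: bare-pod          # illustrative name
  namespace: production
spec:
  containers:
  - name: app
    image: order-service:1.2.0
# ... the LimitRange admission controller fills in:
#   resources:
#     requests: {cpu: 100m, memory: 128Mi}   # from defaultRequest
#     limits:   {cpu: 200m, memory: 256Mi}   # from default
```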
## Network Policies

```yaml
# Restrict access to order-service: only Pods in the production
# namespace, or Pods labeled role=frontend, may connect
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: order-service-netpol
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: order-service
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: production  # the namespace must actually carry this label
    - podSelector:
        matchLabels:
          role: frontend
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: mysql
    ports:
    - protocol: TCP
      port: 3306
  - to:  # allow DNS
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53
```
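Allow-listing policies like the one above are usually paired with a namespace-wide default deny, so that any traffic not explicitly permitted is dropped. A common baseline sketch (the policy name is illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all   # illustrative name
  namespace: production
spec:
  podSelector: {}          # selects every Pod in the namespace
  policyTypes:
  - Ingress
  - Egress
  # No ingress/egress rules listed -> all traffic is denied
  # unless another policy explicitly allows it.
```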
## Troubleshooting Cases

### Case 1: Pod stuck in `Pending`

Diagnosis:

```bash
kubectl describe pod <pod-name> -n production
# Common causes:
# 1. Not enough resources (Insufficient cpu/memory)
kubectl top nodes
# 2. Node selector / affinity constraints cannot be satisfied
kubectl get nodes --show-labels
# 3. PVC not bound
kubectl get pvc -n production
```

### Case 2: Pod in `CrashLoopBackOff`
```bash
# Check the logs (--previous shows the crashed container's output)
kubectl logs <pod-name> -n production --previous
# Check the events
kubectl describe pod <pod-name> -n production
# Common causes:
# - Application fails to start (bad config, unavailable dependency)
# - OOMKilled (memory limit too low)
# - Failing health checks (misconfigured readinessProbe/livenessProbe)
```

### Case 3: Service not reachable
```bash
# Check the Endpoints
kubectl get endpoints order-service -n production
# If the Endpoints list is empty, check whether the selector matches any Pods
kubectl get pods -n production -l app=order-service
# Check kube-proxy
kubectl get pods -n kube-system | grep kube-proxy
# Test DNS resolution
kubectl run debug --image=busybox --rm -it -- nslookup order-service.production.svc.cluster.local
```

## Monitoring Metrics
```bash
# Node resource usage
kubectl top nodes
# Pod resource usage
kubectl top pods -n production --sort-by=memory
# Cluster events
kubectl get events -n production --sort-by='.lastTimestamp'
```