Argo CD: GitOps Continuous Delivery

GitOps principles
Core GitOps principles:
1. Git is the single source of truth
2. Configuration is declarative
3. Changes are synced automatically (automated sync)
4. State is observable
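Principles 2 and 3 boil down to a comparison: the state declared in Git is compared with what is actually running, and any difference triggers a sync. The sketch below is a minimal illustration of that comparison, not how Argo CD is implemented. The helper `sync_status` and the file names are hypothetical, and a byte-level compare stands in for the structured manifest diff a real controller performs.

```bash
# Hypothetical helper: report whether the live manifest matches the desired one.
# Assumption: both states are available as plain files.
sync_status() {
  desired_file=$1
  live_file=$2
  if cmp -s "$desired_file" "$live_file"; then
    echo "Synced"
  else
    echo "OutOfSync"
  fi
}

# Demo with throwaway files
workdir=$(mktemp -d)
printf 'replicas: 3\n' > "$workdir/desired.yaml"
printf 'replicas: 3\n' > "$workdir/live.yaml"
sync_status "$workdir/desired.yaml" "$workdir/live.yaml"

printf 'replicas: 5\n' > "$workdir/live.yaml"   # simulate a manual change in the cluster
sync_status "$workdir/desired.yaml" "$workdir/live.yaml"
rm -rf "$workdir"
```

The first call reports `Synced`; after the simulated manual change the second reports `OutOfSync`, which is the condition an automated sync policy would react to.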
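Traditional CI/CD (push model):
code commit → CI build → CI pushes to K8s (kubectl apply)
GitOps (pull model):
code commit → CI builds the image → CI updates the image tag in the Git repository
Argo CD detects the Git change → syncs it to K8s automatically

Architecture overview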
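Git repository (desired state)
    └──► Argo CD (continuous sync) ──► Kubernetes (live state)
              │
     Application Controller
       - detects Git changes
       - compares desired state with live state
       - syncs automatically or manually

Installation and configuration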
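bash
# Install Argo CD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
# Get the initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d
# Access the UI (port-forward blocks the terminal; run the login from another shell)
kubectl port-forward svc/argocd-server -n argocd 8080:443
# Log in with the CLI (the default certificate is self-signed, so the CLI asks for
# confirmation; --insecure skips that prompt)
argocd login localhost:8080 --username admin --password <password>

Application configuration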
Basic Application
yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: order-service
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io  # deleting the App also deletes its K8s resources
spec:
  project: production
  source:
    repoURL: https://github.com/myorg/k8s-configs
    targetRevision: main
    path: apps/order-service/production
    # Helm chart source (alternative)
    # repoURL: https://charts.bitnami.com/bitnami
    # chart: redis
    # targetRevision: 18.x.x
    # helm:
    #   values: |
    #     auth:
    #       password: mypassword
  destination:
    server: https://kubernetes.default.svc  # must be permitted by the AppProject's destinations
    namespace: production
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual changes so the cluster stays consistent with Git
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
      - ApplyOutOfSyncOnly=true  # sync only the resources that changed
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m

ApplicationSet (multi-cluster / multi-environment)
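An ApplicationSet generator produces one set of parameters per element, and the template is rendered once per set. As a rough illustration of that substitution for the `name` field, the sketch below uses plain shell rather than the ApplicationSet controller. `render_app_names` is a hypothetical helper, and the env/cluster pairs mirror the elements used in this section.

```bash
# Hypothetical helper: read "env cluster" pairs from stdin and print the Application
# name that the template "order-service-{{env}}-{{cluster}}" would yield for each pair.
render_app_names() {
  while read -r env_name cluster_name; do
    [ -n "$env_name" ] || continue          # skip blank lines
    printf 'order-service-%s-%s\n' "$env_name" "$cluster_name"
  done
}

render_app_names <<'EOF'
dev dev-cluster
staging staging-cluster
prod prod-cluster-us
prod prod-cluster-eu
EOF
```

The two `prod` elements still produce distinct Application names because the cluster is part of the name.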
yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: order-service-appset
  namespace: argocd
spec:
  generators:
    # List generator: one element per environment/cluster pair.
    # (A matrix generator combines exactly two child generators, so a single
    # list is used directly here.)
    - list:
        elements:
          - env: dev
            cluster: dev-cluster
            namespace: development
          - env: staging
            cluster: staging-cluster
            namespace: staging
          - env: prod
            cluster: prod-cluster-us
            namespace: production
          - env: prod
            cluster: prod-cluster-eu
            namespace: production
  template:
    metadata:
      name: "order-service-{{env}}-{{cluster}}"
    spec:
      project: "{{env}}"
      source:
        repoURL: https://github.com/myorg/k8s-configs
        targetRevision: main
        path: "apps/order-service/{{env}}"
        helm:
          valueFiles:
            - "values-{{env}}.yaml"
      destination:
        server: "https://{{cluster}}.example.com"
        namespace: "{{namespace}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true

Multi-cluster management
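bash
# Register an external cluster (the argument is a context name from the kubeconfig)
argocd cluster add production-cluster \
  --kubeconfig ~/.kube/production.yaml \
  --name production
# List registered clusters
argocd cluster list

AppProject (access isolation)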
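An AppProject acts as an allow-list: an Application is accepted only if its destination matches one of the project's `destinations` entries. The sketch below mimics that check with exact string matching. `destination_allowed` is a hypothetical helper, the allow-list mirrors the manifest in this section, and the real check also supports wildcard patterns, which this sketch ignores.

```bash
# Allow-list of "server namespace" pairs, mirroring the AppProject in this section.
ALLOWED_DESTINATIONS='https://prod-cluster.example.com production
https://prod-cluster.example.com monitoring'

# Hypothetical helper: succeed only if the exact server/namespace pair is on the allow-list.
destination_allowed() {
  printf '%s\n' "$ALLOWED_DESTINATIONS" | grep -Fxq -- "$1 $2"
}

for dest in \
  "https://prod-cluster.example.com production" \
  "https://kubernetes.default.svc production"
do
  # $dest is intentionally unquoted so it splits into server and namespace
  if destination_allowed $dest; then
    echo "allowed: $dest"
  else
    echo "denied:  $dest"
  fi
done
```

The second case is denied, which suggests that an Application targeting the in-cluster address `https://kubernetes.default.svc` would need that server added to the project's destinations.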
yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: production
  namespace: argocd
spec:
  description: Production environment
  # Allowed source repositories
  sourceRepos:
    - https://github.com/myorg/k8s-configs
    - https://charts.bitnami.com/bitnami
  # Allowed destination clusters and namespaces
  destinations:
    - server: https://prod-cluster.example.com
      namespace: production
    - server: https://prod-cluster.example.com
      namespace: monitoring
  # Denied cluster-scoped resource kinds (guards against accidental changes)
  clusterResourceBlacklist:
    - group: ""
      kind: Namespace
  # Allowed namespace-scoped resource kinds
  namespaceResourceWhitelist:
    - group: "apps"
      kind: Deployment
    - group: ""
      kind: Service
  # RBAC roles
  roles:
    - name: developer
      description: Developer role
      policies:
        - p, proj:production:developer, applications, get, production/*, allow
        - p, proj:production:developer, applications, sync, production/*, allow
      groups:
        - myorg:developers

Progressive delivery (Argo Rollouts)
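Analysis-driven canaries hinge on a simple rule: each measurement is checked against a success condition, and the rollout is aborted once too many measurements fail. The sketch below applies the thresholds used in this section (a success rate of at least 0.95 and `failureLimit: 3`) to a list of sample values. It is an illustration in plain shell and awk, not Argo Rollouts code. `analysis_verdict` and the sample numbers are made up, and the sketch assumes the run fails once the number of failed measurements exceeds the limit.

```bash
# Hypothetical helper: read one success-rate measurement per line from stdin.
# $1 = minimum acceptable rate, $2 = failure limit.
analysis_verdict() {
  awk -v min="$1" -v limit="$2" '
    NF == 0 { next }                       # ignore blank lines
    $1 + 0 < min + 0 { failed++ }          # measurement violates the success condition
    END {
      if (failed + 0 > limit + 0) print "Failed"
      else print "Successful"
    }'
}

# One measurement below 0.95: within the limit
printf '%s\n' 0.99 0.97 0.93 0.98 | analysis_verdict 0.95 3
# Four measurements below 0.95: limit exceeded
printf '%s\n' 0.91 0.90 0.94 0.89 0.99 | analysis_verdict 0.95 3
```

The first series prints `Successful` and the second prints `Failed`, the point at which a canary would be rolled back rather than promoted.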
yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: order-service
spec:
  replicas: 10
  selector:
    matchLabels:
      app: order-service
  template:
    # ... Pod template
  strategy:
    canary:
      # Canary release steps
      steps:
        - setWeight: 10            # send 10% of traffic to the new version
        - pause: {duration: 5m}    # wait 5 minutes
        - setWeight: 30
        - pause: {duration: 10m}
        - setWeight: 60
        - pause: {duration: 10m}
        - setWeight: 100
      # Automated analysis (based on Prometheus metrics)
      analysis:
        templates:
          - templateName: success-rate
        startingStep: 2
        args:
          - name: service-name
            value: order-service
      # Traffic routing (requires a traffic provider such as Istio or NGINX)
      trafficRouting:
        istio:
          virtualService:
            name: order-service-vs
          destinationRule:
            name: order-service-dr
            canarySubsetName: canary
            stableSubsetName: stable
---
# Analysis template
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m
      successCondition: result[0] >= 0.95
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status!~"5.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))

Troubleshooting cases
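Case 1: Application stays in the OutOfSync state
Diagnosis:
bash
# Show the differences between Git and the cluster
argocd app diff order-service
# Show sync status details
argocd app get order-service
# Common causes:
# 1. A resource was changed manually (not reverted automatically when selfHeal=false)
# 2. Annotations/labels are added to resources automatically by K8s
#    (such fields can be excluded from the comparison with spec.ignoreDifferences)
# 3. The Helm-rendered manifests differ from what is running in the cluster
Resolution: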
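bash
# Sync manually (a normal sync already overwrites manual changes)
argocd app sync order-service
# Force sync (uses a forced apply, which may delete and recreate resources)
argocd app sync order-service --force

Case 2: Sync failed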
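A failed sync is not necessarily final: with a `retry` policy like the one in the basic Application (limit 5, duration 5s, factor 2, maxDuration 3m), Argo CD retries with growing delays. The sketch below works out those delays under a simplified reading of the backoff settings, namely that each delay is the previous one multiplied by the factor and capped at the maximum, so exact timing may differ. `backoff_schedule` is a hypothetical helper.

```bash
# Hypothetical helper: print the delay in seconds before each retry.
# $1 = initial delay (s), $2 = factor, $3 = max delay (s), $4 = retry limit
backoff_schedule() {
  delay=$1
  attempt=1
  while [ "$attempt" -le "$4" ]; do
    printf 'retry %d: wait %ds\n' "$attempt" "$delay"
    delay=$((delay * $2))
    if [ "$delay" -gt "$3" ]; then
      delay=$3                      # cap at maxDuration
    fi
    attempt=$((attempt + 1))
  done
}

# duration 5s, factor 2, maxDuration 3m (180s), limit 5
backoff_schedule 5 2 180 5
```

Under these assumptions the waits are 5s, 10s, 20s, 40s and 80s, so the 3m cap is never reached with a limit of 5.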
bash
# Show why the sync operation failed
argocd app get order-service --show-operation
# Check K8s events
kubectl get events -n production --sort-by='.lastTimestamp'

Case 3: Git repository connection failure
bash
# Check repository status
argocd repo list
# Re-add the repository credentials
argocd repo add https://github.com/myorg/k8s-configs \
  --username myuser \
  --password mytoken

Best practices
Repository structure
k8s-configs/
  apps/
    order-service/
      base/                      # base configuration (Kustomize)
        deployment.yaml
        service.yaml
        kustomization.yaml
      overlays/
        dev/
          kustomization.yaml     # overrides for the dev environment
          patch-replicas.yaml
        prod/
          kustomization.yaml
          patch-replicas.yaml
          patch-resources.yaml
  # or the Helm layout
  helm/
    order-service/
      Chart.yaml
      values.yaml
      values-dev.yaml
      values-prod.yaml
      templates/

Automated image updates
yaml
# Argo CD Image Updater
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  annotations:
    argocd-image-updater.argoproj.io/image-list: order=myregistry/order-service
    argocd-image-updater.argoproj.io/order.update-strategy: semver
    argocd-image-updater.argoproj.io/order.allow-tags: regexp:^v[0-9]+\.[0-9]+\.[0-9]+$
    argocd-image-updater.argoproj.io/write-back-method: git
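The annotations above combine a tag filter (`allow-tags`) with the `semver` update strategy. The sketch below imitates that selection in plain shell, assuming the candidate tags are already known and that `sort -V` (version sort, as provided by GNU coreutils) is a good enough stand-in for semantic-version ordering. `pick_latest_tag` and the sample tags are hypothetical, and Image Updater itself queries the registry.

```bash
# Hypothetical helper: read candidate tags from stdin, keep those matching the
# allow-tags pattern vMAJOR.MINOR.PATCH, and print the highest version.
pick_latest_tag() {
  grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' | sort -V | tail -n 1
}

printf '%s\n' v1.2.0 latest v1.10.0 v1.9.3 v2.0.0-rc1 | pick_latest_tag
```

This prints `v1.10.0`: `latest` and `v2.0.0-rc1` are rejected by the pattern, and version sorting ranks 1.10.0 above 1.9.3, which a plain lexical sort would get wrong.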