Argo CD: GitOps Continuous Delivery

GitOps Principles

GitOps core principles:
  1. Git is the single source of truth
  2. Declarative configuration
  3. Automated sync
  4. Observability

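Traditional CI/CD (push model):
  Code commit → CI build → CI pushes to K8s (kubectl apply)

GitOps (pull model):
  Code commit → CI builds the image → the image tag is updated in the Git repository
  Argo CD detects the Git change → syncs it to K8s automatically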
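
The "update the image tag in the Git repository" step of the pull model can be sketched as a small shell function. This is only a minimal illustration under stated assumptions: the helper name `bump_image_tag`, the manifest path, and the image name are hypothetical, and many pipelines use `kustomize edit set image` or a YAML-aware tool instead of `sed`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rewrite the tag of one image reference in a manifest.
# Usage: bump_image_tag <manifest-file> <image-repository> <new-tag>
bump_image_tag() {
  local file="$1" image="$2" tag="$3" tmp
  tmp="$(mktemp)"
  # On lines containing "image: <image>:<old-tag>", replace the old tag.
  sed "s|\(image:[[:space:]]*${image}\):.*|\1:${tag}|" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# In CI this would typically be followed by a commit and push, for example:
#   bump_image_tag apps/order-service/production/deployment.yaml myregistry/order-service v1.2.3
#   git commit -am "order-service: v1.2.3" && git push
```

CI never talks to the cluster in this flow; it only changes Git, and Argo CD picks the change up from there.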

Architecture Overview

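Git repository (desired state)
  └──► Argo CD (continuous sync) ──► Kubernetes (live state)

         Application Controller
           - Detects Git changes
           - Compares the desired state with the live state
           - Syncs automatically or manually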
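
The comparison step can be pictured with a deliberately simplified sketch. It assumes the desired manifests and a dump of the live manifests sit in two local directories, and the function name `sync_status` is hypothetical. Argo CD itself compares rendered manifests against live objects read from the Kubernetes API and normalizes them first, so this is an analogy rather than its implementation.

```bash
#!/usr/bin/env bash
# Conceptual sketch only: report whether two manifest directories match.
# Usage: sync_status <desired-dir> <live-dir>
sync_status() {
  local desired_dir="$1" live_dir="$2"
  if diff -r -q "$desired_dir" "$live_dir" > /dev/null 2>&1; then
    echo "Synced"
  else
    echo "OutOfSync"
  fi
}

# Example (hypothetical paths):
#   sync_status ./git-checkout/apps/order-service ./live-dump/order-service
```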

Installation and Configuration

bash
# Install Argo CD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Retrieve the initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret \
  -o jsonpath="{.data.password}" | base64 -d

# Access the UI (port-forward runs in the foreground, so use a separate terminal)
kubectl port-forward svc/argocd-server -n argocd 8080:443

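# Log in with the CLI (--insecure skips TLS verification for the default self-signed certificate)
argocd login localhost:8080 --username admin --password <password> --insecure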

Application Configuration

Basic Application

yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: order-service
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io  # also delete the K8s resources when the App is deleted
spec:
  project: production
  
  source:
    repoURL: https://github.com/myorg/k8s-configs
    targetRevision: main
    path: apps/order-service/production
    
    # Helm chart source (alternative)
    # repoURL: https://charts.bitnami.com/bitnami
    # chart: redis
    # targetRevision: 18.x.x
    # helm:
    #   values: |
    #     auth:
    #       password: mypassword
  
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  
  syncPolicy:
    automated:
      prune: true      # automatically delete resources that were removed from Git
      selfHeal: true   # automatically revert manual changes (keeps the cluster consistent with Git)
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
      - ApplyOutOfSyncOnly=true  # sync only the resources that are out of sync
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
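
To make the retry settings concrete, the sketch below prints the wait before each retry. It assumes the delay starts at `duration`, is multiplied by `factor` after every failed attempt, and is capped at `maxDuration`; the function name `backoff_schedule` is hypothetical, and all values are in seconds.

```bash
#!/usr/bin/env bash
# Print the assumed wait (in seconds) before each retry attempt.
# Usage: backoff_schedule <duration> <factor> <max-duration> <limit>
backoff_schedule() {
  local duration="$1" factor="$2" max_duration="$3" limit="$4"
  local delay="$duration" i out=()
  for ((i = 1; i <= limit; i++)); do
    # Cap the delay at the configured maximum.
    if (( delay > max_duration )); then delay="$max_duration"; fi
    out+=("$delay")
    delay=$(( delay * factor ))
  done
  echo "${out[*]}"
}

# duration: 5s, factor: 2, maxDuration: 3m (180s), limit: 5
backoff_schedule 5 2 180 5   # prints: 5 10 20 40 80
```

With these values the cap of 3 minutes is never reached; it would only take effect with a higher `limit`.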

ApplicationSet (Multi-Cluster / Multi-Environment)

yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: order-service-appset
  namespace: argocd
spec:
  generators:
    # List generator: one element per environment/cluster pair.
    # (A matrix generator requires exactly two child generators, so a single
    # list is used directly here.)
    - list:
        elements:
          - env: dev
            cluster: dev-cluster
            namespace: development
          - env: staging
            cluster: staging-cluster
            namespace: staging
          - env: prod
            cluster: prod-cluster-us
            namespace: production
          - env: prod
            cluster: prod-cluster-eu
            namespace: production
  
  template:
    metadata:
      name: "order-service-{{env}}-{{cluster}}"
    spec:
      project: "{{env}}"
      source:
        repoURL: https://github.com/myorg/k8s-configs
        targetRevision: main
        path: "apps/order-service/{{env}}"
        helm:
          valueFiles:
            - "values-{{env}}.yaml"
      destination:
        server: "https://{{cluster}}.example.com"
        namespace: "{{namespace}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
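
As a rough illustration of how the template expands, the sketch below renders the Application name for each element defined above. The function name `render_app_names` and the `env:cluster` argument format are assumptions made for this example; the ApplicationSet controller performs the real substitution.

```bash
#!/usr/bin/env bash
# Render "order-service-{{env}}-{{cluster}}" for each env:cluster pair.
render_app_names() {
  local pair env cluster
  for pair in "$@"; do
    env="${pair%%:*}"      # text before the first colon
    cluster="${pair#*:}"   # text after the first colon
    echo "order-service-${env}-${cluster}"
  done
}

render_app_names dev:dev-cluster staging:staging-cluster \
  prod:prod-cluster-us prod:prod-cluster-eu
```

Four elements yield four Applications, and including `{{cluster}}` in the name keeps the two `prod` entries from colliding.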

Multi-Cluster Management

bash
# Add an external cluster (the first argument is a context name from the kubeconfig)
argocd cluster add production-cluster \
  --kubeconfig ~/.kube/production.yaml \
  --name production

# List registered clusters
argocd cluster list

AppProject (Permission Isolation)

yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: production
  namespace: argocd
spec:
  description: Production environment
  
  # Allowed source repositories
  sourceRepos:
    - https://github.com/myorg/k8s-configs
    - https://charts.bitnami.com/bitnami
  
  # Allowed destination clusters and namespaces
  destinations:
    - server: https://prod-cluster.example.com
      namespace: production
    - server: https://prod-cluster.example.com
      namespace: monitoring
  
  # Denied cluster-scoped resource kinds (guards against accidental changes)
  clusterResourceBlacklist:
    - group: ""
      kind: Namespace
  
  # Allowed namespace-scoped resource kinds
  namespaceResourceWhitelist:
    - group: "apps"
      kind: Deployment
    - group: ""
      kind: Service
  
  # RBAC roles
  roles:
    - name: developer
      description: Developer role
      policies:
        - p, proj:production:developer, applications, get, production/*, allow
        - p, proj:production:developer, applications, sync, production/*, allow
      groups:
        - myorg:developers
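
The effect of the `destinations` list can be sketched as a simple allow-list check. This is an exact-match illustration with hypothetical names (`ALLOWED_DESTINATIONS`, `destination_allowed`); real projects can also use wildcard patterns, which this sketch does not model.

```bash
#!/usr/bin/env bash
# Allowed "server namespace" pairs copied from the AppProject above.
ALLOWED_DESTINATIONS=(
  "https://prod-cluster.example.com production"
  "https://prod-cluster.example.com monitoring"
)

# Return 0 if the destination matches an allowed pair exactly, 1 otherwise.
destination_allowed() {
  local candidate="$1 $2" entry
  for entry in "${ALLOWED_DESTINATIONS[@]}"; do
    [[ "$entry" == "$candidate" ]] && return 0
  done
  return 1
}

destination_allowed https://prod-cluster.example.com production && echo "allowed"
destination_allowed https://kubernetes.default.svc production || echo "denied"
```

The second call is worth noting: an Application in this project that targets `https://kubernetes.default.svc` would likely be rejected unless that server is also listed under `destinations`.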

Progressive Delivery (Argo Rollouts)

yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: order-service
spec:
  replicas: 10
  selector:
    matchLabels:
      app: order-service
  template:
    # ... Pod template
  strategy:
    canary:
      # Canary release steps
      steps:
        - setWeight: 10    # send 10% of traffic to the new version
        - pause: {duration: 5m}  # wait 5 minutes
        - setWeight: 30
        - pause: {duration: 10m}
        - setWeight: 60
        - pause: {duration: 10m}
        - setWeight: 100
      
      # Automated analysis (based on Prometheus metrics)
      analysis:
        templates:
          - templateName: success-rate
        startingStep: 2
        args:
          - name: service-name
            value: order-service
      
      # Traffic management (requires a traffic router such as Istio or NGINX)
      trafficRouting:
        istio:
          virtualService:
            name: order-service-vs
          destinationRule:
            name: order-service-dr
            canarySubsetName: canary
            stableSubsetName: stable

---
# Analysis template
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m
      successCondition: result[0] >= 0.95
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status!~"5.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
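
The pass/fail decision behind `successCondition: result[0] >= 0.95` comes down to a ratio check. The sketch below reproduces that arithmetic locally; the function name `success_rate_ok` and the sample request rates are made up for illustration, and the real evaluation is done by Argo Rollouts against the Prometheus query result.

```bash
#!/usr/bin/env bash
# Return 0 when (total - errors) / total meets the threshold, 1 otherwise.
# Usage: success_rate_ok <total-request-rate> <5xx-request-rate> [threshold]
success_rate_ok() {
  local total="$1" errors="$2" threshold="${3:-0.95}"
  awk -v t="$total" -v e="$errors" -v th="$threshold" \
    'BEGIN { if (t <= 0) exit 1; exit !(((t - e) / t) >= th) }'
}

success_rate_ok 200 6  && echo "pass"   # 194/200 = 0.97
success_rate_ok 200 20 || echo "fail"   # 180/200 = 0.90
```

This sketch treats "no traffic" as a failure, which is a choice made here and not necessarily how the Prometheus provider handles an empty or NaN result.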

Troubleshooting Cases

Case 1: Application stays in the OutOfSync state

Diagnosis

bash
# Show the diff between Git and the live state
argocd app diff order-service

# Show detailed sync status
argocd app get order-service

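# Common causes:
# 1. A resource was modified manually (not reverted automatically when selfHeal=false)
# 2. Kubernetes or a controller added annotations/labels to the resource (consider ignoreDifferences)
# 3. The Helm-rendered manifests do not match the live state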

Resolution

bash
# Sync manually
argocd app sync order-service

# Force sync (force apply: resources that cannot be patched may be deleted and recreated, so use with care)
argocd app sync order-service --force
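
For scripted checks, the exit code of `argocd app diff` can be turned into a short status message. This sketch assumes the exit codes documented for recent CLI versions (0 for no diff, 1 when a diff is found, 2 on general errors); the wrapper name `check_drift` is hypothetical.

```bash
#!/usr/bin/env bash
# Map the exit code of `argocd app diff` to a short status message.
# Usage: check_drift <app-name>
check_drift() {
  local app="$1" rc=0
  argocd app diff "$app" > /dev/null 2>&1 || rc=$?
  case "$rc" in
    0) echo "in sync" ;;
    1) echo "drift detected" ;;
    *) echo "error (exit code $rc)" ;;
  esac
  return "$rc"
}

# Example: check_drift order-service
```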

Case 2: Sync Failed

bash
# Show why the sync failed
argocd app get order-service --show-operation

# Show K8s events
kubectl get events -n production --sort-by='.lastTimestamp'

Case 3: Git repository connection failure

bash
# Show repository status
argocd repo list

# Re-add the repository credentials
argocd repo add https://github.com/myorg/k8s-configs \
  --username myuser \
  --password mytoken

Best Practices

Repository Layout

k8s-configs/
  apps/
    order-service/
      base/                    # base configuration (Kustomize)
        deployment.yaml
        service.yaml
        kustomization.yaml
      overlays/
        dev/
          kustomization.yaml   # overrides for the dev environment
          patch-replicas.yaml
        prod/
          kustomization.yaml
          patch-replicas.yaml
          patch-resources.yaml
  
  # Or the Helm approach
  helm/
    order-service/
      Chart.yaml
      values.yaml
      values-dev.yaml
      values-prod.yaml
      templates/

Automating Image Updates

yaml
# Argo CD Image Updater
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  annotations:
    argocd-image-updater.argoproj.io/image-list: order=myregistry/order-service
    argocd-image-updater.argoproj.io/order.update-strategy: semver
    argocd-image-updater.argoproj.io/order.allow-tags: regexp:^v[0-9]+\.[0-9]+\.[0-9]+$
    argocd-image-updater.argoproj.io/write-back-method: git
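
The combined effect of `allow-tags` and the `semver` strategy can be approximated locally: keep only tags matching the regular expression, then take the highest version. This is a rough sketch, not the Image Updater's own logic. The function name `pick_latest_tag` and the sample tags are hypothetical, and it relies on `sort -V` (available in GNU coreutils) rather than a real semver parser.

```bash
#!/usr/bin/env bash
# Keep only tags of the form vMAJOR.MINOR.PATCH, then print the highest one.
pick_latest_tag() {
  printf '%s\n' "$@" \
    | grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' \
    | sort -V \
    | tail -n 1
}

pick_latest_tag v1.2.0 latest v1.10.0 v1.9.3 v2.0.0-rc1   # prints: v1.10.0
```

`latest` and `v2.0.0-rc1` are filtered out by the pattern, which is why the pre-release tag does not win despite its higher major version.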

PaaS Middleware Ecosystem: In-Depth Study Notes