Published on: December 1, 2025


How we deploy the largest GitLab instance 12 times daily

Take a deep technical dive into GitLab.com's deployment pipeline, including progressive rollouts, Canary strategies, database migrations, and multiversion compatibility.

Every day, GitLab deploys code changes to the world's largest GitLab instance — GitLab.com — up to 12 times without any downtime. We use GitLab's own CI/CD platform to manage these deployments, which impact millions of developers worldwide. This deployment frequency serves as our primary quality gate and stress test. It also means our customers get access to new features within hours of development rather than waiting weeks or months. When organizations depend on GitLab for their DevOps workflows, they're using a platform that's proven at scale on our own infrastructure. In this article, you'll learn how we built an automated deployment pipeline using core GitLab CI/CD functionality to handle this deployment complexity.

The business case for deployment velocity

For GitLab: Our deployment frequency isn't just an engineering metric — it's a business imperative. Rapid deployment cycles mean we can respond to customer feedback within hours, ship security patches immediately, and validate new features in production before scaling them.

For our customers: Every deployment to GitLab.com validates the deployment practices we recommend to our users. When you use GitLab's deployment features, you're using the same battle-tested approach that handles millions of git operations, CI/CD pipelines, and user interactions daily. You benefit from:

  • Latest features available immediately: New capabilities reach you within hours of completion, not in quarterly release cycles
  • Proven reliability at scale: If a feature works on GitLab.com, you can trust it in your environment
  • Full value of GitLab: Zero-downtime deployments mean you never lose access to your DevOps platform, even during updates
  • Real-world tested practices: Our deployment documentation isn't theory — it's exactly how we run the largest GitLab instance in existence

Code flow architecture

Our deployment pipeline follows a structured progression through multiple stages, each acting as a checkpoint on the journey from code proposal to production deployment.

graph TD
  A[Code Proposed] --> B[Merge Request Created]
  B --> C[Pipeline Triggered]
  C --> D[Build & Test]
  D --> E{Spec/Integration/QA Tests Pass?}
  E -->|No| F[Feedback Loop]
  F --> B
  E -->|Yes| G[Merge to default branch]
  G -->|Periodically| H[Auto-Deploy Branch]
  subgraph "Deployment Pipeline"
    H --> I[Package Creation]
    I --> K[Canary Environment]
    K --> L[QA Validation]
    L --> M[Main Environment]
  end

Deployment pipeline makeup

Our deployment approach uses GitLab's native CI/CD capabilities to orchestrate complex deployments across hybrid infrastructure. Here's how we do it.

Build

Building GitLab is a complex topic in and of itself, so I'll cover it at a high level.

We build both our Omnibus package and our Cloud Native GitLab (CNG) images. The Omnibus packages deploy to our Gitaly fleet (our Git storage layer), while CNG images run all other components as containerized workloads. Other stateful services like Postgres and Redis have grown so large we have dedicated teams managing them separately. For GitLab.com, those systems are not deployed during our Auto-Deploy procedures.

We have a scheduled pipeline that regularly looks at gitlab-org/gitlab and searches for the most recent commit on the default branch with a successful (“green”) pipeline. Green pipelines signal that every component of GitLab has passed its comprehensive test suite. We then create an auto-deploy branch from that commit.
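
As an illustration of that lookup, here is a minimal Python sketch against the public GitLab REST API, assuming the default branch is master and using a placeholder token; our actual release tooling does considerably more than this:

import requests

GITLAB_API = "https://gitlab.com/api/v4"
PROJECT = "gitlab-org%2Fgitlab"   # URL-encoded project path
TOKEN = "glpat-..."               # placeholder for a read-only API token

# Ask for the newest successful ("green") pipeline on the default branch.
resp = requests.get(
    f"{GITLAB_API}/projects/{PROJECT}/pipelines",
    params={
        "ref": "master",
        "status": "success",
        "order_by": "id",
        "sort": "desc",
        "per_page": 1,
    },
    headers={"PRIVATE-TOKEN": TOKEN},
    timeout=30,
)
resp.raise_for_status()
pipelines = resp.json()

if pipelines:
    sha = pipelines[0]["sha"]
    print(f"Most recent green commit: {sha}")
    # An auto-deploy branch would then be created from this SHA, e.g. via
    # POST /projects/:id/repository/branches.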

Creating the auto-deploy branch triggers a sequence of events: primarily, building the package and all components that are part of our monolith. Another scheduled pipeline then selects the latest built package and initiates the deployment pipeline. Procedurally, it looks this simple:

graph LR
  A[Create branch] --> B[Build]
  B --> C[Choose Built package]
  C --> D[Start Deploy Pipeline]

Building takes some time, and since deployment timing can vary, we choose the latest build to deploy. We technically build more versions of GitLab for .com than will ever be deployed. This ensures a package is always lined up and ready to go, bringing us as close as we can get to a fully continuously delivered product for .com.

Environment-based validation and Canary strategy

Quality assurance (QA) isn't just an afterthought here — it's baked into every layer from development through deployment. Our QA process leverages automated test suites that include unit tests, integration tests, and end-to-end tests that simulate real user interactions with GitLab's features. But more importantly for our deployment pipeline, our QA process works hand-in-hand with our Canary strategy through environment-based validation.

As part of our validation approach, we leverage GitLab's native Canary deployments, enabling controlled validation of changes with limited traffic exposure before full production deployment. We send roughly 5% of all traffic through our Canary stage. This approach increases the complexity of database migrations, but successfully navigating Canary deployments ensures we deploy a reliable product seamlessly.
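
Conceptually, the split looks like the sketch below. The 5% weight is the only number taken from our setup; on GitLab.com the split is done at the load-balancing layer, not in application code.

import random

CANARY_WEIGHT = 0.05  # roughly 5% of requests go to the Canary stage

def pick_stage() -> str:
    # Conceptual only: the real split is done by weighting backends at the
    # edge load balancers, not by rolling dice per request in application code.
    return "canary" if random.random() < CANARY_WEIGHT else "main"

# Sanity check: about 5% of a large sample lands on Canary.
sample = [pick_stage() for _ in range(100_000)]
print(sample.count("canary") / len(sample))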

The Canary deployment features you use in GitLab were refined through managing one of the most complex deployment scenarios in production. When you implement Canary deployments for your applications, you're using patterns proven at massive scale.

Our deployment process follows a progressive rollout strategy:

  1. Staging Canary: Initial validation environment

  2. Production Canary: Limited production traffic

  3. Staging Main: Full staging environment deployment

  4. Production Main: Full production rollout

graph TD
  C[Staging Canary Deploy]
  C --> D[QA Smoke Main Stage Tests]
  C --> E[QA Smoke Canary Stage Tests]
  D --> F
  E --> F{Tests Pass?}
  F -->|Yes| G[Production Canary Deploy]
  G --> S[QA Smoke Main Stage Tests]
  G --> T[QA Smoke Canary Stage Tests]
  F -->|No| H[Issue Creation]
  H --> K[Fix & Backport]
  K --> C
  S --> M[Canary Traffic Monitoring]
  T --> M[Canary Traffic Monitoring baking period]
  M --> U[Production Safety Checks]
  U --> N[Staging Main]
  N --> V[Production Main]
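
To make the ordering and gating in the diagram explicit, here is a rough Python sketch of the promotion logic. The deploy, QA, and safety-check helpers are hypothetical stand-ins; the real implementation is a set of GitLab CI/CD jobs, not a script.

# Hypothetical stand-ins for real GitLab CI/CD jobs and checks.
def deploy(environment: str, stage: str) -> None:
    print(f"deploying {environment}/{stage}")

def qa_smoke_ok(environment: str) -> bool:
    # Smoke tests run against both the Main and Canary stages of the environment.
    print(f"running QA smoke tests in {environment}")
    return True

def bake_canary_traffic() -> None:
    print("monitoring Canary traffic for the baking period")

def production_safety_checks_ok() -> bool:
    return True  # alerts, error rates, change locks, etc.

def progressive_rollout() -> None:
    deploy("staging", "canary")
    if not qa_smoke_ok("staging"):
        raise RuntimeError("staging Canary QA failed: open an issue, fix, backport, redeploy")

    deploy("production", "canary")
    if not qa_smoke_ok("production"):
        raise RuntimeError("production Canary QA failed")
    bake_canary_traffic()
    if not production_safety_checks_ok():
        raise RuntimeError("production safety checks failed; halting rollout")

    deploy("staging", "main")
    deploy("production", "main")

progressive_rollout()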

Our QA validation occurs at multiple checkpoints throughout this progressive deployment process: after each Canary deployment, and again after post-deploy migrations. This multilayered approach ensures that each phase of our deployment strategy has its own safety net. You can learn more about GitLab's comprehensive testing approach in our handbook.

Deployment pipeline

Here are the challenges we address across our deployment pipeline.

Technical architecture considerations

GitLab.com represents real-world deployment complexity at scale. As the largest known GitLab instance, deployments use our official GitLab Helm chart and the official Linux package — the same artifacts our customers use. You can learn more about the GitLab.com architecture in our handbook. This hybrid approach means our deployment pipeline must intelligently handle both containerized services and traditional Linux services in the same deployment cycle.

Dogfooding at scale: We deploy using the same procedures we document for zero-downtime upgrades. If something doesn't work smoothly for us, we don't recommend it to customers. This self-imposed constraint drives continuous improvement in our deployment tooling.

The following stages are run for all environment and stage upgrades:

graph LR
  a[prep] --> c[Regular Migrations - Canary stage only]
  a --> f[Assets - Canary stage only]
  c --> d[Gitaly]
  d --> k8s
  subgraph subGraph0["VM workloads"]
    d["Gitaly"]
  end
  subgraph subGraph1["Kubernetes workloads"]
    k8s["k8s"]
  end
  subgraph fleet["fleet"]
    subGraph0
    subGraph1
  end

Stage details:

  • Prep: Validates deployment readiness and performs pre-deployment checks

  • Migrations: Executes regular database migrations. This only happens during the Canary stage. Because both Canary and Main stages share the same database, these changes are already available when the Main stage deploys, eliminating the need to repeat these tasks.

  • Assets: We leverage a GCS bucket for all static assets. If any new assets are created, we upload them to our bucket so they are immediately available to our Canary stage. Because we use webpack and include content SHAs in asset filenames, we don't have to worry about overwriting an older asset: old assets continue to be available for older deployments, and new assets are immediately available when Canary begins its deploy (see the sketch after this list). This only happens during the Canary stage deployment. Because Canary and Main stages share the same asset storage, these changes are already available when the Main stage deploys.

  • Gitaly: Updates the Gitaly virtual machine storage layer via our Omnibus Linux package on each Gitaly node. This service is unique in that we bundle it with Git, so we need to ensure it is capable of atomic upgrades. We use a wrapper around Gitaly that lets us install a newer version and use the tableflip library to cleanly rotate the running Gitaly process, ensuring high availability of this service on each of our instances.

  • Kubernetes: Deploys containerized GitLab components via our Helm chart. Note that we deploy to numerous clusters spread across zones for redundancy, so these are usually broken into their own stages to minimize harm and sometimes allow us to stop mid-deploy if critical issues are detected.
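
As an aside on the Assets step above, here is a small sketch of why content-hashed filenames make those uploads safe to run ahead of the deploy. The key format and hashing details are illustrative, not our actual configuration:

import hashlib
from pathlib import Path

# Because the content hash is part of the object key, each build writes new
# keys; the keys an older deployment still serves are never overwritten.
def fingerprinted_key(asset: Path) -> str:
    digest = hashlib.sha256(asset.read_bytes()).hexdigest()[:16]
    return f"assets/{asset.stem}-{digest}{asset.suffix}"

example = Path("application.js")
example.write_text("console.log('v1');")
print(fingerprinted_key(example))  # assets/application-<hash of v1>.js
example.write_text("console.log('v2');")
print(fingerprinted_key(example))  # a different key, so the v1 object remains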

Multi-version compatibility: The hidden challenge

As you read our process, you will notice that there's a period of time where our database schema is ahead of the code that the Main stage knows about. This happens because the Canary stage has already deployed new code and run the regular database migrations, but the Main stage is still running the previous version of the code, which doesn't know about these new database changes.

Real-world example: Imagine we're adding a new merge_readiness field to merge requests. During deployment, some servers are running code that expects this field, while others don't know it exists yet. If we handle this poorly, we break GitLab.com for millions of users. If we handle it well, nobody notices anything happened.

This occurs with most other services as well. For example, if a client sends multiple requests, one of them might land in our Canary stage while others are directed to the Main stage. The same mixed-version situation arises within a single deploy, since it takes a decent amount of time to roll through the few thousand Pods that run our services.

With a few exceptions, the vast majority of our services run a slightly newer version in Canary for a period of time. In a sense, these scenarios are all transient states, but they can persist for several hours or days in a live production environment, so we must treat them with the same care as permanent states. During any deployment, we have multiple versions of GitLab running simultaneously, and they all need to play nicely together.
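
As a hedged sketch of what "playing nicely together" means in practice, reusing the hypothetical merge_readiness field from above: any code path that might see data written by either version has to tolerate the field being absent.

# Both code versions run at once during (and between) deploys, so any reader
# of this data must tolerate the new field being absent.
def merge_readiness(merge_request: dict) -> str:
    return merge_request.get("merge_readiness", "unknown")

written_by_main = {"iid": 42, "title": "Add feature"}
written_by_canary = {"iid": 43, "title": "Fix bug", "merge_readiness": "ready"}

assert merge_readiness(written_by_main) == "unknown"
assert merge_readiness(written_by_canary) == "ready"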

Database operations

Database migrations present a unique challenge in our Canary deployment model. We need schema changes to support new features while maintaining our ability to roll back if issues arise. Our solution involves careful separation of concerns:

  • Regular migrations: Run during the Canary stage; designed to be backward-compatible and consist only of reversible changes

  • Post-deploy migrations: The "point of no return" migrations that happen only after multiple successful deployments

Database changes are handled with precision and extensive validation procedures:

graph LR
  A[Regular Migrations] --> B[Canary Stage Deploy]
  B --> C[Main Stage Deploy]
  C --> D[Post Deploy Migrations]

Post-deploy migrations

GitLab deployments involve many components. Updating GitLab is not atomic, so many components must be backward-compatible.

Post-deploy migrations often contain changes that can't be easily rolled back — think data transformations, column drops, or structural changes that would break older code versions. By running them after we've gained confidence through multiple successful deployments, we ensure:

  1. The new code is stable and we're unlikely to need a rollback

  2. Performance characteristics are well understood in production

  3. Any edge cases have been discovered and addressed

  4. The blast radius is minimized if something does go wrong

This approach provides the optimal balance: enabling rapid feature deployment through Canary releases while maintaining rollback capabilities until we have high confidence in deployment stability.

The expand-migrate-contract pattern: Our database, frontend, and application compatibility changes follow a carefully orchestrated three-phase approach.

  1. Expand: Add new structures (columns, indexes) while keeping old ones functional

  2. Migrate: Deploy new application code that uses the new structures

  3. Contract: Remove old structures in post-deploy migrations after everything is stable

Real-world example: When adding a new merge_readiness column to merge requests:

  1. Expand: Add the new column with a default value; existing code ignores it

  2. Migrate: Deploy code that reads and writes to the new column while still supporting the old approach

  3. Contract: After several successful deployments, remove the old column in a post-deploy migration (sketched below)
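
Roughly, and only as a sketch, the three phases translate to the following at the SQL level, reusing the hypothetical merge_readiness column and an equally hypothetical legacy column. GitLab itself expresses these as regular and post-deploy Rails migrations rather than raw SQL:

# Phase 1 (expand): a regular migration, run during the Canary stage.
# Existing code keeps working because it never references the new column.
EXPAND_SQL = """
ALTER TABLE merge_requests
  ADD COLUMN merge_readiness text DEFAULT 'unknown';
"""

# Phase 2 (migrate): no SQL at all -- new application code is deployed that
# reads and writes merge_readiness while still supporting the old behavior.

# Phase 3 (contract): a post-deploy migration, run only after several
# successful deployments, dropping the (hypothetical) legacy column once no
# running version of the code references it.
CONTRACT_SQL = """
ALTER TABLE merge_requests
  DROP COLUMN legacy_readiness_flag;
"""

print(EXPAND_SQL, CONTRACT_SQL)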

All database operations, application code, frontend code, and more are subject to a set of guidelines that Engineering must adhere to, which can be found in our Multi-Version Compatibility documentation.

Results and impact

Our deployment infrastructure delivers measurable benefits:

For GitLab

  • Up to 12 deployments daily to GitLab.com
  • Zero-downtime deployments serving millions of developers
  • Security patches can reach production within hours, not days
  • New features validated in production at massive scale before general availability

For customers

  • Proven deployment patterns you can adopt for your own applications
  • Features battle-tested on the world's largest GitLab instance before reaching your environment
  • Documentation that reflects actual production practices, not theoretical best practices
  • Confidence that GitLab's recommended upgrade procedures work at any scale

Key takeaways for engineering teams

GitLab's deployment pipeline represents a sophisticated system that balances deployment velocity with operational reliability. The progressive deployment model, comprehensive testing integration, and robust rollback capabilities provide a foundation for reliable software delivery at scale.

For engineering teams implementing similar systems, key considerations include:

  • Automated testing: Comprehensive test coverage throughout the deployment pipeline

  • Progressive rollout: Staged deployments to minimize risk and enable rapid recovery

  • Monitoring integration: Comprehensive observability across all deployment stages

  • Incident response: Rapid detection and resolution capabilities for deployment issues

GitLab's architecture demonstrates how modern CI/CD systems can manage the complexity of large-scale deployments while maintaining the velocity required for competitive software development.

Important note on scope

This article specifically covers the deployment pipeline for services that are part of the GitLab Omnibus package and Helm chart — essentially the core GitLab monolith and its tightly integrated components.

However, GitLab's infrastructure landscape extends beyond what's described here. Other services, notably our AI services and services that might be in a proof of concept state, follow a different deployment approach using our internal platform called Runway.

If you're working with or curious about these other services, you can find more information in the Runway documentation.

Other offerings, such as GitLab Dedicated, are deployed more in alignment with what we expect customers to be capable of performing themselves by way of the GitLab Environment Toolkit. If you'd like to learn more, check out the GitLab Environment Toolkit project.

The deployment strategies, architectural considerations, and pipeline complexities outlined in this article represent the battle-tested approach we use for our core platform — but like any large engineering organization, we have multiple deployment strategies tailored to different service types and maturity levels.

Further documentation about Auto-Deploy and our procedures can be found at the links below:

More resources

We want to hear from you

Enjoyed reading this blog post or have questions or feedback? Share your thoughts by creating a new topic in the GitLab community forum.