Published on: December 1, 2025
13 min read
Take a deep technical dive into GitLab.com's deployment pipeline, including progressive rollouts, Canary strategies, database migrations, and multiversion compatibility.

Every day, GitLab deploys code changes to the world's largest GitLab instance — GitLab.com — up to 12 times without any downtime. We use GitLab's own CI/CD platform to manage these deployments, which impact millions of developers worldwide. This deployment frequency serves as our primary quality gate and stress test. It also means our customers get access to new features within hours of development rather than waiting weeks or months. When organizations depend on GitLab for their DevOps workflows, they're using a platform that's proven at scale on our own infrastructure. In this article, you'll learn how we built an automated deployment pipeline using core GitLab CI/CD functionality to handle this deployment complexity.
For GitLab: Our deployment frequency isn't just an engineering metric — it's a business imperative. Rapid deployment cycles mean we can respond to customer feedback within hours, ship security patches immediately, and validate new features in production before scaling them.
For our customers: Every deployment to GitLab.com validates the deployment practices we recommend to our users. When you use GitLab's deployment features, you're using the same battle-tested approach that handles millions of git operations, CI/CD pipelines, and user interactions daily, and you benefit directly from that operational experience.
Our deployment pipeline follows a structured progression through multiple stages, each acting as a checkpoint on the journey from code proposal to production deployment.
Our deployment approach uses GitLab's native CI/CD capabilities to orchestrate complex deployments across hybrid infrastructure. Here's how we do it.
Building GitLab is a complex topic in and of itself, so I'll go over the details at a high level.
We build both our Omnibus package and our Cloud Native GitLab (CNG) images. The Omnibus packages deploy to our Gitaly fleet (our Git storage layer), while CNG images run all other components as containerized workloads. Other stateful services like Postgres and Redis have grown so large we have dedicated teams managing them separately. For GitLab.com, those systems are not deployed during our Auto-Deploy procedures.
We have a scheduled pipeline that regularly looks at gitlab-org/gitlab and searches for the most recent commit on the default branch with a successful (“green”) pipeline. A green pipeline signals that every component of GitLab has passed its comprehensive test suite. We then create an auto-deploy branch from that commit.
Creating that branch triggers a sequence of events, chiefly building the package and every component of our monolith. Another scheduled pipeline then selects the latest built package and initiates the deployment pipeline. Procedurally, it really is that simple: find a green commit, build it, and deploy the newest available build.
Building takes time, and because deployment timing can shift for many reasons, we always choose the latest build to deploy. We technically build more versions of GitLab for .com than will ever be deployed. This keeps a package lined up and ready to go at all times, and it brings us as close as we can get to a fully continuously delivered product for .com.
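To make that flow concrete, here is a minimal Ruby sketch of what such a scheduled job could look like against the GitLab REST API. This is not our actual release tooling; the token variable, branch-naming scheme, and lack of error handling are illustrative assumptions.

```ruby
# Hypothetical sketch: find the newest default-branch commit with a green
# pipeline and cut an auto-deploy branch from it.
require "net/http"
require "json"
require "uri"

API     = "https://gitlab.com/api/v4"
PROJECT = URI.encode_www_form_component("gitlab-org/gitlab")
TOKEN   = ENV.fetch("GITLAB_API_TOKEN") # assumed CI/CD variable

def api(path, method: :get, params: {})
  uri = URI("#{API}#{path}")
  uri.query = URI.encode_www_form(params) unless params.empty?
  request = (method == :post ? Net::HTTP::Post : Net::HTTP::Get).new(uri)
  request["PRIVATE-TOKEN"] = TOKEN
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
  JSON.parse(response.body)
end

# Most recent successful ("green") pipeline on the default branch.
green = api("/projects/#{PROJECT}/pipelines",
            params: { ref: "master", status: "success", per_page: 1 }).first
abort "No green pipeline found" unless green

# Cut an auto-deploy branch from that commit (branch name is illustrative).
branch = "auto-deploy-#{Time.now.utc.strftime('%Y%m%d%H%M')}"
api("/projects/#{PROJECT}/repository/branches",
    method: :post, params: { branch: branch, ref: green["sha"] })

puts "Created #{branch} from #{green['sha']}"
```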
Quality assurance (QA) isn't just an afterthought here — it's baked into every layer from development through deployment. Our QA process leverages automated test suites that include unit tests, integration tests, and end-to-end tests that simulate real user interactions with GitLab's features. But more importantly for our deployment pipeline, our QA process works hand-in-hand with our Canary strategy through environment-based validation.
As part of our validation approach, we leverage GitLab's native Canary deployments, enabling controlled validation of changes with limited traffic exposure before full production deployment. We send roughly 5% of all traffic through our Canary stage. This approach increases the complexity of database migrations, but successfully navigating Canary deployments ensures we deploy a reliable product seamlessly.
The Canary deployment features you use in GitLab were refined through managing one of the most complex deployment scenarios in production. When you implement Canary deployments for your applications, you're using patterns proven at massive scale.
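The split itself happens at GitLab.com's load-balancer layer rather than in application code, but the routing idea is simple enough to sketch in a few lines of Ruby. The cookie name and weight below are illustrative assumptions, not our production configuration.

```ruby
# Illustrative only: the real split is done by load-balancer configuration.
CANARY_WEIGHT = 0.05 # roughly 5% of traffic

def backend_for(cookies)
  # Allow explicit opt-in or opt-out (cookie name is an assumption here).
  return :canary if cookies["gitlab_canary"] == "true"
  return :main   if cookies["gitlab_canary"] == "false"

  rand < CANARY_WEIGHT ? :canary : :main
end

puts backend_for({})                         # usually :main
puts backend_for("gitlab_canary" => "true")  # always :canary
```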
Our deployment process follows a progressive rollout strategy:
Staging Canary: Initial validation environment
Production Canary: Limited production traffic
Staging Main: Full staging environment deployment
Production Main: Full production rollout
Our QA validation occurs at multiple checkpoints throughout this progressive deployment process: after each Canary deployment, and again after post-deploy migrations. This multilayered approach ensures that each phase of our deployment strategy has its own safety net. You can learn more about GitLab's comprehensive testing approach in our handbook.
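As a rough sketch of that ordering, here is the rollout sequence and its QA checkpoints expressed as a small Ruby script. The environment labels and helper methods are placeholders for illustration, not our real deployer tooling.

```ruby
# Hypothetical sketch of the progressive rollout order with QA checkpoints.
ROLLOUT = [
  { environment: "staging canary",    qa_after: true  },
  { environment: "production canary", qa_after: true  },
  { environment: "staging main",      qa_after: false },
  { environment: "production main",   qa_after: false },
].freeze

def deploy!(environment)
  puts "deploying to #{environment}" # placeholder for the real deployment jobs
end

def qa_smoke_test!(environment)
  puts "running QA smoke tests against #{environment}" # placeholder for our QA suite
end

ROLLOUT.each do |step|
  deploy!(step[:environment])
  qa_smoke_test!(step[:environment]) if step[:qa_after]
end

# A final QA checkpoint also runs after post-deploy migrations (covered later).
```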
Here are the challenges we address across our deployment pipeline.
GitLab.com represents real-world deployment complexity at scale. As the largest known GitLab instance, it is deployed using our official GitLab Helm chart and the official Linux package — the same artifacts our customers use. You can learn more about the GitLab.com architecture in our handbook. This hybrid approach means our deployment pipeline must intelligently handle both containerized services and traditional Linux services in the same deployment cycle.
Dogfooding at scale: We deploy using the same procedures we document for zero-downtime upgrades. If something doesn't work smoothly for us, we don't recommend it to customers. This self-imposed constraint drives continuous improvement in our deployment tooling.
The following stages are run for all environment and stage upgrades (a simplified sketch in Ruby follows the stage details below):
Stage details:
Prep: Validates deployment readiness and performs pre-deployment checks
Migrations: Executes regular database migrations. These run only during the Canary stage. Because the Canary and Main stages share the same database, the changes are already in place when the Main stage deploys, eliminating the need to repeat them.
Assets: We leverage a GCS bucket for all static assets. If any new assets are created, we upload them to our bucket so that they are immediately available to our Canary stage. Because we use webpack for assets and include SHAs in asset filenames, we can be confident we never overwrite an older asset. Old assets therefore remain available for older deployments, and new assets are immediately available when Canary begins its deploy. This also happens only during the Canary stage deployment: because the Canary and Main stages share the same asset storage, the changes are already available when the Main stage deploys.
Gitaly: Updates the Gitaly virtual machine storage layer via our Omnibus Linux package on each Gitaly node. This service is unique in that we bundle it with Git, so we need to ensure it is capable of atomic upgrades. We use a wrapper around Gitaly that lets us install a newer version and use the tableflip library to cleanly rotate the running Gitaly process, ensuring high availability of this service on each of our instances.
Kubernetes: Deploys containerized GitLab components via our Helm chart. Note that we deploy to numerous clusters spread across zones for redundancy, so these deployments are usually broken into their own stages to minimize harm, which sometimes allows us to stop mid-deploy if critical issues are detected.
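Here is a simplified, hypothetical Ruby sketch of that per-environment stage ordering, showing why the migrations and assets stages run only during the Canary deploy. It is illustrative only, not our actual deployer code; the stage names mirror the list above.

```ruby
# Hypothetical sketch of the per-environment stage ordering described above.
STAGES = %i[prep migrations assets gitaly kubernetes].freeze

# Canary and Main share one database and one asset bucket, so repeating the
# migrations and assets stages for Main would be redundant.
CANARY_ONLY = %i[migrations assets].freeze

def run_stage(stage, deploy_stage)
  if CANARY_ONLY.include?(stage) && deploy_stage == :main
    puts "skipping #{stage} for #{deploy_stage} (already applied during canary)"
    return
  end

  puts "running #{stage} for #{deploy_stage}" # placeholder for the real jobs
end

%i[canary main].each do |deploy_stage|
  STAGES.each { |stage| run_stage(stage, deploy_stage) }
end
```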
As you read through our process, you will notice that there's a period of time when our database schema is ahead of the code that the Main stage knows about. This happens because the Canary stage has already deployed new code and run its regular database migrations, while the Main stage is still running the previous version of the code, which doesn't know about these new database changes.
Real-world example: Imagine we're adding a new merge_readiness field to merge requests. During deployment, some servers are running code that expects this field, while others don't know it exists yet. If we handle this poorly, we break GitLab.com for millions of users. If we handle it well, nobody notices anything happened.
This occurs with most other services as well. For example, if a client sends multiple requests, one of them might land in our Canary stage while others are directed to the Main stage. The same kind of mix happens during a deploy itself, since it takes a decent amount of time to roll through the few thousand Pods that run our services.
With a few exceptions, the vast majority of our services run a slightly newer version in Canary for a period of time. In a sense, these scenarios are all transient states, but they can persist for hours or even days in a live production environment, so we must treat them with the same care as permanent states. During any deployment, we have multiple versions of GitLab running simultaneously, and they all need to play nicely together.
Database migrations present a unique challenge in our Canary deployment model. We need schema changes to support new features while maintaining our ability to roll back if issues arise. Our solution involves careful separation of concerns, sketched in code after the list below:
Regular migrations: Run during the Canary stage; they are designed to be backward-compatible and consist only of reversible changes
Post-deploy migrations: The "point of no return" migrations that happen only after multiple successful deployments
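In a Rails codebase this separation shows up as two directories, db/migrate for regular migrations and db/post_migrate for post-deploy migrations, which is the convention GitLab follows. The classes and column names below are a simplified, hypothetical sketch rather than code from the GitLab repository.

```ruby
# db/migrate/...: regular migration, runs during the Canary stage. It must be
# backward-compatible with the code still running on Main, and reversible.
class AddReviewStateToMergeRequests < ActiveRecord::Migration[7.0]
  def change
    # Purely additive: old code simply ignores the new, nullable column.
    add_column :merge_requests, :review_state, :integer
  end
end

# db/post_migrate/...: post-deploy migration, the "point of no return",
# executed only after multiple successful deployments of the new code.
class RemoveLegacyReviewFlagFromMergeRequests < ActiveRecord::Migration[7.0]
  def up
    remove_column :merge_requests, :legacy_review_flag
  end

  def down
    # The column can be re-added, but any data it held is gone.
    add_column :merge_requests, :legacy_review_flag, :boolean
  end
end
```

Splitting the work this way is what lets us roll back the application freely right up until the post-deploy migrations run.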
Database changes are handled with precision and extensive validation procedures. GitLab deployments involve many components, and because updating GitLab is not atomic, many of those components must remain backward-compatible.
Post-deploy migrations often contain changes that can't be easily rolled back — think data transformations, column drops, or structural changes that would break older code versions. By running them after we've gained confidence through multiple successful deployments, we ensure:
The new code is stable and we're unlikely to need a rollback
Performance characteristics are well understood in production
Any edge cases have been discovered and addressed
The blast radius is minimized if something does go wrong
This approach provides the optimal balance: enabling rapid feature deployment through Canary releases while maintaining rollback capabilities until we have high confidence in deployment stability.
The expand-migrate-contract pattern: Compatibility changes to our database, frontend, and application code follow a carefully orchestrated three-phase approach.
Expand: Add new structures (columns, indexes) while keeping old ones functional
Migrate: Deploy new application code that uses the new structures
Contract: Remove old structures in post-deploy migrations after everything is stable
Real-world example: When adding a new merge_readiness column to merge requests:
Expand: Add the new column with a default value; existing code ignores it
Migrate: Deploy code that reads and writes to the new column while still supporting the old approach
Contract: After several successful deployments, remove the old column in a post-deploy migration (sketched below)
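A minimal Ruby sketch of those three phases, assuming plain Rails migrations and a hypothetical legacy_merge_status column; GitLab's real migration helpers and base classes differ.

```ruby
# Phase 1 -- Expand (regular migration): add the new column alongside the old
# one. Code that predates it simply ignores it.
class AddMergeReadinessToMergeRequests < ActiveRecord::Migration[7.0]
  def change
    add_column :merge_requests, :merge_readiness, :integer
  end
end

# Phase 2 -- Migrate (application code): write to both columns and prefer the
# new one when reading, so Canary and Main stay compatible mid-rollout.
class MergeRequest < ApplicationRecord
  def merge_readiness_value
    merge_readiness || legacy_merge_status # fall back to the old column
  end

  def update_merge_readiness!(value)
    update!(merge_readiness: value, legacy_merge_status: value)
  end
end

# Phase 3 -- Contract (post-deploy migration): once no running version reads
# the old column, tell the model to ignore it, then drop it.
class MergeRequest < ApplicationRecord
  self.ignored_columns += %w[legacy_merge_status]
end

class RemoveLegacyMergeStatusFromMergeRequests < ActiveRecord::Migration[7.0]
  def up
    remove_column :merge_requests, :legacy_merge_status
  end

  def down
    add_column :merge_requests, :legacy_merge_status, :integer
  end
end
```

The key property is that every intermediate state works for both the old and the new code running side by side.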
All database operations, application code, frontend code, and more are subject to a set of guidelines that Engineering must adhere to; these can be found in our Multi-Version Compatibility documentation.
Our deployment infrastructure delivers measurable benefits:
For GitLab
For customers
GitLab's deployment pipeline represents a sophisticated system that balances deployment velocity with operational reliability. The progressive deployment model, comprehensive testing integration, and robust rollback capabilities provide a foundation for reliable software delivery at scale.
For engineering teams implementing similar systems, key considerations include:
Automated testing: Comprehensive test coverage throughout the deployment pipeline
Progressive rollout: Staged deployments to minimize risk and enable rapid recovery
Monitoring integration: Comprehensive observability across all deployment stages
Incident response: Rapid detection and resolution capabilities for deployment issues
GitLab's architecture demonstrates how modern CI/CD systems can manage the complexity of large-scale deployments while maintaining the velocity required for competitive software development.
This article specifically covers the deployment pipeline for services that are part of the GitLab Omnibus package and Helm chart — essentially the core GitLab monolith and its tightly integrated components.
However, GitLab's infrastructure landscape extends beyond what's described here. Other services, notably our AI services and services that might be in a proof of concept state, follow a different deployment approach using our internal platform called Runway.
If you're working with or curious about these other services, you can find more information in the Runway documentation.
Other offerings, such as GitLab Dedicated, are deployed more in alignment with what we expect customers to be capable of performing themselves, by way of the GitLab Environment Toolkit. If you'd like to learn more, check out the GitLab Environment Toolkit project.
The deployment strategies, architectural considerations, and pipeline complexities outlined in this article represent the battle-tested approach we use for our core platform — but like any large engineering organization, we have multiple deployment strategies tailored to different service types and maturity levels.
Further documentation about Auto-Deploy and our procedures can be found at the links below: