The Software Development Lifecycle (SDLC) is different at every company.
The version control system, the peer review, code review, and design review processes, how CI is set up, how much testing is automated or manual, and so on all vary greatly depending on where you work.
How a company plans, writes, builds, reviews, deploys, and releases software is optimized for its particular use case, and each approach comes with its own strengths and drawbacks.
I started reading about how different big tech companies run their SDLCs and heard the term Trunk Based Development a few times. This is a practice Google follows, and I was curious about how it differs from the way most other companies develop software.
Two different ways to do branching
There are two common approaches to enable multiple developers to work on one codebase.
The first we'll refer to as the feature branching method.
Usually with Git, each developer clones the codebase (so everyone has an identical copy on their machine), creates a feature or project branch based on master, and merges it back when the work is complete. The key point is that they merge only once, at the end, when all the work is done, and the whole branch goes into master in one go.
Here's an overview of how developers use the feature branching method.
The white dots represent commits, and the bottom solid black line is master. Developers branch off master and make their changes, and when the work is complete and has passed code QA, the branch is merged back into master.
Trunk Based Development (TBD)
TBD is the second approach. Here each developer splits their work into small batches and merges them into master (often referred to as the trunk) several times a day.
Instead of creating a branch and later merging it into the trunk, they commit directly to the trunk.
In TBD, code changes generally don't live outside the trunk for more than a few hours. They are constantly merged and integrated with the code everyone else is writing.
Jez Humble, a Site Reliability Engineer at Google and co-author of Continuous Delivery, says "branching is not the problem, merging is the problem", which is exactly the problem TBD tries to solve.
TBD aims to avoid the painful merges that so often occur when long-lived branches that have diverged from the trunk finally need to be merged, or when branches from different teams or developers must first be combined into one before merging with the trunk.
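To make "merging is the problem" a little more concrete, here is a toy Python model. It is my own simplification, not anything Google uses. It treats a file as "at risk" of a merge conflict if it was changed both on a branch and on the trunk after the branch was created. Real merge tools work on lines and hunks rather than whole files, and the file names and trunk history below are made up.

```python
def files_at_risk(branch_files: set[str], trunk_changes: list[set[str]]) -> set[str]:
    """Return files edited on the branch that were also edited on the trunk
    after the branch was created (a crude, file-level proxy for merge conflicts)."""
    touched_on_trunk: set[str] = set()
    for change in trunk_changes:
        touched_on_trunk |= change
    return branch_files & touched_on_trunk


# Hypothetical trunk history: each set lists the files one commit touched.
trunk_log = [
    {"ui.py"},
    {"db.py", "models.py"},
    {"search.py"},
    {"api.py", "auth.py"},
]
branch = {"api.py", "models.py"}

# A branch merged after only the first trunk commit has nothing to reconcile.
print(files_at_risk(branch, trunk_log[:1]))       # set()
# A branch that stayed open through the whole history overlaps on two files.
print(sorted(files_at_risk(branch, trunk_log)))   # ['api.py', 'models.py']
```

The longer a branch stays open, the more trunk commits it has to be reconciled with. TBD keeps that list short by merging within hours rather than weeks.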
Does TBD work at scale?
In a Google talk, Rachel Potvin, an Engineering Manager at Google, described one codebase that, as of January 2015, had:
- 1 billion files
- 2 billion lines of code
- 86 terabytes of content
- 45,000 commits per workday
- 15 million lines changed in 250,000 files per week
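A quick back-of-envelope calculation on these figures gives a rough sense of how small the average change is. It assumes a five-day work week, which is my assumption and not from the talk. An average also hides a wide spread of change sizes, so treat the result as a rough indication only.

```python
# Figures quoted above (Rachel Potvin's talk, as of Jan 2015).
commits_per_workday = 45_000
lines_changed_per_week = 15_000_000
files_changed_per_week = 250_000

# Assumption (not from the talk): a five-day work week.
workdays_per_week = 5

commits_per_week = commits_per_workday * workdays_per_week
lines_per_commit = lines_changed_per_week / commits_per_week
lines_per_file = lines_changed_per_week / files_changed_per_week

print(commits_per_week)            # 225000
print(round(lines_per_commit, 1))  # 66.7
print(lines_per_file)              # 60.0
```

Roughly 67 changed lines per commit on average is consistent with the small-batch style TBD relies on.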
Google used TBD in this codebase, and it served their use cases very well. Because Google employs many talented (and, most importantly, experienced) engineers, they rarely break their builds.
Google also has a thorough, strict code QA process (read about it here), which, combined with TBD, allows fast, efficient software delivery.
TBD also works well with Agile methodologies, where you have to ship software frequently to get feedback from your users or customers. You integrate continually and always have a good snapshot of your current state.
Let's briefly discuss some TBD strengths.
Strengths of TBD
- Feedback (whether from code QA or peer review) comes quickly, because you merge daily. This can stop you from doing the wrong thing for three weeks, only to learn at the very end that your work isn't correct and miss your deadlines.
- There's a mental benefit to TBD: developers feel that the trunk is "our code", rather than everyone having their own feature branch and thinking "this branch is my code". This can foster a more collaborative culture and increase communication.
- It integrates your work early with all the other in-flight projects and tickets, which makes it easier to reuse code. It also prevents the merge hell of bringing a nine-month-old feature branch back into the trunk.
- Large projects have to be broken down into smaller deliverables, which makes timelines easier to estimate and encourages splitting your code into modular pieces.
- When lots of developers work in isolation on feature branches, it can be harder to spot junior developers struggling in their own branch. But if they're expected to commit their work daily, you can see their daily output and help them when necessary.
- TBD ties in cleanly with continuous integration. With lots of small, incremental commits toward an eventual finished project, you get an always-tested, always-integrated codebase with very few horrible merges.
Weaknesses of TBD
- One challenge of this approach is an increased chance of breaking the trunk and stopping lots of people from working. You need to make sure unit tests run against every commit and that you have a good code review process, so you don't lose time reverting commits all day.
- Your commit history on master will likely be more verbose, which can make it harder to see what went wrong. If you are called at 3 AM to fix a bug on your production site caused by some dodgy commits made during business hours, would you prefer a day with 1 commit or 200 commits?
- If you don't have a fast build process, you will spend a long time waiting for things to build while your team constantly commits.
- With TBD you are often adding new code incrementally to do something new, while the "old" paths you're replacing still need to work. Because of this, you need to rely on feature toggles (normally stored in a database) to turn things on and off, which can make debugging more complex.
- A final challenge can be that, when you have constant commits, you are constantly in a state of churn. You need to make sure your team regularly pulls from the trunk and doesn't end up tripping over one another while merging things.
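The feature toggles mentioned in the weaknesses above are easier to picture with a small example. This is a minimal Python sketch, not any particular toggle library. The in-memory dictionary stands in for the database the toggles are normally read from, and the flag name `new_search` and both search functions are hypothetical.

```python
from collections.abc import Mapping

# Hypothetical toggle store; in practice this would be read from a database.
TOGGLES: dict[str, bool] = {"new_search": False}


def is_enabled(flag: str, toggles: Mapping[str, bool] = TOGGLES) -> bool:
    # Unknown flags default to off, so half-finished code stays dark.
    return toggles.get(flag, False)


def legacy_search(query: str) -> str:
    # The "old" path that must keep working while its replacement is built on trunk.
    return f"legacy results for {query!r}"


def new_search(query: str) -> str:
    # The in-progress replacement, merged to trunk in small pieces behind the flag.
    return f"new results for {query!r}"


def search(query: str, toggles: Mapping[str, bool] = TOGGLES) -> str:
    # One toggle means two code paths that can run in production.
    if is_enabled("new_search", toggles):
        return new_search(query)
    return legacy_search(query)


print(search("shoes"))                        # legacy results for 'shoes'
print(search("shoes", {"new_search": True}))  # new results for 'shoes'
```

Each toggle doubles the number of paths that can run, which is where the extra debugging complexity comes from. Deleting the toggle and the old path once the new one is fully rolled out keeps that cost from piling up.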
How to release software with TBD
A team that's using TBD will have a totally different release process than a team using feature branches.
Generally, if you use feature branches, you release master whenever something gets merged into it (tickets, completed projects, etc.). Alternatively, some teams release master on a schedule, such as once a week.
Here's an overview of how TBD teams do their releases:
In TBD, branching is generally used only for releases.
A release branch provides a "snapshot" of your codebase in a stable state, ready for deployment and release.
The only time the TBD diagram above may need extra detail is when something goes wrong with the release of prj-123. In that case, we commit the fix to the trunk and cherry-pick those commits into the release branch to get it back into a workable state as soon as possible.
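The release flow described here can be modelled with plain lists. A release branch is cut from the trunk, a fix lands on the trunk first, and the fix is then cherry-picked onto the release branch. This is a toy Python sketch under my own simplifications. Commits are just strings, and "cherry-picking" copies a commit that already exists on the trunk, whereas a real `git cherry-pick` re-applies the change as a new commit and can conflict. The `Repo` class and the commit and branch names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    trunk: list[str] = field(default_factory=list)
    releases: dict[str, list[str]] = field(default_factory=dict)

    def commit(self, change: str) -> None:
        # In TBD, everyone commits straight to the trunk.
        self.trunk.append(change)

    def cut_release(self, name: str) -> None:
        # A release branch is a snapshot (copy) of the trunk at this moment.
        self.releases[name] = list(self.trunk)

    def cherry_pick(self, change: str, release: str) -> None:
        # Fixes land on the trunk first, then are copied onto the release branch.
        if change not in self.trunk:
            raise ValueError(f"{change!r} must be committed to the trunk first")
        self.releases[release].append(change)


repo = Repo()
repo.commit("add search box")
repo.commit("add results page")
repo.cut_release("release-1.0")
repo.commit("start new checkout")  # trunk moves on; the release branch doesn't see this
repo.commit("fix results crash")   # a release bug is fixed on the trunk first
repo.cherry_pick("fix results crash", "release-1.0")

print(repo.releases["release-1.0"])
# ['add search box', 'add results page', 'fix results crash']
```

Fixing on the trunk first means the fix can't be lost when the next release branch is cut.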
Some places that release regularly don't even branch. They just release the trunk whenever required.
There is a whole site based on the theory and practice of TBD. Feel free to read more here.
I hope this has explained what Trunk Based Development is and why it's used. It certainly helps alleviate some of the issues around merging long-lived branches containing major rewrites.
If you enjoyed this article and want to see more, I share my writing on Twitter.