12/13/11

DB Versioning

If you deploy version 2.0 of your application against version 1.0 of your database, what do you get? A broken application, that's what. That's why your database should always be under source control, right next to your application code. You deploy the app, and you deploy the database.

When it comes to version control, the database is often a second or even third-class citizen. Version control for source code is used by most development teams today. It gives you a history of all the changes you made, you can use it to share your code with others and you can work on an experimental feature without polluting the stable product. These are just some of the reasons why version control software is considered indispensable today. But where does our database fit in all this? How can we share the changes we want to make to the database with our colleagues, and eventually with our production environment?

Database version control is something that most developers have to deal with regularly, yet only a few have actually thought about what solution might be best for them. Most people have a solution that sort of works for them, but when you ask them about the subject they are pretty convinced that there must be some better way to manage database changes, they're just not entirely sure what that solution is – but the silver bullet must be out there somewhere, right?

Writing your own script
A common starting point is to have a custom script to apply database patches (sometimes also called "deltas" or "migrations") to your database. This approach is widely used and works quite well for a lot of people. The idea behind this approach is simple: we store patch files in a directory in our project. These patch files are checked into version control alongside our source code.

whenever a developer makes a change to the database schema they must add the SQL patch file to this directory describing the change. The patch file is then checked in to version control together with the source code. When anyone updates a copy of the code, they will receive patch files as well as source code changes, and these patches need to be applied to the database to bring it to the version expected by the code.

In order to automate this using a script, our script basically needs to do two things

Run database patches on the database
Remember which one we executed last, so next time we know which ones to execute
To keep track of which patch file we executed last, we could simply update a value in our database, first checking the current patch level and then running any later scripts. 

This approach works well and is much better than making manual or undocumented database changes. We could create a simple script that updates our project from source control and then runs any patches automatically, and use this to update our projects on both development and production platforms.

As simple as this solution might look, similar solutions work quite well for many applications and these make a great starting point for implementing database version control in your own projects

Rollback
Of course you hope to never get into this situation, but sometimes you might want roll back a production update. This can for example happen when the application was insufficiently tested, or when the application behaves or performs differently on the production machines than expected. Rolling back your source code is usually quite simple: you can just change the symlink to the directory containing the previous version, or use your version control software to update the code base to an earlier revision. For databases though, reverting is a bit more complicated.

One possible approach is to maintain undo files. This would be a file that does exactly the opposite of what your patch file did, and can be used to revert an update whenever something went wrong.

Bear in mind though that some database operations are not reversible and in those situations this approach is less useful. Imagine a patch file that drops a column from a table and an undo file that creates the column again. Yes, your database schema would now be the same again, but the contents of the column are gone.

One of the main things I found out about database version control is that different projects require different approaches, and while I do not make use of undo files myself, other people rely on them and use them very frequently. Consider the options; does your project require rollback scripts? If you think so, include these in your process. They might one day save your life (or at least your deployment).

Where Is My Silver Bullet?
You might think that there would be a better solution, to compare databases and synchronise any differences between them, for example. However, if you imagine for a moment that you want to create a tool to do just that. It all works fine when adding or removing stuff, but it gets tricky when renaming tables or columns. The tool would not be able to tell whether a column was renamed, or a column was dropped and another column was added (warning: the existing tools that do compare databases always go for the "drop and add" solution if you renamed something). Since such a tool wouldn't be able to tell the difference it would have to guess, and therefore may not work correctly in all situations.

So database version control is not as simple as making two schemas equal to each other; it is about remembering all changes you did in development and then repeating exactly those steps at all other locations. Exactly how you document those steps is not really important. So far we have used SQL, but as we will see later on, other tools use languages like XML or even PHP. No matter what language you use, you will always have to document all the steps you did, in order to repeat exactly those steps somewhere else. Documenting those steps is the part that developers consider to be cumbersome and painful, but it is absolutely required if you want to do database version control in a reliable way.
 
Courtesy : Internet

No comments:

Post a Comment

Popular Posts