Time travel, 20 questions and the power of free soda.

Yesterday I presented 20 questions that I think are a good starting point when designing or evaluating a version control migration product. The point is basically to get folks thinking about what should be migrated, how it should be migrated and what would happen if that data were not there.

Last night I conjured up myself from the past. May 1999 to be specific. The European world was still in shock from the introduction of the Euro, geeks everywhere were still in awe of The Matrix, financial planners were shouting about the benefits of the Roth IRA and I was sitting in my cubicle trying to figure out how to rationalize many different version control systems, build processes and development teams into a single (or at least fewer) process resembling something sane. Life was much easier just a few months earlier when I was still in college. The answers were no longer in the back of the book.

 I checked.

In 1999 I'm responsible for several large VSS databases (4-8GB), a PVCS system and a custom version control system. All told there is something on the order of 10 years of history on a 22 million LOC project spread throughout multiple systems and we are researching moving to a new, single VC solution. I handed my 1999 self the list of 20 questions and one of the free sodas from a Microsoft cafe.

My 1999 self still has to pay for his caffeine so this seemed like a fair deal.

45 minutes later this is what he gave back… next time I'll share what my 2006 self thought of all this.

1) What versioned items do you absolutely have to have in the target system?

The system must be able to migrate any type of versioned asset (Source code, images, projects, documents, etc) that is stored in the source system as a retrievable directory or file.

Specifically we need to be able to migrate the projects we are currently working on but we might want to back-fill old projects later so that we can refer to the code in the new system.

2) Do you need history for those items?

a. If so – how much?

We do want history for these items. We would like to be able to migrate the full history for the items we choose to migrate. We would also like the ability to start migration at a specific point in time or at a label. This is because we have one branch “WidgetToolbox” that was completely flattened (everything deleted) at one point because of a botched integration. Since we know that the actions prior to the correction of the branch are all bogus we don’t want to spend time migrating hundreds of thousands of items that are just going to be deleted anyway.
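As a rough illustration of the "start at a point in time" request, a migration tool could filter the source history before replaying it. The sketch below is only that: the `Changeset` record and the `changesets_from` helper are hypothetical names, not part of any real migration tool.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Changeset:
    # Hypothetical record; real source systems expose history differently.
    number: int
    timestamp: datetime
    comment: str


def changesets_from(history, start):
    """Return the changesets at or after `start`, oldest first."""
    kept = [c for c in history if c.timestamp >= start]
    return sorted(kept, key=lambda c: c.timestamp)
```

A real tool would also need to seed the target with a snapshot of the tree as it stood at the cutoff, since the skipped changesets are what created those files in the first place.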

3) What would happen if you did not have history for those items?

Not having history would mean that we would need to keep the source system around for a period of time for reference. This would mean maintaining the hardware and software as well as keeping at least a limited number of licenses current. This will result in over $10,000 per year in licensing costs as well as about $15,000 in maintenance time and one-time hardware costs (since we can’t reuse the existing hardware we will need to buy new hardware for the new system).

Further we anticipate that the average developer will lose 5 hours of time over the next 12 months due to lost history. With 200 developers on staff and a development “burn rate” (the cost of an unproductive developer) of $150 per hour this will result in $150,000 in lost productivity.
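The arithmetic behind those figures is simple enough to write down. This is a back-of-the-envelope sketch using only the numbers quoted above; the constant and function names are made up for illustration, and the licensing and maintenance amounts are the approximate values from the answer rather than exact costs.

```python
# Figures quoted in the answer above; names are illustrative only.
LICENSING_PER_YEAR = 10_000        # "over $10,000" to keep licenses current
MAINTENANCE_AND_HARDWARE = 15_000  # "about $15,000" maintenance and hardware

DEVELOPERS = 200
HOURS_LOST_PER_DEVELOPER = 5       # over the next 12 months
BURN_RATE_PER_HOUR = 150           # cost of an unproductive developer


def lost_productivity(developers, hours_each, rate_per_hour):
    """Cost of developer time lost to missing history."""
    return developers * hours_each * rate_per_hour


def first_year_cost():
    """Rough first-year cost of migrating without history."""
    productivity = lost_productivity(
        DEVELOPERS, HOURS_LOST_PER_DEVELOPER, BURN_RATE_PER_HOUR
    )
    return LICENSING_PER_YEAR + MAINTENANCE_AND_HARDWARE + productivity


if __name__ == "__main__":
    print(lost_productivity(DEVELOPERS, HOURS_LOST_PER_DEVELOPER, BURN_RATE_PER_HOUR))
    print(first_year_cost())
```

With the quoted inputs, lost productivity comes to $150,000 and the rough first-year total to $175,000.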

4) Do you have internal processes that rely on the ability to query against well-known labels?

a. If so – can the process be changed easily?

We have several processes that rely on version control labels. Some of them will be easy to fix and others not. The most important is a code promotion process where labels are applied at various stages in the development process to indicate code that is ready for test or production. We are researching replacing this with a branching model but we have not finished this research yet and have not yet converted any tools to that new process.

Also we label the source code after each build in order to reproduce builds at a later date. Having these labels is important as it allows developers to recreate an old build when reproducing a bug or debugging an old build.

5) Can those labels be recreated manually in the target system?

They could but there is some risk since we would need to know we did it right. I estimate it would take several days to implement this and test it properly. But the automation tools to do the promotions would still need to be re-written to support the new version control system. We are hesitant to do that when we are also rewriting the tools to support a new promotion model based around branching.

We could live without all of the build labels but there are some key ones we would like – such as publicly shipped betas and final releases. If we could select 5 or 10 labels and have only those migrated it would save us a ton of time and make us more confident that we would be able to produce bug fixes in the new system.

 

6) What would happen if those labels were not available?

If this process were to stop working we would be unable to easily promote code from dev to test to production. This would cause an immediate drop in productivity and create an inability to create new production builds which could breach our customer support contracts.

Not being able to recreate an old build would mean that developers would need to go back to the old system frequently during the first few months since we are still getting frequent feedback from our customers that needs to be investigated. Over time the need would probably ease off – but frankly that means that the cost of each incident will probably rise since we wouldn’t be as prepared for using both systems.

7) Do you have a need to have complete and accurate integration history in the target system (i.e. if there was a branch of Foo to Bar in the source system can the new system just have Foo and Bar as unrelated versioned items or does the integration link need to exist?)

It would be nice to have this but I don’t think that it is strictly necessary. Since our promotion model is based around labels right now we would not take a major hit because of this.

 

8) What would happen if that integration history were lost?

We would have duplicated content in the VC system which could result in a larger than necessary VC database. Since our dataset is not huge (under 100GB) even a doubling is only an additional few thousand in SAN drive costs. However we would need to know ahead of time how much SAN space to allocate to avoid running out of space during the migration.

9) Can you recreate it manually using a baseless merge?

I think this would work. It would allow us to do merges later on but it would not show the merge relationship in the UI which will require us to do merging via the command line going forward.

10) Does your source system make use of complex workspace or branch mappings in order to allow sparse branches or workspaces in a manner that would be incompatible with your target system? (e.g. ClearCase’s branching model is fundamentally different than TFS’s and the Perforce client mapping syntax is more robust than TFS’s)

No.

11) Do the source and target systems have operational parity?

a. If not – how will you address migrating those? Examples include:

i. Rename? (TFS does, Perforce does not)

ii. Destroy? (ClearCase does, TFS does not)

iii. Sharing? (VSS does, TFS does not)

iv. Keyword expansion? (Perforce and VSS do, TFS does not)

 

We don’t know but we do expect this to work. We know that keyword expansion will be lost but we expect that content to still migrate properly (i.e. don’t skip the files).

 

12) Are there naming convention rules that need to be addressed between the source and target system?

a. For example TFS cannot have items that have a path segment that begin with ‘$’

b. Perforce reserves ‘#’ and ‘@’ for revision identifiers

c. TFS has a 260 character path limit (other systems have more, others less)

d. TFS is not case sensitive (E.txt is the same as e.TXT) whereas solutions with unix clients often are case sensitive (E.txt is not the same as e.TXT).

We have identified some files that start path segments with ‘$’ but we can rename these files prior to migration. If we try to migrate the history of these files they will still have that old name. Ideally we’d like to be able to map the names to something else so that when they are encountered they are automatically handled.
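The automatic name mapping asked for here could be as small as a function applied to every path the tool reads from history. This is a minimal sketch, assuming source paths are relative and use `/` as the separator; `remap_path` and its `replacement` parameter are hypothetical, not an option of any real migration tool.

```python
def remap_path(path, replacement="_"):
    """Rewrite any path segment that begins with '$'.

    Only the leading '$' of a segment is replaced; a '$' elsewhere in a
    segment is left alone, since only the leading one is the problem here.
    """
    segments = path.split("/")
    fixed = [
        replacement + segment[1:] if segment.startswith("$") else segment
        for segment in segments
    ]
    return "/".join(fixed)


if __name__ == "__main__":
    print(remap_path("tools/$temp/build.txt"))
    print(remap_path("src/a$b/file.c"))
```

The first call yields `tools/_temp/build.txt` and the second path is returned unchanged. A real mapping would also have to check that the rewritten name does not collide with an item that already exists.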

 

13) Will users, security groups and permissions be migrated?

This would be nice but is not a must have. We use security groups for permissions so adding a few manually is not going to be a big deal. Also we don’t have many security policies in place. Basically you can either read all of the source code, or you have R/W access or you have no access.

14) Will items that are locked in the source system remain locked in the target?

We don’t use item locking.

 

15) Will items that have an encoding associated with them retain that encoding in the target system?

This is very important to us. We have many files in specific encodings and we need them to retain those encodings in the new system. Additionally we have some files that appear to be text files but which are typed as binary files. So we need the actual encoding, not the auto-detected encoding, to be migrated.

 

16) Will workspaces be migrated?

Our developers have indicated that this would not be something they need. The amount of time they spend setting up their workspace pales in comparison to the amount of time the download of the workspace content takes. Also they have many unused workspaces on the source system and migrating them over would just bring over additional baggage.

There is one fellow with a really complex workspace who would like to know if a single workspace could be migrated.

 

17) Will links between items be migrated?

This would be great. Our current system allows linking workitems and changesets together to show that a checkin was due to a specific bug. We use this data when generating reports and when reviewing change history.

 

18) Will changeset metadata be retained?

a. Will timestamps, comments and the original author be retained?

b. If the source system allows custom metadata (e.g. ClearCase) but the target system does not will that metadata simply be lost or captured somehow?

We definitely want the checkin comments retained. We would like the timestamps to be retained but we understand that TFS does not support this.

Having the original author retained would be very useful but we’ve had hundreds of people come and go. We don’t want to have to manage a large mapping file to map non-existent users to valid accounts. We’d like some sort of fall-back mechanism that defaults invalid users to a single account.
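One way to picture the fall-back mechanism: keep a small explicit map for the handful of renamed accounts, and send everyone else who no longer exists to one service account. The sketch below makes that concrete under stated assumptions; `resolve_author`, `annotate_comment` and the `migration-service` account name are all hypothetical.

```python
def resolve_author(source_author, valid_accounts, explicit_map=None,
                   fallback="migration-service"):
    """Map a source author to a valid target account, else the fallback."""
    explicit_map = explicit_map or {}
    candidate = explicit_map.get(source_author, source_author)
    if candidate in valid_accounts:
        return candidate
    return fallback


def annotate_comment(comment, source_author, resolved_author):
    """Keep the original author in the comment when the account changed."""
    if resolved_author == source_author:
        return comment
    return f"{comment} [original author: {source_author}]"
```

Appending the original name to the checkin comment is a design choice in this sketch, so that the information survives even when the account does not.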

 

19) How does the migration tool respond to failure mid-way through the process?

a. Can it recover and pick up where it left off?

b. Does it require manual intervention to get going again?

c. Is the tool able to consolidate the logs for multiple runs into a single view?

The tool has to be able to recover from a crash and pick up where it left off. Our migration will be rather lengthy and starting over is not an option we would be happy with. We are ok with it requiring a minimal amount of intervention – for example restarting a process. However we would like to not have to do anything if possible.
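A common way to get "pick up where it left off" is a checkpoint written after each unit of work. The following is a minimal sketch of that idea, assuming changesets are replayed in increasing numeric order; `migrate`, `load_checkpoint`, `save_checkpoint` and the JSON file layout are invented for illustration.

```python
import json
import os


def load_checkpoint(path):
    """Return the last completed changeset number, or 0 on a fresh start."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)["last_done"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path, last_done):
    """Write the checkpoint via a temp file so a crash cannot truncate it."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump({"last_done": last_done}, f)
    os.replace(tmp, path)  # replaces the old checkpoint in one step


def migrate(changeset_numbers, apply, checkpoint_path):
    """Apply each changeset once, skipping those a previous run completed."""
    last_done = load_checkpoint(checkpoint_path)
    for number in changeset_numbers:
        if number <= last_done:
            continue
        apply(number)  # may raise; the checkpoint then still names the prior one
        save_checkpoint(checkpoint_path, number)
```

After a crash, rerunning `migrate` with the same arguments resumes at the first changeset that was not recorded as done, so the only intervention needed is restarting the process.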

Log consolidation is very important. We need to be able to report a single status at the end of the migration. Having many logs to sift through would make this nearly impossible. Also we can’t risk having a new log over-write an old log file since that would cause a loss of history that we would not be able to recover from.
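Both requirements, never overwriting a log and reporting one status, can be sketched in a few lines. The log format here (one `item<TAB>status` line per migrated item, one file per run, zero-padded run ids so file names sort in run order) is an assumption for the example, as are the function names.

```python
from collections import Counter
from pathlib import Path


def new_run_log(log_dir, run_id):
    """Open a log for this run; mode 'x' fails rather than overwrite a file."""
    path = Path(log_dir) / f"migration-{run_id}.log"
    return open(path, "x", encoding="utf-8")


def consolidate(log_dir):
    """Count items by their most recent status across all run logs."""
    latest = {}
    for path in sorted(Path(log_dir).glob("migration-*.log")):
        for line in path.read_text(encoding="utf-8").splitlines():
            item, _, status = line.partition("\t")
            if status:
                latest[item] = status  # later runs override earlier ones
    return Counter(latest.values())
```

An item that failed in the first run and succeeded in the second is counted once, as a success, which is the single end-of-migration status asked for above.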

 

20) How long will the migration take (or how will I know how much time is left)?

We aren’t as concerned with how long migration will take as we are with knowing what has been done, what is left to do and with not having to babysit the process continually. 4 days of computer time is practically free if we don’t need to have someone sitting at the machine watching the screen. If we need to be continually checking the system to watch for dialog boxes or command line prompts then we are throwing money away. If there is user interaction needed there should be a notification system that can let us know to look instead of making us check.