The new plan is now in action, but things are about the same.
The insurance policies had a setback: things that were stable are now being rejected, and the mappings are undergoing many changes.
Stability has still not been achieved, GIS is still a moving target, and the migration team is struggling to keep up with the new plan in order to comply with it.
Tests still have not fully started, because it's still not possible to load data into GIS.
The start of this new era is fictional, because everything is about the same as before.
Friday, November 19, 2010
Monday, September 6, 2010
Day 521 - Re-Plan
Today we have re-planned.
The new deadline will be April 2nd.
This leaves a lot of time to develop and test things.
Unfortunately, management is, in my opinion, making a terrible mistake when it comes to testing. I've personally helped draft the test document, which contains around 100 tests covering the technical part of the data migration, things such as control counts and sums. I've officially stated, more than once, that the test document should be expanded with the business tests, which should also number around 100. In short, the testing document should cover around 200 tests, minimum.
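A technical control test of this kind can be sketched in a few lines. The table and column names below are hypothetical, not the project's actual schema:

```python
import sqlite3

# Sketch of one technical control test: compare record counts and amount
# sums between a staging table and its migrated counterpart. Table and
# column names are invented for illustration.

def control_check(conn, source_table, target_table, amount_col):
    """Return True when both the count and the sum match between tables."""
    cur = conn.cursor()
    src = cur.execute(
        f"SELECT COUNT(*), COALESCE(SUM({amount_col}), 0) FROM {source_table}"
    ).fetchone()
    tgt = cur.execute(
        f"SELECT COUNT(*), COALESCE(SUM({amount_col}), 0) FROM {target_table}"
    ).fetchone()
    return src == tgt

# Tiny self-contained demo against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_policies (id INTEGER, premium REAL);
    CREATE TABLE gis_policies (id INTEGER, premium REAL);
    INSERT INTO staging_policies VALUES (1, 100.0), (2, 250.5);
    INSERT INTO gis_policies VALUES (1, 100.0), (2, 250.5);
""")
print(control_check(conn, "staging_policies", "gis_policies", "premium"))  # True
```

The business tests would go beyond this, checking that migrated values make sense under the insurance business rules, which is exactly the part that was cut.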
My professional opinion on this has been ignored, and management identified only 27 tests to be implemented. This is obviously insufficient: implementing only 13.5% of the tests that should be performed will not guarantee the quality of the data migration.
This is clearly a high risk, but management has been under so much pressure due to the time delays that it is making the oldest, most common rookie mistake of them all: cutting quality.
This is entirely a client problem, since its business will have to live with the data as it has been migrated. Everybody knows how badly dirty and erroneous data impacts a business, but management seems to be ignoring this.
It looks like time is not the only issue with this project. It seems that costs have already slipped 50%. No wonder administration is now taking a closer look and paying extra attention to this project.
Labels: data migration, gis, management, migration, strategy, team, tests
Day 517 - Back from Vacations
We've just come back from our vacations and found out that things have taken a twist.
The GIS is still not ready; instead of speeding up, it has actually been delayed further, and should now be ready by the end of September.
Also, as I thought, administration gave management one last chance to make it. It looks like this time there will be checkpoints, where management will decide to continue or cancel the project depending on its status.
We already have a working plan for this week, which kind of makes it serious when management says that this time it's for real.
Labels: data migration, gis, management, migration, strategy, team
Day 486 - First Simulation is No-Go
The first simulation will not happen this weekend.
As predicted, things must be re-planned, since the deadlines will be impossible to meet, mainly because the GIS software is not ready.
I've just discovered that the claims area is still under development and should be ready somewhere around mid-August.
The re-plan will be a tough decision for the administration, since management will have to explain very well why a new re-plan is required.
I have the idea that management has one last shot at getting this project done. I believe administration will allow this re-plan, and that it will be the last one.
The data migration team will now go on vacation for the entire month of August.
We expect things to be better when we get back.
Labels: data migration, gis, management, migration, strategy, team, tests
Day 482 - Really Bad News
The tests over the weekend went somewhat OK, but the first simulation of the data migration is compromised.
Things are still pretty unstable, and there isn't a single validation test defined. The first simulation will receive a no-go decision from me, and probably from management too, which means a re-plan will most probably occur.
Labels: data migration, gis, management, migration, strategy, team, tests
Day 479 - Another Data Set for Weekend Tests
We have set up a new data set for the weekend tests, but things are so delayed that the minimum conditions for the first simulation to happen are not met.
In particular, there isn't a single test defined to validate the data migration.
Labels: data migration, gis, management, migration, strategy, team, tests
Monday, July 19, 2010
Day 475 - New GIS Version Ready
The new GIS version has been installed but the data we've prepared has not been loaded.
A new request for the same data has already been made in order to load it over the next weekend.
Within two weeks of the still-planned first simulation, the target system, GIS, is still unstable, the claims are not fully mapped, accounting is still untested, and the automatic tests are still on paper.
We are precisely where we were some weeks ago, when management showed that it did believe in this plan.
Labels: data migration, gis, management, migration, strategy, team, tests
Thursday, May 20, 2010
Day 413 - Restart
Today there was a migration meeting between the migration team, a management team representative, and me. It seems the migration will be performed in one shot, and October is still the official date. But the management team is ready to accept a four-week delay, which means the migration would happen on November 1st.
These dates have been strongly disputed by the migration team for several reasons. The most important of all is that they do not believe the GIS implementation will be ready on time to comply with these dates. This disbelief has resulted in a very interesting discussion between management and the migration team.
The migration team is tired; the project has now run for two working years during which these people have been overloaded, having to perform all their usual tasks on top of developing the migration. The worst problem comes from the fact that, historically, the GIS implementation has failed to deliver the defined requirements on time. Plus, there is still a huge amount of work to be done in this area, so the migration team is more than skeptical: they simply do not believe in these dates.
Worse, they are upset that the management team is being permissive about the GIS development delays and about features not fully implementing the defined requirements.
After some pressure from the migration team, the management team admitted that they believe in these dates but do not rule out the possibility of another postponement.
On top of all that, I've pointed out that there's still a considerable amount of work to be done regarding tests and quality assurance. We have already identified counting checks and summary checks, but the business rule validations are still to be identified.
Once the validations have been identified, they must be developed at the critical points: at data extraction, at data loading into the staging area, at data extraction from the staging area, and at data loading into GIS.
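A minimal sketch of how the same checks could be wired at each of those critical points (the validator interface and stage names are assumptions, not the project's actual design):

```python
# A minimal validator harness: the same checks run at every critical point
# of the pipeline, tagging errors with the stage where they were detected.

def validate(stage, records, checks):
    """Run every check against the records, collecting stage-tagged errors."""
    errors = []
    for check in checks:
        ok, message = check(records)
        if not ok:
            errors.append(f"{stage}: {message}")
    return errors

def count_check(expected):
    """Counting check: the record count must match the expected total."""
    def check(records):
        return len(records) == expected, f"expected {expected} records, got {len(records)}"
    return check

def sum_check(field, expected):
    """Summary check: the sum over a numeric field must match."""
    def check(records):
        total = sum(r[field] for r in records)
        return total == expected, f"expected sum {expected}, got {total}"
    return check

records = [{"id": 1, "premium": 100.0}, {"id": 2, "premium": 250.5}]
for stage in ("source-extract", "staging-load", "staging-extract", "gis-load"):
    problems = validate(stage, records, [count_check(2), sum_check("premium", 350.5)])
    print(stage, "OK" if not problems else problems)
```

The business rule validations, once identified, would simply become more check functions plugged into the same harness.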
Nevertheless, the entire team has agreed to make another extra effort in order to try to comply with the currently defined date. After a four-week stop, the project restart is planned for tomorrow.
Labels: management, migration, problems, quality, team, validation
Wednesday, May 5, 2010
Day 399 - Incremental Strategy Analysis
I've finished the requested analysis on performing the data migration incrementally.
Actually, it has a low impact for us, because the requested split per product is already implemented and used by most of the data migration mappers. With less than three days' work we can adapt the remaining mappers to use this strategy.
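As a rough illustration of the split-per-product idea, each migration wave would run the same mappers filtered to a product subset (the mapper interface and product codes below are hypothetical):

```python
# Rough illustration of the split-per-product strategy: each migration wave
# runs the same mappers, filtered to a product subset.

def run_mappers(rows, products):
    """Map only the rows that belong to the products in this wave."""
    return [transform(r) for r in rows if r["product"] in products]

def transform(row):
    # Stand-in for a real transformation rule.
    return {"id": row["id"], "product": row["product"].upper()}

rows = [
    {"id": 1, "product": "house"},
    {"id": 2, "product": "car"},
    {"id": 3, "product": "house"},
]
wave_one = run_mappers(rows, {"house"})  # e.g. the first migration wave
wave_two = run_mappers(rows, {"car"})    # e.g. a later follow-up wave
print(len(wave_one), len(wave_two))  # 2 1
```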
Day 398 - Kickoff Planning of Migration Strategy
Management seems to believe that the data migration will happen on the first weekend of October. We have met and discussed at length what needs to be done so that all the required tasks can be officially planned.
But the big news is not that management seems to believe in October as the deadline. The big news is that management actually does not believe in that date as the final one. Management has asked me to evaluate the impact of an incremental data migration by product. They've said this idea was brand new, but they seem to believe that in October only half of the products will be ready in GIS, and the rest will not be available for at least two more months. This new strategy implies that one data migration will occur in October and another will probably occur three months later, around January 2011.
Thursday, March 25, 2010
Day 358 - Go Live Files Ready
We have received the source files and have loaded them into our staging area.
We have executed the data migration for the entities and created the GIS loading files required for the go live.
The files have been loaded in the quality acceptance environment without a single problem. Tomorrow the testing team will check if everything is in order and, if so, the data will be loaded on the day after.
Day 352 - Data Migration Plan For Go Live
We have just received the plan for the data migration part that is required for the new system kick-off.
We will receive the source data, load it into our staging area, execute the transformation rules for the entities, and create the GIS loading files for the acceptance and production environments.
Since a postponement has not been granted for the new system's go-live date, the testing team seems to be working a bit harder and things seem to be a lot better.
Friday, January 22, 2010
Day 297 - Data Loading Files Performance Problem
The migration scope has increased.
New source data has come into play and thus new mapping specifications have been made.
This has resulted in a really big data loading performance problem.
Our data transformation procedure is still fast, but the creation of the GIS loader files from the transformed data is starting to give us some headaches.
The GIS data loader uses a very verbose flat file structure: each single value of each record is loaded through a 300-character text file line.
This means that a single database table row is transformed into roughly as many text lines as the row has columns, plus the file and record header and footer structures.
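To make the volume explosion concrete, here is a sketch of a loader-style writer that emits one fixed-width line per column value. The field layout is invented for illustration; the real GIS loader layout is not shown in this diary:

```python
# Sketch of why the loader format explodes in volume: every column value of
# every row becomes its own fixed-width 300-character line.

LINE_WIDTH = 300

def row_to_loader_lines(table, row):
    """Emit one 300-character line per column value of a row."""
    lines = []
    for column, value in row.items():
        # Hypothetical layout: table name, column name, value, then padding.
        payload = f"{table:<20}{column:<30}{str(value):<50}"
        lines.append(payload.ljust(LINE_WIDTH))
    return lines

row = {"policy_id": 42, "premium": 100.0, "start_date": "2010-01-01"}
lines = row_to_loader_lines("POLICY", row)
print(len(lines))     # 3: one line per column
print(len(lines[0]))  # 300
```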
As an example, the house insurance data transformation is performed in around 4 hours of sequential time and generates about 184 million records. This is all performed on the transformation server.
These records are then exported into the GIS data loader file format, from the Windows server directly onto the AS/400. This procedure is now taking too long: over 6 hours, sequentially, in the best-case scenario.
This is obviously too much time, so we are exploring several hypotheses: creating a parallel file write process; writing the files locally, with and without compression, and transferring them via FTP to the AS/400; maintaining clustered indexes with full coverage on a different disk; and splitting the database schemas across several disks.
Some of these techniques can, and will, be combined.
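For instance, the parallel file write hypothesis could be sketched like this (the output directory and record source are made up for illustration):

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Sketch of parallel file writing: split the record stream into chunks and
# let a pool of workers write one output file each.

def write_chunk(path, records):
    """Write one chunk of records to its own file and return the path."""
    with open(path, "w") as f:
        for rec in records:
            f.write(rec + "\n")
    return path

def parallel_write(records, out_dir, workers=4):
    """Write the records as `workers` files, one per worker, in parallel."""
    chunk = (len(records) + workers - 1) // workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(
                write_chunk,
                os.path.join(out_dir, f"part_{i}.txt"),
                records[i * chunk:(i + 1) * chunk],
            )
            for i in range(workers)
        ]
        return [f.result() for f in futures]

out_dir = tempfile.mkdtemp()
paths = parallel_write([f"record {i}" for i in range(100)], out_dir)
print(len(paths))  # 4
```

The resulting part files could then be compressed and shipped to the AS/400 via FTP, which is another of the hypotheses above.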
We have to tune the process from our side since there is not much the GIS or the AS/400 can do when it comes to massive data load tuning.
We are facing a lot of hard work over the next few days.
Labels: as/400, data, data migration, etl, gis, intel, map, mapping, migration, performance, problems, system i, transformation rules, windows
Wednesday, January 6, 2010
Day 250 - Team Rearrangements
There is a small team management problem.
The teams seem to be rearranged from time to time and this instability is not good for the sake of the project.
For instance, one person who was responsible for the client data migration has been reassigned several times to other areas, like accounting and claims, and is now on the testing team. Two other people became responsible for the client data migration, one for each system, but it seems that the new people depend on the initial person.
Another person is responsible for the data extraction from one of the source systems, but it now seems that this person is also part of the mapping team.
This happens less in other areas, but it seems that there is an unofficial set of people assigned to all areas.
There is a clear group help spirit, which is great, because it eliminates political problems such as "this is my migration area, so keep out".
Wednesday, November 25, 2009
Day 238 - Migration Status
Finally I got a set of entities that migrated correctly.
It was actually quite simple, once the mapping rules were correctly specified according to the data contents. The main problem was that the source team had, as usual, some difficulty admitting that their data is dirty and its quality is actually a lot lower than they expected. In this particular case, it was the zip codes.
Once they admitted that the zip codes needed specific rules because they were dirty, it became an easy migration task.
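As an illustration only (the project's actual cleaning rules are not described here), a zip code normalization rule with a fallback default might look like the following. The 4+3 digit format and the "unknown" fallback are assumptions:

```python
import re

# Hedged sketch of a dirty-zip-code cleaning rule: strip non-digits,
# normalize to a 4+3 digit format, and fall back to a default "unknown"
# code the target system accepts. Format and fallback are assumptions.

DEFAULT_ZIP = "0000-000"  # hypothetical "unknown" code

def clean_zip(raw):
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 7:
        return f"{digits[:4]}-{digits[4:]}"
    if len(digits) == 4:
        return f"{digits}-000"  # pad a bare 4-digit code
    return DEFAULT_ZIP

print(clean_zip("1000-001"))  # 1000-001
print(clean_zip(" 1000 "))    # 1000-000
print(clean_zip("n/a"))       # 0000-000
```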
In the next few days I expect to migrate other entity sets without errors.
The overall migration performance is quite slow, even using 5 parallel executions. We have collected some times and, as already known, the GIS loading procedure is responsible for the longest migration times.
Usually we perform the ETL cycle within Data Fusion, which means we deliver the data directly to the target database, i.e., usually we would only have the Data Fusion transformation time. But in this particular scenario we have to create a text file which GIS will then use to load the data.
Here's an example of the performance times we're having now:
Car Insurance Claims (470.563 records):
- Data Fusion transformation procedure: 51m
- File creation for GIS loading: 2h 10m
- GIS file loading: 238h sequential time (time calculated based on 16h of loading)
Personal Accident Insurance Claims (29.303 records):
- Data Fusion transformation procedure: 1m 51s
- File creation for GIS loading: 1m 40s
- GIS file loading: 2h 17m with 3 parallel processes (5h 9m sequential time)
Entities (682.569 records):
- Data Fusion transformation procedure: 6m 3s
- File creation for GIS loading: 23m
- GIS file loading: 5 h 55m with 5 parallel processes (27h 17m sequential time)
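A quick calculation shows how close the parallel entity loading gets to the ideal speedup:

```python
# Sanity check on the entity loading figures above: 27h 17m of sequential
# time done in 5h 55m with 5 parallel processes.

def speedup(sequential_hours, parallel_hours):
    return sequential_hours / parallel_hours

seq = 27 + 17 / 60   # 27h 17m
par = 5 + 55 / 60    # 5h 55m
print(round(speedup(seq, par), 2))  # 4.61, close to the ideal factor of 5
```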
Tonight GIS will be cleaned up and the AS/400 will be tuned for better file-reading performance. In the next few days I'll get some new times, hopefully better.
Wednesday, April 1, 2009
Day 1 - Kick off
The project has been officially kicked off.
There was the traditional meeting where all teams were present and an overview of the project was presented. All teams involved have previous data migration experience, which is good.
The project should be completed around mid-October, but I felt that management's confidence in meeting that date is not very high. This is not as bad a signal as one might think, since it shows management is aware of some real difficulties. Those difficulties may become technical challenges and may delay the project.
Nevertheless, I do believe that the original due date can be achieved.
There will be two phases. The first phase, the current one, will use a small entity subset in order to test the performance of the ETL and to tune the process.
The second phase will be the continuity of the previous phase, but now including all applications and entities.
The project itself consists of migrating the current insurance application, running on OS/390 mainframe, into the new system, the GIS Non-Life, running on AS/400, all using DB2, SAM and VSAM.
There are some other satellite applications running on SQL Server 2005 on Windows.
The ETL tool will be Data Fusion and it will run on AS/400 under the Qshell.
The ETL process will be implemented in a slightly different way from the usual scenario. Usually the data is extracted from the source database and loaded into the target database. This is not the case in this project.
Eventually, the data will be loaded into the final target database, but there's an intermediate step. The ETL process performed by Data Fusion will deliver the data on flat files that will be later consumed by the GIS loader. The GIS loader will validate and load the data into the final database.
Since our (ETL team) experience tells us that performance on AS/400 may be a problem during the ETL process, there was some discussion over the performance tests to be performed during phase one.
If performance becomes an issue, the data migration may have to be incremental instead of one-shot. This will be a problem, because the source team cannot identify which data has been changed and needs to be migrated again to refresh the target system. One challenge we have been given is to think of a fast way to identify whether a record has changed and needs to be migrated again, in case a plan B is required.
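One possible plan B, sketched under the assumption that full record contents can be hashed and compared against fingerprints stored from the previous run (the field names are hypothetical):

```python
import hashlib

# Sketch of change detection for an incremental refresh: fingerprint each
# record's full contents and compare with the fingerprint stored from the
# previous migration run. Only changed records would be re-migrated.

def fingerprint(record):
    """Stable hash of a record's sorted key/value pairs."""
    payload = "|".join(f"{k}={record[k]}" for k in sorted(record))
    return hashlib.sha256(payload.encode()).hexdigest()

previous = {}  # record id -> fingerprint from the last migration run

def changed(record):
    """True if the record is new or differs from its stored fingerprint."""
    fp = fingerprint(record)
    if previous.get(record["id"]) == fp:
        return False
    previous[record["id"]] = fp
    return True

rec = {"id": 1, "name": "ACME", "zip": "1000-001"}
print(changed(rec))  # True  (first time seen)
print(changed(rec))  # False (unchanged)
rec["zip"] = "4000-007"
print(changed(rec))  # True  (changed)
```

The fingerprint store could live in the staging area, so detecting changes would not require any cooperation from the source systems.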
Labels: as/400, data, data fusion, data migration, db2, etl, gis, migration, os/390, sql server
Data Migration Diary
I'm involved in a new data migration project, and since I'm really interested in this area, I'm starting this blog as a diary.
I won't write entries every day, since there will be times when only "boring" stuff happens.
But, I'll write as much as possible about the project management, the technical difficulties, the solutions, the workarounds, the mapping process, the business rules, and, of course, the ETL process.
I have experience in data migration projects in several areas, such as banking and public administration. This will be my first data migration project in the insurance area, which will be quite a challenge because the business rules in this area are very complex.
For confidentiality reasons, I'll keep the project sponsor and all parties involved incognito.
The only exception will be the references to the technical stuff involved.