We've got the data ready for loading, as requested by management, for this weekend's test.
When I went to talk with management, I was surprised to learn that the data will probably not be loaded because a new GIS version is going to be installed. And I was not the only one surprised: management didn't know this was going to happen!
Management is clearly unable to manage this project: planned things do not happen when they should, and unplanned things with high impact happen without management's knowledge.
Because of this new GIS version installation, we have been working for the last two days for nothing, and the testing team will have to perform the same old ad hoc tests again.
This was also the day that the administration received the bad news. We, the data migration team, have officially stated that the immaturity of GIS, the current state of mapping rules, and the nonexistence of automated tests have made the first simulation, still scheduled for the last weekend of this month, impractical.
We also officially stated that this has a direct impact on the overall project and a new plan is now required because the current one will not happen.
Monday, July 19, 2010
Day 470 - New Data Set for Weekend Tests
During the weekly team meeting I got a request from management to prepare a very specific data set to test during this weekend.
It is not hard to get, but the amount of work involved will keep us busy for a couple of days.
During the meeting I told management that the implementation of the automated tests should have been finished by now. In fact, it has not even started. Plus, management has decided to cut the number of tests to implement and perform. I officially stated that, in my professional opinion, the original number of tests should be expanded and not cut, since there are still many business aspects left untested. But management thinks the opposite and they rule; it's their project.
Again, as I have been doing for the past several weeks, I told management that the first simulation, and the entire plan, will be compromised if no action is taken this week.
Day 468 - New Week, Old Problems
Data has been loaded into GIS with some errors. We already expected errors from the account area, which was not properly tested, and the claims are not even close to being fully defined.
On top of that, insurances have undergone a review which will result in a set of changes and, of course, a new set of tests.
Only three weeks are left until the first simulation, but things are pretty unstable. I forecast a no-go decision for that first simulation. And if things don't get straightened out, I forecast a re-plan of the entire data migration project.
Day 454 - New GIS Version Required
A new GIS software version is required in order to overcome a problem in the account area that has been stalled for weeks.
Time is running out and the new GIS version should be available at the end of this week.
This will be a major improvement in the account area because the mapping and testing teams will finally be able to define it and test it.
Labels:
data,
management,
mapping,
problems,
team,
tests,
transformation rules
Day 447 - Small Vacations
Lots of people, me included, have taken some time off due to a couple of holidays.
This means that the mapping team has not worked as fast as required, so the project did not recover the lost time as it needed to.
The actions taken by management and the mapping team show that it will be very difficult to comply with the current plan.
Friday, May 28, 2010
Day 422 - Unstable Mapping
Things are not going the way the management team and I were expecting.
After the restart, the teams started working focused on the current official project dates, even if they don't really believe in them.
But the last GIS loads we performed showed a major mapping regression. For instance, car insurances were all loading correctly, and now there isn't a single insurance that loads correctly; they have all been rejected.
It turns out that the product configuration in GIS, which should have been concluded some weeks ago, has changed again.
And that was not all: all the product insurance mapping rules are changing. This mapping instability means that the project has regressed overall when there are only 8 weeks left until the first data migration simulation.
But this is not all. When the testing team tried to perform tests, they canceled the task because there was nothing that they could test. The migrated data that was loaded and available for testing will change soon.
Labels:
data,
management,
mapping,
problems,
team,
tests,
transformation rules
Wednesday, March 17, 2010
Day 335 - Reprioritization
The project sponsor is not very comfortable with the current deadline compliance, so management has reprioritized the migration of the business areas.
This happened because management believes that there is a need for greater focus on some business areas when it comes to the data migration.
In practice, this means that credit insurance has been defined as non-priority data to be migrated.
I'm actively involved in this business area and my opinion is that this is actually a mistake.
Credit insurance is equivalent to all other business areas when it comes to mapping and errors, and the other business areas are not "on hold" because of the migration.
The project sponsor is afraid because the mapping team either cannot solve the problems faster or depends on other players, and the testing team is no longer working daily.
There are several management mistakes here. Testing should be a priority now, but the tests have been reduced and the testing team, which was performing ad-hoc tests 4 hours per day, is only doing up to 16 hours of testing per week. Some tests should have been automated by now, but there is not even a plan to do so, so testing will continue to be ad-hoc.
Management should focus on forcing third-party players to comply with their tasks and dates instead of reducing the scope of data being migrated. As an example, we have been waiting for some data files for at least 6 weeks, but the third party responsible for them doesn't actually care. They sent a couple of files last week, to shut us up, and they all came wrong...
Friday, January 22, 2010
Day 297 - Data Loading Files Performance Problem
The migration scope has increased.
New source data has come into play and thus new mapping specifications have been made.
This has resulted in a really big data loading performance problem.
Our data transformation procedure is still fast, but the creation of the GIS loader files from the transformed data is starting to give us some headaches.
The GIS data loader has a very verbose flat file structure: each single value of each record is loaded through a 300-character text file line.
This means that a single database table row is transformed into roughly as many text lines as the row has columns, plus the file and record header and footer structure.
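To make the row-to-lines expansion concrete, here is a minimal sketch. The real GIS loader layout is not described in this diary, so the field positions (a 20-character record id, a 30-character column name, then the value) and the function name are purely hypothetical; only the 300-character line width comes from the text above. File and record headers and footers are left out.

```python
LINE_WIDTH = 300  # line length mentioned in the post


def row_to_loader_lines(record_id, row):
    """Expand one table row into one fixed-width line per column.

    Hypothetical layout: record id (20 chars), column name (30 chars), value.
    """
    lines = []
    for column, value in row.items():
        text = f"{record_id:<20}{column:<30}{value}"
        if len(text) > LINE_WIDTH:
            raise ValueError(f"value for {column!r} does not fit in {LINE_WIDTH} characters")
        # Pad with spaces so every line has exactly LINE_WIDTH characters.
        lines.append(text.ljust(LINE_WIDTH))
    return lines


# Illustrative row with made-up column names.
row = {"policy_no": "P-0001", "holder": "Jane Doe", "premium": "120.50"}
lines = row_to_loader_lines("REC0000001", row)
print(len(lines), len(lines[0]))
```

A three-column row becomes three 300-character lines, which is why the loader files grow so much faster than the tables they come from.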
As an example, the house insurance data transformation takes around 4 hours, sequential time, and generates about 184 million records. This is all performed on the transformation server.
These records are then exported into the GIS data loader file format from Windows directly into the AS/400. This procedure is now taking a long time: over 6 hours, sequentially, in the best case scenario.
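A rough back-of-the-envelope estimate helps put these numbers in perspective. It assumes that each of the 184 million records ends up as exactly one 300-character line, that each character takes one byte, and that headers and footers are negligible; none of this is confirmed here, so treat the result as an order of magnitude only.

```python
records = 184_000_000  # from the post
line_chars = 300       # from the post
export_hours = 6       # lower bound from the post ("over 6 hours")

# Assumption: one line per record, one byte per character.
total_bytes = records * line_chars
seconds = export_hours * 3600
throughput_mb_s = total_bytes / seconds / 1_000_000

print(f"{total_bytes / 1e9:.1f} GB, {throughput_mb_s:.2f} MB/s")
```

Under those assumptions the export writes about 55 GB at no more than roughly 2.6 MB/s, which suggests the bottleneck is in how the files are produced and moved rather than in the sheer volume.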
This is obviously too much time, so we are exploring several hypotheses: creating a parallel file write process; writing the files locally, with and without compression, and transferring them via FTP to the AS/400; clustered indexes with full coverage maintained on a different disk; and splitting the database schemas across several disks.
Some of these techniques can, and will, be combined.
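Two of the ideas above, the parallel file write and local compressed files, could be combined along these lines. This is only a sketch under assumptions: the file naming, the chunking strategy and the worker count are invented for illustration, and the FTP transfer to the AS/400 is left out entirely.

```python
import gzip
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_chunk(path, lines):
    """Write one chunk of loader lines to its own gzip-compressed file."""
    with gzip.open(path, "wt", encoding="ascii", newline="\n") as handle:
        for line in lines:
            handle.write(line + "\n")
    return len(lines)


def parallel_export(lines, out_dir, workers=4):
    """Split the lines into contiguous chunks and write them concurrently."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    size = max(1, -(-len(lines) // workers))  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    # Hypothetical file naming; numbering keeps the original chunk order.
    paths = [out_dir / f"loader_{n:03d}.txt.gz" for n in range(len(chunks))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        written = list(pool.map(write_chunk, paths, chunks))
    return paths, sum(written)


# Small demonstration with made-up 300-character lines.
sample = [f"REC{n:07d}".ljust(300) for n in range(10)]
with tempfile.TemporaryDirectory() as tmp:
    paths, total = parallel_export(sample, tmp, workers=3)
    print(len(paths), total)
```

Threads keep the sketch simple; if formatting the lines turns out to be CPU-bound, a process pool may be the better fit. Whether compression pays off depends on how the transfer time compares with the time spent compressing and decompressing, which is exactly what the "with and without compression" experiments would have to measure.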
We have to tune the process from our side since there is not much the GIS or the AS/400 can do when it comes to massive data load tuning.
We are facing a lot of hard work in the coming days.
Labels:
as/400,
data,
data migration,
etl,
gis,
intel,
map,
mapping,
migration,
performance,
problems,
system i,
transformation rules,
windows
Wednesday, April 1, 2009
Day 1 - Kick off
The project has been officially kicked off.
There was the traditional meeting where all teams were present and an overview of the project was presented. All teams involved have previous data migration experience, which is good.
The project should be completed around mid-October, but I felt that management's confidence in meeting the date is not very high. This is not as bad a sign as one might think, since management showed that it is aware of some real difficulties. Those difficulties may become technical challenges and may delay the project.
Nevertheless, I do believe that the original due date can be achieved.
There will be two phases. The first phase, the current one, will use a small entity subset in order to test the performance of the ETL and to tune the process.
The second phase will be a continuation of the first, but now including all applications and entities.
The project itself consists of migrating the current insurance application, running on OS/390 mainframe, into the new system, the GIS Non-Life, running on AS/400, all using DB2, SAM and VSAM.
There are some other satellite applications running on SQL Server 2005 on Windows.
The ETL tool will be Data Fusion and it will run on AS/400 under the Qshell.
The ETL process will be implemented in a slightly different way from the usual scenario. Usually the data is extracted from the source database and loaded into the target database. This is not the case in this project.
Eventually, the data will be loaded into the final target database, but there's an intermediate step. The ETL process performed by Data Fusion will deliver the data on flat files that will be later consumed by the GIS loader. The GIS loader will validate and load the data into the final database.
Since our (ETL team) experience tells us that performance on AS/400 may be a problem during the ETL process, there was some discussion over the performance tests to be performed during phase one.
If performance becomes an issue, the data migration may have to be incremental instead of one-shot. This will be a problem because the source team cannot identify which data has changed and needs to be migrated again to refresh the target system. One challenge we have been given is to think of a fast way to identify whether a record has changed and needs to be migrated again, just in case a plan B is required.
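One common way to detect changed records when the source cannot flag them is to keep a fingerprint (a hash of the record's values) from the previous run and compare it with a fresh one. The sketch below illustrates that general technique; it is not a description of what the project ended up doing, and the record keys, column names and function names are all hypothetical.

```python
import hashlib


def fingerprint(record):
    """Return a stable SHA-256 digest of a record's column values."""
    # Sort the columns so the digest does not depend on dict ordering.
    payload = "\x1f".join(f"{key}={record[key]}" for key in sorted(record))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_keys(previous, current):
    """Return the keys of records that are new or whose fingerprint differs.

    `previous` maps record key -> stored fingerprint from the last run.
    `current` maps record key -> record (dict of column values).
    """
    return sorted(
        key for key, record in current.items()
        if previous.get(key) != fingerprint(record)
    )


first_run = {
    "P-0001": {"holder": "Jane Doe", "premium": "120.50"},
    "P-0002": {"holder": "John Roe", "premium": "80.00"},
}
stored = {key: fingerprint(rec) for key, rec in first_run.items()}

second_run = {
    "P-0001": {"holder": "Jane Doe", "premium": "125.00"},  # changed
    "P-0002": {"holder": "John Roe", "premium": "80.00"},   # unchanged
    "P-0003": {"holder": "Ann Poe", "premium": "60.00"},    # new
}
print(changed_keys(stored, second_run))  # ['P-0001', 'P-0003']
```

Note that this still requires reading every source record on each run; what it saves is transforming and loading the unchanged ones. Records deleted at the source are not covered by this sketch and would need a separate comparison of the key sets.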
Labels:
as/400,
data,
data fusion,
data migration,
db2,
etl,
gis,
migration,
os/390,
sql server
Data Migration Diary
I'm involved in a new data migration project, and since I'm really interested in this area, I'm starting this blog as a diary.
I won't write entries every day, since there will be times when only "boring" stuff happens.
But, I'll write as much as possible about the project management, the technical difficulties, the solutions, the workarounds, the mapping process, the business rules, and, of course, the ETL process.
I have experience in data migration projects in several areas, such as banking and public administration. This will be my first data migration project in the insurance area, which will be quite a challenge because the business rules in this area are very complex.
For confidentiality reasons, I'll keep the project sponsor and all parties involved anonymous.
The only exception will be the references to the technical stuff involved.