This weekend's data migration was struck hard by Murphy's law.
It was terrible. It really seemed that everything that could go wrong, went wrong.
Many of the failed data loads will be repeated during the week.
And next weekend we'll try again with another data set.
Showing posts with label problems. Show all posts
Monday, February 7, 2011
Tuesday, January 11, 2011
Day 651 - Distinct Views of the Same Situation
Management and I disagree when it comes to the current status of the data migration project.
Management seems happy with it, while I'm concerned about the number of change requests we're getting when we're so close to the next big data migration.
After that we'll have only one month before the simulations begin, and I'm not happy because I already know things will change a lot again.
I'm very concerned about this kind of undesired activity since it may be a no-go factor.
Management has not taken my opinion too seriously and has assured its own team that things are OK.
Day 647 - New Patch
The number of patches GIS has already received, both to extend its functionality and to fix the bugs that previous patches introduced, is unbelievable!
I've lost two mornings working with the functional team on the account module just to find out that we were on a wild goose chase.
The problem was in GIS, and we have just been informed that the new patch fixes the errors we were getting. The catch is that these errors only appeared after a previous software installation the week before.
Friday, November 19, 2010
Day 573 - DB2 Crash
This weekend we are going to load a specific sample into the quality system for integration tests.
Unfortunately this task was almost compromised because our DB2 staging area database crashed hard.
It took me around 4 hours to get everything working again, and I had to rebuild the entire staging area responsible for holding the transformed data.
Monday, July 19, 2010
Day 472 - Bad, Bad News.
We've got the data ready for loading, as requested by management, for this weekend's test.
When I went to talk with management, I was surprised to learn that the data will probably not be loaded because a new GIS version was going to be installed. And I was not the only one surprised: management didn't know this was going to happen!
Management is clearly unable to manage this project: planned things do not happen when they should, and unplanned things with high impact happen without management's knowledge.
This new GIS version installation means we have been working the last two days for nothing, and the testing team will have to perform the same old ad hoc tests again.
This was also the day that the administration received the bad news. We, the data migration team, have officially stated that the immaturity of GIS, the current state of mapping rules, and the nonexistence of automated tests have made the first simulation, still scheduled for the last weekend of this month, impractical.
We also officially stated that this has a direct impact on the overall project and a new plan is now required because the current one will not happen.
Day 468 - New Week, Old Problems
Data has been loaded into GIS with some errors. We already expected errors from the account area, which was not properly tested, and the claims are not even close to being fully defined.
On top of that, insurances have undergone a review which will result in a set of changes and, of course, a new set of tests.
Only three weeks are left until the first simulation, but things are pretty unstable. I forecast a no-go decision for that first simulation. And if things don't get straightened out, I forecast a re-plan of the entire data migration project.
Day 454 - New GIS Version Required
A new GIS software version is required in order to overcome a problem in the account area that has been stalled for weeks.
Time is running out and the new GIS version should be available at the end of this week.
This will be a major improvement in the account area because the mapping and testing teams will finally be able to define it and test it.
Labels:
data,
management,
mapping,
problems,
team,
tests,
transformation rules
Day 427 - No Real Improvements
Things are not going as expected.
The mapping is still unstable and there are delays in solving the problems.
The test team is not working full time since there is not much to test.
Management is trying to act in order to put everything back on track, and I sincerely hope it succeeds. With an added effort, there is still time to make things work on time.
Friday, May 28, 2010
Day 422 - Unstable Mapping
Things are not going the way the management team and I were expecting.
After the restart, teams started working with a focus on the current official project dates, even if they don't really believe in them.
But the last GIS loads we performed showed that there was a major mapping regression. For instance, car insurances were all loading correctly, and now there isn't a single insurance that loads correctly: they have all been rejected.
It turns out that the product configuration in GIS, which should have been concluded some weeks ago, has changed again.
And that was not all: all the product insurance mapping rules are changing. This mapping instability means that the project has an overall regression when there are only 8 weeks left until the first data migration simulation.
But this is not all. When the testing team tried to perform tests, they canceled the task because there was nothing that they could test. The migrated data that was loaded and available for testing will change soon.
Labels:
data,
management,
mapping,
problems,
team,
tests,
transformation rules
Thursday, May 20, 2010
Day 413 - Restart
Today there was a migration meeting between the migration team, one of the people responsible from the management team, and me. It seems the migration will be performed in one shot and October is still the official date. But the management team is ready to accept a four-week delay, which means the migration would happen on November 1st.
These dates have been widely contested by the migration team for several reasons. The most important of all is that they do not believe that the GIS implementation will be ready in time to comply with these dates. This disbelief has resulted in a very interesting discussion between management and the migration team.
The migration team is tired. The project now totals two working years during which these people have been overloaded: they have to perform all their usual tasks and also develop the migration. The worst problem is that, historically, the GIS implementation has failed to deliver the defined requirements on time. Plus, there is still a huge amount of work to be done in this area, so the migration team is more than skeptical: they actually do not believe in these dates.
Worse, they are mad about the fact that the management team is being permissive with the GIS development delays and that the features do not fully implement the defined requirements.
After some pressure from the migration team, the management team admitted that they believe in these dates but do not rule out the possibility of another postponement.
On top of all that, I've pointed out that there's still a considerable amount of work to be done regarding the tests and quality assurance. We have already identified counting checks and summary checks, but the business rule validations still have to be identified.
Once the validations have been identified, they must be implemented at the critical points: at data extraction, at data loading into the staging area, at data extraction from the staging area, and at data loading into GIS.
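To make the idea of counting checks at the four critical points more concrete, here is a minimal sketch in Python. It is only an illustration: the checkpoint names, the `CheckResult` class and the `run_count_checks` function are hypothetical and are not the project's actual validation code. It assumes that the record count at each checkpoint should equal the count at the previous one minus any rejections that were explicitly recorded at that step.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical names for the four critical points listed in the post.
CHECKPOINTS = [
    "source_extraction",
    "staging_load",
    "staging_extraction",
    "gis_load",
]


@dataclass
class CheckResult:
    checkpoint: str
    expected: int
    actual: int

    @property
    def ok(self) -> bool:
        return self.expected == self.actual


def run_count_checks(
    counts: Dict[str, int],
    rejected: Optional[Dict[str, int]] = None,
) -> List[CheckResult]:
    """Compare each checkpoint's record count with the previous checkpoint.

    `rejected` holds the number of records knowingly rejected at a
    checkpoint, so that expected losses are not reported as errors.
    """
    rejected = rejected or {}
    results = []
    for previous, current in zip(CHECKPOINTS, CHECKPOINTS[1:]):
        expected = counts[previous] - rejected.get(current, 0)
        results.append(CheckResult(current, expected, counts[current]))
    return results


if __name__ == "__main__":
    counts = {
        "source_extraction": 1000,
        "staging_load": 1000,
        "staging_extraction": 1000,
        "gis_load": 990,
    }
    for result in run_count_checks(counts, rejected={"gis_load": 10}):
        status = "OK" if result.ok else "MISMATCH"
        print(result.checkpoint, result.expected, result.actual, status)
```

Summary checks (for example, totals of amounts) and business rule validations could follow the same pattern, with one comparison per checkpoint.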
Nevertheless, the entire team has agreed to make another extra effort in order to try to comply with the currently defined date. After a four-week stop, the project restart is planned for tomorrow.
Labels:
management,
migration,
problems,
quality,
team,
validation
Thursday, April 22, 2010
Day 386 - Official Plan Released
The new official data migration dates have been released.
It seems we're going to perform a simulation on the weekend before we all go on holiday, which will be the entire month of August. I don't really believe that this will happen since there is a lot of work to be done by the project sponsor teams, especially when it comes to quality. No more tests have been performed since the testing team came back from the Easter holidays, and it seems that there is no implementation plan for the tests we identified some weeks ago.
We are still 3 months away, so it can actually be done if the sponsor acts fast.
When we get back from the holidays, in the first week of September, we will only be performing simulations until the real data migration happens at the beginning of October.
There are rumors from some people on the sponsor side that this new data migration date will not happen and that the project will be postponed again to January. These are the same people who predicted the other postponements, and their justifications were always right. I'm afraid they are right again, since they are pointing out the quality of the data migration and the quality of the new system as major show stoppers. As I've stated before, a lot of testing is still required.
Thursday, April 8, 2010
Day 372 - Project Due Date Postponed Again
The project due date has been postponed four more months. It looks like the data migration will now happen on the 4th of October.
This happened mainly because the target system will not be fully developed by June.
It looks like management wants to keep the data migration plan for June and give us some months of forced holidays. I do hope management won't go with this plan because it is impractical.
It is wrong to believe that the transformation rules can be frozen for 3 months while there is massive development on the target system. The project sponsor teams already know this from experience: whenever something had to change on the target system, a considerable number of mapping rules had to be redefined, implemented and tested. It is also wrong to believe that whatever tests the testing team performs over these 3 months will not require any mapping changes.
This postponement is a great opportunity to define and implement the test plan. Some of the people from the testing team I've been talking to have seen the number of preliminary tests we've identified and agree that it would be unfeasible to define and implement a testing plan if the data migration were to be performed in June.
This is actually the first time I'm on a data migration project where, once set, the official D-Day has been postponed. And in this project, it has already happened twice!
As with the first postponement, and as far as I know, the data migration itself is one of the few projects that was on schedule under the previous plan.
As with the previous postponement, this probably means that we will have extra resources for most of the time, considering the data migration team's workload at this stage.
But it is still not clear what management will do regarding all the changes that this postponement brings.
Wednesday, April 7, 2010
Day 371 - Tests Needed
The massive data migration performed over the weekend was positive.
We got some errors, some of them due to dependencies, but nothing critical and nothing that we were not expecting.
One positive result from this weekend's work was a request from the management team. They loved the counts so much that they want us to identify some data tests that should be performed.
We've identified almost 100 simple data tests, most of them queries, that should be performed over the source data, the migrated data and the loaded data to check that no records were lost and no information was mismatched.
They should complete the document with similar tests related to the business itself.
And it looks like the testing team will start gathering and testing again this week.
Still, we are worried about the tests and the testing plan because the tests are totally ad hoc and there is no testing plan whatsoever.
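As an illustration of the kind of simple query-based data test mentioned above, here is a minimal sketch that compares a record count and a column total across the source, migrated and loaded data. Everything in it is an assumption made for the example: the table names, the `premium` column, and the use of an in-memory SQLite database as a stand-in for the real databases.

```python
import sqlite3


def reconcile(conn, tables, amount_column):
    """Return (record count, column total) for each table."""
    summary = {}
    for table in tables:
        # Table and column names come from a fixed, trusted list,
        # never from user input, so string formatting is acceptable here.
        query = f"SELECT COUNT(*), COALESCE(SUM({amount_column}), 0) FROM {table}"
        count, total = conn.execute(query).fetchone()
        summary[table] = (count, total)
    return summary


def find_mismatches(summary, reference):
    """List the tables whose figures differ from the reference table."""
    return [t for t, stats in summary.items() if stats != summary[reference]]


def build_demo_connection():
    """Create three hypothetical tables; one record is lost at load time."""
    conn = sqlite3.connect(":memory:")
    tables = ("source_policies", "migrated_policies", "loaded_policies")
    for table in tables:
        conn.execute(
            f"CREATE TABLE {table} (policy_id INTEGER PRIMARY KEY, premium INTEGER)"
        )
    rows = [(1, 100), (2, 250), (3, 75)]
    conn.executemany("INSERT INTO source_policies VALUES (?, ?)", rows)
    conn.executemany("INSERT INTO migrated_policies VALUES (?, ?)", rows)
    conn.executemany("INSERT INTO loaded_policies VALUES (?, ?)", rows[:2])
    return conn


if __name__ == "__main__":
    conn = build_demo_connection()
    tables = ["source_policies", "migrated_policies", "loaded_policies"]
    summary = reconcile(conn, tables, "premium")
    print(summary)
    print("Mismatches:", find_mismatches(summary, "source_policies"))
```

Keeping such tests in a script rather than running them by hand is one possible way to move away from purely ad hoc testing.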
Monday, March 29, 2010
Day 363 - First Production Problems
The testing team was right. They needed more time to test the system.
The first problem found concerns product integration and interface testing.
It looks like the data we migrated uses a key that differs from the one used by a satellite system. We checked the migrated data and it complies with the specification, the mapping rules defined and the transformation rules implemented. This was also confirmed several times over the last 4 months by the tests performed on the migrated data.
The other system is sending a different key from the one we're sending.
This is an obvious integration error which would easily have been found and corrected if integration tests had been performed.
With less than 3 months until the real data migration, this is not a good sign, especially because we've already warned several times about the need for consistent and automated tests, something that no one here seems to care about except the testing team which, by the way, does not have the power to enforce it.
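An automated integration check for this kind of key mismatch can be very small. The sketch below is hypothetical: the `compare_keys` function and the sample key formats are invented for illustration and do not reflect the real keys of either system. It simply compares the set of keys we migrated with the set of keys the other system sends.

```python
def compare_keys(migrated_keys, satellite_keys):
    """Report keys that exist on only one side of the integration."""
    migrated = set(migrated_keys)
    satellite = set(satellite_keys)
    return {
        "only_in_migrated": sorted(migrated - satellite),
        "only_in_satellite": sorted(satellite - migrated),
        "matched": len(migrated & satellite),
    }


if __name__ == "__main__":
    # Invented sample data: the second key is formatted differently
    # by the satellite system, so it will never match.
    report = compare_keys(["P-001", "P-002"], ["P-001", "0002"])
    print(report)
```

Any non-empty "only in" list signals that the two systems disagree on the key and that the interface needs attention before go live.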
Day 360 - First Product Go Live
Things didn't go exactly as planned.
Just minutes before the loading of the migrated data, someone noticed that there was something wrong with the configuration, and a critical action took place.
This meant a delay of over 4 hours relative to the original plan, but eventually everything went well and the first product in GIS is ready for the go live next Monday.
Thursday, March 25, 2010
Day 345 - New System Go Live
The new system will go live with a new product.
Since this is a new product, only a tiny part of the data included in the migration scope will be loaded at that time.
This will happen on 29th March.
There have been problems with the tests, which can be summed up as follows: the tests are not yielding the expected results and are seen as insufficient by many of the people involved in the process.
Management did try to postpone the go live for two weeks. But the project sponsor has denied this request, so the system will go live as planned.
Wednesday, March 17, 2010
Day 335 - Reprioritization
The project sponsor is not very comfortable with the current deadline compliance, so management has reprioritized the migration of the business areas.
This happened because management believes that there is a need for greater focus on some business areas when it comes to the data migration.
In practice, this means that credit insurance has been defined as non-priority data to be migrated.
I'm actively involved in this business area and my opinion is that this is actually a mistake.
Credit insurance is on a par with all other business areas when it comes to mapping and errors, and the other business areas are not "on hold" because of the migration.
The project sponsor is worried because the mapping team cannot solve the problems any faster, there are dependencies on other players, and the testing team is no longer working daily.
There are several management mistakes here. Testing should be a priority now, but the tests have been reduced and the testing team, which was performing ad hoc tests 4 hours per day, is only doing up to 16 hours of testing per week. Some tests should have been automated by now, but there is not even a plan to do so, so testing will continue to be ad hoc.
Management should focus on forcing third party players to comply with their tasks and dates instead of reducing the scope of the data being migrated. As an example, we have been waiting for some data files for at least 6 weeks, but the third party responsible for them doesn't actually care. They sent a couple of files last week, to shut us up, and they came in all wrong...
Friday, January 22, 2010
Day 297 - Data Loading Files Performance Problem
The migration scope has increased.
New source data has come into play and thus new mapping specifications have been made.
This has resulted in a really big data loading performance problem.
Our data transformation procedure is still fast, but the creation of the GIS loader files from the transformed data is starting to give us some headaches.
The GIS data loader has a flat file structure that is very verbose: each single value of each record is loaded through a 300-character text file line.
This means that a single database table row is transformed into roughly as many text lines as the row has columns, plus the file and record header and footer structure.
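To illustrate why this format is so verbose, here is a small sketch of how one table row could expand into one fixed-width line per column. The 300-character width comes from the description above; everything else (the `row_to_lines` and `estimate_line_count` functions, the `record_id|column|value` layout and the sample row) is a made-up assumption, since the real GIS loader layout is not described here.

```python
LINE_WIDTH = 300  # from the post: one 300-character line per value


def row_to_lines(record_id, row):
    """Expand one table row into one padded line per column.

    The "record_id|column|value" layout is purely hypothetical.
    """
    lines = []
    for column, value in row.items():
        content = f"{record_id}|{column}|{value}"
        if len(content) > LINE_WIDTH:
            raise ValueError(f"value for {column!r} does not fit in one line")
        lines.append(content.ljust(LINE_WIDTH))
    return lines


def estimate_line_count(row_count, column_count, header_footer_lines=0):
    """Rough size of a loader file: one line per value plus fixed overhead."""
    return row_count * column_count + header_footer_lines


if __name__ == "__main__":
    sample_row = {"policy_number": "H-1001", "start_date": "2010-01-01", "premium": 120}
    lines = row_to_lines("42", sample_row)
    print(len(lines), "lines of", len(lines[0]), "characters")
    print(estimate_line_count(row_count=1000, column_count=20, header_footer_lines=4))
```

Under these assumptions, the output volume grows with rows times columns, which is consistent with the file creation step becoming the slow part of the process.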
As an example, the house insurance data transformation takes around 4 hours of sequential time and generates about 184 million records. This is all performed on the transformation server.
These records are then exported into the GIS data loader file format from the Windows machine directly to the AS/400. This procedure is now taking a long time: over 6 hours, sequentially, in the best case scenario.
This is obviously too much time, so we are exploring several options: creating a parallel file write process; writing the files locally, with and without compression, and transferring them via FTP to the AS/400; clustered indexes with full coverage maintained on a different disk; and splitting the database schemas across several disks.
Some of these techniques can, and will, be combined.
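As a rough idea of how two of these options might be combined, the sketch below writes several loader files locally in parallel, optionally compressing them with gzip. It is only an illustration under stated assumptions: the function names, the file naming scheme and the use of a Python thread pool are hypothetical, and the FTP transfer to the AS/400 is deliberately left out.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_chunk(path, lines, compress=True):
    """Write one chunk of loader lines to a local file, optionally gzipped."""
    opener = gzip.open if compress else open
    with opener(path, "wt", encoding="ascii", newline="\n") as handle:
        for line in lines:
            handle.write(line + "\n")
    return path


def write_chunks_in_parallel(chunks, out_dir, workers=4, compress=True):
    """Write each chunk to its own file, using a pool of worker threads."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    suffix = ".txt.gz" if compress else ".txt"
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            # Hypothetical naming scheme: loader_0000.txt.gz, loader_0001.txt.gz, ...
            pool.submit(write_chunk, out_dir / f"loader_{index:04d}{suffix}", lines, compress)
            for index, lines in enumerate(chunks)
        ]
        return [future.result() for future in futures]


if __name__ == "__main__":
    demo_chunks = [["line 1", "line 2"], ["line 3"], ["line 4", "line 5"]]
    for path in write_chunks_in_parallel(demo_chunks, "loader_out", workers=2):
        print(path)
```

Whether this helps in practice depends on where the bottleneck is (disk, CPU or network), so each variant would have to be measured, with and without compression.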
We have to tune the process from our side since there is not much the GIS or the AS/400 can do when it comes to massive data load tuning.
We are facing a lot of hard work over the next few days.
Labels:
as/400,
data,
data migration,
etl,
gis,
intel,
map,
mapping,
migration,
performance,
problems,
system i,
transformation rules,
windows
Wednesday, January 13, 2010
Day 287 - Project Due Date Postponed
The project due date has been postponed almost two months. The D-Day date is now June 1st.
This happened because sub-projects related to the target system, such as testing, were running late.
This is actually the first time I'm on a data migration project where, once set, the official D-Day has been postponed.
As far as I know, the data migration itself was one of the few projects that was on schedule.
The data migration will have to comply with this date, and this probably means that for one month we will have one extra resource. This is still unclear, and thus the resizing of the team is still under analysis.
Wednesday, January 6, 2010
Day 279 - Happy New Year and Happy News
Clients are now being loaded with a marginal number of rejections.
The GIS configuration issues still have an impact but that problem is being solved quickly.
I'm still working on the report load query problem, but it is not easy to test performance on a slow system, since it takes too much time to process a query and return its result.