Management has finally shaken things up and is pushing the teams to close issues, especially issues that have been open for too long.
This is part of a bigger strategy: finalizing the mapping and loading of some business areas. Three areas have been selected: clients, simulations and credit insurance. I'm involved in two of them, and I'm happy to see that some of these things are now reaching cruise speed.
Wednesday, January 27, 2010
Friday, January 22, 2010
Day 297 - Data Loading Files Performance Problem
The migration scope has increased.
New source data has come into play and thus new mapping specifications have been made.
This has resulted in a really big data loading performance problem.
Our data transformation procedure is still fast, but the creation of the GIS loader files from the transformed data is starting to give us some headaches.
The GIS data loader has a very verbose flat file structure: each single value of each record is loaded through a 300-character text file line.
This means that a single database table row is transformed into roughly as many text lines as the row has columns, plus the file and record header and footer structure.
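To make the row-to-lines explosion concrete, here is a minimal sketch. The real GIS loader layout is not shown in this post, so the `H|`, `V|` and `F|` line prefixes, the field order and the sample table are all invented for illustration; only the 300-character line width comes from the description above.

```python
LINE_WIDTH = 300  # from the post: each value travels in a 300-character line


def pad(text):
    """Pad a line to the fixed width; refuse anything that would not fit."""
    if len(text) > LINE_WIDTH:
        raise ValueError(f"line exceeds {LINE_WIDTH} characters")
    return text.ljust(LINE_WIDTH)


def row_to_loader_lines(table, key, row):
    """Explode one table row into header + one line per column + footer.

    The 'H|', 'V|' and 'F|' prefixes are hypothetical, not the GIS format.
    """
    lines = [pad(f"H|{table}|{key}")]
    for column, value in row.items():
        lines.append(pad(f"V|{table}|{key}|{column}|{value}"))
    lines.append(pad(f"F|{table}|{key}|{len(row)}"))
    return lines


# Hypothetical three-column row
row = {"policy_id": 1001, "start_date": "2010-01-01", "premium": "250.00"}
lines = row_to_loader_lines("HOUSE_POLICY", "1001", row)
print(len(lines), "lines of", len(lines[0]), "characters each")
```

A three-column row becomes five padded lines here, which is why the file size grows with the column count and not only with the row count.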
As an example, the house insurance data transformation takes around 4 hours of sequential time and generates about 184 million records. This is all performed on the transformation server.
These records are then exported into the GIS data loader file format from Windows directly into the AS/400. This procedure now takes over 6 hours, sequentially, in the best-case scenario.
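A rough back-of-the-envelope estimate of what those numbers imply. This assumes that each of the 184 million records ends up as one 300-character loader line and that each character takes one byte; neither is confirmed, so treat the result as an order of magnitude only.

```python
RECORDS = 184_000_000   # from the post
LINE_BYTES = 300        # from the post; assumes one byte per character
EXPORT_HOURS = 6        # from the post (best case)

total_bytes = RECORDS * LINE_BYTES
seconds = EXPORT_HOURS * 3600
lines_per_second = RECORDS / seconds
megabytes_per_second = total_bytes / seconds / 1_000_000

print(f"{total_bytes / 1e9:.1f} GB written")        # 55.2 GB written
print(f"{lines_per_second:.0f} lines per second")   # 8519 lines per second
print(f"{megabytes_per_second:.2f} MB per second")  # 2.56 MB per second
```

Under these assumptions the export moves only a few megabytes per second, which suggests the time goes into how the lines are written and transferred, not into the raw data volume.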
This is obviously too much time, so we are exploring several hypotheses: creating a parallel file write process; writing the files locally, with and without compression, and transferring them via FTP to the AS/400; clustered indexes with full coverage kept on a different disk; and splitting the database schemas across several disks.
Some of these techniques can, and will, be combined.
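As an illustration of combining two of these ideas (parallel writes plus local compression), here is a minimal sketch. It is not our actual process: the file naming, the partition count and the sample lines are hypothetical, and the FTP transfer to the AS/400 is left out.

```python
import gzip
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

LINE_WIDTH = 300  # from the post


def write_partition(out_dir, index, lines):
    """Write one contiguous partition of loader lines to its own gzip file."""
    path = out_dir / f"loader_part_{index:04d}.txt.gz"  # hypothetical naming
    with gzip.open(path, "wt", encoding="ascii", newline="\n") as handle:
        for line in lines:
            handle.write(line.ljust(LINE_WIDTH) + "\n")
    return path


def write_in_parallel(out_dir, lines, partitions):
    """Split lines into contiguous chunks and write the chunks concurrently."""
    size = max(1, -(-len(lines) // partitions))  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        futures = [
            pool.submit(write_partition, out_dir, i, chunk)
            for i, chunk in enumerate(chunks)
        ]
        return [future.result() for future in futures]


out_dir = Path(tempfile.mkdtemp())
sample = [f"V|HOUSE_POLICY|{n}|premium|250.00" for n in range(1000)]
paths = write_in_parallel(out_dir, sample, partitions=4)
print(len(paths), "compressed files written to", out_dir)
```

Contiguous chunks keep the original line order inside each file, which matters if the loader expects header and footer lines around each record. Whether threads or separate processes give the better speed-up would have to be measured on the real server.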
We have to tune the process on our side, since there is not much the GIS or the AS/400 can do when it comes to massive data load tuning.
We are facing a lot of hard work in the next few days.
Labels:
as/400,
data,
data migration,
etl,
gis,
intel,
map,
mapping,
migration,
performance,
problems,
system i,
transformation rules,
windows
Day 293 - Loading Status Queries
It seems we cannot optimize the loading procedure feedback queries any further on our own.
We don't even fully understand the data model, since there is no documentation and we had to learn it by direct observation and trial and error.
We have asked the GIS team to help us validate the queries and tune the process.
Wednesday, January 13, 2010
Day 287 - Project Due Date Postponed
The project due date has been postponed by almost two months. D-Day is now June 1st.
This happened because sub-projects related to the target system, such as testing, were running late.
This is actually the first time I've been on a data migration project where the official D-Day, once set, has been postponed.
As far as I know, the data migration itself was one of the few projects that were on schedule.
The data migration will have to comply with this date, which probably means that we will have one extra resource for one month. This is still unclear, so the resizing of the team is still under analysis.
Wednesday, January 6, 2010
Day 279 - Happy New Year and Happy News
Clients are now being loaded with a marginal number of rejections.
The GIS configuration issues still have an impact, but that problem is being solved quickly.
I'm still working on the load report query problem, but it is not easy to test performance on a slow system, since it takes too long to process a query and return its result.
Day 273 - Configuration Issues
The GIS configuration is changing, and that has a direct impact on the mapping rules.
We are now facing an increasing amount of data being rejected by GIS because its configuration has changed but the mapping hasn't.
Day 266 - Christmas Break
The project will be stopped for a short Christmas break.
The overall amount of data being rejected by the GIS loader is decreasing every day.
But we are now facing a performance problem again. It takes too long to query GIS to find out how many errors, and what types of errors, occurred in the loading procedure. I'll have to look at the performance of these queries so that the loading report does not take half a day for a simple auto claim data load.
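The kind of feedback query described here (how many errors, and of what type) can be sketched as a single aggregate over an error table. This is only an illustration: the real GIS data model is undocumented, so the `load_errors` table, its columns and the error type names are hypothetical, and SQLite stands in for the AS/400 database.

```python
import sqlite3


def error_summary(conn, load_id):
    """Return (error_type, count) pairs for one load, most frequent first."""
    query = (
        "SELECT error_type, COUNT(*) FROM load_errors "
        "WHERE load_id = ? GROUP BY error_type "
        "ORDER BY COUNT(*) DESC, error_type"
    )
    return conn.execute(query, (load_id,)).fetchall()


conn = sqlite3.connect(":memory:")
# Hypothetical error table; not the actual GIS schema
conn.execute(
    "CREATE TABLE load_errors (load_id INTEGER, record_key TEXT, error_type TEXT)"
)
conn.executemany(
    "INSERT INTO load_errors VALUES (?, ?, ?)",
    [
        (1, "C-1", "MISSING_CLIENT"),
        (1, "C-2", "MISSING_CLIENT"),
        (1, "C-3", "INVALID_DATE"),
        (2, "C-9", "INVALID_DATE"),
    ],
)
# An index on the filter and grouping columns lets the engine avoid a full scan
conn.execute("CREATE INDEX idx_load_errors ON load_errors (load_id, error_type)")

summary = error_summary(conn, 1)
print(summary)
```

Getting counts per error type in one grouped pass, backed by an index on the filter columns, is usually cheaper than querying record by record, though how well it helps on the real system would have to be tested there.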
Day 264 - Error Free Loading
Finally we are starting to get error-free data loading.
Some types of clients are already loading into GIS, and this is reducing the number of claims rejected by the GIS loader.
I expect to stop worrying about client problems soon.
Day 260 - Problem Prioritization
The problem-solving problem is finally being solved.
Management has prioritized the problems to be fixed and is now assigning them faster.
Day 253 - Problem Solving is a Problem
Some problems take too long to be solved and pile up in the sponsor queue. Sometimes management does not assign the problems to team members fast enough, sometimes the team depends on other people, and sometimes the team simply does not have a quick answer.
This is a real problem, since we are now facing a dependency problem.
There are unsolved problems in the client area which are keeping clients from being integrated into GIS. And since there are clients missing, many claims are being rejected.
And since claims are being rejected, accounts are also being rejected.
This requires a fast answer from management.
Day 250 - Team Rearrangements
There is a small team management problem.
The teams seem to be rearranged from time to time, and this instability is not good for the project.
For instance, one person who was responsible for the client data migration has been reassigned several times to other areas, like account and claims, and is now in the testing team. Two other people took over responsibility for the client data migration, one for each system, but it seems that the new people depend on the initial person.
There is another person who is responsible for the data extraction from one of the source systems, but it now seems that this person is also part of the mapping team.
This happens less in other areas, but it seems that there is an unofficial set of people assigned to all areas.
There is a clear spirit of mutual help, which is great, because it eliminates political problems such as "this is my migration area, so keep out".
Day 240 - Cleanup Procedure Status
The AS/400 cleanup and tuning procedures have resulted in better performance.
The performance increase was not substantial, but since the domain includes millions of records, any minor improvement adds up to an overall improvement.
Day 245 - Testing team
Management has finally assigned a test team to work half a day every day.
This is taking the tests to cruise speed.
Every day problems are detected, rules are changed and minor changes are made that have a great impact on the migration. The number of records rejected by the GIS data loader is starting to decrease.