We have solved most of the reported issues. Most were wrong mapping specifications, but some implementation bugs were also found.
Almost all issues are assigned to the functional team to solve, and relate to rule specification.
Things are moving at a steady pace, but a bit slowly from my point of view.
Showing posts with label map. Show all posts
Monday, February 15, 2010
Friday, January 22, 2010
Day 297 - Data Loading Files Performance Problem
The migration scope has increased.
New source data has come into play and thus new mapping specifications have been made.
This has resulted in a serious data loading performance problem.
Our data transformation procedure is still fast, but the creation of the GIS loader files from the transformed data is starting to give us some headaches.
The GIS data loader uses a very verbose flat file structure: each single value of each record is loaded through its own 300-character text line.
This means that a single database table row is transformed into roughly as many text lines as the row has columns, plus the header and footer structure of the file and of each record.
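To make the expansion concrete, here is a minimal sketch of how one table row could be flattened into one fixed-width line per column. Only the 300-character width comes from the description above; the field layout (`record_id|column|value`) and the name `row_to_lines` are assumptions for illustration and do not reflect the actual GIS loader format.

```python
LINE_WIDTH = 300  # from the post: each loader line is 300 characters wide


def row_to_lines(record_id, row):
    """Yield one space-padded, fixed-width line per column value.

    The 'record_id|column|value' layout is hypothetical; the real
    GIS loader layout is not described in the post.
    """
    for column, value in row.items():
        text = f"{record_id}|{column}|{value}"
        if len(text) > LINE_WIDTH:
            raise ValueError(f"value for {column!r} does not fit in one line")
        yield text.ljust(LINE_WIDTH)


# A row with three columns becomes three 300-character lines.
example = list(row_to_lines("R0001", {"policy": 12345, "city": "Lisboa", "premium": "210.50"}))
```

Header and footer lines for the file and for each record would come on top of this, so the real line count per row is somewhat higher than the column count.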
As an example, the house insurance data transformation takes around 4 hours of sequential time and generates about 184 million records. This is all performed on the transformation server.
These records are then exported in the GIS data loader file format from Windows directly to the AS/400. This procedure now takes over 6 hours, sequentially, in the best-case scenario.
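A rough back-of-the-envelope estimate puts these numbers in perspective. It assumes that each of the 184 million records ends up as exactly one 300-character line at one byte per character, and it ignores line terminators and the header and footer lines; these are my assumptions, so treat the result as an order of magnitude only.

```python
RECORDS = 184_000_000  # from the post
LINE_BYTES = 300       # from the post, assuming one byte per character
EXPORT_HOURS = 6       # best-case export time from the post


def export_estimate(records=RECORDS, line_bytes=LINE_BYTES, hours=EXPORT_HOURS):
    """Return (total bytes written, average bytes per second)."""
    total_bytes = records * line_bytes
    bytes_per_second = total_bytes / (hours * 3600)
    return total_bytes, bytes_per_second


total, rate = export_estimate()
print(f"{total / 1e9:.1f} GB at about {rate / 1e6:.2f} MB/s")
# prints "55.2 GB at about 2.56 MB/s"
```

Under these assumptions the export moves only a few megabytes per second on average, which suggests the time goes into how the lines are produced and written rather than into raw volume alone.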
This is obviously too much time, so we are exploring several hypotheses: creating a parallel file write process; writing the files locally, with and without compression, and transferring them via FTP to the AS/400; clustered indexes with full coverage maintained on a different disk; and splitting the database schemas across several disks.
Some of these techniques can, and will, be combined.
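Two of these ideas, parallel writes and local files with optional compression followed by an FTP transfer, can be sketched as below. This is an illustration under assumptions, not our actual export code: the file naming, the partitioning, the helper names, and the FTP host and credentials are all placeholders, and whether compressed files can be unpacked conveniently on the AS/400 side is left open.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor
from ftplib import FTP
from pathlib import Path


def write_partition(out_dir, index, lines, compress=True):
    """Write one partition of loader lines to a local file and return its path."""
    suffix = ".txt.gz" if compress else ".txt"
    path = Path(out_dir) / f"loader_part_{index:04d}{suffix}"  # hypothetical naming
    opener = gzip.open if compress else open
    with opener(path, "wt", encoding="ascii", newline="\n") as handle:
        for line in lines:
            handle.write(line + "\n")
    return path


def write_all(out_dir, partitions, workers=4, compress=True):
    """Write every partition concurrently; paths come back in partition order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(write_partition, out_dir, index, part, compress)
            for index, part in enumerate(partitions)
        ]
        return [future.result() for future in futures]


def upload(paths, host, user, password):
    """Send the local files over FTP in binary mode (host and login are placeholders)."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        for path in paths:
            with open(path, "rb") as handle:
                ftp.storbinary(f"STOR {path.name}", handle)
```

Threads keep the sketch short; if compression turns out to be the bottleneck, a process pool may be the better fit. Running the same partitions with `compress=False` gives the uncompressed variant for comparison.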
We have to tune the process from our side, since there is not much the GIS or the AS/400 can do when it comes to massive data load tuning.
We are facing a lot of hard work in the next few days.
Labels:
as/400,
data,
data migration,
etl,
gis,
intel,
map,
mapping,
migration,
performance,
problems,
system i,
transformation rules,
windows
Friday, June 19, 2009
Day 79 - Insurance Claims Mapping
Finally, a new mapping meeting for insurance claims was scheduled.
The meeting went great and we're already producing mappings. This work will continue for about one more week, maybe 3 or 4 more mapping meetings and we're done with it.
Monday, May 25, 2009
Day 50 - New Mappings Postponed
Since the first stage is over, we're now waiting for the meetings to map the new entities.
Unfortunately, the target team still does not have the product fully configured; it looks like the requirements are not fully closed yet.
Therefore, the new mappings have been postponed.
./M6
Monday, April 6, 2009
Day 6 - Setup for First Mapping
The plan for the phase has been rolled out, and we've prepared for the first mapping meeting tomorrow.
We've also set up the development environment on the computer that has arrived.
My computer has not arrived yet, and that will become a problem if it does not arrive in the next two days.
We have also requested a new data load into DB2, this time with the structure preserved.