Friday, January 22, 2010

The migration scope has increased. New source data has come into play, and new mapping specifications have been written to cover it. This has resulted in a serious data loading performance problem.
Our data transformation procedure is still fast, but the creation of the GIS loader files from the transformed data is starting to give us some headaches.
The GIS data loader uses a very verbose flat file structure: each single value of each record is loaded through a 300-character text file line. This means that a single database table row is transformed into roughly as many text lines as the row has columns, plus the file and record header and footer structures.
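To make the overhead concrete, here's a minimal sketch of that expansion, assuming a tab-separated table export and plain 300-character padded lines; the real loader format also carries record identifiers and the header and footer records, which are omitted here:

# One padded 300-character line per column value (illustration only).
awk -F'\t' '{ for (i = 1; i <= NF; i++) printf "%-300s\n", $i }' table_export.txt > loader_file.txt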
As an example, the house insurance data transformation takes around 4 hours of sequential time and generates about 184 million records, all on the transformation server. These records are then exported into the GIS data loader file format, from Windows directly onto the AS/400. This procedure now takes over 6 hours, sequentially, in the best case scenario.
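To put that in perspective: assuming one 300-character loader line per record, 184 million records comes to roughly 55 GB of flat files, so the sheer write and transfer volume alone accounts for a good part of those hours.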
This is obviously too much time, so we are exploring several hypotheses: creating a parallel file write process; writing the files locally, with and without compression, and transferring them via FTP to the AS/400; clustered indexes with full coverage maintained on a different disk; and splitting the database schemas across several disks.
Some of these techniques can, and will, be combined.
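As a rough illustration of the compress-then-FTP idea, assuming a Unix-like shell with gzip and a scriptable ftp client (the host, credentials, and file names are invented, and decompression on the AS/400 side is left out):

# Compress the loader files locally, then push them to the AS/400.
gzip loader_file_*.txt
ftp -n as400host <<'EOF'
user {username} {password}
prompt
binary
mput loader_file_*.txt.gz
quit
EOF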
We have to tune the process from our side since there is not much the GIS or the AS/400 can do when it comes to massive data load tuning.
We are facing a lot of hard work in the coming days.
Monday, June 22, 2009
Day 83 - Data Loaded into DB2 on Windows
We have finally received the database data from OS/390 in a format that we were able to load without trouble into DB2 on Windows.
Some of the weird problems we got on the AS/400 will no longer bother us, but now we have to rewrite some scripts because DB2 on Windows is different from DB2 on AS/400.
One of the problems was an INSERT INTO script that loads the postal codes, with almost 300,000 records. I had to load it through the DB2 command line tool. Here's how one can do it:
- Open the DB2 Command Line Processor tool. Don't just open a regular shell, since that will not work; some environment settings are required.
- In the shell, execute the following commands:

db2 connect to {database} user {username} using {password}
db2 -vf {path/script.sql} -t > output.txt

Don't forget to have tail and grep installed if you wish to track the progress of the script execution and check for operation success.
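For example, in a second shell, one can follow the output file and watch for DB2 error codes (SQLxxxxN messages are errors):

tail -f output.txt | grep -E "SQL[0-9]+N"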
The entire process took about 6 hours. This includes taking the data from OS/390, moving it to Windows, loading it into DB2 on Windows, and preparing the staging area with lookup and conversion tables. Way too much time. This process will definitely be optimized.
We've also found one of the most idiotic pieces of functionality I've ever seen in software.
After installing the database client on the local machines and configuring the server access, the Command Editor only allows one to open files from the database server computer. Yes, I mean it is impossible to open any local file using the Command Editor. And no, that is not configurable!
Labels: as/400, command editor, command line, db2, os/390, windows
Monday, June 15, 2009
Day 76 - Back from Vacation
I've just come back from a one-week vacation.
Unfortunately there's no real news.
It was still not possible to load the data into the new DB2 on Windows. Since some field values contain the " character, the DB2 LOAD messes up the data when loading. A new field qualifier will be tried soon.
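For reference, the field qualifier is what DB2 LOAD calls the character string delimiter, and it can be changed with the chardel file-type modifier. A minimal sketch, with the delimiter, file, and table names invented for illustration:

db2 load from {path/file.del} of del modified by chardel~ insert into {schema}.{table}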
There was also a kick-off meeting scheduled for a new mapping for insurance claims, but unfortunately it did not happen.
I still hope to have new mappings during this week, though.
Friday, May 29, 2009
Day 59 - Performance Tests on Intel
We've finished the performance tests on the Intel server.
It's an 8-CPU machine with 8 GB of RAM running 64-bit Windows Server 2003. I was hoping to get it running Linux, but that seems not to be possible.
If you recall, our transformation engine is CPU bound and each transformation engine instance runs on a single CPU, therefore a direct comparison is easy to perform.
The server is powerful enough to hold both the transformation engine and the database itself. Therefore, we ran several tests combining the servers.
One test case used the AS/400 as the database server and the Intel machine as the transformation engine. In another test case, both the database and the transformation engine ran on the Intel machine.
For simplicity, the database on the Intel machine is a real DB2.
Amazingly, or not, the Intel CPU was a lot faster than the AS/400 CPU. A single Intel CPU allows us to migrate faster than any combination we've tried using the AS/400.
Here's a graph showing the results on both systems. It's easy to see that the Intel machine scales very well when we run several processes in parallel.
