It was actually quite simple once the mapping rules were correctly specified according to the data contents. The main problem was that the source team had, as usual, some difficulty admitting that their data is dirty and that its quality is a lot lower than they expected. In this particular case, it was the zip codes.
Once they admitted that the zip codes needed specific rules because they were dirty, it became an easy migration task.
In the next few days I expect to migrate other entity sets without errors.
Overall migration performance is quite slow, even using 5 parallel executions. We collected some timings and, as already known, the GIS loading procedure is responsible for the longest migration times.
Usually we perform the whole ETL cycle within Data Fusion, which means we deliver the data directly to the target database, i.e. we would normally only have the Data Fusion transformation procedure time. In this particular scenario, however, we have to create a text file which GIS then uses to load the data.
Here's an example of the performance times we're having now:
Car Insurance Claims (470.563 records):
- Data Fusion transformation procedure: 51m
- File creation for GIS loading: 2h 10m
- GIS file loading: 238h sequential time (estimate extrapolated from 16h of loading)
Personal Accident Insurance Claims (29.303 records):
- Data Fusion transformation procedure: 1m 51s
- File creation for GIS loading: 1m 40s
- GIS file loading: 2h 17m with 3 parallel processes (5h 9m sequential time)
Entities (682.569 records):
- Data Fusion transformation procedure: 6m 3s
- File creation for GIS loading: 23m
- GIS file loading: 5h 55m with 5 parallel processes (27h 17m sequential time)
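To put these numbers in perspective, here is a minimal Python sketch that computes how much of the sequential time goes to GIS loading, the implied GIS throughput, and the speedup from the parallel runs. The figures are copied from the list above and converted to minutes; the dictionary layout and the function names are my own, for illustration only, and are not part of Data Fusion or GIS.

```python
# Timings taken from the list above, in minutes.
# "gis_parallel" is None where no parallel run was reported.
timings = {
    "Car Insurance Claims": {
        "records": 470_563, "transform": 51, "file": 2 * 60 + 10,
        "gis_sequential": 238 * 60, "gis_parallel": None,
    },
    "Personal Accident Insurance Claims": {
        "records": 29_303, "transform": 1 + 51 / 60, "file": 1 + 40 / 60,
        "gis_sequential": 5 * 60 + 9, "gis_parallel": 2 * 60 + 17,
    },
    "Entities": {
        "records": 682_569, "transform": 6 + 3 / 60, "file": 23,
        "gis_sequential": 27 * 60 + 17, "gis_parallel": 5 * 60 + 55,
    },
}


def gis_share(t):
    """Fraction of the total sequential time spent in GIS file loading."""
    total = t["transform"] + t["file"] + t["gis_sequential"]
    return t["gis_sequential"] / total


def gis_records_per_hour(t):
    """Records loaded by GIS per hour of sequential loading time."""
    return t["records"] / (t["gis_sequential"] / 60)


def parallel_speedup(t):
    """Sequential GIS time divided by wall-clock parallel time, if reported."""
    if t["gis_parallel"] is None:
        return None
    return t["gis_sequential"] / t["gis_parallel"]


for name, t in timings.items():
    speedup = parallel_speedup(t)
    speedup_text = f"{speedup:.2f}x" if speedup is not None else "n/a"
    print(
        f"{name}: GIS share {gis_share(t):.1%}, "
        f"{gis_records_per_hour(t):.0f} records/hour, "
        f"parallel speedup {speedup_text}"
    )
```

In all three cases the GIS load accounts for more than 98% of the sequential time, so any tuning effort is best spent there rather than on the transformation or file creation steps.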
Tonight GIS will be cleaned up and the AS/400 will be tuned for better file-reading performance. In the next few days I'll collect new timings, hopefully better ones.