Thursday, April 9, 2009

Day 9 - Data Profiling

I'm continuing to develop the data-cleaning function for honorific titles.
Currently I'm building a data profiler, a standalone Java application that queries DB2 through JDBC (JTOpen). It has already found several anomalies in the person contact name field: the expected typos, some ad-hoc markers, and, finally, the usual user creativity.
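
To make the profiling step more concrete, here is a minimal Java sketch, not the profiler described above. It assumes the JTOpen jar is on the classpath, reads a single column over plain JDBC, and reduces each value to a shape pattern (letters become A, digits become 9) so that unusual entries stand out in a frequency count. The class and method names, the SQL, and the connection details are all hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical profiling sketch: read one text column over JDBC and count
 * how often each value "shape" (pattern) occurs.
 */
final class NameProfiler {

    /** Reads the first column of the query; URL, credentials and SQL are placeholders. */
    static List<String> loadColumn(String url, String user, String password, String sql)
            throws SQLException {
        List<String> values = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url, user, password);
             PreparedStatement stmt = conn.prepareStatement(sql);
             ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                values.add(rs.getString(1));
            }
        }
        return values;
    }

    /** Maps letters to 'A', digits to '9' and keeps all other characters as-is. */
    static String pattern(String value) {
        if (value == null) {
            return "<NULL>";
        }
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.trim().toCharArray()) {
            if (Character.isLetter(c)) {
                sb.append('A');
            } else if (Character.isDigit(c)) {
                sb.append('9');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Counts how many values share each pattern, sorted by pattern. */
    static Map<String, Integer> profile(List<String> values) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String v : values) {
            counts.merge(pattern(v), 1, Integer::sum);
        }
        return counts;
    }
}
```

For example, calling NameProfiler.loadColumn("jdbc:as400://myhost", "user", "secret", "SELECT CONTACT_NAME FROM MYLIB.CONTACTS") and passing the result to profile(...) would list how many names share each pattern; the system name, library, table and column are made up. Older driver versions may also need Class.forName("com.ibm.as400.access.AS400JDBCDriver") before connecting. Pattern profiling is just one common approach: rare patterns are usually a good first place to look for anomalies.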

These user-creativity anomalies are the funniest of them all. The person contact name field contains random characters, such as "USO USSSSS"; training data, such as "TEST"; and total nonsense in place of a person's name, such as "EXITING NOW", "ALSO EXITING" and "NEW".
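
The three kinds of user creativity above lend themselves to simple rules. The sketch below is a hypothetical rule set, not necessarily what the profiler does: a run of four or more identical letters flags random characters, a small list of markers flags training data, and a name made up only of known non-name words flags nonsense entries. The word lists are seeded with the examples from this post and are assumptions that would need tuning against the real data.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/**
 * Hypothetical heuristics for "user creativity" in a contact-name field.
 * The thresholds and word lists are illustrative assumptions.
 */
final class NameAnomalyRules {

    enum Anomaly { NONE, REPEATED_CHARS, TEST_DATA, NOT_A_NAME }

    // The same letter four or more times in a row, e.g. "USSSSS".
    private static final Pattern REPEATED = Pattern.compile("(\\p{L})\\1{3,}");

    // Hypothetical markers for training/test records.
    private static final Set<String> TEST_MARKERS = Set.of("TEST", "DUMMY");

    // Words seen in the field that are not names (seeded from the examples above).
    private static final Set<String> NON_NAME_WORDS = Set.of("EXITING", "NOW", "ALSO", "NEW");

    static Anomaly classify(String name) {
        // Blank values are better handled by a separate completeness check.
        if (name == null || name.trim().isEmpty()) {
            return Anomaly.NONE;
        }
        String upper = name.trim().toUpperCase(Locale.ROOT);
        if (REPEATED.matcher(upper).find()) {
            return Anomaly.REPEATED_CHARS;
        }
        String[] tokens = upper.split("\\s+");
        for (String t : tokens) {
            if (TEST_MARKERS.contains(t)) {
                return Anomaly.TEST_DATA;
            }
        }
        // Flag only when every token is a known non-name word.
        for (String t : tokens) {
            if (!NON_NAME_WORDS.contains(t)) {
                return Anomaly.NONE;
            }
        }
        return Anomaly.NOT_A_NAME;
    }
}
```

With these rules, classify("USO USSSSS") returns REPEATED_CHARS, classify("TEST") returns TEST_DATA and classify("ALSO EXITING") returns NOT_A_NAME, while an ordinary name such as "JOHN SMITH" returns NONE. Requiring every token to be a non-name word keeps real names that merely contain a word like "NEW" from being flagged.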
