Data Migration Diary: Day 13 - Data Profiler Enhanced

Monday, April 13, 2009

Day 13 - Data Profiler Enhanced

Today the data profiler gained a ranking function that ranks the profiled data and cuts off the data considered discardable.

When data is profiled, a tuple with the term and its context is saved.
The context helps in evaluating the term, especially in deciding whether it is a false positive.
The context is also used to cut off data.
It is possible to keep only the terms whose ranking index, calculated from the term context, is equal to or higher than a given value. Alternatively, it is possible to keep only the terms that differ from the immediately higher-ranked term by no more than a given percentage, i.e., if the current term differs by more than a given percentage from the previous term, the current term and all lower-ranked terms are discarded.
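
To make the two cut-off rules concrete, here is a minimal Python sketch. It is not the profiler's actual code: the function names, the (term, ranking index) pair shape, and the sample values are all made up for illustration, and it assumes the ranking index has already been calculated from the context and is non-negative.

```python
from typing import List, Tuple

# Hypothetical shape: (term, ranking index). The real profiler derives the
# ranking index from the term context; here it is assumed to be precomputed.
Ranked = Tuple[str, float]


def cut_by_threshold(ranked: List[Ranked], minimum: float) -> List[Ranked]:
    """Keep only terms whose ranking index is equal to or higher than `minimum`."""
    ordered = sorted(ranked, key=lambda pair: pair[1], reverse=True)
    return [(term, score) for term, score in ordered if score >= minimum]


def cut_by_relative_drop(ranked: List[Ranked], max_drop_pct: float) -> List[Ranked]:
    """Walk the terms from highest to lowest rank and stop at the first term
    that differs from the previous one by more than `max_drop_pct` percent.
    That term and every lower-ranked term are discarded."""
    ordered = sorted(ranked, key=lambda pair: pair[1], reverse=True)
    kept: List[Ranked] = []
    for term, score in ordered:
        if kept:
            previous = kept[-1][1]
            # The relative drop is undefined when the previous index is zero,
            # so in that case the term is simply kept (an assumption).
            if previous > 0 and (previous - score) / previous * 100 > max_drop_pct:
                break
        kept.append((term, score))
    return kept


# Made-up sample data, only to exercise the two functions.
sample = [("smith", 10.0), ("smyth", 9.0), ("xx", 4.0), ("yy", 3.9)]
by_value = cut_by_threshold(sample, 5.0)
by_drop = cut_by_relative_drop(sample, 20.0)
```

With the sample above, both rules keep only the first two terms: the absolute rule because 4.0 is below 5.0, the relative rule because the step from 9.0 to 4.0 is a drop of more than 20 percent. The relative rule does not need a fixed value chosen in advance, which may help when ranking indexes vary a lot between profiled columns.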

With these enhancements, the data profiler is finished.
Unless there's some special need to profile other data, it is done for now.
