Today the data profiler received a ranking function that ranks the profiled data and cuts off the data considered discardable.
When data is profiled, a tuple with the term and its context is saved.
The context helps in evaluating the term, especially in deciding whether it is a false positive.
The context is also used to cut off data.
There are two ways to cut. The first keeps only the terms whose ranking index, calculated from the term context, is equal to or higher than a given value. The second keeps only the terms whose ranking index does not differ from that of the immediately higher-ranked term by more than a given percentage, i.e., if the current term differs by more than that percentage from the previous term, the current term and all terms with lower ranks are discarded.
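As a rough illustration of the two cut-offs, here is a minimal Python sketch. It is not the profiler's actual code: the names `cut_by_min_rank` and `cut_by_relative_drop` are hypothetical, the terms are assumed to already carry a numeric ranking index, and the percentage difference is assumed to be measured relative to the previous (higher) rank.

```python
from typing import List, Tuple

# (term, ranking index); how the index is derived from the context is not shown here.
RankedTerm = Tuple[str, float]


def cut_by_min_rank(ranked: List[RankedTerm], min_rank: float) -> List[RankedTerm]:
    """Keep only the terms whose ranking index is equal to or higher than min_rank."""
    return [(term, rank) for term, rank in ranked if rank >= min_rank]


def cut_by_relative_drop(ranked: List[RankedTerm], max_drop_pct: float) -> List[RankedTerm]:
    """Walk the terms from highest to lowest rank and stop at the first term
    whose rank drops by more than max_drop_pct relative to the previous term.
    That term and every lower-ranked term are discarded."""
    ordered = sorted(ranked, key=lambda pair: pair[1], reverse=True)
    kept: List[RankedTerm] = []
    for term, rank in ordered:
        if kept:
            previous_rank = kept[-1][1]
            # Assumption: the drop is a percentage of the previous rank.
            # The check is skipped when the previous rank is not positive.
            if previous_rank > 0:
                drop_pct = (previous_rank - rank) / previous_rank * 100
                if drop_pct > max_drop_pct:
                    break
        kept.append((term, rank))
    return kept


if __name__ == "__main__":
    sample = [("alpha", 10.0), ("beta", 9.0), ("gamma", 4.0), ("delta", 3.9)]
    print(cut_by_min_rank(sample, 5.0))
    print(cut_by_relative_drop(sample, 20.0))
```

Note that with the second cut-off, "delta" is discarded even though it is close to "gamma": once a large gap is found, everything below it goes.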
With these enhancements, the data profiler is finished.
Unless there's some special need to profile other data, it is done for now.
Monday, April 13, 2009