Thursday, April 9, 2009

Day 8 - Honorific Title Cleaning

I've started working on a data cleaning function to detect honorific titles in a free-text field.
The field holds a person's contact name, which may be preceded or followed by an honorific title.

I'm working on three distinct approaches. The first is the classic one, a dictionary lookup. The second is automatic: the algorithm tries to infer whether an honorific title is present or not. The third extends the automatic approach with an exclusion dictionary.
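To make the dictionary approach concrete, here is a minimal sketch in Python. The post doesn't show any implementation, so everything in it is an assumption for illustration: the `HONORIFICS` set, the `split_title` name, and the rule of checking only the first and last token.

```python
# Hypothetical title dictionary; a real one would be much larger
# and probably language-specific.
HONORIFICS = {"mr", "mrs", "ms", "miss", "dr", "prof", "sir"}


def _normalize(token):
    """Lower-case a token and drop surrounding periods and commas."""
    return token.strip(".,").lower()


def split_title(contact):
    """Return (title, name) for a contact string.

    Only the first and the last token are checked against the
    dictionary (an assumption). title is None when neither matches.
    """
    tokens = contact.split()
    if len(tokens) < 2:
        # A lone token is treated as a name, never as a title.
        return None, contact.strip()
    if _normalize(tokens[0]) in HONORIFICS:
        return tokens[0], " ".join(tokens[1:]).strip(" ,")
    if _normalize(tokens[-1]) in HONORIFICS:
        return tokens[-1], " ".join(tokens[:-1]).strip(" ,")
    return None, " ".join(tokens)
```

For example, `split_title("John Smith, Dr")` separates the trailing `Dr` from the name, while `split_title("John Smith")` reports no title.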

The first tests I ran produced a lot of false positives, so I'll use a first-name dictionary to exclude them.
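The post doesn't describe how the automatic inference works, so the following is only one possible sketch, not the actual algorithm. It assumes a frequency heuristic: a leading token that recurs across many records is guessed to be a title. Under that assumption, common first names are exactly the kind of false positive an exclusion dictionary would remove. The names `infer_titles`, `min_count`, and `FIRST_NAMES` are all hypothetical.

```python
from collections import Counter

# Hypothetical first-name exclusion dictionary (illustrative only).
FIRST_NAMES = {"john", "mary", "james"}


def infer_titles(contacts, min_count=2, exclude=frozenset()):
    """Guess titles as leading tokens that recur across records.

    Assumed heuristic: a leading token seen at least min_count times
    is a title candidate, unless it appears in the exclude set.
    """
    counts = Counter()
    for contact in contacts:
        tokens = contact.split()
        if len(tokens) >= 2:
            # Normalize so that "Dr." and "dr" count as the same token.
            counts[tokens[0].strip(".,").lower()] += 1
    return {
        token
        for token, n in counts.items()
        if n >= min_count and token not in exclude
    }


contacts = [
    "Dr. Ana Silva", "Dr Rui Costa",
    "John Smith", "John Doe",
    "Mr. John Adams", "Mr Paul Jones",
    "Mary Poppins",
]
without_exclusion = infer_titles(contacts)
with_exclusion = infer_titles(contacts, exclude=FIRST_NAMES)
```

On this toy data the plain heuristic also flags `john`, because it leads two records; passing the exclusion set leaves only `dr` and `mr`.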
