Goal:
Use class-conditioned character language models to classify each entity in a given list into one of four target classes: Person, Location, Organization, or Movie. Use Bayes' rule to make the final decision based on the language model predictions.
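As a rough illustration of the idea, the sketch below trains one character model per class and picks the class c that maximizes log P(c) + log P(name | c), which is Bayes' rule with the constant denominator dropped. Everything specific here is an assumption made for the example and not part of the assignment: the bigram order, add-one smoothing, lowercasing, the boundary markers, and the names CharBigramLM, train, and classify.

```python
import math
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical boundary markers, assumed absent from names


class CharBigramLM:
    """Add-one smoothed character bigram model for a single class."""

    def __init__(self, names, vocab_size):
        self.vocab_size = vocab_size
        self.bigrams = Counter()   # counts of (previous char, current char)
        self.contexts = Counter()  # counts of previous char
        for name in names:
            chars = START + name.lower() + END
            for prev, cur in zip(chars, chars[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def log_prob(self, name):
        """Return log P(name | class) under the smoothed bigram model."""
        chars = START + name.lower() + END
        total = 0.0
        for prev, cur in zip(chars, chars[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.contexts[prev] + self.vocab_size
            total += math.log(num / den)
        return total


def train(examples):
    """examples: list of (name, label) pairs. Returns (log priors, models)."""
    by_class = defaultdict(list)
    vocab = {END}
    for name, label in examples:
        by_class[label].append(name)
        vocab.update(name.lower())
    vocab_size = len(vocab) + 1  # one extra slot for characters never seen in training
    n = len(examples)
    priors = {c: math.log(len(names) / n) for c, names in by_class.items()}
    models = {c: CharBigramLM(names, vocab_size) for c, names in by_class.items()}
    return priors, models


def classify(name, priors, models):
    """Pick the class with the highest log prior plus log likelihood."""
    return max(models, key=lambda c: priors[c] + models[c].log_prob(name))
```

Working in log space avoids underflow on long names. The class priors are estimated from label frequencies in the training examples, so a class with more training entities starts with a higher score.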
Entity types:
Person (PER): Name, mostly U.S./English. Source: Hoovers' company database, 1999
Location (LOC): City, predominantly U.S. Source: Hoovers' company database, 1999
Company (ORG): Organization, mostly U.S. Source: Hoovers' company database, 1999
Movie (MOV): Big-screen/significant movie. Source: Wikipedia
Method
Evaluation
Deliverables
Data
The set of entities (99550 total) for training and testing is available on MathCS: /home/cs571000/Project1.1/entities.txt
Note that the dataset is not balanced: there are more Person and Location entities than Organization and Movie entities.
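Because the classes are unbalanced, one option is to split the data per class so that the training and test portions keep the same class proportions. The sketch below is only an illustration under stated assumptions: the format of entities.txt is not described here, so the function works on (name, label) pairs already loaded into memory, and the name stratified_split, the 20% test fraction, and the fixed seed are hypothetical choices, not requirements of the assignment.

```python
import random
from collections import defaultdict


def stratified_split(examples, test_fraction=0.2, seed=0):
    """Split (name, label) pairs into train and test, class by class."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for name, label in examples:
        by_class[label].append((name, label))

    train_set, test_set = [], []
    for label in sorted(by_class):  # sorted for a deterministic class order
        items = by_class[label]
        rng.shuffle(items)
        cut = int(round(len(items) * test_fraction))
        test_set.extend(items[:cut])
        train_set.extend(items[cut:])
    return train_set, test_set
```

With this kind of split, each class contributes roughly the same fraction of its entities to the test portion, so the smaller ORG and MOV classes are still represented when accuracy is measured.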