Information extraction from calls for papers with conditional random fields and layout features |
| |
Authors: | Karl-Michael Schneider |
| |
Affiliation: | (1) Textkernel B.V., Amsterdam, The Netherlands |
| |
Abstract: | For members of the research community it is vital to stay informed about conferences, workshops, and other research meetings
relevant to their field. These events are typically announced in calls for papers (CFPs) that are distributed via mailing
lists. We employ Conditional Random Fields for the task of extracting key information such as conference names, titles, dates,
locations and submission deadlines from CFPs. Extracting this information from CFPs automatically has applications in building
automated conference calendars and search engines for CFPs. We combine a variety of features, including generic token classes,
domain-specific dictionaries and layout features. Layout features prove particularly useful in the absence of grammatical
structure, improving average F1 by 30% in our experiments. |
| |
Keywords: | Information extraction; Layout features; Conditional random fields |
This article is indexed in SpringerLink and other databases. |
|
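The feature combination named in the abstract (generic token classes, domain-specific dictionaries, layout features) can be illustrated with a minimal sketch. The abstract does not specify the actual feature set, so everything below is an assumption made for illustration: the function names `token_class` and `extract_features`, the two tiny dictionaries, and the particular layout cues (indentation, position in line, preceding blank line). The resulting per-token feature dicts are the kind of input a linear-chain CRF toolkit would typically consume; no CRF training is shown.

```python
import re

# Hypothetical mini-dictionaries; the abstract does not list the real ones.
MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}
DEADLINE_CUES = {"deadline", "submission", "due"}


def token_class(token):
    """Map a token to a coarse, generic class (illustrative classes only)."""
    if re.fullmatch(r"(19|20)\d{2}", token):
        return "YEAR"
    if token.isdigit():
        return "NUMBER"
    if token.isalpha() and token.isupper():
        return "ALLCAPS"
    if token[:1].isupper():
        return "CAPITALIZED"
    if token.isalpha():
        return "LOWER"
    return "OTHER"


def extract_features(text):
    """Return one feature dict per token, in document order."""
    features = []
    prev_blank = True  # treat the start of the document like a blank line
    for line_no, line in enumerate(text.splitlines()):
        stripped = line.strip()
        if not stripped:
            prev_blank = True
            continue
        indent = len(line) - len(line.lstrip())
        # Split into word tokens and single punctuation characters.
        tokens = re.findall(r"\w+|[^\w\s]", stripped)
        for pos, tok in enumerate(tokens):
            lowered = tok.lower()
            features.append({
                # Generic token features
                "token": lowered,
                "class": token_class(tok),
                # Dictionary features (assumed dictionaries)
                "in_month_dict": lowered in MONTHS,
                "in_deadline_dict": lowered in DEADLINE_CUES,
                # Layout features (assumed cues)
                "line_no": line_no,
                "indented": indent > 0,
                "first_in_line": pos == 0,
                "after_blank_line": prev_blank,
            })
        prev_blank = False
    return features
```

Layout cues such as `indented` and `after_blank_line` carry signal even when a CFP header has no sentence structure, which matches the abstract's motivation for using them.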