AP TAGGING SERVICE
The AP Tagging Service analyzes English-language news content and automatically returns relevant metadata, using standardized terminology from the AP News Taxonomy. The process identifies people, companies, geographic locations, organizations, and a wide array of subjects. The system recognizes and returns specific entities that it finds in the submitted content (also known as “text extraction”), and it also goes further, using human-created semantic rules to identify topics that may not be explicitly mentioned in the text at all. For example, a story about a particular country music star can trigger the “Country music” subject, even if the word “music” does not appear in the story. Human-managed rules allow for more precise control over the performance of the service.
What types of metadata does the service provide?
Drawing from the AP News Taxonomy, the tagging service looks at each piece of submitted content and returns standardized names and IDs for all types of metadata. The following types of data are available, and the user can specify which types should be returned by the service.
People: The service identifies the names of newsmakers in the submitted text. Robust support for synonyms and name variants allows identification of difficult names (such as Qaddafi/Gadhafi/Khadafi) regardless of how they are spelled, and the service still returns a single standardized name and ID. A disambiguation process uses surrounding context to tell the difference between people who share the same name, so you’ll know whether “Kenny Rogers” refers to the musician or the athlete.
Companies: Any publicly traded company mentioned in the submitted content will be returned. As with person names, synonyms and disambiguation processes help to ensure that companies are identified appropriately, regardless of spelling anomalies or ambiguous names.
Geographic locations: Regions, countries, major world cities, and a wide variety of North American places are identified, based on the subject matter of the content. Passing mentions of a place name will not trigger the metadata rule, so you can be sure that the location is truly relevant.
Organizations: A wide variety of institutions and groups, from sports teams to government organizations, are identified based on the subject matter of the submitted content. Again, the system ignores passing mentions, so the presence of an organization tag denotes relevant subject matter.
Subjects: The service returns relevant topics, both broad and narrow, based on the primary or secondary subject matter of the submission.
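The handling of name variants and same-name disambiguation described for people above can be illustrated with a small sketch. It is not the service's actual implementation: the lookup tables, the IDs, the context terms, and the functions `normalize` and `disambiguate` are all hypothetical, and a simple word-overlap score stands in for whatever rules the service really applies.

```python
import re

# Hypothetical spelling variants mapped to one lookup key (illustration only).
SPELLING_VARIANTS = {"qaddafi": "gadhafi", "khadafi": "gadhafi"}

# Hypothetical candidates; IDs and context terms are invented for this sketch
# and are not taken from the AP News Taxonomy.
CANDIDATES = {
    "kenny rogers": [
        {"id": "example-person-1", "name": "Kenny Rogers (musician)",
         "context": {"album", "song", "singer", "country"}},
        {"id": "example-person-2", "name": "Kenny Rogers (athlete)",
         "context": {"pitcher", "inning", "baseball", "game"}},
    ],
}


def normalize(surface):
    """Lower-case a name and fold known spelling variants into one key."""
    key = surface.lower()
    return SPELLING_VARIANTS.get(key, key)


def disambiguate(surface, text):
    """Pick the candidate whose context terms overlap most with the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    candidates = CANDIDATES.get(normalize(surface), [])
    if not candidates:
        return None
    return max(candidates, key=lambda c: len(c["context"] & words))


story = "The pitcher threw a perfect game in front of a baseball crowd."
match = disambiguate("Kenny Rogers", story)
print(match["id"], match["name"])
```

Whichever candidate wins, the caller receives one standardized name and ID, which is the behavior the description above emphasizes.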
After all the matching metadata values have been identified, the service checks for additional standardized names and IDs based on relationships stored in the AP News Taxonomy. For instance, the subject hierarchy will ensure that any item tagged with “Food safety” will also be tagged with “Health”, and any content that picks up a sports league will be tagged with the relevant sport subject.
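The hierarchy step can be pictured as walking from each matched tag up to its broader terms. The sketch below is only an illustration under stated assumptions: the `PARENTS` table and the `expand_tags` function are hypothetical, the “Food safety” to “Health” link comes from the example above, and the league and sport entries are invented placeholders.

```python
# Hypothetical child -> parents table. Only the "Food safety" -> "Health"
# link is taken from the text; the other entries are invented placeholders.
PARENTS = {
    "Food safety": ["Health"],
    "Example Baseball League": ["Baseball"],
    "Baseball": ["Sports"],
}


def expand_tags(tags):
    """Return the given tags plus every broader tag reachable via PARENTS."""
    result = set(tags)
    pending = list(tags)
    while pending:
        tag = pending.pop()
        for parent in PARENTS.get(tag, []):
            if parent not in result:  # avoid revisiting shared ancestors
                result.add(parent)
                pending.append(parent)
    return result


print(sorted(expand_tags({"Food safety"})))
```

Tracking visited tags keeps the walk finite even if two branches of the hierarchy share an ancestor.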
Finally, the metadata output will be enhanced with additional data properties. Companies will be given a ticker; athletes will be associated with their teams; geographic locations will get latitude and longitude data; and so forth. Users can also access the AP News Taxonomy for additional information about any given tag.
How does the service work?
The tagging service is accessed by making calls to an API (Application Programming Interface). Subscribers may submit content in plain text or XML, and can specify which types of data should be returned.
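As a rough picture of what such a call might look like, the sketch below builds (but does not send) an HTTP request that submits plain text and names the data types to return. Everything specific in it is an assumption made for illustration: the endpoint URL, the `apikey`, `types`, and `format` parameter names, and the `build_tagging_request` helper are hypothetical. The Developer’s Guide is the authority on the real interface.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; the real URL is documented in the Developer's Guide.
ENDPOINT = "https://tagging.example.com/annotate"


def build_tagging_request(text, types, api_key, fmt="json"):
    """Build a POST request carrying plain text and the requested data types.

    The query parameter names used here are assumptions, not the real API.
    """
    query = urlencode({
        "apikey": api_key,
        "types": ",".join(types),
        "format": fmt,
    })
    return Request(
        url=f"{ENDPOINT}?{query}",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


request = build_tagging_request(
    "Regulators announced new food safety rules on Tuesday.",
    types=["person", "company", "subject"],
    api_key="YOUR-KEY",
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out here because the endpoint is a placeholder.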
Tagging service data can be returned in a variety of Semantic Web-compatible formats, including RDF (XML, JSON, or TTL) and NewsML-G2. It can also be returned in Simple XML format. A comprehensive Developer’s Guide provides all the necessary details.
Use the links below to see a sample of AP Tagging Service data in each of the available formats.
Try the AP Metadata Services online demo.
© 2012 The Associated Press. All rights reserved. Terms and conditions apply. See AP.org for details.