WikiWho APIs

The APIs provide provenance and change information about the tokens a Wikipedia article consists of, for several languages. Apart from the source language edition they draw from, their specifications and usage are identical, as described below.

WikiWho API EN
WikiWho API DE
WikiWho API EU
WikiWho API TR
WikiWho API ES

Quick Start

The quickest way to get started is to follow our WikiWho Tutorial. You can also check the WikiWho Demo for some applied ideas.

Both use the WikiWho Wrapper, a Python 3 package that can be installed from PyPI:
pip install wikiwho_wrapper
The WikiWho Wrapper provides easy access to the API:
from wikiwho_wrapper import WikiWho
ww = WikiWho(lng='de') # or WikiWho(USERNAME, PASSWORD, lng='de')

# You can either use api to access the JSON directly (raw format from api.wikiwho.net)
response = ww.api.all_content("Bioglass")

# Or you can use the dataview to obtain a pandas DataFrame with the data
dataView = ww.dv.all_content("Bioglass")

Usage

For each article page, the API mirrors its current state on Wikipedia. The API is based on the WikiWho algorithm (approx. 95% accuracy).

Currently, unregistered users are limited to 2000 requests/day, and all users are limited to 60 requests/minute.
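
To stay under the per-minute limit on the client side, requests can be spaced out with a small sliding-window throttle. The following is only a minimal sketch: the class name MinuteThrottle and its parameters are hypothetical and not part of the WikiWho Wrapper, and it does not track the daily limit.

```python
import time
from collections import deque


class MinuteThrottle:
    """Hypothetical client-side limiter: at most max_calls per period seconds."""

    def __init__(self, max_calls=60, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock      # injectable so the class can be tested
        self._sleep = sleep
        self._stamps = deque()   # timestamps of calls inside the window

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the sliding window.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                break
            # Window is full: sleep until the oldest call expires.
            self._sleep(self.period - (now - self._stamps[0]))
        self._stamps.append(now)
```

In practice one would create a single throttle = MinuteThrottle() and call throttle.wait() before each request, for instance before ww.api.all_content("Bioglass").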

Terminology used:
  • "Wikipedia": A selected language version of Wikipedia. Available languages can be chosen from the top navigation bar.
  • "article (page)": Any Wikipedia page in namespace = 0.
  • "(article) content": The tokenized Wiki Markup text content of a revision (or range of revisions) of an article page, not the front-end HTML. To obtain HTML, you have to "untokenize" the content (the original order of tokens is retained) and parse it appropriately, or take a look at the WhoColor API.
  • "token"/"tokenized": The Wiki Markup text is split at (i) white space and (ii) certain special characters, which themselves also count as tokens. E.g., the tokens in "A [[house]], a boat." are "a", "[[", "house", "]]", ",", "a", "boat", ".". That is, all tokens are converted to lower case, and certain character combinations that have a specific function in Wiki Markup, such as double square brackets, are treated as single tokens. >> Current WikiWho tokenization
  • "revisions": The article revisions and their IDs as retrieved from Wikipedia, with one exception: the WikiWho algorithm implements a (very lenient) filter to avoid spending time diffing blatant vandalism that is reverted immediately afterwards. About 0.5% of the revisions from Wikipedia are hence not available here, as we consider those changes to have disappeared immediately. This is a temporary constraint to be removed in an upcoming version.
>> Toy example for how the token metadata is generated
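
The tokenization rule described in the terminology list can be approximated with a short regular expression. This is a simplified sketch, not the actual WikiWho tokenizer: the set of multi-character markup sequences (here only double square brackets and double curly braces) is an assumption, and the real rule set linked above is more extensive.

```python
import re

# Assumed, simplified rules: multi-character Wiki Markup sequences first,
# then runs of word characters, then any other single non-space character.
TOKEN_RE = re.compile(r"\[\[|\]\]|\{\{|\}\}|\w+|[^\w\s]")


def tokenize(wikitext):
    """Lower-case the text and split it into tokens, dropping white space."""
    return TOKEN_RE.findall(wikitext.lower())


print(tokenize("A [[house]], a boat."))
# ['a', '[[', 'house', ']]', ',', 'a', 'boat', '.']
```

Placing the multi-character alternatives before the single-character fallback is what keeps "[[" and "]]" together as single tokens.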

See the description of the different query types for more information.

A dataset with this data (until Nov. 2016, no redirects) is available for download at https://doi.org/10.5281/zenodo.345571.

Please cite it as well if you use data from this API in your research (note that the dataset excludes redirect articles and tokenization can slightly differ from the API version, as we continuously improve it).

An example call: Cologne

WhoColor APIs

This collection of APIs can be thought of as an additional service on top of the core WikiWho data described above, available for the same languages. The same term descriptions as above apply.

The goal is to deliver annotated HTML of a Wiki article that can be read by a browser (instead of annotated, tokenized Wikitext as delivered by the primary API).

Annotations available per token (realized via <span>) are currently:

  • original revision and author
  • changes applied
  • conflict score

plus certain metadata (e.g., the ‘present’ authors and the percentage of words in the current revision that each of them originally wrote, and a revision list with metadata).
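
To illustrate the idea of per-token annotation, the sketch below wraps tokens in <span> elements and computes each author's share of the tokens. The dictionary keys ("str", "editor", "o_rev_id") and the data-* attribute names are assumptions chosen for this example; the markup actually returned by the WhoColor API may differ.

```python
from collections import Counter
from html import escape


def to_span(tok):
    """Wrap one token in a <span> carrying its (assumed) provenance fields."""
    return '<span data-editor="{}" data-rev="{}">{}</span>'.format(
        escape(str(tok["editor"])), tok["o_rev_id"], escape(tok["str"])
    )


def author_shares(tokens):
    """Return {editor: percentage of tokens originally written by that editor}."""
    counts = Counter(tok["editor"] for tok in tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {editor: 100.0 * n / total for editor, n in counts.items()}


# Made-up sample data, for illustration only.
tokens = [
    {"str": "a", "editor": "alice", "o_rev_id": 1},
    {"str": "house", "editor": "alice", "o_rev_id": 1},
    {"str": "and", "editor": "bob", "o_rev_id": 2},
    {"str": "boat", "editor": "alice", "o_rev_id": 3},
]
print(author_shares(tokens))  # {'alice': 75.0, 'bob': 25.0}
```

A userscript could then map each editor to a color and style the spans accordingly.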

The main use case so far (hence the name) is colored annotation of text parts with this meta information, for example in a Grease-/Tampermonkey userscript we developed, which runs in a browser extension and can be used on any Wikipedia article to find out who wrote and changed which words via simple visual inspection. Find out more here.

Note that this API project is still in alpha: several elements of Wiki pages cannot yet be annotated, such as tables, infoboxes (or any templates), and certain references. To find out more about the backend that delivers this data, to file issues, or even to contribute (always welcome!), check our GitHub repository.


WhoColor API EN
WhoColor API DE
WhoColor API EU
WhoColor API TR
WhoColor API ES