Dimensions includes data from a large number of sources. The data is converted to a common data model, cleaned, and then enriched so that it is ready for use. The enrichment steps include disambiguating people (“Researchers”) and Organizations, and classifying the data into topics (“Categories”) using machine-learning-based classification. Enrichment also includes reference extraction and cross-linking between documents where relevant.
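
The flow described above (convert to a common model, clean, enrich) can be sketched as a small pipeline. This is only an illustration under stated assumptions, not how Dimensions is implemented: the `Record` fields, the function names, and the `keyword_map` parameter are all hypothetical, the name-based key is a crude stand-in for real disambiguation, and the keyword lookup is a stand-in for the machine-learning classifier.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical common data model; field names are illustrative only."""
    source: str
    title: str
    authors: list[str]
    references: list[str] = field(default_factory=list)
    researcher_ids: list[str] = field(default_factory=list)
    categories: list[str] = field(default_factory=list)


def to_common_model(source: str, raw: dict) -> Record:
    # Step 1: map a source-specific payload onto the shared schema.
    return Record(
        source=source,
        title=raw.get("title", ""),
        authors=list(raw.get("authors", [])),
        references=list(raw.get("references", [])),
    )


def clean(record: Record) -> Record:
    # Step 2: collapse stray whitespace and drop empty author entries.
    record.title = " ".join(record.title.split())
    record.authors = [" ".join(a.split()) for a in record.authors if a.strip()]
    return record


def enrich(record: Record, keyword_map: dict[str, str]) -> Record:
    # Step 3a: stand-in for disambiguation (assumption): a normalised name key.
    record.researcher_ids = [a.lower().replace(" ", "_") for a in record.authors]
    # Step 3b: stand-in for the classifier (assumption): keyword lookup, not ML.
    title = record.title.lower()
    record.categories = sorted(
        {category for keyword, category in keyword_map.items() if keyword in title}
    )
    return record


def run_pipeline(source: str, raw: dict, keyword_map: dict[str, str]) -> Record:
    # Apply the three stages in the order the text describes.
    return enrich(clean(to_common_model(source, raw)), keyword_map)
```

Keeping each stage as a separate function mirrors the order in the description and lets any single stage be replaced (for example, swapping the keyword lookup for a trained classifier) without touching the others.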