PMC provides cloud service access to the following subsets of the PMC Article Datasets:
- PMC Open Access Subset - The articles in the PMC Open Access Subset that are archived in PMC with a machine-readable Creative Commons license; and
- Author Manuscript Dataset - The complete Author Manuscript Dataset, which includes the articles collected in PMC under a funder policy and made available in machine-readable formats for text mining.
As the custodian of these datasets, PMC works to ensure the contents are available in formats and through channels that enable new discovery, in a manner consistent with copyright law.
File Formats, Metadata, Media Files & Supplementary Materials
File Formats and Metadata
The files that PMC distributes via our cloud service include individual articles in NISO Z39.96-2015 JATS XML format, as well as in plain text extracted from the XML. Also included are file lists that contain metadata for the articles in each dataset.
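Because PMC already distributes a plain-text version of each article, you would not normally need to extract text yourself. The sketch below only illustrates roughly what "plain text extracted from the XML" involves. It is a minimal example using Python's standard library, not PMC's own extraction process. The function names are hypothetical. It assumes a JATS article whose root element is `<article>`, with the usual `front/article-meta` and `body` elements, and no named entities that are defined only in an external DTD. ElementTree does not load external DTDs, so such entities would raise a `ParseError`.

```python
import xml.etree.ElementTree as ET


def _text_of(elem: ET.Element) -> str:
    # itertext() yields all nested text (e.g. <italic>, <xref>) in document
    # order; split/join collapses line breaks and runs of spaces.
    return " ".join("".join(elem.itertext()).split())


def _collect(elem: ET.Element, out: list) -> None:
    # Emit each <title> or <p> once and do not descend into it, so text in
    # nested elements (e.g. a list inside a paragraph) is not repeated.
    for child in elem:
        if child.tag in ("title", "p"):
            out.append(_text_of(child))
        else:
            _collect(child, out)


def jats_to_plain_text(xml_string: str) -> str:
    """Hypothetical helper: article title, abstract, then body sections."""
    root = ET.fromstring(xml_string)
    parts = []
    title = root.find("./front/article-meta/title-group/article-title")
    if title is not None:
        parts.append(_text_of(title))
    for abstract in root.findall("./front/article-meta/abstract"):
        _collect(abstract, parts)
    body = root.find("body")
    if body is not None:
        _collect(body, parts)
    # Separate blocks with blank lines and drop any empty ones.
    return "\n\n".join(p for p in parts if p)
```

Elements such as tables, figure captions, and references are handled only as far as their `<title>` and `<p>` children are reached. A production extractor would need explicit rules for each element type.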
Media Files and Supplementary Materials Availability
- Media files and supplementary materials associated with Open Access Subset articles can be retrieved in individual article packages using the PMC FTP Service.
- Media files and supplementary materials for author manuscripts are NOT made available as part of the PMC Article Datasets.
Cloud Service Providers
All data made available on cloud services are managed by the National Library of Medicine (NLM). Currently, cloud service is available only through Amazon Web Services (AWS).
Articles in these datasets are made available consistent with either the terms of applicable article-level license statements or the funder’s policy. See PMC Copyright for more information.
How to Cite
See the individual dataset pages for how to cite the PMC Open Access Subset and the PMC Author Manuscript Dataset.