Authoring, reviewing, and publishing scholarly articles remains
an expensive, time-consuming process that can require significant up-front
investment and technical expertise. Coupled with lengthy review processes, this
can create delays of up to a year before new scientific findings are
published. Annotum, a new, open-source, open-access authoring and publishing tool
based on the WordPress content management system, builds on the earlier work of
the Public Library of Science's Currents publication and provides an easy-to-use
alternative to existing publishing systems that supports very rapid expert
review and professional online publishing.
Introduction
Despite significant advances in most forms of publishing,
from blogs to news sites and other user-generated web content,
authoring scholarly articles remains an expensive, time-consuming process that
can require significant up-front investment and technical expertise. While a
number of electronic publishing and workflow management systems exist, those
intended for the scientific publishing community provide at best only
rudimentary authoring tools—and in many cases simply provide a repository for
document files created in other formats. It is as if the entire revolution in
online, web-based content authoring tools has passed by the scientific publishing community. And
despite the development of advanced document formats such as the National
Library of Medicine's (NLM) Journal Article Tag Suite (JATS), virtually no
current system allows scientific authors to easily create structured XML
documents using simple, web-based tools.
Project Background
The inspiration for Annotum comes from the extensive work of
the Public Library of Science (PLoS), which, in
conjunction with Google, launched PLoS: Currents Influenza in 2009. In the
words of Harold Varmus:
The key goal of PLoS: Currents is to accelerate scientific
discovery by allowing researchers to share their latest findings and ideas
immediately with the world's scientific and medical communities.
PLoS: Currents incorporated some key publication elements
that are central to achieving its goal of rapidly disseminating scientific
knowledge. Articles must be date-stamped and citable, reviewed
by expert researchers, released as open-access,
and archived in a public repository such as PubMed Central (PMC). Again, Varmus:
To enable contributions to PLoS: Currents:
Influenza to be shared as rapidly as possible, they will not be subject to
in-depth peer review; however, unsuitable submissions will be screened out by a
board of expert moderators.
Thus, a key driver for creating PLoS: Currents was to speed
up the process of disseminating new science. Consider the following schematic:
By reducing the time for review from up to one year to as
little as one day, Currents articles are able to make new findings available
very quickly.
Google Knol, an article authoring, collaboration and hosting
service, was selected as the platform for PLoS: Currents. Knol provides a
web-based authoring environment with a rather rich toolset including text
formatting, tables, figures, and mathematical equations, and an extensive set
of collaboration tools. Knols can be edited by multiple authors, incorporate
both author and reader comments, and include a rating system, all important
features for authoring scholarly articles. Moreover, Knol's moderated collection
feature allows a designated editor, in effect the lead reviewer, to manage a
simple expert review process whereby experts in a given field are invited to
review each submission, providing comments and a rating. The editor then makes
a final decision, and the article is either accepted and published, sent back to
the author(s) for revision, or rejected. Published articles appear immediately
online, and via an arrangement with PubMed Central are quickly imported and
made available within that repository.
Following the success of PLoS: Currents Influenza, several
additional sections (moderated collections) have been launched
under the PLoS: Currents brand, including sections for Huntington's Disease,
Tree of Life, and Evidence on Genomic Tests, with additional sections in the
planning stages. In all, several hundred articles have been published in the
two years since the PLoS: Currents launch.
However, along with this rich feature set, the Knol platform also
brought a number of limitations.
Firstly, DTD/Structural conformance has been a challenge.
Although Google generously enhanced the Knol feature set to include an XML
output that was loosely compatible with JATS, the lack of control on the
authoring side meant that quite a bit of poorly-structured content made it into
the output and had to be removed, sometimes via hand-editing the exported XML.
For example, when using a web-based tool, or Microsoft Word, for that matter,
authors may select a heading style for headings, or they may simply make the
heading text boldface with a larger font size. To the eye, or in print,
headings marked in this disparate fashion may appear exactly the same, but for
tagging purposes the headings not explicitly tagged as such will not show up in
a table of contents or other summary created based on tags. Beyond the
limitations of Knol's XML output, Knol has no provision at all for importing
articles; all articles must be entered via web-based editing tools. This
makes it difficult for authors to compose and edit offline; if authors use
another tool such as Microsoft Word to create articles, the formatting and
structure can be quite messy when the article text is pasted into Knol's
editor, further exacerbating the tag conformance issue.
And it was not only the XML output (and lack of XML input)
that raised issues. The Knol platform itself had significant limitations in
terms of online presentation and customization. Unlike a web site or content management system (CMS) that
is under the control of the publishing entity, Knol is set up as a central
service. Users cannot alter the basic design, beyond adding a logo, and the
feature set (everything from how author and editor information is used and
displayed to the design and arrangement of content) is relatively fixed. Adding new features, such as a
nicely-formatted PDF or print output, more robust reference and citation
handling, or modification of the existing review and publishing workflow, was
simply not possible on the Google-hosted Knol platform.
Given the very real benefits of the Knol platform in
enabling the publication of PLoS: Currents, but also the limitations outlined
above, Google generously decided to fund a successor system
that would continue to facilitate the rapid review and publishing of web-based
scientific publications while also addressing some of the Knol system's
limitations. That project, and the product it was to produce, is called Annotum
(from the Latin for an individual annotation).
Project Objectives
The overall objective of Annotum is to develop a simple, robust, easy-to-use authoring
system to create and edit scholarly articles using JATS, and to deliver a working,
functional system that can be used to create, maintain, and publish scholarly
articles.
To meet this objective, Annotum must provide the following capabilities:
Allow publication owners to produce peer-reviewed journals online
Allow authors to create content collaboratively in the NLM DTD
Replicate the PLoS: Currents authoring and workflow including export
to PubMed Central
Address key limitations of the Knol toolset:
Provide flexible hosting options via open source code
Publications should be inexpensive (or free) to host/export
System should require minimal technical skills to install, configure, operate, and maintain.
It is worth noting that the scope of Annotum version 1.0 was
very explicitly limited: it is not intended to replace all print/online journals, tools, and/or
systems. Annotum is an incremental step, but not the ultimate one, in the
evolution of scientific publishing systems.
Project Approach
When considering the goals for Annotum, an obvious question arises: why not use an existing product? After all, a number of products
seem to meet many of the capabilities outlined above, from full-blown CMS to standalone applications and add-ins for popular
word processing applications. For example, the Public Knowledge Project's (PKP) Open Journal Systems (OJS)
and Phase2 Technology's OpenPublish provide complete online publishing systems including workflow; many existing scientific and other publications use these
systems in production and have done so for years. Some standalone product offerings are also quite robust, such as Inera's eXtyles
or the Microsoft Word Article Authoring add-in — these focus on the creation of JATS-compliant XML from word processing documents. Hybrids such as PKP's
Lemon8-XML are set up as hosted systems and manage the conversion of different document formats into JATS. Outside of the scientific publishing sphere, the open source
WordPress software provides a comprehensive CMS with extremely simple setup and free (on the hosted version) or very inexpensive hosting options.
One option that was not considered in depth for Annotum was to create a completely new system from scratch. There are too many potential existing systems that provide
many of the desired features for version 1.0 to make a new application the best choice, simply from a cost and time perspective. Therefore, in devising an approach to
meeting the version 1.0 goals, the Annotum team focused on which existing tools could provide the best starting point for further customization.
One approach would be to take the existing stand-alone applications and attempt to merge them with an online tool. However, systems such as eXtyles are both proprietary
and expensive; even the free options such as the MS Word Article Authoring add-in do not really lend themselves to integration with a publishing system, and furthermore rarely
support multiple platforms (PC, Mac, Linux). Even if we were to select an existing stand-alone tool to meet some of the Annotum version 1.0 requirements, a hosted publication
CMS would still be required.
So we took a look at a number of hosted systems. Lemon8-XML does provide JATS XML compliance, but it has too few of the
other required features to make a feasible starting point. This leaves the hosted site-publication CMS options: OJS, OpenPublish, and WordPress. All of these platforms have limitations;
OpenPublish and OJS both provide options for publications out of the box—subscription models and highly-configurable workflow options for example. But they also tend to
be rather complicated systems to set up and maintain. Upgrades for Drupal (the platform on which OpenPublish is based) are time-consuming and complex; maintenance and operation of OJS usually
requires a full-time technician, putting it beyond the reach of a small research group. Other limitations surfaced: WordPress has no built-in workflow; conversely, OJS and OpenPublish have very robust (but
perhaps overly complex) workflow functionality, well beyond what is required for Annotum. Finally, OJS is more of a document handling system with no provision for web-based
editing of articles, a key Annotum requirement.
After weighing the limitations of these three options, we narrowed our scope to OJS and WordPress. Both are open source, both are extensible via a plugin architecture, and both support
customized templates (themes) for presentation. In the final analysis, OJS's complex maintenance needs, overly complex workflow system, and lack of the core web-based editing capability led us to select WordPress as the basis for Annotum version 1.0.
WordPress is extremely simple to set up and run, with numerous free or inexpensive hosting options available, and it comes with a rich set of user-friendly web-based editing controls. And WordPress functionality is
easily extended using plugins and themes. One could argue that OJS is also extensible, but the WordPress platform has spawned a far more diverse and productive
'ecosystem' of developers for themes, plugins, and extensions, meaning a much larger set of options for adding functionality to Annotum in the future. And finally, the setup and operation of a WordPress site is among
the simplest of any web-based application. On WordPress.com, for example, users with an account need only provide a name for their site to have a fully-functioning site available in seconds.
WordPress, whether in the hosted (WordPress.com) service or the
open-source and freely downloadable (WordPress.org) software package, has seen extremely
wide adoption for a very broad range of web sites:
14.7% of the top million websites worldwide (State of the Word, 2011);
55.9 million sites run WordPress software (Stats—WordPress.com);
Over 290 million people view more than 2.5 billion pages per month on WordPress.com (Stats—WordPress.com)
Of course, setting up a journal authoring, review, and publishing system isn't a popularity contest—but it is important to recognize the real benefits of using a platform with a very large user and developer/designer base. Because so many
people work on WordPress, journal publishers using a WordPress-based CMS have literally thousands of existing development and design shops, and themes and plugins, from which to choose.
Despite these many advantages, for the Annotum project
WordPress is missing some key requirements:
Support for multiple authors, article review workflow, and version comparison
Scholarly features such as citations, equations, and controlled document structure (headings, lists of figures/equations/tables)
Export to and import from the NLM/PubMed Journal Article DTD and other structured formats
Thus development of Annotum version 1.0 was focused on providing these additional features (shown in more detail below).
Annotum is provided as open-source software—all software
code and other materials will be made available to the open-source community
for use and future enhancement or development. More information about Annotum
can be found at [http://annotum.wordpress.com],
and the source code is available on GitHub [git://github.com/Annotum/Annotum.git].
Results
As of this writing (September 2011), Annotum version 1.0 is
nearing the end of its software development phase and about to enter the
initial beta test period. Annotum version 1.0 is scheduled to be released in
Fall 2011, both as a separate WordPress theme available for installation on
self-hosted (.org) WordPress sites and, thanks again to the generous
contributions of both Automattic, Inc. and Google, as a
free theme on the WordPress.com hosted service. This means that anyone will
be able to create a new journal with all of the features listed in this paper
at zero cost with very little if any technical expertise required.
The following section describes aspects of how Annotum is
implemented, and is followed by a brief walk-through of key Annotum features
and setup.
Annotum Implementation
Annotum is provided as a single WordPress theme, including
its own plugins, templates, and custom code. Although some features are
already available as stand-alone plugins, a key design philosophy of Annotum
was to keep the installation as simple as possible. WordPress plugins and
themes can at times conflict; the Annotum team sought to reduce the chance of
such conflicts by having a single theme contain all of the features to be
delivered.
The Annotum theme is based on the Carrington theme engine, a
CMS framework provided by Crowd Favorite Ltd., who also provided software engineering resources for Annotum. Carrington provides
an elegant framework for creating sophisticated WordPress themes, and supports multiple child themes for sites with
multiple publications (for example, a professional society with multiple
journals or sections, as with PLoS: Currents). In Annotum the Carrington engine is
enhanced with a workflow and permissions engine, along with a custom
post type ("article") that supports the additional requirements. An enhanced editor, based
on the TinyMCE editor that comes with the base WordPress package and
implemented as a series of TinyMCE plugins, rounds out the package.
It is perhaps difficult to overstate the challenge of adding JATS compliance to the editing component.
Many tools have attempted to provide a WYSIWYG environment for creating structured content, with varying degrees of success. The
approach in Annotum is to provide a very basic and simple set of formatting options, and rigorously strip from the content any tags or other
elements that are not compliant. This entails some overhead for the author, particularly if she has spent time laboriously crafting a
document in Microsoft Word and, for example, formatted all of her headings using sized fonts rather than a heading style. Once pasted into Annotum
the font sizes are stripped out. Only by ensuring structure conformance at authoring time can we ensure that all text in the system
is compliant with the underlying schema. In the case of Annotum, the schema used is a subset of JATS called Kipling.
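The stripping step described above can be illustrated with a minimal Python sketch. This is not Annotum's actual code (the editor is implemented as TinyMCE plugins inside a WordPress theme); the names `ALLOWED_TAGS`, `WhitelistStripper`, and `strip_noncompliant` are hypothetical, and the real set of permitted elements is defined by the Kipling subset rather than by the small whitelist assumed here.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for illustration only; the real set of permitted
# elements is defined by the Kipling subset of JATS, not by this sketch.
ALLOWED_TAGS = {"p", "h2", "b", "i", "table", "tr", "td", "th", "ul", "ol", "li"}


class WhitelistStripper(HTMLParser):
    """Keep whitelisted tags (without attributes) and drop all other markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Attributes such as style="font-size: 18pt" are discarded entirely.
        if tag in ALLOWED_TAGS:
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always kept; only non-compliant markup is removed.
        self.parts.append(escape(data, quote=False))


def strip_noncompliant(html_text):
    """Return html_text with every non-whitelisted tag and attribute removed."""
    parser = WhitelistStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


# A heading "faked" in a word processor with a large bold font.
pasted = '<p><span style="font-size: 18pt"><b>Methods</b></span></p><p>We sampled.</p>'
print(strip_noncompliant(pasted))
```

In this sketch the faked heading survives only as bold text inside a paragraph, which is the overhead mentioned above: the author has to reapply a real heading style for the text to be tagged as a heading.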
Enforcing this XML conformance while still retaining both appropriate web formatting for the published pages and a true WYSIWYG display at editing time was one of the more challenging tasks facing the development team.
It is much easier to display article content (either in the editor or as a previewed
or published web page) when it is kept in an HTML format, but at the same time
the XML format must be retained for use in exporting and for validation.
Annotum resolves these divergent goals by storing both the 'filtered' XML
content and the 'unfiltered' HTML content. The 'normal' post content location,
the post_content field in the wp_posts table, is used for the unfiltered
content, while the XML version is stored in the post_content_filtered field.
The figure below shows a comparison of the unfiltered and filtered content for a sample
article section containing text with a heading and a table.
Fig. 2 Comparison of unfiltered (HTML) and filtered (XML) content stored by Annotum.
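The dual-storage idea can be sketched in a few lines of Python. The sketch makes several assumptions: an in-memory SQLite database stands in for WordPress's MySQL database, only the two content columns named above (plus an ID) are modeled, the helper names `save_article`, `load_for_display`, and `load_for_export` are hypothetical, and the sample HTML and XML strings are illustrative values rather than actual Annotum output.

```python
import sqlite3

# In-memory SQLite stands in for WordPress's MySQL database (an assumption
# made so the sketch is self-contained); only three wp_posts columns are modeled.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_posts ("
    " ID INTEGER PRIMARY KEY,"
    " post_content TEXT,"            # 'unfiltered' HTML for editing and display
    " post_content_filtered TEXT)"   # 'filtered' XML for export and validation
)


def save_article(conn, html_content, xml_content):
    """Store both representations of one article in a single row."""
    cur = conn.execute(
        "INSERT INTO wp_posts (post_content, post_content_filtered) VALUES (?, ?)",
        (html_content, xml_content),
    )
    conn.commit()
    return cur.lastrowid


def load_for_display(conn, post_id):
    """Fetch the HTML version, used by the editor and the published page."""
    row = conn.execute(
        "SELECT post_content FROM wp_posts WHERE ID = ?", (post_id,)
    ).fetchone()
    return row[0]


def load_for_export(conn, post_id):
    """Fetch the XML version, used for export and schema validation."""
    row = conn.execute(
        "SELECT post_content_filtered FROM wp_posts WHERE ID = ?", (post_id,)
    ).fetchone()
    return row[0]


# Illustrative sample values only.
sample_html = "<h2>Methods</h2><p>We sampled.</p>"
sample_xml = "<sec><title>Methods</title><p>We sampled.</p></sec>"

post_id = save_article(conn, sample_html, sample_xml)
print(load_for_display(conn, post_id))
print(load_for_export(conn, post_id))
```

Keeping both versions in the same row means that display code never has to convert XML on the fly, at the cost of having to write both fields together whenever an article is saved.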
Annotum Feature Walkthrough and Demo
[This is a screen-shot version of the Annotum live demo.]
The basic workflow in Annotum is as follows:
Authors must log in to reach the Annotum dashboard, then either open an existing article or create a new one.
Once they arrive at the article editing screen, authors invite any coauthors with whom they may wish to
collaborate.
Fig. 5 Main Article Editing Screen
Fig. 7 The editor can be expanded to fill the screen
The editor provides a number of features. For example, authors may:
Insert and format text
Add formatted tables
Add equations and quotations
Create a reference list and insert references:
Among the collaboration features of Annotum is support for internal discussion comments
Editors, reviewers, and authors also have the ability to view a log of work on the article
and to compare article revisions.
Once editing is complete, authors submit their article for review. The submission process is a single click:
Different users (authors, editors, etc.) see a different view of the article status pane depending on their permission level:
Once the article is submitted, Annotum then optionally notifies the editor (if email notification is disabled, the
editor may simply monitor the article queue for new submissions), and the
editor assigns one or more reviewers to the task.
Next, reviewers
sign in, read the article, and enter comments if desired. Reviewers have a separate, private comment area as well,
one not visible to Authors. Optionally, site administrators can enable a form
of open-process review in which review comments (and the identity of each
reviewer) are visible to the authors.
After the reviewer enters any relevant comments, which might be a question back to the editor (in effect the lead reviewer), he or she makes a
recommendation: Approve, Reject, or Request Revisions.
When all the reviews are submitted (or whatever portion is sufficient according to
the editorial policy of the publication) the editor makes a final ruling on the
article, which again can be Approve, Reject, or Request Revisions.
Once approved, the article is ready for final copyediting and publishing by a site
admin (editors cannot publish); if revisions are requested the article returns
to a draft status for further editing by the author(s). And if the article is
rejected it is removed from the publication queue. In all cases notification
is sent to the authors and editors of the final decision. The publication
staff (admins) makes whatever final tweaks are required, and publishes the
articles live on the public-facing web site.
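The workflow just described can be read as a small state machine with role-based permissions. The Python sketch below is one possible reading of it, not Annotum's implementation: the state names, the action names, and the function `apply_action` are all hypothetical, and details such as reviewer recommendations and notifications are left out.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in_review"
    APPROVED = "approved"
    REJECTED = "rejected"
    PUBLISHED = "published"


# (current state, action) -> (role allowed to perform it, next state).
# Hypothetical names; they mirror the steps in the walkthrough above.
TRANSITIONS = {
    (State.DRAFT, "submit"): ("author", State.IN_REVIEW),
    (State.IN_REVIEW, "approve"): ("editor", State.APPROVED),
    (State.IN_REVIEW, "reject"): ("editor", State.REJECTED),
    # Requested revisions send the article back to draft for the authors.
    (State.IN_REVIEW, "request_revisions"): ("editor", State.DRAFT),
    # Only a site admin can publish; editors cannot.
    (State.APPROVED, "publish"): ("admin", State.PUBLISHED),
}


def apply_action(state, action, role):
    """Return the next state, or raise if the action or role is not allowed."""
    try:
        allowed_role, next_state = TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not valid from {state.value!r}") from None
    if role != allowed_role:
        raise PermissionError(f"{role!r} may not perform {action!r}")
    return next_state


# One pass through the happy path: submit, approve, publish.
state = State.DRAFT
state = apply_action(state, "submit", "author")
state = apply_action(state, "approve", "editor")
state = apply_action(state, "publish", "admin")
print(state.value)
```

Encoding the transitions as a table keeps the permission rule (editors approve, admins publish) in one place, which makes it straightforward to adapt the sketch to a publication with a different editorial policy.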
Annotum XML is based on the Kipling subset of JATS, and will validate via the
PMC XML Validator. If the publication has made arrangements with PMC for inclusion of published work
in the repository, PMC will monitor that publication's Annotum RSS feed for
newly-published articles. When a new article is available, PMC will request
the XML version and import it directly into PMC for publication there.
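The feed-monitoring step can be sketched from the consumer's side. The Python example below assumes a standard RSS 2.0 feed with `item`, `guid`, and `link` elements; the feed contents, the example.org URLs, and the function name `new_items` are hypothetical, and the sketch does not show how the XML version of an article is actually requested, since that detail is not specified here.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; a real consumer would fetch this over HTTP.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Journal</title>
    <item>
      <title>Article A</title>
      <link>https://journal.example.org/article-a</link>
      <guid>https://journal.example.org/article-a</guid>
    </item>
    <item>
      <title>Article B</title>
      <link>https://journal.example.org/article-b</link>
      <guid>https://journal.example.org/article-b</guid>
    </item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_guids):
    """Return (guid, link) pairs for feed items not seen on an earlier poll."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        guid = item.findtext("guid")
        if guid and guid not in seen_guids:
            fresh.append((guid, item.findtext("link")))
    return fresh


# Article A was imported on a previous poll, so only Article B is new.
already_imported = {"https://journal.example.org/article-a"}
for guid, link in new_items(SAMPLE_FEED, already_imported):
    print("new article:", link)
```

Tracking items by `guid` rather than by title means an article whose title is corrected after publication is not imported a second time.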
Conclusions
Annotum significantly reduces the barriers to entry for new
scientific journals built on the type of rapid-review process pioneered by
PLoS: Currents. It is hoped that by providing both free software and free
hosting, a vibrant open-source community will develop around Annotum, whereby
scholars and others can contribute new features and foster the spread and
improvement of this publishing tool.
Where can this growth lead? Given the ability of Annotum to
both import and export NLM-JATS tagged content, it is possible to envision a
number of compelling use cases, from individual groups of interested parties
self-publishing journals to an entire ecosystem of content re-use and
republication across professional societies, individuals, universities, and
public knowledge repositories such as PMC.
For more information about Annotum, please visit the Annotum home page,
download the code from the GitHub repository,
participate in discussions and get support via the Annotum discussion group,
or follow @annotum on Twitter.
The author may be reached via solvitor.com
or directly at carl@solvitor.com.
Acknowledgments
Annotum is a production of Solvitor LLC with heavy lifting provided by Crowd Favorite, and special thanks to: Google, PLoS, NIH/NLM/NCBI, and Automattic.