Journal Article Tag Suite Conference (JATS-Con) Proceedings 2016 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2016.

Jatdown: a markdown language for writing JATS

Paul C. Johnston, MD.

Markdown is a popular writing syntax owing to its simplicity, readability, and wide range of support. Despite its success in the software development community, the utility of Markdown in scholarly publishing is limited by its simplistic document model. Plain Markdown alone cannot support the expressiveness required in a scholarly publishing context.

The scholarly article document model is precisely what JATS addresses. JATS has a large and somewhat complex feature set and a wide range of flexibility. This complexity and flexibility come at the price of cognitive burden and potentially incompatible interpretations of the tag set.

In this paper we explore Jatdown, a Markdown variant designed for conversion to JATS. Jatdown provides a way to author JATS using a Markdown syntax that maps to a subset of the JATS document model.

Background

Markdown was invented by John Gruber and Aaron Swartz in March 2004. Gruber, himself a bit of an XML geek, was frustrated with the state of blogging tools at the time. Specifically, it was not straightforward to generate a weblog entry for Movable Type. His blogging workflow required several steps: writing in BBEdit, previewing the HTML in a browser, and ultimately pasting the HTML into Movable Type. Gruber disliked this.

Despite his disdain for blogging workflows, he did recognize that he enjoyed the process of writing email. What was it about email that was likable? In his case, email was a plain-text medium in which he could quickly express his ideas clearly without the tool getting in the way.

Thus, Markdown was born with the specific goal of writing structured documents (weblogs) using a syntax based on email, while preserving the ability to include HTML snippets if needed. His Markdown.pl script served as the prototype for the hundreds of Markdown implementations that now exist in nearly all major programming languages.

In the subsequent 12 years, Markdown has become the de facto standard writing medium for programmer-related content on the web:

  • The vast majority of README files and software documentation on github.com (12 million users, 31 million repositories) are written in Markdown.
  • The 29 million or so posts on stackoverflow.com (5.3M users) are written in Markdown.
  • Posts on reddit.com (~7 billion) are formatted as Markdown (although the majority of these are simply plain-text without any Markdown formatting).
  • New and innovative book publishing sites such as leanpub.com use Markdown as the primary syntax.

The reason for the wide adoption of Markdown is partially due to the ease of learning Markdown (it can be learned in a matter of minutes), the wide array of tools to support it, and the success and growth of sites such as those mentioned above. A qualitative indication of the increase in Markdown interest is given in Figure 1.

John Gruber remains an active blogger but is no longer involved in the ongoing development of Markdown. The primary community interested in promoting Markdown as a standard is CommonMark. CommonMark aims to provide an unambiguous specification for Markdown processors and is led by John MacFarlane, who is also the primary author of Pandoc (discussed later in this paper).

Figure 1. Google Search Trends for Markdown

Google search trend line graph for the word "markdown". Y-axis values are a measure of relative search activity.

Relevance to Scholarly Publishing

Scholarly publishing is currently entering a new phase that increasingly values the author experience in the publishing process. The factors driving this author-centric change may include:

  • Increased competition (more journals to choose from).
  • The shift in payment towards open-access models (wherein the author is more directly exposed to the costs of publishing).
  • Increasing availability of APIs in the scholarly publishing space (a critical factor in making it possible to improve the author experience).

Whatever the root cause, it's now more important than ever to facilitate a hassle-free experience for scholarly authors.

John Gruber's frustration with his weblog writing workflow is, at a basic level, a frustration shared by many if not most scholarly authors. It is likely there are lessons to be learned from the explosive popularity of Markdown in the programming community that can be adapted to the scholarly publishing space to improve the author experience.

Current Use in Scholarly Publishing

Current usage of Markdown in scholarly publishing is negligible. The reasons for this include:

  1. The population of people publishing in scholarly journals is different than the population of people currently writing Markdown. Publishing in scholarly journals is not a key factor for career advancement for most software developers.
  2. Software tools that support a Markdown-based workflow for scholarly publishing are inadequate or too difficult to use.

For those who do engage with Markdown, the main tool in use, directly or indirectly, is Pandoc, written by John MacFarlane. Pandoc is a command-line program written in Haskell that converts between multiple document formats via a JSON-based document abstraction.

As a command-line utility, Pandoc's use is limited to those persons comfortable in a terminal interface. However, Pandoc is at the core of several commercial academic writing tools including authorea.com and ManuscriptsApp, as well as freely available tools such as R-markdown. Pandoc is available under a GPL license.

Alternatives to Markdown

There are a number of alternative plain-text structured document formats that share similar design considerations to Markdown. Several of these are arguably more appropriate for use as a foundation for conversion to JATS. However, Markdown is unquestionably more popular overall and is the preferred writing format for the author (pcj) and thus Markdown was chosen over the alternatives. The most viable include:

  • AsciiDoc. "AsciiDoc is a text document format for writing notes, documentation, articles, books, ebooks, slideshows, web pages, man pages and blogs. AsciiDoc files can be translated to many formats including HTML, PDF, EPUB, man page."
  • reStructuredText: "reStructuredText is an easy-to-read, what-you-see-is-what-you-get plaintext markup syntax and parser system. It is useful for in-line program documentation (such as Python docstrings), for quickly creating simple web pages, and for standalone documents."

Prior work on adapting Markdown for scholarly writing

This is a partial list and should not be considered an authoritative reference on prior work. Several notable projects include:

Scholarly Markdown

Scholarly Markdown (scholdoc) is a fork of Pandoc created by Tim T.Y. Lin with additional extensions for creating figures, numbered equations, and other constructs relevant to scholarly publishing. Scholdoc contains a number of good ideas but has not been updated since April 2015.

Pandoc-JATS

Martin Fenner, formerly a medical oncologist, a blogger, and now the technical director of DataCite, has been a vocal proponent of the use of Markdown for writing scholarly works. One such effort is a Lua plugin for Pandoc that is capable of generating JATS from Pandoc's central JSON-based document representation. Using Pandoc-JATS requires a similar or slightly higher level of technical proficiency than using Pandoc directly, as well as knowledge of JATS document structure for customization.

Markx

Markx is an online Markdown editor intended for scientific writing. It is an open-source project that hosts a demonstration editor running as a Heroku app. It uses Pandoc as the document generation backend. It does not appear to be under active development and has not been significantly updated since 2013.

Summary of prior work

In nearly all cases Pandoc is used as the key enabling tool. While there is no question that Pandoc is an excellent piece of software and John MacFarlane is an outstanding programmer and community builder, it is possible that Pandoc is not the ultimate solution for scholarly publishing. Specifically:

  1. Pandoc was specifically designed as the "universal document converter". As such, Pandoc's document abstraction will remain a least common denominator solution.
  2. Pandoc is hard to use for most mortals.
  3. Haskell remains an obscure programming language for the majority of mainstream developers. A solution written in JavaScript would be a much more accessible toolset.

Using JATS as the common document abstraction

While JATS is currently a fairly obscure tag set known to relatively few, it has become the de facto standard for scholarly publishing. Application programming interfaces (APIs) to the incumbent distribution channels for scholarly content are converging on JATS.

Therefore, the author believes JATS to be the most practical and relevant document abstraction to serve as the common article representation. The thesis is that we should build tools that convert into and out of JATS in order to build an open, transparent, and reproducible ecosystem for scholarly publishing.

JATS has its own set of issues. These include but are not limited to:

  1. JATS is complex. There is a dizzying array of available elements, and it is not always clear what the right approach should be for a given scenario.
  2. JATS is flexible. Like all real-world solutions, the model captures an evolution of ideas over time and desires to support the needs of different organizations having disparate goals. The inherent flexibility of JATS lends itself to parallel incompatible interpretations. Projects like JATS for Reuse are critically important for providing clarity on best practices.
  3. JATS authoring tools for the end-user (the scholarly author) are essentially non-existent. Nearly all solutions cater to the needs of publishers near the end of the publishing workflow.

https://pubref.org (under active development) aims to be a place for scientific and medical authors to manage, share, and collaborate on research projects and write scholarly manuscripts.

By providing a writing environment that produces high-quality JATS output, the intent is that value can be provided to both authors and publishers: authors benefit from a more efficient and reproducible workflow, while publishers are provided semantically-tagged content that requires less effort to wrangle into a publishable format.

Migration of the scholarly publishing workflow chain from the early stage of content generation all the way through the final stages of publishing to a semantic document model (JATS) may ultimately have a major community benefit: we can, as a culture, depart from the ridiculous formatting requirements that pervade the consciousness of scholarly journals. Authors hate these needless rules. It is no small irony that journals should demand a Microsoft Word file from authors in a specific format only to immediately strip this away via conversion to JATS using tools such as eXtyles.

A great many human-hours can be saved once we arrive at a solution that does not burden the author with unnecessary formatting.

Jatdown

Jatdown is a Markdown variant intended for conversion to JATS XML rather than HTML. The prefix jat- is used rather than jats- because the pronunciation is more akin to jotdown, as in "jotting down notes".

Intent of this paper

As Jatdown is still under development, this document is not intended to be a fixed reference document or official specification. For example, when the abstract of this JATS-Con paper was submitted, the author proposed YAML as an alternative means to express structured content. While this has some advantages, requiring a full YAML parser in addition to a Markdown parser is probably too complicated, and Jatdown has since migrated away from this approach.

Framework

Jatdown is a compiler framework written in JavaScript for converting document formats into and out of an idealized JATS document model. Jatdown compilation is a multi-step process involving several actors:

  1. Lexer: responsible for transforming a character stream into a set of tokens that are passed to the parser.
  2. Parser: responsible for interpreting the token stream and building an abstract syntax tree (AST).
  3. Post-processor: responsible for iteratively transforming the AST into a modified AST. Post-processing is divided into a linear sequence of compiler phases. Each phase may contain multiple compiler passes that can operate in parallel (such as fetching citation metadata).
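The three actors above can be illustrated with a minimal sketch. It is not Jatdown's actual code: the function names (lex, parse, postProcess), the token and AST shapes, and the countParagraphs phase are all assumptions made for illustration. Only ATX headers and plain paragraphs are handled, and the phases run sequentially rather than in parallel.

```javascript
// Hypothetical three-stage pipeline: lexer -> parser -> post-processor.

// Lexer: turn the character stream into a flat list of tokens.
function lex(source) {
  return source
    .split(/\n+/)
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const header = /^(#+)\s+(.*)$/.exec(line);
      return header
        ? { type: "header", level: header[1].length, text: header[2] }
        : { type: "text", text: line.trim() };
    });
}

// Parser: build a (very flat) abstract syntax tree from the tokens.
function parse(tokens) {
  return {
    type: "article",
    children: tokens.map((t) =>
      t.type === "header"
        ? { type: "title", level: t.level, text: t.text }
        : { type: "p", text: t.text }
    ),
  };
}

// Post-processor: run each phase in order; every phase maps AST -> AST.
function postProcess(tree, phases) {
  return phases.reduce((current, phase) => phase(current), tree);
}

// Example phase: count the paragraphs and record the total on the root.
const countParagraphs = (tree) => ({
  ...tree,
  paragraphCount: tree.children.filter((n) => n.type === "p").length,
});

const ast = postProcess(parse(lex("# Title\n\nFirst.\n\nSecond.")), [countParagraphs]);
console.log(ast.paragraphCount); // 2
```

Keeping every phase a pure function from AST to AST is one way to make the "linear sequence of phases" easy to reorder and test in isolation.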

Syntax

While several Markdown parsers are written using context-free grammars, the majority follow a hybrid approach of first assigning block structure to the document and then parsing each block for inline structure.

It is therefore useful to think about the syntax in terms of block-level elements and inline elements. As many "Introduction to Markdown Syntax" guides are available online, this document jumps directly into the mapping between Markdown syntax and JATS structure.

Inline Elements

Basic Formatting

Markdown supports both asterisk and underscore methods to express <em> and <strong> elements. In Jatdown, there is only one way to express bold or italic content; the other forms are used for underline and overline. The inline formatting syntax and the corresponding JATS equivalents are listed in Table 1.
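The inline mappings of Table 1 can be approximated with a handful of regular-expression rules. This is a simplified sketch, not Jatdown's lexer: the names escapeXml, INLINE_RULES, and inlineToJats are hypothetical, nesting and escaping of delimiters are ignored, and the content of code spans is not protected from the other rules.

```javascript
// Escape characters that are special in XML text content.
function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Delimiter -> JATS element pairs from Table 1. Longer delimiters come
// first so that "**" is matched before "*", "__" before "_", "~~" before "~".
const INLINE_RULES = [
  [/\*\*(.+?)\*\*/g, "bold"],
  [/\*(.+?)\*/g, "italic"],
  [/__(.+?)__/g, "underline"],
  [/_(.+?)_/g, "overline"],
  [/~~(.+?)~~/g, "strike"],
  [/~(.+?)~/g, "sub"],
  [/\^(.+?)\^/g, "sup"],
  [/`(.+?)`/g, "monospace"],
];

// Escape the text first, then apply each rule in order.
function inlineToJats(text) {
  return INLINE_RULES.reduce(
    (out, [pattern, tag]) => out.replace(pattern, `<${tag}>$1</${tag}>`),
    escapeXml(text)
  );
}

console.log(inlineToJats("6.022x10^23^")); // 6.022x10<sup>23</sup>
```

Ordering the rules from longest delimiter to shortest is what lets a single pass per rule distinguish bold from italic and underline from overline.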

Table 1. Inline Formatting Elements

Mapping between Markdown syntax and XML representations for inline formatting elements.

| Jatdown Input | JATS-XML Output | Formatted Output |
|---|---|---|
| *emphasis* | <italic>emphasis</italic> | emphasis (italic) |
| **strong emphasis** | <bold>strong emphasis</bold> | strong emphasis (bold) |
| __underline__ | <underline>underline</underline> | underline (underlined) |
| 1 tab PO TID _c_ meals | 1 tab PO <overline>c</overline> meals | 1 tab PO c̄ meals |
| ~~strike~~ | <strike>strike</strike> | strike (struck through) |
| `var x = 1;` | <monospace>var x = 1;</monospace> | var x = 1; (monospace) |
| H~2~O | H<sub>2</sub>O | H₂O |
| 6.022x10^23^ | 6.022x10<sup>23</sup> | 6.022x10²³ |

External Links and Cross References

Markdown implementations typically support three or more forms of linking, known as inline links, reference links, and autolinks. Depending on the link destination (the URL), an <ext-link> or <xref> element is created.

  • Inline link: [JATS](http://jats.nlm.nih.gov/ "JATS Homepage") becomes <ext-link xlink:href="http://jats.nlm.nih.gov/" alt="JATS Homepage">JATS</ext-link>
  • Reference link: [JATS][1] becomes <ext-link xlink:href="http://jats.nlm.nih.gov/" alt="JATS Homepage">JATS</ext-link>, assuming there is a link reference definition line such as [1]: http://jats.nlm.nih.gov/ "JATS Homepage" somewhere in the document.
  • Autolink: <http://jats.nlm.nih.gov> becomes <ext-link xlink:href="http://jats.nlm.nih.gov">http://jats.nlm.nih.gov</ext-link>.

If the link destination starts with a hash-character, it is interpreted as an internal link (similar to HTML) and rendered as an <xref> , using the Camel-Cased label as the linking mechanism. For example:

  • Inline cross-reference : [Figure 1](#Figure1) becomes <xref ref-type="figure" rid="Figure1">Figure 1</xref> assuming there is a <fig><label>Figure 1</label>...</fig> block in the document.
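The choice between <ext-link> and <xref> described above can be sketched as a single function over the parts of a parsed link. The names linkToJats and refTypeFor are hypothetical, the rule of deriving ref-type from the first word of the link text is an assumption made to reproduce the example, and attribute values are not escaped in this sketch.

```javascript
// Hypothetical helper: guess ref-type from the first word of the link
// text, so that "Figure 1" yields "figure" as in the example above.
function refTypeFor(text) {
  return text.trim().split(/\s+/)[0].toLowerCase();
}

// Build an <ext-link> or <xref> from the parts of a Markdown link.
function linkToJats(text, destination, title) {
  if (destination.startsWith("#")) {
    // Internal link: the part after "#" is used as the rid.
    const rid = destination.slice(1);
    return `<xref ref-type="${refTypeFor(text)}" rid="${rid}">${text}</xref>`;
  }
  // External link: the optional title is carried in the alt attribute,
  // mirroring the examples above.
  const alt = title ? ` alt="${title}"` : "";
  return `<ext-link xlink:href="${destination}"${alt}>${text}</ext-link>`;
}

console.log(linkToJats("Figure 1", "#Figure1"));
// <xref ref-type="figure" rid="Figure1">Figure 1</xref>
```

Reference links would resolve their destination and title from the link reference definition before calling such a function, and autolinks would pass the URL as both text and destination.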

Citations

Markdown parsers officially recognize a subset of possible URI schemes as valid links. Jatdown processes the pmid: and doi: link schemes in a special manner. If links of these types are discovered, PubMed or Crossref (respectively) is queried for citation metadata and the corresponding reference is constructed in the <ref-list>.

For example, <http://doi.org/10.1038/nphys1170> , having the scheme http , is interpreted as a typical external link that happens to point to a digital object identifier.

However, <doi:10.1038/nphys1170> is interpreted as a request to fetch the metadata for that DOI from Crossref, construct a <ref id="bibr-doi_10.1038-nphys1170"><element-citation>... for the citation (if one does not exist already), and generate an <xref ref-type="bibr" rid="bibr-doi_10.1038-nphys1170">(Aspelmeyer 2009)</xref>, using a pre-configured CSL style to format the xref content.

Multiple citations can be specified to generate a citation cluster (an xref having multiple IDREFs). A simplified example: <doi:A,doi:B,doi:C> becomes <xref rid="bibr-doi_A bibr-doi_B bibr-doi_C">[1-3]</xref> .
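The identifiers in these examples suggest a simple derivation from the citation link. The sketch below infers the rule from the examples only (":" becomes "_", "/" becomes "-", with a "bibr-" prefix); refId and clusterRid are hypothetical names, and the metadata lookup and CSL formatting are left out entirely.

```javascript
// Derive a <ref> id from a citation link such as "doi:10.1038/nphys1170".
// Assumed rule: ":" becomes "_" and "/" becomes "-", prefixed with "bibr-".
function refId(link) {
  return "bibr-" + link.trim().replace(/:/g, "_").replace(/\//g, "-");
}

// Build the rid value (space-separated IDREFs) for a citation cluster
// written as a comma-separated list such as "doi:A,doi:B,doi:C".
function clusterRid(links) {
  return links.split(",").map(refId).join(" ");
}

console.log(refId("doi:10.1038/nphys1170")); // bibr-doi_10.1038-nphys1170
console.log(clusterRid("doi:A,doi:B,doi:C")); // bibr-doi_A bibr-doi_B bibr-doi_C
```

A deterministic id of this kind makes it straightforward to check whether a <ref> for the same DOI or PMID already exists before fetching its metadata again.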

Additional metadata providers other than pubmed.org (pmid:) and http://crossref.org (doi:) can be implemented in the future.

Footnotes

Footnotes in Jatdown work like PHP-Markdown and hence like reference-style links. A footnote is made of two elements: a footnote marker in the text that will become a superscript number and a footnote definition that will be placed in a list of footnotes in the nearest footnote-compatible block. See the <table-wrap> wrapper block example (given below) for an example.

Footnotes bubble up into the nearest containing footnote-compatible block (one whose content model supports <fn-group>). These include <table-wrap>, <sec>, <abstract>, <front>, and <back>.
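The bubbling rule amounts to a walk up the ancestor chain. In this sketch the node shape ({ name, parent }) and the names FOOTNOTE_HOSTS and footnoteHost are assumptions for illustration, not Jatdown's AST.

```javascript
// Elements treated as footnote-compatible, taken from the list above.
const FOOTNOTE_HOSTS = new Set(["table-wrap", "sec", "abstract", "front", "back"]);

// Walk up the parent chain until a footnote-compatible block is found.
// Returns null when no ancestor can hold an <fn-group>.
function footnoteHost(node) {
  for (let n = node.parent; n; n = n.parent) {
    if (FOOTNOTE_HOSTS.has(n.name)) return n;
  }
  return null;
}

// A tiny tree: sec > table-wrap > table, and sec > p.
const sec = { name: "sec", parent: null };
const tableWrap = { name: "table-wrap", parent: sec };
const table = { name: "table", parent: tableWrap };
const para = { name: "p", parent: sec };

console.log(footnoteHost(table).name); // table-wrap
console.log(footnoteHost(para).name); // sec
```

A footnote marker inside a table therefore lands in the table's own footer, while one in ordinary running text lands in the enclosing section.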

Block Elements

Sections

A Markdown document can be partitioned into sections via ATX or Setext style headers. These map relatively easily to JATS <sec> elements, but the flat sequence of headers must then be converted to a nested structure. For example, consider the following Markdown:

# Materials & Methods
...

## Description of Study
...

## Protocol for Data Collection
...

# Results
...

A Markdown parser will typically not assign nested structure according to the header level. For example, the following HTML will typically be generated:

<H1>Materials &amp; Methods</H1>
<p>...</p>
<H2>Description of Study</H2>
<p>...</p>
<H2>Protocol for Data Collection</H2>
<p>...</p>
<H1>Results</H1>
<p>...</p>

JATS expects a nested structure. In Jatdown, a post-processing step is applied to assign the correct nesting based on the relative ordering of header levels:

<sec disp-level="1">
 <title>Materials &amp; Methods</title>
 <p>...</p>

 <sec disp-level="2">
  <title>Description of Study</title>
  <p>...</p>
 </sec>

 <sec disp-level="2">
  <title>Protocol for Data Collection</title>
  <p>...</p>
 </sec>

</sec>

<sec disp-level="1">
 <title>Results</title>
  <p>...</p>
</sec>

Although section nesting is done by the relative ordering of header levels, the original header level numeric value (the number of pound-sign characters) is preserved in the disp-level attribute.
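A stack is a natural fit for this nesting step. The sketch below is one possible implementation under stated assumptions: headers arrive as a flat list of { level, title } objects, and nestSections and dispLevel are hypothetical names.

```javascript
// Nest a flat list of headers into section objects using a stack.
// The original header level is preserved as dispLevel.
function nestSections(headers) {
  const root = { children: [] };
  const stack = [{ level: 0, node: root }];
  for (const { level, title } of headers) {
    // Close every open section that is not shallower than this header.
    while (stack[stack.length - 1].level >= level) stack.pop();
    const sec = { title, dispLevel: level, children: [] };
    stack[stack.length - 1].node.children.push(sec);
    stack.push({ level, node: sec });
  }
  return root.children;
}

const sections = nestSections([
  { level: 1, title: "Materials & Methods" },
  { level: 2, title: "Description of Study" },
  { level: 2, title: "Protocol for Data Collection" },
  { level: 1, title: "Results" },
]);
console.log(sections.length); // 2
console.log(sections[0].children.length); // 2
```

Because only the relative order of levels matters, a level-3 header that directly follows a level-1 header is still nested one step deep, while its dispLevel keeps the value 3.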

Lists

Lists in Markdown are recognized by common list prefixes given in Table 2. Sublists are expressed by indenting a list item by four spaces relative to the previous list item.

A significant difference between JATS and HTML is that list items cannot contain inline content directly; the content must instead be wrapped in a paragraph element. Sublists must also be placed inside a list-item. Therefore, when transforming Jatdown to JATS, care must be taken to ensure that sublists are contained within list-items and that list-item content is wrapped in paragraph elements. For example:

1. Item 1
2. Item 2
    * Bullet A
    * Bullet B

Is transformed to:

<list list-type="order">
 <list-item><p>Item 1</p></list-item>
 <list-item><p>Item 2</p></list-item>
 <list-item>
  <list list-type="bullet">
   <list-item><p>Bullet A</p></list-item>
   <list-item><p>Bullet B</p></list-item>
  </list>
 </list-item>
</list>

An optional alternative rendering would explicitly assign a <label/> element to each list item according to the JATS content model for list-item: (label?, title?, (p | def-list | list)+). However, the current Jatdown implementation does not do this, as the list-item label is implied by the list-type.
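The prefix-to-list-type mapping of Table 2 can be expressed as a plain lookup. The sketch assumes that the list-type is decided by the prefix of the first item in a list, which sidesteps the ambiguity between roman numerals and letters (for example "i."); LIST_TYPES and listTypeFor are hypothetical names.

```javascript
// First-item prefix -> list-type attribute value, following Table 2.
const LIST_TYPES = new Map([
  ["*", "bullet"],
  ["-", "bullet"],
  ["1.", "order"],
  ["a.", "alpha-lower"],
  ["A.", "alpha-upper"],
  ["i.", "roman-lower"],
  ["I.", "roman-upper"],
]);

// Return the list-type for the prefix of the first list item,
// or null when the prefix is not a recognized list marker.
function listTypeFor(firstPrefix) {
  return LIST_TYPES.get(firstPrefix.trim()) ?? null;
}

console.log(listTypeFor("1. ")); // order
console.log(listTypeFor("i. ")); // roman-lower
```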

Table 2. List type mappings

Mapping between Jatdown list syntax and <list list-type=""/> attribute values. Note that there is no syntax equivalent for the simple list type.

| List Prefix | list-type attribute value |
|---|---|
| [* ] or [- ] | bullet |
| [1. ] | order |
| [a. ] | alpha-lower |
| [A. ] | alpha-upper |
| [i. ] | roman-lower |
| [I. ] | roman-upper |
| (no mapping) | simple |

Definition Lists

Work on definition lists is not complete, but is likely to use a bracketed list item prefix to define the list structure in conjunction with a wrapper section (see below) to indicate the <def-list> label and title. For example:

# [Definitions 1] Abbreviations

- [PAP-1] poly(A)polymerase I 
- [PNPase] polynucleotide phosphorylase

Would be transformed to:

<def-list>
 <label>Definitions 1</label>
 <title>Abbreviations</title>

 <def-item>
  <term>PAP-1</term>
  <def><p>poly(A)polymerase I</p></def>
 </def-item>

 <def-item>
  <term>PNPase</term>
  <def><p>polynucleotide phosphorylase</p></def>
 </def-item>

</def-list>

Code Blocks

Blocks of preformatted text are expressed in Markdown as either indented code blocks (each line is indented by four spaces) or fenced code blocks (delimited by lines with three or more fence characters, typically backticks or tildes). These are usually converted to <pre><code>...</code></pre> in HTML. In Jatdown, they are converted to <preformat> blocks. If a fenced code block is used (preferred), the info-string (the text on the opening fence line) is passed to the preformat-type attribute. For example:

```javascript
var sayHello = function() {
    alert('Hello World!')
}
```

Is converted to:

<preformat preformat-type="javascript" xml:space="preserve">
var sayHello = function() {
    alert('Hello World!')
}
</preformat>
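A minimal sketch of this conversion follows. It assumes the block has already been isolated so that its first line is the opening fence and its last line is the closing fence; fenceToPreformat and escapeXml are hypothetical names. The body is XML-escaped here, which the example above does not need but arbitrary code would.

```javascript
// Escape characters that are special in XML text content.
function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Convert a fenced code block into a <preformat> element.
// The info-string on the opening fence becomes preformat-type.
function fenceToPreformat(block) {
  const lines = block.split("\n");
  const open = /^(`{3,}|~{3,})\s*(\S*)/.exec(lines[0]);
  if (!open) return null; // not a fenced block
  // Drop the opening and closing fence lines, keep the body verbatim.
  const body = lines.slice(1, -1).join("\n");
  const type = open[2] ? ` preformat-type="${open[2]}"` : "";
  return `<preformat${type} xml:space="preserve">\n${escapeXml(body)}\n</preformat>`;
}
```

Calling fenceToPreformat on a block whose opening fence carries the info-string javascript yields a <preformat preformat-type="javascript" xml:space="preserve"> element wrapping the escaped body.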

The preformat-type can be used to facilitate syntax highlighting. As Pubref uses Prism.js, the recommended attribute values for identifying programming languages are those given in Table 3:

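Table 3. Preformat-type attribute values

Recommended preformat-type attribute values for common identification of programming languages, for the purpose of syntax highlighting.

| Common Name | Attribute value |
|---|---|
| Markup | markup |
| CSS | css |
| C-like | clike |
| JavaScript | javascript |
| ABAP | abap |
| ActionScript | actionscript |
| Apache Configuration | apacheconf |
| APL | apl |
| AppleScript | applescript |
| AsciiDoc | asciidoc |
| ASP.NET (C#) | aspnet |
| AutoIt | autoit |
| AutoHotkey | autohotkey |
| Bash | bash |
| BASIC | basic |
| Batch | batch |
| Bison | bison |
| C | c |
| C# | csharp |
| C++ | cpp |
| CoffeeScript | coffeescript |
| Crystal | crystal |
| CSS Extras | css-extras |
| D | d |
| Dart | dart |
| Diff | diff |
| Docker | docker |
| Eiffel | eiffel |
| Elixir | elixir |
| Erlang | erlang |
| F# | fsharp |
| Fortran | fortran |
| Gherkin | gherkin |
| Git | git |
| GLSL | glsl |
| Go | go |
| Groovy | groovy |
| Haml | haml |
| Handlebars | handlebars |
| Haskell | haskell |
| Haxe | haxe |
| HTTP | http |
| Icon | icon |
| Inform 7 | inform7 |
| Ini | ini |
| J | j |
| Jade | jade |
| Jatdown | jatdown |
| JATS-XML | jats |
| Java | java |
| JSON | json |
| Julia | julia |
| Keyman | keyman |
| Kotlin | kotlin |
| LaTeX | latex |
| Less | less |
| LOLCODE | lolcode |
| Lua | lua |
| Makefile | makefile |
| Markdown | markdown |
| MATLAB | matlab |
| MEL | mel |
| Mizar | mizar |
| Monkey | monkey |
| NASM | nasm |
| nginx | nginx |
| Nim | nim |
| Nix | nix |
| NSIS | nsis |
| Objective-C | objectivec |
| OCaml | ocaml |
| Oz | oz |
| PARI/GP | parigp |
| Parser | parser |
| Pascal | pascal |
| Perl | perl |
| PHP | php |
| PHP Extras | php-extras |
| PowerShell | powershell |
| Processing | processing |
| Prolog | prolog |
| Puppet | puppet |
| Pure | pure |
| Python | python |
| Q | q |
| Qore | qore |
| R | r |
| React JSX | jsx |
| reST (reStructuredText) | rest |
| Rip | rip |
| Roboconf | roboconf |
| Ruby | ruby |
| Rust | rust |
| SAS | sas |
| Sass (Sass) | sass |
| Sass (Scss) | scss |
| Scala | scala |
| Scheme | scheme |
| Smalltalk | smalltalk |
| Smarty | smarty |
| SQL | sql |
| Stylus | stylus |
| Swift | swift |
| Tcl | tcl |
| Textile | textile |
| Twig | twig |
| TypeScript | typescript |
| Verilog | verilog |
| VHDL | vhdl |
| vim | vim |
| Wiki markup | wiki |
| YAML | yaml |

Block Quotes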

Markdown blockquotes (lines starting with a closing angle bracket, >) are converted to <disp-quote> elements. If the last line of a blockquote starts with two dashes, it is translated to the attribution (<attrib>).

> Dead flies cause the ointment of the apothecary to send forth a
> stinking savor; so doth a little folly him that is in reputation
> for wisdom and honour.
> -- Ecclesiastes 10:1

Becomes:

<disp-quote>
 <p>
  Dead flies cause the ointment of the apothecary to send forth a
  stinking savor; so doth a little folly him that is in reputation
  for wisdom and honour.
 </p>
 <attrib>Ecclesiastes 10:1</attrib>
</disp-quote>
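The blockquote rule can be sketched as follows. The function name quoteToJats is hypothetical, the quote is assumed to be a single paragraph, and XML escaping of the text is omitted for brevity.

```javascript
// Convert the lines of a Markdown blockquote into a <disp-quote>.
// A final line starting with "--" is treated as the attribution.
function quoteToJats(markdown) {
  const lines = markdown
    .split("\n")
    .map((line) => line.replace(/^>\s?/, "").trim())
    .filter((line) => line !== "");
  let attrib = "";
  if (lines.length > 0 && lines[lines.length - 1].startsWith("--")) {
    const text = lines.pop().replace(/^--\s*/, "");
    attrib = `<attrib>${text}</attrib>`;
  }
  return `<disp-quote><p>${lines.join(" ")}</p>${attrib}</disp-quote>`;
}

console.log(quoteToJats("> A little folly.\n> -- Ecclesiastes 10:1"));
// <disp-quote><p>A little folly.</p><attrib>Ecclesiastes 10:1</attrib></disp-quote>
```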

Equations

Jatdown follows the example of other Markdown processors by borrowing the inline and display math delimiters native to LaTeX. These are converted to <inline-formula> and <disp-formula>, respectively. For example, the inline math equation $V = IR$ becomes:

<inline-formula>
 <alternatives>
  <tex-math notation="LaTeX">V = IR</tex-math>
  <mml:math>...</mml:math>
 </alternatives>
</inline-formula>

Note that the LaTeX syntax is preserved in the <tex-math> element, and the corresponding MathML is included (omitted from the example for brevity). Display math $$E = mc^2$$ follows a similar pattern:

<disp-formula>
 <alternatives>
  <tex-math notation="LaTeX">E = mc^2</tex-math>
  <mml:math>...</mml:math>
 </alternatives>
</disp-formula>

$$E = mc^2$$
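The delimiter handling can be sketched with two regular-expression passes. This sketch assumes single dollar signs for inline math and double dollar signs for display math, omits the MathML alternative, and uses the hypothetical names formula and mathToJats.

```javascript
// Wrap LaTeX source in the JATS formula structure shown above
// (the <mml:math> alternative is left out of this sketch).
function formula(tag, tex) {
  const safe = tex.replace(/&/g, "&amp;").replace(/</g, "&lt;");
  return `<${tag}><alternatives><tex-math notation="LaTeX">${safe}</tex-math></alternatives></${tag}>`;
}

// Replace $$...$$ (display) before $...$ (inline) so that the double
// delimiter is not mistaken for inline math.
function mathToJats(text) {
  return text
    .replace(/\$\$(.+?)\$\$/g, (_, tex) => formula("disp-formula", tex.trim()))
    .replace(/\$(.+?)\$/g, (_, tex) => formula("inline-formula", tex.trim()));
}

console.log(mathToJats("Ohm's law: $V = IR$"));
```

A replacement function is used instead of a replacement string so that dollar signs in the output are never interpreted as substitution patterns.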

Tables

Currently GitHub-flavored Markdown table syntax is supported. Support for more complex tables is an ongoing area of study. Clearly, good table support is important in scholarly publishing. One observation is that many tables generally resemble nested lists, and a possible hybrid list + table syntax is being considered. An alternative is to adapt concepts from LaTeX.

Currently, Jatdown generates XHTML-based <table> with a formal structure that always includes a <colgroup> , a <thead> , and a <tbody> .

For example:

|  A  |  B  |  C  | <-- thead
|:----|:---:|----:| <-- colgroup (optional alignment specifiers)
|  1  |  2  |  3  | <-- tbody

Becomes (abbreviated):

<table>
 <colgroup>
  <col style="align: left"/>
  <col style="align: center"/>
  <col style="align: right"/>
 </colgroup>
 <thead>...</thead>
 <tbody>...</tbody>
</table>
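The alignment specifiers in the delimiter row determine the <col> entries. The sketch below reads that row and returns one alignment per column; columnAlignments is a hypothetical name, and generation of the surrounding <colgroup> markup is left out.

```javascript
// Read the delimiter row of a GitHub-flavored Markdown table and return
// one alignment per column: "left", "center", "right", or null.
function columnAlignments(delimiterRow) {
  return delimiterRow
    .trim()
    .replace(/^\|/, "") // drop the leading pipe
    .replace(/\|$/, "") // drop the trailing pipe
    .split("|")
    .map((cell) => {
      const spec = cell.trim();
      const left = spec.startsWith(":");
      const right = spec.endsWith(":");
      if (left && right) return "center";
      if (right) return "right";
      return left ? "left" : null;
    });
}

console.log(columnAlignments("|:----|:---:|----:|"));
// [ 'left', 'center', 'right' ]
```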

Labeled Sections

<sec>, <fig>, <table-wrap> (and others) share the same basic content model, which includes an optional label, an optional title, and some other content. Jatdown re-uses the ATX header syntax to denote these kinds of blocks if the header content begins with the general pattern (WHITESPACE, OPEN_BRACKET, NOT_CLOSE_BRACKET+, CLOSE_BRACKET). For example, a <fig> can be expressed as:

# [Figure 1] Type II CRISPR-Cas

![](./figures/type-2-crispr.tif)

Genetic organization of the type II-A CRISPR-Cas system...

Which becomes:

<fig id="fig-Figure1">
 <label>Figure 1</label>
 <graphic xlink:href="figures/type-2-crispr.tif" mimetype="image" mime-subtype="tiff"/>
 <caption>
  <title>Type II CRISPR-Cas</title>
  <p>Genetic organization of the type II-A CRISPR-Cas system...</p>
 </caption>
</fig>

Similarly, a table-wrap block:

# [Table 1]: Patient Characteristics

|         | Pre (n=151) | Post (n=146)[^1] |
|---------|-------------|------------------|
| Ascites | 89 (59%)    | 34 (22%)         |

Number of patients with ascites pre- and post- treatment.

[^1]: Data not available for 1 trial.

Becomes:

<table-wrap id="table-wrap-Table1">
 <label>Table 1</label>
 <table>...</table>
 <caption>
  <title>Patient Characteristics</title>
  <p>Number of patients with ascites pre- and post- treatment.</p>
 </caption>
 <table-wrap-foot>
  <fn-group>
   <fn id="TF1-150"><p>Data not available for 1 trial.</p></fn>
  </fn-group>
 </table-wrap-foot>
</table-wrap>
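The bracketed-label pattern used in both examples can be recognized with one regular expression. This is a sketch under assumptions: parseLabeledHeader is a hypothetical name, and the optional colon after the closing bracket is tolerated only because the table example above uses one.

```javascript
// Split an ATX header into its level, optional bracketed label, and title.
function parseLabeledHeader(line) {
  const match = /^(#+)\s+(?:\[([^\]]+)\]:?\s*)?(.*)$/.exec(line);
  if (!match) return null; // not an ATX header
  return {
    level: match[1].length,
    label: match[2] || null, // null for an ordinary, unlabeled header
    title: match[3].trim(),
  };
}

console.log(parseLabeledHeader("# [Figure 1] Type II CRISPR-Cas"));
// { level: 1, label: 'Figure 1', title: 'Type II CRISPR-Cas' }
```

The label, once extracted, can then drive which wrapper element is produced and what id it receives.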

A similar section labeling pattern will be applied for special sections such as <abstract> , <ack> , <license> , <notes> , <statement> .

JATS-XML Blocks

Similar to HTML blocks in Markdown, Jatdown optionally supports inclusion of raw JATS XML when needed. It is the onus of the author to ensure that the XML is well-formed and sensible for the document context.

Pipeline: a writing environment for JATS

Pipeline is Pubref's web-based version control, JATS authoring, and manuscript submission application. The main components of Pipeline are:

  1. A web-based WYSIWYG editor that generates JATS-XML. While plain-text formats have desirable characteristics previously mentioned, it is clear that forcing all users into a plain-text only workflow is a strategy likely to remain in obscurity. Less technical users will need a WYSIWYG editor to be productive.
  2. A text-based document representation (Jatdown) that allows more technical users to write in plain-text, but still be able to use a WYSIWYG editor when creating more complex structures (such as tables and/or multi-paneled figures).
  3. A filesystem abstraction with built-in version control that makes it easy for non-technical users to benefit from modern version control workflows. Reproducible research is simply not possible without versioning.

A screenshot of the Jatdown writing environment is shown in Figure 2. The JATS WYSIWYG editor is shown without the text editor component in Figure 3.

Persons interested in beta access to the JATS editing environment should sign-up at https://pubref.org.

Figure 2. Jatdown Writing Environment

Screenshot of the side-by-side text editor panel and WYSIWYG editor. Dual synchronization facilitates using either editor depending on author preference.

Figure 3. WYSIWYG Writing Environment

Screenshot of the WYSIWYG JATS editor. The document outline view facilitates drag-and-drop rearrangement of article block structure.

Conclusions

As the notion of the scholarly manuscript evolves from its current state (a piece of paper representing a frozen opinion in time) to its future state (a versioned container that bundles the written word with the code and software environment needed to reproduce the results), there exist both great challenges and great opportunities to modernize the software stack used by the scholarly publishing community and to optimize the experience for the scholarly author.

The components to support that future already exist in the software community today. Our challenge is to translate these concepts and tools in a way that non-technical authors can make productive use of.

It is the opinion of the author that plain-text formats such as Jatdown are one component of the open, transparent, and hopefully reproducible future of scholarly communication.

Copyright 2016 by Paul C. Johnston.

"The copyright holder grants the U.S. National Library of Medicine permission to archive and post a copy of this paper on the Journal Article Tag Suite Conference proceedings website."

Bookshelf ID: NBK350677
