Converting a Markdown File to PDF Using Pandoc

Working with knitr and Markdown is a great way to share quick reports with colleagues, but in cases where IE8 is still the dominant browser, shipping an HTML file with embedded graphics is a non-starter. IE8 does not support the Data URI format used to embed images directly in the HTML file if those images are larger than 32 KB. This means that you can't easily share a graphic-heavy report as an HTML file with colleagues (and really, what statistical report isn't figure-heavy?).
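To get a feel for how quickly a figure hits that ceiling, here is a rough sketch in R. It assumes the 32 KB limit applies to the whole encoded URI (prefix included), and the function names `data_uri_length` and `exceeds_ie8_limit` are my own illustrations, not part of knitr or markdown. Base64 turns every 3 bytes into 4 characters, so an image file of about 24 KB is already over the line.

```r
# Estimate the length of a base64 data URI for a file of `n_bytes` bytes.
# Base64 encodes every 3 bytes as 4 characters, padding the final group.
data_uri_length <- function(n_bytes, mime = "image/png") {
  prefix <- paste0("data:", mime, ";base64,")
  nchar(prefix) + 4 * ceiling(n_bytes / 3)
}

# Hypothetical helper: flag figure files whose data URI would exceed the limit.
# Assumption: the limit is 32 * 1024 characters for the full URI.
exceeds_ie8_limit <- function(paths, limit = 32 * 1024) {
  sizes <- file.info(paths)$size
  setNames(data_uri_length(sizes) > limit, basename(paths))
}
```

Pointing `exceeds_ie8_limit()` at the PNG files in your figure directory gives a named logical vector showing which images would be too large to embed.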

So the next best thing is to ship a PDF. One way to do this would be to print the HTML file from a browser that can save it as a PDF. In this case, the resulting file is generally quite ugly, the images are often distorted, and the header and footer are problematic. Another way is to rewrite your report in Markdown that is friendlier to conversion into LaTeX and then to PDF. Neither of these is fun, neither is efficient, and neither looks ideal.

Luckily, I found a great way to use pandoc to convert the HTML report into a good-looking PDF without resorting to rewriting the report in LaTeX and reknitting. This means you can get the power of Markdown with the portability of PDF for long-form documents and one-off data reports. All you need is a handy little script to do the translating from format to format.

You can read the StackOverflow discussion here:

My interpretation/use of this is below:

# Define your report (base file name, without the extension)
RMDFILE <- "myreport"
library(knitr)
library(markdown)
# Knit the Rmd to an Md file
knit(paste0(RMDFILE, ".rmd"), paste0(RMDFILE, ".md"))
# Convert the Md file to HTML
markdownToHTML(paste0(RMDFILE, ".md"), paste0(RMDFILE, ".html"), options = c("use_xhtml"))
# Convert the HTML file to PDF with pandoc
system(paste0("pandoc -s ", RMDFILE, ".html -o ", RMDFILE, ".pdf"))
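If you run this often, the steps above can be wrapped in a single function. The following is only a sketch: `pandoc_command` and `rmd_to_pdf` are names I made up for illustration, and it assumes knitr, markdown, and pandoc are all installed. Note that pandoc produces PDFs through a LaTeX engine by default, so a LaTeX installation is needed as well.

```r
# Build the pandoc call for converting an HTML file to PDF.
# shQuote() protects file names that contain spaces.
pandoc_command <- function(html_file, pdf_file) {
  paste("pandoc -s", shQuote(html_file), "-o", shQuote(pdf_file))
}

# Hypothetical wrapper: knit the Rmd, convert to HTML, then hand off to pandoc.
rmd_to_pdf <- function(rmd_file) {
  base <- tools::file_path_sans_ext(rmd_file)
  md_file <- paste0(base, ".md")
  html_file <- paste0(base, ".html")
  pdf_file <- paste0(base, ".pdf")

  # Fail early if pandoc is not on the PATH.
  if (!nzchar(Sys.which("pandoc"))) {
    stop("pandoc was not found on the PATH")
  }

  knitr::knit(rmd_file, md_file)
  markdown::markdownToHTML(md_file, html_file)

  # system() returns the exit status of the command; 0 means success.
  status <- system(pandoc_command(html_file, pdf_file))
  if (status != 0) {
    stop("pandoc exited with status ", status)
  }
  invisible(pdf_file)
}
```

Calling `rmd_to_pdf("myreport.rmd")` would then leave `myreport.md`, `myreport.html`, and `myreport.pdf` next to the source file. Keeping the command construction in its own function makes it easy to inspect the exact pandoc call before running it.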

All Your Source Code Are Belong to... Nature?

The journal Nature put out an interesting op-ed recently discussing the need to make source code available for scientific articles that require statistical computation to produce their results.
The article hits on a point that is absolutely critical: statistical computing is difficult. Honest mistakes get made. A lot. The peer review process catches theoretical flaws and omitted bibliographic references, and it invites some criticism of the methods based on the amount of detail provided in the article itself. But all of those flaws could be absent and an article could still be fatally flawed and draw completely false conclusions, simply due to an error in the code, and it would still be published if that code was never reviewed or made public.
A big concern here is transparency, as the authors state so well:
"Our view is that we have reached the point that, with some exceptions, anything less than release of actual source code is an indefensible approach for any scientific results that depend on computation, because not releasing such code raises needless, and needlessly confusing, roadblocks to reproducibility."
And of course, R and Sweave are mentioned as an elegant solution to this problem:
"There are a number of tools that enable code, data and the text of the article that depends on them to be packaged up. Two examples here are Sweave associated with the programming language R and the text-processing systems LaTeX and LyX, and GenePattern-Word RRS, a system specific to genomic research. Sweave allows text documents, figures, experimental data and computer programs to be combined in such a way that, for example, a change in a data file will result in the regeneration of all the research outputs."
Technology has changed the tools necessary to ensure rigor and replicability in science, but not the principle behind it. It is great to see a journal such as Nature making the case for this level of scrutiny to be applied to the computational routines used to derive results.