PDF: one of the most used file formats.

It is also one of the simplest to use. Since even browsers can open PDF files, we don’t need any additional software besides the operating system.

One of the first pieces of advice we received was to always print files using PDF. Even in today’s age when unnecessary printing is being discouraged, we still need an easy way to create and share documents without worrying about formatting issues caused by different versions of software. 

But what about us, developers? 

What tools do we have at our disposal to create invoices, certificates, or whatever it is your boss sends to other bosses? Well, as most of you already know, that is a little more difficult than one would expect. There are tons of possible solutions: every language offers at least a few libraries (going by Google, PHP seems to have the most of them), and there are also many paid services. So, which one to use?

Well, if you are searching for a simple answer that says “Use XYZ!”, I will disappoint you. There is no single solution that fits everyone. But let me show you some (spoiler alert: 2) simple ways to create PDF documents without much worry and see where it gets us.

Before we start, we should define what we want to achieve. I compiled the requirements I consider important:

  • Styling (fonts, colors, etc)
  • Tables with header on every page
  • Header/footer on every page with page numbers
  • Running custom code (for charts, maps, etc…)

Other features, such as signed documents or editable forms, will be considered nice to have, but not essential.

React-pdf

An open-source library that combines React (to define layout and content) and PDFKit (for rendering). One of its biggest advantages is simplicity: anyone with knowledge of React will be able to generate documents in a matter of minutes. Also, the rendering process doesn’t use any HTML-to-PDF conversion, but creates documents directly in PDF format (although, to be honest, I am still not sure how much of an advantage that is). Styling is done with CSS-like properties and flexbox, using either inline styles or a StyleSheet object.

Introduction

Example of a simple document:
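A minimal sketch of such a document could look like this. It assumes `react` and `@react-pdf/renderer` are installed and can be loaded with `require`, and it uses `React.createElement` instead of JSX so that it runs in plain Node without a build step. The names `styles` and `createSimpleDocument` are illustrative, not part of the library.

```javascript
// Plain style data; converted with StyleSheet.create() when the document is built.
const styles = {
  page: { padding: 30, fontSize: 12 },
  title: { fontSize: 24, marginBottom: 10, color: '#1a73e8' },
};

// Builds a one-page document greeting `name` (hypothetical helper).
function createSimpleDocument(name) {
  // Loaded lazily, so this file can be imported even where the libraries are missing.
  const React = require('react');
  const { Document, Page, View, Text, StyleSheet } = require('@react-pdf/renderer');
  const h = React.createElement;
  const s = StyleSheet.create(styles);

  return h(
    Document,
    null,
    h(
      Page,
      { size: 'A4', style: s.page },
      h(
        View,
        null,
        h(Text, { style: s.title }, 'Hello PDF'),
        h(Text, null, `Generated for ${name}`)
      )
    )
  );
}
```

In a project with a JSX toolchain, the same tree would normally be written as nested `<Document>`, `<Page>`, `<View>` and `<Text>` tags.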

Now we have multiple options on how to get our document to the user. React-pdf supports both client-side and server-side rendering. This is how our document looks rendered in a browser:


Simple and self-explanatory. But we might not want to just display documents; we also want to allow users to download them. See the “Download now!” link on top? This is its source:

OK, client-side rendering might be cool, but it is less practical in the real world. Luckily, server-side rendering is just as easy.

And that’s it. Combine it with Express.js, add some parameters, and a report-generating endpoint is ready. Now let’s check which of the requested features are supported.
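A download link of this kind can be sketched with react-pdf’s `PDFDownloadLink` component. The helper names and the default file name below are assumptions made for illustration; only `PDFDownloadLink` and its `document` / `fileName` props come from the library.

```javascript
// Label for the link; react-pdf calls the child function with { loading, ... }.
function downloadLabel({ loading }) {
  return loading ? 'Preparing document...' : 'Download now!';
}

// `documentElement` is a react-pdf <Document> element; `fileName` is a hypothetical default.
function createDownloadLink(documentElement, fileName = 'document.pdf') {
  // Loaded lazily; intended for client-side (browser) rendering.
  const React = require('react');
  const { PDFDownloadLink } = require('@react-pdf/renderer');

  return React.createElement(
    PDFDownloadLink,
    { document: documentElement, fileName },
    downloadLabel
  );
}
```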
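On the server, a sketch along these lines should be enough, assuming `@react-pdf/renderer` is installed in the Node process. `renderToFile` is the library call; `outputPathFor`, `renderReport` and the temp-directory location are hypothetical choices.

```javascript
const os = require('os');
const path = require('path');

// Hypothetical helper: decides where the generated file is written.
function outputPathFor(reportId) {
  return path.join(os.tmpdir(), `report-${reportId}.pdf`);
}

// Renders a react-pdf <Document> element to disk and returns the file path.
async function renderReport(documentElement, reportId) {
  const { renderToFile } = require('@react-pdf/renderer'); // loaded lazily
  const target = outputPathFor(reportId);
  await renderToFile(documentElement, target);
  return target;
}
```

For an HTTP endpoint, the library’s `renderToStream` could probably be used instead, so that the result is piped to the response rather than written to disk.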

Tables

React-pdf has a set of available components which we can use for defining documents. A quick glance does not reveal any table components, and that seems to be true for PDFKit as well. But we need our tables! It looks like the only way to achieve tables is to style <Text> and <View> components to look like tables. At least we can use flexbox.
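A rough sketch of such a flexbox “table”, assuming `react` and `@react-pdf/renderer` are available. The style values and the `createTable` helper are made up for this example.

```javascript
// Each row is a horizontal flex container; each cell takes an equal share of the width.
const tableStyles = {
  row: { flexDirection: 'row', borderBottomWidth: 1, borderBottomColor: '#000000' },
  headerCell: { flex: 1, padding: 4, backgroundColor: '#eeeeee' },
  cell: { flex: 1, padding: 4 },
};

// `columns` is an array of property names, `rows` an array of plain objects.
function createTable(columns, rows) {
  const React = require('react'); // loaded lazily
  const { View, Text } = require('@react-pdf/renderer');
  const h = React.createElement;

  const header = h(
    View,
    { style: tableStyles.row },
    columns.map((c) => h(Text, { key: c, style: tableStyles.headerCell }, c))
  );

  const body = rows.map((row, i) =>
    h(
      View,
      { key: i, style: tableStyles.row },
      columns.map((c) => h(Text, { key: c, style: tableStyles.cell }, String(row[c])))
    )
  );

  return h(View, null, header, ...body);
}
```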

This gives us a simple (admittedly terrible looking, but hey, I am not going to steal the fun of writing CSS from you) table.

We can also create our own Table, TableCell, etc. components to avoid repetition. But look at the end of the first page.

A table row is broken. It might not be obvious, since all values are the same, but we lost one John Smith. Yeah, this is going to Jira. Fixing it isn’t very complicated: just add wrap={false} to every <View> in our example and try again.

Perfect. This property can also be used on any part of the document that we don’t want to be broken across pages. Also be sure to check the documentation for more page-wrapping options.
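As a sketch, a row that must not be split could be built like this. `rowProps` and `createUnbreakableRow` are illustrative names; `wrap` is the react-pdf prop mentioned above.

```javascript
// wrap: false asks react-pdf to keep the element on a single page,
// moving it to the next page if it does not fit.
const rowProps = { wrap: false, style: { flexDirection: 'row' } };

function createUnbreakableRow(cells) {
  const React = require('react'); // loaded lazily
  const { View, Text } = require('@react-pdf/renderer');
  const h = React.createElement;

  return h(
    View,
    rowProps,
    cells.map((value, i) => h(Text, { key: i, style: { flex: 1 } }, String(value)))
  );
}
```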

Now we would like to have a table header on every page. I am going to let you down on this one: I wasn’t able to find any way to achieve that, except for counting how many rows fit on one page and rendering the header before each such group. If rows have different heights, you are probably out of luck.

While we are on the topic of tables, I believe it is also worth mentioning the library react-pdf-table. I haven’t really used it, but from a quick look at the source, it looks like all its components are wrappers around Text and View, similar to my example. One issue I had with this library was cells breaking in the middle, which I couldn’t find a way to configure.
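The counting workaround boils down to splitting the data into page-sized groups. A minimal sketch, assuming all rows have the same height and that `rowsPerPage` has been determined by hand for the given layout (`chunkRows` is a hypothetical helper, not a react-pdf function):

```javascript
// Splits `rows` into consecutive groups of at most `rowsPerPage` items.
function chunkRows(rows, rowsPerPage) {
  if (!Number.isInteger(rowsPerPage) || rowsPerPage < 1) {
    throw new RangeError('rowsPerPage must be a positive integer');
  }
  const pages = [];
  for (let i = 0; i < rows.length; i += rowsPerPage) {
    pages.push(rows.slice(i, i + rowsPerPage));
  }
  return pages;
}
```

Each group could then be rendered as a header row followed by its data rows, with a forced page break between groups (react-pdf has a `break` prop for that).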

Header/footer

To display a header/footer in our document, we need to create a View element and mark it as fixed. This means that our component will be rendered on every page. Now it is only a matter of proper styling (we can use absolute positioning).

If we need to display the page number or the total page count, we can pass a render function to <Text> or <View>, which will receive those two values as parameters.

Example of footer with page number:
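A sketch of such a footer, assuming `react` and `@react-pdf/renderer` are installed. The `fixed` and `render` props are react-pdf’s; the helper names and style values are illustrative.

```javascript
// Called by react-pdf for every page with the current and total page numbers.
function formatPageLabel({ pageNumber, totalPages }) {
  return `${pageNumber} / ${totalPages}`;
}

function createFooter() {
  const React = require('react'); // loaded lazily
  const { Text } = require('@react-pdf/renderer');

  return React.createElement(Text, {
    fixed: true, // repeat on every page
    render: formatPageLabel,
    style: {
      position: 'absolute',
      bottom: 20,
      left: 0,
      right: 0,
      textAlign: 'center',
      fontSize: 10,
    },
  });
}
```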

We can also use pageNumber to apply styles depending on which page we are at (left/right margins depending on odd/even numbers come to mind). Just one thing to mention: for pageNumber to work correctly, the outer Page component needs to have the wrap property set to true.

Custom code

React-pdf provides a Canvas component that can be used for drawing. Unfortunately, I can’t quite imagine how it would work with other charting libraries (implementing our own adapters would be anything but worry-free). Also, since we cannot use standard HTML tags, I wasn’t able to combine it with Google Maps.

Conclusion

React-pdf is an easy-to-use library that avoids conversion between HTML and PDF and generates documents directly in PDF format using a predefined set of components (so only limited reuse of front-end code is possible).

It might be useful for simple documents that don’t need features requiring other libraries. Also, templates written for react-pdf cannot be used by any other library, so replacing it would mean rewriting all existing templates.

Nice features are the possibility to add metadata to the final document and client-side rendering, which can be useful in SPAs (e.g. for live preview).

Puppeteer

The second of my web-like solutions is Puppeteer. As most of you already know, Puppeteer is a Node library providing an API to control Chrome (paraphrasing the documentation). So basically, it is a browser you can run from Node.js. Most developers are probably familiar with this library, since one of its main uses is automated testing of web applications, but it can also be used to convert a page to PDF.

Introduction

Let’s start with the same document as previously, only this time we will use HTML.

Now, obviously, we cannot run Puppeteer inside browsers (or… can we?). So let’s add code to our server endpoint.
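The HTML version of such a document could be produced by a small template function. This is only a sketch: `escapeHtml` and `buildHtml` are hypothetical helpers, and the markup is a stand-in for whatever the real document contains.

```javascript
// Escapes the characters that would otherwise be interpreted as markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Returns a complete HTML page greeting `name`.
function buildHtml(name) {
  return `<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <style>
      body { font-family: sans-serif; padding: 30px; }
      h1 { color: #1a73e8; }
    </style>
  </head>
  <body>
    <h1>Hello PDF</h1>
    <p>Generated for ${escapeHtml(name)}</p>
  </body>
</html>`;
}
```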

So, what did we do?

  1. we launched a browser instance
  2. we opened a new empty page
  3. we set our HTML as the content of the page
  4. we created the PDF and saved it to a file
  5. we closed the browser instance
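The five steps above can be sketched as follows, assuming the `puppeteer` package is installed. The function name `htmlToPdf` is illustrative, and the third parameter is a hypothetical addition that exists only so a stub can be substituted in tests.

```javascript
// Converts an HTML string to a PDF file at `outputPath`.
// `launcher` defaults to the real puppeteer module, loaded only when needed.
async function htmlToPdf(html, outputPath, launcher = require('puppeteer')) {
  const browser = await launcher.launch(); // 1. launch a browser instance
  try {
    const page = await browser.newPage(); // 2. open a new empty page
    await page.setContent(html); // 3. set our HTML as the page content
    await page.pdf({ path: outputPath, format: 'A4' }); // 4. create the PDF and save it
  } finally {
    await browser.close(); // 5. close the browser, even if a step above failed
  }
}
```

A call such as `await htmlToPdf('<h1>Hello PDF</h1>', 'hello.pdf')` inside an endpoint handler would then write the file.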

Simple as that. 

Also, with just one small change, we can generate a PDF from any accessible web page:

We can also add CSS stylesheets or JavaScript files.

Now let’s take a look at what PDF features Puppeteer supports.
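The change is to navigate to a URL with `page.goto()` instead of setting the content directly. A sketch, assuming `puppeteer` is installed; `assertHttpUrl`, `urlToPdf` and the injectable `launcher` parameter are hypothetical additions.

```javascript
// Rejects anything that is not an http(s) URL (new URL() throws on malformed input).
function assertHttpUrl(url) {
  const { protocol } = new URL(url);
  if (protocol !== 'http:' && protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${protocol}`);
  }
  return url;
}

// Renders the page at `url` to a PDF file at `outputPath`.
async function urlToPdf(url, outputPath, launcher = require('puppeteer')) {
  assertHttpUrl(url);
  const browser = await launcher.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url); // the only difference from rendering an HTML string
    await page.pdf({ path: outputPath, format: 'A4' });
  } finally {
    await browser.close();
  }
}
```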
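Puppeteer’s `page.addStyleTag()` and `page.addScriptTag()` can be used for this. In the sketch below, the CSS, the script URL and the `applyAssets` helper are placeholders invented for the example.

```javascript
const assets = {
  css: 'body { font-family: sans-serif; } h1 { color: #1a73e8; }',
  scriptUrl: 'https://example.com/chart-library.js', // hypothetical URL
};

// Injects the stylesheet and the script into an already opened Puppeteer page.
async function applyAssets(page) {
  await page.addStyleTag({ content: assets.css });
  await page.addScriptTag({ url: assets.scriptUrl });
}
```

Both methods also accept a `path` option for local files.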

Tables

Adding tables using Puppeteer is really a non-issue. Just create a correct HTML table and that’s it. To have a table header on every page, put the header rows in a <thead> element and Puppeteer will repeat them on every page. Puppeteer also seems to avoid breaking cells at the end of the page out of the box.
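A sketch of generating such a table, with the header placed in `<thead>` so that it can be repeated on each printed page. `buildTableHtml` is a hypothetical helper, and it assumes the values are trusted (nothing is escaped here).

```javascript
// `columns` is an array of property names, `rows` an array of plain objects.
function buildTableHtml(columns, rows) {
  const head = columns.map((c) => `<th>${c}</th>`).join('');
  const body = rows
    .map((row) => `<tr>${columns.map((c) => `<td>${row[c]}</td>`).join('')}</tr>`)
    .join('\n');

  return [
    '<table>',
    `<thead><tr>${head}</tr></thead>`,
    '<tbody>',
    body,
    '</tbody>',
    '</table>',
  ].join('\n');
}
```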

Header / footer

Header and footer templates are set by passing the headerTemplate / footerTemplate options to the pdf() function. Just don’t forget to set the displayHeaderFooter property to true. It is possible to display the current page number (or another of the allowed values) just by creating an element with the correct class. For example, our footer will display the current page / total page count.
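A sketch of the options object for such a footer. The option names and the `pageNumber` / `totalPages` classes are Puppeteer’s; the markup, margins and the `pdfOptionsWithFooter` helper are assumptions for illustration.

```javascript
// Chrome fills elements with these classes with the corresponding values.
const footerTemplate = `
  <div style="font-size: 10px; width: 100%; text-align: center;">
    <span class="pageNumber"></span> / <span class="totalPages"></span>
  </div>`;

// Returns options suitable for page.pdf().
function pdfOptionsWithFooter(outputPath) {
  return {
    path: outputPath,
    format: 'A4',
    displayHeaderFooter: true,
    headerTemplate: '<span></span>', // empty header instead of Chrome's default one
    footerTemplate,
    margin: { top: '40px', bottom: '60px' }, // leave room for the footer
  };
}
```

The result would be passed as `await page.pdf(pdfOptionsWithFooter('report.pdf'))`. The other classes Puppeteer documents are `date`, `title` and `url`.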

Custom code

In my opinion, this is one of the greatest features of Puppeteer: since we have a full JS engine at our disposal, we can run anything that the browser can run. For example, here I used Google Maps to render a map of Bratislava.

Sometimes Puppeteer renders the page before all resources are loaded, for example before all map images have been downloaded.

We can fix that by telling Puppeteer to wait for network activity to finish before rendering.
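One way to do this, assuming the `waitUntil` option with the value `'networkidle0'` is the right fit for the page, is sketched below. `waitOptions` and `loadAndRender` are illustrative names; `page` is an already opened Puppeteer page.

```javascript
// 'networkidle0' resolves once there have been no network connections for a short period.
const waitOptions = { waitUntil: 'networkidle0' };

// Sets the page content, waits for the network to go quiet, then renders the PDF.
async function loadAndRender(page, html, outputPath) {
  await page.setContent(html, waitOptions);
  return page.pdf({ path: outputPath, format: 'A4' });
}
```

The same option can be passed to `page.goto()` when rendering from a URL.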

Now the browser instance will wait until all network requests are finished and only then start the rendering process (we can also wait for other events).

Conclusion

Using Puppeteer is a great (and free) way to convert HTML to PDF documents. Its greatest strength is that we can use the full power of JavaScript, so including charts, maps or anything else should go without problems.

Disadvantages include the lack of support for document metadata, document encryption, and editable forms.

Performance might be an issue, depending on the application. Unfortunately, I don’t have any benchmarks on hand, but Chrome is stereotypically resource-hungry and might kill the server in high-load PDF rendering scenarios.

Also, usage in the cloud, for example in an AWS Lambda function, seems to be non-trivial.

The nice thing is that, since we are using HTML to define content, switching to another HTML-to-PDF solution should be easy, which makes Puppeteer ideal during the early phase of a project, when requirements might not be fully defined.

 


© 2021 Instea, s.r.o. All rights reserved.
