Generate PDF from React components
Posted on 2021-09-26 in Programming
How do you generate a PDF from arbitrary React components? I am not talking about just allowing your users to print a page into a PDF with their browser; I am talking about transforming a page into a PDF programmatically and saving it server side. The idea is for our users to be able to see what the page looked like after they have moved forward in the application and the page has become inaccessible.
Note
Because the final solution I describe relies on a browser, it can be applied to any other framework, such as Angular, Aurelia, or Vue.
There are several options to generate PDFs from React:
- @react-pdf/renderer generates beautiful PDFs, but only from its own dedicated components. It's the best solution if you want a beautiful PDF and have the time or need for a dedicated template.
- jsPDF, with its html method, can transform any HTML element into a PDF. However, from experience, you need to adjust the scale of the page so it can fit into the PDF, and you can run into rendering issues (with non-breaking spaces, for instance).
- A combination of html2canvas and jsPDF transforms any HTML element into an image, which you can then embed in a PDF document. The main problems: the result is an image (so text cannot be selected), you cannot easily split it across multiple pages, and if you try to put it on a single very long page, it won't print properly. It's also quite slow.
All these libraries work client side. @react-pdf/renderer is also compatible with NodeJS so it can run server side. So, where to render the PDF, client side or server side?
Rendering it client side is appealing: no need for a dedicated service and you can leverage the browser to render the components. But, you are dependent on the browser of the user (the screen size will impact how the document is rendered and how it will be printed) and you must trust your user not to change anything.
Rendering it server side is more complex: you need a service to do it, but you render the document in an environment you control. You inject the data you want and render it the way you want. The service itself is complex: you need to launch a browser to render the components.
In our use case, we wanted to be sure the PDF was generated with our values, without any influence from media queries. This meant rendering it server side. Since we couldn't use @react-pdf/renderer (we wanted to render arbitrary components), what could we do? Well, we decided to create a NodeJS service that runs a headless Chrome managed by Puppeteer to render the components. We then use the printing functions of Puppeteer and some CSS rules to print the page into a PDF with margins and a custom footer. The result isn't perfect: just like when you print a page with the browser, you can get page breaks in the middle of words, but it's a fairly good compromise between time and quality.
Here is how we did it:
We created a custom index-pdf-server.ts file to contain a dedicated entry point for the app. It configures our MaterialUI theme and our translations, and selects which components to render on the page. It is also responsible for getting the data. To avoid round trips, we inject the data into the page with Puppeteer. It works like this:
- We launch Puppeteer.
- It loads the React app.
- We inject data into a dedicated element in the page with Puppeteer.
- The React app loads it and renders.
- We wait for the page to render in Puppeteer.
- We transform the page into a PDF.
The gist of this index is this (imports omitted):
    const rootEl = document.getElementById('root');

    const Component: React.FC = () => {
      const intl = useIntl();

      const [reactData, setReactData] = useState<ReactData>();

      useEffect(() => {
        if (reactData) {
          return;
        }

        // Poll until Puppeteer has injected the data element into the page.
        const interval = setInterval(() => {
          const reactDataElement = document.getElementById('react-data');
          if (reactDataElement && reactDataElement.textContent) {
            setReactData(JSON.parse(reactDataElement.textContent));
          }
        }, 100);

        return () => clearInterval(interval);
      }, [reactData]);

      const page = useMemo(() => {
        if (!reactData) {
          return;
        }

        // Switch to get the component to render.
      }, [intl, reactData]);

      // Render nothing until the data has arrived and a page has been selected.
      const canDisplayPage = useMemo(() => Boolean(reactData && page), [reactData, page]);

      if (!canDisplayPage) {
        return null;
      }

      return (
        <div>
          <div>{page}</div>
          {/* We will wait for this element to detect the end of render. */}
          <div id="print" />
        </div>
      );
    };

    if (rootEl) {
      ReactDOM.render(
        <ThemeProvider theme={theme}>
          <RawIntlProvider value={intlData}>
            <Component />
          </RawIntlProvider>
        </ThemeProvider>,
        rootEl,
      );
    }
Our app uses react-app-rewired for app compilation. In order to use this index-pdf-server.ts file, we had to slightly modify our config-overrides.js file. In a nutshell, when we compile the app for the server, we set a SERVER environment variable to change the entry point and disable CSP and SRI. The most important part is this:

    if (process.env.SERVER) {
      config.entry = config.entry.replace('index', 'index-pdf-server');
    }

This allows us to reuse almost all our code and configuration while being able to compile an app dedicated to our server. This is done with SERVER=true env-cmd -f .env.${REACT_APP_ENV} react-app-rewired build && cp -R build pdf-server/.
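For context, here is a minimal sketch of how that entry swap could sit inside a complete config-overrides.js. It is not our actual file: the `swapEntry` helper is hypothetical, the regular expression is a slightly stricter variation on the plain `replace` shown above, and handling an array-shaped `entry` is an assumption about what some react-scripts versions produce.

```javascript
// config-overrides.js (sketch). react-app-rewired calls the exported
// function with the webpack config produced by Create React App.

// Hypothetical helper: swap the entry file name in a single entry path.
// Only a trailing "index.<ext>" is touched, so an "index" that appears
// elsewhere in the path is left alone.
const swapEntry = entry =>
  entry.replace(/index(\.[jt]sx?)$/, 'index-pdf-server$1');

function override(config, env) {
  if (process.env.SERVER) {
    // Assumption: depending on the react-scripts version, `entry` may be
    // a string or an array of strings, so both shapes are handled.
    config.entry = Array.isArray(config.entry)
      ? config.entry.map(swapEntry)
      : swapEntry(config.entry);
  }
  return config;
}

module.exports = override;
```

Without the SERVER variable, the config is returned untouched, so the regular build keeps using the normal entry point.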
We added some CSS code inside @media print media queries to improve the display and hide some buttons for print. As a nice side effect, if your users try to print a page, it will be more beautiful.
Tip
You can use the break-after or break-before rules to force page breaks. For instance, break-after: page; will force a page break after this element.
We can then create a server/server.js file in our project to handle the backend side of things. It has two routes: one to serve the built application and one to receive the PDF request with its data (and respond with the PDF). It looks like this:
    const puppeteer = require('puppeteer');
    const express = require('express');
    const bodyParser = require('body-parser');
    const path = require('path');
    const winston = require('winston');
    const pdf = require('pdf-parse');

    // Allowed pages.
    const allowedPages = [];

    const port = process.env.PORT || 4000;
    const logLevel = process.env.LOG_LEVEL || 'info';
    const waitForSelectorTimeout = process.env.WAIT_FOR_SELECTOR_TIMEOUT
      ? parseInt(process.env.WAIT_FOR_SELECTOR_TIMEOUT, 10)
      : 15_000;
    const pdfGenerationRetryCount = process.env.PDF_GENERATION_RETRY_COUNT
      ? parseInt(process.env.PDF_GENERATION_RETRY_COUNT, 10)
      : 3;

    const logger = winston.createLogger({
      level: logLevel,
      format: winston.format.json(),
      defaultMeta: { service: 'frontend-pdf-server' },
    });

    if (process.env.NODE_ENV === 'production') {
      logger.add(
        new winston.transports.Console({
          format: winston.format.json(),
        }),
      );
    } else {
      logger.add(
        new winston.transports.Console({
          format: winston.format.simple(),
        }),
      );
    }

    const app = express();
    app.use(bodyParser.urlencoded({ extended: true }));
    app.use(bodyParser.json());

    const injectReactDataIntoPage = async (page, requestBody) => {
      logger.debug('Injecting data into the page.');
      await page.evaluate(reactData => {
        const node = document.createElement('script');
        node.setAttribute('type', 'application/json');
        node.setAttribute('id', 'react-data');
        // The React app reads textContent, so write the same property.
        node.textContent = JSON.stringify(reactData);
        document.body.appendChild(node);
      }, requestBody);
    };

    const print = async (page, timestamp) => {
      const margin = 30;
      return await page.pdf({
        format: 'A4',
        printBackground: true,
        omitBackground: true,
        margin: { top: margin, bottom: margin, left: margin, right: margin },
        displayHeaderFooter: true,
        footerTemplate: `<p style="font-size: 2mm; position: absolute; right: 50%; transform: translateX(-50%)">${timestamp}</p>`,
        headerTemplate: '',
      });
    };

    const checkPDF = async (pdfBuffer, timestamp) => {
      const data = await pdf(pdfBuffer);
      // Do we have text beside the timestamp in the footer? If yes, it's good.
      // Otherwise, the PDF is invalid (we generated it before render completed).
      return data.text.replace(timestamp, '').trim().length > 100;
    };

    const printAndCheckPDF = async (page, timestamp) => {
      let isValid = false;
      let retryCount = 0;
      let pdfBuffer = null;

      // Most of the time, waiting for this element is enough for the page to render correctly and for
      // us to get a proper PDF. Once in a while, no text is rendered and the PDF is empty. So we always
      // check the generated PDF.
      await page.waitForSelector('#print', { timeout: waitForSelectorTimeout });
      do {
        retryCount += 1;
        // Wait a bit for the render before trying again (invalid PDF) or to be sure to
        // get a full render (first try).
        // Note: page.waitForTimeout is deprecated in later Puppeteer releases.
        await page.waitForTimeout(retryCount * retryCount * 100);
        pdfBuffer = await print(page, timestamp);
        isValid = await checkPDF(pdfBuffer, timestamp);
      } while (!isValid && retryCount < pdfGenerationRetryCount);

      if (!isValid) {
        throw new Error(`Failed to generate PDF, even after ${retryCount} tries.`);
      }

      return pdfBuffer;
    };

    /**
     * Generate a PDF from React frontend components with a Chrome headless
     * managed by puppeteer.
     */
    app.post('/react-to-pdf', async (req, res) => {
      if (!req.body.page || !allowedPages.includes(req.body.page) || !req.body.pdfData) {
        res.writeHead(400);
        res.end();
        return;
      }

      const browserConsoleMessages = [];
      let browser;
      try {
        browser = await puppeteer.launch({
          headless: true,
          dumpio: true,
          args: [
            '--disable-gpu',
            '--disable-dev-shm-usage',
            '--disable-setuid-sandbox',
            '--no-sandbox',
            '--disable-software-rasterizer',
          ],
        });
        const page = await browser.newPage();
        await page.setViewport({ width: 1980, height: 768 });
        // Force language for translation and number formatting.
        await page.evaluateOnNewDocument(() => {
          Object.defineProperty(navigator, 'language', {
            get: function() {
              return 'fr-FR';
            },
          });
          Object.defineProperty(navigator, 'languages', {
            get: function() {
              return ['fr-FR', 'fr'];
            },
          });
        });
        // Register the listener before navigating so messages logged during load are captured.
        page.on('console', message => {
          browserConsoleMessages.push(message.text());
        });
        await page.goto(`http://localhost:${port}`);

        logger.debug('Handling print request', req.body.page);

        await injectReactDataIntoPage(page, req.body);
        const pdfBuffer = await printAndCheckPDF(page, req.body.timestamp);

        logger.debug('Sending response.');
        res.writeHead(200, {
          'Content-Type': 'application/pdf',
          'Content-Length': pdfBuffer.length,
        });
        res.end(pdfBuffer);
        logger.debug('Print succeeded.');
      } catch (e) {
        // Do not rethrow: Express 4 does not catch rejections from async handlers.
        logger.error(e);
        logger.error(browserConsoleMessages.join('\n'));
        res.writeHead(500);
        res.end();
      } finally {
        // Always close the browser (and its pages) so Chrome processes do not leak on failure.
        if (browser) {
          await browser.close();
        }
      }
    });

    app.get('/', async (req, res) => {
      res.sendFile(path.join(__dirname, 'build/index.html'));
    });

    app.get('/health', (req, res) => {
      res.writeHead(200);
      res.end();
    });

    app.use(express.static('public'));
    app.use(express.static('build'));

    logger.info(`Listening on port ${port}`);
    app.listen(port);
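The retry logic in printAndCheckPDF can also be read as a generic pattern: produce a result, validate it, and wait a little longer before each new attempt. Below is a minimal, dependency-free sketch of that pattern. The names `sleep` and `retryUntilValid` and the option names are hypothetical, and the `setTimeout`-based `sleep` is a stand-in for `page.waitForTimeout`; this is not the code we run in production.

```javascript
// Resolve after `ms` milliseconds.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

/**
 * Call `produce` until `validate` accepts its result or `maxTries` is reached.
 * The wait before try n is n * n * baseDelayMs, as in printAndCheckPDF.
 */
async function retryUntilValid(produce, validate, { maxTries = 3, baseDelayMs = 100 } = {}) {
  for (let attempt = 1; attempt <= maxTries; attempt += 1) {
    // Wait before every try, including the first, to give the render some time.
    await sleep(attempt * attempt * baseDelayMs);
    const result = await produce();
    if (await validate(result)) {
      return result;
    }
  }
  throw new Error(`Failed to produce a valid result after ${maxTries} tries.`);
}
```

With the functions from the server above, a call could look like `retryUntilValid(() => print(page, timestamp), buffer => checkPDF(buffer, timestamp), { maxTries: pdfGenerationRetryCount })`.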
To deploy this service with Docker, you need to install several libraries as well as run the service as a user other than root. This Dockerfile can help you get started:
    FROM node:14-slim AS builder
    WORKDIR /app

    RUN apt-get update && \
        apt-get install -y python3 make gcc g++ openssl ca-certificates

    ARG REACT_APP_ENV=undefined
    ARG REACT_APP_WEBSITE_BASE_URL=undefined
    ARG REACT_APP_COMMIT_SHA=undefined
    ARG COMMIT_SHA=undefined

    ENV REACT_APP_ENV=$REACT_APP_ENV
    ENV REACT_APP_WEBSITE_BASE_URL=$REACT_APP_WEBSITE_BASE_URL
    ENV REACT_APP_COMMIT_SHA=$REACT_APP_COMMIT_SHA
    ENV COMMIT_SHA=$COMMIT_SHA

    COPY . ./

    RUN yarn install --frozen-lockfile
    RUN yarn build-pdf-server


    # Run the pdf-server in node.
    FROM node:14-slim AS runner
    RUN mkdir -p /var/www/frontend-pdf-server/build/
    WORKDIR /var/www/frontend-pdf-server/

    RUN apt-get update && \
        apt-get upgrade -y && \
        apt-get install -y dumb-init \
            fonts-liberation \
            gconf-service \
            libappindicator1 \
            libasound2 \
            libatk1.0-0 \
            libcairo2 \
            libcups2 \
            libfontconfig1 \
            libgbm-dev \
            libgdk-pixbuf2.0-0 \
            libgtk-3-0 \
            libicu-dev \
            libjpeg-dev \
            libnspr4 \
            libnss3 \
            libpango-1.0-0 \
            libpangocairo-1.0-0 \
            libpng-dev \
            libx11-6 \
            libx11-xcb1 \
            libxcb1 \
            libxcomposite1 \
            libxcursor1 \
            libxdamage1 \
            libxext6 \
            libxfixes3 \
            libxi6 \
            libxrandr2 \
            libxrender1 \
            libxss1 \
            libxtst6 \
            xdg-utils && \
        apt-get clean

    RUN groupadd --gid 1001 noderunner && \
        useradd noderunner --create-home --uid 1001 --gid 1001

    COPY --from=builder /app/build /var/www/frontend-pdf-server/build/
    COPY --from=builder /app/pdf-server/ /var/www/frontend-pdf-server/

    RUN yarn install --frozen-lockfile

    USER noderunner
    ENTRYPOINT ["/usr/bin/dumb-init", "--"]
    CMD ["node", "server.js"]
As a bonus, if you also need to generate documents from a @react-pdf/renderer template, I suggest that you create the template in a JSX file and then import it with the import-jsx library like this:
    const importJsx = require('import-jsx');
    const { renderToStream } = require('@react-pdf/renderer');

    const Document = importJsx('./documents/my-document');

    const documentsToRenderFunctions = {
      MyDocument: Document,
    };
    const allowedDocuments = Object.keys(documentsToRenderFunctions);

    /**
     * Generate documents based on dedicated React components from react-pdf/renderer.
     *
     * All the rendering is done in the NodeJS process.
     */
    app.post('/document', async (req, res) => {
      if (!req.body.document || !allowedDocuments.includes(req.body.document) || !req.body.pdfData) {
        res.writeHead(400);
        res.end();
        return;
      }

      logger.debug('Handling document generation request', req.body.document);

      try {
        const pdfStream = await renderToStream(
          documentsToRenderFunctions[req.body.document]({
            ...req.body.pdfData,
            timestamp: req.body.timestamp,
          }),
        );
        res.setHeader('Content-Type', 'application/pdf');
        pdfStream.pipe(res);
        pdfStream.on('end', () => logger.debug('Done streaming, PDF generation succeeded.'));
      } catch (e) {
        // Do not rethrow: Express 4 does not catch rejections from async handlers.
        logger.error(e);
        res.writeHead(500);
        res.end();
      }
    });
To conclude, it's not as obvious as it seems. After all the time I spent creating and stabilizing this service, I even wonder whether it wouldn't have been quicker to just create a proper template to render the PDFs. Maybe yes, maybe no. Our product team really wanted to have almost the same display in the PDF as in the page, so for them it was best this way. I also think the way I inject data is not optimal: I could try to use a template and render a page that already has the data in it instead of injecting it and relying on a setInterval to read it. But this would involve more work to make it work without interfering with the build process.
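To make the template idea a bit more concrete, here is a rough sketch of what embedding the data directly in the served HTML could look like. I have not built this: the function names are hypothetical, and it assumes the built index.html contains the usual `<div id="root">` so that the data element can be placed before it (and therefore before the application scripts run).

```javascript
// Serialize data so it is safe inside a <script> element: escaping "<"
// prevents a "</script>" inside the data from closing the element early.
const serializeForScript = data =>
  JSON.stringify(data).replace(/</g, '\\u003c');

// Return the HTML with a JSON data element inserted just before the root div.
function embedReactData(html, data) {
  const marker = '<div id="root">';
  if (!html.includes(marker)) {
    throw new Error('No root element found in the HTML.');
  }
  const tag = `<script type="application/json" id="react-data">${serializeForScript(data)}</script>`;
  // A replacer function is used so that "$" sequences in the data are not
  // interpreted as special replacement patterns.
  return html.replace(marker, () => `${tag}${marker}`);
}
```

The React entry point could then read the `react-data` element once at startup with the same `JSON.parse(element.textContent)` call, with no polling needed.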
It's also hard to detect when React is done rendering the page. My method is probably not the best (wait for an element, then wait a bit for a timeout, then render the page and check we have text in the PDF), but it seems to work.
I'd say that, despite all the issues this solution can have, right now it serves its purpose and allows us to move forward. I hope you enjoyed this post, and if you have any comments or remarks, please leave a comment!