node-red-contrib-pdfjs 0.0.2

Node-RED node to extract pdf data using pdfjs

npm install node-red-contrib-pdfjs

A node that extracts text content from a PDF. It uses Mozilla's pdfjs-dist library (https://github.com/mozilla/pdfjs-dist) to read the text data.

Inputs

payload

Either a Buffer containing the contents of a PDF file, or a file path to the PDF file to be decoded.

Config

filename

If the payload does not contain a file path or Buffer, provide the path to the PDF file to be decoded here.
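The fallback from payload to the configured filename can be pictured with the sketch below. It is only an illustration of the rule described above, not the node's actual source: `resolveSource` and the `kind`/`value` fields are hypothetical names, and the real node may validate its input differently.

```javascript
// Hypothetical helper (not part of node-red-contrib-pdfjs): decides which
// PDF source to use, preferring msg.payload over the configured filename.
function resolveSource(payload, configFilename) {
  // A Buffer in the payload is treated as the raw PDF contents.
  if (Buffer.isBuffer(payload)) {
    return { kind: "buffer", value: payload };
  }
  // A non-empty string in the payload is treated as a file path.
  if (typeof payload === "string" && payload.length > 0) {
    return { kind: "path", value: payload };
  }
  // Otherwise fall back to the filename set in the node's config.
  if (typeof configFilename === "string" && configFilename.length > 0) {
    return { kind: "path", value: configFilename };
  }
  throw new Error("No PDF source: set msg.payload or the filename option");
}
```

In a flow, a function node placed before this node would typically just set `msg.payload` to a path string or pass through a Buffer from a file-in or HTTP node.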

Order text

Check this option to force an ordering on the extracted text. If 'from top to bottom' is selected, the text is ordered top down by its y value. If 'from left to right' is selected, it is ordered left to right by its x value. If both options are selected, the text is ordered from top to bottom first, then from left to right.
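The ordering rules above can be sketched as a single comparator. This is an illustration under stated assumptions rather than the node's own code: `orderItems` and its option names are hypothetical, a larger y is assumed to be higher on the page, and a smaller x is assumed to be further left.

```javascript
// Hypothetical sketch of the "Order text" option for one page of items.
// Each item is expected to look like { x: number, y: number, t: string }.
function orderItems(items, { topToBottom = false, leftToRight = false } = {}) {
  const sorted = [...items]; // copy so the caller's array is left untouched
  sorted.sort((a, b) => {
    // Top to bottom: larger y first (assumes y is measured from the bottom).
    if (topToBottom && a.y !== b.y) return b.y - a.y;
    // Left to right: smaller x first (assumes smaller x is further left).
    if (leftToRight && a.x !== b.x) return a.x - b.x;
    return 0; // otherwise keep the original relative order
  });
  return sorted;
}
```

With both flags set, items are compared by y first and x only breaks ties within a row, which matches the "top to bottom, then left to right" behaviour described above.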

Merge text with next text

When text is inserted into the output payload array, it is compared with the previously inserted text. If both have the same x value (they are in the same column) or the same y value (they are in the same row), the new string is appended to the previous string, separated by a space, instead of being added as a new element.
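The merge rule can be sketched as follows. This is a minimal illustration of the behaviour described above, not the node's implementation: `mergeItems` is a hypothetical name, and the sketch assumes that x and y are compared for exact equality and that the merged element keeps the position of the first item.

```javascript
// Hypothetical sketch of "Merge text with next text" for one page of items.
function mergeItems(items) {
  const merged = [];
  for (const item of items) {
    const prev = merged[merged.length - 1];
    // Same column (x) or same row (y) as the previously inserted text.
    if (prev && (prev.x === item.x || prev.y === item.y)) {
      prev.t = `${prev.t} ${item.t}`; // concatenate with a single space
    } else {
      merged.push({ ...item }); // copy so the input objects are not modified
    }
  }
  return merged;
}
```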

Outputs

payload

The parsing results are returned as an array in which each element corresponds to one page of the PDF. Each page is itself an array of objects of the following form:

[
	{
		"p": 1, // order on the page
		"x": 328.78, // distance away from the left edge
		"y": 1175.676, // distance away from the bottom of the page
		"t": "Survey Responses 1/02/19 - 31/04/19" // text content
	},
	{
		"p": 2, 
		"x": 428.78, 
		"y": 1175.676, 
		"t": "Survey Responses 1/05/19 - 31/07/19"
	}
]
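As a usage sketch, a downstream function node could flatten this structure into one string per page. The helper name `pagesToText` is hypothetical, and the sketch assumes the payload is an array of pages, each page being an array of objects with a `t` field as shown above.

```javascript
// Hypothetical helper: turns the node's output payload into one text string
// per page by joining each item's "t" field with newlines.
function pagesToText(pages) {
  return pages.map((page) => page.map((item) => item.t).join("\n"));
}

// Example payload containing a single page with the two items shown above.
const examplePayload = [
  [
    { p: 1, x: 328.78, y: 1175.676, t: "Survey Responses 1/02/19 - 31/04/19" },
    { p: 2, x: 428.78, y: 1175.676, t: "Survey Responses 1/05/19 - 31/07/19" }
  ]
];
const pageTexts = pagesToText(examplePayload);
```

Inside a Node-RED function node, the same idea would be `msg.payload = pagesToText(msg.payload); return msg;`.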

Node Info

Version: 0.0.2

Updated 5 years, 10 months ago

License: ISC


Keywords

node-red
pdf

Maintainers

redbackspider0
