Originally published: 14/05/2019 10:03
Publication number: ELQ-92419-1

Batch PDF Data Extraction

Extract text and data from multiple PDF files into structured tabular format based on start and end pattern matching.

by Business Spreadsheets
Purpose-built Excel solutions for business and financial decision making


Tags: pdf, vba, automated pdf extraction, bulk data extraction, text conversion, batch, structured data, pdf to text

Description
The tool compiles and consolidates multiple data points from many similarly structured PDF files in a single process. Application forms, bank statements and survey data, for example, often arrive as many individual PDF files, each containing specific data that needs to be extracted.

The batch extraction works by specifying multiple rules, each defining the text that surrounds the content to be extracted. Rule options include wildcards and line feeds. Results are structured in tabular form, with the name of each file in rows and the extracted content in columns.
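The start and end pattern matching described above can be illustrated with a minimal Python sketch. This is not the tool's VBA code. The rule syntax is an assumption made for illustration: `*` is treated as a wildcard that may span lines, a line feed is written as `\n`, and all other characters are matched literally. The function names `rule_to_regex`, `extract_between` and `build_table` are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple


def rule_to_regex(pattern: str) -> str:
    """Convert a rule string to a regex fragment.

    Assumed syntax: '*' is a wildcard; everything else is literal.
    """
    parts = pattern.split("*")
    return ".*?".join(re.escape(part) for part in parts)


def extract_between(text: str, start: str, end: str) -> Optional[str]:
    """Return the text found between the start and end patterns, or None."""
    regex = rule_to_regex(start) + r"(.*?)" + rule_to_regex(end)
    match = re.search(regex, text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def build_table(
    texts: Dict[str, str], rules: Dict[str, Tuple[str, str]]
) -> List[Dict[str, str]]:
    """Build one row per file and one column per extraction rule."""
    rows = []
    for file_name, text in texts.items():
        row = {"file": file_name}
        for column, (start, end) in rules.items():
            # A rule that does not match leaves an empty cell.
            row[column] = extract_between(text, start, end) or ""
        rows.append(row)
    return rows
```

For example, a rule pair such as `("Name:", "\n")` would capture whatever follows `Name:` up to the end of that line, and each rule pair contributes one column to the resulting table.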

The extraction tool relies on a provided executable file that converts each PDF into a plain text file. This executable, as well as the extraction tool, needs to be placed in the same directory as the PDF files from which data is to be extracted. The process scans the folder and processes every PDF file it contains.
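The folder-processing step can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's own VBA routine. The converter's command line is not documented here, so `run_converter` assumes a hypothetical `<exe> <input.pdf> <output.txt>` form, and `convert_folder` accepts any conversion callable so the loop can be tried without the real executable. The `keep_text` flag mirrors the tool's option to retain generated text files.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict


def run_converter(exe: Path, pdf: Path, txt: Path) -> None:
    """Call an external converter (hypothetical argument order)."""
    subprocess.run([str(exe), str(pdf), str(txt)], check=True)


def convert_folder(
    folder: Path,
    convert: Callable[[Path, Path], None],
    keep_text: bool = False,
) -> Dict[str, str]:
    """Convert every *.pdf in the folder and return its text keyed by file name."""
    texts = {}
    # Note: the glob is case-sensitive on some platforms, so "*.PDF" is not matched there.
    for pdf in sorted(folder.glob("*.pdf")):
        txt = pdf.with_suffix(".txt")
        convert(pdf, txt)
        texts[pdf.name] = txt.read_text(encoding="utf-8", errors="replace")
        if not keep_text:
            txt.unlink()  # discard the intermediate text file
    return texts
```

A call might look like `convert_folder(Path("."), lambda pdf, txt: run_converter(Path("converter.exe"), pdf, txt))`, where `converter.exe` is a placeholder name for the supplied executable.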

Additional options allow generated text files to be retained and results to be appended to existing ones for iterative extraction routines.

If the results are not as expected, the output can be cleared and the extraction rules modified accordingly. The VBA code is open for viewing and modification if required.

This Best Practice includes
1 Excel file, 1 converter executable, 6 demo testing files

Business Spreadsheets offers you this Best Practice for free!


Further information

Objectives

Extract multiple data streams from similarly structured PDF files into a structured table

Use it if

Many PDF files require the same data to be extracted for subsequent use and analysis

Don't use it if

The PDF files are scanned or otherwise cannot be converted to plain text.
