Unable to read PDF using tabula due to column difference

I am working on a project to extract tabular data from PDF files into Excel. First, I merged all the PDFs into a single PDF and then tried to extract the tables with the tabula package, but I am getting an error.

I suspect the error is caused by the number of columns: some tables seem to have 8 columns and others 9.
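The `ParserError` in the traceback appears to come from pandas' CSV reader, which tabula-py uses to turn tabula-java's output into a DataFrame. A minimal sketch, assuming made-up CSV text in place of the real PDF output, reproduces the same failure with pandas alone and shows how an explicit 9-name list lets shorter rows through (padded with NaN):

```python
import io

import pandas as pd

# Made-up CSV text standing in for tabula-java's output:
# two 8-field rows followed by one 9-field row.
ragged_csv = (
    "1,2,3,4,5,6,7,8\n"
    "1,2,3,4,5,6,7,8\n"
    "1,2,3,4,5,6,7,8,9\n"
)

# Without names, pandas infers 8 columns from the first line
# and fails when it reaches the 9-field line.
error_message = None
try:
    pd.read_csv(io.StringIO(ragged_csv), header=None)
except pd.errors.ParserError as exc:
    error_message = str(exc)
print(error_message)

# With 9 explicit names every row fits; 8-field rows get NaN in col9.
names = [f"col{i}" for i in range(1, 10)]
df = pd.read_csv(io.StringIO(ragged_csv), names=names)
print(df)
```

The first `print` shows the same "Expected 8 fields ... saw 9" message as in the tabula error, which supports the column-count explanation.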

Here is the code I used to merge the PDFs:

```python
import os
from PyPDF2 import PdfMerger  # PdfMerger replaces the deprecated PdfFileMerger

folder = 'C:/Users/User.LAPTOP-2TC2V5HI/Documents/WOD PDF/'

# Collect the PDFs in a stable (alphabetical) order
pdf_paths = [os.path.join(folder, fn) for fn in sorted(os.listdir(folder))
             if fn.endswith('.pdf')]

merger = PdfMerger()
for pdf in pdf_paths:
    merger.append(pdf)  # append accepts a file path directly

merger.write("result.pdf")
merger.close()
```

Then I tried to read the tables from the merged PDF with this code:

```python
from tabula import read_pdf
from tabulate import tabulate

df = read_pdf('result.pdf', pages='all', mulitple_tables=True, names = ('col1','col2','col3','col4','col5','col6','col7','col8','col9'), error_bad_lines=False)
df
```

but I get this error:

```
CSVParseError: Error failed to create DataFrame with different column tables. Try to set `multiple_tables=True` or set `names` option for `pandas_options`.  , caused by ParserError('Error tokenizing data. C error: Expected 8 fields in line 169, saw 9\n',)
```
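Two things in the `read_pdf` call look like likely causes. `mulitple_tables` is misspelled, so tabula-py probably never receives `multiple_tables=True` and appears to parse all tables as a single CSV, which matches the error. Also, `names` and `error_bad_lines` are pandas `read_csv` options, which the error message suggests passing through `pandas_options` rather than directly (newer pandas versions replaced `error_bad_lines` with `on_bad_lines`). Assuming that with `multiple_tables=True` tabula-py returns a list of DataFrames, one per detected table, here is a pandas-only sketch of stacking 8- and 9-column tables into one 9-column frame. `combine_tables` is a hypothetical helper, and it assumes every table uses the same column order, with narrower tables only missing trailing columns.

```python
import pandas as pd


def combine_tables(tables, n_cols=9):
    """Hypothetical helper: stack DataFrames of varying width into one
    n_cols-wide DataFrame.

    Assumes all tables share the same column order and that narrower
    tables are only missing trailing columns.
    """
    names = [f"col{i}" for i in range(1, n_cols + 1)]
    aligned = []
    for table in tables:
        width = table.shape[1]
        if width > n_cols:
            raise ValueError(f"table has {width} columns, expected at most {n_cols}")
        # Rename positionally: the first `width` columns become col1..col<width>.
        renamed = table.copy()
        renamed.columns = names[:width]
        # reindex adds any missing trailing columns, filled with NaN.
        aligned.append(renamed.reindex(columns=names))
    return pd.concat(aligned, ignore_index=True)


# Two made-up tables: one 8 columns wide, one 9 columns wide.
eight = pd.DataFrame([list(range(1, 9))])
nine = pd.DataFrame([list(range(1, 10))])
combined = combine_tables([eight, nine])
print(combined)
```

With tabula-py installed, usage might look like `combine_tables(read_pdf('result.pdf', pages='all', multiple_tables=True))`, followed by `.to_excel('result.xlsx', index=False)` (writing `.xlsx` needs an Excel engine such as openpyxl). Positional renaming replaces any column labels tabula detected, so the result should be checked against the source PDFs.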

Any help would be appreciated.

Regards,
Raj