This repository has been archived by the owner on Dec 11, 2021. It is now read-only.

Commit

Handle empty PDF files.
Andrew Ferrier committed Sep 6, 2020
1 parent 5a805b7 commit 49ad58d
Showing 1 changed file with 10 additions and 3 deletions.
tests/BaseTestClasses.py: 13 changes (10 additions, 3 deletions)
@@ -367,9 +367,16 @@ def getMetadataField(self, pdf_filename, field_name):
         return None
 
     def getPDFText(self, filename):
-        text = pdfminer.high_level.extract_text(filename)
-        text = text.replace("\t", " ")
-        return text
+        if os.path.exists(filename):
+            try:
+                text = pdfminer.high_level.extract_text(filename)
+            except pdfminer.pdfparser.PDFSyntaxError:
+                return None
+
+            text = text.replace("\t", " ")
+            return text
+        else:
+            return None
 
     def touch(self, fname):
         open(fname, 'w').close()
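After this change, getPDFText returns None instead of raising when the file is missing or pdfminer cannot parse it, which is what happens with a zero-byte file like the one the touch helper creates. Below is a minimal standalone sketch of the same guard, not the project's code. It assumes pdfminer.six as the default extractor and adds an explicit zero-byte check that the commit itself does not have. The function name get_pdf_text and the extract and parse_errors parameters are hypothetical. They exist only so the guard can be exercised without a real PDF.

```python
import os
from typing import Callable, Optional


def get_pdf_text(
    filename: str,
    extract: Optional[Callable[[str], str]] = None,
    parse_errors: tuple = (),
) -> Optional[str]:
    """Return a PDF's text with tabs replaced by spaces, or None.

    Hypothetical helper: None is returned when the file does not exist,
    is zero bytes long, or the extractor raises one of ``parse_errors``.
    """
    # Missing or empty files never reach the parser (the zero-byte check
    # is an addition of this sketch, not part of the original commit).
    if not os.path.isfile(filename) or os.path.getsize(filename) == 0:
        return None

    if extract is None:
        # Default to pdfminer.six, imported lazily so the helper can be
        # used with an injected extractor even when pdfminer is absent.
        from pdfminer.high_level import extract_text
        from pdfminer.pdfparser import PDFSyntaxError
        extract = extract_text
        parse_errors = parse_errors + (PDFSyntaxError,)

    try:
        text = extract(filename)
    except parse_errors:
        # An empty tuple matches nothing, so other errors still propagate.
        return None

    return text.replace("\t", " ")
```

Checking the size up front keeps the common "empty output file" case independent of exactly which exception a given pdfminer version raises. The try/except still covers non-empty files that are not valid PDFs.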
