Feature request: Custom output filename #193
Comments
Option 1 is a bit tricky, since I'd be happy to review a PR for Option 2 - currently there's a pretty-print argument in the CLI that uses Lines 143 to 146 in 8dfc42a
If monopoly can differentiate between single-file input and directory input, then the first option is possible. The flag would be considered invalid if the input is a directory. For option 2, the challenge will be in differentiating multiple files, since my main purpose in doing this was to pipe the output into other tools. Perhaps a new column could be added with the filename. However, that means every row would have the same value for that column. Example:
After some thought, I think it might actually be easier to run the library directly (or create some kind of wrapper over it) to give the level of control you're looking for. Would something like this work?

import csv
from pathlib import Path

from monopoly.banks import BankDetector, banks
from monopoly.pdf import PdfDocument
from monopoly.pipeline import PdfParser, Pipeline


def generate_parser(file_path: str):
    """Generates a parser using the input file path."""
    document = PdfDocument(file_path).unlock_document()
    analyzer = BankDetector(document)
    bank = analyzer.detect_bank(banks)
    parser = PdfParser(bank, document)
    return parser


def main():
    # alternatively, you could also use glob here to grab multiple statements at once
    input_file = "statements/dbs/dbs-2024-10.pdf"
    output_directory = Path("output")
    output_directory.mkdir(parents=True, exist_ok=True)
    output_path = output_directory / "my-statement.csv"

    parser = generate_parser(input_file)
    pipeline = Pipeline(parser)
    statement = pipeline.extract()
    transactions = pipeline.transform(statement)

    print(f"Writing CSV to file path: {output_path}")
    with open(output_path, mode="w", encoding="utf8", newline="") as file:
        writer = csv.writer(file)
        # Write header
        writer.writerow(statement.columns)
        for transaction in transactions:
            writer.writerow([transaction.date, transaction.description, transaction.amount])


if __name__ == "__main__":
    main()
Unfortunately, CSV is poorly suited for handling metadata. I think a better solution here would be supporting some kind of JSON output with a metadata field, which would then be relatively easy to read/parse with something like
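To make the JSON idea a bit more concrete, here is a rough sketch of what a payload with a metadata field could look like. This is not monopoly's actual output format: the `Transaction` dataclass, the `to_json` helper, and the `source_file` / `transaction_count` keys are all hypothetical names chosen for illustration, and only the standard library is used.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Transaction:
    """Hypothetical stand-in for a parsed transaction row."""
    date: str
    description: str
    amount: float


def to_json(source_file: str, transactions: List[Transaction]) -> str:
    """Serialize transactions with a metadata block describing their origin."""
    payload = {
        # Metadata is stored once per file instead of being repeated on every row
        "metadata": {
            "source_file": source_file,
            "transaction_count": len(transactions),
        },
        "transactions": [asdict(t) for t in transactions],
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    rows = [Transaction("2024-10-01", "COFFEE", -4.5)]
    print(to_json("dbs-2024-10.pdf", rows))
```

Keeping the filename in a single metadata block would avoid the repeated-column problem mentioned earlier, at the cost of consumers needing a JSON-aware tool rather than a plain CSV reader.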
It would be great to have some way to have a deterministic filename for the output CSV file. This would help me automate my transaction import process.
I can think of 2 ways to do this:
stdout (via CLI flag). This provides more flexibility as it can be piped into other apps & tools.
I'd be happy to contribute code if there is interest.
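For illustration, a minimal sketch of how a "file or stdout" switch might behave. The `write_rows` and `export_csv` helpers and the convention that `None` or `"-"` means stdout are assumptions for this example, not existing monopoly flags or functions.

```python
import csv
import sys
from pathlib import Path
from typing import Iterable, Optional, Sequence, TextIO


def write_rows(header: Sequence[str], rows: Iterable[Sequence[object]], stream: TextIO) -> None:
    """Write a header and data rows as CSV to any text stream."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


def export_csv(header: Sequence[str], rows: Iterable[Sequence[object]], output: Optional[str] = None) -> None:
    """Write to the given path, or to stdout when output is None or "-" (assumed convention)."""
    if output is None or output == "-":
        write_rows(header, rows, sys.stdout)
        return
    path = Path(output)
    path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings itself
    with path.open("w", encoding="utf8", newline="") as file:
        write_rows(header, rows, file)


if __name__ == "__main__":
    header = ["date", "description", "amount"]
    rows = [["2024-10-01", "COFFEE", -4.5]]
    export_csv(header, rows)  # stdout, suitable for piping
    export_csv(header, rows, "output/my-statement.csv")  # deterministic filename
```

Either mode gives a predictable destination: a caller-chosen path for scripted imports, or a stream that can be piped straight into another tool.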