Feature request: Custom output filename #193

Open
siddhantac opened this issue Feb 9, 2025 · 4 comments

Comments

@siddhantac

It would be great to have some way to have a deterministic filename for the output CSV file. This would help me automate my transaction import process.

I can think of 2 ways to do this:

  1. specify an output file name (via CLI flag). If missing, monopoly can default to the generated filename that it currently uses.
  2. dump CSV output to stdout (via CLI flag). This provides more flexibility as it can be piped into other apps & tools.

I'd be happy to contribute code if there is interest.

@siddhantac siddhantac changed the title Custom output filename Feature request: Custom output filename Feb 9, 2025
@benjamin-awd
Owner

benjamin-awd commented Feb 18, 2025

Option 1 is a bit tricky, since monopoly is built to support multi-file/directory input

I'd be happy to review a PR for Option 2 - currently there's a pretty-print argument in the CLI that uses tabulate to generate a neatly formatted table, but I think it could be extended to optionally dump to CSV using the default csv module, since tabulate doesn't support it.

if print_df:
    pprint_transactions(transactions, statement, file)
    # don't load to CSV if pprint
    return None
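
As a rough illustration of Option 2, here is a minimal sketch of dumping transactions as CSV to stdout with the standard `csv` module. The `Transaction` dataclass and the `write_csv` helper are hypothetical stand-ins, not part of monopoly's API; only the three fields used elsewhere in this thread (date, description, amount) are assumed.

```python
import csv
import sys
from dataclasses import dataclass
from typing import Iterable, List, Optional, TextIO


@dataclass
class Transaction:
    # Hypothetical stand-in for monopoly's transaction objects
    date: str
    description: str
    amount: float


def write_csv(
    transactions: Iterable[Transaction],
    columns: List[str],
    stream: Optional[TextIO] = None,
) -> None:
    """Write a header row and one row per transaction to a text stream."""
    # Resolve stdout at call time so redirection and tests behave as expected
    out = stream if stream is not None else sys.stdout
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(columns)
    for transaction in transactions:
        writer.writerow(
            [transaction.date, transaction.description, transaction.amount]
        )


if __name__ == "__main__":
    sample = [Transaction("2025-02-24", "Food", -23.0)]
    write_csv(sample, ["date", "description", "amount"])
```

Because the helper takes any text stream, the same function could serve both a file on disk and a pipe into another tool.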

@siddhantac
Author

If monopoly can differentiate between single-file and directory input, then the first option is possible. The flag would be considered invalid if the input is a directory.
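
The validation described here could be sketched roughly as follows. `resolve_output_path` is a hypothetical helper, not an existing monopoly function, and it assumes the CLI has already collected the input paths and an optional custom output name.

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve_output_path(
    inputs: Sequence[Path], output: Optional[Path]
) -> Optional[Path]:
    """Return the custom output path, or None to use the generated filename.

    Raises ValueError when a custom filename is combined with directory
    input or with more than one input file.
    """
    if output is None:
        # No custom name requested: the caller keeps the generated filename
        return None
    if len(inputs) != 1 or inputs[0].is_dir():
        raise ValueError(
            "a custom output filename is only valid for single-file input"
        )
    return output
```

Rejecting the flag up front keeps the multi-file path unchanged, so existing behavior would only differ when exactly one file is passed.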

For option 2, the challenge will be differentiating between multiple files, since my main purpose is to pipe the output into other tools.

Perhaps a new column could be added with the filename. However, that means every row from a given file would have the same value for that column.

Example:

date,description,amount,filename
2025-02-24,Food,$23,2025_01_dbs.csv
2025-02-13,Rent,$1000,2025_01_dbs.csv
2025-02-17,Gym,$150,2025_01_dbs.csv
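
A minimal sketch of how such a combined stream might be produced for several files at once, assuming the rows have already been extracted. `write_combined_csv` and the `rows_by_file` mapping are hypothetical names used only for illustration.

```python
import csv
import sys
from typing import Iterable, Mapping, Optional, Sequence, TextIO


def write_combined_csv(
    rows_by_file: Mapping[str, Iterable[Sequence[str]]],
    stream: Optional[TextIO] = None,
) -> None:
    """Write rows from several files as one CSV, tagging each row with its source."""
    out = stream if stream is not None else sys.stdout
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date", "description", "amount", "filename"])
    for filename, rows in rows_by_file.items():
        for row in rows:
            # Append the source filename as the last column of every row
            writer.writerow([*row, filename])
```

With more than one input, the `filename` column lets a downstream tool group or split the rows by source.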

@benjamin-awd
Owner

benjamin-awd commented Feb 25, 2025

After some thought, I think it might actually be easier to run the library directly (or create some kind of wrapper over it) to give the level of control you're looking for. Would something like this work?

import csv
from pathlib import Path
from monopoly.banks import BankDetector, banks
from monopoly.pdf import PdfDocument
from monopoly.pipeline import PdfParser, Pipeline

def generate_parser(file_path: str):
    """Generates a parser using the input file path."""
    document = PdfDocument(file_path).unlock_document()
    analyzer = BankDetector(document)
    bank = analyzer.detect_bank(banks)
    parser = PdfParser(bank, document)
    return parser

def main():
    # alternatively, you could also use glob here to grab multiple statements at once
    input_file = "statements/dbs/dbs-2024-10.pdf"
    output_directory = Path("output")
    output_directory.mkdir(parents=True, exist_ok=True)
    output_path = output_directory / "my-statement.csv"
    
    parser = generate_parser(input_file)
    pipeline = Pipeline(parser)
    statement = pipeline.extract()
    transactions = pipeline.transform(statement)
    
    print(f"Writing CSV to file path: {output_path}")
    
    with open(output_path, mode="w", encoding="utf8", newline='') as file:
        writer = csv.writer(file)
        
        # Write header
        writer.writerow(statement.columns)
        
        for transaction in transactions:
            writer.writerow([transaction.date, transaction.description, transaction.amount])

if __name__ == "__main__":
    main()
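
The comment in `main()` mentions using glob to pick up several statements. One way that might look, while keeping output names deterministic, is to derive each CSV name from the PDF's stem. `plan_outputs` is a hypothetical helper and makes no monopoly calls; each resulting pair could then be fed through the extract/transform steps above.

```python
from pathlib import Path
from typing import Dict


def plan_outputs(input_dir: Path, output_dir: Path) -> Dict[Path, Path]:
    """Map every PDF in input_dir to a predictable CSV path in output_dir."""
    # Sorting makes the processing order stable across runs
    return {
        pdf: output_dir / f"{pdf.stem}.csv"
        for pdf in sorted(input_dir.glob("*.pdf"))
    }


if __name__ == "__main__":
    for pdf, csv_path in plan_outputs(Path("statements/dbs"), Path("output")).items():
        print(f"{pdf} -> {csv_path}")
```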

@benjamin-awd
Owner

> Perhaps a new column could be added with the filename. However, that means every row would have the same value for that column

Unfortunately, CSV is poorly suited for handling metadata. I think a better solution here would be supporting some kind of JSON output with a metadata field, which would then be relatively easy to read/parse with something like jq.
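
A minimal sketch of what such JSON output might look like, using only the standard library. The `Transaction` dataclass, the `to_json` helper, and the key names (`metadata`, `source_file`, `transactions`) are assumptions made for illustration, not an existing monopoly format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable


@dataclass
class Transaction:
    # Hypothetical stand-in for monopoly's transaction objects
    date: str
    description: str
    amount: float


def to_json(source_file: str, transactions: Iterable[Transaction]) -> str:
    """Serialize transactions with a metadata block naming the source file."""
    payload = {
        "metadata": {"source_file": source_file},
        "transactions": [asdict(t) for t in transactions],
    }
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    print(to_json("dbs-2024-10.pdf", [Transaction("2025-02-24", "Food", -23.0)]))
```

With this shape, the filename is stored once per statement rather than on every row, and a filter such as `jq '.transactions[].amount'` could pull out individual fields.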
