Merging documents #54

Closed
dan-zeman opened this issue Apr 5, 2016 · 4 comments

Comments

@dan-zeman
Member

Do we have a block that merges two documents? I have two files that should have an identical number of bundles, with only a single zone in each bundle (though the problem could be generalized to multiple zones). I want to read both documents into one so that each bundle has two zones: one from original document 1 and the other from original document 2.

I cannot solve this by putting two Read blocks in one scenario (with different zone selectors) because only one reader per scenario is allowed.

@tuetschek
Member

As far as I know, we don't... I once also needed an AlignedReader for Treex files, but I had no time to implement it (and no desire to delve into the depths of Treex::PML).

@martinpopel
Member

If the two documents contain only a-trees, it is possible to export them to CoNLL format and then use Read::AlignedCoNLL, e.g. with parameters en_selector1='!en1*.conll' en_selector2='!en2*.conll'. The problem with merging two treex files is that they may contain duplicate node IDs, so the aligned treex reader (or rather block AddTreexFile) would need to handle this (rename all IDs, including all links).
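
To illustrate the ID problem, here is a minimal sketch in Python rather than Perl. It does not use the Treex API: the data model (a document as a list of bundles, a bundle as a dict from zone name to a list of nodes, a node as a dict with `id` and `parent`) and the function name `merge_documents` are assumptions made up for this example. It only shows the idea of renaming clashing IDs in the second document and rewriting the links that point to them.

```python
from copy import deepcopy


def merge_documents(doc1, doc2, suffix="_2"):
    """Merge doc2's zones into doc1's bundles, renaming clashing node IDs.

    Hypothetical data model, not Treex: a document is a list of bundles,
    a bundle maps zone name -> list of nodes, and a node is a dict with
    an "id" and a "parent" link (an ID or None).
    """
    if len(doc1) != len(doc2):
        raise ValueError("documents must have the same number of bundles")

    # IDs already used by the first document.
    used = {n["id"] for b in doc1 for nodes in b.values() for n in nodes}
    # IDs of the second document, in document order.
    ids2 = [n["id"] for b in doc2 for nodes in b.values() for n in nodes]
    reserved = used | set(ids2)

    # Build old -> new mapping; only clashing IDs get a new name.
    mapping = {}
    for old in ids2:
        if old not in used:
            mapping[old] = old
            continue
        new = old + suffix
        while new in reserved:
            new += suffix
        reserved.add(new)
        mapping[old] = new

    merged = []
    for b1, b2 in zip(doc1, doc2):
        bundle = deepcopy(b1)
        for zone, nodes in b2.items():
            if zone in bundle:
                raise ValueError(f"zone {zone!r} exists in both documents")
            # Rename the node itself and every link that refers to a renamed node.
            bundle[zone] = [
                {**n, "id": mapping[n["id"]],
                 "parent": mapping.get(n["parent"], n["parent"])}
                for n in nodes
            ]
        merged.append(bundle)
    return merged


doc1 = [{"en_1": [{"id": "n1", "parent": None}, {"id": "n2", "parent": "n1"}]}]
doc2 = [{"en_2": [{"id": "n1", "parent": None}, {"id": "n2", "parent": "n1"}]}]
merged = merge_documents(doc1, doc2)
```

In a real treex file there are more kinds of links than the parent pointer (for example alignment or coreference links), so an actual implementation would need to rewrite all of them as well.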

@dan-zeman
Member Author

Thanks, Read::AlignedCoNLL works for me (after adjusting it to handle the 2006 flavor of the format). Closing the issue.

@tuetschek
Member

Just FYI, I stumbled upon the block Misc::AddZonesFromFile, which is basically what we need... but I don't know if it handles duplicate IDs properly.
