Add subcommand for searching nf-core test-datasets #3487
Comments
Hi there, neat idea. I'll get on it. A few things to clarify up front. @jfy133 Is there a curated text-format list of datasets other than this list of GitHub branches that is referred to over on the test-datasets repo? @mirpedrol Is it OK to fetch a list here from somewhere, or do we want to maintain a static list in the code? |
I think there are two levels that we would need to work on:
In terms of curated lists, not really. For modules there are the data descriptions: https://github.com/nf-core/test-datasets/tree/modules?tab=readme-ov-file#data-description but they are likely missing a lot. For each pipeline branch, it depends on the pipeline how well documented it is. The idea in my head in the original comment was simply to search/display the file path and name of each file in the modules branch, with no additional information about each file. I also didn't consider the branch (but I think it does make sense). In any case, the modules branch is quite well structured/sorted, so the directories in the file path alone should be sufficient for identifying relevant files. Does that sort of help? |
It is ok to fetch a list from the repo 🙂 As @jfy133 says, I would list the file paths for now, we can think about extending this later |
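For reference, a rough sketch of what "list the file paths" could look like, assuming the list is pulled from GitHub's Git Trees REST endpoint with `recursive=1`. The function names `paths_from_tree` and `fetch_file_paths` are made up for illustration and are not taken from nf-core/tools; unauthenticated requests are rate-limited, and very large trees may come back truncated.

```python
import json
import urllib.request

# GitHub Git Trees endpoint; a branch name is accepted in place of a tree SHA.
TREE_URL = "https://api.github.com/repos/nf-core/test-datasets/git/trees/{branch}?recursive=1"


def paths_from_tree(tree_response: dict) -> list[str]:
    """Return the sorted paths of all files ('blob' entries) in a Git Trees response."""
    entries = tree_response.get("tree", [])
    return sorted(entry["path"] for entry in entries if entry.get("type") == "blob")


def fetch_file_paths(branch: str = "modules") -> list[str]:
    """Hypothetical helper: download the file tree of one branch and list its file paths."""
    with urllib.request.urlopen(TREE_URL.format(branch=branch)) as response:
        return paths_from_tree(json.load(response))
```

Keeping the parsing separate from the download makes it possible to test the path handling without network access.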
I don't think the list of pipelines alone is helpful for what we are trying to achieve here, @mirpedrol, or am I missing something? If I understand @jfy133 correctly, he wants a list of all files in all branches and then to be able to search through that list.
I see two solutions that are feasible:
Please let me know if I am missing something obvious ;) |
My idea was to use the list of pipelines that we have in the JSON I sent to know the branch names + Regarding autocompletion, I wouldn't allow this for now. |
Thinking about this again, with the modules repo, we can clone the repo to our cache. We could do something similar here. But I see this is turning into a bigger project now 😄 |
As Júlia says, the list of pipeline names is indeed useful - we aren't so strict on the modules repo in terms of branches, so there is a lot of potential 'junk'. @mirpedrol is there a reason why you don't like the autocompletion? I would find this extremely helpful - if I can't remember the exact, having to re-run the command every time to search a new term would be quite annoying and would put someone off. Pulling the file tree once and being able to rapidly explore it would be helpful. Is it a technical reason? One way to get around this could be just allowing So I think we are going towards something like:
|
It was more of a practical concern. Having to parse all branches + all files to get the complete list for autocompletion sounded like too much. But if we use the list of possible branches and fetch files from that branch only it reduces things considerably 👍 |
Oh yes definitely! My original proposal only had modules branch in mind the first time 😅 |
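The branch restriction discussed above could be sketched roughly like this, assuming we already have the remote branch names and the list of pipeline names in hand. `select_branches` is a hypothetical helper for illustration, not something from nf-core/tools.

```python
def select_branches(remote_branches, pipeline_names, extra=("modules",)):
    """Keep only branches that match a known pipeline name or an explicitly allowed extra.

    Anything else on the remote (stale or 'junk' branches) is skipped,
    which keeps the number of file trees to fetch small.
    """
    wanted = set(pipeline_names) | set(extra)
    return [branch for branch in remote_branches if branch in wanted]


# Example with made-up branch names:
print(select_branches(["master", "modules", "rnaseq", "old-junk"], ["rnaseq", "sarek"]))
```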
Alright! Feel free to check it out on my fork and let me know if you have any suggestions: https://github.com/JulianFlesch/nf-core-tools/tree/feature/test-datasets. I'll wrap this up (hopefully by the end of the week) with tests for the new functions and classes and then open a pull request. Due to the concerns mentioned above, I dropped the autocompletion completely in favor of the
Important: To work properly, GitHub authentication by means of the
New subcommands:
$ python nf_core/__main__.py test-datasets --help
,--./,-.
___ __ __ __ ___ /,-._.--~\
|\ | |__ __ / ` / \ |__) |__ } {
| \| | \__, \__/ | \ |___ \`-._,-`-,
`._,._,'
nf-core/tools version 3.2.0 - https://nf-co.re
Usage: __main__.py test-datasets [OPTIONS] COMMAND [ARGS]...
Commands to manage nf-core test datasets.
╭─ Commands ───────────────────────────────────────────────────────────────────────────────────────╮
│ list List all data files available in the nf-core/test-datasets repository on github │
│ list-branches List all remote branches in the nf-core/test-dataset repository on github │
│ search Search for files in the nf-core/test-datasets repository on github │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────╮
│ --help -h Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
Example Search:
$ python nf_core/__main__.py test-datasets search -ib modules sarscov2/genome/bed
,--./,-.
___ __ __ __ ___ /,-._.--~\
|\ | |__ __ / ` / \ |__) |__ } {
| \| | \__, \__/ | \ |___ \`-._,-`-,
`._,._,'
nf-core/tools version 3.2.0 - https://nf-co.re
(Branch: modules) data/genomics/sarscov2/genome/bed/baits.bed
(Branch: modules) data/genomics/sarscov2/genome/bed/bed6alt.as
(Branch: modules) data/genomics/sarscov2/genome/bed/test.bed
(Branch: modules) data/genomics/sarscov2/genome/bed/test.bed.gz
(Branch: modules) data/genomics/sarscov2/genome/bed/test.bed12
(Branch: modules) data/genomics/sarscov2/genome/bed/test.bedpe
(Branch: modules) data/genomics/sarscov2/genome/bed/test2.bed |
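For anyone following along, a minimal sketch of how a search like the one above could work, assuming plain substring matching over a list of file paths that has already been fetched. The actual matching logic on the fork may differ; `search_paths` and its `ignore_case` parameter are illustrative names only.

```python
def search_paths(paths, query, branch, ignore_case=False):
    """Return formatted hits for every path that contains the query as a substring."""
    needle = query.lower() if ignore_case else query
    hits = []
    for path in paths:
        haystack = path.lower() if ignore_case else path
        if needle in haystack:
            # Same layout as the output shown above: branch label, then the path.
            hits.append(f"(Branch: {branch}) {path}")
    return hits


example_paths = [
    "data/genomics/sarscov2/genome/bed/test.bed",
    "data/genomics/sarscov2/genome/genome.fasta",
]
for line in search_paths(example_paths, "sarscov2/genome/bed", "modules"):
    print(line)
```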
Please open the PR, so it is easier to check what has been done 🙂 It can be set as draft and labeled as WIP to make sure it is not merged prematurely. |
Description of feature
When writing modules or pipelines, I often have problems remembering the exact URLs/location of test data files on the nf-core/test-datasets repository. I normally have to resort to going to GitHub in my browser and navigating to the right directory, which takes a lot of time.
It would be cool to have an nf-core subcommand that allows you to 'search' with clever autocomplete prompts for the exact path, and ideally print the line as you would want to copy/paste into your test, e.g.
nf-core test-dataset search <start typing keywords for autocomplete>
nf-core test-dataset search sarscov2/genome
And spits out:
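One way a copy/paste-ready line could be produced, as a rough sketch: turn a branch name and a repository-relative file path into a raw.githubusercontent.com URL. `raw_url` is a hypothetical helper for illustration, and the exact form of the line wanted in a test (full URL versus a path relative to some base parameter) is an assumption here.

```python
RAW_BASE = "https://raw.githubusercontent.com/nf-core/test-datasets"


def raw_url(branch: str, path: str) -> str:
    """Build the raw-content URL for a file on a given branch of nf-core/test-datasets."""
    # Strip a leading slash so the URL does not end up with a double slash.
    return f"{RAW_BASE}/{branch}/{path.lstrip('/')}"


print(raw_url("modules", "data/genomics/sarscov2/genome/bed/test.bed"))
```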