Replies: 4 comments 8 replies
-
Can you clarify what data you need, exactly?
-
Also, have you investigated using the index, either the git repo or the HTTP sparse index?
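For reference, here is a minimal sketch of looking up a single crate in the sparse index. It assumes the index is served from `https://index.crates.io`, that files follow Cargo's registry index layout (path derived from the lowercased crate name), and that each file holds one JSON object per published version with `vers` and `yanked` fields. The helper names are hypothetical, and nothing here performs a network request.

```python
import json

# Assumed base URL of the crates.io sparse index.
SPARSE_INDEX_BASE = "https://index.crates.io"


def index_path(crate_name):
    """Return the index file path for a crate, per Cargo's index layout."""
    name = crate_name.lower()
    if not name:
        raise ValueError("crate name must not be empty")
    if len(name) == 1:
        return "1/" + name
    if len(name) == 2:
        return "2/" + name
    if len(name) == 3:
        return "3/" + name[0] + "/" + name
    # Four or more characters: first two chars, next two chars, full name.
    return name[:2] + "/" + name[2:4] + "/" + name


def index_url(crate_name):
    """Build the full sparse-index URL for a crate."""
    return SPARSE_INDEX_BASE + "/" + index_path(crate_name)


def parse_index_file(text):
    """Parse an index file: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def last_unyanked_version(entries):
    """Return 'vers' of the last non-yanked entry in file order, or None.

    File order is not guaranteed to be semver order, so treat this as
    "most recently listed", not "highest version".
    """
    for entry in reversed(entries):
        if not entry.get("yanked", False):
            return entry["vers"]
    return None
```

Fetching `index_url("serde")` with any HTTP client and passing the body to `parse_index_file` would give the per-version records for that crate. Note that this covers one crate per request, so it suits clients that already know which crate names they care about.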
-
One thing I don't understand is why you need to fetch all the crates every time. You could sort by recent updates (https://crates.io/crates?sort=recent-updates) and stop requesting pages once you reach updates you've already seen. 1400 * 100 crates are not being updated every hour.
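A rough sketch of that stop-early approach is below. It makes several assumptions: that the JSON API mirrors the web UI at `https://crates.io/api/v1/crates` with `page`, `per_page`, and `sort=recent-updates` parameters, that the response carries a `crates` list, and that each crate has `name` and `updated_at` (ISO 8601) fields. The function names and the User-Agent string are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def fetch_updated_crates(fetch_page, last_seen, max_pages=2000):
    """Collect crates updated after `last_seen`, newest first.

    `fetch_page(page)` must return a list of crate dicts sorted by most
    recent update. Paging stops at the first crate that is not newer
    than `last_seen`, or at the first empty page.
    """
    cutoff = parse_timestamp(last_seen)
    updated = []
    for page in range(1, max_pages + 1):
        crates = fetch_page(page)
        if not crates:
            break
        for crate in crates:
            if parse_timestamp(crate["updated_at"]) <= cutoff:
                return updated
            updated.append(crate)
    return updated


def crates_io_page(page, per_page=100):
    """Fetch one page from the (assumed) crates.io listing endpoint."""
    query = urllib.parse.urlencode(
        {"page": page, "per_page": per_page, "sort": "recent-updates"}
    )
    request = urllib.request.Request(
        "https://crates.io/api/v1/crates?" + query,
        # Hypothetical identifying User-Agent; use your own contact details.
        headers={"User-Agent": "example-fetcher (admin@example.org)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    time.sleep(1)  # stay at roughly one request per second
    return data["crates"]
```

Calling `fetch_updated_crates(crates_io_page, last_seen)` with the newest `updated_at` value stored from the previous run would then request only as many pages as have changed. A crate updated while paging is in progress can shift page boundaries, so a real client would probably want to deduplicate the results by name.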
-
Unfortunately, setting up a new data feed also requires development and maintenance, and given that this data is available in at least two ways (the Git index and database dumps), we don't believe that this complexity is warranted within the crates.io service itself. docs.rs uses the
-
Hi, I'm the author of https://repology.org, a service which aggregates package repository data and reports new versions to package maintainers. It uses crates.io data as one of its sources, which is important because more and more end-user software written in Rust is packaged in repositories. However, a problem with getting data from crates.io has arisen.
Historically, crate data was fetched through the API in batches of 100, with a 1-second delay between fetches and a custom user agent pointing to repology.org (which I think fulfills the API access requirements). However, that broke recently: crates.io now closes the connection after a few pages (example failed fetch log). Honestly, I wasn't happy with this approach anyway, as it took a long time (1400+ pages to fetch!) and was rather unreliable.
Another option would be to switch to the database dumps, but they are too heavy for Repology to process.
So, is there a possibility of a solution free from the disadvantages of both, such as a static JSON dump generated roughly hourly and containing the same information the API provides? It would take less CPU time to generate and process, and less traffic to transfer, for both crates.io and clients, while providing decent freshness and completeness of the data.