[PULP-214] Added docs about on-demand content streaming caveats #6101

Open. Wants to merge 4 commits into base: main

1 change: 1 addition & 0 deletions CHANGES/1975.doc
@@ -0,0 +1 @@
Added docs about on-demand content limitations and caveats.
1 change: 1 addition & 0 deletions CHANGES/3212.doc
@@ -0,0 +1 @@
Added docs about on-demand content limitations and caveats.
78 changes: 69 additions & 9 deletions docs/user/learn/on-demand-downloading.md
@@ -1,12 +1,14 @@
-# On-Demand Downloading
+# On-Demand Download/Sync

## Overview

-Pulp can sync content in a few modes: 'immediate', 'on_demand', and 'streamed'. Each provides a
+Pulp can sync content in a few modes: `immediate`, `on_demand`, and `streamed`. Each provides a
different behavior on how and when Pulp acquires content. These are set as the `policy` attribute
of the `Remote` performing the sync. Policy is an optional parameter and defaults to
`immediate`.
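
As a rough illustration of where `policy` is set, the sketch below builds (but does not send) a request that creates a Remote with an explicit policy. The host `pulp.example.com`, the remote name, and the upstream URL are hypothetical, and the endpoint path is an assumption based on the pulp-rpm "Create Remote" operation; check the plugin's REST API docs for the real path and fields.

```python
import json
from urllib.request import Request

# Assumed values for illustration only; none of these come from the PR.
PULP_BASE = "https://pulp.example.com"
REMOTES_PATH = "/pulp/api/v3/remotes/rpm/rpm/"  # assumed pulp-rpm endpoint

VALID_POLICIES = ("immediate", "on_demand", "streamed")


def build_create_remote_request(name, url, policy="immediate"):
    """Build (but do not send) a POST request that creates a Remote."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    body = json.dumps({"name": name, "url": url, "policy": policy}).encode()
    return Request(
        PULP_BASE + REMOTES_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_create_remote_request(
    "nightly", "https://mirror.example.com/repo/", policy="on_demand"
)
print(req.get_method(), req.full_url)
print(req.data.decode())
```

Leaving `policy` out of the call mirrors the documented default of `immediate`.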

## Sync Modes

### immediate

When performing the sync, download all `Artifacts` now. Also download all metadata
@@ -39,25 +41,83 @@ instance, syncing from a nightly repo would cause Pulp to store every nightly ev
is likely not valuable. Units created from this mode are
`on-demand content units<on-demand content>`.
Contributor

"Also download all metadata is likely not valuable" is nearly incomprehensible

Member Author

"Also download all metadata" and "is likely not valuable" are 17 lines apart 😅


-## Does Plugin X Support 'on_demand' or 'streamed'?
+## Plugin support for on-demand/streamed

If a plugin has not enabled the 'on_demand' or 'streamed' value for the `policy` attribute,
setting that value returns an error. Check that plugin's documentation as well.

Examples of the "Create Remote" endpoints for some plugins that support these features:

* [pulp-rpm](https://pulpproject.org/pulp_rpm/restapi/#tag/Remotes:-Rpm/operation/remotes_rpm_rpm_create)
* [pulp-container](https://pulpproject.org/pulp_container/restapi/#tag/Remotes:-Container/operation/remotes_container_container_create)

!!! note
    Want to add on-demand support to your plugin? See the
    [On-Demand Support](site:pulpcore/docs/dev/learn/other/on-demand-support/)
    documentation for more details on how to add on-demand support to a plugin.


-## Associating On-Demand Content with Additional Repository Versions
+## On-Demand Content and Repository Versions

An `on-demand content unit` can be associated and unassociated from a `repository version` just like a normal unit. Note that the original `Remote` will be used to download content should a client request it, even as that content is
made available in multiple places.

-!!! warning
-    Deleting a `Remote` that was used in a sync with either the `on_demand` or `streamed`
-    options can break published data. Specifically, clients who want to fetch content that a
-    `Remote` was providing access to would begin to 404. Recreating a `Remote` and
-    re-triggering a sync will cause these broken units to recover again.
+!!! warning "Deleting a Remote"
+    Learn about the dangers of [deleting a Remote](#remote-deletion-and-content-sharing) in the context of on-demand content.

## On-Demand/Streamed limitations

On-demand/streamed content can be very useful, but it comes with some caveats.

### External dependency and error handling

The content might become unavailable or corrupted on the remote server.
This makes it hard for Pulp to provide an accurate error message.

Here are some scenarios involving remote failure:

* Unreachable
    * Given all remote sources for the content are unavailable/corrupted
    * When the user requests that content through a distribution
    * Then it fails to deliver the content and it is effectively unreachable
* Reachable after failure(s)
    * Given there is more than one remote for the content and at least one of them is good.
    * When the user requests that content through a distribution
    * Then some requests for the content might fail with close connection errors* and future requests will try the next ones, eventually reaching the good remote.
      Even though the content might be reachable, the failures can be confusing.

Contributor

@dralley, Dec 4, 2024

I feel like this could be structured better. For instance, if all remote sources are "corrupted" then you'll get connection closed failures, but those aren't mentioned in the first example, only in the second example.

I also have a mild preference for normal paragraphs here I think? Like:

> In the case where the selected remote source for the content is unreachable, then when a user requests that content through a distribution they will receive a 404 error message
>
> In the case where the selected remote source for the content is corrupt, then when a user requests that content through a distribution they will receive a connection closed error <..expand..>
>
> In the case where multiple remote sources exist, then Pulp will cycle through potential remote sources trying each of them in turn until a valid one is found. Users may therefore see failures <..expand..> before a request for the content is successful.
>
> If all remote sources are found to be invalid <....>

Feel free to adjust or rewrite it differently. Making them bullet points is fine too

!!! note "* Why do we close the connection?"
    The connection close happens because Pulp streams content directly from the remote.
    If the content is bad (and we can only know that after streaming everything) we prefer to close the connection over finalizing a bad response.

Context: <https://github.com/pulp/pulpcore/issues/5012>.
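
The behavior described above can be sketched with a toy model. This is not Pulp's implementation: `ConnectionClosed`, `stream`, `request_content`, and the `failed` set are hypothetical names, and how Pulp actually remembers a bad remote is not described in this PR. The sketch only shows why validation has to happen after the last chunk is sent, and why a later request can succeed from another remote.

```python
import hashlib


class ConnectionClosed(Exception):
    """Stands in for the server dropping the client connection."""


def stream(chunks, expected_sha256):
    """Yield chunks as they arrive; the digest is only known at the end."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
        yield chunk  # already forwarded to the client before validation
    if digest.hexdigest() != expected_sha256:
        # Too late for an error status: the body has already been sent.
        raise ConnectionClosed("digest mismatch")


def request_content(remotes, failed, expected_sha256):
    """Serve one request from the first remote not yet known to be bad."""
    for name, chunks in remotes:
        if name in failed:
            continue
        try:
            return b"".join(stream(chunks, expected_sha256))
        except ConnectionClosed:
            failed.add(name)  # hypothetical bookkeeping: skip it next time
            raise
    raise LookupError("no usable remote left")


expected = hashlib.sha256(b"hello world").hexdigest()
remotes = [("bad", [b"hello ", b"w0rld"]), ("good", [b"hello ", b"world"])]
failed = set()
outcomes = []
for _ in range(2):
    try:
        outcomes.append(request_content(remotes, failed, expected))
    except ConnectionClosed:
        outcomes.append("connection closed")
print(outcomes)  # ['connection closed', b'hello world']
```

In this model the first request fails after the whole corrupted body was streamed, and the second request reaches the good remote, which matches the "reachable after failure(s)" scenario.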

### Remote deletion and content sharing

Deleting a `Remote` that was used in a sync with either the `on_demand` or `streamed`
options can break published data.

Specifically, clients who want to fetch content that a `Remote` was providing access to would begin to 404.
Recreating a `Remote` and re-triggering a sync will cause these broken units to recover again.

In the worst case, the Content is shared across multiple Repositories, and the Remote's removal
can invalidate all those repositories at once.

In either case, proceed with the deletion of a remote with great care.

Context: <https://github.com/pulp/pulpcore/issues/1975>.
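
To make the sharing risk concrete, here is a small sketch that estimates which repositories would be affected by deleting a Remote. The dictionaries and the function name are hypothetical and do not reflect Pulp's data model or API; the sketch also assumes none of the on-demand content has been downloaded yet.

```python
def repos_broken_by_deletion(remote, content_sources, repo_content):
    """Return repositories holding content whose only source is `remote`."""
    orphaned = {
        unit
        for unit, sources in content_sources.items()
        if sources == {remote}
    }
    return sorted(
        repo for repo, units in repo_content.items() if units & orphaned
    )


# Hypothetical state: which Remotes can serve each on-demand unit...
content_sources = {
    "pkg-a": {"remote-1"},
    "pkg-b": {"remote-1", "remote-2"},
}
# ...and which units each repository contains.
repo_content = {
    "repo-x": {"pkg-a"},
    "repo-y": {"pkg-a", "pkg-b"},
    "repo-z": {"pkg-b"},
}

print(repos_broken_by_deletion("remote-1", content_sources, repo_content))
# ['repo-x', 'repo-y']
```

Here `repo-z` survives because `pkg-b` still has a second source, while a single deletion affects both repositories that share `pkg-a`.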

### Implicit credential sharing within a domain

In the same domain, a request for on-demand content may use any available Remote associated with that content,
regardless of which user created that Remote.

An example:

* Given User A and User B both synced the same on-demand content from their separate remotes (there are two different sources for the same content).
* When User B requests the content
* Then the credentials used for the download could potentially be User A's.

If a user doesn't want their registered Remotes to be indirectly used by other users, they should use a separate domain.

Context: <https://github.com/pulp/pulpcore/issues/3212>.
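
The example above can be modeled in a few lines. The `Remote` dataclass and `candidate_remotes` function are hypothetical and only illustrate the selection rule described in this section: candidates are filtered by domain, not by the user making the request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Remote:
    name: str
    owner: str   # user who registered the Remote (and its credentials)
    domain: str


def candidate_remotes(remotes, domain):
    """Remotes that may serve the content; the requester is not a filter."""
    return [r for r in remotes if r.domain == domain]


# Both users synced the same content into the same domain.
same_domain = [
    Remote("remote-a", owner="user-a", domain="default"),
    Remote("remote-b", owner="user-b", domain="default"),
]
print([r.owner for r in candidate_remotes(same_domain, "default")])
# ['user-a', 'user-b']

# User A moves to a separate domain, so a request in "default" cannot use A's Remote.
separate_domains = [
    Remote("remote-a", owner="user-a", domain="team-a"),
    Remote("remote-b", owner="user-b", domain="default"),
]
print([r.owner for r in candidate_remotes(separate_domains, "default")])
# ['user-b']
```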