Gateway: CAR handler shouldn't return 200 & a CAR header if data is unavailable #458
Comments
This seems fine. It's some added complexity to give a status code error in one particular case where almost any other case would result in a streaming error. I suspect the same applies to the other functions where we return streamable content from the backend (e.g. I don't have a strong preference for if the added complexity shows up in the gateway or the backends as long as any modification to the contracts between the components is explicitly defined on the interface.
That's not what the spec indicates 404s are for: https://specs.ipfs.tech/http-gateways/path-gateway/#404-not-found so I don't see boxo having that behavior unless there's a change to the spec.
I'm not arguing for buffering, there shouldn't be any need to hold bytes up for what I'm suggesting—just don't start anything until you know you can continue with more than just a header.
It is in the Lassie case:
Because it is missing, we have no way of finding it. I'm not suggesting that the same be done here, just explaining what we're doing.
iiuc the gist here is to delay sending HTTP status code until the first block is retrieved from the backend (which may be a remote HTTP server, like in Rhea/Saturn). Waiting for the first block and returning HTTP 5XX on error/missing sounds sensible, but
After all the Rhea refactors in boxo/gateway we have If we want a surgical fix, we would keep
nit: on IPFS gateways this produces 502 not 404. PL's Grafana for ipfs.io/Rhea also expects 502 not 404 for routing errors (semantically, the fact that the HTTP server at cid.contact can't find providers does not make the content not exist on the planet). I've filed ipfs/specs#435 to fill the gap in the gateway specs around this.
Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 7 days.
Background
Consider the straightforward case that a block is unfetchable by the block source—this is a somewhat common occurrence for Boost, where there may be a disconnect between advertised CIDs and those that are available unsealed. So requests may come in for CIDs that are supposed to be there, but the block fetcher returns an error that the block is unavailable.
The Boxo gateway code will return a 200, a set of valid headers, and a CARv1 header with the requested CID in the roots array, but nothing else, and termination of the stream is clean. There is no indication of a problem without parsing the CAR. Of course in the Trustless Gateway paradigm, it's up to the user to validate that the CAR contains the expected blocks, so from that perspective we have what we need to determine whether there's a problem or not. But this does present complications for debugging problems. In particular, when debugging Rhea retrieval problems I need access to Boost logs on the server side to see what the problem might be; I have no indication from the outside that Boost even thinks that there's a problem.
Desired behaviour
While the spec doesn't cover this, here's what I think should happen and how we built the Lassie HTTP handler:
a. If we can't get any candidates from the indexer then we can do a `404` with the body `no candidates found`
b. Other failures are treated as a "gateway timeout", `504`, with the body `failed to fetch CID: <error message>`.
Code exploration
`handler#serveCar` starts setting headers immediately, with no opportunity to change course if there's a problem (boxo/gateway/handler_car.go, line 56 in 1356946).
It relies on `BlocksBackend#GetCAR` to return a `Reader` for the CAR and simply does an `io.Copy` of that data to the output body.
`BlocksBackend#GetCAR` immediately sets up a pipe and starts writing a CAR (boxo/gateway/blocks_backend.go, lines 282 to 289 in 1356946).
`storage.NewWritable` with a root will immediately write a CARv1 header to the `Writer` it's given.
Hence with a valid trustless gateway `/ipfs` request, we will always get a valid CARv1 header with the root we request, regardless of whether the requested root CID is even fetchable.
In Lassie, we deal with this in two ways, both encapsulated in `DeferredWriter`, which is compatible with `github.com/ipld/go-ipld-prime/storage/WritableStorage`, like what the `github.com/ipld/go-car/v2/storage/CarWriter` is. https://github.com/filecoin-project/lassie/blob/main/pkg/storage/deferredcarwriter.go
`Put` operation
`OnPut` event listener that lets us watch for the first put and set the headers in expectation of a CAR with at least one block.