HADOOP-18987 Various fixes to FileSystem API docs #6292

Merged
merged 1 commit into from
Feb 2, 2024
@@ -88,14 +88,13 @@ for example. output streams returned by the S3A FileSystem.
The stream MUST implement `Abortable` and `StreamCapabilities`.

```python
if unsupported:
if unsupported:
throw UnsupportedException

if not isOpen(stream):
no-op

StreamCapabilities.hasCapability("fs.capability.outputstream.abortable") == True

```
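
As a rough illustration of the `abort()` preconditions above, here is a minimal sketch of an in-memory stream. `ToyAbortableStream`, `UnsupportedException` and the `supported` flag are hypothetical names for this example only and are not part of the Hadoop API; the choice to discard buffered data on abort is likewise an assumption of the sketch.

```python
ABORTABLE = "fs.capability.outputstream.abortable"


class UnsupportedException(Exception):
    """Stand-in for the exception named in the pseudocode above."""


class ToyAbortableStream:
    """Hypothetical in-memory stream; not the Hadoop API."""

    def __init__(self, supported=True):
        self.supported = supported
        self.is_open = True
        self.aborted = False
        self.buffer = bytearray()

    def has_capability(self, name):
        # Only advertise the capability when abort() really works.
        return self.supported and name == ABORTABLE

    def write(self, data):
        self.buffer.extend(data)

    def close(self):
        self.is_open = False

    def abort(self):
        if not self.supported:
            raise UnsupportedException("abort() is not supported")
        if not self.is_open:
            return  # no-op on a stream that is already closed
        # Assumption of this sketch: aborting discards buffered data.
        self.buffer.clear()
        self.aborted = True
        self.is_open = False
```

Calling `abort()` twice, or after `close()`, leaves the toy stream unchanged, which mirrors the "no-op" branch of the pseudocode.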


@@ -64,13 +64,13 @@ a protected directory, result in such an exception being raised.

### `boolean isDirectory(Path p)`

def isDirectory(FS, p)= p in directories(FS)
def isDir(FS, p) = p in directories(FS)


### `boolean isFile(Path p)`


def isFile(FS, p) = p in files(FS)
def isFile(FS, p) = p in filenames(FS)
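
The two predicates can be made concrete with a small runnable model. This is only a sketch: `ToyFS` is a hypothetical stand-in for the `FS` tuple of the specification, with `filenames(FS)` modelled as the keys of a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFS:
    """Hypothetical model: a set of directories plus a file-name -> data map."""
    directories: set = field(default_factory=lambda: {"/"})
    files: dict = field(default_factory=dict)


def is_dir(fs, p):
    return p in fs.directories


def is_file(fs, p):
    # filenames(FS) is modelled as the keys of the files map.
    return p in fs.files


def exists(fs, p):
    return is_dir(fs, p) or is_file(fs, p)


fs = ToyFS()
fs.directories.add("/data")
fs.files["/data/a.txt"] = b"hello"
```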


### `FileStatus getFileStatus(Path p)`
@@ -250,7 +250,7 @@ process.
changes are made to the filesystem, the result of `listStatus(parent(P))` SHOULD
include the value of `getFileStatus(P)`.

* After an entry at path `P` is created, and before any other
* After an entry at path `P` is deleted, and before any other
changes are made to the filesystem, the result of `listStatus(parent(P))` SHOULD
NOT include the value of `getFileStatus(P)`.
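
The create/delete pair of guarantees can be illustrated with a toy listing function. This is a sketch under the assumption of a strongly consistent in-memory store, where the SHOULD conditions always hold; `parent` and `list_status` are illustrative helpers, not Hadoop calls.

```python
def parent(p):
    """Parent of an absolute path; the root is its own parent here."""
    head = p.rsplit("/", 1)[0]
    return head or "/"


def list_status(entries, directory):
    """Direct children of `directory` among the known entries."""
    return sorted(p for p in entries if p != "/" and parent(p) == directory)


entries = {"/", "/data"}

entries.add("/data/part-0000")           # entry created at P
after_create = list_status(entries, "/data")

entries.discard("/data/part-0000")       # entry deleted at P
after_delete = list_status(entries, "/data")
```

An eventually consistent store may take time to reach the same state, which is presumably why the text uses SHOULD rather than MUST.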

@@ -305,7 +305,7 @@ that they must all be listed, and, at the time of listing, exist.
All paths must exist. There is no requirement for uniqueness.

forall p in paths :
exists(fs, p) else raise FileNotFoundException
exists(FS, p) else raise FileNotFoundException
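
A minimal sketch of this precondition, assuming the filesystem is reduced to a set of known paths. `check_paths_exist` is a hypothetical helper and Python's `FileNotFoundError` stands in for `FileNotFoundException`.

```python
def check_paths_exist(known_paths, paths):
    """Raise if any requested path is missing; duplicates are allowed."""
    for p in paths:
        if p not in known_paths:
            raise FileNotFoundError(p)
    return True


known = {"/", "/a", "/b"}
ok = check_paths_exist(known, ["/a", "/a", "/b"])  # duplicates are fine
```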

#### Postconditions

@@ -381,7 +381,7 @@ being completely performed.

Path `path` must exist:

exists(FS, path) : raise FileNotFoundException
if not exists(FS, path) : raise FileNotFoundException

#### Postconditions

@@ -432,7 +432,7 @@ of data which must be collected in a single RPC call.

#### Preconditions

exists(FS, path) else raise FileNotFoundException
if not exists(FS, path) : raise FileNotFoundException
Contributor

I was using the concept exists() holds or raise FNFE, but if this suits then I'm not going to argue about details. The goal of this spec is for other people and tests, not mathematical purism.

Contributor Author

Either form is understandable imo, just thought the suggested form is more in line with the remainder of the docs which uses pythonesque code.


### Postconditions

@@ -463,7 +463,7 @@ and 1 for file count.

#### Preconditions

exists(FS, path) else raise FileNotFoundException
if not exists(FS, path) : raise FileNotFoundException

#### Postconditions

@@ -567,7 +567,7 @@ when writing objects to a path in the filesystem.
#### Postconditions


result = integer >= 0
result = integer >= 0

The outcome of this operation is usually identical to `getDefaultBlockSize()`,
with no checks for the existence of the given path.
@@ -591,12 +591,12 @@ on the filesystem.

#### Preconditions

if not exists(FS, p) : raise FileNotFoundException
if not exists(FS, p) : raise FileNotFoundException


#### Postconditions

if len(FS, P) > 0: getFileStatus(P).getBlockSize() > 0
if len(FS, P) > 0 : getFileStatus(P).getBlockSize() > 0
result == getFileStatus(P).getBlockSize()

1. The outcome of this operation MUST be identical to the value of
@@ -654,12 +654,12 @@ No ancestor may be a file

forall d = ancestors(FS, p) :
if exists(FS, d) and not isDir(FS, d) :
raise [ParentNotDirectoryException, FileAlreadyExistsException, IOException]
raise {ParentNotDirectoryException, FileAlreadyExistsException, IOException}

#### Postconditions


FS' where FS'.Directories' = FS.Directories + [p] + ancestors(FS, p)
FS' where FS'.Directories = FS.Directories + [p] + ancestors(FS, p)
result = True
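
The ancestor precondition and the postcondition shown in this hunk can be sketched as a pure function that returns the new directory set rather than mutating state, mirroring `FS'`. The function names are illustrative, and Python's `NotADirectoryError` is used as a stand-in for the set of exceptions the specification permits.

```python
def ancestors(p):
    """All proper ancestors of an absolute path, root first."""
    parts = [x for x in p.split("/") if x]
    return ["/" + "/".join(parts[:i]) for i in range(len(parts))]


def mkdirs(directories, files, p):
    # Precondition: no ancestor may be a file.
    for d in ancestors(p):
        if d in files:
            raise NotADirectoryError(d)
    # Postcondition: FS'.Directories = FS.Directories + [p] + ancestors(FS, p)
    new_directories = set(directories) | {p} | set(ancestors(p))
    return new_directories, True


dirs, result = mkdirs({"/"}, {}, "/a/b/c")
```

Running `mkdirs` again on the returned set yields the same set and `True`, matching the note that the result does not depend on whether a new directory was created.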


@@ -688,7 +688,7 @@ The return value is always true—even if a new directory is not created

The file must not exist for a no-overwrite create:

if not overwrite and isFile(FS, p) : raise FileAlreadyExistsException
if not overwrite and isFile(FS, p) : raise FileAlreadyExistsException

Writing to or overwriting a directory must fail.

@@ -698,7 +698,7 @@ No ancestor may be a file

forall d = ancestors(FS, p) :
if exists(FS, d) and not isDir(FS, d) :
raise [ParentNotDirectoryException, FileAlreadyExistsException, IOException]
raise {ParentNotDirectoryException, FileAlreadyExistsException, IOException}

FileSystems may reject the request for other
reasons, such as the FS being read-only (HDFS),
@@ -712,8 +712,8 @@ For instance, HDFS may raise an `InvalidPathException`.
#### Postconditions

FS' where :
FS'.Files'[p] == []
ancestors(p) is-subset-of FS'.Directories'
FS'.Files[p] == []
ancestors(p) subset-of FS'.Directories

result = FSDataOutputStream
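
To make the `create()` conditions concrete, here is a minimal sketch on a toy model of directories and files. It returns the new state instead of a stream, and it uses Python's built-in exceptions as stand-ins (the exception for writing to a directory is an assumption of the sketch).

```python
def ancestors(p):
    """All proper ancestors of an absolute path, root first."""
    parts = [x for x in p.split("/") if x]
    return ["/" + "/".join(parts[:i]) for i in range(len(parts))]


def create(directories, files, p, overwrite=True):
    # The file must not exist for a no-overwrite create.
    if not overwrite and p in files:
        raise FileExistsError(p)
    # Writing to or overwriting a directory must fail.
    if p in directories:
        raise IsADirectoryError(p)
    # No ancestor may be a file.
    for d in ancestors(p):
        if d in files:
            raise NotADirectoryError(d)
    new_files = dict(files)
    new_files[p] = b""  # FS'.Files[p] == []
    new_directories = set(directories) | set(ancestors(p))
    return new_directories, new_files


dirs, files = create({"/"}, {}, "/a/f")
```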

@@ -734,7 +734,7 @@ The behavior of the returned stream is covered in [Output](outputstream.html).
clients creating files with `overwrite==true` to fail if the file is created
by another client between the two tests.

* The S3A and potentially other Object Stores connectors not currently change the `FS` state
* The S3A and potentially other Object Stores connectors currently don't change the `FS` state
until the output stream `close()` operation is completed.
This is a significant difference between the behavior of object stores
and that of filesystems, as it allows >1 client to create a file with `overwrite=false`,
@@ -762,15 +762,15 @@ The behavior of the returned stream is covered in [Output](outputstream.html).
#### Implementation Notes

`createFile(p)` returns a `FSDataOutputStreamBuilder` only and does not make
change on filesystem immediately. When `build()` is invoked on the `FSDataOutputStreamBuilder`,
changes on the filesystem immediately. When `build()` is invoked on the `FSDataOutputStreamBuilder`,
the builder parameters are verified and [`create(Path p)`](#FileSystem.create)
is invoked on the underlying filesystem. `build()` has the same preconditions
and postconditions as [`create(Path p)`](#FileSystem.create).

* Similar to [`create(Path p)`](#FileSystem.create), files are overwritten
by default, unless specify `builder.overwrite(false)`.
by default, unless specified by `builder.overwrite(false)`.
* Unlike [`create(Path p)`](#FileSystem.create), missing parent directories are
not created by default, unless specify `builder.recursive()`.
not created by default, unless specified by `builder.recursive()`.
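
The deferred behaviour of the builder can be sketched with a toy class. `ToyCreateFileBuilder` is hypothetical and only mimics the defaults described above; the exception raised for a missing parent is an assumption, since the text does not name one.

```python
def parent(p):
    head = p.rsplit("/", 1)[0]
    return head or "/"


class ToyCreateFileBuilder:
    """Hypothetical builder; nothing changes until build() is called."""

    def __init__(self, directories, files, path):
        self._directories = directories
        self._files = files
        self._path = path
        self._overwrite = True    # files are overwritten by default
        self._recursive = False   # parents are not created by default

    def overwrite(self, flag):
        self._overwrite = flag
        return self

    def recursive(self):
        self._recursive = True
        return self

    def build(self):
        if not self._overwrite and self._path in self._files:
            raise FileExistsError(self._path)
        d = parent(self._path)
        if d not in self._directories:
            if not self._recursive:
                # Assumption: the exception type is not given in the text.
                raise FileNotFoundError(d)
            while d not in self._directories:
                self._directories.add(d)
                d = parent(d)
        self._files[self._path] = b""
        return self._path
```

For example, `ToyCreateFileBuilder(dirs, files, "/a/b/f").recursive().build()` creates `/a` and `/a/b` in the toy model, whereas constructing the builder alone leaves `dirs` and `files` untouched.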

### <a name='FileSystem.append'></a> `FSDataOutputStream append(Path p, int bufferSize, Progressable progress)`

@@ -780,14 +780,14 @@ Implementations without a compliant call SHOULD throw `UnsupportedOperationExcep

if not exists(FS, p) : raise FileNotFoundException

if not isFile(FS, p) : raise [FileAlreadyExistsException, FileNotFoundException, IOException]
if not isFile(FS, p) : raise {FileAlreadyExistsException, FileNotFoundException, IOException}

#### Postconditions

FS' = FS
result = FSDataOutputStream

Return: `FSDataOutputStream`, which can update the entry `FS.Files[p]`
Return: `FSDataOutputStream`, which can update the entry `FS'.Files[p]`
by appending data to the existing list.

The behavior of the returned stream is covered in [Output](outputstream.html).
@@ -813,7 +813,7 @@ Implementations without a compliant call SHOULD throw `UnsupportedOperationExcep

#### Preconditions

if not isFile(FS, p)) : raise [FileNotFoundException, IOException]
if not isFile(FS, p)) : raise {FileNotFoundException, IOException}

This is a critical precondition. Implementations of some FileSystems (e.g.
Object stores) could shortcut one round trip by postponing their HTTP GET
@@ -842,7 +842,7 @@ The result MUST be the same for local and remote callers of the operation.
symbolic links

1. HDFS throws `IOException("Cannot open filename " + src)` if the path
exists in the metadata, but no copies of any its blocks can be located;
exists in the metadata, but no copies of its blocks can be located;
-`FileNotFoundException` would seem more accurate and useful.

### `FSDataInputStreamBuilder openFile(Path path)`
@@ -861,7 +861,7 @@ Implementations without a compliant call MUST throw `UnsupportedOperationExcepti

let stat = getFileStatus(Path p)
let FS' where:
(FS.Directories', FS.Files', FS.Symlinks')
(FS'.Directories, FS.Files', FS'.Symlinks)
p' in paths(FS') where:
exists(FS, stat.path) implies exists(FS', p')

@@ -931,16 +931,16 @@ metadata in the `PathHandle` to detect references from other namespaces.

### `FSDataInputStream open(PathHandle handle, int bufferSize)`

Implementaions without a compliant call MUST throw `UnsupportedOperationException`
Implementations without a compliant call MUST throw `UnsupportedOperationException`

#### Preconditions

let fd = getPathHandle(FileStatus stat)
if stat.isdir : raise IOException
let FS' where:
(FS.Directories', FS.Files', FS.Symlinks')
p' in FS.Files' where:
FS.Files'[p'] = fd
(FS'.Directories, FS.Files', FS'.Symlinks)
p' in FS'.Files where:
FS'.Files[p'] = fd
if not exists(FS', p') : raise InvalidPathHandleException

The implementation MUST resolve the referent of the `PathHandle` following
@@ -951,7 +951,7 @@ encoded in the `PathHandle`.

#### Postconditions

result = FSDataInputStream(0, FS.Files'[p'])
result = FSDataInputStream(0, FS'.Files[p'])

The stream returned is subject to the constraints of a stream returned by
`open(Path)`. Constraints checked on open MAY hold to hold for the stream, but
@@ -1006,7 +1006,7 @@ A directory with children and `recursive == False` cannot be deleted

If the file does not exist the filesystem state does not change

if not exists(FS, p):
if not exists(FS, p) :
FS' = FS
result = False

@@ -1089,7 +1089,7 @@ Some of the object store based filesystem implementations always return
false when deleting the root, leaving the state of the store unchanged.

if isRoot(p) :
FS ' = FS
FS' = FS
result = False

This is irrespective of the recursive flag status or the state of the directory.
@@ -1152,7 +1152,7 @@ has been calculated.

Source `src` must exist:

exists(FS, src) else raise FileNotFoundException
if not exists(FS, src) : raise FileNotFoundException

`dest` cannot be a descendant of `src`:

@@ -1162,7 +1162,7 @@ This implicitly covers the special case of `isRoot(FS, src)`.

`dest` must be root, or have a parent that exists:

isRoot(FS, dest) or exists(FS, parent(dest)) else raise IOException
if not (isRoot(FS, dest) or exists(FS, parent(dest))) : raise IOException
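
The three `rename()` preconditions stated so far can be sketched as a single check over a set of known paths. The helper names are illustrative, and `OSError` is used as a stand-in for `IOException`; the exception for the descendant case is an assumption, as this excerpt does not show it.

```python
def parent(p):
    head = p.rsplit("/", 1)[0]
    return head or "/"


def is_descendant(ancestor, p):
    """True if p lies strictly beneath ancestor."""
    prefix = ancestor if ancestor.endswith("/") else ancestor + "/"
    return p != ancestor and p.startswith(prefix)


def check_rename_preconditions(paths, src, dest):
    # Source `src` must exist.
    if src not in paths:
        raise FileNotFoundError(src)
    # `dest` cannot be a descendant of `src` (covers src == root).
    if is_descendant(src, dest):
        raise OSError("destination is under source: " + dest)
    # `dest` must be root, or have a parent that exists.
    if not (dest == "/" or parent(dest) in paths):
        raise OSError("destination parent does not exist: " + dest)
    return True


paths = {"/", "/a", "/a/f", "/b"}
ok = check_rename_preconditions(paths, "/a/f", "/b/f")
```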

The parent path of a destination must not be a file:

@@ -1240,7 +1240,8 @@ There is no consistent behavior here.

The outcome is no change to FileSystem state, with a return value of false.

FS' = FS; result = False
FS' = FS
result = False

*Local Filesystem*

@@ -1319,28 +1320,31 @@ Implementations without a compliant call SHOULD throw `UnsupportedOperationExcep

All sources MUST be in the same directory:

for s in sources: if parent(S) != parent(p) raise IllegalArgumentException
for s in sources:
if parent(s) != parent(p) : raise IllegalArgumentException

All block sizes must match that of the target:

for s in sources: getBlockSize(FS, S) == getBlockSize(FS, p)
for s in sources:
getBlockSize(FS, s) == getBlockSize(FS, p)

No duplicate paths:

not (exists p1, p2 in (sources + [p]) where p1 == p2)
let input = sources + [p]
not (exists i, j: i != j and input[i] == input[j])

HDFS: All source files except the final one MUST be a complete block:

for s in (sources[0:length(sources)-1] + [p]):
(length(FS, s) mod getBlockSize(FS, p)) == 0
(length(FS, s) mod getBlockSize(FS, p)) == 0


#### Postconditions


FS' where:
(data(FS', T) = data(FS, T) + data(FS, sources[0]) + ... + data(FS, srcs[length(srcs)-1]))
and for s in srcs: not exists(FS', S)
(data(FS', p) = data(FS, p) + data(FS, sources[0]) + ... + data(FS, sources[length(sources)-1]))
for s in sources: not exists(FS', s)
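
A minimal sketch of `concat()` on a toy map of path to bytes, covering the same-directory and no-duplicates preconditions and the postcondition. The HDFS block-size rules are deliberately left out, `ValueError` stands in for `IllegalArgumentException`, and the exception for duplicate paths is an assumption.

```python
def parent(p):
    head = p.rsplit("/", 1)[0]
    return head or "/"


def concat(files, p, sources):
    # All sources MUST be in the same directory as the target.
    for s in sources:
        if parent(s) != parent(p):
            raise ValueError("source not in target directory: " + s)
    # No duplicate paths among the sources and the target.
    inputs = sources + [p]
    if len(set(inputs)) != len(inputs):
        raise ValueError("duplicate path in concat inputs")
    # data(FS', p) = data(FS, p) + data(FS, sources[0]) + ...
    merged = files[p] + b"".join(files[s] for s in sources)
    # for s in sources: not exists(FS', s)
    new_files = {k: v for k, v in files.items() if k not in sources}
    new_files[p] = merged
    return new_files


before = {"/d/t": b"T", "/d/a": b"A", "/d/b": b"B"}
after = concat(before, "/d/t", ["/d/a", "/d/b"])
```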


HDFS's restrictions may be an implementation detail of how it implements
@@ -1360,7 +1364,7 @@ Implementations without a compliant call SHOULD throw `UnsupportedOperationExcep

if not exists(FS, p) : raise FileNotFoundException

if isDir(FS, p) : raise [FileNotFoundException, IOException]
if isDir(FS, p) : raise {FileNotFoundException, IOException}

if newLength < 0 || newLength > len(FS.Files[p]) : raise HadoopIllegalArgumentException

@@ -1369,8 +1373,7 @@ Truncate cannot be performed on a file, which is open for writing or appending.

#### Postconditions

FS' where:
len(FS.Files[p]) = newLength
len(FS'.Files[p]) = newLength

Return: `true`, if truncation is finished and the file can be immediately
opened for appending, or `false` otherwise.
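
The `truncate()` conditions can be sketched on a toy model as follows. The function returns the new file map together with the boolean result; because the toy truncation completes immediately it always returns `True`, which is an assumption of the sketch. Python built-ins stand in for the Hadoop exceptions.

```python
def truncate(directories, files, p, new_length):
    if p not in files and p not in directories:
        raise FileNotFoundError(p)
    if p in directories:
        raise IsADirectoryError(p)
    # Stand-in for HadoopIllegalArgumentException.
    if new_length < 0 or new_length > len(files[p]):
        raise ValueError("invalid length: %d" % new_length)
    new_files = dict(files)
    new_files[p] = files[p][:new_length]  # len(FS'.Files[p]) = newLength
    return new_files, True


files = {"/a/f": b"0123456789"}
truncated, done = truncate({"/", "/a"}, files, "/a/f", 4)
```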
@@ -1399,7 +1402,7 @@ Source and destination must be different
if src = dest : raise FileExistsException
```

Destination and source must not be descendants one another
Destination and source must not be descendants of one another
```python
if isDescendant(src, dest) or isDescendant(dest, src) : raise IOException
```
@@ -1429,7 +1432,7 @@ Given a base path on the source `base` and a child path `child` where `base` is

```python
def final_name(base, child, dest):
is base = child:
if base == child:
return dest
else:
return dest + childElements(base, child)
@@ -1557,7 +1560,7 @@ while (iterator.hasNext()) {

As raising exceptions is an expensive operation in JVMs, the `while(hasNext())`
loop option is more efficient. (see also [Concurrency and the Remote Iterator](#RemoteIteratorConcurrency)
for a dicussion on this topic).
for a discussion on this topic).

Implementors of the interface MUST support both forms of iterations; authors
of tests SHOULD verify that both iteration mechanisms work.
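
A test that exercises both iteration styles might look like the following sketch. `ToyRemoteIterator` and `NoSuchElement` are hypothetical Python stand-ins for `RemoteIterator` and `NoSuchElementException`; they only illustrate the shape of such a check.

```python
class NoSuchElement(Exception):
    """Stand-in for java.util.NoSuchElementException."""


class ToyRemoteIterator:
    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def has_next(self):
        return self._pos < len(self._items)

    def next(self):
        if not self.has_next():
            raise NoSuchElement()
        item = self._items[self._pos]
        self._pos += 1
        return item


def drain_with_has_next(it):
    """The while(hasNext()) form."""
    out = []
    while it.has_next():
        out.append(it.next())
    return out


def drain_until_exception(it):
    """The form that calls next() until the exception is raised."""
    out = []
    try:
        while True:
            out.append(it.next())
    except NoSuchElement:
        pass
    return out


entries = ["/d/a", "/d/b", "/d/c"]
first = drain_with_has_next(ToyRemoteIterator(entries))
second = drain_until_exception(ToyRemoteIterator(entries))
```

Both helpers should return the same list for the same input, which is the property a test of a real implementation would assert.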