Skip to content

Attempt to fix race conditions with non-atomic file copy for worker jars #90

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .bazelversion
Original file line number Diff line number Diff line change
@@ -1 +1 @@
lucidsoftware/8.1.999
lucidsoftware/8.1.10002
2 changes: 1 addition & 1 deletion MODULE.bazel.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -4,16 +4,22 @@ package workers.common
import xsbti.compile.ScalaInstance
import java.io.File
import java.net.URLClassLoader
import java.nio.file.{FileAlreadyExistsException, Files, Path, Paths}
import java.nio.file.{FileAlreadyExistsException, Files, Path, Paths, StandardCopyOption}
import java.util.Properties
import java.util.concurrent.ConcurrentHashMap
import scala.collection.immutable.TreeMap
import scala.util.control.NonFatal

object AnnexScalaInstance {
// See the comment on getAnnexScalaInstance as to why this is necessary
private val instanceCache: ConcurrentHashMap[Set[Path], AnnexScalaInstance] =
new ConcurrentHashMap[Set[Path], AnnexScalaInstance]()

// The worker will use this directory to store temp files in order to better perform
// atomic file copies.
private val tmpWorkerJarDir = Paths.get("annex-tmp-worker-jars")
Files.createDirectories(tmpWorkerJarDir)

/**
* We only need to care about minimizing the number of AnnexScalaInstances we create if things are being run as a
* worker. Otherwise just create the AnnexScalaInstance and be done with it because the process won't be long lived.
Expand Down Expand Up @@ -106,18 +112,30 @@ object AnnexScalaInstance {
// This should only happen once per compiler version, so it shouldn't happen often.
workRequestJarToWorkerJar.foreach { case (workRequestJar, workerJar) =>
this.synchronized {
// Check for existence of the file just in case another request is also writing these jars
// Copying a file is not atomic, so we don't want to end up in a funky state where two
// copies of the same file happen at the same time and cause something bad to happen.
if (!Files.exists(workerJar)) {
// Do a more atomic copy of a file by creating a temp file and then moving
// the temp file to the destination. We can do a move atomically, but cannot do
// a copy atomically. Copying risks the file existing at the destination in a
// partially completed state.
if (Files.notExists(workerJar)) {
var tmpWorkerJar: Option[Path] = None
try {
tmpWorkerJar = Some(Files.createTempFile(tmpWorkerJarDir, workerJar.getFileName.toString, "tmp"))
Files.copy(workRequestJar, tmpWorkerJar.get, StandardCopyOption.REPLACE_EXISTING)

Files.createDirectories(workerJar.getParent())
Files.copy(workRequestJar, workerJar)
Files.move(tmpWorkerJar.get, workerJar, StandardCopyOption.ATOMIC_MOVE)
Copy link

@jnowjack-lucidchart jnowjack-lucidchart Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ATOMIC_MOVE: ... If the target file exists then it is implementation specific if the existing file is replaced or this method fails by throwing an IOException (documentation)

Do you know what happens on our systems? Does it replace or throw an error?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not know. REPLACE_EXISTING is not set, so I'm hoping it would throw an error.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That documentation also states for ATOMIC_MOVE:

The move is performed as an atomic file system operation and all other options are ignored.

so I don't think it matters if REPLACE_EXISTING is set or not, it may or may not throw an error.

} catch {
// We do not care if the file already exists
case _: FileAlreadyExistsException => {}
case e: Throwable => throw new Exception("Error adding file to instance cache", e)
case NonFatal(e) =>
throw new Exception(s"Error copying worker jar: ${workerJar}", e)
} finally {
tmpWorkerJar.foreach { tmpWorkerJar =>
Files.deleteIfExists(tmpWorkerJar)
}
}
} else if (!Files.exists(workerJar)) {
// Files.exists is not the complement of Files.notExists because both return false
// when the existence of the file cannot be determined.
throw new Exception(s"Cannot determine existence of worker jar: ${workerJar}")
}
}
}
Expand Down
2 changes: 1 addition & 1 deletion tests/.bazelversion
Original file line number Diff line number Diff line change
@@ -1 +1 @@
lucidsoftware/8.1.999
lucidsoftware/8.1.10002
2 changes: 1 addition & 1 deletion tests/MODULE.bazel.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.