Require all packages to solve / compile and include all valid compilers in their metadata #669
Conversation
I've included a few review comments that describe how various parts of the code work. But I'm also happy to jump on a call in the PureScript chat to walk through this and answer any questions.
publish :: forall r. PackageSource -> PublishData -> Run (PublishEffects + r) Unit
publish source payload = do
We no longer need the PackageSource type because we no longer have exemptions for "legacy" vs. "current" packages. All packages must solve and compile. We had exemptions before because we weren't sure what compiler version to use to publish legacy packages, but we now manually verify one that works before we ever run publish.
app/src/App/API.purs
Operation.Validation.validatePursModules files >>= case _ of
  Left formattedError | payload.compiler < unsafeFromRight (Version.parse "0.15.0") -> do
As in the comment above, this code will fail packages with syntax that our version of language-cst-parser doesn't support; in our case that's 0.15.0+. I've therefore relaxed this requirement for packages before 0.15.0, or else they would be spuriously rejected.
We could potentially substitute a regex check for "module ... where", even though it's fragile, for pre-0.15.0 packages.
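To illustrate how such a fallback could work (and why it's fragile), here is a minimal sketch in Python. It is illustrative only: the registry itself is PureScript, and the regex and function name are invented here.

```python
import re

# Fragile heuristic: look for a "module ... where" header without a real
# parser. Comments or strings containing similar text can fool it, which is
# why it would only be a fallback for pre-0.15.0 packages whose syntax
# language-cst-parser cannot handle.
MODULE_HEADER = re.compile(
    r"^\s*module\s+[A-Z][A-Za-z0-9.]*[\s\S]*?\bwhere\b",
    re.MULTILINE,
)

def looks_like_purs_module(source: str) -> bool:
    return MODULE_HEADER.search(source) is not None
```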
app/src/App/API.purs
@@ -504,20 +515,30 @@ publish source payload = do
  Right versions -> pure versions

case Map.lookup manifest.version published of
  Nothing | payload.compiler < unsafeFromRight (Version.parse "0.14.7") -> do
purs publish will fail for packages prior to 0.14.7 because that's when we added support for the purs.json file format. Before that, the compiler looks for specific Bowerfile fields that aren't present in the purs.json file. Since all these packages ought to already be published to Pursuit I think this is fine; we can't do anything about it anyway until #525.
case compilationResult of
  Left error
    -- We allow legacy packages to fail compilation because we do not
    -- necessarily know what compiler to use with them.
    | source == LegacyPackage -> do
        Log.debug error
        Log.warn "Failed to compile, but continuing because this package is a legacy package."
    | otherwise ->
        Except.throw error
  Right _ ->
    pure unit
This code is no longer needed because packages must compile.
@@ -168,7 +168,6 @@ handleMemoryFs env = case _ of
  case inFs of
    Nothing -> pure $ reply Nothing
    Just entry -> do
-     Log.debug $ "Fell back to on-disk entry for " <> memory
These are just so noisy. Maybe we can introduce a Log.superDebug.
Is this log useful at all? I think it's OK to just remove it.
Yeah, I think they're not really useful now that we're confident the cache works correctly. I had them in there from when I first developed it, when I would sometimes see things I thought should be cached not get cached, or wanted to make sure something I removed from the cache really was removed.
scripts/src/LegacyImporter.purs
publishLegacyPackage :: Manifest -> Run _ Unit
publishLegacyPackage (Manifest manifest) = do
This is where we solve, compile, and then publish each package in turn. Publish failures are saved to cache and, at the end of the process, written to a publish-failures.json file that records every version that failed and its reason so we can hand-review it. I've run this on a few hundred packages and it's looking correct.
An example from the publish-failures.json file:

{
  "string-parsers": {
    "3.0.1": {
      "reason": "No versions found in the registry for lists in range\n >=4.0.0 (declared dependency)\n <5.0.0 (declared dependency)",
      "tag": "SolveFailed"
    },
    "3.1.0": {
      "reason": "No versions found in the registry for lists in range\n >=4.0.0 (declared dependency)\n <5.0.0 (declared dependency)",
      "tag": "SolveFailed"
    }
  }
}
I've uncovered two rare but significant issues affecting the legacy importer (unrelated to this PR).

Topological sorting

This run fails to produce a valid index because [email protected] is unsatisfied in its dependency on contravariant, but of course we see contravariant get inserted a little later on. The solution is to always consider version bounds when reading an index from disk, where we expect bounds to be at least reasonably correct. I've implemented and tested that, and situations like this no longer happen.

The reason we ignored ranges at first is that we had far fewer checks around correct bounds, and because in the package sets we want to explicitly ignore ranges when working with an index (when doing, for example, the 'self-contained' check). I've preserved that behavior: you can always opt in to either considering or ignoring ranges when working with an index.

Incorrect dependencies detected in legacy manifests derived from package sets

Second, in some cases a package like [email protected] will have its dependency list pruned to only those packages listed in its package sets entry, and those turn out to be overly restrictive. In this case, specifically, the dependency on integers was trimmed out. This shouldn't happen: [email protected] has a Bowerfile that explicitly lists a dependency on integers, which we dropped by deferring to the package sets entry. We did this assuming package sets entries are correct, and because we didn't want overly-constrained dependency lists. The second concern is no longer valid because with #667 we will remove unused dependencies. The first concern is no longer reasonable because we have at least one example of a package sets dependency list being incorrect.

The solution is simple: instead of preferring package sets entries over other manifests, just union them all and defer to the 'unused dependencies' pruning in the publishing pipeline to trim out ones that aren't actually needed.
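The union approach could be sketched like this in Python (illustrative only; names are invented, and the real implementation works over the registry's manifest types). Every source's dependency map is merged, and over-approximation is left for the unused-dependency pruning to clean up later in the pipeline.

```python
def union_dependencies(*sources):
    """Union dependency maps from several manifest sources (for example a
    Bowerfile and a package sets entry), keeping the first range seen for
    each package. Over-approximation is fine: the publish pipeline's
    unused-dependency pruning trims anything not actually needed."""
    merged = {}
    for dependencies in sources:
        for package, version_range in dependencies.items():
            # Earlier sources win when both declare a range for a package.
            merged.setdefault(package, version_range)
    return merged
```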
@@ -428,7 +428,7 @@ validatePackageSet (PackageSet set) = do
  -- We can now attempt to produce a self-contained manifest index from the
  -- collected manifests. If this fails then the package set is not
  -- self-contained.
- Tuple unsatisfied _ = ManifestIndex.maximalIndex (Set.fromFoldable success)
+ Tuple unsatisfied _ = ManifestIndex.maximalIndex ManifestIndex.IgnoreRanges (Set.fromFoldable success)
We always ignore ranges in package sets, but we should rely on them otherwise, especially now that we're actually solving packages as part of publishing and can be more trusting that the ranges aren't bogus.
scripts/src/LegacyImporter.purs
let metadataPackage = unsafeFromRight (PackageName.parse "metadata")
Registry.readMetadata metadataPackage >>= case _ of
  Nothing -> do
    Log.info "Writing empty metadata file for the 'metadata' package"
    let location = GitHub { owner: "purescript", repo: "purescript-metadata", subdir: Nothing }
    let entry = Metadata { location, owners: Nothing, published: Map.empty, unpublished: Map.empty }
    Registry.writeMetadata metadataPackage entry
  Just _ -> pure unit
We agreed not to reserve package names pre-0.13.0, so this reserves metadata only for the "metadata" package used by legacy package sets.
Can confirm that the fix is working with regard to generating manifests with full dependency lists to be pruned later. For example, this is the generated manifest for [email protected]:

{
  "name": "strings",
  "version": "3.5.0",
  "license": "MIT",
  "description": "String and char utility functions, regular expressions.",
  "location": {
    "githubOwner": "purescript",
    "githubRepo": "purescript-strings"
  },
  "dependencies": {
    "arrays": ">=4.0.1 <5.0.0",
    "either": ">=3.0.0 <4.0.0",
    "gen": ">=1.1.0 <2.0.0",
    "integers": ">=3.2.0 <4.0.0",
    "maybe": ">=3.0.0 <4.0.0",
    "partial": ">=1.2.0 <2.0.0",
    "unfoldable": ">=3.0.0 <4.0.0"
  }
}

...and the manifest index sorting is working as well, as far as I can tell.
We also need to support spago.yaml files in the legacy importer, as some packages now use that format. Otherwise they will be excluded with an error.
Here's another fun one: some packages directly import modules from packages they never declared, which were only pulled in transitively by an otherwise-unused dependency. That means we can't get away with simply removing unused dependencies, because we may also remove direct imports that were pulled in transitively by the unused dependency. Either we give up on removing unused dependencies, or we both remove unused dependencies and insert the direct imports that weren't mentioned. For dependencies we insert into a manifest we have their exact version via the solver, which we could use to derive a reasonable range. I think that's preferable to giving up on the unused dependencies check, but I'm curious if you disagree @f-f or @colinwahl.
As of the latest commit: we no longer simply remove unused dependencies. Instead, we loop: we remove unused dependencies, then bring in any transitive dependencies they would have brought in, and then check the new dependency list for unused dependencies, and so on. The result is that we remove unused dependencies while preserving any transitive dependencies they brought in which are used in the source code. Note that we don't go through and add all packages your code directly imports; we only do this for dependencies that are being removed.
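The loop described above could be sketched like this in Python (illustrative only; all names are invented, and the real implementation works over the registry's manifest and solver types):

```python
def transitive_deps(roots, direct_deps_of):
    """All packages reachable from `roots` via declared dependencies."""
    seen, stack = set(), list(roots)
    while stack:
        dep = stack.pop()
        for child in direct_deps_of.get(dep, set()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def fix_dependencies(declared, direct_deps_of, imported):
    """Repeatedly drop declared dependencies whose modules are never
    imported, but rescue any of their transitive dependencies that the
    source code does import directly."""
    deps = set(declared)
    while True:
        unused = {d for d in deps if d not in imported}
        if not unused:
            return deps
        # Keep transitive deps of the removed packages that are imported.
        rescued = transitive_deps(unused, direct_deps_of) & imported
        deps = (deps - unused) | rescued
```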
Resolved merge conflicts. Unfortunately, the integration test fails on my home system (Pop!_OS rather than NixOS), so that's going to be annoying to debug in the future: it fails to git clone the effect repository because the tmp directory already includes it. We're at least in a good position to resume work to get this over the line.
The integration test was already passing in CI, but it was failing for me locally. I've diagnosed the issue.
Now that I finally got around to running the importer all the way through and have a local cache, I can get back to splitting out the compiler jobs to be separate from the normal publish pipeline, as discussed.
package-failures.json
@thomashoneyman I would like to merge this so we can base further work on the main branch. Do you recall any good reasons for not merging and keeping it separate until we are ready?
Unfortunately it's been long enough that I can't quite remember, though some possible issues come to mind. Otherwise I think it's fine, and these may all be minor or non-issues. Is there a disadvantage to working against this branch vs. merging to main?
I should emphasize too that if this merges we will need to do the reupload right away, because during compilation the solver will consider package dependencies with the compiler version included; if no packages have any compiler versions in their metadata then new uploads are going to fail.
Right, thanks for the clarifications @thomashoneyman! So the plan is:
You would need to disable the GitHub Actions on the registry repository and sleep/kill the server while you reupload.
Fixes #577. Fixes #255. Merging blocked by #696.
The core problem solved here is identifying which compilers are compatible with a specific package version, such as [email protected]. We need this to support an oft-requested feature for Pursuit: filtering search results by compiler version (or range). It's also useful for other things; for one, it allows us to add the compiler version as a constraint to the solver to produce a build plan that works with the specific compiler given.

Metadata files now include a compilers key in published metadata that lists either a bare version (the version used to publish the package) or an array of versions (the full set of compilers known to work with the package). The reason for the two representations is that computing the full set of compilers can take a long time; this approach lets us upload the package right away and compute the rest of the valid compilers in a fixup pass. A bare version means the full set has not been computed yet.

All packages must now be solvable. We can't compile a package version if we can't fetch its dependencies, so this becomes a requirement for all packages.
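As a sketch, the two representations of the compilers key might look like this inside a package's published metadata (versions invented, sibling metadata fields omitted): a bare version for an entry whose full set hasn't been computed yet, and an array for one that has.

```json
{
  "published": {
    "1.0.0": {
      "compilers": "0.15.4"
    },
    "1.1.0": {
      "compilers": ["0.15.4", "0.15.5", "0.15.6"]
    }
  }
}
```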
There are only two scenarios in which we need to compute the available compilers for a package version: when a package version is first published, and when a new compiler is released and we need to check it against already-published packages.
This PR is focused on the first case, and we should do a followup for the second case. (The second case is straightforward, and @colinwahl's compiler versions script already essentially implements it. It's been omitted from this PR for brevity).
A new package version can be published via the legacy importer or via a user submitting an API call, but the result is the same: eventually the publish pipeline is run. For that reason I've decided to compute the compiler versions for a package version as part of the publish pipeline, where we're already determining resolutions and building with a specific compiler. That centralizes the logic in a single place.

Therefore this PR centers on two things: trying compilers to find all that work for a package version at the end of publishing, and updating the legacy importer to determine a valid compiler version and resolutions before calling publish.

I've added some tests, and I've run the legacy importer locally; it's about 500 packages in so far and every failure appears to be correct. More comments in the PR.