CDF-20156: fixup fatjar pom file (#858)
Currently
https://repo1.maven.org/maven2/com/cognite/spark/datasource/cdf-spark-datasource-fatjar_2.13/3.2.1044/cdf-spark-datasource-fatjar_2.13-3.2.1044.pom
includes
```xml
<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.13.8</version>
  </dependency>
  <dependency>
    <groupId>com.cognite.spark.datasource</groupId>
    <artifactId>cdf-spark-datasource_2.13</artifactId>
    <version>3.2.1044</version>
  </dependency>
  <dependency>
    <groupId>io.scalaland</groupId>
    <artifactId>chimney_2.13</artifactId>
    <version>0.5.3</version>
  </dependency>
</dependencies>
```
This isn't expected; the fatjar should have (almost) no dependencies.
The jar itself, though, does contain all the classes and applies shading
as expected.
It is therefore possible that pulling the fatjar from Maven Central would pull in
duplicate classes, and if unshaded non-fatjar versions are picked up, or a mix with
the fatjar ones, the datasource would crash at runtime when used, for example, in
Spark on Databricks.

Let's remove chimney from commonSettings and move it to per-target dependencies.
For the main library, the simplest way to both have it as part of the
fatjar and keep it out of the pom appears to be patching the generated pom.
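In sbt terms, the per-target move looks roughly like this (a sketch only; project names are illustrative and the real build carries many more settings):

```scala
// Before: chimney lived in commonSettings and leaked into every pom.
// After (sketch): each project that needs it declares it itself.
lazy val commonSettings = Seq(
  organization := "com.cognite.spark.datasource"
  // no shared libraryDependencies here anymore
)

lazy val structType = (project in file("struct_type"))
  .settings(
    commonSettings,
    libraryDependencies += "io.scalaland" %% "chimney" % "0.6.1"
  )
```

This way the fatjar target fully controls which dependencies end up in its own pom.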

As for scala-library, it is kept as-is; the older fatjar setup also had it,
so it is not being investigated for now.

Verified the pom contents by running `fatJarShaded/publishLocal` and inspecting
the pom on the filesystem.
Also used `fatJarShaded/assembly` to generate the fatjar and inspected it with
`jar tf` to check that it has our classes and applies the shading rules.
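The pom check itself can be scripted. Below is a sketch: the stand-in pom and its path are assumptions for illustration (after `publishLocal` the real file lives under `~/.ivy2/local/...`):

```shell
# Hypothetical check: after running
#   sbt fatJarShaded/publishLocal
#   sbt fatJarShaded/assembly
# assert that the fatjar pom does not declare the main datasource artifact.

# Stand-in pom for illustration; in practice, use the published one.
cat > /tmp/fatjar-sample.pom <<'EOF'
<project>
  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.13.8</version>
    </dependency>
  </dependencies>
</project>
EOF

# A clean fatjar pom must not reference cdf-spark-datasource_2.13:
if grep -q 'cdf-spark-datasource_' /tmp/fatjar-sample.pom; then
  echo "FAIL: library dependency leaked into fatjar pom"
else
  echo "OK: fatjar pom is clean"
fi
```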
Did not yet verify in Databricks, but even if this PR isn't enough to fix
running there, it is still a step in the right direction.

[CDF-20156]
dmivankov authored Nov 1, 2023
1 parent d473c47 commit 7ae467d
24 changes: 22 additions & 2 deletions build.sbt
@@ -1,6 +1,8 @@
import com.typesafe.sbt.packager.docker.Cmd
import sbtassembly.AssemblyPlugin.autoImport._
import sbtassembly.MergeStrategy
+import scala.xml.{Node => XmlNode, NodeSeq => XmlNodeSeq, _}
+import scala.xml.transform.{RewriteRule, RuleTransformer}

val scala212 = "2.12.15"
val scala213 = "2.13.8"
@@ -26,7 +28,7 @@ lazy val commonSettings = Seq(
organization := "com.cognite.spark.datasource",
organizationName := "Cognite",
organizationHomepage := Some(url("https://cognite.com")),
-version := "3.3." + patchVersion,
+version := "3.4." + patchVersion,
isSnapshot := patchVersion.endsWith("-SNAPSHOT"),
crossScalaVersions := supportedScalaVersions,
semanticdbEnabled := true,
@@ -35,7 +37,6 @@
description := "Spark data source for the Cognite Data Platform.",
licenses := List("Apache 2" -> new URL("http://www.apache.org/licenses/LICENSE-2.0.txt")),
homepage := Some(url("https://github.com/cognitedata/cdp-spark-datasource")),
-libraryDependencies ++= Seq("io.scalaland" %% "chimney" % "0.5.3"),
scalacOptions ++= Seq("-Xlint:unused", "-language:higherKinds", "-deprecation", "-feature"),
resolvers ++= Seq(
Resolver.sonatypeRepo("releases")
@@ -91,6 +92,7 @@ lazy val structType = (project in file("struct_type"))
name := "cdf-spark-datasource-struct-type",
crossScalaVersions := supportedScalaVersions,
libraryDependencies ++= Seq(
+"io.scalaland" %% "chimney" % "0.6.1",
"org.typelevel" %% "cats-core" % "2.9.0",
"org.apache.spark" %% "spark-sql" % sparkVersion % Provided,
),
@@ -186,6 +188,24 @@ lazy val fatJarShaded = project
)
},
assembly / assemblyOption := (assembly / assemblyOption).value.withIncludeScala(false),
+pomPostProcess := { (node: XmlNode) =>
+  new RuleTransformer(new RewriteRule {
+    override def transform(node: XmlNode): XmlNodeSeq = node match {
+      case e: Elem if e.label == "dependency"
+        && e.child.filter(_.label == "groupId").flatMap(_.text).mkString == "com.cognite.spark.datasource"
+        && e.child.filter(_.label == "artifactId").flatMap(_.text).mkString.startsWith("cdf-spark-datasource") =>
+        // Omit library artifact from pom's dependencies.
+        // All sbt-assembly settings are kept here and we can't depend on
+        // Compile / packageBin := (library / assembly).value
+        // as it would try to run sbt-assembly on library too and it doesn't have
+        // all the configuration.
+        // Otoh library itself should remain pure non-fatjar library and know nothing about sbt-assembly.
+        // There could be other ways to achieve the same effect, so far this one is found and it is simple enough.
+        Seq()
+      case _ => node
+    }
+  }).transform(node).head
+}
)

addCompilerPlugin("com.olegpy" %% "better-monadic-for" % "0.3.1")
