-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-18682. Move hadoop docker scripts under the main source code #6483
Conversation
@ayushtkn Can you take a look at this PR? |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi thanks for working on this, but I'm curious
does it mean building Hadoop requires Docker?
does it increase build time, because now it must build the docker image?
Like, if I make a small incremental change to my code, does it rebuild the entire docker image from scratch
Hi @jojochuang, it's not the same as running When you run Here, when you enable the For reference, these are the times on my machine for this branch
> mvn clean install -Dmaven.javadoc.skip=true -DskipTests -DskipShade
------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:24 min
> mvn clean install -Dmaven.javadoc.skip=true -DskipTests -DskipShade -Pdist,src
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 06:18 min That's how long it takes in trunk as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it increase build time, because now it must build the docker image?
The build time should increase theoretically when using the Pdist
profile, because we are doing stuff for docker image, not sure how much, we need to see the time diff with & without the patch....
Like, if I make a small incremental change to my code, does it rebuild the entire docker image from scratch
I believe yes, I don't think there is any incremental thing while building docker images
I think Ozone even builds the docker images during: https://github.com/apache/ozone/blob/master/hadoop-ozone/dist/pom.xml#L372
there is documentation for that:
https://ozone.apache.org/docs/1.0.0/start/fromsource.html
so, it is similar thing for hadoop
@@ -0,0 +1,48 @@ | |||
<!-- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this readme will not get published to the website, either add details here:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/site/markdown/SingleCluster.md.vm
or create a heading in the end like it is there for Fully-Distributed Operation
in this page & then add a new page for docker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ayushtkn I've updated it with a new page for docker. I don't know how to test if it will get published in the website.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do a mvn clean site and it will build the website locally, and you can check that
@ayushtkn Thanks for the review.
BTW, that's where I got the idea for this PR.
The times I posted above, are for this branch. Let me get the times for trunk as well and post them in another comment. Because we are mounting the dist directory as a docker volume for the containers, there isn't any extra time needed in the build other than the time we spend building the dist directory. It's not building an official hadoop image using a Dockerfile. |
Thanx @xBis7 for the details. I haven't tried the code locally yet, but on a quick look, it looks fine. If @jojochuang is convinced, it is good with me |
💔 -1 overall
This message was automatically generated. |
Hi @jojochuang any comments on this? |
💔 -1 overall
This message was automatically generated. |
@xBis7 the compilation is failing in your PR, |
Instead of building |
@adoroszlai you are right. Now that I'm going over this PR again, it can be simplified. All we need is a base image and to mount the generated distribution files. Is there a way to run the CI locally before pushing to the PR? |
I tried it locally and it looks like it's working like that
❯ docker image ls | grep hadoop
apache/hadoop-runner docker-hadoop-3 0bcc0ce3086d 5 years ago 1.06GB The Also, the build succeeds. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@adoroszlai I updated the README and also changed the hadoop-runner version as suggested. I tested it locally and there are no issues. |
hadoop-common-project/hadoop-common/src/site/markdown/HadoopDocker.md
Outdated
Show resolved
Hide resolved
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
test failures doesn't look related, some OOM related. @adoroszlai does this look good now? |
Looks good. Since the new |
Thanx @xBis7 & @adoroszlai |
Thanks @ayushtkn @adoroszlai for the help! |
Description of PR
There are special branches for running hadoop in docker
docker-hadoop-runner-latest
docker-hadoop-runner-jdk11
docker-hadoop-runner-jdk8
docker-hadoop-runner
docker-hadoop-3
docker-hadoop-2
These branches, run specific versions of hadoop. For example, branch
docker-hadoop-3
runshadoop-3.3.6
.This patch is adding a similar setup under the main source code which will be running the latest trunk. This will be useful for testing code changes and debugging. It will also make it easier for new hadoop contributors to test the project and get familiar with it.
The files were originally copied from the branch
hadoop-docker-3
and were modified so that the code comes from trunk and maven builds the docker image instead of the user.More specifically, in branch
hadoop-docker-3
With these changes, in trunk
hadoop-<version>
underhadoop-dist/target
Jira: https://issues.apache.org/jira/browse/HADOOP-18682
How was this patch tested?
This patch was tested manually in a local env. It's adding a setup for running hadoop in docker. A
README.md
has been added with instruction for testing the changes locally.I've also created a patch and tested it against trunk with
dev-support/bin/test-patch /tmp/1.patch
.To test that the docker environment uses the latest code from trunk, I added a LOG prefix to this
and then built the project again, started the docker env and checked the logs
To test the patch locally
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?