-
Notifications
You must be signed in to change notification settings - Fork 877
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ATLAS-4922: Atlas Async Import using Kafka #307
base: master
Are you sure you want to change the base?
Conversation
a7dabe8
to
d678555
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DishaTalreja3 - I haven't completed review yet; sending comments so far. Please review and update the patch.
atlas-examples/sample-app/src/main/java/org/apache/atlas/examples/sampleapp/SampleApp.java
Outdated
Show resolved
Hide resolved
client/client-v2/src/main/java/org/apache/atlas/AtlasClientV2.java
Outdated
Show resolved
Hide resolved
client/client-v2/src/main/java/org/apache/atlas/AtlasClientV2.java
Outdated
Show resolved
Hide resolved
client/client-v2/src/main/java/org/apache/atlas/AtlasClientV2.java
Outdated
Show resolved
Hide resolved
client/client-v2/src/main/java/org/apache/atlas/AtlasClientV2.java
Outdated
Show resolved
Hide resolved
@@ -170,6 +177,7 @@ public class NotificationHookConsumer implements Service, ActiveStateChangeHandl | |||
private static final int KAFKA_CONSUMER_SHUTDOWN_WAIT = 30000; | |||
private static final String ATLAS_HOOK_CONSUMER_THREAD_NAME = "atlas-hook-consumer-thread"; | |||
private static final String ATLAS_HOOK_UNSORTED_CONSUMER_THREAD_NAME = "atlas-hook-unsorted-consumer-thread"; | |||
private static final String ATLAS_IMPORT_CONSUMER_THREAD_PREFIX = "atlas-import-hook-consumer-thread-"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
atlas-import-hook-consumer-thread-
=> atlas-import-consumer-thread-
consumersIterator.remove(); | ||
} | ||
} | ||
notificationInterface.closeConsumer(ASYNC_IMPORT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would close all consumers of type ASYNC_IMPORT
. Shouldn't only consumers of topic
be closed instead? Perhaps only one consumer will be active for type ASYNC_IMPORT
in the current implementation status; but this part of code should better handle multiple simultaneous imports.
|
||
startConsumers(executorService); | ||
public void startImportNotificationConsumer(NotificationType notificationType, String importId, String topic) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
startImportNotificationConsumer(notificationType, importId, topic)
==> startAsyncImportConsumer(importId, topic)
ListIterator<HookConsumer> consumersIterator = consumers.listIterator(); | ||
while (consumersIterator.hasNext()) { | ||
HookConsumer consumer = consumersIterator.next(); | ||
if (consumer.getName().contains(consumerName)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
contains(consumerName)
=> startsWith(consumerName)
new SynchronousQueue<>(), // Direct handoff queue | ||
new ThreadFactoryBuilder().setNameFormat(THREADNAME_PREFIX + " thread-%d").build()); | ||
} | ||
executors = executorService; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review all writes to executors
i.e. following methods:
- startInternal()
- startConsumers()
- stop()
Prior to this patch, these methods are to be called only once in an Atlas instance. However, now executors
can be overwritten for every call to async import. Review the usage carefully and update to avoid inappropriate overwrites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this new approach the executors are overWritten in
startConsumers() - only when the passed in executorService is null (essentially at the start of the instance or start of the first async import request). Please correct me f I am wrong.
8c56a08
to
5a3a995
Compare
5a3a995
to
69ecd1b
Compare
@@ -224,7 +224,7 @@ public enum AtlasErrorCode { | |||
GLOSSARY_IMPORT_FAILED(409, "ATLAS-409-00-011", "Glossary import failed"), | |||
METRICSSTAT_ALREADY_EXISTS(409, "ATLAS-409-00-012", "Metric Statistics already collected at {0}"), | |||
PENDING_TASKS_ALREADY_IN_PROGRESS(409, "ATLAS-409-00-013", "There are already {0} pending tasks in queue"), | |||
IMPORT_ALREADY_IN_PROGRESS(409, "ATLAS-409-00-016", "Given import request {0} is already in progress or completed"), | |||
ABORT_IMPORT(409, "ATLAS-409-00-016", "Import id={0} is currently in state {1}, cannot be aborted"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ABORT_IMPORT
=> IMPORT_ABORT_NOT_ALLOWED
intg/src/main/java/org/apache/atlas/model/impexp/AtlasAsyncImportRequest.java
Outdated
Show resolved
Hide resolved
@JsonIgnoreProperties(ignoreUnknown = true) | ||
public class AtlasAsyncImportRequest extends AtlasBaseModelObject implements Serializable { | ||
private static final long serialVersionUID = 1L; | ||
public static final String ASYNC_IMPORT_TYPE_NAME = "__AtlasAsyncImportRequest"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constants defined in line 47-49 seem to be used/needed only in Atlas server side. If yes, please move these to server side (like repository module).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These constants are used in this model class as well. Moving them to the repository module might introduce unnecessary dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of the classes in model package referenes ASYNC_IMPORT_TYPE_NAME
. This is referenced only from AsyncImportService
and AtlasAsyncImportRequestDTO
.
} | ||
|
||
@JsonIgnoreProperties(ignoreUnknown = true) | ||
public static class ImportDetails { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- add
implements Serializable
failures
is marked asfinal
. How does this work drring deserialization?failures
is initialized withConcurrentHashMap
. Is this necessary? Json deserializer might assign a Hashmap instance.
@JsonIgnoreProperties(ignoreUnknown = true) | ||
@XmlRootElement | ||
@XmlAccessorType(XmlAccessType.PROPERTY) | ||
public class ImportNotification extends HookNotification implements Serializable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ImportNotification
is not instantiated anywhere. Consider following updates:
- make
ImportNotification
constructor as protected - have
AtlasTypeDefImportNotification
andAtlasEntityImportNotification
extendImportNotification
- move
importId
field toImportNotification
private String importId; | ||
|
||
@JsonProperty | ||
private AtlasEntityWithExtInfo entities; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
entities
=> entity
KafkaProducer producer = getOrCreateProducer(topic); | ||
|
||
sendInternalToProducer(producer, topic, messages); | ||
sendInternal(topic, messages, SORT_NOT_NEEDED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. I get it now. SORT_NOT_NEEDED
doesn't add much to readability, given the called method is right above. I suggest to simply send false
here.
notification/src/main/java/org/apache/atlas/notification/rest/RestNotification.java
Show resolved
Hide resolved
notification/src/main/java/org/apache/atlas/notification/spool/Spooler.java
Show resolved
Hide resolved
notification/src/test/java/org/apache/atlas/notification/spool/AtlasFileSpoolTest.java
Show resolved
Hide resolved
…mments and made improvements
7b2eb97
to
0f215e7
Compare
public class AsyncImportStatus { | ||
private String importId; | ||
private AtlasAsyncImportRequest.ImportStatus status; | ||
private String importRequestReceivedAt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should importRequestReceivedAt
be of type Date
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept importRequestReceivedAt as a String to avoid the default timestamp serialization of Date and to ensure a more readable format.
private String importId; | ||
private AtlasAsyncImportRequest.ImportStatus status; | ||
private String importRequestReceivedAt; | ||
private String importRequestReceivedBy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does field importRequestReceivedBy
capture? Is it username or hostname or anything else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the username
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
importRequestReceivedBy
is username? Which user would it be? Atlas server principal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the requesting user, retrieved using Servlets.getUserName(httpServletRequest)
. This is also tracked in the existing import API and is part of the AtlasImportResult object.
intg/src/main/java/org/apache/atlas/model/impexp/AsyncImportStatus.java
Outdated
Show resolved
Hide resolved
@JsonIgnoreProperties(ignoreUnknown = true) | ||
public class AtlasAsyncImportRequest extends AtlasBaseModelObject implements Serializable { | ||
private static final long serialVersionUID = 1L; | ||
public static final String ASYNC_IMPORT_TYPE_NAME = "__AtlasAsyncImportRequest"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of the classes in model package referenes ASYNC_IMPORT_TYPE_NAME
. This is referenced only from AsyncImportService
and AtlasAsyncImportRequestDTO
.
intg/src/main/java/org/apache/atlas/model/impexp/AtlasAsyncImportRequest.java
Outdated
Show resolved
Hide resolved
intg/src/main/java/org/apache/atlas/model/notification/ImportNotification.java
Outdated
Show resolved
Hide resolved
intg/src/main/java/org/apache/atlas/model/notification/ImportNotification.java
Outdated
Show resolved
Hide resolved
intg/src/main/java/org/apache/atlas/utils/AtlasAsyncImportTestUtil.java
Outdated
Show resolved
Hide resolved
intg/src/test/java/org/apache/atlas/utils/TestAtlasAsyncImportTestUtil.java
Outdated
Show resolved
Hide resolved
repository/src/main/java/org/apache/atlas/repository/impexp/AsyncImportService.java
Outdated
Show resolved
Hide resolved
repository/src/main/java/org/apache/atlas/repository/impexp/ImportService.java
Show resolved
Hide resolved
repository/src/main/java/org/apache/atlas/repository/impexp/ImportService.java
Show resolved
Hide resolved
...ory/src/main/java/org/apache/atlas/repository/store/graph/v2/bulkimport/MigrationImport.java
Outdated
Show resolved
Hide resolved
webapp/src/main/java/org/apache/atlas/notification/NotificationHookConsumer.java
Outdated
Show resolved
Hide resolved
af12b9f
to
099b166
Compare
099b166
to
83ed723
Compare
…ortTaskListenerImplTest
What changes were proposed in this pull request?
The existing synchronous import API causes performance bottlenecks with large or multiple requests. This patch introduces asynchronous imports using Kafka to queue requests via dedicated topics, enabling users to submit multiple import requests concurrently.
New APIs added to:
How was this patch tested?
Manual Testing, Unit Tests