Segment stuck in bad state until server restart when download from deepstore failed while adding new segment #14571

Open
chrajeshbabu opened this issue Nov 30, 2024 · 2 comments

@chrajeshbabu
Contributor

chrajeshbabu commented Nov 30, 2024

A segment gets stuck in the ERROR state when its download from deep store fails with an EOFException while the server is adding it as a new segment.

{
  "segmentName": <segment_name>,
  "serverState": {
    "Server_<host>_<port>": {
      "idealState": "ONLINE",
      "externalView": "ERROR",
      "segmentSize": "0 bytes",
      "consumerInfo": null,
      "errorInfo": {
        "timestamp": "2024-11-28 19:24:50 GMT",
        "errorMessage": "Caught exception while adding ONLINE segment",
        "stackTrace": "java.io.EOFException\n\tat org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream.read(GzipCompressorInputStream.java:316)\n\tat org.apache.commons.compress.archivers.tar.TarArchiveInputStream.read(TarArchiveInputStream.java:634)\n\tat java.base/java.io.FilterInputStream.read(FilterInputStream.java:106)\n\tat org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1483)\n\tat org.apache.commons.io.IOUtils.copy(IOUtils.java:1107)\n\tat org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1456)\n\tat org.apache.commons.io.IOUtils.copy(IOUtils.java:1085)\n\tat org.apache.pinot.common.utils.TarGzCompressionUtils.untarWithRateLimiter(TarGzCompressionUtils.java:202)\n\tat org.apache.pinot.common.utils.TarGzCompressionUtils.untar(TarGzCompressionUtils.java:148)\n\tat org.apache.pinot.common.utils.TarGzCompressionUtils.untar(TarGzCompressionUtils.java:138)\n\tat org.apache.pinot.core.data.manager.BaseTableDataManager.untarSegment(BaseTableDataManager.java:835)\n\tat org.apache.pinot.core.data.manager.BaseTableDataManager.downloadSegmentFromDeepStore(BaseTableDataManager.java:783)\n\tat org.apache.pinot.core.data.manager.BaseTableDataManager.downloadSegment(BaseTableDataManager.java:730)\n\tat org.apache.pinot.core.data.manager.BaseTableDataManager.downloadAndLoadSegment(BaseTableDataManager.java:389)\n\tat org.apache.pinot.core.data.manager.BaseTableDataManager.addNewOnlineSegment(BaseTableDataManager.java:360)\n\tat org.apache.pinot.core.data.manager.offline.OfflineTableDataManager.doAddOnlineSegment(OfflineTableDataManager.java:54)\n\tat org.apache.pinot.core.data.manager.BaseTableDataManager.addOnlineSegment(BaseTableDataManager.java:313)\n\tat org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addOnlineSegment(HelixInstanceDataManager.java:275)\n\tat 
org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:131)\n\tat jdk.internal.reflect.GeneratedMethodAccessor147.invoke(Unknown Source)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:569)\n\tat org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:350)\n\tat org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:278)\n\tat org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97)\n\tat org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.base/java.lang.Thread.run(Thread.java:840)\n"
      }
    }
  }
}

I tried to reload the segment through the REST API, but the reload does not load it: the segment was never registered, so _segmentDataManagerMap has no entry for it and the reload is skipped.

2024/11/30 00:21:11.702 WARN [HelixInstanceDataManager] [HelixTaskExecutor-message_handle_thread_54] Failed to get segment data manager for segments: [<segment_name>] of table: org.apache.pinot.core.data.manager.offline.OfflineTableDataManager@52a09a91, skipping reloading them

New segment addition flow

  public void downloadAndLoadSegment(SegmentZKMetadata zkMetadata, IndexLoadingConfig indexLoadingConfig)
      throws Exception {
    String segmentName = zkMetadata.getSegmentName();
    _logger.info("Downloading and loading segment: {}", segmentName);
    File indexDir = downloadSegment(zkMetadata);
    addSegment(ImmutableSegmentLoader.load(indexDir, indexLoadingConfig));
    _logger.info("Downloaded and loaded segment: {} with CRC: {} on tier: {}", segmentName, zkMetadata.getCrc(),
        TierConfigUtils.normalizeTierName(zkMetadata.getTier()));
  }
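
To make the failure mode concrete, here is a minimal, self-contained sketch of the behaviour described above. It is not Pinot code: the class, the string-valued map, and the `reloadOrAdd` fallback are all hypothetical stand-ins, and only the ordering (download first, register afterwards) and the "skip reload when there is no entry" behaviour are taken from this report.

```java
import java.io.EOFException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the flow in this issue; none of these names are Pinot APIs.
public class SegmentRegistrationSketch {
  // Stand-in for _segmentDataManagerMap: segment name -> loaded marker.
  private final Map<String, String> segmentDataManagerMap = new ConcurrentHashMap<>();

  // Same shape as downloadAndLoadSegment: the segment is registered only
  // after the download succeeds, so a failed download leaves no entry.
  public void downloadAndLoadSegment(String segmentName, boolean simulateCorruptDownload) {
    if (simulateCorruptDownload) {
      throw new UncheckedIOException(new EOFException("simulated truncated tar.gz"));
    }
    segmentDataManagerMap.put(segmentName, "LOADED");
  }

  // Reload as observed in the log: a segment without an entry is skipped.
  public boolean reloadSegment(String segmentName) {
    return segmentDataManagerMap.containsKey(segmentName);
  }

  // Hypothetical fallback: when reload finds no entry, treat the segment as new.
  public boolean reloadOrAdd(String segmentName) {
    if (reloadSegment(segmentName)) {
      return true;
    }
    downloadAndLoadSegment(segmentName, false);
    return segmentDataManagerMap.containsKey(segmentName);
  }
}
```

In this model, a failed `downloadAndLoadSegment` followed by `reloadSegment` returns false, which matches the "skipping reloading them" warning, while `reloadOrAdd` recovers by running the add path again.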
@chrajeshbabu
Contributor Author

chrajeshbabu commented Nov 30, 2024

@xiangfu0 @Jackie-Jiang Would it be better to add the segment as a new segment when the reload finds no segment data manager for it?

@Jackie-Jiang
Contributor

From what I can tell from the exception, it seems the segment file is corrupted.
To retry, use the reset segment REST API.
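
For reference, a rough sketch of building such a reset request with the JDK HTTP client. The endpoint path `POST /segments/{tableNameWithType}/{segmentName}/reset` is my understanding of the controller API and should be checked against your controller's Swagger page. The controller URL, table name, and segment name below are placeholders, and the helper class itself is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper; only builds the request and does not send it.
public class ResetSegmentRequestSketch {
  // Assumes the table and segment names are already URL-safe.
  public static HttpRequest buildResetRequest(
      String controllerBaseUrl, String tableNameWithType, String segmentName) {
    // Assumed endpoint shape: /segments/{tableNameWithType}/{segmentName}/reset
    URI uri = URI.create(
        controllerBaseUrl + "/segments/" + tableNameWithType + "/" + segmentName + "/reset");
    return HttpRequest.newBuilder(uri)
        .POST(HttpRequest.BodyPublishers.noBody())
        .build();
  }
}
```

To send it, pass the result to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` and check the status code.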
