Expand "Configure a lifecycle policy" docs #1906

Open
wants to merge 11 commits into
base: main
from

Conversation

kilfoyle
Contributor

@kilfoyle kilfoyle commented Jun 24, 2025

This PR takes a chunk out of the internal issue "Requested fixes for data lifecycle docs", which was initiated by a review of the ILM docs by our Support team.

It updates the "Configure a lifecycle policy" page to:

  • Add Kibana steps where we currently show only the API steps.
    • In particular, when creating an index template users aren't always sure what to specify on the "Index settings" tab, so this adds an example config.
  • Add a page overview
  • Add a section about viewing the ILM status for an index or datastream
  • Fix up smaller items, such as:
    • Explicitly call out the Kibana "Data retention" option.
    • Emphasize that lifecycle phase changes are based on time since rollover rather than index creation time
    • Warn about updating the logs@lifecycle and metrics@lifecycle policies since they affect a LOT of indices.
  • Provide links for things like the index lifecycle actions, mappings, etc., to help people understand these options.

Please see preview page: Configure a lifecycle policy

ES Data Management team, if any of you can please give this a technical review I'd be very grateful! 🙏 The API instructions aren't changed, with the exception that I added this section about calling the ILM explain API. The Kibana steps are all new.

@kilfoyle kilfoyle changed the title from 1572/ilm create template to Expand "Configure a lifecycle policy" docs Jun 24, 2025
@kilfoyle kilfoyle marked this pull request as ready for review June 24, 2025 17:38
@kilfoyle kilfoyle requested a review from a team as a code owner June 24, 2025 17:38
@kilfoyle kilfoyle requested a review from a team June 24, 2025 17:39
Contributor

@samxbr samxbr left a comment

Thanks for making this documentation change, I love the new Kibana steps! I just left a few comments.

@gmarouli gmarouli left a comment

@kilfoyle thank you for doing this, how nice to see it getting some love ❤️ .

I found some spots that might be misleading or incorrect. Let me know if you want to go over some of them offline as well, if that helps.

Comment on lines 130 to 134
1. If you're storing continuously generated, append-only data, you can opt to create [data streams](/manage-data/data-store/data-streams.md) instead of indices for more efficient storage. If you enable this option, you can also enable **Data retention** to configure how long your indexed data is kept.

::::{tip}
An `index.lifecycle.rollover_alias` setting is only required if using {{ilm}} with an alias. It is unnecessary when using [Data Streams](../../data-store/data-streams.md).
::::
:::{important}
When the **Data retention** option is set, data is guaranteed to be stored for the specified retention duration. Elasticsearch is allowed at a later time to delete data older than this duration. This setting replaces any data retention settings that may be defined in an ILM policy. Refer to the [Data stream retention](/manage-data/lifecycle/data-stream/tutorial-data-stream-retention.md) tutorial to learn more.
:::

Referring to data retention here is incorrect. Data retention as described in the referenced tutorial is only applicable if the user is using the data stream lifecycle, which is an alternative to ILM. Considering this is an ILM tutorial, I think we should refrain from mentioning it altogether.

A user can still enable the data stream option, it's just that their data stream will be managed by ILM.

Contributor Author

@kilfoyle kilfoyle Jun 25, 2025

Thanks! I've changed it. The Support person who reviewed these docs with us told me that that data retention setting is causing some confusion and that it's not currently documented anywhere. I'll try to fix that separately, but for here, rather than not mentioning data retention do you think this note would be okay instead of what I have above?

NOTE: Since you're creating an index lifecycle policy to manage indices, the Data retention option should be left disabled. Data retention is applicable only if you're using a data stream lifecycle, which is an alternative to ILM. Refer to the Data stream lifecycle to learn more.

Hm, I see. Can you explain to me what the confusion is about and what do you mean when you say Data retention?

I am asking because it's not clear to me what we mean when we say the Data retention option should be left disabled. Is there a screen in Kibana or something?

Contributor Author

@gmarouli Yup. I understand that it's a new Kibana setting.

On the "Create template" wizard, "Create data stream" is enabled by default and the "Data retention" setting appears but is disabled by default. If I disable "Create data stream" the "Data retention" setting disappears.

The concern that Zoia mentioned is that if someone enables "Data retention" and sets the retention to, say, 30 days, "it doesn't matter how many tiers you have, Elasticsearch will only keep the data on the hot tier for 30 days and then will delete it." So I guess we want people to understand that this setting would effectively enable data stream lifecycle and override any ILM policy they have configured.

(By the way, I'm happy to share the recording of that feedback session or my long, messy set of notes.)


[Screenshot: data-retention-setting]
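
To make the distinction in this thread concrete, here is a minimal Python sketch that builds two index template bodies: one whose data stream is managed by an ILM policy, and one that uses a data stream lifecycle with a retention period. The pattern, policy name, and helper names are hypothetical, and flat dotted setting keys are assumed. The `configures_both` helper only flags a template that sets both mechanisms; it makes no claim about which one wins when both are present.

```python
def ilm_template(policy_name, patterns):
    """Template body for a data stream whose backing indices are managed by ILM."""
    return {
        "index_patterns": patterns,
        "data_stream": {},
        "template": {"settings": {"index.lifecycle.name": policy_name}},
    }


def dsl_template(retention, patterns):
    """Template body that uses a data stream lifecycle with a retention period."""
    return {
        "index_patterns": patterns,
        "data_stream": {},
        "template": {"lifecycle": {"data_retention": retention}},
    }


def configures_both(template_body):
    """Return True if the template sets an ILM policy and a data stream lifecycle.

    Assumption: settings use flat dotted keys, as in the bodies built above.
    """
    template = template_body.get("template", {})
    has_ilm = "index.lifecycle.name" in template.get("settings", {})
    has_dsl = "lifecycle" in template
    return has_ilm and has_dsl
```

A check like this could be run over exported templates to spot the configuration that the Support feedback describes as confusing.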

Comment on lines 464 to 466
:::::{warning}
Be careful when changing either the `logs@lifecycle` or `metrics@lifecycle` policies as these typically manage many indices. In {{kib}}, the **Index Lifecycle Policies** table shows the number of indices currently associated with each policy.
:::::

This is a bit misleading. If I am not mistaken, these are managed policies, meaning they are shipped along with Elasticsearch. The recommendation for such policies is that the user should not change them, ever. If a user changes them, there is no guarantee that a future upgrade will not overwrite them. In general, we recommend creating a new policy and associating it with the intended index templates or indices.

Contributor Author

Perfect. I've changed this to:

[Screenshot: screen6]

(see here)

Should we add a course of action to ensure the user can recover from removing a policy? Although it's not easy to anticipate everything that could go wrong.

Contributor Author

a course of action to ensure the user can recover from removing a policy

I think that's something we could add to the Troubleshooting section of the docs, and then have a link from this page. If that's something we should write up I'd open a separate issue for it rather than tackle it in this PR.
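
As an illustration of the "create a new policy instead of editing a managed one" recommendation in this thread, the following Python sketch builds a request body for the create-or-update lifecycle policy API (`PUT _ilm/policy/<name>`). The policy name, thresholds, and helper names are hypothetical placeholders, not recommended values.

```python
def build_policy(rollover_max_age="30d", rollover_max_size="50gb", delete_after="90d"):
    """Body for a custom policy with a hot phase (rollover) and a delete phase.

    All thresholds are illustrative assumptions. As the PR description notes,
    phase timing is based on time since rollover rather than on index
    creation time, so delete_after is counted from rollover here.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_age": rollover_max_age,
                            "max_primary_shard_size": rollover_max_size,
                        }
                    }
                },
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


def policy_path(policy_name):
    """Request path for creating or updating the named policy."""
    return f"/_ilm/policy/{policy_name}"
```

The resulting policy would then be referenced from an index template through `index.lifecycle.name`, leaving `logs@lifecycle` and `metrics@lifecycle` untouched.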

@kilfoyle
Contributor Author

@gmarouli and @samxbr Thanks so much for the careful review. 🙏

If you don't mind, I'd love it if you could take a second look. And do let me know if you don't think the new structure makes sense (I tried to explain the rationale in this comment).

You can also manually apply a lifecycle policy to an existing index, as described here. You can do this in {{kib}} or using the {{es}} API.

::::{important}
Do not manually apply a policy that uses the rollover action. Policies that use rollover must be applied by the index template. Otherwise, the policy is not carried forward when the rollover action creates a new index.
Contributor

This statement is good advice & also we apply policies with rollover all the time in Support outside the template. I believe your call out is to not apply a policy to an index where rollover has yet to occur.

Ex: index 1+2 associated to policy A, policy A removed from 1 for manual intervention (ex rehydrating frozen tier), policy A re-added to 1, 1 ILM Move Step pushed past rollover.

Contributor Author

@kilfoyle kilfoyle Jun 26, 2025

@stefnestor Thanks for the explanation! I'm still not sure exactly what the warning should be (it's from the current docs), so based on what you've said I've changed it to:

WARNING: Do not manually apply a policy that uses the rollover action to an index which has not yet rolled over. Otherwise, the policy may not be carried forward when the rollover action creates a new index.

::::
* To use a policy to manage a single index, you can specify a lifecycle policy when you create the index, or apply a policy directly to an existing index.

* {{ilm-init}} policies are stored in the global cluster state and can be included in snapshots by setting `include_global_state` to `true` when you [take the snapshot](../../../deploy-manage/tools/snapshot-and-restore/create-snapshots.md). When the snapshot is restored, all of the policies in the global state are restored and any local policies with the same names are overwritten.
Contributor

I believe any existing policies that are not in the snapshot are also deleted, no? It doesn't just update existing policies; it fully resets them to the snapshot's state.

Contributor Author

I'm sorry but I'm really not sure. Maybe someone else can weigh in here?
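
For reference, here is a small Python sketch of the request bodies the quoted bullet refers to: a create-snapshot body and a restore body, both with `include_global_state` set. The helper names and the index pattern are hypothetical. The sketch does not settle the open question above about what happens to policies that are absent from the snapshot.

```python
def snapshot_body(indices="*", include_global_state=True):
    """Body for PUT /_snapshot/<repository>/<snapshot>.

    With include_global_state set to True, the cluster's global state,
    which is where ILM policies are stored, is included in the snapshot.
    """
    return {"indices": indices, "include_global_state": include_global_state}


def restore_body(indices="*", include_global_state=True):
    """Body for POST /_snapshot/<repository>/<snapshot>/_restore."""
    return {"indices": indices, "include_global_state": include_global_state}
```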

1. On the **Index settings** page:
1. Configure ILM by specifying the [ILM settings](https://www.elastic.co/docs/api/doc/elasticsearch/configuration-reference/index-lifecycle-management-settings#_index_level_settings_2) to apply to the indices:
* `index.lifecycle.name` - The lifecycle policy to manage the created indices.
* `index.lifecycle.rollover_alias` - The index [alias](/manage-data/data-store/aliases.md) used for querying and managing the set of indices associated with a lifecycle policy that contains a rollover action.
Contributor

At this point, could we consider collapsing the existing doc into this? This has screenshots but the previous does exist & is commonly linked by Support : https://www.elastic.co/docs/manage-data/lifecycle/index-lifecycle-management/tutorial-automate-rollover

Contributor Author

@kilfoyle kilfoyle Jun 27, 2025

@stefnestor I'm not sure about why we have both the "Tutorial: Automate rollover" and "Configure a lifecycle policy" that I'm working on here. There's overlap for sure. I suppose the former is intended only for data streams while the latter is more general, more basic. Combining these would be tricky I think but I'm open to ideas.

One note: In the recording I have of the review of these docs the number one reported problem is that users aren't sure what to put in the "index settings" tab when they create an index template for ILM. I was asked to make sure there's an example that they can follow, so this is how it would look. If we should change this or pull in some content from that rollover tutorial I'm happy to do that.

--

[Screenshot 2025-06-27 at 12 33 57 PM]

Contributor Author

Anyhow, I think we'd need to keep this page and the rollover tutorial separate, or else this PR will become super complicated. Let me know if you think that's okay, please.
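
Since the most reported problem is not knowing what to enter on the "Index settings" tab, here is a rough Python sketch that generates the kind of JSON a user might paste there. It assumes the tab accepts settings as flat dotted keys, and the policy and alias names are hypothetical. Following the quoted tip, `index.lifecycle.rollover_alias` is only emitted when an alias is supplied, because it is unnecessary with data streams.

```python
import json


def index_settings(policy_name, rollover_alias=None):
    """Build ILM index settings; include rollover_alias only for alias-based setups."""
    settings = {"index.lifecycle.name": policy_name}
    if rollover_alias is not None:
        settings["index.lifecycle.rollover_alias"] = rollover_alias
    return settings


# Data stream case: only the policy name is needed.
print(json.dumps(index_settings("my-policy"), indent=2))

# Alias-based case: the rollover alias is included as well.
print(json.dumps(index_settings("my-policy", "my-alias"), indent=2))
```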

You can do this procedure in {{kib}} or using the {{es}} API.

::::{warning}
Do not manually apply a policy that uses the rollover action to an index which has not yet rolled over. Otherwise, the policy may not be carried forward when the rollover action creates a new index.
Contributor

🙈 Sorry, why not, as long as the template is set up to catch it?

Contributor Author

This is the note as it appears in the current docs here:

[Screenshot: rollover-action-note]

Please let me know if you think we should rephrase it somehow, or otherwise I can just remove it.


:::{tab-item} API
:sync: api
Use the [update settings API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-settings) to apply a lifecycle policy to an index.
Contributor

Question on what you're going for vs what I'd expect this section to append:

When you apply an ILM policy, it always runs sequentially from the top (so hot, possibly including rollover). Even if your index previously rolled over, applying a new ILM policy, or applying one for the first time, sends you back to the top of the queue. So you may actually want the rollover_alias and/or indexing_complete settings mentioned before, and/or an ILM Move step to get to wherever in the new policy flow you want to be.

Contributor Author

@kilfoyle kilfoyle Jun 27, 2025

This content comes from the current docs: Apply a lifecycle policy manually section which is part of the "Configure a lifecycle policy" page. As Sam explained here, it doesn't seem to fit on that page because if I create an index with the right name, the ILM policy will be applied automatically.

I thought we could move this "Apply a lifecycle policy manually" content to be a separate page, describing the simplest case of someone adding an ILM policy to an index that doesn't already have one. We could also just remove the content if it's not useful. If we want something more complex like using indexing_complete, ILM Move, etc., I can document that but it would help me a lot if someone can demo the procedure since I'm a complete ILM amateur. :-)
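
As a starting point for that discussion, here is a hedged Python sketch of the request sequence this thread describes for an index that should not roll over again: remove the current policy, then apply the new one through the update settings API, optionally setting `index.lifecycle.indexing_complete`. The function name and the index and policy names are hypothetical. The sketch only builds the requests and leaves out the ILM Move step.

```python
def switch_policy_requests(index, new_policy, indexing_complete=False):
    """Return (method, path, body) tuples for switching an index's ILM policy.

    Step 1 removes the existing policy with the remove-policy API.
    Step 2 applies the new policy through the update settings API. When
    indexing_complete is True, the setting is included so that the new
    policy's rollover action is skipped for this index.
    """
    settings = {"index.lifecycle.name": new_policy}
    if indexing_complete:
        settings["index.lifecycle.indexing_complete"] = True
    return [
        ("POST", f"/{index}/_ilm/remove", None),
        ("PUT", f"/{index}/_settings", settings),
    ]


for method, path, body in switch_policy_requests("my-index-000001", "my-new-policy", True):
    print(method, path, body)
```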

- id: elasticsearch
---

# View the lifecycle status of an index [view-lifecycle-status]
Contributor

If I may make a request: there is no built-in way to view this across all indices (you can only filter Index Management by phase or errors; there are no aggregate stats). I had requested https://github.com/elastic/elasticsearch/pull/99612/files, which Dev didn't want to document, preferring to improve the product instead, but Support still ends up sending out that JQ as our only option for an aggregate "how's it doing" view.

What about the ILM indicator in the health report? This should report any stagnating indices. Would that be sufficient?

Contributor

Hypothetically yes, but in practice it's a good data point but an insufficient overview for Support due to bugs
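
To illustrate the kind of aggregate view being asked for, here is a minimal Python sketch that summarizes an ILM explain response (`GET <target>/_ilm/explain`) by phase and lists indices sitting on the `ERROR` step. It is not the JQ from the linked PR. The sample response is hypothetical and heavily trimmed, and the function name is an assumption.

```python
from collections import Counter


def summarize_ilm_explain(response):
    """Count indices per ILM phase and list indices whose step is ERROR."""
    phases = Counter()
    errored = []
    for name, info in response.get("indices", {}).items():
        if not info.get("managed", False):
            phases["unmanaged"] += 1
            continue
        phases[info.get("phase", "unknown")] += 1
        if info.get("step") == "ERROR":
            errored.append(name)
    return dict(phases), sorted(errored)


# Hypothetical, trimmed-down response used only for illustration.
sample = {
    "indices": {
        "logs-000001": {"managed": True, "phase": "warm", "step": "complete"},
        "logs-000002": {"managed": True, "phase": "hot", "step": "ERROR"},
        "logs-000003": {"managed": True, "phase": "hot", "step": "check-rollover-ready"},
        "scratch": {"managed": False},
    }
}

phases, errored = summarize_ilm_explain(sample)
print(phases)
print(errored)
```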

@stefnestor
Contributor

👋 @kilfoyle hiya

I'm feeling a little frazzled at my EOD today so sorry if any of my comments below or in-line are miss-not-hit.

Apologies if it's out of scope of what you're intending, but I have comments which don't currently fit into the PR>files ➕ icon to just edit in-line & am not sure how to express

  • data tiers
    • could/should header#4 show on the right-side TOC?
      • image
    • Per this searchable snapshots can happen on hot+cold+frozen for quotes but only guaranteed frozen. there's various wrong logic from not understanding SS can happen on hot in the steps after these quotes

      The hot and warm tiers store regular indices, while the frozen tier stores searchable snapshots. However, the cold tier can store either regular indices or searchable snapshots.

      When data reaches the cold or frozen phases, it is automatically converted to a searchable snapshot by ILM.

    • "Move shards off the nodes to be removed from the cluster."
      • they may want to push farther down not up (e.g. warm>cold not warm>hot, maybe noting frozen would be invalid option)
      • needs to ensure to call out they need to drain all shards (confirm with CAT Shards) not just update allocation (not sure how you want to handle node attrs saying "we'll do that during plan") but calling out because frequent support volume for total_shards_per_node and/or watermark on destination to block migrations off
    • "If you do not intend to delete this data, you should manually restore each of the searchable snapshot indices to a regular index before disabling the data tier, by following these steps" fully mounted could just be ported across hot+cold, no? it doesn't have to be rehydrated.
    • "Capture a comprehensive list of index and searchable snapshot names." I believe this assumes repo is found-snapshots but users can have custom repos so this isn't sufficiently valid
    • "Remove the associated ILM policy (set it to null). If you want to apply a different ILM policy, follow the steps to Switch lifecycle policies." if you're modifying it you always set to null and then remove and then add the new ilm, you do not set <new-policy-name>
    • "Optionally, specify the desired number of replica shards." > "index.lifecycle.rollover_alias": "<alias-for-rollover>": we usually don't re-add a rollover alias because we're not ingesting more into the index, instead we usually set "index.lifecycle.indexing_complete": true to bypass the potential new ilm policy's rollover action (which noop if NA)
    • FWIW
    • these are not equal nor even generally related sub-sections. IMO maybe "remove a data tier" should be a par-header since it applies across the board saying something like "to remove a data tier we recommend draining off shards first to avoid data loss, ECE+ECK+ECH will enforce this"
      • image
      • if you disagree then FWIW "Elastic Cloud Hosted and Elastic Cloud Enterprise try to move all data from the nodes that are removed during plan changes." also applies to ECK software-side.
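
The "confirm with CAT Shards" point above can be sketched in Python as a check over the JSON output of `GET _cat/shards?format=json`. The node and index names are hypothetical, and the handling of relocating shards assumes the node column reads `source -> ... target`.

```python
def shards_remaining_on(cat_shards_rows, node_names):
    """Return (index, shard, prirep) for shards still hosted on the given nodes."""
    targets = set(node_names)
    remaining = []
    for row in cat_shards_rows:
        node = row.get("node") or ""  # unassigned shards have no node
        # Assumption: a relocating shard reports "source -> ... target",
        # so the source node is the part before the arrow.
        source = node.split(" -> ")[0]
        if source in targets:
            remaining.append((row["index"], row["shard"], row["prirep"]))
    return remaining


# Hypothetical rows in the shape returned with format=json.
rows = [
    {"index": "logs-000001", "shard": "0", "prirep": "p", "state": "STARTED", "node": "warm-1"},
    {"index": "logs-000001", "shard": "0", "prirep": "r", "state": "STARTED", "node": "cold-1"},
    {"index": "logs-000002", "shard": "0", "prirep": "p", "state": "RELOCATING",
     "node": "warm-1 -> 10.0.0.5 abc123 cold-1"},
    {"index": "logs-000003", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "node": None},
]

print(shards_remaining_on(rows, ["warm-1"]))
```

An empty result would suggest that the nodes are drained. Settings such as `total_shards_per_node` or disk watermarks on the destination can still block the migration, as noted above.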

@kilfoyle
Contributor Author

kilfoyle commented Jun 27, 2025

Thanks @stefnestor. For your comments above about the Data Tiers docs I've opened an issue: #1956. I'd need guidance from you or others about exactly what doc updates to make based on the questions in that issue, but for now your feedback is stored safely. :-)

I've tried to address your other comments but I had a few questions. Please reply whenever you can (no rush :-) ).

@kilfoyle kilfoyle requested a review from stefnestor June 27, 2025 17:44
4 participants