Skip to content

Improve LinstorSR.py to handle thick SR creation #85

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: 3.2.12-8.3
Choose a base branch
from

Conversation

rushikeshjadhav
Copy link

@rushikeshjadhav rushikeshjadhav commented Apr 21, 2025

In some cases thick SR creation may fail due to get_online_hosts as the metrics could take time. Thus, changed the mechanism to get_enabled_hosts.

Comment on lines 589 to 596
for attempt in range(3):
online_hosts = util.get_online_hosts(self.session)
if len(online_hosts) >= len(host_adresses):
util.SMlog("DEBUG: All hosts online")
break # Ok and proceed to create
else:
util.SMlog("DEBUG: Online host: {} ; Adresses: {}".format(online_hosts, host_adresses))
time.sleep(15) # Sleep and retry
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to modify the logic of get_online_hosts instead (without using host metrics, just enabled flag I guess) and not try to add a sleep call during creation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we look for reason on why host metrics were used here? Or we can introduce get_enabled_hosts to be at par with semantics and use it while leaving get_online_hosts as is. Will go with your call.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function get_online_hosts predate the git history, it won't be easy to know why

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rushikeshjadhav I agree with you, a new function is probably a good idea. This would avoid breaking the use of the current function in cleanup.py.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By looking at code, online_hosts seems necessary to fetch hostname and its ip address. There is verification logic about Multiple hosts with same hostname.

Further it calls _prepare_sr_on_all_hosts with enabled=True which subsequently calls _prepare_sr that works if host is enabled. So checking for enabled may be redundant.

This might have to be verified with someone from xapi that if fetching ip address matters in cases of enabled vs online. We can switch to enabled if ip address and hostnames are available at this stage or have to revert to online and wait for it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think enabled = True should be a sufficient condition for hostname and address to be available. Host_metrics.live is true depending on connectivity with the coordinator host checked during heartbeat, so if this is important instead of just the fact of availability of the data in the database, it should be used instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@last-genius Thanks. I have updated the code accordingly.

@rushikeshjadhav rushikeshjadhav force-pushed the 3.2.3-8.3-online-hosts branch 2 times, most recently from 697676c to 2b681af Compare April 28, 2025 08:47
@Wescoeur Wescoeur changed the base branch from 3.2.3-8.3 to 3.2.12-8.3 April 29, 2025 07:33
@Wescoeur
Copy link
Member

I changed the base branch to 3.2.12-8.3, can you rebase on it? Thanks! :)

drivers/util.py Outdated
"""
Returns a list of host refs that are enabled in the pool.
"""
enabled_hosts = [host[0] for host in session.xenapi.host.get_all_records_where('field "enabled" = "true"').items()]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
enabled_hosts = [host[0] for host in session.xenapi.host.get_all_records_where('field "enabled" = "true"').items()]
enabled_hosts = list(session.xenapi.host.get_all_records_where('field "enabled" = "true"').keys())

Wouldn't that do the trick just fine?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pythonic! Done.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if only refs are needed, session.xenapi.host.get_all_where('field "enabled" = "true"') is available, no need to get the records

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if only refs are needed, session.xenapi.host.get_all_where('field "enabled" = "true"') is available, no need to get the records

I'm getting XenAPI.Failure: ['MESSAGE_METHOD_UNKNOWN', 'host.get_all_where'], is it due to my pending updates?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, are you running 8.2? Apparently this was only added in March 2024, so should probably keep the get_all_records_where for now, could just leave a note to upgrade it in the future (and for the neighboring functions too)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its 8.3.0 with xapi-core-24.19.2-1.9.xcpng8.3.x86_64

…In some drivers e.g. Linstor, we need to ensure that hosts are enabeld before performing operations, hence this function is needed.

Signed-off-by: Rushikesh Jadhav <[email protected]>
…as the metrics could take time.

Thus, changed the mechanism to `get_enabled_hosts`.

Signed-off-by: Rushikesh Jadhav <[email protected]>
@rushikeshjadhav rushikeshjadhav force-pushed the 3.2.3-8.3-online-hosts branch from 2b681af to 2395ddb Compare April 29, 2025 08:21
@Nambrok
Copy link

Nambrok commented Apr 29, 2025

@rushikeshjadhav Have you tested this code in real condition already?

@rushikeshjadhav
Copy link
Author

@rushikeshjadhav Have you tested this code in real condition already?

Yes, it's being referred and tested in xcp-ng/xcp-ng-tests#282. It's in Draft and will convert once done.

@stormi stormi removed their request for review April 29, 2025 15:55
@stormi
Copy link
Member

stormi commented Apr 29, 2025

I won't have time to review this one.

@rushikeshjadhav rushikeshjadhav removed the request for review from last-genius April 30, 2025 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants