provider: refactor bidengine startup #281

Closed
troian opened this issue Feb 5, 2025 · 1 comment · Fixed by akash-network/provider#267
Labels
repo/provider Akash provider-services repo issues

Comments

@troian
Member

troian commented Feb 5, 2025

The bid engine queries the blockchain for open orders at startup.
Because of how the store and queries are currently implemented in the blockchain, it has to iterate over all orders since genesis to find the open ones.
Last time I checked, there were over 500M total orders in the store.

The provider's order query implementation causes a few issues:

  • The bid engine starts after the cluster service. If bid engine startup fails and there is any existing deployment in a pending state (due to the startup check), the cluster service tears down all pending deployments.
  • Bid engine failures at startup often happen because of a context timeout or because the RPC node is unresponsive due to a previous orders query.

Fixes

  1. Disable teardown of pending leases in the cluster service.
  2. Refactor the bid engine so that the query for existing orders runs in a separate goroutine after the bid engine service starts,
    and set the limit for orders pagination to 2000-3000 instead of 10000.
    This:
    • lets the bid engine start immediately and bid on all new orders
    • prevents unnecessary restarts of the provider service due to query failure
@troian troian self-assigned this Feb 5, 2025
@troian troian added the repo/provider Akash provider-services repo issues label Feb 5, 2025
@troian troian moved this to In Progress (prioritized) in Core Product and Engineering Roadmap Feb 5, 2025
@brewsterdrinkwater brewsterdrinkwater moved this from In Progress (prioritized) to In Test (or staging) in Core Product and Engineering Roadmap Feb 5, 2025
@brewsterdrinkwater
Collaborator

Feb 5th, 2025:

  • Feature thoroughly tested by a few providers.
  • Updates will be shared with providers.

troian added a commit to akash-network/provider that referenced this issue Feb 5, 2025
Due to the increased number of orders (over 500M),
querying the traditional way is ineffective because:
- the original implementation does only one query with the limit set to 10000,
  which may overload the RPC node
- this leads to a provider service restart followed by teardown of all
  leases

refs troian/pubsub#3
fixes akash-network/support#281

Signed-off-by: Artur Troian <[email protected]>
@github-project-automation github-project-automation bot moved this from In Test (or staging) to Released (in Prod) in Core Product and Engineering Roadmap Feb 5, 2025
Projects
Status: Released (in Prod)