[Question]: #17936
Comments
The error you're encountering, [...]

Regarding managing rate limiting when parallelizing tasks, you can use an [...]

import asyncio
from tenacity import retry, stop_after_attempt, wait_exponential

async def create_index_nodes(self, nodes, embed_model):
    semaphore = asyncio.Semaphore(10)  # Limit to 10 concurrent tasks

    @retry(reraise=True, stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
    async def add_nodes_with_retry(index_struct, nodes):
        async with semaphore:
            await self.vector_index._async_add_nodes_to_index(index_struct=index_struct, nodes=nodes, show_progress=True)

    try:
        if self.vector_index is None:
            self.vector_index = VectorStoreIndex(
                nodes=nodes,
                use_async=False,
                embed_model=embed_model,
                storage_context=self.storage_context,
                show_progress=True
            )
            print(self.vector_index.index_struct)
            print("Created new index.")
        else:
            print("Loading existing index and inserting nodes.")
            await self.load_index(embed_model)
            index_struct = self.vector_index._build_index_from_nodes(nodes)
            await add_nodes_with_retry(index_struct, nodes)
    except Exception as e:
        print(e)
        print(f"An error occurred while embedding the file: {e}")

This code uses a semaphore to limit the number of concurrent tasks and the [...]

To continue talking to Dosu, mention @dosu.
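As a rough illustration of the semaphore-plus-retry idea described above, here is a minimal sketch using only the standard library. It is not LlamaIndex or tenacity code: `RateLimitError`, `call_with_retry`, `embed_all`, and `fake_embed` are hypothetical names introduced for the example, and the delays are kept tiny so the demo finishes quickly. Note that a semaphore only limits anything when several concurrent tasks share it, which is why the sketch fans out one task per batch.

```python
import asyncio


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit exception."""


async def call_with_retry(func, *args, attempts=3, base_delay=0.01, max_delay=0.1):
    """Await func(*args), retrying on RateLimitError with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return await func(*args)
        except RateLimitError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(min(base_delay * 2 ** (attempt - 1), max_delay))


async def embed_all(batches, embed_batch, max_concurrency=10):
    """Run embed_batch over all batches, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)  # shared by every worker

    async def worker(batch):
        async with semaphore:
            return await call_with_retry(embed_batch, batch)

    # gather returns results in the same order as the input batches
    return await asyncio.gather(*(worker(b) for b in batches))


# Demo with a fake embedding call that is rate limited on its first try.
calls = {}


async def fake_embed(batch):
    calls[batch] = calls.get(batch, 0) + 1
    if calls[batch] == 1:
        raise RateLimitError("simulated rate limit")
    return f"embedded:{batch}"


results = asyncio.run(embed_all(["a", "b", "c"], fake_embed, max_concurrency=2))
print(results)
```

In a real pipeline, `fake_embed` would be replaced by whatever call actually hits the embedding API, and the exception caught in `call_with_retry` would be the provider's own rate-limit error class.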
@dosubot is my code correct for loading an existing index and adding nodes asynchronously? I wasn't sure about the index_struct. Also, how do I add parameters for rate limiting and retries?
To load an existing index and add nodes asynchronously, your approach seems mostly correct. However, ensure that the [...]

import asyncio
from tenacity import retry, stop_after_attempt, wait_exponential

async def create_index_nodes(self, nodes, embed_model):
    semaphore = asyncio.Semaphore(10)  # Limit to 10 concurrent tasks

    @retry(reraise=True, stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
    async def add_nodes_with_retry(index_struct, nodes):
        async with semaphore:
            await self.vector_index._async_add_nodes_to_index(index_struct=index_struct, nodes=nodes, show_progress=True)

    try:
        if self.vector_index is None:
            # Create a new index if not already created
            self.vector_index = VectorStoreIndex(
                nodes=nodes,
                use_async=False,
                embed_model=embed_model,
                storage_context=self.storage_context,
                show_progress=True
            )
            print(self.vector_index.index_struct)
            print("Created new index.")
        else:
            # Load existing index and insert nodes
            print("Loading existing index and inserting nodes.")
            await self.load_index(embed_model)
            # Ensure index_struct is correctly initialized
            index_struct = self.vector_index._build_index_from_nodes(nodes)
            await add_nodes_with_retry(index_struct, nodes)
    except Exception as e:
        print(e)
        print(f"An error occurred while embedding the file: {e}")

Key Points: [...]

This setup should help manage rate limits and ensure nodes are added to the index correctly.
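To make the `wait_exponential(multiplier=1, min=4, max=10)` arguments in the snippet above more concrete, the following sketch computes a clamped exponential backoff schedule. It assumes the wait before the next try is `multiplier * 2 ** (attempt - 1)`, clamped to the range `[min, max]`; `backoff_delay` is a hypothetical helper written for this illustration, so check the tenacity documentation for the exact behavior of the real decorator.

```python
def backoff_delay(attempt, multiplier=1.0, min_seconds=4.0, max_seconds=10.0):
    """Seconds to wait after the given (1-based) failed attempt.

    Assumed schedule: exponential growth, clamped to [min_seconds, max_seconds].
    """
    raw = multiplier * 2 ** (attempt - 1)
    return max(min_seconds, min(raw, max_seconds))


delays = [backoff_delay(n) for n in range(1, 6)]
print(delays)  # [4.0, 4.0, 4.0, 8.0, 10.0]
```

Under this assumption, a limit of three attempts means at most two waits, and both sit at the 4-second floor, so the exponential growth never comes into play. Raising the attempt limit or lowering the minimum would be needed to see the delays actually grow.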
@dosubot but when I print index_struct it is actually empty, so I don't understand what it is for. Also, I don't want to use a Semaphore anymore; I will use my previous code, the one I sent at the beginning. I would like to know how to manage rate limiting (min_seconds, max_seconds) with VectorStoreIndex.
The [...]

Regarding rate limiting with [...]

To manage rate limiting without using [...]
Question Validation
Question
I get this error when running the code on Databricks:
Client.__init__() got an unexpected keyword argument 'proxies'
An error occurred while embedding the file: Client.__init__() got an unexpected keyword argument 'proxies'
Also, I would like to know how to handle rate limit errors, since I am parallelizing the creation of embeddings.
Can you help me, @dosubot?
This is my code:
`
async def create_index_nodes(self, nodes, embed_model):
    """
    Asynchronously creates an index in Qdrant based on the provided nodes,
    or loads the existing index and inserts nodes if the index already exists.