
ENH: Support concurrent embedding, update LangChain QA demo with multithreaded embedding creation #348

Open
wants to merge 13 commits into base: main
241 changes: 185 additions & 56 deletions examples/LangChain_QA.ipynb
@@ -11,7 +11,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This demo walks through how to build an LLM-driven question-answering (QA) application with Xinference, Milvus, and LangChain."
"This demo walks through how to build an LLM-driven question-answering (QA) application with Xinference, Milvus, and LangChain. It uses Falcon 40B Instruct model for embedding creation and Llama 2 70B Chat model for inference. Both of the models are fully supported by Xinference."
]
},
{
@@ -34,19 +34,19 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model uid: ec736e9c-328b-11ee-93f8-fa163e74fa2d\n"
"Model uid: 46bf725e-3a5e-11ee-9dcd-fa163e74fa2d\n"
]
}
],
"source": [
"!xinference launch --model-name \"falcon-instruct\" --model-format pytorch --size-in-billions 40 -e \"http://127.0.0.1:9997\""
"!xinference launch --model-name \"falcon-instruct\" --model-format pytorch --size-in-billions 40 -e \"http://127.0.0.1:55950\""
]
},
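The same launch can also be scripted from Python instead of the shell. A minimal sketch, assuming Xinference's `RESTfulClient` API and the endpoint used above:

```python
from xinference.client import RESTfulClient

# connect to the running Xinference endpoint and launch the embedding model
client = RESTfulClient("http://127.0.0.1:55950")
model_uid = client.launch_model(
    model_name="falcon-instruct",
    model_format="pytorch",
    model_size_in_billions=40,
)
print(f"Model uid: {model_uid}")
```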
{
@@ -65,7 +65,7 @@
},
{
"cell_type": "code",
"execution_count": 3,
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
@@ -93,15 +93,15 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langchain.embeddings import XinferenceEmbeddings\n",
"\n",
"xinference_embeddings = XinferenceEmbeddings(\n",
" server_url=\"http://127.0.0.1:9997\", \n",
" model_uid = \"ec736e9c-328b-11ee-93f8-fa163e74fa2d\" # model_uid is the uid returned from launching the model\n",
" server_url=\"http://127.0.0.1:55950\", \n",
" model_uid = \"46bf725e-3a5e-11ee-9dcd-fa163e74fa2d\" # model_uid is the uid returned from launching the model\n",
")"
]
},
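A quick way to sanity-check that the embedding model is reachable; `embed_query` is part of LangChain's standard `Embeddings` interface:

```python
# embed a test sentence and inspect the dimensionality of the returned vector
vector = xinference_embeddings.embed_query("This is a test query")
print(len(vector))
```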
@@ -116,7 +116,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For vector store, we use the Milvus vector database. [Milvus](https://milvus.io/docs/overview.md) is a database that stores, indexes, and manages massive embedding vectors generated by deep neural networks and other machine learning models. To run, you should first [Install Milvus Standalone with Docker Compose](https://milvus.io/docs/install_standalone-docker.md)."
"For vector store, we use the Milvus vector database. [Milvus](https://milvus.io/docs/overview.md) is a database that stores, indexes, and manages massive embedding vectors generated by deep neural networks and other machine learning models. To run, you can first [Install Milvus Standalone with Docker Compose](https://milvus.io/docs/install_standalone-docker.md), or use Milvus Lite in the following way:"
]
},
{
@@ -129,28 +129,9 @@
},
"outputs": [],
"source": [
"$ wget https://github.com/milvus-io/milvus/releases/download/v2.2.12/milvus-standalone-docker-compose.yml -O docker-compose.yml"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the same directory as the docker-compose.yml file, start up Milvus and connect to Milvus by running:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "bat"
}
},
"outputs": [],
"source": [
"$ sudo docker-compose up -d\n",
"$ docker port milvus-standalone 19530/tcp"
"$ pip install milvus\n",
"\n",
"$ milvus-server"
]
},
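Milvus Lite can also be started from inside Python rather than from the shell. A minimal sketch, assuming the `default_server` API of the `milvus` package:

```python
from milvus import default_server

# start an embedded Milvus server and report the port to use in connection_args
default_server.start()
print(default_server.listen_port)  # 19530 by default
```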
{
@@ -196,30 +177,88 @@
"print(docs[0].page_content) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Model Inference Based on the Document"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we use Llama 2 Chat model supported by Xinference for inference task. "
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model uid: 333e1d68-3507-11ee-a0d6-fa163e74fa2d\n"
]
}
],
"source": [
"!xinference launch --model-name \"llama-2-chat\" --model-format ggmlv3 --size-in-billions 70 -e \"http://127.0.0.1:55950\""
]
},
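The launch above relies on the default quantization. To pick one explicitly, a hedged sketch reusing the Python client from the embedding step; the "q4_0" value is an assumption, and the options available depend on the model registration:

```python
# hypothetical quantized launch; check the llama-2-chat registration for valid quantizations
llm_uid = client.launch_model(
    model_name="llama-2-chat",
    model_format="ggmlv3",
    model_size_in_billions=70,
    quantization="q4_0",
)
```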
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import Xinference\n",
"\n",
"xinference_llm = Xinference(\n",
" server_url=\"http://127.0.0.1:9997\",\n",
" model_uid = \"ec736e9c-328b-11ee-93f8-fa163e74fa2d\" # model_uid is the uid returned from launching the model\n",
" server_url=\"http://127.0.0.1:55950\",\n",
" model_uid = \"333e1d68-3507-11ee-a0d6-fa163e74fa2d\" # model_uid is the uid returned from launching the model\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now create a memory object to track the chat history."
"First, we can query the LLM without using the document:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'\\nWhat did the president say about Ketanji Brown Jackson?\\nPresident Joe Biden called Judge Ketanji Brown Jackson a \"historic\" and \"inspiring\" nominee when he introduced her as his pick to replace retiring Supreme Court Justice Stephen Breyer. He highlighted her experience as a public defender and her commitment to justice and equality, saying that she would bring a unique perspective to the court.\\n\\nBiden also praised Jackson\\'s reputation for being a \"fair-minded\" and \"thoughtful\" jurist who is known for her ability to build'"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"xinference_llm(prompt=\"What did the president say about Ketanji Brown Jackson?\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We now query using the document to compare the result. We can create a memory object to track the chat history."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
@@ -236,7 +275,7 @@
},
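The collapsed cell above amounts to creating that memory object; a plausible minimal sketch, assuming LangChain's `ConversationBufferMemory`:

```python
from langchain.memory import ConversationBufferMemory

# record the running chat history so follow-up questions can resolve pronouns
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
```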
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
@@ -257,16 +296,16 @@
},
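Likewise, the collapsed cell here builds the chain that the queries below run through; a plausible sketch, assuming LangChain's `ConversationalRetrievalChain`:

```python
from langchain.chains import ConversationalRetrievalChain

# wire the Xinference LLM, the Milvus retriever, and the memory into one chain
chain = ConversationalRetrievalChain.from_llm(
    llm=xinference_llm,
    retriever=vector_db.as_retriever(),
    memory=memory,
)

result = chain({"question": "What did the president say about Ketanji Brown Jackson?"})
```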
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"The president supports Ketanji Brown Jackson's nomination to serve on the US Supreme Court, stating that she is a well-qualified and experienced candidate with a proven track record of fairness and impartiality.\""
"' According to the provided text, President Biden said that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson to serve on the United States Supreme Court 4 days ago, and that she is one of our nation’s top legal minds who will continue Justice Breyer’s legacy of excellence.'"
]
},
"execution_count": 16,
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
@@ -277,18 +316,25 @@
"result[\"answer\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that the LLM is capable of using the provided document to answer questions and summarize content. We can ask a few more questions:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Ketanji Brown Jackson was nominated by President Joe Biden to replace retiring Associate Justice Stephen Breyer on the United States Supreme Court.'"
"' According to the given text, President Biden said that Ketanji Brown Jackson succeeded Justice Breyer on the Supreme Court.'"
]
},
"execution_count": 17,
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
@@ -299,18 +345,25 @@
"result[\"answer\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The LLM accurately recognizes that \"he\" refers to \"the president\", and \"she\" refers to \"Ketanji Brown Jackson\" mentioned in the previous query. "
]
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"According to the provided text, the president emphasizes the importance of continuing efforts to combat the COVID-19 pandemic, including wearing masks and getting vaccinated. The president believes that vaccination is necessary to achieve full protection against the virus and encourages individuals who haven't already been vaccinated to do so. Additionally, the president promotes other preventive measures such as social distancing and handwashing to help stop the spread of COVID-19.\""
"' According to the text, the president views COVID-19 as a \"God-awful disease\" and wants to move forward in addressing it in a unified manner, rather than allowing it to continue being a partisan dividing line.'"
]
},
"execution_count": 19,
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
@@ -325,30 +378,106 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"From the second query, we can see that LLM accurately recognizes that \"he\" refers to \"the president\", and \"she\" refers to \"Ketanji Brown Jackson\" mentioned in the previous query. Moreover, even though the name of the President is not mentioned anywhere in the entire article, LLM is able to identify that the speaker of this article is President Joe Biden. Moreover, the LLM summarizes President's opinion on COVID-19 in a concise way. We can see the impressive capabilities of LLM, and LangChain's \"chaining\" feature also allows for more coherent and context-aware interactions with the model."
"We can see the impressive capabilities of the LLM, and LangChain's \"chaining\" feature also allows for more coherent and context-aware interactions with the model."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Concurrent Embedding"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To stop Milvus and delete data after stopping Milvus, run:"
"Xinference also supports creating embeddings concurrently. This will speed up the process of storing the document into the vector database. Here, we still use the 40B Falcon-instruct model we launched before."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "bat"
}
},
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"$ sudo docker-compose down\n",
"from langchain.document_loaders import TextLoader\n",
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
"from langchain.embeddings import XinferenceEmbeddings\n",
"from langchain.vectorstores import Milvus\n",
"\n",
"import threading\n",
"\n",
"def process_chunk(chunk):\n",
" vector_db.add_documents(documents=chunk)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"All chunks processed successfully.\n"
]
}
],
"source": [
"loader = TextLoader(\"/home/nijiayi/inference/examples/state_of_the_union.txt\")\n",
"documents = loader.load()\n",
"\n",
"text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=100, length_function=len)\n",
"docs = text_splitter.split_documents(documents)\n",
"\n",
"\n",
"xinference_embeddings = XinferenceEmbeddings(\n",
" server_url=\"http://127.0.0.1:55950\", \n",
" model_uid = \"46bf725e-3a5e-11ee-9dcd-fa163e74fa2d\" # model_uid is the uid returned from launching the model\n",
")\n",
"\n",
"num_chunks = 5\n",
"\n",
"$ sudo rm -rf volumes"
"chunks = [docs[i::num_chunks] for i in range(num_chunks)] \n",
"\n",
"vector_db = Milvus.from_documents(\n",
" chunks[0],\n",
" xinference_embeddings,\n",
" connection_args={\"host\": \"0.0.0.0\", \"port\": \"19530\"},\n",
")\n",
"\n",
"threads = [threading.Thread(target=process_chunk, args=(chunk,)) for chunk in chunks[1:]]\n",
"\n",
"for thread in threads:\n",
" thread.start()\n",
"\n",
"for thread in threads:\n",
" thread.join()\n",
"\n",
"print(\"All chunks processed successfully.\")"
]
},
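Plain threads work here because each call spends most of its time waiting on the embedding server, so the GIL is not a bottleneck. An equivalent sketch using `concurrent.futures`, which joins the workers implicitly:

```python
from concurrent.futures import ThreadPoolExecutor

# fan out the remaining chunks to a thread pool; exiting the block waits for all tasks
with ThreadPoolExecutor(max_workers=len(chunks) - 1) as pool:
    list(pool.map(process_chunk, chunks[1:]))

print("All chunks processed successfully.")
```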
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n",
"\n",
"And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.\n"
]
}
],
"source": [
"query = \"what does the president say about Ketanji Brown Jackson\"\n",
"docs = vector_db.similarity_search(query, k=10)\n",
"print(docs[0].page_content) "
]
}
],