Commit 987f0f3

Readme fixes (#593)

* fixed readme for evals tutorials
* fix uptrain website links
* Update README.md
* Update README.md
* Update README.md
* minor fixes

Co-authored-by: Dhruv Chawla <[email protected]>

Parent: 564dacb

File tree

17 files changed: +212 −335 lines changed

README.md (+26 −15)

````diff
@@ -1,5 +1,5 @@
 <h4 align="center">
-  <a href="https://www.uptrain.ai">
+  <a href="https://uptrain.ai">
   <img alt="Logo of UpTrain - an open-source platform to evaluate and improve LLM applications" src="https://github.com/uptrain-ai/uptrain/assets/108270398/b6a4905f-63fd-47ab-a894-1026a6669c86"/>
   </a>
 </h4>
@@ -21,7 +21,7 @@
   <img src="https://github.com/uptrain-ai/uptrain/assets/108270398/10d0faeb-c4f8-422f-a01e-49a891fa7ada" alt="Demo of UpTrain's LLM evaluations with scores for hallucinations, retrieved-context quality, response tonality for a customer support chatbot"/>
 </h4>
 
-**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
+**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
 
 <br />
 
@@ -55,14 +55,17 @@ UpTrain provides tons of ways to **customize evaluations**. You can customize ev
 
 Support for **40+ operators** such as BLEU, ROUGE, Embeddings Similarity, Exact match, etc.
 
+<img width="1088" alt="Interactive Dashboards" src="https://github.com/uptrain-ai/uptrain/assets/36454110/eb1c8239-dd99-4e66-ba8a-cbaee2beec10">
+
+UpTrain Dashboard is a web-based interface that runs on your **local machine**. You can use the dashboard to evaluate your LLM applications, view the results, and perform root cause analysis.
+
 
 ### Coming Soon:
 
-1. Experiment Dashboards
-2. Collaborate with your team
-3. Embedding visualization via UMAP and Clustering
-4. Pattern recognition among failure cases
-5. Prompt improvement suggestions
+1. Collaborate with your team
+2. Embedding visualization via UMAP and Clustering
+3. Pattern recognition among failure cases
+4. Prompt improvement suggestions
 
 <br />
 
@@ -71,18 +74,19 @@ Support for **40+ operators** such as BLEU, ROUGE, Embeddings Similarity, Exact
 
 | Eval | Description |
 | ---- | ----------- |
-|[Reponse Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness) | Grades whether the response has answered all the aspects of the question specified. |
-|[Reponse Conciseness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-conciseness) | Grades how concise the generated response is or if it has any additional irrelevant information for the question asked. |
-|[Reponse Relevance](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-relevance)| Grades how relevant the generated context was to the question specified.|
-|[Reponse Validity](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-validity)| Grades if the response generated is valid or not. A response is considered to be valid if it contains any information.|
-|[Reponse Consistency](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-consistency)| Grades how consistent the response is with the question asked as well as with the context provided.|
+|[Response Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness) | Grades whether the response has answered all the aspects of the question specified. |
+|[Response Conciseness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-conciseness) | Grades how concise the generated response is or if it has any additional irrelevant information for the question asked. |
+|[Response Relevance](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-relevance)| Grades how relevant the generated context was to the question specified.|
+|[Response Validity](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-validity)| Grades if the response generated is valid or not. A response is considered to be valid if it contains any information.|
+|[Response Consistency](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-consistency)| Grades how consistent the response is with the question asked as well as with the context provided.|
 
 <img width="1088" alt="quality of retrieved context and response groundedness" src="https://github.com/uptrain-ai/uptrain/assets/43818888/a7e384a3-c857-4a71-a938-7a2a70f8db1e">
 
+
 | Eval | Description |
 | ---- | ----------- |
 |[Context Relevance](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-relevance) | Grades how relevant the context was to the question specified. |
-|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization) | Grades how complete the generated response was for the question specified given the information provided in the context. |
+|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization) | Grades how complete the generated response was for the question specified, given the information provided in the context. |
 |[Factual Accuracy](https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy)| Grades whether the response generated is factually correct and grounded by the provided context.|
 |[Context Conciseness](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-conciseness)| Evaluates the concise context cited from an original context for irrelevant information.
 |[Context Reranking](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-reranking)| Evaluates how efficient the reranked context is compared to the original context.|
@@ -91,7 +95,7 @@ Support for **40+ operators** such as BLEU, ROUGE, Embeddings Similarity, Exact
 
 | Eval | Description |
 | ---- | ----------- |
-|[Language Features](https://docs.uptrain.ai/predefined-evaluations/language-quality/fluency-and-coherence) | Grades whether the response has answered all the aspects of the question specified. |
+|[Language Features](https://docs.uptrain.ai/predefined-evaluations/language-quality/fluency-and-coherence) | Grades the quality and effectiveness of language in a response, focusing on factors such as clarity, coherence, conciseness, and overall communication. |
 |[Tonality](https://docs.uptrain.ai/predefined-evaluations/code-evals/code-hallucination) | Grades whether the generated response matches the required persona's tone |
 
 <img width="1088" alt="language quality of the response" src="https://github.com/uptrain-ai/uptrain/assets/36454110/2fba9f0b-71b3-4d90-90f8-16ef38cef3ab">
@@ -123,9 +127,15 @@ Support for **40+ operators** such as BLEU, ROUGE, Embeddings Similarity, Exact
 
 | Eval | Description |
 | ---- | ----------- |
-|[Prompt Injection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/prompt-injection) | Grades whether the generated response is leaking any system prompt. |
+|[Prompt Injection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/prompt-injection) | Grades whether the user's prompt is an attempt to make the LLM reveal its system prompts. |
 |[Jailbreak Detection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/jailbreak) | Grades whether the user's prompt is an attempt to jailbreak (i.e. generate illegal or harmful responses). |
 
+<img width="1088" alt="evaluate the clarity of user queries" src="https://github.com/uptrain-ai/uptrain/assets/36454110/50ed622f-0b92-468c-af48-2391ff6ab8e0">
+
+| Eval | Description |
+| ---- | ----------- |
+|[Sub-Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness) | Evaluate whether all of the sub-questions generated from a user's query, taken together, cover all aspects of the user's query or not |
+
 <br />
 
 # Get started 🙌
@@ -147,6 +157,7 @@ cd uptrain
 # Run UpTrain
 bash run_uptrain.sh
 ```
+> **_NOTE:_** UpTrain Dashboard is currently in **Beta version**. We would love your feedback to improve it.
 
 ## Using the UpTrain package
 
````
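The patched README ends at its "Using the UpTrain package" section. As a hedged sketch of what running these checks from Python can look like (the `question`/`context`/`response` row schema mirrors the eval tables in the diff above; `make_eval_row` is a hypothetical helper, and the commented-out `EvalLLM`/`Evals` calls reflect UpTrain's documented API, which may differ across versions):

```python
# Sketch: preparing rows for UpTrain's predefined checks.
# The live evaluation calls are commented out because they need an
# OpenAI API key and network access.

def make_eval_row(question: str, context: str, response: str) -> dict:
    """Build one row in the column format UpTrain's checks expect (assumption)."""
    return {"question": question, "context": context, "response": response}

data = [
    make_eval_row(
        question="What is UpTrain?",
        context="UpTrain is an open-source platform to evaluate LLM applications.",
        response="UpTrain is an open-source platform for evaluating LLM apps.",
    )
]

# With `pip install uptrain` and an API key, the checks from the README's
# tables can be run roughly like this (hedged, version-dependent):
#
#   from uptrain import EvalLLM, Evals
#   eval_llm = EvalLLM(openai_api_key="sk-...")
#   results = eval_llm.evaluate(
#       data=data,
#       checks=[Evals.RESPONSE_COMPLETENESS, Evals.CONTEXT_RELEVANCE],
#   )
#
# Each result row would then carry per-check score fields alongside the
# original columns.

print(sorted(data[0].keys()))  # → ['context', 'question', 'response']
```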

docs/dashboard/evaluations.mdx (+2)

````diff
@@ -55,6 +55,8 @@ You can look at the complete list of UpTrain's supported metrics [here](/predefi
 </Step>
 </Steps>
 
+<Note>UpTrain Dashboard is currently in Beta version. We would love your feedback to improve it.</Note>
+
 <CardGroup cols={1}>
   <Card
     title="Have Questions?"
````

docs/dashboard/getting_started.mdx (+3)

````diff
@@ -12,6 +12,7 @@ You can use the dashboard to evaluate your LLM applications, view the results, m
 
 <Note>Before you start, ensure you have docker installed on your machine. If not, you can install it from [here](https://docs.docker.com/get-docker/). </Note>
 
+
 ### How to install?
 
 The following commands will download the UpTrain dashboard and start it on your local machine:
@@ -24,6 +25,8 @@ cd uptrain
 bash run_uptrain.sh
 ```
 
+<Note>UpTrain Dashboard is currently in Beta version. We would love your feedback to improve it.</Note>
+
 <CardGroup cols={1}>
   <Card
     title="Have Questions?"
````

docs/dashboard/project.mdx (+2)

````diff
@@ -38,6 +38,8 @@ There are 2 types of projects we support:
 
 Now that you have created a project, you can run evaluations or experiment with prompts
 
+<Note>UpTrain Dashboard is currently in Beta version. We would love your feedback to improve it.</Note>
+
 <CardGroup cols={1}>
   <Card
     title="Have Questions?"
````

docs/dashboard/prompts.mdx (+2)

````diff
@@ -55,6 +55,8 @@ You can look at the complete list of UpTrain's supported metrics [here](/predefi
 </Step>
 </Steps>
 
+<Note>UpTrain Dashboard is currently in Beta version. We would love your feedback to improve it.</Note>
+
 <CardGroup cols={1}>
   <Card
     title="Have Questions?"
````

docs/predefined-evaluations/ground-truth-comparison/response-matching.mdx (+2 −4)

````diff
@@ -1,11 +1,9 @@
 ---
 title: Response Matching
-description: Grades how relevant the generated context was to the question specified.
+description: Grades how well the response generated by the LLM aligns with the provided ground truth.
 ---
 
-Response relevance is the measure of how relevant the generated response is to the question asked.
-
-It helps evaluate how well the response addresses the question asked and if it contains any additional information that is not relevant to the question asked.
+Response Matching compares the LLM-generated text with the gold (ideal) response using the defined score metric.
 
 Columns required:
 - `question`: The question asked by the user
````
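The rewritten description frames Response Matching as grading how well the generated response aligns with the gold response under some score metric. Purely as a toy illustration of that idea (this is not UpTrain's actual metric; it is a stand-in token-overlap score):

```python
def token_overlap(response: str, ground_truth: str) -> float:
    """Toy alignment score: fraction of ground-truth tokens found in the response.

    A stand-in for illustration only; UpTrain's Response Matching uses its
    own (operator/LLM-based) scoring.
    """
    gold = set(ground_truth.lower().split())
    if not gold:
        return 0.0
    resp = set(response.lower().split())
    return len(gold & resp) / len(gold)

score = token_overlap(
    "UpTrain is an open-source evaluation platform.",
    "UpTrain is an open-source platform.",
)
print(score)  # → 1.0 (every ground-truth token appears in the response)
```

Production metrics are typically more robust than raw token overlap (handling stemming, paraphrase, and semantics), which is why the docs point to a dedicated check rather than string matching.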
