You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardexpand all lines: README.md
+26-15
Original file line number
Diff line number
Diff line change
@@ -1,5 +1,5 @@
1
1
<h4align="center">
2
-
<ahref="https://www.uptrain.ai">
2
+
<ahref="https://uptrain.ai">
3
3
<imgalt="Logo of UpTrain - an open-source platform to evaluate and improve LLM applications"src="https://github.com/uptrain-ai/uptrain/assets/108270398/b6a4905f-63fd-47ab-a894-1026a6669c86"/>
4
4
</a>
5
5
</h4>
@@ -21,7 +21,7 @@
21
21
<imgsrc="https://github.com/uptrain-ai/uptrain/assets/108270398/10d0faeb-c4f8-422f-a01e-49a891fa7ada"alt="Demo of UpTrain's LLM evaluations with scores for hallucinations, retrieved-context quality, response tonality for a customer support chatbot"/>
22
22
</h4>
23
23
24
-
**[UpTrain](https://www.uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
24
+
**[UpTrain](https://uptrain.ai)** is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.
25
25
26
26
<br />
27
27
@@ -55,14 +55,17 @@ UpTrain provides tons of ways to **customize evaluations**. You can customize ev
55
55
56
56
Support for **40+ operators** such as BLEU, ROUGE, Embeddings Similarity, Exact match, etc.
UpTrain Dashboard is a web-based interface that runs on your **local machine**. You can use the dashboard to evaluate your LLM applications, view the results, and perform root cause analysis.
61
+
58
62
59
63
### Coming Soon:
60
64
61
-
1. Experiment Dashboards
62
-
2. Collaborate with your team
63
-
3. Embedding visualization via UMAP and Clustering
64
-
4. Pattern recognition among failure cases
65
-
5. Prompt improvement suggestions
65
+
1. Collaborate with your team
66
+
2. Embedding visualization via UMAP and Clustering
67
+
3. Pattern recognition among failure cases
68
+
4. Prompt improvement suggestions
66
69
67
70
<br />
68
71
@@ -71,18 +74,19 @@ Support for **40+ operators** such as BLEU, ROUGE, Embeddings Similarity, Exact
71
74
72
75
| Eval | Description |
73
76
| ---- | ----------- |
74
-
|[Reponse Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness)| Grades whether the response has answered all the aspects of the question specified. |
75
-
|[Reponse Conciseness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-conciseness)| Grades how concise the generated response is or if it has any additional irrelevant information for the question asked. |
76
-
|[Reponse Relevance](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-relevance)| Grades how relevant the generated context was to the question specified.|
77
-
|[Reponse Validity](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-validity)| Grades if the response generated is valid or not. A response is considered to be valid if it contains any information.|
78
-
|[Reponse Consistency](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-consistency)| Grades how consistent the response is with the question asked as well as with the context provided.|
77
+
|[Response Completeness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-completeness)| Grades whether the response has answered all the aspects of the question specified. |
78
+
|[Response Conciseness](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-conciseness)| Grades how concise the generated response is or if it has any additional irrelevant information for the question asked. |
79
+
|[Response Relevance](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-relevance)| Grades how relevant the generated context was to the question specified.|
80
+
|[Response Validity](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-validity)| Grades if the response generated is valid or not. A response is considered to be valid if it contains any information.|
81
+
|[Response Consistency](https://docs.uptrain.ai/predefined-evaluations/response-quality/response-consistency)| Grades how consistent the response is with the question asked as well as with the context provided.|
79
82
80
83
<imgwidth="1088"alt="quality of retrieved context and response groundedness"src="https://github.com/uptrain-ai/uptrain/assets/43818888/a7e384a3-c857-4a71-a938-7a2a70f8db1e">
81
84
85
+
82
86
| Eval | Description |
83
87
| ---- | ----------- |
84
88
|[Context Relevance](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-relevance)| Grades how relevant the context was to the question specified. |
85
-
|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization)| Grades how complete the generated response was for the question specified given the information provided in the context. |
89
+
|[Context Utilization](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-utilization)| Grades how complete the generated response was for the question specified, given the information provided in the context. |
86
90
|[Factual Accuracy](https://docs.uptrain.ai/predefined-evaluations/context-awareness/factual-accuracy)| Grades whether the response generated is factually correct and grounded by the provided context.|
87
91
|[Context Conciseness](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-conciseness)| Evaluates the concise context cited from an original context for irrelevant information.
88
92
|[Context Reranking](https://docs.uptrain.ai/predefined-evaluations/context-awareness/context-reranking)| Evaluates how efficient the reranked context is compared to the original context.|
@@ -91,7 +95,7 @@ Support for **40+ operators** such as BLEU, ROUGE, Embeddings Similarity, Exact
91
95
92
96
| Eval | Description |
93
97
| ---- | ----------- |
94
-
|[Language Features](https://docs.uptrain.ai/predefined-evaluations/language-quality/fluency-and-coherence)| Grades whether the response has answered all the aspects of the question specified. |
98
+
|[Language Features](https://docs.uptrain.ai/predefined-evaluations/language-quality/fluency-and-coherence)| Grades the quality and effectiveness of language in a response, focusing on factors such as clarity, coherence, conciseness, and overall communication. |
95
99
|[Tonality](https://docs.uptrain.ai/predefined-evaluations/code-evals/code-hallucination)| Grades whether the generated response matches the required persona's tone |
96
100
97
101
<imgwidth="1088"alt="language quality of the response"src="https://github.com/uptrain-ai/uptrain/assets/36454110/2fba9f0b-71b3-4d90-90f8-16ef38cef3ab">
@@ -123,9 +127,15 @@ Support for **40+ operators** such as BLEU, ROUGE, Embeddings Similarity, Exact
123
127
124
128
| Eval | Description |
125
129
| ---- | ----------- |
126
-
|[Prompt Injection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/prompt-injection)| Grades whether the generated response is leaking any system prompt. |
130
+
|[Prompt Injection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/prompt-injection)| Grades whether the user's prompt is an attempt to make the LLM reveal its system prompts. |
127
131
|[Jailbreak Detection](https://docs.uptrain.ai/predefined-evaluations/safeguarding/jailbreak)| Grades whether the user's prompt is an attempt to jailbreak (i.e. generate illegal or harmful responses). |
128
132
133
+
<imgwidth="1088"alt="evaluate the clarity of user queries"src="https://github.com/uptrain-ai/uptrain/assets/36454110/50ed622f-0b92-468c-af48-2391ff6ab8e0">
134
+
135
+
| Eval | Description |
136
+
| ---- | ----------- |
137
+
|[Sub-Query Completeness](https://docs.uptrain.ai/predefined-evaluations/query-quality/sub-query-completeness)| Evaluate whether all of the sub-questions generated from a user's query, taken together, cover all aspects of the user's query or not |
138
+
129
139
<br />
130
140
131
141
# Get started 🙌
@@ -147,6 +157,7 @@ cd uptrain
147
157
# Run UpTrain
148
158
bash run_uptrain.sh
149
159
```
160
+
> **_NOTE:_** UpTrain Dashboard is currently in **Beta version**. We would love your feedback to improve it.
Copy file name to clipboardexpand all lines: docs/dashboard/getting_started.mdx
+3
Original file line number
Diff line number
Diff line change
@@ -12,6 +12,7 @@ You can use the dashboard to evaluate your LLM applications, view the results, m
12
12
13
13
<Note>Before you start, ensure you have docker installed on your machine. If not, you can install it from [here](https://docs.docker.com/get-docker/). </Note>
14
14
15
+
15
16
### How to install?
16
17
17
18
The following commands will download the UpTrain dashboard and start it on your local machine:
@@ -24,6 +25,8 @@ cd uptrain
24
25
bash run_uptrain.sh
25
26
```
26
27
28
+
<Note>UpTrain Dashboard is currently in Beta version. We would love your feedback to improve it.</Note>
Copy file name to clipboardexpand all lines: docs/predefined-evaluations/ground-truth-comparison/response-matching.mdx
+2-4
Original file line number
Diff line number
Diff line change
@@ -1,11 +1,9 @@
1
1
---
2
2
title: Response Matching
3
-
description: Grades how relevant the generated context was to the question specified.
3
+
description: Grades how well the response generated by the LLM aligns with the provided ground truth.
4
4
---
5
5
6
-
Response relevance is the measure of how relevant the generated response is to the question asked.
7
-
8
-
It helps evaluate how well the response addresses the question asked and if it contains any additional information that is not relevant to the question asked.
6
+
Response Matching compares the LLM-generated text with the gold (ideal) response using the defined score metric.
0 commit comments