Merge pull request #155 from lyskouski/BP-130
[#130] [BP] Benchmarking Tool. Integration tests
Showing 58 changed files with 1,042 additions and 289 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,4 @@ | ||
% Copyright 2023 The terCAD team. All rights reserved. | ||
% Use of this content is governed by a CC BY-NC-ND 4.0 license that can be found in the LICENSE file. | ||
|
||
\markboth{Unleashing Unparalleled Features}{Unleashing Unparalleled Features} | ||
|
||
[TBD] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,269 @@ | ||
% Copyright 2023 The terCAD team. All rights reserved. | ||
% Use of this content is governed by a CC BY-NC-ND 4.0 license that can be found in the LICENSE file. | ||
|
||
\subsection{Benchmarking Prototype} | ||
\markboth{Unleashing Features}{Benchmarking Prototype} | ||
|
||
Before adding muscles (functionality) to the prototype skeleton, we need to verify its reliability. | ||
Restructuring the fundamental concepts of the application later would pose a considerable challenge, | ||
entailing substantial effort and potential complications. | ||
|
||
|
||
\subsubsection{Providing Integration Tests} | ||
|
||
Unit tests (\ref{ut-unit}) and widget tests (\ref{widget-tests}) serve as valuable tools for assessing isolated classes, | ||
functions, or widgets. However, they cannot catch every problem. Integration tests identify systemic flaws (data | ||
corruption, concurrency problems, miscommunication between services, etc.) that might not be evident in unit tests: | ||
they verify how individual components work together and validate the application as a whole. | ||
Integration tests are designed to reflect the real performance of an application on an actual device or platform. | ||
They provide a vital link in the testing hierarchy by validating the interplay of various components | ||
within an application. In this way, integration tests simulate the end-to-end user workflows that we've implemented and | ||
discussed earlier (\ref{t-gherkin}). | ||
|
||
Integration tests in Flutter are written with the \q{integration\_test} package, while the \q{flutter\_driver} package | ||
helps us run the tests on real or virtual devices and environments and track the timeline of test execution | ||
(both packages are provided by the SDK): | ||
|
||
\begin{lstlisting}[language=yaml] | ||
## ./pubspec.yaml | ||
dev_dependencies: | ||
  integration_test: | ||
    sdk: flutter | ||
  flutter_driver: | ||
    sdk: flutter | ||
\end{lstlisting} | ||
|
||
\noindent The implementation differs from a widget test in the following line of code, which enables test | ||
execution on a physical device or platform: | ||
\begin{lstlisting} | ||
IntegrationTestWidgetsFlutterBinding.ensureInitialized(); | ||
\end{lstlisting} | ||
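
\noindent As a minimal sketch of how that line fits into a complete test (the file name and the placeholder widget
tree are assumptions made for illustration, not taken from the project), an integration test might look like this:

\begin{lstlisting}
// ./integration_test/smoke_test.dart (hypothetical file name)
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';
import 'package:integration_test/integration_test.dart';

void main() {
  // Switches the test binding so the test can run on a device or platform.
  IntegrationTestWidgetsFlutterBinding.ensureInitialized();

  testWidgets('renders a greeting', (WidgetTester tester) async {
    // Placeholder widget tree; a real test would pump the application root.
    await tester.pumpWidget(
      const MaterialApp(home: Scaffold(body: Text('Hello'))),
    );
    await tester.pumpAndSettle();
    expect(find.text('Hello'), findsOneWidget);
  });
}
\end{lstlisting}

\noindent Apart from the binding initialization, the body uses the same \q{WidgetTester} calls as a widget test.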
|
||
|
||
\subsubsection{Doing Performance Testing} | ||
|
||
Performance testing is a type of software testing designed to evaluate the speed, responsiveness, stability, and | ||
overall performance of an application under different conditions. It involves subjecting the application to | ||
simulated workloads and stress scenarios to assess how it behaves in terms of speed, scalability, and resource usage. | ||
Performance testing ensures that the software can handle the expected load without degradation in performance. | ||
|
||
By simulating different levels of user traffic, performance testing helps determine the application's scalability by | ||
assessing resource utilization (CPU, memory, network bandwidth, and other parameters). It also helps identify performance | ||
bottlenecks, such as slow database queries, inefficient code, or network latency, and address these issues before | ||
they impact users. | ||
|
||
Detailed information about performance testing is available from the International Software Testing Qualifications | ||
Board (ISTQB) or the Software Engineering Institute (SEI); here we only define its types | ||
(\cite{Ian15}, \cite{Sag16}, \cite{Sag23}): | ||
\begin{itemize} | ||
\item Load Testing: Evaluates how an application performs under expected load conditions. It helps determine the | ||
application's response time, resource utilization, and overall stability. | ||
|
||
\item Stress Testing: Pushes the application to its limits by subjecting it to extreme conditions, such as excessive | ||
user loads or resource scarcity. It aims to identify the breaking point and understand how the application recovers | ||
from failures. | ||
|
||
\item Endurance Testing: Assesses the application's performance over an extended period to identify issues related to | ||
memory leaks, resource exhaustion, or gradual degradation in performance. | ||
|
||
\item Spike Testing: Simulates sudden spikes in user traffic to assess how the application responds to rapid changes | ||
in load. This helps uncover bottlenecks and issues related to sudden surges in demand. | ||
|
||
\item Volume Testing: Focuses on testing the application's performance with large volumes of data, such as a high | ||
number of records in a database. It helps identify scalability and performance issues associated with data volume. | ||
\end{itemize} | ||
|
||
\noindent Back to our process, the following command runs the performance tests: | ||
|
||
\begin{lstlisting}[language=bash] | ||
# Precondition for Web profiling | ||
chromedriver --port=4444 | ||
# Launch tests | ||
flutter drive \ | ||
--driver=test_driver/perf_driver.dart \ | ||
--target=test/performance/name_of_test.dart \ | ||
--profile | ||
\end{lstlisting} | ||
|
||
The \q{--profile} option compiles the application in ``profile mode'', which brings the benchmark results | ||
closer to what end users will experience. When running on a mobile device or emulator, it is advisable to add the | ||
\q{--no-dds} parameter, which disables the Dart Development Service (DDS) that is inaccessible there. The \q{--target} option | ||
declares the scope of test execution, while the \q{--driver} option tracks the outcomes. The driver configuration can be | ||
taken from \href{https://docs.flutter.dev/cookbook/testing/integration/profiling}{https://docs.flutter.dev/cookbook/testing/integration/profiling}: | ||
|
||
\begin{lstlisting} | ||
// ./test_driver/perf_driver.dart | ||
import 'package:flutter_driver/flutter_driver.dart' as driver; | ||
import 'package:integration_test/integration_test_driver.dart'; | ||
|
||
Future<void> main() { | ||
  return integrationDriver( | ||
    responseDataCallback: (data) async { | ||
      if (data != null) { | ||
        // 'timeline' matches the reportKey passed to traceAction in the test | ||
        final timeline = driver.Timeline.fromJson(data['timeline']); | ||
        final summary = driver.TimelineSummary.summarize(timeline); | ||
        // Writes timeline.timeline.json and timeline.timeline_summary.json | ||
        await summary.writeTimelineToFile( | ||
          'timeline', | ||
          pretty: true, | ||
          includeSummary: true, | ||
          destinationDirectory: './coverage/', | ||
        ); | ||
      } | ||
    }, | ||
  ); | ||
} | ||
\end{lstlisting} | ||
|
||
\noindent Since the approach builds on widget tests (\ref{widget-tests}, \ref{t-gherkin}), we'll focus only on the | ||
\q{traceAction} method, which records time-based metrics: | ||
|
||
\begin{lstlisting} | ||
// ./test/performance/load/creation_test.dart | ||
import 'package:flutter/material.dart'; | ||
import 'package:flutter_test/flutter_test.dart'; | ||
import 'package:integration_test/integration_test.dart'; | ||
|
||
void main() { | ||
  final binding = IntegrationTestWidgetsFlutterBinding.ensureInitialized(); | ||
  testWidgets('Cover Starting Page', (WidgetTester tester) async { | ||
    // Everything inside traceAction is recorded under the given reportKey | ||
    await binding.traceAction(() async { | ||
      // ... other steps | ||
      final amountField = find.byWidgetPredicate((widget) { | ||
        return widget is TextField && widget.decoration?.hintText == 'Set Balance'; | ||
      }); | ||
      await tester.ensureVisible(amountField); | ||
      await tester.tap(amountField); | ||
      // In profiling mode some delay is needed: | ||
      await tester.pumpAndSettle(const Duration(seconds: 1)); | ||
      // await tester.pump(); | ||
      await tester.enterText(amountField, '1000'); | ||
      await tester.pumpAndSettle(); | ||
      expect(find.text('1000'), findsOneWidget); | ||
      // ... other steps | ||
    }, | ||
      reportKey: 'timeline', | ||
    ); | ||
  }); | ||
} | ||
\end{lstlisting} | ||
|
||
\noindent The generated \q{timeline.timeline.json} file can be inspected via \q{chrome://tracing/} in the Google Chrome browser | ||
(\cref{img:perf-chrome-tracing}): | ||
|
||
\img{features/perf-chrome-tracing}{Google Chrome -- performance trace}{img:perf-chrome-tracing} | ||
|
||
\noindent The \q{timeline.timeline\_summary.json} file can be opened in an IDE as a regular \q{JSON} file to analyze | ||
the application's performance manually. For example, the value of the \q{average\_frame\_build\_time\_millis} parameter | ||
should stay below 16 milliseconds to ensure that the app runs at 60 frames per second without glitches. Other | ||
parameters are described in detail at | ||
\href{https://api.flutter.dev/flutter/flutter\_driver/TimelineSummary/summaryJson.html}{https://api.flutter.dev/flutter/flutter\_driver/TimelineSummary}. | ||
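
\noindent As a rough illustration of that manual check, the summary can also be inspected with a small standalone
Dart script. The sketch below is hypothetical (the script name and location are assumptions); it assumes the summary
was written to \q{./coverage/} by the driver configuration and compares the average frame build time with the 60 FPS
frame budget of $1000 / 60 \approx 16.67$ milliseconds:

\begin{lstlisting}
// ./tool/check_frame_budget.dart (hypothetical helper, not part of the project)
import 'dart:convert';
import 'dart:io';

void main() {
  // Path assumes destinationDirectory './coverage/' and the 'timeline' name.
  final file = File('coverage/timeline.timeline_summary.json');
  final summary = jsonDecode(file.readAsStringSync()) as Map<String, dynamic>;

  // Frame budget at 60 FPS: 1000 ms / 60, about 16.67 ms.
  const budgetMillis = 1000 / 60;
  final average =
      (summary['average_frame_build_time_millis'] as num).toDouble();

  stdout.writeln('Average frame build time: ${average.toStringAsFixed(2)} ms');
  if (average > budgetMillis) {
    stderr.writeln('Above the ${budgetMillis.toStringAsFixed(2)} ms budget');
    exitCode = 1; // Lets a CI step fail when the budget is missed
  }
}
\end{lstlisting}

\noindent It could be run with \q{dart run tool/check\_frame\_budget.dart} once the \q{flutter drive} command has
finished; a non-zero exit code would signal a missed budget.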
|
||
|
||
\paragraph{Load Testing} | ||
Check response time and resource utilization for the first run (Initial Setup) by creating an account and a budget | ||
category: | ||
|
||
\begin{lstlisting}[language=cucumber] | ||
@start | ||
Feature: Verify Initial Flow | ||
Scenario: Applying basic configuration through the start pages | ||
Given I am firstly opened the app | ||
Then I can see "Initial Setup" component | ||
When I tap "Save to Storage (Go Next)" button | ||
Then I can see "Acknowledge (Go Next)" component | ||
When I tap "Acknowledge (Go Next)" button | ||
Then I can see "Create new Account" component | ||
When I tap on 0 index of "ListSelector" fields | ||
And I tap "Bank Account" element | ||
And I enter "New Account" to "Enter Account Identifier" text field | ||
And I enter "1000" to "Set Balance" text field | ||
And I tap "Create new Account" button | ||
Then I can see "Create new Budget Category" component | ||
When I enter "New Budget" to "Enter Budget Category Name" text field | ||
And I enter "1000" to "Set Balance" text field | ||
When I tap "Create new Budget Category" button | ||
Then I can see "Accounts, total" component | ||
\end{lstlisting} | ||
|
||
\noindent Our first test runs revealed a degraded \q{frame build} parameter | ||
(\cref{tb:frame-build}) that lowers the frame rate: only 37 frames per second (FPS) are generated instead of 60:\\ | ||
|
||
\begin{table}[h!] | ||
\begin{tabular}{ |p{6.8cm}||r|r|r| } | ||
\hline | ||
\multicolumn{4}{|c|}{Frame Build Time, in milliseconds} \\ | ||
\hline | ||
Type of state & Cold Start & Retrial & With Data\\ | ||
\hline | ||
average & 26.00 & 24.28 & 29.65 \\ | ||
90th percentile & 47.20 & 43.38 & 70.33 \\ | ||
99th percentile & 158.31 & 159.41 & 198.03 \\ | ||
\hline | ||
\end{tabular} | ||
\caption{Performance Test Results for Feature "Verify Initial Flow"} \label{tb:frame-build} | ||
\end{table} | ||
|
||
\img{features/perf-slow-frame}{Performance Monitor in Visual Studio Code}{img:perf-slow-frame} | ||
|
||
\noindent This issue (\cref{img:perf-slow-frame}) is shader compilation jank in animations: shaders are code | ||
snippets executed on a graphics processing unit (GPU) to render a sequence of draw commands, and compiling them at first use delays frames. | ||
Pre-compiling the shaders mitigates these compilation-related disruptions during subsequent animations and improves | ||
the frame rate. Run the app with \q{--cache-sksl} turned on to capture shaders in the Skia Shader Language (SkSL): | ||
|
||
\begin{lstlisting}[language=bash] | ||
flutter run --profile --cache-sksl --purge-persistent-cache | ||
\end{lstlisting} | ||
|
||
\noindent To warm up shaders in SkSL format for an application build: | ||
|
||
\begin{lstlisting}[language=bash] | ||
# Capture shaders in Skia Shader Language (SkSL) format into a file | ||
flutter drive --profile --cache-sksl --write-sksl-on-exit sksl.json -t test_driver/warm_up.dart | ||
# Build app with SkSL warm-up | ||
flutter build ios --bundle-sksl-path sksl.json | ||
\end{lstlisting} | ||
|
||
\begin{lstlisting} | ||
// ./test_driver/warm_up.dart | ||
import 'package:integration_test/integration_test_driver.dart'; | ||
Future<void> main() { | ||
return integrationDriver();(*@ \stopnumber @*) | ||
} | ||
|
||
// ./test_driver/warm_up_test.dart | ||
Future<void> main() async { | ||
IntegrationTestWidgetsFlutterBinding.ensureInitialized(); | ||
SharedPreferencesMixin.pref = await SharedPreferences.getInstance(); | ||
|
||
testWidgets('Warm-up', (WidgetTester tester) async { | ||
await tester.pumpWidget(MultiProvider( | ||
providers: [ | ||
ChangeNotifierProvider<AppData>( | ||
create: (_) => AppData(), | ||
), | ||
ChangeNotifierProvider<AppTheme>( | ||
create: (_) => AppTheme(ThemeMode.system), | ||
), | ||
], | ||
child: const MyApp(), | ||
)); | ||
await tester.pumpAndSettle(const Duration(seconds: 3)); | ||
}); | ||
} | ||
\end{lstlisting} | ||
|
||
\noindent Finally, this tuning yielded \q{56 FPS (average)}. | ||
|
||
|
||
\paragraph{Stress Testing} | ||
Check the initial load (the time before interaction becomes possible) with a huge transaction log history (32 MB, 128 MB, | ||
512 MB, 2 GB). | ||
|
||
|
||
\paragraph{Endurance Testing} | ||
Check response time and resource utilization by adding different types of data over different time | ||
periods (15 minutes, an hour, 4 hours, 16 hours). | ||
|
||
|
||
\paragraph{Spike Testing} | ||
Postponed until synchronization between different devices is enabled. | ||
|
||
|
||
\paragraph{Volume Testing} | ||
Combine reporting of "Load Testing" with data from "Stress Testing". |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
## Tips to evaluate Integration Tests | ||
|
||
``` | ||
flutter drive \ | ||
--driver=test_driver/perf_driver.dart \ | ||
--target=integration_test/{name}_test.dart \ | ||
--no-dds | ||
``` | ||
|
||
P.S. Launch Chrome Driver `chromedriver --port=4444` for Web profiling |