Merge pull request #155 from lyskouski/BP-130
[#130] [BP] Benchmarking Tool. Integration tests
lyskouski authored Aug 15, 2023
2 parents 16acdb5 + b0fb43e commit ce51a1c
Showing 58 changed files with 1,042 additions and 289 deletions.
2 changes: 1 addition & 1 deletion docs/implementation-flow/ch04-s01-tests.tex
@@ -498,7 +498,7 @@ \subsubsection{Writing Unit Tests with Wrappers (Code Generators)} \label{ut-cod
\end{lstlisting}


\subsubsection{Adding Behavioral Tests (Gherkin)}
\subsubsection{Adding Behavioral Tests (Gherkin)} \label{t-gherkin}

Improvement cycles never end. Previously (\ref{widget-tests}), we've discussed an approach to testing widgets and applied the
\q{When ... Given ... Then ...}-notation. That notation is a part of Behavior-Driven Development (BDD) -- the process
14 changes: 7 additions & 7 deletions docs/implementation-flow/ch04-s05-consequences.tex
@@ -1,7 +1,7 @@
\subsection{Assessing of Ignorance} \label{ut-fail}

Initially, we've declared the importance of having tests and an ecosystem for their automation, but haven't written a
valuable amount of them (10\% coverage, \ref{a-badges}). That was done to show consequences of such a decision -- not
valuable amount of them (10\% coverage, \ref{a-badges}). That was done to show a consequence of such a decision -- not
to write tests. So, let's measure the mistakes made during our Increment (four Iterations, two weeks each):

\begin{lstlisting}[language=bash]
@@ -12,7 +12,7 @@ \subsection{Assessing of Ignorance} \label{ut-fail}
\end{lstlisting}

\noindent The \q{git log}-command retrieves the commit history (\q{\%ad} - to include the commit date; the \q{--date=iso}-option
converts dates to ISO 8601 [YYY-mm-dd]; \q{\%s} - take subject) with the specified format via \q{--grep} (since we've
converts dates to ISO 8601 [YYYY-mm-dd]; \q{\%s} - take subject) with the specified format via \q{--grep} (since we've
used the \q{[BF]}-prefix in the title of created bug-reports [issues] and used it as a part of the commit message). \q{awk}
extracts the date part from each line and delegates sorting by the extracted dates to the \q{sort}-command. Finally, the
\q{uniq -c}-command counts the occurrences of each unique date. We'll do the same operation for the \q{fix}-keyword.
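
For illustration, such a pipeline might look as follows (a sketch only -- the exact listing is collapsed in this view, so the format string and grep pattern are assumptions derived from the description above):

\begin{lstlisting}[language=bash]
# Count [BF]-tagged commits per day (sketch; adjust --grep for the "fix"-keyword run)
git log --grep="\[BF\]" --date=iso --pretty=format:"%ad %s" \
  | awk '{print $1}' \
  | sort \
  | uniq -c
\end{lstlisting}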
@@ -54,16 +54,16 @@ \subsection{Assessing of Ignorance} \label{ut-fail}

The Test-Driven Development approach has been known since 1999 as part of the Extreme Programming flow, but for a reason unknown to me it is
not widely adopted. The argument that "we do not have time to write tests" is the same as "we won't use a car since we're
already running to reach our 200km target within a day" (instead of an hour). In Agile transformations it's a mantra
already running to reach our 200km target within a day" (instead of a few hours). In Agile transformations it's a mantra
that the usage of Scrum (a communication framework) will increase development flow tenfold. Looking wider,
Agile, DevOps, Lean, and other approaches put an emphasis on quality throughout the process. That is because
communication itself has a natural limit on the achievable performance optimization; the next 10x boost can be reached
only by growing technical excellence. Observing developers dedicating half a day to test a seemingly
"one-hour" change, I find it perplexing that the idea of investing an additional hour in crafting tests is met with
"one-hour" change, we may find it perplexing that the idea of investing an additional hour in crafting tests is met with
resistance.

As an example, by achieving technical excellence through a semaphore approach ("red" - write a test for the missing
part of the code, assert expectations; "yellow" - write code to pass the tests; "green" - refactor your code) the
stabilization phase, "monthly" regression testing, even QA Department won't be needed. All acceptance criteria for
user story, feature, and even epic are transliterated into tests, and controlled by automation. That reinforces the
developer's mental model of the code, boosts confidence and increases productivity.
stabilization phase, "monthly" regression testing, even a separate QA Department won't be needed. All acceptance
criteria for user story, feature, and even epic are transliterated into tests, and controlled by automation. That
reinforces the developer's mental model of the code, boosts confidence and increases productivity.
2 changes: 0 additions & 2 deletions docs/implementation-flow/ch05-features.tex
@@ -1,6 +1,4 @@
% Copyright 2023 The terCAD team. All rights reserved.
% Use of this content is governed by a CC BY-NC-ND 4.0 license that can be found in the LICENSE file.

\markboth{Unleashing Unparalleled Features}{Unleashing Unparalleled Features}

[TBD]
269 changes: 269 additions & 0 deletions docs/implementation-flow/ch05-s01-tests.tex
@@ -0,0 +1,269 @@
% Copyright 2023 The terCAD team. All rights reserved.
% Use of this content is governed by a CC BY-NC-ND 4.0 license that can be found in the LICENSE file.

\subsection{Benchmarking Prototype}
\markboth{Unleashing Features}{Benchmarking Prototype}

Before adding functionality in the form of muscles to the created prototype skeleton, we need to verify its reliability.
Restructuring the fundamental concepts of the application in the future would not only pose a considerable challenge
but also entail substantial effort and potential complications.


\subsubsection{Providing Integration Tests}

Unit tests (\ref{ut-unit}) and widget tests (\ref{widget-tests}) serve as valuable tools for assessing isolated classes,
functions, or widgets. However, not all problems can be tackled by them. Integration tests are used to identify
systemic flaws (data corruption, concurrency problems, miscommunication between services, etc.) that might not be
evident in unit tests; by verifying the synergy of individual assets, they validate the application as a whole.
Integration tests are designed to reflect the real-time performance of an application on an actual device or platform.
In conclusion, they provide a vital link in the testing hierarchy by validating the cooperation of various components
within an application. In that way, integration tests simulate the end-to-end user workflows that we've implemented and
discussed earlier -- \ref{t-gherkin}.

Integration tests in Flutter can be written by using the \q{integration\_test}-package, while the \q{flutter\_driver}-package
helps us to evaluate our tests on real / virtual devices and environments and to track the timeline of test execution
(both packages are provided by the SDK):

\begin{lstlisting}[language=yaml]
## ./pubspec.yaml
dev_dependencies:
  integration_test:
    sdk: flutter
  flutter_driver:
    sdk: flutter
\end{lstlisting}

\noindent The implementation's difference from a widget test is the usage of the following code line, which enables test
execution on a physical device or platform:
\begin{lstlisting}
IntegrationTestWidgetsFlutterBinding.ensureInitialized();
\end{lstlisting}
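
\noindent As a minimal sketch (the file location, the application's entry import, and the asserted label are hypothetical,
taken here only for illustration), an integration test then looks just like a widget test with that binding enabled:

\begin{lstlisting}
// ./integration_test/start_page_test.dart -- a hypothetical example
import 'package:flutter_test/flutter_test.dart';
import 'package:integration_test/integration_test.dart';

import 'package:app_finance/main.dart' as app; // assumed application entry point

void main() {
  // Run on a real device / platform instead of the headless test environment
  IntegrationTestWidgetsFlutterBinding.ensureInitialized();

  testWidgets('Open the starting page', (WidgetTester tester) async {
    app.main();
    await tester.pumpAndSettle();
    // 'Initial Setup' is the first component named in our behavioral scenarios
    expect(find.text('Initial Setup'), findsOneWidget);
  });
}
\end{lstlisting}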


\subsubsection{Doing Performance Testing}

Performance testing is a type of software testing designed to evaluate the speed, responsiveness, stability, and
overall performance of an application under different conditions. It involves subjecting the application to
simulated workloads and stress scenarios to assess how it behaves in terms of speed, scalability, and resource usage.
Performance testing ensures that the software can handle the expected load without degradation in performance.

By simulating different levels of user traffic, performance testing helps determine the application's scalability by
assessing resource utilization (CPU, memory, network bandwidth, and other parameters), and it identifies performance
bottlenecks, such as slow database queries, inefficient code, or network latency, so that these issues can be addressed
before they impact users.

Detailed information about performance testing can be found in the materials of the International Software Testing
Qualifications Board (ISTQB) or the Software Engineering Institute (SEI); here we'll highlight only the definitions of
its types (\cite{Ian15}, \cite{Sag16}, \cite{Sag23}):
\begin{itemize}
\item Load Testing: Evaluates how an application performs under expected load conditions. It helps determine the
application's response time, resource utilization, and overall stability.

\item Stress Testing: Pushes the application to its limits by subjecting it to extreme conditions, such as excessive
user loads or resource scarcity. It aims to identify the breaking point and understand how the application recovers
from failures.

\item Endurance Testing: Assesses the application's performance over an extended period to identify issues related to
memory leaks, resource exhaustion, or gradual degradation in performance.

\item Spike Testing: Simulates sudden spikes in user traffic to assess how the application responds to rapid changes
in load. This helps uncover bottlenecks and issues related to sudden surges in demand.

\item Volume Testing: Focuses on testing the application's performance with large volumes of data, such as a high
number of records in a database. It helps identify scalability and performance issues associated with data volume.
\end{itemize}

\noindent Back to our process, the following command would be used to run the performance tests:

\begin{lstlisting}[language=bash]
# Precondition for Web profiling
chromedriver --port=4444
# Launch tests
flutter drive \
--driver=test_driver/perf_driver.dart \
--target=test/performance/name_of_test.dart \
--profile
\end{lstlisting}

The \q{--profile}-option enables compilation of the application in "profile mode", which helps the benchmark results to be
closer to what end users will experience. When running on a mobile device or emulator, it's proposed to use the
\q{--no-dds}-parameter in addition, which disables the inaccessible Dart Development Service (DDS). The \q{--target}-option
declares the scope of test execution, while the \q{--driver}-option tracks the outcomes. The driver configuration can be
taken from \href{https://docs.flutter.dev/cookbook/testing/integration/profiling}{https://docs.flutter.dev/cookbook/testing/integration/profiling}:

\begin{lstlisting}
// ./test_driver/perf_driver.dart
import 'package:flutter_driver/flutter_driver.dart' as driver;
import 'package:integration_test/integration_test_driver.dart';

Future<void> main() {
  return integrationDriver(
    responseDataCallback: (data) async {
      if (data != null) {
        final timeline = driver.Timeline.fromJson(data['timeline']);
        final summary = driver.TimelineSummary.summarize(timeline);
        await summary.writeTimelineToFile(
          'timeline',
          pretty: true,
          includeSummary: true,
          destinationDirectory: './coverage/',
        );
      }
    },
  );
}
\end{lstlisting}

\noindent Since it's a widget-test-based approach (\ref{widget-tests}, \ref{t-gherkin}), we'll focus only on the
usage of the \q{traceAction}-method to store time-based metrics:

\begin{lstlisting}
// ./test/performance/load/creation_test.dart
void main() {
  final binding = IntegrationTestWidgetsFlutterBinding.ensureInitialized();
  testWidgets('Cover Starting Page', (WidgetTester tester) async {
    await binding.traceAction(
      () async {
        // ... other steps
        final amountField = find.byWidgetPredicate((widget) {
          return widget is TextField && widget.decoration?.hintText == 'Set Balance';
        });
        await tester.ensureVisible(amountField);
        await tester.tap(amountField);
        // In profiling mode some delay is needed:
        await tester.pumpAndSettle(const Duration(seconds: 1));
        // await tester.pump();
        await tester.enterText(amountField, '1000');
        await tester.pumpAndSettle();
        expect(find.text('1000'), findsOneWidget);
        // ... other steps
      },
      reportKey: 'timeline',
    );
  });
}
\end{lstlisting}

\noindent The generated \q{timeline.timeline.json}-file can be traced via \q{chrome://tracing/} in the Google Chrome browser
(\cref{img:perf-chrome-tracing}):

\img{features/perf-chrome-tracing}{Google Chrome -- performance trace}{img:perf-chrome-tracing}

\noindent The \q{timeline.timeline\_summary.json}-file can be opened in an IDE as a native \q{JSON}-file to analyze the
application's performance manually. For example, the value of the \q{average\_frame\_build\_time\_millis}-parameter
is recommended to be below 16 milliseconds to ensure that the app runs at 60 frames per second without glitches. Other
parameters are widely described on the page --
\href{https://api.flutter.dev/flutter/flutter\_driver/TimelineSummary/summaryJson.html}{https://api.flutter.dev/flutter/flutter\_driver/TimelineSummary}.
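
\noindent For a quick check without opening the file, that parameter can be extracted from the summary directly (a sketch
assuming the \q{jq}-utility is installed and the \q{./coverage/} output path from our driver configuration above):

\begin{lstlisting}[language=bash]
# Print the average frame build time (milliseconds) from the generated summary
jq '.average_frame_build_time_millis' ./coverage/timeline.timeline_summary.json
\end{lstlisting}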


\paragraph{Load Testing}
Check response time and resource utilization for the first run (Initial Setup) by creating an account and a budget
category:

\begin{lstlisting}[language=cucumber]
@start
Feature: Verify Initial Flow
  Scenario: Applying basic configuration through the start pages
    Given I am firstly opened the app
    Then I can see "Initial Setup" component
    When I tap "Save to Storage (Go Next)" button
    Then I can see "Acknowledge (Go Next)" component
    When I tap "Acknowledge (Go Next)" button
    Then I can see "Create new Account" component
    When I tap on 0 index of "ListSelector" fields
    And I tap "Bank Account" element
    And I enter "New Account" to "Enter Account Identifier" text field
    And I enter "1000" to "Set Balance" text field
    And I tap "Create new Account" button
    Then I can see "Create new Budget Category" component
    When I enter "New Budget" to "Enter Budget Category Name" text field
    And I enter "1000" to "Set Balance" text field
    When I tap "Create new Budget Category" button
    Then I can see "Accounts, total" component
\end{lstlisting}

\noindent What we've identified from our first test execution is a degraded \q{frame build}-parameter
(\cref{tb:frame-build}) that affects our frames per second (FPS): with an average frame build time of roughly
26 milliseconds, well above the 16-millisecond budget, only 37 frames are generated instead of 60:\\

\begin{table}[h!]
\begin{tabular}{ |p{6.8cm}||r|r|r| }
\hline
\multicolumn{4}{|c|}{Frame Build Time, in milliseconds} \\
\hline
Type of state & Cold Start & Retrial & With Data\\
\hline
average & 26.00 & 24.28 & 29.65 \\
90th percentile & 47.20 & 43.38 & 70.33 \\
99th percentile & 158.31 & 159.41 & 198.03 \\
\hline
\end{tabular}
\caption{Performance Test Results for Feature "Verify Initial Flow"} \label{tb:frame-build}
\end{table}

\img{features/perf-slow-frame}{Performance Monitor in Visual Studio Code}{img:perf-slow-frame}

\noindent This issue (\cref{img:perf-slow-frame}) pertains to compilation jank in animations caused by shader
compilation (shaders are code snippets executed on a graphics processing unit [GPU] to render a sequence of draw commands).
A shader pre-compilation strategy mitigates the compilation-related disruptions during subsequent animations and improves
frames-per-second rendering. Run the app with \q{--cache-sksl} turned on to capture shaders in SkSL:

\begin{lstlisting}[language=bash]
flutter run --profile --cache-sksl --purge-persistent-cache
\end{lstlisting}

\noindent Then warm up the shaders in Skia Shader Language (SkSL) format and bundle them into the application build:

\begin{lstlisting}[language=bash]
# Capture shaders in Skia Shader Language (SkSL) format into a file
flutter drive --profile --cache-sksl --write-sksl-on-exit sksl.json -t test_driver/warm_up.dart
# Build app with SkSL warm-up
flutter build ios --bundle-sksl-path sksl.json
\end{lstlisting}

\begin{lstlisting}
// ./test_driver/warm_up.dart
import 'package:integration_test/integration_test_driver.dart';
Future<void> main() {
  return integrationDriver();(*@ \stopnumber @*)
}

// ./test_driver/warm_up_test.dart
Future<void> main() async {
  IntegrationTestWidgetsFlutterBinding.ensureInitialized();
  SharedPreferencesMixin.pref = await SharedPreferences.getInstance();

  testWidgets('Warm-up', (WidgetTester tester) async {
    await tester.pumpWidget(MultiProvider(
      providers: [
        ChangeNotifierProvider<AppData>(
          create: (_) => AppData(),
        ),
        ChangeNotifierProvider<AppTheme>(
          create: (_) => AppTheme(ThemeMode.system),
        ),
      ],
      child: const MyApp(),
    ));
    await tester.pumpAndSettle(const Duration(seconds: 3));
  });
}
\end{lstlisting}

\noindent Finally, we've taken \q{56 FPS (average)} as an outcome of that tuning.


\paragraph{Stress Testing}
Check the initial load (the time before interaction becomes enabled) with a huge transaction log history (32MB, 128MB,
512MB, 2GB).


\paragraph{Endurance Testing}
Check response time and resource utilization by adding different types of data within different time
periods (15 minutes, an hour, 4 hours, 16 hours).


\paragraph{Spike Testing}
Postponed until synchronization between different devices is enabled.


\paragraph{Volume Testing}
Combine reporting of "Load Testing" with data from "Stress Testing".
14 changes: 8 additions & 6 deletions docs/implementation-flow/index.tex
@@ -39,6 +39,7 @@
\crefname{table}{Table}{Tables}
\usepackage{multicol}
\usepackage{pgfplots}
\usepackage{tabularx}

\usepackage{_lib/customization}
\usepackage{_lib/code-style}
@@ -114,20 +115,21 @@ \section{[WIP] Implementing Core Functionality}
\include{./ch03-s02-subscription}

\newpage
\section{[WIP] Defining Quality Gates}
\section{Defining Quality Gates}
\input{./ch04-quality-gates}
\input{./ch04-s01-tests}
\include{./ch04-s02-automation}
\input{./ch04-s02-automation}
\input{./ch04-s03-telemetry}
\include{./ch04-s04-deployment}
\include{./ch04-s05-consequences}
\input{./ch04-s04-deployment}
\input{./ch04-s05-consequences}

\newpage
\section{[TBD] Unleashing Features}
\section{[WIP] Unleashing Features}
\input{./ch05-features}
\input{./ch05-s01-tests}

\newpage
\section{[WIP] Optimizing UI/UX Flow}
\section{[TBD] Optimizing UI/UX Flow}
\input{./ch06-s01-autofocus}

\newpage
10 changes: 9 additions & 1 deletion docs/implementation-flow/references.tex
@@ -29,7 +29,15 @@
architecture and assessment", \emph{Packt Publishing}, ISBN 9781788299237, p. 230, May 2017

\bibitem[Suz12]{Suz12} Suzanne Robertson, James Robertson, ``Mastering the Requirements Process: Getting Requirements
Right", \emph{Addison-Wesley Professional}, ISBN 978-0321815743, August 2012
Right", \emph{Addison-Wesley Professional}, ISBN 9780321815743, August 2012

\bibitem[Ian15]{Ian15} Ian Molyneaux, ``The Art of Application Performance Testing: From Strategy to Tools",
\emph{O'Reilly Media}, ISBN 9781491900543, p. 275, January 2015

\bibitem[Sag23]{Sag23} Sagar Deshpande, Sagar Tambade, ``Performance Testing Unleashed: A Journey from Novice to Expert",
\emph{Independently published}, ISBN 9798398536317, p. 102, June 2023

\bibitem[Sag16]{Sag16} Sagar Deshpande, Ravindra Sadaphule, ``Demystifying Scalability", \emph{CreateSpace Independent
Publishing Platform}, ISBN 9781533040510, p. 62, April 2016

\end{thebibliography}
10 changes: 10 additions & 0 deletions integration_test/README.md
@@ -0,0 +1,10 @@
## Tips to evaluate Integration Tests

```
flutter drive \
--driver=test_driver/perf_driver.dart \
--target=integration_test/{name}_test.dart \
--no-dds
```

P.S. Launch Chrome Driver `chromedriver --port=4444` for Web profiling